[rabbitmq-discuss] empty rabbit_serial file causes rabbitmq cluster to hang

Matthias Radestock matthias at rabbitmq.com
Wed May 29 08:49:33 BST 2013


On 28/05/13 23:42, Denisenko, Mikhail (NIH/NLM/NCBI) [C] wrote:
> We had a problem today that was caused by a wiped rabbit_serial file on
> one of the nodes. Our rabbitmq cluster, consisting of two nodes, became
> inaccessible.
>
> There is a ticket: https://github.com/rabbitmq/rabbitmq-server/issues/17
>
> I think that rabbitmq should be able to recover from this situation,
> perhaps by removing the invalid rabbit_serial file. What do you think?

As Emile mentioned in the ticket, rabbit cannot recover from arbitrary 
filesystem corruption.

The specific issue of rabbit_serial being empty actually has come up 
once before - see 
http://rabbitmq.1065348.n5.nabble.com/Cannot-start-guid-generator-rabbitmq-td4415.html. 
So I have filed a feature request to investigate whether we should 
handle this particular case of corruption more gracefully.

I recommend you look into what may have caused the corruption. While 
there is an easy workaround in this particular instance, the same isn't 
true for other files, and corruptions there could easily lead to data 
loss, possibly undetected.


Regards,

Matthias.

