[rabbitmq-discuss] RabbitMQ broker's death by one cut: robustness problem

Lev Walkin vlm at lionet.info
Wed Apr 29 22:44:38 BST 2009


Matthias Radestock wrote:
> Lev,
> 
> Lev Walkin wrote:
>> During our evaluation period, our RabbitMQ node was killed by the 
>> OOM killer. Unfortunately, the kill corrupted the log files, so 
>> restarting RabbitMQ did not fix the problem. RabbitMQ just would not 
>> start:
>> [...]
>> It turns out the broker's beam process was killed during a persister 
>> operation, so the persister logs were broken:
>>
>> [broker at zamq ...]> ls -al | grep persister
>> -rw-r--r--  1 broker  wheel          8 Apr 24 18:48 rabbit_persister.LOG
>> -rw-r--r--  1 broker  wheel  661677171 Apr 24 18:19 rabbit_persister.LOG.previous
>> [...]
>> I believe there is a way to make such error recovery more robust. Is 
>> there a solution you'd like to introduce for this kind of problem?
> 
> You should be able to just rename the .LOG.previous to .LOG.

Yes, I did just that.
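
For anyone hitting the same problem, here is a minimal sketch of that 
manual step in Python. It is only an illustration, not anything the 
broker ships with: the function name restore_persister_log, the 
mnesia_dir parameter and the ".broken" suffix are my own assumptions, 
and I am assuming the broker is stopped while it runs. Only the two 
file names come from the listing above.

```python
import os
import tempfile


def restore_persister_log(mnesia_dir, log_name="rabbit_persister.LOG"):
    """Put the .previous persister log back in place of the current one.

    Hypothetical helper: mnesia_dir is whatever directory holds the
    persister logs on your node. Returns True if a restore happened.
    """
    current = os.path.join(mnesia_dir, log_name)
    previous = current + ".previous"

    # Nothing to restore from, so leave everything untouched.
    if not os.path.exists(previous):
        return False

    # Keep the damaged file for later inspection instead of deleting it.
    # The ".broken" suffix is an arbitrary choice for this sketch.
    if os.path.exists(current):
        os.replace(current, current + ".broken")

    # The actual fix: rename .LOG.previous to .LOG.
    os.replace(previous, current)
    return True


if __name__ == "__main__":
    # Self-contained demo on throwaway files, not on a real broker.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "rabbit_persister.LOG"), "wb") as f:
            f.write(b"x" * 8)  # stands in for the truncated 8-byte log
        with open(os.path.join(demo_dir, "rabbit_persister.LOG.previous"), "wb") as f:
            f.write(b"previous log contents")
        print(restore_persister_log(demo_dir))
        print(sorted(os.listdir(demo_dir)))
```

The sketch keeps the damaged file around rather than overwriting it, 
so the rename can be undone if the .previous file turns out to be 
unusable as well.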

> I have filed a bug to get the broker to do something along these lines 
> automatically.

Right, it is better for HA software to be able to heal itself to some 
degree than to require a team of specialized personnel to support it 
in the field 24/7.

-- 
Lev Walkin
vlm at lionet.info



