[rabbitmq-discuss] RabbitMQ broker's death by one cut: robustness problem
Lev Walkin
vlm at lionet.info
Wed Apr 29 22:44:38 BST 2009
Matthias Radestock wrote:
> Lev,
>
> Lev Walkin wrote:
>> During the evaluation period, our RabbitMQ node crashed at some point,
>> killed by the OOM killer. Unfortunately, the kill corrupted the log files,
>> so restarting RabbitMQ did not fix the problem. RabbitMQ just would not
>> start:
>> [...]
>> It turns out, the broker beam was killed during a persister operation,
>> so persister logs were broken:
>>
>> [broker at zamq ...]> ls -al | grep persister
>> -rw-r--r-- 1 broker wheel 8 Apr 24 18:48 rabbit_persister.LOG
>> -rw-r--r-- 1 broker wheel 661677171 Apr 24 18:19 rabbit_persister.LOG.previous
>> [...]
>> I believe there is a way to make such error recovery more robust. Is
>> there a solution you'd like to introduce for this kind of problem?
>
> You should be able to just rename the .LOG.previous to .LOG.
Yes, I did just that.
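
For anyone hitting the same problem, here is a minimal sketch of that manual
recovery step, not an official RabbitMQ tool. It assumes the broker is
stopped and that you know the directory holding the persister logs. The
function name restore_persister_log and the ".corrupt" suffix used to keep
the damaged file are hypothetical; only the two log file names come from the
listing above.

```python
import os


def restore_persister_log(data_dir, log_name="rabbit_persister.LOG"):
    """Rename <log_name>.previous over a damaged <log_name>.

    Returns True if the log was restored, False if there is no
    .previous file to restore from. Run this only while the broker
    is stopped.
    """
    current = os.path.join(data_dir, log_name)
    previous = current + ".previous"

    if not os.path.exists(previous):
        return False

    # Keep the damaged file for later inspection instead of deleting it.
    # The ".corrupt" suffix is an assumption, not a RabbitMQ convention.
    if os.path.exists(current):
        os.replace(current, current + ".corrupt")

    # The rename suggested in this thread: .LOG.previous becomes .LOG.
    os.replace(previous, current)
    return True
```

Call it with the path to the broker's data directory and then start the
broker again. Keeping the damaged file aside is a design choice, so that
nothing is lost if the .previous file turns out to be unusable as well.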
> I have filed a bug to get the broker to do something along these lines
> automatically.
Right, it is better for HA software to be able to heal itself to some
degree than to require a host of special personnel to support it in
the field 24/7.
--
Lev Walkin
vlm at lionet.info