[rabbitmq-discuss] RabbitMQ broker's death by one cut: robustness problem
Lev Walkin
vlm at lionet.info
Mon Apr 27 11:46:19 BST 2009
During our evaluation period, our RabbitMQ node crashed when it was
killed by the OOM killer. Unfortunately, the kill corrupted the log files,
so restarting RabbitMQ did not fix the problem. RabbitMQ simply would not
start:
================
[broker at zamq ~]> /usr/local/bin/rabbitmq-server -kernel check_ip true -connect_all false
...
Logging to "/.../rabbitmq/log/rabbit.log"
SASL logging to "/.../rabbitmq/log/rabbit-sasl.log"
starting database ...done
starting core processes ...done
starting recovery ...done
starting persister ...{"init terminating in do_boot",{{nocatch,{error,{cannot_start_application,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{badmatch,{error,{{{badmatch,eof},[{rabbit_persister,internal_load_snapshot,2},{rabbit_persister,init,1},{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]},{child,undefined,rabbit_persister,{rabbit_persister,start_link,[]},transient,100,worker,[rabbit_persister]}}}},[{rabbit,start_child,1},{rabbit,'-start/2-fun-4-',0},{rabbit,'-start/2-fun-0-',1},{lists,foreach,2},{rabbit,start,2},{application_master,start_it_old,4}]}}}}}}},[{init,start_it,1},{init,start_em,1}]}}
init terminating in do_boot ()
[broker at zamq ~]>
=================
It turns out that the broker's beam process was killed during a persister
operation, which left the persister logs broken:
[broker at zamq ...]> ls -al | grep persister
-rw-r--r-- 1 broker wheel 8 Apr 24 18:48 rabbit_persister.LOG
-rw-r--r-- 1 broker wheel 661677171 Apr 24 18:19 rabbit_persister.LOG.previous
[broker at zamq ~]> hd rabbit_persister.LOG
00000000 01 02 03 04 63 58 4d 0b |....cXM.|
00000008
[broker at zamq ...]>
I believe there is a way to make recovery from this kind of error more
robust. Is there a solution you would like to introduce for this problem?
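
As a rough illustration of what more robust recovery could look like,
here is a minimal sketch in Python. It is not RabbitMQ's code (the real
persister is written in Erlang), and the names HEADER_SIZE, load_records
and recover are hypothetical. It assumes that the 8-byte
rabbit_persister.LOG shown above holds only a file header and no records,
and that falling back to rabbit_persister.LOG.previous, or to an empty
state, is preferable to refusing to boot.

```python
import os
import tempfile

# Assumption: a log no longer than this holds a header and no records,
# like the 8-byte rabbit_persister.LOG in the listing above.
HEADER_SIZE = 8


class EmptyLog(Exception):
    """Raised when a log file contains no data beyond its header."""


def load_records(path):
    """Hypothetical loader: return the bytes that follow the header."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) <= HEADER_SIZE:
        # The equivalent of hitting eof where a snapshot was expected.
        raise EmptyLog(path)
    return data[HEADER_SIZE:]


def recover(log_path):
    """Try the current log first, then the .previous copy.

    Returns (source_path, payload). If neither file is usable, returns
    (None, b"") so the caller can start empty instead of crashing.
    """
    for candidate in (log_path, log_path + ".previous"):
        try:
            return candidate, load_records(candidate)
        except (EmptyLog, FileNotFoundError):
            continue
    return None, b""
```

With a header-only rabbit_persister.LOG and an intact
rabbit_persister.LOG.previous, recover() would return the payload of the
.previous file rather than failing on the truncated one.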
--
Lev Walkin
vlm at lionet.info