Hi all,<br><br>My broker is crashing in a reproducible way. I'm running RabbitMQ 2.0 on a 32-bit version of Erlang on a 64-bit Windows Server 2008 R2 installation.<br><br>The management and webhooks plugins are installed. I also have some code periodically checking the broker internals via OTP.<br>
<br>The broker crashes if a queue builds up to the extent that messages have to be persisted to msg_store_transient, and then a consumer quickly starts consuming all the messages from that queue. It's as if the broker is still near the limit of its available memory then tries to reload a large chunk of the cached messages from msg_store_transient. This pushes it over some hard limit and down it goes.<br>
<br>I've turned on memory alarms and for the purposes of debugging reduced the memory high watermark to 0.05 (the system has 2gb). With these settings a cache of ~30,000 relatively small messages is sufficient to cause the problem.<br>
<br>The main log messages are as follows. I can send the dump log if necessary too.<br><br>I've seen the release notes for 2.2 and wondered if the "fix queue memory leak when using the management plugin or other consumers of queue statistics" might be a related issue?<br>
<br><a href="http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2010-November/010172.html">http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2010-November/010172.html</a><br><br>Any help would be much appreciated!<br>
<br>Thanks,<br>Scott<br><br><br><br>debug log:<br><br>Crash dump was written to: c:/rabbit/erl_crash.dump<br>eheap_alloc: Cannot allocate 583848200 bytes of memory (of type "heap").<br><br><br><br>node log (rabbit@whatever.log):<br>
<br><br>=INFO REPORT==== 20-Dec-2010::12:52:13 ===<br>vm_memory_high_watermark set. Memory used:108708800 allowed:107350835<br><br>=INFO REPORT==== 20-Dec-2010::12:52:13 ===<br> alarm_handler: {set,{vm_memory_high_watermark,[]}}<br>
<br>=INFO REPORT==== 20-Dec-2010::12:52:14 ===<br>vm_memory_high_watermark clear. Memory used:94399608 allowed:107350835<br><br>=INFO REPORT==== 20-Dec-2010::12:52:14 ===<br> alarm_handler: {clear,vm_memory_high_watermark}<br>
<br>=ERROR REPORT==== 20-Dec-2010::12:52:20 ===<br>** Generic server msg_store_transient terminating<br>** Last message in was {'$gen_cast',<br> {remove,<br> [<<207,102,51,195,33,4,26,173,150,95,132,22,<br>
186,26,230,252>>]}}<br>** When Server state == {msstate,<br> "c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia/msg_store_transient",<br> rabbit_msg_store_ets_index,<br>
{state,151612,<br> "c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia/msg_store_transient"},<br> 6,#Ref<0.0.0.7537>,<br> {dict,0,16,16,8,80,48,<br>
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],<br> []},<br> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],<br> [],[]}}},<br>
[],undefined,85671623,103928228,[],false,<br> <0.237.0>,159806,147515,155709,163903,<br> {set,2,16,16,8,80,48,<br> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],<br>
[]},<br> {{[<<76,144,119,142,103,20,191,166,92,52,177,<br> 137,37,253,53,198>>],<br> [],[],[],[],[],[],[],[],<br>
[<<97,19,25,60,105,158,160,68,116,178,239,<br> 127,59,191,208,161>>],<br> [],[],[],[],[],[]}}},<br> false,16777216}<br>
** Reason for termination ==<br>** {{badmatch,{error,eacces}},<br> [{rabbit_msg_store,delete_file_if_empty,2},<br> {rabbit_msg_store,remove_message,2},<br> {lists,foldl,3},<br> {rabbit_msg_store,handle_cast,2},<br>
{gen_server2,handle_msg,7},<br> {proc_lib,wake_up,3}]}<br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>stopped TCP Listener on <a href="http://0.0.0.0:5672">0.0.0.0:5672</a><br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>
closing TCP connection <0.6768.0> from <a href="http://172.20.5.35:55230">172.20.5.35:55230</a><br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>closing TCP connection <0.521.0> from <a href="http://172.20.5.35:55101">172.20.5.35:55101</a><br>
<br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>closing TCP connection <0.309.0> from <a href="http://172.22.1.5:56205">172.22.1.5:56205</a><br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>vm_memory_high_watermark set. Memory used:177922856 allowed:107350835<br>
<br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br> alarm_handler: {set,{vm_memory_high_watermark,[]}}<br><br><br><br>sasl log:<br><br><br>=CRASH REPORT==== 20-Dec-2010::12:52:23 ===<br> crasher:<br> initial call: gen:init_it/7<br>
pid: <0.235.0><br> registered_name: msg_store_transient<br> exception exit: {{badmatch,{error,eacces}},<br> [{rabbit_msg_store,delete_file_if_empty,2},<br> {rabbit_msg_store,remove_message,2},<br>
{lists,foldl,3},<br> {rabbit_msg_store,handle_cast,2},<br> {gen_server2,handle_msg,7},<br> {proc_lib,wake_up,3}]}<br> in function gen_server2:terminate/6<br>
ancestors: [rabbit_sup,<0.186.0>]<br> messages: [{'$gen_cast',<br> {remove,<br> [<<116,2,44,89,198,13,221,58,18,13,147,52,79,55,<br> 101,76>>]}},<br>
{'$gen_cast',<br> {remove,<br> [<<93,225,245,186,89,105,207,141,190,8,167,133,<br> 251,196,229,169>>]}},<br> {'$gen_cast',<br>
{remove,<br> [<<36,45,179,241,82,8,235,59,130,71,81,223,19,89,<br> 23,159>>]}},<br>[...]<br> {'$gen_cast',<br>
{remove,<br> [<<93,226,119,14,115,208,16,0,35,241,8,153,128,173,<br> 33,46>>,<br> <<102,18,229,215,103,192,214,80,241,21,240,138,35,<br>
33,210,8>>,<br> <<144,39,170,218,61,5,23,24,234,156,43,154,63,104,<br> 46,233>>,<br> <<201,4,128,74,48,234,192,77,11,64,58,124,146,64,<br>
196,108>>,<br> <<103,56,91,19,0,9,57,136,201,83,73,171,214,48,76,<br> 227>>,<br>[...]<br> <<28,5,2,76,231,115,160,56,192,101,127,47,74,105,<br>
103,102>>,<br> <<49,236,73,20,167,156,137,55,109,184,47,56,116,237,<br> 53,9>>,<br> <<"Í¿åú|¢Ü¶1á%\b4 ,%">>]}}]<br>
links: [<0.187.0>]<br> dictionary: [{fhc_age_tree,{0,nil}}]<br> trap_exit: true<br> status: running<br> heap_size: 121393<br> stack_size: 24<br> reductions: 19087946<br> neighbours:<br><br>=SUPERVISOR REPORT==== 20-Dec-2010::12:52:26 ===<br>
Supervisor: {local,rabbit_sup}<br> Context: child_terminated<br> Reason: {{badmatch,{error,eacces}},<br> [{rabbit_msg_store,delete_file_if_empty,2},<br> {rabbit_msg_store,remove_message,2},<br>
{lists,foldl,3},<br> {rabbit_msg_store,handle_cast,2},<br> {gen_server2,handle_msg,7},<br> {proc_lib,wake_up,3}]}<br> Offender: [{pid,<0.235.0>},<br>
{name,msg_store_transient},<br> {mfa,<br> {rabbit_msg_store,start_link,<br> [msg_store_transient,<br> "c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia",undefined,<br>
{#Fun<rabbit_variable_queue.0.52678393>,ok}]}},<br> {restart_type,transient},<br> {shutdown,4294967295},<br> {child_type,worker}]<br><br>
<br>=SUPERVISOR REPORT==== 20-Dec-2010::12:52:26 ===<br> Supervisor: {local,rabbit_sup}<br> Context: shutdown<br> Reason: reached_max_restart_intensity<br> Offender: [{pid,<0.235.0>},<br> {name,msg_store_transient},<br>
{mfa,<br> {rabbit_msg_store,start_link,<br> [msg_store_transient,<br> "c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia",undefined,<br>
{#Fun<rabbit_variable_queue.0.52678393>,ok}]}},<br> {restart_type,transient},<br> {shutdown,4294967295},<br> {child_type,worker}]<br><br>