Hi all,<br><br>My broker is crashing in a reproducible way.  I&#39;m running RabbitMQ 2.0 on a 32-bit version of Erlang on a 64-bit Windows Server 2008 R2 installation.<br><br>The management and webhooks plugins are installed.  I also have some code periodically checking the broker internals via OTP.<br>
<br>The broker crashes if a queue builds up to the extent that messages have to be persisted to msg_store_transient, and then a consumer quickly starts consuming all the messages from that queue.  It&#39;s as if the broker is still near the limit of its available memory then tries to reload a large chunk of the cached messages from msg_store_transient.  This pushes it over some hard limit and down it goes.<br>
<br>I&#39;ve turned on memory alarms and for the purposes of debugging reduced the memory high watermark to 0.05 (the system has 2gb).  With these settings a cache of ~30,000 relatively small messages is sufficient to cause the problem.<br>
<br>The main log messages are as follows.  I can send the dump log if necessary too.<br><br>I&#39;ve seen the release notes for 2.2 and wondered if the &quot;fix queue memory leak when using the management plugin or other consumers of queue statistics&quot; might be a related issue?<br>
<br><a href="http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2010-November/010172.html">http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2010-November/010172.html</a><br><br>Any help would be much appreciated!<br>
<br>Thanks,<br>Scott<br><br><br><br>debug log:<br><br>Crash dump was written to: c:/rabbit/erl_crash.dump<br>eheap_alloc: Cannot allocate 583848200 bytes of memory (of type &quot;heap&quot;).<br><br><br><br>node log (rabbit@whatever.log):<br>
<br><br>=INFO REPORT==== 20-Dec-2010::12:52:13 ===<br>vm_memory_high_watermark set. Memory used:108708800 allowed:107350835<br><br>=INFO REPORT==== 20-Dec-2010::12:52:13 ===<br>    alarm_handler: {set,{vm_memory_high_watermark,[]}}<br>
<br>=INFO REPORT==== 20-Dec-2010::12:52:14 ===<br>vm_memory_high_watermark clear. Memory used:94399608 allowed:107350835<br><br>=INFO REPORT==== 20-Dec-2010::12:52:14 ===<br>    alarm_handler: {clear,vm_memory_high_watermark}<br>
<br>=ERROR REPORT==== 20-Dec-2010::12:52:20 ===<br>** Generic server msg_store_transient terminating<br>** Last message in was {&#39;$gen_cast&#39;,<br>                           {remove,<br>                               [&lt;&lt;207,102,51,195,33,4,26,173,150,95,132,22,<br>
                                  186,26,230,252&gt;&gt;]}}<br>** When Server state == {msstate,<br>                            &quot;c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia/msg_store_transient&quot;,<br>                            rabbit_msg_store_ets_index,<br>
                            {state,151612,<br>                                &quot;c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia/msg_store_transient&quot;},<br>                            6,#Ref&lt;0.0.0.7537&gt;,<br>                            {dict,0,16,16,8,80,48,<br>
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],<br>                                 []},<br>                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],<br>                                  [],[]}}},<br>
                            [],undefined,85671623,103928228,[],false,<br>                            &lt;0.237.0&gt;,159806,147515,155709,163903,<br>                            {set,2,16,16,8,80,48,<br>                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],<br>
                                 []},<br>                                {{[&lt;&lt;76,144,119,142,103,20,191,166,92,52,177,<br>                                     137,37,253,53,198&gt;&gt;],<br>                                  [],[],[],[],[],[],[],[],<br>
                                  [&lt;&lt;97,19,25,60,105,158,160,68,116,178,239,<br>                                     127,59,191,208,161&gt;&gt;],<br>                                  [],[],[],[],[],[]}}},<br>                            false,16777216}<br>
** Reason for termination ==<br>** {{badmatch,{error,eacces}},<br>    [{rabbit_msg_store,delete_file_if_empty,2},<br>     {rabbit_msg_store,remove_message,2},<br>     {lists,foldl,3},<br>     {rabbit_msg_store,handle_cast,2},<br>
     {gen_server2,handle_msg,7},<br>     {proc_lib,wake_up,3}]}<br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>stopped TCP Listener on <a href="http://0.0.0.0:5672">0.0.0.0:5672</a><br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>
closing TCP connection &lt;0.6768.0&gt; from <a href="http://172.20.5.35:55230">172.20.5.35:55230</a><br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>closing TCP connection &lt;0.521.0&gt; from <a href="http://172.20.5.35:55101">172.20.5.35:55101</a><br>
<br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>closing TCP connection &lt;0.309.0&gt; from <a href="http://172.22.1.5:56205">172.22.1.5:56205</a><br><br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>vm_memory_high_watermark set. Memory used:177922856 allowed:107350835<br>
<br>=INFO REPORT==== 20-Dec-2010::12:52:26 ===<br>    alarm_handler: {set,{vm_memory_high_watermark,[]}}<br><br><br><br>sasl log:<br><br><br>=CRASH REPORT==== 20-Dec-2010::12:52:23 ===<br>  crasher:<br>    initial call: gen:init_it/7<br>
    pid: &lt;0.235.0&gt;<br>    registered_name: msg_store_transient<br>    exception exit: {{badmatch,{error,eacces}},<br>                     [{rabbit_msg_store,delete_file_if_empty,2},<br>                      {rabbit_msg_store,remove_message,2},<br>
                      {lists,foldl,3},<br>                      {rabbit_msg_store,handle_cast,2},<br>                      {gen_server2,handle_msg,7},<br>                      {proc_lib,wake_up,3}]}<br>      in function  gen_server2:terminate/6<br>
    ancestors: [rabbit_sup,&lt;0.186.0&gt;]<br>    messages: [{&#39;$gen_cast&#39;,<br>                      {remove,<br>                          [&lt;&lt;116,2,44,89,198,13,221,58,18,13,147,52,79,55,<br>                             101,76&gt;&gt;]}},<br>
                  {&#39;$gen_cast&#39;,<br>                      {remove,<br>                          [&lt;&lt;93,225,245,186,89,105,207,141,190,8,167,133,<br>                             251,196,229,169&gt;&gt;]}},<br>                  {&#39;$gen_cast&#39;,<br>
                      {remove,<br>                          [&lt;&lt;36,45,179,241,82,8,235,59,130,71,81,223,19,89,<br>                             23,159&gt;&gt;]}},<br>[...]<br>                  {&#39;$gen_cast&#39;,<br>
                      {remove,<br>                          [&lt;&lt;93,226,119,14,115,208,16,0,35,241,8,153,128,173,<br>                             33,46&gt;&gt;,<br>                           &lt;&lt;102,18,229,215,103,192,214,80,241,21,240,138,35,<br>
                             33,210,8&gt;&gt;,<br>                           &lt;&lt;144,39,170,218,61,5,23,24,234,156,43,154,63,104,<br>                             46,233&gt;&gt;,<br>                           &lt;&lt;201,4,128,74,48,234,192,77,11,64,58,124,146,64,<br>
                             196,108&gt;&gt;,<br>                           &lt;&lt;103,56,91,19,0,9,57,136,201,83,73,171,214,48,76,<br>                             227&gt;&gt;,<br>[...]<br>                           &lt;&lt;28,5,2,76,231,115,160,56,192,101,127,47,74,105,<br>
                             103,102&gt;&gt;,<br>                           &lt;&lt;49,236,73,20,167,156,137,55,109,184,47,56,116,237,<br>                             53,9&gt;&gt;,<br>                           &lt;&lt;&quot;Í¿åú|¢Ü¶1á%\b4 ,%&quot;&gt;&gt;]}}]<br>
    links: [&lt;0.187.0&gt;]<br>    dictionary: [{fhc_age_tree,{0,nil}}]<br>    trap_exit: true<br>    status: running<br>    heap_size: 121393<br>    stack_size: 24<br>    reductions: 19087946<br>  neighbours:<br><br>=SUPERVISOR REPORT==== 20-Dec-2010::12:52:26 ===<br>
     Supervisor: {local,rabbit_sup}<br>     Context:    child_terminated<br>     Reason:     {{badmatch,{error,eacces}},<br>                  [{rabbit_msg_store,delete_file_if_empty,2},<br>                   {rabbit_msg_store,remove_message,2},<br>
                   {lists,foldl,3},<br>                   {rabbit_msg_store,handle_cast,2},<br>                   {gen_server2,handle_msg,7},<br>                   {proc_lib,wake_up,3}]}<br>     Offender:   [{pid,&lt;0.235.0&gt;},<br>
                  {name,msg_store_transient},<br>                  {mfa,<br>                      {rabbit_msg_store,start_link,<br>                          [msg_store_transient,<br>                           &quot;c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia&quot;,undefined,<br>
                           {#Fun&lt;rabbit_variable_queue.0.52678393&gt;,ok}]}},<br>                  {restart_type,transient},<br>                  {shutdown,4294967295},<br>                  {child_type,worker}]<br><br>
<br>=SUPERVISOR REPORT==== 20-Dec-2010::12:52:26 ===<br>     Supervisor: {local,rabbit_sup}<br>     Context:    shutdown<br>     Reason:     reached_max_restart_intensity<br>     Offender:   [{pid,&lt;0.235.0&gt;},<br>                  {name,msg_store_transient},<br>
                  {mfa,<br>                      {rabbit_msg_store,start_link,<br>                          [msg_store_transient,<br>                           &quot;c:/rabbit/db/RabbitMQ@VMRMQ1-mnesia&quot;,undefined,<br>
                           {#Fun&lt;rabbit_variable_queue.0.52678393&gt;,ok}]}},<br>                  {restart_type,transient},<br>                  {shutdown,4294967295},<br>                  {child_type,worker}]<br><br>