[rabbitmq-discuss] RabbitMQ 2.0 broker crashes

Scott Dupoy scottdupoy at gmail.com
Mon Dec 20 13:28:16 GMT 2010


Hi all,

My broker is crashing in a reproducible way.  I'm running RabbitMQ 2.0 on a
32-bit version of Erlang on a 64-bit Windows Server 2008 R2 installation.

The management and webhooks plugins are installed.  I also have some code
periodically checking the broker internals via OTP.

The broker crashes if a queue builds up to the extent that messages have to
be persisted to msg_store_transient, and then a consumer quickly starts
consuming all the messages from that queue.  It's as if the broker is still
near the limit of its available memory then tries to reload a large chunk of
the cached messages from msg_store_transient.  This pushes it over some hard
limit and down it goes.

I've turned on memory alarms and for the purposes of debugging reduced the
memory high watermark to 0.05 (the system has 2gb).  With these settings a
cache of ~30,000 relatively small messages is sufficient to cause the
problem.

The main log messages are as follows.  I can send the dump log if necessary
too.

I've seen the release notes for 2.2 and wondered if the "fix queue memory
leak when using the management plugin or other consumers of queue
statistics" might be a related issue?

http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2010-November/010172.html

Any help would be much appreciated!

Thanks,
Scott



debug log:

Crash dump was written to: c:/rabbit/erl_crash.dump
eheap_alloc: Cannot allocate 583848200 bytes of memory (of type "heap").



node log (rabbit at whatever.log):


=INFO REPORT==== 20-Dec-2010::12:52:13 ===
vm_memory_high_watermark set. Memory used:108708800 allowed:107350835

=INFO REPORT==== 20-Dec-2010::12:52:13 ===
    alarm_handler: {set,{vm_memory_high_watermark,[]}}

=INFO REPORT==== 20-Dec-2010::12:52:14 ===
vm_memory_high_watermark clear. Memory used:94399608 allowed:107350835

=INFO REPORT==== 20-Dec-2010::12:52:14 ===
    alarm_handler: {clear,vm_memory_high_watermark}

=ERROR REPORT==== 20-Dec-2010::12:52:20 ===
** Generic server msg_store_transient terminating
** Last message in was {'$gen_cast',
                           {remove,
                               [<<207,102,51,195,33,4,26,173,150,95,132,22,
                                  186,26,230,252>>]}}
** When Server state == {msstate,
                            "c:/rabbit/db/RabbitMQ at VMRMQ1-mnesia
/msg_store_transient",
                            rabbit_msg_store_ets_index,
                            {state,151612,
                                "c:/rabbit/db/RabbitMQ at VMRMQ1-mnesia
/msg_store_transient"},
                            6,#Ref<0.0.0.7537>,
                            {dict,0,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            [],undefined,85671623,103928228,[],false,
                            <0.237.0>,159806,147515,155709,163903,
                            {set,2,16,16,8,80,48,

{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},

{{[<<76,144,119,142,103,20,191,166,92,52,177,
                                     137,37,253,53,198>>],
                                  [],[],[],[],[],[],[],[],
                                  [<<97,19,25,60,105,158,160,68,116,178,239,
                                     127,59,191,208,161>>],
                                  [],[],[],[],[],[]}}},
                            false,16777216}
** Reason for termination ==
** {{badmatch,{error,eacces}},
    [{rabbit_msg_store,delete_file_if_empty,2},
     {rabbit_msg_store,remove_message,2},
     {lists,foldl,3},
     {rabbit_msg_store,handle_cast,2},
     {gen_server2,handle_msg,7},
     {proc_lib,wake_up,3}]}

=INFO REPORT==== 20-Dec-2010::12:52:26 ===
stopped TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 20-Dec-2010::12:52:26 ===
closing TCP connection <0.6768.0> from 172.20.5.35:55230

=INFO REPORT==== 20-Dec-2010::12:52:26 ===
closing TCP connection <0.521.0> from 172.20.5.35:55101

=INFO REPORT==== 20-Dec-2010::12:52:26 ===
closing TCP connection <0.309.0> from 172.22.1.5:56205

=INFO REPORT==== 20-Dec-2010::12:52:26 ===
vm_memory_high_watermark set. Memory used:177922856 allowed:107350835

=INFO REPORT==== 20-Dec-2010::12:52:26 ===
    alarm_handler: {set,{vm_memory_high_watermark,[]}}



sasl log:


=CRASH REPORT==== 20-Dec-2010::12:52:23 ===
  crasher:
    initial call: gen:init_it/7
    pid: <0.235.0>
    registered_name: msg_store_transient
    exception exit: {{badmatch,{error,eacces}},
                     [{rabbit_msg_store,delete_file_if_empty,2},
                      {rabbit_msg_store,remove_message,2},
                      {lists,foldl,3},
                      {rabbit_msg_store,handle_cast,2},
                      {gen_server2,handle_msg,7},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/6
    ancestors: [rabbit_sup,<0.186.0>]
    messages: [{'$gen_cast',
                      {remove,
                          [<<116,2,44,89,198,13,221,58,18,13,147,52,79,55,
                             101,76>>]}},
                  {'$gen_cast',
                      {remove,
                          [<<93,225,245,186,89,105,207,141,190,8,167,133,
                             251,196,229,169>>]}},
                  {'$gen_cast',
                      {remove,
                          [<<36,45,179,241,82,8,235,59,130,71,81,223,19,89,
                             23,159>>]}},
[...]
                  {'$gen_cast',
                      {remove,

[<<93,226,119,14,115,208,16,0,35,241,8,153,128,173,
                             33,46>>,

<<102,18,229,215,103,192,214,80,241,21,240,138,35,
                             33,210,8>>,

<<144,39,170,218,61,5,23,24,234,156,43,154,63,104,
                             46,233>>,
                           <<201,4,128,74,48,234,192,77,11,64,58,124,146,64,
                             196,108>>,

<<103,56,91,19,0,9,57,136,201,83,73,171,214,48,76,
                             227>>,
[...]
                           <<28,5,2,76,231,115,160,56,192,101,127,47,74,105,
                             103,102>>,

<<49,236,73,20,167,156,137,55,109,184,47,56,116,237,
                             53,9>>,
                           <<"Í¿åú|¢Ü¶1á%\b4 ,%">>]}}]
    links: [<0.187.0>]
    dictionary: [{fhc_age_tree,{0,nil}}]
    trap_exit: true
    status: running
    heap_size: 121393
    stack_size: 24
    reductions: 19087946
  neighbours:

=SUPERVISOR REPORT==== 20-Dec-2010::12:52:26 ===
     Supervisor: {local,rabbit_sup}
     Context:    child_terminated
     Reason:     {{badmatch,{error,eacces}},
                  [{rabbit_msg_store,delete_file_if_empty,2},
                   {rabbit_msg_store,remove_message,2},
                   {lists,foldl,3},
                   {rabbit_msg_store,handle_cast,2},
                   {gen_server2,handle_msg,7},
                   {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.235.0>},
                  {name,msg_store_transient},
                  {mfa,
                      {rabbit_msg_store,start_link,
                          [msg_store_transient,
                           "c:/rabbit/db/RabbitMQ at VMRMQ1-mnesia",undefined,
                           {#Fun<rabbit_variable_queue.0.52678393>,ok}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=SUPERVISOR REPORT==== 20-Dec-2010::12:52:26 ===
     Supervisor: {local,rabbit_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.235.0>},
                  {name,msg_store_transient},
                  {mfa,
                      {rabbit_msg_store,start_link,
                          [msg_store_transient,
                           "c:/rabbit/db/RabbitMQ at VMRMQ1-mnesia",undefined,
                           {#Fun<rabbit_variable_queue.0.52678393>,ok}]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20101220/c118d218/attachment.htm>


More information about the rabbitmq-discuss mailing list