[rabbitmq-discuss] Help understanding a crash report

Jerry Kuch jerryk at rbcon.com
Fri Feb 22 20:48:04 GMT 2013


Hi, Simone...

> I am relatively new to RabbitMQ and would appreciate help in
> troubleshooting a recurring issue on a cluster; apologies for the long
> email.
>

No problem!  Details are usually helpful...


> I run a 3-instance cluster in EC2: 2 disc nodes (A and B) and 1 RAM node (C).
> Exchanges and queues are static and limited in number (fewer than 50). The
> volume of messages can reach a few thousand per second, and the queues can
> occasionally grow to a few hundred thousand messages until the consuming
> processes manage to catch up, but this is well within our memory/disk high
> watermarks. RabbitMQ is v2.8.4 on Ubuntu 12.
>

2.8.4 is now getting a bit old, and came from the middle of a sequence of
bug-fix releases during which many things improved...  you might want to
consider upgrading.  Before you do, see the remarks below.


> I would like to better understand the crash report and perhaps get some
> ideas on what went wrong and how to troubleshoot issues more effectively
> (what more info should I collect before restarting the nodes: the Erlang
> process list, mnesia tables, ets tables, etc.).
>
> =CRASH REPORT==== 14-Feb-2013::10:36:54 ===
>   crasher:
>     initial call: rabbit_reader:init/4
>     pid: <*0.29283.387*>
>     registered_name: []
>     exception error: bad argument
>       in function  port_close/1
>          called as port_close(#Port<0.746540>)
>       in call from rabbit_net:maybe_fast_close/1
>       in call from rabbit_reader:start_connection/7
>     ancestors: [<0.29280.387>,rabbit_tcp_client_sup,rabbit_sup,<0.161.0>]
>     messages: []
>     links: [<0.29280.387>]
>     dictionary: [{{channel,10},
>                    {<0.29364.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{ch_pid,<0.29338.387>},{7,#Ref<0.0.2158.60093>}},
>                   {{ch_pid,<0.29333.387>},{6,#Ref<0.0.2158.60085>}},
>                   {{ch_pid,<0.29325.387>},{5,#Ref<0.0.2158.60053>}},
>                   {{channel,3},
>                    {<0.29313.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{ch_pid,<0.29305.387>},{2,#Ref<0.0.2158.60002>}},
>                   {{channel,4},
>                    {<0.29321.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{channel,11},
>                    {<0.29370.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{ch_pid,<0.29313.387>},{3,#Ref<0.0.2158.60017>}},
>                   {{ch_pid,<0.29299.387>},{1,#Ref<0.0.2158.59976>}},
>                   {{ch_pid,<0.29346.387>},{8,#Ref<0.0.2158.60112>}},
>                   {{ch_pid,<0.29370.387>},{11,#Ref<0.0.2158.60189>}},
>                   {{channel,7},
>                    {<0.29338.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{channel,9},
>                    {<0.29356.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{ch_pid,<0.29321.387>},{4,#Ref<0.0.2158.60034>}},
>                   {{ch_pid,<0.29364.387>},{10,#Ref<0.0.2158.60166>}},
>                   {{ch_pid,<0.29356.387>},{9,#Ref<0.0.2158.60140>}},
>                   {{channel,8},
>                    {<0.29346.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{channel,5},
>                    {<0.29325.387>,{method,rabbit_framing_amqp_0_9_1}}},
>                   {{channel,1},
>                    {<0.29299.387>,
>                     {content_body,
>                         {'basic.publish',0,<<"some_exchange">>,<<>>,false,
>                             false},
>                         1048189,
>                         {content,60,none,
>                             <<BYTES IN HERE>>,   --> this showed which
> process was sending the message
>                             rabbit_framing_amqp_0_9_1,
>                             [<<MORE BYTES IN HERE>>]  --> This I haven't
> been able to decode, it is fairly big, is it truncated?
>

Unfortunately, nothing springs to mind immediately to pursue from this...
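
For what it's worth, the badarg itself usually just means that, by the time
rabbit_reader went to clean up, the socket port had already been closed (for
example because the peer had gone away first): port_close/1 raises badarg
when handed a port that is no longer open.  A minimal sketch of that
pattern, purely illustrative and not the actual rabbit_net source:

    %% port_close/1 raises badarg if the port is already closed, so a
    %% defensive close looks roughly like this:
    close_socket(Sock) when is_port(Sock) ->
        try port_close(Sock)
        catch error:badarg -> ok   % socket already gone; nothing to do
        end;
    close_socket(_Sock) ->
        ok.

So the crash is probably more a symptom of the connection already being torn
down than the root cause itself.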


> And in the logs we can find the pid *0.29283.387* right before the crash:
>
> =INFO REPORT==== 14-Feb-2013::10:31:46 ===
> accepting AMQP connection <*0.29283.387*> (10.xx.xx.xx:58622 -> 10.xx.xx.xx:5672)
>
> =INFO REPORT==== 14-Feb-2013::10:31:46 ===
> accepting AMQP connection <0.29287.387> (10.xx.xx.xx:58623 -> 10.xx.xx.xx:5672)
>
> =WARNING REPORT==== 14-Feb-2013::10:32:27 ===
> closing AMQP connection <0.27107.387> (10.xx.xx.xx:50882 -> 10.xx.xx.xx:5672):
> connection_closed_abruptly
>

This could be pretty much anything, from client misbehavior to connection
disruption...
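
If the same client keeps turning up in those warnings, one thing worth
capturing before you restart anything is the state of a reader process for a
connection that is still up, using the pid from the "accepting AMQP
connection" line.  A rough sketch from an Erlang shell attached to the node
(the pid below is just the one from your log; a process that has already
crashed will of course return undefined):

    %% Attach with something like:
    %%   erl -sname debug -remsh rabbit@<node> -setcookie <cookie>
    Pid = erlang:list_to_pid("<0.29283.387>"),
    process_info(Pid, [memory, message_queue_len,
                       current_function, status, links]).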

> Looking at the rabbitmqctl report, I have not been able to map the memory
> consumption to anything specific yet.
>

I'd proceed with an upgrade to a more recent Rabbit...  check here first:

http://www.rabbitmq.com/blog/2012/11/19/breaking-things-with-rabbitmq-3-0/

If none of the changes in 3.0.x are going to cause you short-term
inconvenience, then try going straight to the latest release, 3.0.2; if
there are 3.0 changes that you think will bother you or require changes to
your apps or infrastructure, then jump to 2.8.6 for now... it came late in
the 2.8.x series and contains a pile of incremental fixes that may help
with this (otherwise tricky to diagnose from what we have available right
now) problem.
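
On the memory question, and on what to collect before restarting a node:
besides the rabbitmqctl report and rabbitmqctl status output, it can help to
snapshot which Erlang processes are actually holding the memory.  A rough
sketch, again from a shell attached to the node (nothing RabbitMQ-specific,
just standard Erlang introspection):

    %% Overall allocation by category.
    erlang:memory().

    %% Top ten processes by memory, with their registered name (if any)
    %% and what they are currently doing.
    Procs = [{M, P, process_info(P, [registered_name, current_function])}
             || P <- processes(),
                {memory, M} <- [process_info(P, memory)]],
    lists:sublist(lists:reverse(lists:keysort(1, Procs)), 10).

Adding ets:i() and mnesia:info() to that covers the ETS and mnesia table
side of your question, and together it makes a reasonable snapshot to grab
before bouncing a node.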

Best regards,
Jerry