[rabbitmq-discuss] Rabbit MQ Crash

Michael Sander mes65 at cornell.edu
Wed Apr 9 09:15:29 BST 2014


Anything I can do to move the ball forward here? This keeps on happening.

Michael Sander
mes65 at cornell.edu
607-227-9859


On Tue, Apr 8, 2014 at 3:49 AM, Michael Sander <mes65 at cornell.edu> wrote:

> Unfortunately, I had to restart rabbitmq because it's on a production
> machine.  Attached is a screenshot taken after the restart.  Notice that
> this time the free disk space limit is available.
>
> Please let me know what additional information you would like.  If it goes
> down again, I will collect it.
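
For anyone hitting the same thing: the checks used further down this thread (ps, df, and tailing everything under /var/log/rabbitmq) could be gathered in one go before restarting. This is only a minimal sketch, assuming a POSIX shell. The function name collect_rabbit_diag and its two parameters are made up for illustration; the log directory and line count default to the values used in this thread.

```shell
#!/bin/sh
# Print a snapshot of the checks used in this thread to stdout.
# $1: log directory (default /var/log/rabbitmq, as seen in this thread)
# $2: number of trailing lines to show per log file (default 100)
collect_rabbit_diag() {
    log_dir=${1:-/var/log/rabbitmq}
    lines=${2:-100}

    echo "== $(date) =="

    echo "== processes =="
    # The [r]/[e] brackets keep grep from matching its own command line.
    ps aux | grep -E '[r]abbit|[e]rlan' || echo "(no matching processes)"

    echo "== disk =="
    df -kP || echo "(df failed)"

    echo "== logs =="
    for f in "$log_dir"/*; do
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        echo "-- $f --"
        tail -n "$lines" "$f"
    done
}
```

Running something like `collect_rabbit_diag > /tmp/rabbit-diag.txt` right after a crash, as a user that can read the logs, would keep the evidence even if the broker has to be restarted immediately.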
>
> Michael Sander
>
>
>
> On Tue, Apr 8, 2014 at 3:42 AM, Michael Sander <mes65 at cornell.edu> wrote:
>
>> Hi Matthias,
>>
>> I checked rabbit the second after sending that, and it appears to
>> have crashed.  Here is some output that you may find useful.  Notice that
>> an erlang process (epmd) appears to be alive even though rabbitmq is not.
>>
>> ~$ ps aux|grep rabbit
>> rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
>> 1001     26076  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep rabbit
>> ~$ ps aux|grep erlan
>> rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
>> 1001     26078  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep erlan
>> ~$ df -h
>> Filesystem                                              Size  Used Avail Use% Mounted on
>> rootfs                                                  9.9G  6.1G  3.3G  65% /
>> udev                                                     10M     0   10M   0% /dev
>> tmpfs                                                   181M  128K  181M   1% /run
>> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
>> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
>> tmpfs                                                   362M     0  362M   0% /run/shm
>> ~$ sudo df -h
>> Filesystem                                              Size  Used Avail Use% Mounted on
>> rootfs                                                  9.9G  6.1G  3.3G  65% /
>> udev                                                     10M     0   10M   0% /dev
>> tmpfs                                                   181M  128K  181M   1% /run
>> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
>> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
>> tmpfs                                                   362M     0  362M   0% /run/shm
>> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
>> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log.1
>>   crasher:
>>     initial call: rabbit_disk_monitor:init/1
>>     pid: <0.19499.0>
>>     registered_name: []
>>     exception exit: unsupported_platform
>>       in function  gen_server:init_it/6 (gen_server.erl, line 320)
>>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>>     messages: []
>>     links: [<0.180.0>]
>>     dictionary: []
>>     trap_exit: false
>>     status: running
>>     heap_size: 6765
>>     stack_size: 24
>>     reductions: 13592
>>   neighbours:
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>>      Supervisor: {local,rabbit_disk_monitor_sup}
>>      Context:    start_error
>>      Reason:     unsupported_platform
>>      Offender:   [{pid,{restarting,<0.5000.0>}},
>>                   {name,rabbit_disk_monitor},
>>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>>                   {restart_type,transient},
>>                   {shutdown,4294967295},
>>                   {child_type,worker}]
>>
>>
>> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
>>   crasher:
>>     initial call: rabbit_disk_monitor:init/1
>>     pid: <0.19502.0>
>>     registered_name: []
>>     exception exit: unsupported_platform
>>       in function  gen_server:init_it/6 (gen_server.erl, line 320)
>>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>>     messages: []
>>     links: [<0.180.0>]
>>     dictionary: []
>>     trap_exit: false
>>     status: running
>>     heap_size: 6765
>>      stack_size: 24
>>     reductions: 13592
>>   neighbours:
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>>      Supervisor: {local,rabbit_disk_monitor_sup}
>>      Context:    start_error
>>      Reason:     unsupported_platform
>>      Offender:   [{pid,{restarting,<0.5000.0>}},
>>                   {name,rabbit_disk_monitor},
>>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>>                   {restart_type,transient},
>>                   {shutdown,4294967295},
>>                   {child_type,worker}]
>>
>>
>> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
>>   crasher:
>>     initial call: rabbit_disk_monitor:init/1
>>     pid: <0.19505.0>
>>     registered_name: []
>>     exception exit: unsupported_platform
>>       in function  gen_server:init_it/6 (gen_server.erl, line 320)
>>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>>     messages: []
>>     links: [<0.180.0>]
>>     dictionary: []
>>     trap_exit: false
>>     status: running
>>     heap_size: 6765
>>      stack_size: 24
>>     reductions: 13592
>>   neighbours:
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>>      Supervisor: {local,rabbit_disk_monitor_sup}
>>      Context:    start_error
>>      Reason:     unsupported_platform
>>      Offender:   [{pid,{restarting,<0.5000.0>}},
>>                   {name,rabbit_disk_monitor},
>>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>>                   {restart_type,transient},
>>                   {shutdown,4294967295},
>>                   {child_type,worker}]
>>
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>>      Supervisor: {local,rabbit_disk_monitor_sup}
>>      Context:    shutdown
>>      Reason:     reached_max_restart_intensity
>>      Offender:   [{pid,{restarting,<0.5000.0>}},
>>                   {name,rabbit_disk_monitor},
>>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>>                   {restart_type,transient},
>>                   {shutdown,4294967295},
>>                   {child_type,worker}]
>> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2.log
>> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
>> closing AMQP connection <0.20361.1> (127.0.0.1:48568 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
>> closing AMQP connection <0.20392.1> (127.0.0.1:48586 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:49 ===
>> closing AMQP connection <0.20401.1> (127.0.0.1:48589 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:50 ===
>> closing AMQP connection <0.22633.1> (127.0.0.1:50329 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:51 ===
>> closing AMQP connection <0.16156.1> (127.0.0.1:44692 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
>> accepting AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:11 ===
>> closing AMQP connection <0.22608.1> (127.0.0.1:50316 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
>> accepting AMQP connection <0.22774.1> (127.0.0.1:50371 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
>> accepting AMQP connection <0.22777.1> (127.0.0.1:50372 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
>> accepting AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
>> accepting AMQP connection <0.22805.1> (127.0.0.1:50384 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
>> accepting AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:27 ===
>> accepting AMQP connection <0.22825.1> (127.0.0.1:50386 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:29 ===
>> accepting AMQP connection <0.22834.1> (127.0.0.1:50387 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:30 ===
>> accepting AMQP connection <0.22843.1> (127.0.0.1:50388 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:31 ===
>> accepting AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
>> accepting AMQP connection <0.22863.1> (127.0.0.1:50394 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
>> accepting AMQP connection <0.22866.1> (127.0.0.1:50395 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:36 ===
>> closing AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:36 ===
>> accepting AMQP connection <0.22883.1> (127.0.0.1:50399 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:37 ===
>> closing AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:38 ===
>> closing AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:39 ===
>> accepting AMQP connection <0.22893.1> (127.0.0.1:50403 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:39 ===
>> closing AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:41 ===
>> accepting AMQP connection <0.22902.1> (127.0.0.1:50409 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:43 ===
>> accepting AMQP connection <0.22913.1> (127.0.0.1:50411 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
>> accepting AMQP connection <0.22925.1> (127.0.0.1:50420 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
>> accepting AMQP connection <0.22928.1> (127.0.0.1:50421 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:50 ===
>> closing AMQP connection <0.22660.1> (127.0.0.1:50332 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:51 ===
>> accepting AMQP connection <0.22945.1> (127.0.0.1:50423 -> 127.0.0.1:5672)
>> ~$ tail -n 100 /var/log/rabbitmq/shutdown_err
>> /usr/lib/rabbitmq/bin/rabbitmqctl: 1: /etc/rabbitmq/rabbitmq-env.conf:
>> ocr-proc-2=rabbit@localhost: not found
>> ~$ tail -n 100 /var/log/rabbitmq/shutdown_log
>> Stopping and halting node 'rabbit@ocr-proc-2' ...
>> ...done.
>> ~$ tail -n 100 /var/log/rabbitmq/startup_err
>> /usr/lib/rabbitmq/bin/rabbitmq-server: 1:
>> /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
>> Killed
>> ~$ tail -n 100 /var/log/rabbitmq/startup_log
>>
>>               RabbitMQ 3.2.4. Copyright (C) 2007-2013 GoPivotal, Inc.
>>   ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
>>   ##  ##
>>   ##########  Logs: /var/log/rabbitmq/rabbit@ocr-proc-2.log
>>   ######  ##        /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
>>   ##########
>>               Starting broker... completed with 6 plugins.
>>
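
A side note on the "not found" lines in shutdown_err and startup_err above: the message format suggests that /etc/rabbitmq/rabbitmq-env.conf is being read by a shell, and a POSIX shell only treats NAME=value as an assignment when NAME is made of letters, digits and underscores and does not start with a digit. A name containing hyphens (the line starting with ocr-proc-2=) is therefore run as a command, which fails. Whether this is related to the crash is not established here. The sketch below only illustrates the naming rule; is_shell_assignment and check_env_file are hypothetical helper names, not part of RabbitMQ.

```shell
#!/bin/sh
# Succeed if $1 looks like a NAME=value line that a POSIX shell would treat
# as a variable assignment rather than as a command to execute.
is_shell_assignment() {
    case $1 in
        *=*) ;;                  # must contain an '=' somewhere
        *) return 1 ;;
    esac
    name=${1%%=*}                # text before the first '='
    case $name in
        '' | [0-9]* | *[!A-Za-z0-9_]*) return 1 ;;   # empty, leading digit, or bad char
    esac
    return 0
}

# Print every non-blank, non-comment line of file $1 that is not a plain
# NAME=value assignment. This is deliberately strict: other valid shell
# (for example lines starting with "export") is reported as well.
check_env_file() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '' | '#'*) continue ;;
        esac
        is_shell_assignment "$line" || printf 'suspect line: %s\n' "$line"
    done < "$1"
}

# The line reported in startup_err is rejected because of the hyphens.
is_shell_assignment 'ocr-proc-2=rabbit@localhost' || echo "not an assignment"
```

Pointing check_env_file at /etc/rabbitmq/rabbitmq-env.conf would list the lines worth a second look. What the offending line was meant to configure is not something this check can tell.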
>>
>>
>>
>> Michael Sander
>> mes65 at cornell.edu
>> 607-227-9859
>>
>>
>> On Tue, Apr 8, 2014 at 3:33 AM, Michael Sander <mes65 at cornell.edu> wrote:
>>
>>> Hi Matthias,
>>>
>>> What I sent you was everything I had. However, I did check ps -aux after
>>> the crash and rabbitmq-server was definitely not in there. I will turn off
>>> the cron jobs that automatically restart rabbitmq, and I'll let you know if
>>> I see it again.
>>>
>>> Here is the output of the command.
>>>
>>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>> "Filesystem                                             1024-blocks
>>>  Used Available Capacity Mounted
>>> on\n/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1    10320184
>>> 6348300   3447648      65% /\n"
>>> ...done.
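
To make the numbers in that output concrete: with -P, df prints each filesystem on a single line, so the available space (in 1024-byte blocks) is the fourth field of the second line. The sketch below is plain shell, not RabbitMQ's own code. It extracts that field and compares it with 50000000, the argument shown for rabbit_disk_monitor start_link in the supervisor reports elsewhere in this thread. Treating that number as a limit in bytes is an assumption, and the function names df_avail_kb and above_limit are made up for illustration.

```shell
#!/bin/sh
# Print the "Available" column (1024-byte blocks) from `df -kP` output on stdin.
# With -P every filesystem occupies one line, so it is field 4 of line 2.
df_avail_kb() {
    awk 'NR == 2 { print $4 }'
}

# Succeed if the available space ($1, in KB) exceeds the limit ($2, in bytes).
above_limit() {
    avail_kb=$1
    limit_bytes=$2
    [ $((avail_kb * 1024)) -gt "$limit_bytes" ]
}

# Sample text taken from the eval output quoted above.
sample='Filesystem                                             1024-blocks    Used Available Capacity Mounted on
/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1    10320184 6348300   3447648      65% /'

avail=$(printf '%s\n' "$sample" | df_avail_kb)
if above_limit "$avail" 50000000; then
    echo "available ${avail} KB is above the limit"
else
    echo "available ${avail} KB is at or below the limit"
fi
```

For the output quoted above this reports 3447648 KB available, which matches the 3.3G shown by df -h and is far above 50000000 bytes. So, at least at the moment the command was run, low disk space itself does not look like the trigger.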
>>>
>>>
>>> Also, I'm not sure whether it will help you, but attached is a
>>> screenshot of the rabbitmq console.  At the start of the top chart,
>>> at 19:00, there is a sharp increase in queued messages.  That's when I
>>> restarted rabbitmq after the crash.  Everything before that was flat.
>>> Another point to note is that it currently says that the disk space is
>>> unavailable.  I definitely remember seeing a value there at some point
>>> before; I don't know what causes that to occur.
>>>
>>> I've turned off my rabbitmq auto-start cron jobs, and I'll let you know if I
>>> see the crash again.
>>>
>>> Thanks again.
>>>
>>> Best,
>>>
>>> Michael Sander
>>>
>>>
>>>
>>> On Tue, Apr 8, 2014 at 1:18 AM, Matthias Radestock <
>>> matthias at rabbitmq.com> wrote:
>>>
>>>> Michael,
>>>>
>>>>
>>>> On 08/04/14 02:50, Michael Sander wrote:
>>>>
>>>>> Full logs are attached.  You'll notice that it crashes pretty often
>>>>> now.
>>>>>
>>>>
>>>> The disk_monitor is crashing frequently, yes, but in none of the
>>>> instances in the logs did that actually take down rabbit (notice that
>>>> there are no rabbit starts recorded in the rabbit.log); the disk_monitor
>>>> restarts just fine and the bunny lives on.
>>>>
>>>> Do you have the logs covering the time period around the crash?
>>>>
>>>>
>>>>> Here is the output of the commands
>>>>>
>>>>>     $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/")'
>>>>>     Error: syntax error before:
>>>>>
>>>>
>>>> Ah, sorry, missed a full stop. Should be
>>>>
>>>>     sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>>>
>>>>
>>>> Matthias.
>>>>
>>>
>>>
>>
>