[rabbitmq-discuss] Rabbit MQ Crash

Michael Sander mes65 at cornell.edu
Tue Apr 8 08:49:31 BST 2014


Unfortunately, I had to restart rabbitmq because it's on a production
machine.  Attached is a screenshot taken after the restart.  Notice that this
time, the free disk space value is shown.

Please let me know what additional information you would like.  If it goes
down again, I will collect it.

Michael Sander



On Tue, Apr 8, 2014 at 3:42 AM, Michael Sander <mes65 at cornell.edu> wrote:

> Hi Matthias,
>
> I just checked rabbit the second after sending that, and it appears to
> have crashed.  Here is some output that you may find useful.  Notice that
> an Erlang process (epmd, the Erlang port mapper daemon) is still alive even
> though rabbitmq is not.
>
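In the `ps` output that follows, the only process owned by rabbitmq is `epmd`, which runs as a separate daemon; the broker itself lives in an Erlang VM whose executable is normally `beam` or `beam.smp`. `grep rabbit` also matches its own command line, which adds noise. A minimal sketch that checks for the VM by process name instead, assuming the broker runs as the `rabbitmq` user (as epmd does here); `broker_vm_running` is a hypothetical helper, not part of RabbitMQ.

```shell
#!/bin/sh
# Hypothetical helper: reads process names on stdin (one per line, e.g. from
# `ps -u rabbitmq -o comm=`) and succeeds only if one of them is an Erlang
# VM (beam or beam.smp). An epmd-only listing like the one in this message
# makes it fail.
broker_vm_running() {
  grep -Eq '^beam(\.smp)?$'
}

# Usage: check the live process table for the (assumed) rabbitmq user.
if ps -u rabbitmq -o comm= 2>/dev/null | broker_vm_running; then
  echo "rabbitmq Erlang VM (beam) is running"
else
  echo "no beam process for rabbitmq; epmd alone may still be up"
fi
```

Matching on the exact process name avoids both the self-matching `grep` line and long command lines that `ps` might truncate.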
> ~$ ps aux|grep rabbit
> rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
> 1001     26076  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep rabbit
> ~$ ps aux|grep erlan
> rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
> 1001     26078  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep erlan
> ~$ df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> rootfs                                                  9.9G  6.1G  3.3G  65% /
> udev                                                     10M     0   10M   0% /dev
> tmpfs                                                   181M  128K  181M   1% /run
> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
> tmpfs                                                   362M     0  362M   0% /run/shm
> ~$ sudo df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> rootfs                                                  9.9G  6.1G  3.3G  65% /
> udev                                                     10M     0   10M   0% /dev
> tmpfs                                                   181M  128K  181M   1% /run
> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
> tmpfs                                                   362M     0  362M   0% /run/shm
> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log.1
>   crasher:
>     initial call: rabbit_disk_monitor:init/1
>     pid: <0.19499.0>
>     registered_name: []
>     exception exit: unsupported_platform
>       in function  gen_server:init_it/6 (gen_server.erl, line 320)
>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>     messages: []
>     links: [<0.180.0>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 6765
>     stack_size: 24
>     reductions: 13592
>   neighbours:
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>      Supervisor: {local,rabbit_disk_monitor_sup}
>      Context:    start_error
>      Reason:     unsupported_platform
>      Offender:   [{pid,{restarting,<0.5000.0>}},
>                   {name,rabbit_disk_monitor},
>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>                   {restart_type,transient},
>                   {shutdown,4294967295},
>                   {child_type,worker}]
>
>
> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
>   crasher:
>     initial call: rabbit_disk_monitor:init/1
>     pid: <0.19502.0>
>     registered_name: []
>     exception exit: unsupported_platform
>       in function  gen_server:init_it/6 (gen_server.erl, line 320)
>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>     messages: []
>     links: [<0.180.0>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 6765
>     stack_size: 24
>     reductions: 13592
>   neighbours:
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>      Supervisor: {local,rabbit_disk_monitor_sup}
>      Context:    start_error
>      Reason:     unsupported_platform
>      Offender:   [{pid,{restarting,<0.5000.0>}},
>                   {name,rabbit_disk_monitor},
>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>                   {restart_type,transient},
>                   {shutdown,4294967295},
>                   {child_type,worker}]
>
>
> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
>   crasher:
>     initial call: rabbit_disk_monitor:init/1
>     pid: <0.19505.0>
>     registered_name: []
>     exception exit: unsupported_platform
>       in function  gen_server:init_it/6 (gen_server.erl, line 320)
>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>     messages: []
>     links: [<0.180.0>]
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 6765
>     stack_size: 24
>     reductions: 13592
>   neighbours:
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>      Supervisor: {local,rabbit_disk_monitor_sup}
>      Context:    start_error
>      Reason:     unsupported_platform
>      Offender:   [{pid,{restarting,<0.5000.0>}},
>                   {name,rabbit_disk_monitor},
>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>                   {restart_type,transient},
>                   {shutdown,4294967295},
>                   {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>      Supervisor: {local,rabbit_disk_monitor_sup}
>      Context:    shutdown
>      Reason:     reached_max_restart_intensity
>      Offender:   [{pid,{restarting,<0.5000.0>}},
>                   {name,rabbit_disk_monitor},
>                   {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>                   {restart_type,transient},
>                   {shutdown,4294967295},
>                   {child_type,worker}]
> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2.log
> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
> closing AMQP connection <0.20361.1> (127.0.0.1:48568 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
> closing AMQP connection <0.20392.1> (127.0.0.1:48586 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:49 ===
> closing AMQP connection <0.20401.1> (127.0.0.1:48589 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:50 ===
> closing AMQP connection <0.22633.1> (127.0.0.1:50329 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:51 ===
> closing AMQP connection <0.16156.1> (127.0.0.1:44692 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
> accepting AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:11 ===
> closing AMQP connection <0.22608.1> (127.0.0.1:50316 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
> accepting AMQP connection <0.22774.1> (127.0.0.1:50371 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
> accepting AMQP connection <0.22777.1> (127.0.0.1:50372 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
> accepting AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
> accepting AMQP connection <0.22805.1> (127.0.0.1:50384 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
> accepting AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:27 ===
> accepting AMQP connection <0.22825.1> (127.0.0.1:50386 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:29 ===
> accepting AMQP connection <0.22834.1> (127.0.0.1:50387 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:30 ===
> accepting AMQP connection <0.22843.1> (127.0.0.1:50388 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:31 ===
> accepting AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
> accepting AMQP connection <0.22863.1> (127.0.0.1:50394 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
> accepting AMQP connection <0.22866.1> (127.0.0.1:50395 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:36 ===
> closing AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:36 ===
> accepting AMQP connection <0.22883.1> (127.0.0.1:50399 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:37 ===
> closing AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:30:38 ===
> closing AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:39 ===
> accepting AMQP connection <0.22893.1> (127.0.0.1:50403 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:39 ===
> closing AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:41 ===
> accepting AMQP connection <0.22902.1> (127.0.0.1:50409 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:43 ===
> accepting AMQP connection <0.22913.1> (127.0.0.1:50411 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
> accepting AMQP connection <0.22925.1> (127.0.0.1:50420 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
> accepting AMQP connection <0.22928.1> (127.0.0.1:50421 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:50 ===
> closing AMQP connection <0.22660.1> (127.0.0.1:50332 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:51 ===
> accepting AMQP connection <0.22945.1> (127.0.0.1:50423 -> 127.0.0.1:5672)
> ~$ tail -n 100 /var/log/rabbitmq/shutdown_err
> /usr/lib/rabbitmq/bin/rabbitmqctl: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
> ~$ tail -n 100 /var/log/rabbitmq/shutdown_log
> Stopping and halting node 'rabbit@ocr-proc-2' ...
> ...done.
> ~$ tail -n 100 /var/log/rabbitmq/startup_err
> /usr/lib/rabbitmq/bin/rabbitmq-server: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
> Killed
> ~$ tail -n 100 /var/log/rabbitmq/startup_log
>
>               RabbitMQ 3.2.4. Copyright (C) 2007-2013 GoPivotal, Inc.
>   ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
>   ##  ##
>   ##########  Logs: /var/log/rabbitmq/rabbit@ocr-proc-2.log
>   ######  ##        /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
>   ##########
>               Starting broker... completed with 6 plugins.
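In shutdown_err and startup_err, the `not found` message reads like what `/bin/sh` prints when it sources `/etc/rabbitmq/rabbitmq-env.conf` and reaches the line `ocr-proc-2=rabbit@localhost`: a hyphen is not allowed in a shell variable name, so the line is not parsed as an assignment and sh tries to run it as a command instead. A minimal sketch of a lint for that file, assuming it should contain only blank lines, comments, and plain `NAME=value` assignments (for example `NODENAME=rabbit@localhost`, if setting the node name was the intent); `lint_env_conf` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical lint for a file that /bin/sh will source, such as
# /etc/rabbitmq/rabbitmq-env.conf. It prints (with line numbers) every line
# that is not blank, not a comment, and not a NAME=value assignment whose
# NAME is a valid shell identifier. Simplification: lines such as
# `export NAME=value` are reported too.
lint_env_conf() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  if grep -nEv '^[[:space:]]*(#.*)?$|^[A-Za-z_][A-Za-z0-9_]*=' "$1"; then
    return 1  # at least one line would not parse as an assignment
  fi
  return 0
}

# Usage: lint_env_conf /etc/rabbitmq/rabbitmq-env.conf
```

Run against a file containing the line from this thread, it would report that line (prefixed with its line number) and return 1.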
>
>
>
>
> Michael Sander
> mes65 at cornell.edu
> 607-227-9859
>
>
> On Tue, Apr 8, 2014 at 3:33 AM, Michael Sander <mes65 at cornell.edu> wrote:
>
>> Hi Matthias,
>>
>> What I sent you was everything I had. However, I did check ps -aux after
>> the crash and rabbitmq-server was definitely not in there. I will turn off
>> the cron jobs that automatically restart rabbitmq, and I'll let you know if
>> I see it again.
>>
>> Here is the output of the command.
>>
>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>> "Filesystem                                             1024-blocks      Used Available Capacity Mounted on\n/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1    10320184   6348300   3447648      65% /\n"
>> ...done.
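The `eval` above shells out to `df -kP` on the mnesia directory, and the `rabbit_disk_monitor` child in the SASL log is started with the argument 50000000, which looks like a free-space limit in bytes. A minimal sketch for repeating the same check from a plain shell, assuming that reading of the argument; `avail_bytes` and `check_disk_free` are hypothetical helpers, and the mnesia path is the one used in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for checking free space where RabbitMQ keeps its data.

# Reads `df -kP` output on stdin and prints the "Available" column of the
# first data row, converted from 1024-byte blocks to bytes.
avail_bytes() {
  awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# check_disk_free [DIR] [LIMIT_BYTES]
# Defaults: the mnesia directory used in this thread, and 50000000 bytes
# (assumed to match the rabbit_disk_monitor argument in the SASL log).
check_disk_free() {
  dir=${1:-/var/lib/rabbitmq/mnesia}
  limit=${2:-50000000}
  avail=$(df -kP "$dir" | avail_bytes)
  if [ -z "$avail" ]; then
    echo "could not read free space for $dir" >&2
    return 2
  fi
  if [ "$avail" -gt "$limit" ]; then
    echo "OK: $avail bytes free in $dir (limit $limit)"
  else
    echo "LOW: $avail bytes free in $dir (limit $limit)"
    return 1
  fi
}
```

With the figures in this message, 3447648 blocks × 1024 = 3530391552 bytes free, far above 50000000, so by this check the disk itself was not close to the limit.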
>>
>>
>> Also, I'm not sure whether it will help you, but attached is a screenshot
>> of the rabbitmq console.  If you look at the start of the top chart at
>> 19:00, there is a sharp increase in the queued messages.  That's when I
>> restarted rabbitmq after the crash.  Everything before that was flat.
>> Another point to note is that it currently says that the disk space is
>> unavailable.  I definitely remember seeing a value there at some point
>> before; I don't know what causes that.
>>
>> I've turned off my rabbitmq auto-start cron jobs; I'll let you know if I
>> see the crash again.
>>
>> Thanks again.
>>
>> Best,
>>
>> Michael Sander
>>
>>
>>
>> On Tue, Apr 8, 2014 at 1:18 AM, Matthias Radestock <matthias at rabbitmq.com> wrote:
>>
>>> Michael,
>>>
>>>
>>> On 08/04/14 02:50, Michael Sander wrote:
>>>
>>>> Full logs are attached.  You'll notice that it crashes pretty often now.
>>>>
>>>
>>> The disk_monitor is crashing frequently, yes, but in none of the
>>> instances in the logs did that actually take down rabbit (notice that
>>> there are no rabbit starts recorded in the rabbit.log); the disk_monitor
>>> restarts just fine and the bunny lives on.
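One way to check the pattern described here is to tally the crash reports in the SASL log by exit reason, alongside the reasons given in supervisor reports (such as `reached_max_restart_intensity`). A minimal sketch, assuming the report layout shown in the SASL log in this thread (`exception exit: <reason>` and `Reason: <reason>` lines); `summarize_crashes` is a hypothetical helper, and the log path in the usage comment follows this thread's node name.

```shell
#!/bin/sh
# Hypothetical helper: summarise a RabbitMQ SASL log by counting
# CRASH REPORT exit reasons and SUPERVISOR REPORT reasons.
# Usage: summarize_crashes /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
summarize_crashes() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  echo "crash report exit reasons:"
  # Lines look like: "    exception exit: unsupported_platform"
  grep 'exception exit:' "$1" | sed 's/.*exception exit: *//' \
    | sort | uniq -c | sort -rn
  echo "supervisor report reasons:"
  # Lines look like: "     Reason:     reached_max_restart_intensity"
  grep '^ *Reason:' "$1" | sed 's/^ *Reason: *//' \
    | sort | uniq -c | sort -rn
}
```

A large `unsupported_platform` count with no matching broker restart would fit the reading that only the disk monitor was crashing.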
>>>
>>> Do you have the logs covering the time period around the crash?
>>>
>>>
>>>> Here is the output of the commands:
>>>>
>>>>     $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/")'
>>>>     Error: syntax error before:
>>>>
>>>
>>> Ah, sorry, I missed a full stop. It should be:
>>>
>>>     sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>>
>>>
>>> Matthias.
>>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot2.png
Type: image/png
Size: 219630 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20140408/ddef1524/attachment.png>

