[rabbitmq-discuss] Rabbit MQ Crash

Michael Sander mes65 at cornell.edu
Tue Apr 8 08:42:28 BST 2014


Hi Matthias,

I just checked rabbit the second after sending that, and it appears to have
crashed.  Here is some output that you may find useful.  Notice that
Erlang's epmd appears to be alive even though rabbitmq is not.

~$ ps aux|grep rabbit
rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
1001     26076  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep rabbit
~$ ps aux|grep erlan
rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
1001     26078  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep erlan
~$ df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  9.9G  6.1G  3.3G  65% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   181M  128K  181M   1% /run
/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   362M     0  362M   0% /run/shm
~$ sudo df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  9.9G  6.1G  3.3G  65% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   181M  128K  181M   1% /run
/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   362M     0  362M   0% /run/shm
~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log.1
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.19499.0>
    registered_name: []
    exception exit: unsupported_platform
      in function  gen_server:init_it/6 (gen_server.erl, line 320)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
    messages: []
    links: [<0.180.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 13592
  neighbours:

=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,{restarting,<0.5000.0>}},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=CRASH REPORT==== 8-Apr-2014::00:38:08 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.19502.0>
    registered_name: []
    exception exit: unsupported_platform
      in function  gen_server:init_it/6 (gen_server.erl, line 320)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
    messages: []
    links: [<0.180.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 13592
  neighbours:

=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,{restarting,<0.5000.0>}},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=CRASH REPORT==== 8-Apr-2014::00:38:08 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    pid: <0.19505.0>
    registered_name: []
    exception exit: unsupported_platform
      in function  gen_server:init_it/6 (gen_server.erl, line 320)
    ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
    messages: []
    links: [<0.180.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 13592
  neighbours:

=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
     Offender:   [{pid,{restarting,<0.5000.0>}},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]


=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,{restarting,<0.5000.0>}},
                  {name,rabbit_disk_monitor},
                  {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
                  {restart_type,transient},
                  {shutdown,4294967295},
                  {child_type,worker}]
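
For anyone skimming a longer SASL log, a rough way to see the pattern above (repeated start_error reports followed by a shutdown with reached_max_restart_intensity) is to tally the report headers and supervisor reasons. This is only a sketch: the function name and the regular expressions are my own, written against the excerpt shown here, and have not been checked against other log layouts.

```python
import re
from collections import Counter

# Matches report headers such as "=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ==="
HEADER = re.compile(r"^=(?P<kind>[A-Z ]+) REPORT==== (?P<stamp>\S+) ===$")
# Matches the "Reason:" line inside a supervisor report
REASON = re.compile(r"^\s*Reason:\s+(?P<reason>\S+)")


def tally_reports(log_text):
    """Count report kinds and supervisor reasons in a SASL log excerpt."""
    kinds, reasons = Counter(), Counter()
    for line in log_text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            kinds[header.group("kind")] += 1
            continue
        reason = REASON.match(line)
        if reason:
            reasons[reason.group("reason")] += 1
    return kinds, reasons


# Shortened sample in the same shape as the log above (hypothetical excerpt).
SAMPLE = """\
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform

=CRASH REPORT==== 8-Apr-2014::00:38:08 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1

=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
"""

kinds, reasons = tally_reports(SAMPLE)
print(dict(kinds))
print(dict(reasons))
```

Running it over the whole file rather than the sample would show how many start errors preceded each shutdown.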
~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2.log
=WARNING REPORT==== 8-Apr-2014::07:29:47 ===
closing AMQP connection <0.20361.1> (127.0.0.1:48568 -> 127.0.0.1:5672):
connection_closed_abruptly

=WARNING REPORT==== 8-Apr-2014::07:29:47 ===
closing AMQP connection <0.20392.1> (127.0.0.1:48586 -> 127.0.0.1:5672):
connection_closed_abruptly

=WARNING REPORT==== 8-Apr-2014::07:29:49 ===
closing AMQP connection <0.20401.1> (127.0.0.1:48589 -> 127.0.0.1:5672):
connection_closed_abruptly

=WARNING REPORT==== 8-Apr-2014::07:29:50 ===
closing AMQP connection <0.22633.1> (127.0.0.1:50329 -> 127.0.0.1:5672):
connection_closed_abruptly

=WARNING REPORT==== 8-Apr-2014::07:29:51 ===
closing AMQP connection <0.16156.1> (127.0.0.1:44692 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 8-Apr-2014::07:30:11 ===
accepting AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672)

=WARNING REPORT==== 8-Apr-2014::07:30:11 ===
closing AMQP connection <0.22608.1> (127.0.0.1:50316 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 8-Apr-2014::07:30:11 ===
accepting AMQP connection <0.22774.1> (127.0.0.1:50371 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:11 ===
accepting AMQP connection <0.22777.1> (127.0.0.1:50372 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:26 ===
accepting AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:26 ===
accepting AMQP connection <0.22805.1> (127.0.0.1:50384 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:26 ===
accepting AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:27 ===
accepting AMQP connection <0.22825.1> (127.0.0.1:50386 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:29 ===
accepting AMQP connection <0.22834.1> (127.0.0.1:50387 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:30 ===
accepting AMQP connection <0.22843.1> (127.0.0.1:50388 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:31 ===
accepting AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:34 ===
accepting AMQP connection <0.22863.1> (127.0.0.1:50394 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:34 ===
accepting AMQP connection <0.22866.1> (127.0.0.1:50395 -> 127.0.0.1:5672)

=WARNING REPORT==== 8-Apr-2014::07:30:36 ===
closing AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 8-Apr-2014::07:30:36 ===
accepting AMQP connection <0.22883.1> (127.0.0.1:50399 -> 127.0.0.1:5672)

=WARNING REPORT==== 8-Apr-2014::07:30:37 ===
closing AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672):
connection_closed_abruptly

=WARNING REPORT==== 8-Apr-2014::07:30:38 ===
closing AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 8-Apr-2014::07:30:39 ===
accepting AMQP connection <0.22893.1> (127.0.0.1:50403 -> 127.0.0.1:5672)

=WARNING REPORT==== 8-Apr-2014::07:30:39 ===
closing AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 8-Apr-2014::07:30:41 ===
accepting AMQP connection <0.22902.1> (127.0.0.1:50409 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:43 ===
accepting AMQP connection <0.22913.1> (127.0.0.1:50411 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:49 ===
accepting AMQP connection <0.22925.1> (127.0.0.1:50420 -> 127.0.0.1:5672)

=INFO REPORT==== 8-Apr-2014::07:30:49 ===
accepting AMQP connection <0.22928.1> (127.0.0.1:50421 -> 127.0.0.1:5672)

=WARNING REPORT==== 8-Apr-2014::07:30:50 ===
closing AMQP connection <0.22660.1> (127.0.0.1:50332 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 8-Apr-2014::07:30:51 ===
accepting AMQP connection <0.22945.1> (127.0.0.1:50423 -> 127.0.0.1:5672)
~$ tail -n 100 /var/log/rabbitmq/shutdown_err
/usr/lib/rabbitmq/bin/rabbitmqctl: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
~$ tail -n 100 /var/log/rabbitmq/shutdown_log
Stopping and halting node 'rabbit@ocr-proc-2' ...
...done.
~$ tail -n 100 /var/log/rabbitmq/startup_err
/usr/lib/rabbitmq/bin/rabbitmq-server: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
Killed
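
A side note on the "not found" message in startup_err and shutdown_err: it looks like the shell is reading line 1 of rabbitmq-env.conf and trying to run it as a command. A POSIX shell only treats NAME=value as an assignment when NAME is made of letters, digits and underscores, and ocr-proc-2 contains hyphens. The sketch below flags such lines. The function name is hypothetical, and the NODENAME line is only an example of a well-formed assignment, not necessarily the setting this file was meant to carry.

```python
import re

# POSIX shell variable names: letters, digits, underscores, not starting with a digit.
ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def suspicious_env_lines(conf_text):
    """Return (line_number, line) pairs a POSIX shell would not parse as assignments."""
    flagged = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are harmless
        if not ASSIGNMENT.match(line):
            flagged.append((number, line))
    return flagged


# The line from the error message is flagged; a plain NAME=value line is not.
print(suspicious_env_lines("ocr-proc-2=rabbit@localhost\n"))
print(suspicious_env_lines("NODENAME=rabbit@localhost\n"))
```

The check is deliberately narrow: it only recognises plain assignments, so it would also flag other shell constructs that may be legitimate in such a file.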
~$ tail -n 100 /var/log/rabbitmq/startup_log

              RabbitMQ 3.2.4. Copyright (C) 2007-2013 GoPivotal, Inc.
  ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
  ##  ##
  ##########  Logs: /var/log/rabbitmq/rabbit@ocr-proc-2.log
  ######  ##        /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
  ##########
              Starting broker... completed with 6 plugins.
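
For reference, here is a small sketch of the kind of check the `df -kP` command quoted below feeds into: take the "Available" column, convert it to bytes, and compare it with a limit. This is not RabbitMQ's actual code. The function name is made up, and I am assuming the 50000000 passed to start_link in the supervisor reports is the free-space limit in bytes.

```python
def available_bytes(df_output):
    """Parse `df -kP` output and return the available space in bytes for the first filesystem."""
    lines = [line for line in df_output.splitlines() if line.strip()]
    # POSIX format: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    fields = lines[1].split()
    return int(fields[3]) * 1024


# Output of the df -kP command quoted in this thread, with whitespace collapsed.
DF_OUTPUT = (
    "Filesystem 1024-blocks Used Available Capacity Mounted on\n"
    "/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 "
    "10320184 6348300 3447648 65% /\n"
)

# Assumed to be the limit in bytes (the start_link argument in the reports above).
DISK_FREE_LIMIT = 50000000

free = available_bytes(DF_OUTPUT)
print(free, free > DISK_FREE_LIMIT)
```

With the numbers from this machine the free space (3530391552 bytes) is well above that limit, which fits with the disk not actually being full.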




Michael Sander
mes65 at cornell.edu
607-227-9859


On Tue, Apr 8, 2014 at 3:33 AM, Michael Sander <mes65 at cornell.edu> wrote:

> Hi Matthias,
>
> What I sent you was everything I had. However, I did check ps -aux after
> the crash and rabbitmq-server was definitely not in there. I will turn off
> the cron jobs that automatically restart rabbitmq, and I'll let you know if
> I see it again.
>
> Here is the output of the command.
>
> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
> "Filesystem                                             1024-blocks    Used Available Capacity Mounted on\n/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1    10320184 6348300   3447648      65% /\n"
> ...done.
>
>
> Also, I'm not sure whether it will help you, but attached is a screenshot
> of the rabbitmq console.  If you look at the start of the top chart at 19:00,
> there is a sharp increase in the queued messages.  That's when I restarted
> rabbitmq after the crash.  Everything before that was flat.  Another point
> to note is that it currently says that the disk space is unavailable.  I
> definitely remember seeing a value there at some point before; I don't know
> what causes that to occur.
>
> I've turned off my rabbitmq auto-start cron jobs, I'll let you know if I
> see the crash again.
>
> Thanks again.
>
> Best,
>
> Michael Sander
>
>
>
> On Tue, Apr 8, 2014 at 1:18 AM, Matthias Radestock <matthias at rabbitmq.com> wrote:
>
>> Michael,
>>
>>
>> On 08/04/14 02:50, Michael Sander wrote:
>>
>>> Full logs are attached.  You'll notice that it crashes pretty often now.
>>>
>>
>> The disk_monitor is crashing frequently, yes, but in none of the
>> instances in the logs did that actually take down rabbit (notice that there
>> are no rabbit starts recorded in the rabbit.log); the disk_monitor restarts
>> just fine and the bunny lives on.
>>
>> Do you have the logs covering the time period around the crash?
>>
>>
>>  Here is the output of the commands
>>>
>>>     $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP
>>>     /var/lib/rabbitmq/mnesia/")'
>>>     Error: syntax error before:
>>>
>>
>> Ah, sorry, missed a full stop. Should be
>>
>>     sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>
>>
>> Matthias.
>>
>
>