[rabbitmq-discuss] Rabbit MQ Crash
Michael Sander
mes65 at cornell.edu
Tue Apr 8 08:42:28 BST 2014
Hi Matthias,
I just checked rabbit the second after sending that, and it appears to have
crashed. Here is some output that you may find useful. Notice that
erlang appears to be alive even though rabbitmq is not.
~$ ps aux|grep rabbit
rabbitmq 1761 0.0 0.0 10836 160 ? S Apr07 0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
1001 26076 0.0 0.0 6308 600 pts/1 S+ 07:34 0:00 grep rabbit
~$ ps aux|grep erlan
rabbitmq 1761 0.0 0.0 10836 160 ? S Apr07 0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
1001 26078 0.0 0.0 6308 600 pts/1 S+ 07:34 0:00 grep erlan
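Note that epmd is the Erlang port mapper daemon rather than the Erlang VM itself; the broker runs inside a beam (or beam.smp) process. A minimal sketch for telling the two apart in `ps aux` output follows. The function name `erlang_status` and its three result labels are made up for illustration.

```shell
# Classify `ps aux` output read from stdin: is an Erlang VM (beam or beam.smp)
# present, or only the epmd port mapper?
# NOTE: erlang_status and its labels are hypothetical, not part of RabbitMQ.
erlang_status() {
    awk '
        /grep/ { next }                       # ignore the grep process itself
        /\/beam(\.smp)?( |$)/ { vm = 1 }      # Erlang VM binary
        /\/epmd( |$)/         { epmd = 1 }    # port mapper daemon
        END {
            if (vm)        print "vm-running"
            else if (epmd) print "epmd-only"
            else           print "nothing"
        }'
}

# Live usage (not run here): ps aux | erlang_status
```

Fed the two lines of `ps` output above, this reports that only epmd is present, which is consistent with the broker being down.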
~$ df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  9.9G  6.1G  3.3G  65% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   181M  128K  181M   1% /run
/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   362M     0  362M   0% /run/shm
~$ sudo df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  9.9G  6.1G  3.3G  65% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   181M  128K  181M   1% /run
/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   362M     0  362M   0% /run/shm
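For comparison with the `df -kP` call quoted further down, here is a rough sketch of reading the "Available" column by hand and checking it against a limit. It is not how RabbitMQ itself does the check. The helper names `avail_kb` and `above_limit` are made up, and treating the 50000000 from the `start_link` arguments in the SASL log as a limit in bytes is an assumption.

```shell
# Print the "Available" column (1024-byte blocks) from `df -kP` output on stdin.
# With -P each filesystem stays on one line, so row 2 is the data row.
# NOTE: avail_kb and above_limit are hypothetical helpers for illustration.
avail_kb() {
    awk 'NR == 2 { print $4 }'
}

# Succeed if the available space (in KB) exceeds a limit given in bytes.
# The comparison is done in KB to keep the numbers small.
above_limit() {   # usage: above_limit <avail_kb> <limit_bytes>
    [ "$1" -gt $(( $2 / 1024 )) ]
}

# Sample taken from the df -kP output quoted later in this message.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 10320184 6348300 3447648 65% /'

kb=$(printf '%s\n' "$sample" | avail_kb)
if above_limit "$kb" 50000000; then   # assumption: the limit is in bytes
    echo "free space OK: ${kb} KB available"
else
    echo "below limit: ${kb} KB available"
fi
```

On a live system the sample would be replaced by `df -kP /var/lib/rabbitmq/mnesia/ | avail_kb`.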
~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log.1
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.19499.0>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 320)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
messages: []
links: [<0.180.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 6765
stack_size: 24
reductions: 13592
neighbours:
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,{restarting,<0.5000.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 8-Apr-2014::00:38:08 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.19502.0>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 320)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
messages: []
links: [<0.180.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 6765
stack_size: 24
reductions: 13592
neighbours:
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,{restarting,<0.5000.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 8-Apr-2014::00:38:08 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.19505.0>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 320)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
messages: []
links: [<0.180.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 6765
stack_size: 24
reductions: 13592
neighbours:
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,{restarting,<0.5000.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,{restarting,<0.5000.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
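To see at a glance how many start failures preceded the `reached_max_restart_intensity` shutdown, the supervisor report contexts in a SASL log can be tallied. This is a small sketch only; the function name `tally_contexts` is made up, and it assumes each report carries a `Context:` line like the ones above.

```shell
# Count SUPERVISOR REPORT contexts (start_error, shutdown, ...) in SASL log
# text read from stdin. Leading indentation is ignored by awk field splitting.
# NOTE: tally_contexts is a hypothetical helper for illustration.
tally_contexts() {
    awk '$1 == "Context:" { count[$2]++ }
         END { for (c in count) print c, count[c] }' | sort
}

# Live usage (not run here):
#   tail -n 100 /var/log/rabbitmq/<sasl log file> | tally_contexts
```

Each output line is a context name followed by the number of reports with that context.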
~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2.log
=WARNING REPORT==== 8-Apr-2014::07:29:47 ===
closing AMQP connection <0.20361.1> (127.0.0.1:48568 -> 127.0.0.1:5672):
connection_closed_abruptly
=WARNING REPORT==== 8-Apr-2014::07:29:47 ===
closing AMQP connection <0.20392.1> (127.0.0.1:48586 -> 127.0.0.1:5672):
connection_closed_abruptly
=WARNING REPORT==== 8-Apr-2014::07:29:49 ===
closing AMQP connection <0.20401.1> (127.0.0.1:48589 -> 127.0.0.1:5672):
connection_closed_abruptly
=WARNING REPORT==== 8-Apr-2014::07:29:50 ===
closing AMQP connection <0.22633.1> (127.0.0.1:50329 -> 127.0.0.1:5672):
connection_closed_abruptly
=WARNING REPORT==== 8-Apr-2014::07:29:51 ===
closing AMQP connection <0.16156.1> (127.0.0.1:44692 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 8-Apr-2014::07:30:11 ===
accepting AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672)
=WARNING REPORT==== 8-Apr-2014::07:30:11 ===
closing AMQP connection <0.22608.1> (127.0.0.1:50316 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 8-Apr-2014::07:30:11 ===
accepting AMQP connection <0.22774.1> (127.0.0.1:50371 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:11 ===
accepting AMQP connection <0.22777.1> (127.0.0.1:50372 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:26 ===
accepting AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:26 ===
accepting AMQP connection <0.22805.1> (127.0.0.1:50384 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:26 ===
accepting AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:27 ===
accepting AMQP connection <0.22825.1> (127.0.0.1:50386 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:29 ===
accepting AMQP connection <0.22834.1> (127.0.0.1:50387 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:30 ===
accepting AMQP connection <0.22843.1> (127.0.0.1:50388 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:31 ===
accepting AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:34 ===
accepting AMQP connection <0.22863.1> (127.0.0.1:50394 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:34 ===
accepting AMQP connection <0.22866.1> (127.0.0.1:50395 -> 127.0.0.1:5672)
=WARNING REPORT==== 8-Apr-2014::07:30:36 ===
closing AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 8-Apr-2014::07:30:36 ===
accepting AMQP connection <0.22883.1> (127.0.0.1:50399 -> 127.0.0.1:5672)
=WARNING REPORT==== 8-Apr-2014::07:30:37 ===
closing AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672):
connection_closed_abruptly
=WARNING REPORT==== 8-Apr-2014::07:30:38 ===
closing AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 8-Apr-2014::07:30:39 ===
accepting AMQP connection <0.22893.1> (127.0.0.1:50403 -> 127.0.0.1:5672)
=WARNING REPORT==== 8-Apr-2014::07:30:39 ===
closing AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 8-Apr-2014::07:30:41 ===
accepting AMQP connection <0.22902.1> (127.0.0.1:50409 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:43 ===
accepting AMQP connection <0.22913.1> (127.0.0.1:50411 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:49 ===
accepting AMQP connection <0.22925.1> (127.0.0.1:50420 -> 127.0.0.1:5672)
=INFO REPORT==== 8-Apr-2014::07:30:49 ===
accepting AMQP connection <0.22928.1> (127.0.0.1:50421 -> 127.0.0.1:5672)
=WARNING REPORT==== 8-Apr-2014::07:30:50 ===
closing AMQP connection <0.22660.1> (127.0.0.1:50332 -> 127.0.0.1:5672):
connection_closed_abruptly
=INFO REPORT==== 8-Apr-2014::07:30:51 ===
accepting AMQP connection <0.22945.1> (127.0.0.1:50423 -> 127.0.0.1:5672)
~$ tail -n 100 /var/log/rabbitmq/shutdown_err
/usr/lib/rabbitmq/bin/rabbitmqctl: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
~$ tail -n 100 /var/log/rabbitmq/shutdown_log
Stopping and halting node 'rabbit@ocr-proc-2' ...
...done.
~$ tail -n 100 /var/log/rabbitmq/startup_err
/usr/lib/rabbitmq/bin/rabbitmq-server: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
Killed
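The "not found" messages suggest that the shell is trying to run line 1 of /etc/rabbitmq/rabbitmq-env.conf as a command. The line `ocr-proc-2=rabbit@localhost` (the list archive may render `@` as ` at `) is not a valid assignment because shell variable names cannot contain hyphens. A minimal sketch reproducing this outside RabbitMQ follows; the helper `try_conf_line` is made up, and the `NODENAME` line is only an example of a well-formed assignment, not necessarily the setting that was intended here.

```shell
# Source a one-line config file in a subshell and report whether sh accepted it.
# NOTE: try_conf_line is a hypothetical helper for illustration.
try_conf_line() {
    tmp=$(mktemp) || return 1
    printf '%s\n' "$1" > "$tmp"
    # A word like foo-bar=baz is not an assignment, so sh looks for a command
    # with that name and the sourced file ends with a non-zero status.
    if ( . "$tmp" ) 2>/dev/null; then
        echo "accepted"
    else
        echo "rejected"
    fi
    rm -f "$tmp"
}

try_conf_line 'ocr-proc-2=rabbit@localhost'   # hyphens: not a valid variable name
try_conf_line 'NODENAME=rabbit@localhost'     # plain assignment
```

The first call is rejected for the same reason the startup and shutdown scripts complain; the second is accepted.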
~$ tail -n 100 /var/log/rabbitmq/startup_log
RabbitMQ 3.2.4. Copyright (C) 2007-2013 GoPivotal, Inc.
## ## Licensed under the MPL. See http://www.rabbitmq.com/
## ##
########## Logs: /var/log/rabbitmq/rabbit@ocr-proc-2.log
###### ## /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
##########
Starting broker... completed with 6 plugins.
Michael Sander
mes65 at cornell.edu
607-227-9859
On Tue, Apr 8, 2014 at 3:33 AM, Michael Sander <mes65 at cornell.edu> wrote:
> Hi Matthias,
>
> What I sent you was everything I had. However, I did check ps -aux after
> the crash and rabbitmq-server was definitely not in there. I will turn off
> the cron jobs that automatically restart rabbitmq, and I'll let you know if
> I see it again.
>
> Here is the output of the command.
>
> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
> "Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 10320184 6348300 3447648 65% /\n"
> ...done.
>
>
> Also, I'm not sure whether it will help you, but attached is a screenshot
> of the rabbitmq console. If you see the start of the top chart at 19:00,
> there is a sharp increase in the queued messages. That's when I restarted
> rabbitmq after the crash. Everything before that was flat. Another point
> to note is that it currently says that the disk space is unavailable. I
> definitely remember seeing a value there at some point before; I don't know
> what causes that to occur.
>
> I've turned off my rabbitmq auto-start cron jobs; I'll let you know if I
> see the crash again.
>
> Thanks again.
>
> Best,
>
> Michael Sander
>
>
>
> On Tue, Apr 8, 2014 at 1:18 AM, Matthias Radestock <matthias at rabbitmq.com> wrote:
>
>> Michael,
>>
>>
>> On 08/04/14 02:50, Michael Sander wrote:
>>
>>> Full logs are attached. You'll notice that it crashes pretty often now.
>>>
>>
>> The disk_monitor is crashing frequently, yes, but in none of the
>> instances in the logs did that actually take down rabbit (notice that
>> there are no rabbit starts recorded in the rabbit.log); the disk_monitor
>> restarts just fine and the bunny lives on.
>>
>> Do you have the logs covering the time period around the crash?
>>
>>
>>> Here is the output of the commands
>>>
>>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP
>>> /var/lib/rabbitmq/mnesia/")'
>>> Error: syntax error before:
>>>
>>
>> Ah, sorry, missed a full stop. Should be
>>
>> sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP
>> /var/lib/rabbitmq/mnesia/").'
>>
>>
>> Matthias.
>>
>
>