[rabbitmq-discuss] Rabbit MQ Crash
Michael Sander
mes65 at cornell.edu
Tue Apr 8 08:49:31 BST 2014
Unfortunately, I had to restart rabbitmq because it's on a production
machine. Attached is a screenshot taken after the restart. Notice that this
time a value is shown for the free disk space limit.
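In case it helps to cross-check the figure on the management screen, here is a minimal Python sketch of the comparison I believe the disk monitor is making: take the Available column of `df -kP` and compare it with the limit. It assumes the 50000000 passed to rabbit_disk_monitor:start_link in the supervisor reports quoted below is a limit in bytes. The function names are my own, not anything from RabbitMQ.

```python
# Sketch only: parse POSIX `df -kP` output and compare free space with a limit.
# parse_df_available_bytes, below_limit and DISK_FREE_LIMIT are illustrative names.

DISK_FREE_LIMIT = 50_000_000  # assumption: bytes, taken from the mfargs in the log


def parse_df_available_bytes(df_output: str) -> int:
    """Return the Available column of the first filesystem row, in bytes."""
    lines = [line for line in df_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and at least one filesystem line")
    # With -P each filesystem is on one line:
    # name, 1024-blocks, used, available, capacity, mount point
    fields = lines[1].split()
    if len(fields) < 6:
        raise ValueError("unexpected df -kP row: %r" % lines[1])
    return int(fields[3]) * 1024  # -k reports 1024-byte blocks


def below_limit(df_output: str, limit: int = DISK_FREE_LIMIT) -> bool:
    """True when the free space reported by df is under the limit."""
    return parse_df_available_bytes(df_output) < limit


# The df output from my earlier mail, with the whitespace collapsed.
sample = (
    "Filesystem 1024-blocks Used Available Capacity Mounted on\n"
    "/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 "
    "10320184 6348300 3447648 65% /\n"
)
print(parse_df_available_bytes(sample), below_limit(sample))
```

This does not handle filesystem names or mount points that contain spaces. On the numbers I posted, free space is far above the assumed limit, so a genuinely full disk does not look like the cause.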
Please let me know what additional information you would like. If it goes
down again, I will collect it.
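For next time, this is roughly how I plan to summarise the SASL log rather than pasting it whole: a small Python sketch that counts each report type together with its reason. The header and field layout are assumed from the excerpts quoted below, `summarise` is a name I made up, and it only looks at the first token after `Reason:` or `exception exit:`.

```python
import re
from collections import Counter

# Report headers look like: =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
HEADER = re.compile(r"^=(?P<kind>[A-Z ]+) REPORT==== (?P<when>\S+) ===\s*$")
REASON = re.compile(r"^\s*(?:Reason|exception exit):\s*(?P<reason>\S+)")


def summarise(log_text: str) -> Counter:
    """Count (report kind, reason) pairs in a SASL-style log."""
    counts = Counter()
    kind = None
    for raw in log_text.splitlines():
        # Drop mail quoting ("> ") and indentation so pasted excerpts work too.
        line = raw.lstrip("> ").rstrip()
        header = HEADER.match(line)
        if header:
            kind = header.group("kind")
            continue
        reason = REASON.match(line)
        if reason and kind is not None:
            counts[(kind, reason.group("reason"))] += 1
    return counts


sample = """\
=CRASH REPORT==== 8-Apr-2014::00:38:08 ===
  crasher:
    initial call: rabbit_disk_monitor:init/1
    exception exit: unsupported_platform
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    start_error
     Reason:     unsupported_platform
=SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
     Supervisor: {local,rabbit_disk_monitor_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
"""

for (kind, reason), n in sorted(summarise(sample).items()):
    print(kind, reason, n)
```

The entry I would watch for is `reached_max_restart_intensity`, since that is the last supervisor report in the log before rabbit went away.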
Michael Sander
On Tue, Apr 8, 2014 at 3:42 AM, Michael Sander <mes65 at cornell.edu> wrote:
> Hi Matthias,
>
> I checked rabbit the second after sending that, and it appears to have
> crashed. Here is some output that you may find useful. Notice that
> erlang appears to be alive even though rabbitmq is not.
>
> ~$ ps aux|grep rabbit
> rabbitmq  1761  0.0  0.0  10836   160 ?      S   Apr07  0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
> 1001     26076  0.0  0.0   6308   600 pts/1  S+  07:34  0:00 grep rabbit
> ~$ ps aux|grep erlan
> rabbitmq  1761  0.0  0.0  10836   160 ?      S   Apr07  0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
> 1001     26078  0.0  0.0   6308   600 pts/1  S+  07:34  0:00 grep erlan
> ~$ df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> rootfs                                                  9.9G  6.1G  3.3G  65% /
> udev                                                     10M     0   10M   0% /dev
> tmpfs                                                   181M  128K  181M   1% /run
> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
> tmpfs                                                   362M     0  362M   0% /run/shm
> ~$ sudo df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> rootfs                                                  9.9G  6.1G  3.3G  65% /
> udev                                                     10M     0   10M   0% /dev
> tmpfs                                                   181M  128K  181M   1% /run
> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G  6.1G  3.3G  65% /
> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
> tmpfs                                                   362M     0  362M   0% /run/shm
> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log.1
> crasher:
> initial call: rabbit_disk_monitor:init/1
> pid: <0.19499.0>
> registered_name: []
> exception exit: unsupported_platform
> in function gen_server:init_it/6 (gen_server.erl, line 320)
> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
> messages: []
> links: [<0.180.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 6765
> stack_size: 24
> reductions: 13592
> neighbours:
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
> Supervisor: {local,rabbit_disk_monitor_sup}
> Context: start_error
> Reason: unsupported_platform
> Offender: [{pid,{restarting,<0.5000.0>}},
> {name,rabbit_disk_monitor},
> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
> {restart_type,transient},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
> crasher:
> initial call: rabbit_disk_monitor:init/1
> pid: <0.19502.0>
> registered_name: []
> exception exit: unsupported_platform
> in function gen_server:init_it/6 (gen_server.erl, line 320)
> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
> messages: []
> links: [<0.180.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 6765
> stack_size: 24
> reductions: 13592
> neighbours:
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
> Supervisor: {local,rabbit_disk_monitor_sup}
> Context: start_error
> Reason: unsupported_platform
> Offender: [{pid,{restarting,<0.5000.0>}},
> {name,rabbit_disk_monitor},
> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
> {restart_type,transient},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
> crasher:
> initial call: rabbit_disk_monitor:init/1
> pid: <0.19505.0>
> registered_name: []
> exception exit: unsupported_platform
> in function gen_server:init_it/6 (gen_server.erl, line 320)
> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
> messages: []
> links: [<0.180.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 6765
> stack_size: 24
> reductions: 13592
> neighbours:
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
> Supervisor: {local,rabbit_disk_monitor_sup}
> Context: start_error
> Reason: unsupported_platform
> Offender: [{pid,{restarting,<0.5000.0>}},
> {name,rabbit_disk_monitor},
> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
> {restart_type,transient},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
> Supervisor: {local,rabbit_disk_monitor_sup}
> Context: shutdown
> Reason: reached_max_restart_intensity
> Offender: [{pid,{restarting,<0.5000.0>}},
> {name,rabbit_disk_monitor},
> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
> {restart_type,transient},
> {shutdown,4294967295},
> {child_type,worker}]
> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2.log
> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
> closing AMQP connection <0.20361.1> (127.0.0.1:48568 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
> closing AMQP connection <0.20392.1> (127.0.0.1:48586 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:49 ===
> closing AMQP connection <0.20401.1> (127.0.0.1:48589 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:50 ===
> closing AMQP connection <0.22633.1> (127.0.0.1:50329 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:29:51 ===
> closing AMQP connection <0.16156.1> (127.0.0.1:44692 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
> accepting AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:11 ===
> closing AMQP connection <0.22608.1> (127.0.0.1:50316 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
> accepting AMQP connection <0.22774.1> (127.0.0.1:50371 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
> accepting AMQP connection <0.22777.1> (127.0.0.1:50372 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
> accepting AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
> accepting AMQP connection <0.22805.1> (127.0.0.1:50384 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
> accepting AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:27 ===
> accepting AMQP connection <0.22825.1> (127.0.0.1:50386 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:29 ===
> accepting AMQP connection <0.22834.1> (127.0.0.1:50387 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:30 ===
> accepting AMQP connection <0.22843.1> (127.0.0.1:50388 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:31 ===
> accepting AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
> accepting AMQP connection <0.22863.1> (127.0.0.1:50394 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
> accepting AMQP connection <0.22866.1> (127.0.0.1:50395 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:36 ===
> closing AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:36 ===
> accepting AMQP connection <0.22883.1> (127.0.0.1:50399 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:37 ===
> closing AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =WARNING REPORT==== 8-Apr-2014::07:30:38 ===
> closing AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:39 ===
> accepting AMQP connection <0.22893.1> (127.0.0.1:50403 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:39 ===
> closing AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:41 ===
> accepting AMQP connection <0.22902.1> (127.0.0.1:50409 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:43 ===
> accepting AMQP connection <0.22913.1> (127.0.0.1:50411 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
> accepting AMQP connection <0.22925.1> (127.0.0.1:50420 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
> accepting AMQP connection <0.22928.1> (127.0.0.1:50421 -> 127.0.0.1:5672)
>
> =WARNING REPORT==== 8-Apr-2014::07:30:50 ===
> closing AMQP connection <0.22660.1> (127.0.0.1:50332 -> 127.0.0.1:5672):
> connection_closed_abruptly
>
> =INFO REPORT==== 8-Apr-2014::07:30:51 ===
> accepting AMQP connection <0.22945.1> (127.0.0.1:50423 -> 127.0.0.1:5672)
> ~$ tail -n 100 /var/log/rabbitmq/shutdown_err
> /usr/lib/rabbitmq/bin/rabbitmqctl: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
> ~$ tail -n 100 /var/log/rabbitmq/shutdown_log
> Stopping and halting node 'rabbit@ocr-proc-2' ...
> ...done.
> ~$ tail -n 100 /var/log/rabbitmq/startup_err
> /usr/lib/rabbitmq/bin/rabbitmq-server: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
> Killed
> ~$ tail -n 100 /var/log/rabbitmq/startup_log
>
> RabbitMQ 3.2.4. Copyright (C) 2007-2013 GoPivotal, Inc.
> ## ## Licensed under the MPL. See http://www.rabbitmq.com/
> ## ##
> ########## Logs: /var/log/rabbitmq/rabbit@ocr-proc-2.log
> ###### ## /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
> ##########
> Starting broker... completed with 6 plugins.
>
>
>
>
> Michael Sander
> mes65 at cornell.edu
> 607-227-9859
>
>
> On Tue, Apr 8, 2014 at 3:33 AM, Michael Sander <mes65 at cornell.edu> wrote:
>
>> Hi Matthias,
>>
>> What I sent you was everything I had. However, I did check ps -aux after
>> the crash and rabbitmq-server was definitely not in there. I will turn off
>> the cron jobs that automatically restart rabbitmq, and I'll let you know if
>> I see it again.
>>
>> Here is the output of the command.
>>
>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>> "Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 10320184 6348300 3447648 65% /\n"
>> ...done.
>>
>>
>> Also, I'm not sure whether it will help you, but attached is a screenshot
>> of the rabbitmq console. If you see the start of the top chart at 19:00,
>> there is a sharp increase in the queued messages. That's when I restarted
>> rabbitmq after the crash. Everything before that was flat. Another point
>> to note is that it currently says that the disk space is unavailable. I
>> definitely remember seeing a value there at some point before; I don't know
>> what causes that to occur.
>>
>> I've turned off my rabbitmq auto-start cron jobs. I'll let you know if I
>> see the crash again.
>>
>> Thanks again.
>>
>> Best,
>>
>> Michael Sander
>>
>>
>>
>> On Tue, Apr 8, 2014 at 1:18 AM, Matthias Radestock <matthias at rabbitmq.com
>> > wrote:
>>
>>> Michael,
>>>
>>>
>>> On 08/04/14 02:50, Michael Sander wrote:
>>>
>>>> Full logs are attached. You'll notice that it crashes pretty often now.
>>>>
>>>
>>> The disk_monitor is crashing frequently, yes, but none of the instances
>>> in the logs actually took down rabbit (notice that there are no rabbit
>>> starts recorded in the rabbit.log); the disk_monitor restarts just fine
>>> and the bunny lives on.
>>>
>>> Do you have the logs covering the time period around the crash?
>>>
>>>
>>>> Here is the output of the commands:
>>>>
>>>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/")'
>>>> Error: syntax error before:
>>>>
>>>
>>> Ah, sorry, missed a full stop. Should be
>>>
>>> sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>>
>>>
>>> Matthias.
>>>
>>
>>
>
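One more observation on the startup_err and shutdown_err output above: the `not found` message looks like the shell trying to run line 1 of /etc/rabbitmq/rabbitmq-env.conf as a command, which is what happens when the text before `=` is not a valid variable name (hyphens are not allowed in shell variable names). I have not confirmed that this is related to the crash. A minimal Python sketch, with names of my own choosing, that flags such lines:

```python
import re

# A shell assignment needs a name made of letters, digits and underscores,
# not starting with a digit. Anything else before "=" is run as a command.
ASSIGNMENT = re.compile(r"^(?:export\s+)?[A-Za-z_][A-Za-z0-9_]*=")


def suspicious_lines(conf_text: str) -> list:
    """Return (line number, text) for lines that are not blank, comments or assignments."""
    found = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not ASSIGNMENT.match(line):
            found.append((number, line))
    return found


# The line from the error message is flagged; a plain NAME=value line is not.
print(suspicious_lines("ocr-proc-2=rabbit@localhost\n"))
print(suspicious_lines("NODENAME=rabbit@localhost\n"))
```

The NODENAME line is only there as an example of a line that passes the check. It is not necessarily what the file on this machine should contain.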
-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot2.png
Type: image/png
Size: 219630 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20140408/ddef1524/attachment.png>