[rabbitmq-discuss] Rabbit MQ Crash
Michael Sander
mes65 at cornell.edu
Wed Apr 9 09:15:29 BST 2014
Anything I can do to move the ball forward here? This keeps on happening.
Michael Sander
mes65 at cornell.edu
607-227-9859
On Tue, Apr 8, 2014 at 3:49 AM, Michael Sander <mes65 at cornell.edu> wrote:
> Unfortunately, I had to restart rabbitmq because it's on a production
> machine. Attached is a screenshot taken after the restart. Notice that this
> time the free disk space limit is available.
>
> Please let me know what additional information you would like. If it goes
> down again, I will collect it.
>
> Michael Sander
>
>
>
> On Tue, Apr 8, 2014 at 3:42 AM, Michael Sander <mes65 at cornell.edu> wrote:
>
>> Hi Matthias,
>>
>> I just checked rabbit the second after sending that, and it appears to
>> have crashed. Here is some output that you may find useful. Notice that
>> erlang appears to be alive even though rabbitmq is not.
>>
>> ~$ ps aux|grep rabbit
>> rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
>> 1001     26076  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep rabbit
>> ~$ ps aux|grep erlan
>> rabbitmq  1761  0.0  0.0  10836   160 ?        S    Apr07   0:01 /usr/lib/erlang/erts-5.9.1/bin/epmd -daemon
>> 1001     26078  0.0  0.0   6308   600 pts/1    S+   07:34   0:00 grep erlan
>> ~$ df -h
>> Filesystem                                              Size Used Avail Use% Mounted on
>> rootfs                                                  9.9G 6.1G  3.3G  65% /
>> udev                                                     10M    0   10M   0% /dev
>> tmpfs                                                   181M 128K  181M   1% /run
>> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G 6.1G  3.3G  65% /
>> tmpfs                                                   5.0M    0  5.0M   0% /run/lock
>> tmpfs                                                   362M    0  362M   0% /run/shm
>> ~$ sudo df -h
>> Filesystem                                              Size Used Avail Use% Mounted on
>> rootfs                                                  9.9G 6.1G  3.3G  65% /
>> udev                                                     10M    0   10M   0% /dev
>> tmpfs                                                   181M 128K  181M   1% /run
>> /dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1  9.9G 6.1G  3.3G  65% /
>> tmpfs                                                   5.0M    0  5.0M   0% /run/lock
>> tmpfs                                                   362M    0  362M   0% /run/shm
>> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
>> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log.1
>> crasher:
>> initial call: rabbit_disk_monitor:init/1
>> pid: <0.19499.0>
>> registered_name: []
>> exception exit: unsupported_platform
>> in function gen_server:init_it/6 (gen_server.erl, line 320)
>> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>> messages: []
>> links: [<0.180.0>]
>> dictionary: []
>> trap_exit: false
>> status: running
>> heap_size: 6765
>> stack_size: 24
>> reductions: 13592
>> neighbours:
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>> Supervisor: {local,rabbit_disk_monitor_sup}
>> Context: start_error
>> Reason: unsupported_platform
>> Offender: [{pid,{restarting,<0.5000.0>}},
>> {name,rabbit_disk_monitor},
>> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>> {restart_type,transient},
>> {shutdown,4294967295},
>> {child_type,worker}]
>>
>>
>> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
>> crasher:
>> initial call: rabbit_disk_monitor:init/1
>> pid: <0.19502.0>
>> registered_name: []
>> exception exit: unsupported_platform
>> in function gen_server:init_it/6 (gen_server.erl, line 320)
>> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>> messages: []
>> links: [<0.180.0>]
>> dictionary: []
>> trap_exit: false
>> status: running
>> heap_size: 6765
>> stack_size: 24
>> reductions: 13592
>> neighbours:
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>> Supervisor: {local,rabbit_disk_monitor_sup}
>> Context: start_error
>> Reason: unsupported_platform
>> Offender: [{pid,{restarting,<0.5000.0>}},
>> {name,rabbit_disk_monitor},
>> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>> {restart_type,transient},
>> {shutdown,4294967295},
>> {child_type,worker}]
>>
>>
>> =CRASH REPORT==== 8-Apr-2014::00:38:08 ===
>> crasher:
>> initial call: rabbit_disk_monitor:init/1
>> pid: <0.19505.0>
>> registered_name: []
>> exception exit: unsupported_platform
>> in function gen_server:init_it/6 (gen_server.erl, line 320)
>> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.157.0>]
>> messages: []
>> links: [<0.180.0>]
>> dictionary: []
>> trap_exit: false
>> status: running
>> heap_size: 6765
>> stack_size: 24
>> reductions: 13592
>> neighbours:
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>> Supervisor: {local,rabbit_disk_monitor_sup}
>> Context: start_error
>> Reason: unsupported_platform
>> Offender: [{pid,{restarting,<0.5000.0>}},
>> {name,rabbit_disk_monitor},
>> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>> {restart_type,transient},
>> {shutdown,4294967295},
>> {child_type,worker}]
>>
>>
>> =SUPERVISOR REPORT==== 8-Apr-2014::00:38:08 ===
>> Supervisor: {local,rabbit_disk_monitor_sup}
>> Context: shutdown
>> Reason: reached_max_restart_intensity
>> Offender: [{pid,{restarting,<0.5000.0>}},
>> {name,rabbit_disk_monitor},
>> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
>> {restart_type,transient},
>> {shutdown,4294967295},
>> {child_type,worker}]
>> ~$ tail -n 100 /var/log/rabbitmq/rabbit@ocr-proc-2.log
>> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
>> closing AMQP connection <0.20361.1> (127.0.0.1:48568 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:47 ===
>> closing AMQP connection <0.20392.1> (127.0.0.1:48586 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:49 ===
>> closing AMQP connection <0.20401.1> (127.0.0.1:48589 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:50 ===
>> closing AMQP connection <0.22633.1> (127.0.0.1:50329 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:29:51 ===
>> closing AMQP connection <0.16156.1> (127.0.0.1:44692 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
>> accepting AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:11 ===
>> closing AMQP connection <0.22608.1> (127.0.0.1:50316 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
>> accepting AMQP connection <0.22774.1> (127.0.0.1:50371 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:11 ===
>> accepting AMQP connection <0.22777.1> (127.0.0.1:50372 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
>> accepting AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
>> accepting AMQP connection <0.22805.1> (127.0.0.1:50384 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:26 ===
>> accepting AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:27 ===
>> accepting AMQP connection <0.22825.1> (127.0.0.1:50386 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:29 ===
>> accepting AMQP connection <0.22834.1> (127.0.0.1:50387 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:30 ===
>> accepting AMQP connection <0.22843.1> (127.0.0.1:50388 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:31 ===
>> accepting AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
>> accepting AMQP connection <0.22863.1> (127.0.0.1:50394 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:34 ===
>> accepting AMQP connection <0.22866.1> (127.0.0.1:50395 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:36 ===
>> closing AMQP connection <0.22852.1> (127.0.0.1:50389 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:36 ===
>> accepting AMQP connection <0.22883.1> (127.0.0.1:50399 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:37 ===
>> closing AMQP connection <0.22761.1> (127.0.0.1:50370 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:38 ===
>> closing AMQP connection <0.22796.1> (127.0.0.1:50383 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:39 ===
>> accepting AMQP connection <0.22893.1> (127.0.0.1:50403 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:39 ===
>> closing AMQP connection <0.22810.1> (127.0.0.1:50385 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:41 ===
>> accepting AMQP connection <0.22902.1> (127.0.0.1:50409 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:43 ===
>> accepting AMQP connection <0.22913.1> (127.0.0.1:50411 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
>> accepting AMQP connection <0.22925.1> (127.0.0.1:50420 -> 127.0.0.1:5672)
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:49 ===
>> accepting AMQP connection <0.22928.1> (127.0.0.1:50421 -> 127.0.0.1:5672)
>>
>> =WARNING REPORT==== 8-Apr-2014::07:30:50 ===
>> closing AMQP connection <0.22660.1> (127.0.0.1:50332 -> 127.0.0.1:5672):
>> connection_closed_abruptly
>>
>> =INFO REPORT==== 8-Apr-2014::07:30:51 ===
>> accepting AMQP connection <0.22945.1> (127.0.0.1:50423 -> 127.0.0.1:5672)
>> ~$ tail -n 100 /var/log/rabbitmq/shutdown_err
>> /usr/lib/rabbitmq/bin/rabbitmqctl: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
>> ~$ tail -n 100 /var/log/rabbitmq/shutdown_log
>> Stopping and halting node 'rabbit@ocr-proc-2' ...
>> ...done.
>> ~$ tail -n 100 /var/log/rabbitmq/startup_err
>> /usr/lib/rabbitmq/bin/rabbitmq-server: 1: /etc/rabbitmq/rabbitmq-env.conf: ocr-proc-2=rabbit@localhost: not found
>> Killed
>> ~$ tail -n 100 /var/log/rabbitmq/startup_log
>>
>> RabbitMQ 3.2.4. Copyright (C) 2007-2013 GoPivotal, Inc.
>> ## ## Licensed under the MPL. See http://www.rabbitmq.com/
>> ## ##
>> ########## Logs: /var/log/rabbitmq/rabbit@ocr-proc-2.log
>> ###### ## /var/log/rabbitmq/rabbit@ocr-proc-2-sasl.log
>> ##########
>> Starting broker... completed with 6 plugins.
>>
>>
>>
>>
>> Michael Sander
>> mes65 at cornell.edu
>> 607-227-9859
>>
>>
>> On Tue, Apr 8, 2014 at 3:33 AM, Michael Sander <mes65 at cornell.edu> wrote:
>>
>>> Hi Matthias,
>>>
>>> What I sent you was everything I had. However, I did check ps -aux after
>>> the crash and rabbitmq-server was definitely not in there. I will turn off
>>> the cron jobs that automatically restart rabbitmq, and I'll let you know if
>>> I see it again.
>>>
>>> Here is the output of the command.
>>>
>>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>> "Filesystem 1024-blocks Used Available Capacity Mounted on\n/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 10320184 6348300 3447648 65% /\n"
>>> ...done.
>>>
>>>
>>> Also, I'm not sure whether it will help you, but attached is a
>>> screenshot of the rabbitmq console. If you see the start of the top chart
>>> at 19:00, there is a sharp increase in the queued messages. That's when I
>>> restarted rabbitmq after the crash. Everything before that was flat.
>>> Another point to note is that it currently says that the disk space is
>>> unavailable. I definitely remember seeing a value there at some point
>>> before; I don't know what causes that to occur.
>>>
>>> I've turned off my rabbitmq auto-start cron jobs, I'll let you know if I
>>> see the crash again.
>>>
>>> Thanks again.
>>>
>>> Best,
>>>
>>> Michael Sander
>>>
>>>
>>>
>>> On Tue, Apr 8, 2014 at 1:18 AM, Matthias Radestock <
>>> matthias at rabbitmq.com> wrote:
>>>
>>>> Michael,
>>>>
>>>>
>>>> On 08/04/14 02:50, Michael Sander wrote:
>>>>
>>>>> Full logs are attached. You'll notice that it crashes pretty often
>>>>> now.
>>>>>
>>>>
>>>> The disk_monitor is crashing frequently, yes, but none of the
>>>> instances in the logs actually took down rabbit (notice that there are
>>>> no rabbit starts recorded in the rabbit.log); the disk_monitor restarts
>>>> just fine and the bunny lives on.
>>>>
>>>> Do you have the logs covering the time period around the crash?
>>>>
>>>>
>>>> Here is the output of the commands
>>>>>
>>>>> $ sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/")'
>>>>> Error: syntax error before:
>>>>>
>>>>
>>>> Ah, sorry, missed a full stop. Should be
>>>>
>>>> sudo rabbitmqctl eval 'rabbit_misc:os_cmd("/bin/df -kP /var/lib/rabbitmq/mnesia/").'
>>>>
>>>>
>>>> Matthias.
>>>>
>>>
>>>
>>
>
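
For anyone reproducing the df check from this thread by hand, here is a
minimal Python sketch that reads the "Available" column out of `df -kP`
output and compares it with a byte limit. It is not RabbitMQ's own parsing
code. The function name parse_df_available_kb is hypothetical, the sample
string is the output quoted earlier in this thread with whitespace collapsed,
and treating the 50000000 passed to rabbit_disk_monitor:start_link in the
supervisor reports as a free-space limit in bytes is an assumption.

```python
def parse_df_available_kb(df_output: str) -> int:
    """Return the 'Available' value (1024-byte blocks) from `df -kP` output.

    Hypothetical helper for illustration only; it expects the POSIX layout:
    Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
    """
    lines = [ln for ln in df_output.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and one data line")
    fields = lines[1].split()
    if len(fields) < 6:
        raise ValueError("data line has fewer than six columns")
    return int(fields[3])


# Output quoted in this thread (whitespace collapsed).
SAMPLE = (
    "Filesystem 1024-blocks Used Available Capacity Mounted on\n"
    "/dev/disk/by-uuid/36fd30d4-ea87-419f-a6a4-a1a3cf290ff1 "
    "10320184 6348300 3447648 65% /\n"
)

# Assumption: the start_link argument from the logs is a limit in bytes.
LIMIT_BYTES = 50_000_000

available_bytes = parse_df_available_kb(SAMPLE) * 1024
print("available bytes:", available_bytes)
print("above limit:", available_bytes > LIMIT_BYTES)
```

If the df output is empty or has an unexpected shape, the helper raises
ValueError instead of returning a guess, which makes it easy to spot the
moment the check itself stops working.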