[rabbitmq-discuss] Strange start error issue

Samuel Chen samuel.net at gmail.com
Wed Sep 19 17:10:12 BST 2012


Hi all,

I'm new to RabbitMQ. We are facing a strange issue with rabbit_mgmt_db.
I could not find any similar issues by searching Google or
Stack Overflow, so I wonder if anyone could help diagnose this problem.
Thanks in advance.

We have a 2-node RabbitMQ cluster integrated with Celery. It worked well
for several months.
The issue first occurred on Jul 4th. After a restart, it worked
for about 2 months. Yesterday the issue occurred twice (once after a
restart).
What happened was that a child (rabbit_mgmt_db??) was killed automatically.
After several failed automatic restarts, it reached the max restart
intensity and was eventually shut down.
(Another point is that we moved from a 1-node server to a 2-node cluster
at the end of June. Not sure if that caused this issue.)
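
For anyone unfamiliar with restart intensity, here is a rough Python sketch
of how I understand the idea: a supervisor gives up once more than a set
number of restarts happen within a time window. This is only a simplified
model, not OTP's or RabbitMQ's actual code. The class name RestartWindow and
the limits (3 restarts in 10 seconds) are made up for illustration and are
not taken from our configuration.

```python
from collections import deque


class RestartWindow:
    """Simplified model of a supervisor's restart-intensity check."""

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts
        self.period_seconds = period_seconds
        self.restarts = deque()  # timestamps of recent restart attempts

    def record_restart(self, now):
        """Record one restart attempt; return False once the limit is exceeded."""
        self.restarts.append(now)
        # Forget attempts that fall outside the sliding window.
        while now - self.restarts[0] > self.period_seconds:
            self.restarts.popleft()
        return len(self.restarts) <= self.max_restarts


# Hypothetical limits: at most 3 restarts in any 10-second window.
window = RestartWindow(max_restarts=3, period_seconds=10)

# Illustrative attempt times in seconds: one kill, then rapid failed restarts.
attempts = [0.0, 1.0, 1.1, 1.2, 1.3]
results = [window.record_restart(t) for t in attempts]
print(results)
```

Each failed restart counts as another attempt, so a child that cannot start
(as with the already_started errors in our log) uses up the allowance almost
immediately.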

The hosts are virtual servers with 8/12G RAM and 30G disk.
One node is a disc node and the other is a RAM node.
The load was very low (around 100M RAM, few tasks). The disk has 2.5G
free space.
The log is below.

Thanks for any help.

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:13 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.28804.1777>},
                  {name,rabbit_mgmt_db},
                  {mfa,{rabbit_mgmt_db,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}
     Offender:   [{pid,<0.28804.1777>},
                  {name,rabbit_mgmt_db},
                  {mfa,{rabbit_mgmt_db,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}
     Offender:   [{pid,<0.28804.1777>},
                  {name,rabbit_mgmt_db},
                  {mfa,{rabbit_mgmt_db,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}
     Offender:   [{pid,<0.28804.1777>},
                  {name,rabbit_mgmt_db},
                  {mfa,{rabbit_mgmt_db,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}
     Offender:   [{pid,<0.28804.1777>},
                  {name,rabbit_mgmt_db},
                  {mfa,{rabbit_mgmt_db,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}
     Offender:   [{pid,<0.28804.1777>},
                  {name,rabbit_mgmt_db},
                  {mfa,{rabbit_mgmt_db,start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {local,rabbit_mgmt_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.28803.1777>},
                  {name,mirroring},
                  {mfa,
                      {mirrored_supervisor,start_internal,
                          [rabbit_mgmt_sup,
                           [{rabbit_mgmt_db,
                                {rabbit_mgmt_db,start_link,[]},
                                permanent,4294967295,worker,
                                [rabbit_mgmt_db]}]]}},
                  {restart_type,permanent},
                  {shutdown,4294967295},
                  {child_type,worker}]
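
In case it helps with reading a longer log of this kind, below is a small
Python sketch that counts supervisor reports by Context and Reason. It is
only an assumption about how one might summarise such a log, not a RabbitMQ
tool. The function name summarise and the SAMPLE text (an abridged excerpt
of the reports above, with the Offender fields left out) are my own.

```python
import re
from collections import Counter

# Abridged excerpt of the reports in this post (Offender fields omitted).
SAMPLE = """\
=SUPERVISOR REPORT==== 18-Sep-2012::14:53:13 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    child_terminated
     Reason:     killed

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}
"""

# Tolerate leading whitespace and email quote markers (">") before the field.
FIELD = re.compile(r"^[>\s]*(Context|Reason):\s*(.+?)\s*$")


def summarise(log_text):
    """Count (context, reason) pairs across supervisor reports."""
    counts = Counter()
    context = None
    for line in log_text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Context":
            context = value
        elif context is not None:
            # A Reason line closes the pair opened by the last Context line.
            counts[(context, value)] += 1
            context = None
    return counts


summary = summarise(SAMPLE)
for (context, reason), n in summary.items():
    print(n, context, reason)
```

Running it over the full log would show how many start_error reports follow
each child_terminated report before the shutdown.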

- Sam