[rabbitmq-discuss] RabbitMQ appears dead ("fake death") and stops responding; log keywords: mirrored_supervisor, rabbit_mgmt_db

Simon MacMullen simon at rabbitmq.com
Mon Oct 1 14:49:48 BST 2012


First of all: the fix for the bug referred to in previous emails was 
released in RabbitMQ 2.8.7. So you should consider an upgrade.
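
One way to tell whether a node predates 2.8.7 is to read the version of the `rabbit` application from `rabbitmqctl status`. The Python below is an illustrative sketch, not a RabbitMQ tool: `broker_version` and `needs_upgrade` are hypothetical names, and it assumes the status output lists running applications as Erlang tuples such as `{rabbit,"RabbitMQ","2.8.6"}`.

```python
import re

# Release that, per this thread, contains the management DB failover fix.
FIXED_IN = (2, 8, 7)

# Assumed format: `rabbitmqctl status` lists running applications as
# Erlang tuples, e.g. {rabbit,"RabbitMQ","2.8.6"}. The leading "{rabbit,"
# keeps entries such as {rabbitmq_management,...} from matching.
_RABBIT_APP = re.compile(r'\{rabbit,"RabbitMQ","(\d+(?:\.\d+)*)"\}')


def broker_version(status_output: str) -> tuple[int, ...]:
    """Return the broker version found in `rabbitmqctl status` output."""
    match = _RABBIT_APP.search(status_output)
    if match is None:
        raise ValueError("no {rabbit,...} application entry found")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(status_output: str) -> bool:
    """True if the node runs a release older than FIXED_IN."""
    # Tuples compare element by element, so (2, 8, 6) < (2, 8, 7).
    return broker_version(status_output) < FIXED_IN
```

On a broker host, the input could come from `subprocess.run(["rabbitmqctl", "status"], capture_output=True, text=True).stdout`.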

Secondly: it looks like your nodes are seeing a netsplit. Is that the 
case? You should be aware that RabbitMQ clusters don't handle netsplits 
well.
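
The logs quoted below fit that picture: the RAM node reports the disc node as not responding, starts its own statistics database, and later Erlang's `global` reports a name conflict for `rabbit_mgmt_db`, which is what `global` does when two nodes reconnect and both have registered the same name. A hypothetical Python helper that looks for that combination in one node's log might look like this; its patterns are copied from the lines in this thread and may not match the wording of other versions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical helper. The patterns are copied from the log lines quoted in
# this thread; other RabbitMQ/Erlang versions may word them differently.
_NOT_RESPONDING = re.compile(r"\*\* Node (\S+) not responding \*\*")
_NAME_CONFLICT = re.compile(r"global: Name conflict terminating \{(\w+),")


@dataclass
class PartitionEvidence:
    unreachable_nodes: list[str] = field(default_factory=list)
    name_conflicts: list[str] = field(default_factory=list)

    @property
    def looks_like_netsplit(self) -> bool:
        # A peer that timed out, plus `global` later finding the same name
        # registered twice, suggests the nodes were split and then rejoined.
        return bool(self.unreachable_nodes and self.name_conflicts)


def scan_log(text: str) -> PartitionEvidence:
    """Collect partition symptoms from the text of one node's log."""
    evidence = PartitionEvidence()
    for line in text.splitlines():
        if m := _NOT_RESPONDING.search(line):
            evidence.unreachable_nodes.append(m.group(1))
        elif m := _NAME_CONFLICT.search(line):
            evidence.name_conflicts.append(m.group(1))
    return evidence
```

Typical use would be `scan_log(pathlib.Path(log_path).read_text())`, with `log_path` pointing at the node's main log rather than sasl.log. A positive result is only a hint; the timestamps on both nodes still need to be compared.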

Cheers, Simon

On 27/09/12 14:54, Chao Liu wrote:
> Hello Simon MacMullen,
> RabbitMQ went down again today.
> The output in sasl.log is the same as last time, but there is
> something new in the .log file.
> Could you help me find the problem? Thanks.
>
> *disc node's log*
> =INFO REPORT==== 27-Sep-2012::19:59:12 ===
> rabbit on node rabbit@zw_124_156 down
>
> =WARNING REPORT==== 27-Sep-2012::19:59:13 ===
> Mnesia(rabbit@zw_124_177): ** WARNING ** Mnesia is overloaded: {dump_log, time_threshold}
>
> =INFO REPORT==== 27-Sep-2012::19:59:13 ===
>      application: rabbitmq_management
>      exited: shutdown
>      type: temporary
>
> -------------------------------------------------------
> *RAM node's log*
> =ERROR REPORT==== 27-Sep-2012::19:54:11 ===
> ** Node rabbit@zw_124_177 not responding **
> ** Removing (timedout) connection **
>
> =INFO REPORT==== 27-Sep-2012::19:54:11 ===
> rabbit on node rabbit@zw_124_177 down
>
> =INFO REPORT==== 27-Sep-2012::19:54:53 ===
> Statistics database started.
>
> =INFO REPORT==== 27-Sep-2012::19:59:13 ===
> global: Name conflict terminating {rabbit_mgmt_db,<7123.24214.2613>}
>
> =INFO REPORT==== 27-Sep-2012::19:59:13 ===
>      application: rabbitmq_management
>      exited: shutdown
>      type: temporary
>
>
> thanks.
>
> 2012/9/21 Simon MacMullen <simon at rabbitmq.com>
>
>     Hi.
>
>     This looks like a race with management DB failover that we've
>     already fixed, but which has not yet made it into any release. This
>     will be fixed in the next bugfix release. Until then if you are
>     affected you can work around it by running the management plugin on
>     only one node in the cluster (and just have
>     rabbitmq_management_agent on the others).
>
>     Cheers, Simon
>
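
The workaround in the 21 September reply above comes down to one plugin change per node. A minimal Python sketch, assuming the `rabbitmq-plugins enable`/`disable` commands of the 2.8.x series, which edit the plugin list of the host they run on; `plan_plugin_commands` and the host names are hypothetical. The plan only lists commands and does not run them, and plugin changes are assumed to take effect only after each node is restarted.

```python
def plan_plugin_commands(hosts: list[str], mgmt_host: str) -> dict[str, list[list[str]]]:
    """Map each host to the rabbitmq-plugins commands to run on that host.

    Hypothetical helper: the full management plugin stays on mgmt_host;
    every other host keeps only rabbitmq_management_agent.
    """
    if mgmt_host not in hosts:
        raise ValueError(f"{mgmt_host!r} is not one of {hosts!r}")
    plan: dict[str, list[list[str]]] = {}
    for host in hosts:
        if host == mgmt_host:
            plan[host] = [["rabbitmq-plugins", "enable", "rabbitmq_management"]]
        else:
            plan[host] = [
                # Enable the agent explicitly (assumption about dependency
                # handling): once rabbitmq_management is disabled, the agent
                # would no longer be pulled in as its dependency.
                ["rabbitmq-plugins", "enable", "rabbitmq_management_agent"],
                ["rabbitmq-plugins", "disable", "rabbitmq_management"],
            ]
    return plan


# Usage: print the plan for a hypothetical two-node cluster.
for host, commands in plan_plugin_commands(["node-a", "node-b"], "node-a").items():
    for argv in commands:
        print(host, " ".join(argv))
# node-a rabbitmq-plugins enable rabbitmq_management
# node-b rabbitmq-plugins enable rabbitmq_management_agent
# node-b rabbitmq-plugins disable rabbitmq_management
```

Returning argument lists rather than running them keeps the sketch safe to try anywhere; each list can be passed to `subprocess.run` on the matching host.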
>     On 20/09/12 08:45, liubida wrote:
>
>         Hi all,
>
>         Could anybody point me in the right direction for finding the
>         reason why RabbitMQ "fakes death" (it keeps running but stops
>         working)? At some point RabbitMQ seems to stop receiving
>         messages, without any warning.
>         I have built a cluster with 2 nodes, one disc and one RAM.
>         If I run "rabbitmqctl stop_app && rabbitmqctl start_app",
>         the problem disappears.
>
>
>         Here is the disc node's sasl.log:
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:13 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: child_terminated
>         Reason: killed
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: start_error
>         Reason: {already_started,<19704.13906.2335>}
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {<0.28802.1777>,mirrored_supervisor}
>         Context: shutdown
>         Reason: reached_max_restart_intensity
>         Offender: [{pid,<0.28804.1777>},
>         {name,rabbit_mgmt_db},
>         {mfa,{rabbit_mgmt_db,start_link,[]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {local,rabbit_mgmt_sup}
>         Context: child_terminated
>         Reason: shutdown
>         Offender: [{pid,<0.28803.1777>},
>         {name,mirroring},
>         {mfa,
>         {mirrored_supervisor,start_internal,
>         [rabbit_mgmt_sup,
>         [{rabbit_mgmt_db,
>         {rabbit_mgmt_db,start_link,[]},
>         permanent,4294967295,worker,
>         [rabbit_mgmt_db]}]]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
>         Supervisor: {local,rabbit_mgmt_sup}
>         Context: shutdown
>         Reason: reached_max_restart_intensity
>         Offender: [{pid,<0.28803.1777>},
>         {name,mirroring},
>         {mfa,
>         {mirrored_supervisor,start_internal,
>         [rabbit_mgmt_sup,
>         [{rabbit_mgmt_db,
>         {rabbit_mgmt_db,start_link,[]},
>         permanent,4294967295,worker,
>         [rabbit_mgmt_db]}]]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         Here is the RAM node's sasl.log:
>         =SUPERVISOR REPORT==== 18-Sep-2012::17:24:50 ===
>         Supervisor: {local,rabbit_mgmt_sup}
>         Context: child_terminated
>         Reason: shutdown
>         Offender: [{pid,<0.14161.2335>},
>         {name,mirroring},
>         {mfa,
>         {mirrored_supervisor,start_internal,
>         [rabbit_mgmt_sup,
>         [{rabbit_mgmt_db,
>         {rabbit_mgmt_db,start_link,[]},
>         permanent,4294967295,worker,
>         [rabbit_mgmt_db]}]]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>
>         =SUPERVISOR REPORT==== 18-Sep-2012::17:24:50 ===
>         Supervisor: {local,rabbit_mgmt_sup}
>         Context: shutdown
>         Reason: reached_max_restart_intensity
>         Offender: [{pid,<0.14161.2335>},
>         {name,mirroring},
>         {mfa,
>         {mirrored_supervisor,start_internal,
>         [rabbit_mgmt_sup,
>         [{rabbit_mgmt_db,
>         {rabbit_mgmt_db,start_link,[]},
>         permanent,4294967295,worker,
>         [rabbit_mgmt_db]}]]}},
>         {restart_type,permanent},
>         {shutdown,4294967295},
>         {child_type,worker}]
>
>         Thanks for any help.
>
>         --bidaliu
>
>
>
>         _________________________________________________
>         rabbitmq-discuss mailing list
>         rabbitmq-discuss at lists.rabbitmq.com
>         https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>
>
>     --
>     Simon MacMullen
>     RabbitMQ, VMware
>
>


-- 
Simon MacMullen
RabbitMQ, VMware

