[rabbitmq-discuss] rabbitmq fake-death && no breath. key word in log : mirrored_supervisor rabbit_mgmt_db
Simon MacMullen
simon at rabbitmq.com
Mon Oct 1 14:49:48 BST 2012
First of all: the fix for the bug referred to in previous emails was
released in RabbitMQ 2.8.7. So you should consider an upgrade.
Secondly: it looks like your nodes are seeing a netsplit. Is that the
case? You should be aware that RabbitMQ clusters don't handle netsplits
well.
Cheers, Simon
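
A minimal sketch, not part of the original exchange: in the ram node's log
quoted below, the lines that point towards a netsplit are the distribution
timeout ("not responding", "Removing (timedout) connection") and the later
`global` name conflict. Assuming those three phrases are a reasonable set of
hints (that choice, and the name find_netsplit_hints, are illustrative
assumptions rather than anything RabbitMQ defines), a short Python script
could pull them out of a node's log:

```python
# Phrases taken from the log excerpts quoted in this thread. Treating them as
# netsplit hints is an assumption of this sketch, not an official list.
HINTS = (
    "not responding",
    "Removing (timedout) connection",
    "Name conflict terminating",
)


def find_netsplit_hints(log_text):
    """Return (line_number, line) pairs for lines containing any hint."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(hint in line for hint in HINTS):
            hits.append((number, line.strip()))
    return hits


# Shortened sample adapted from the ram node's log in this thread.
SAMPLE = """\
=ERROR REPORT==== 27-Sep-2012::19:54:11 ===
** Node rabbit@zw_124_177 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 27-Sep-2012::19:59:13 ===
global: Name conflict terminating {rabbit_mgmt_db,<7123.24214.2613>}
"""

if __name__ == "__main__":
    for number, line in find_netsplit_hints(SAMPLE):
        print(f"{number}: {line}")
```

To check a real node, read its log file into a string and pass that to
find_netsplit_hints instead of SAMPLE.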
On 27/09/12 14:54, Chao Liu wrote:
> Hello, Simon MacMullen.
> Today RabbitMQ went down again.
> The output in sasl.log is the same as last time, but there is
> something new in the .log.
> Could you help me find the problem? Thanks.
>
> *disc node's log*
> =INFO REPORT==== 27-Sep-2012::19:59:12 ===
> rabbit on node rabbit@zw_124_156 down
>
> =WARNING REPORT==== 27-Sep-2012::19:59:13 ===
> Mnesia(rabbit@zw_124_177): ** WARNING ** Mnesia is overloaded: {dump_log, time_threshold}
>
> =INFO REPORT==== 27-Sep-2012::19:59:13 ===
> application: rabbitmq_management
> exited: shutdown
> type: temporary
>
> -------------------------------------------------------
> *ram node's log*
> =ERROR REPORT==== 27-Sep-2012::19:54:11 ===
> ** Node rabbit@zw_124_177 not responding **
> ** Removing (timedout) connection **
>
> =INFO REPORT==== 27-Sep-2012::19:54:11 ===
> rabbit on node rabbit@zw_124_177 down
>
> =INFO REPORT==== 27-Sep-2012::19:54:53 ===
> Statistics database started.
>
> =INFO REPORT==== 27-Sep-2012::19:59:13 ===
> global: Name conflict terminating {rabbit_mgmt_db,<7123.24214.2613>}
>
> =INFO REPORT==== 27-Sep-2012::19:59:13 ===
> application: rabbitmq_management
> exited: shutdown
> type: temporary
>
>
> thanks.
>
> 2012/9/21 Simon MacMullen <simon at rabbitmq.com>
>
> Hi.
>
> This looks like a race with management DB failover that we've
> already fixed, but which has not yet made it into any release. This
> will be fixed in the next bugfix release. Until then if you are
> affected you can work around it by running the management plugin on
> only one node in the cluster (and just have
> rabbitmq_management_agent on the others).
>
> Cheers, Simon
>
> On 20/09/12 08:45, liubida wrote:
>
> Hi all,
>
> Could anybody point me in the right direction for finding the
> reason for a fake death with RabbitMQ?
> It seems RabbitMQ stops receiving messages at some point,
> without any warning.
> I have built a cluster with 2 nodes, one disc and one ram.
> If I run the command "rabbitmqctl stop_app && rabbitmqctl
> start_app", the problem disappears.
>
>
> here is the disc node's sasl.log
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:13 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: child_terminated
> Reason: killed
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: start_error
> Reason: {already_started,<19704.13906.2335>}
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {<0.28802.1777>,mirrored_supervisor}
> Context: shutdown
> Reason: reached_max_restart_intensity
> Offender: [{pid,<0.28804.1777>},
> {name,rabbit_mgmt_db},
> {mfa,{rabbit_mgmt_db,start_link,[]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {local,rabbit_mgmt_sup}
> Context: child_terminated
> Reason: shutdown
> Offender: [{pid,<0.28803.1777>},
> {name,mirroring},
> {mfa,
> {mirrored_supervisor,start_internal,
> [rabbit_mgmt_sup,
> [{rabbit_mgmt_db,
> {rabbit_mgmt_db,start_link,[]},
> permanent,4294967295,worker,
> [rabbit_mgmt_db]}]]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
> Supervisor: {local,rabbit_mgmt_sup}
> Context: shutdown
> Reason: reached_max_restart_intensity
> Offender: [{pid,<0.28803.1777>},
> {name,mirroring},
> {mfa,
> {mirrored_supervisor,start_internal,
> [rabbit_mgmt_sup,
> [{rabbit_mgmt_db,
> {rabbit_mgmt_db,start_link,[]},
> permanent,4294967295,worker,
> [rabbit_mgmt_db]}]]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> the ram node's sasl.log
> =SUPERVISOR REPORT==== 18-Sep-2012::17:24:50 ===
> Supervisor: {local,rabbit_mgmt_sup}
> Context: child_terminated
> Reason: shutdown
> Offender: [{pid,<0.14161.2335>},
> {name,mirroring},
> {mfa,
> {mirrored_supervisor,start_internal,
> [rabbit_mgmt_sup,
> [{rabbit_mgmt_db,
> {rabbit_mgmt_db,start_link,[]},
> permanent,4294967295,worker,
> [rabbit_mgmt_db]}]]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
>
> =SUPERVISOR REPORT==== 18-Sep-2012::17:24:50 ===
> Supervisor: {local,rabbit_mgmt_sup}
> Context: shutdown
> Reason: reached_max_restart_intensity
> Offender: [{pid,<0.14161.2335>},
> {name,mirroring},
> {mfa,
> {mirrored_supervisor,start_internal,
> [rabbit_mgmt_sup,
> [{rabbit_mgmt_db,
> {rabbit_mgmt_db,start_link,[]},
> permanent,4294967295,worker,
> [rabbit_mgmt_db]}]]}},
> {restart_type,permanent},
> {shutdown,4294967295},
> {child_type,worker}]
>
> Thanks for any help.
>
> --bidaliu
>
>
>
>
>
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>
>
--
Simon MacMullen
RabbitMQ, VMware
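
A rough sketch for reading the sasl.log excerpts quoted above: an OTP
supervisor that has to restart a child too many times within its configured
window gives up with reached_max_restart_intensity, which matches the run of
start_error reports on the disc node. Assuming the file uses the
"=SUPERVISOR REPORT====" headers and the "Context:" and "Reason:" fields shown
there, the following Python tallies reports by context and reason. The
function name and the shortened sample are illustrative and not taken from
RabbitMQ.

```python
import re
from collections import Counter

# Each SASL supervisor report starts with this header line.
REPORT_HEADER = re.compile(r"^=SUPERVISOR REPORT====", re.MULTILINE)
# Capture the value of the Context and Reason fields, ignoring indentation.
FIELD = re.compile(r"^[ \t]*(Context|Reason):[ \t]*(.*\S)[ \t]*$", re.MULTILINE)


def tally_supervisor_reports(log_text):
    """Count supervisor reports by their (context, reason) pair."""
    counts = Counter()
    # The first split element is whatever precedes the first header; skip it.
    for chunk in REPORT_HEADER.split(log_text)[1:]:
        fields = dict(FIELD.findall(chunk))
        counts[(fields.get("Context"), fields.get("Reason"))] += 1
    return counts


# Shortened sample modelled on the disc node's sasl.log in this thread.
SAMPLE = """\
=SUPERVISOR REPORT==== 18-Sep-2012::14:53:13 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    child_terminated
     Reason:     killed

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    start_error
     Reason:     {already_started,<19704.13906.2335>}

=SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===
     Supervisor: {<0.28802.1777>,mirrored_supervisor}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
"""

if __name__ == "__main__":
    for (context, reason), count in tally_supervisor_reports(SAMPLE).items():
        print(count, context, reason)
```

A burst of start_error entries followed by a single shutdown entry with
reached_max_restart_intensity is the pattern to look for.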