<div dir="ltr"><div class="gmail_default" style="font-family:courier new,monospace">Sorry, reposting this because the original was sent from an unregistered email address.<br clear="all"></div><br clear="all"><div>Johanes</div>
<br><br><div class="gmail_quote">---------- Forwarded message ----------<br>Date: 6 February 2014 09:17<br>Subject: RabbitMQ management down after cluster issue<br>To: <div class="gmail_default" style="font-family:courier new,monospace;display:inline">
</div><div class="gmail_default" style="font-family:courier new,monospace;display:inline"></div><a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a><br><br><br><div dir="ltr"><div style="font-family:courier new,monospace">
Hi all,<br><br></div><div style="font-family:courier new,monospace">We are having a problem with our RabbitMQ cluster and are not sure how to find the root cause. We have 2 nodes in a cluster, and one of the nodes (MQ1) had its management console go down after a clustering issue/network partition. (I assume this is what happened because we ping the management console from Zabbix every minute.)<br>
<br>Last month we experienced a network partition and the same node's management plugin went down as well. Since then that node has been reinstalled with the same RabbitMQ version, and we use the "auto-heal" policy for when a network partition happens.<br>
<br></div><div style="font-family:courier new,monospace">Looking at both nodes' logs, here is the critical information I could gather:<br></div><div style="font-family:courier new,monospace">
<br></div><div style="font-family:courier new,monospace">MQ1<br>=INFO REPORT==== 5-Feb-2014::11:14:49 ===<br>rabbit on node rabbit@mq2 down<br>=INFO REPORT==== 5-Feb-2014::11:14:51 ===<br>Statistics database started.<br>
=INFO REPORT==== 5-Feb-2014::11:14:51 ===<br>Mirrored-queue (queue 'email.out.5' in vhost '/'): Slave <rabbit@mq1.3.10297.0> saw deaths of mirrors <rabbit@mq2.2.289.0><br>.......<br>=ERROR REPORT==== 5-Feb-2014::11:14:52 ===<br>
Mnesia(rabbit@mq1): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@mq2}<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>Autoheal request sent to rabbit@mq1<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>
Autoheal request received from rabbit@mq1<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>global: Name conflict terminating {rabbit_mgmt_db,<2705.453.0>}<br>=ERROR REPORT==== 5-Feb-2014::11:14:52 ===<br>** Generic server <0.10360.0> terminating<br>
** Last message in was {mnesia_locker,rabbit@mq2,granted}<br>** When Server state == {state,<0.10358.0>,<0.10359.0>,rabbit_mgmt_sup,<br> [{rabbit_mgmt_db,<br> {rabbit_mgmt_db,start_link,[]},<br>
permanent,4294967295,worker,<br> [rabbit_mgmt_db]}]}<br>** Reason for termination == <br>** {unexpected_info,{mnesia_locker,rabbit@mq2,granted}}<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>
Autoheal decision<br> * Partitions: [[rabbit@mq1],[rabbit@mq2]]<br> * Winner: rabbit@mq1<br> * Losers: [rabbit@mq2]<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>Autoheal: I am the winner, waiting for [rabbit@mq2] to stop<br>
=INFO REPORT==== 5-Feb-2014::11:14:53 ===<br>rabbit on node rabbit@mq2 down<br>=INFO REPORT==== 5-Feb-2014::11:14:58 ===<br>Autoheal: final node has stopped, starting...<br>=INFO REPORT==== 5-Feb-2014::11:16:14 ===<br>rabbit on node rabbit@mq2 up<br>
<br><br></div><div style="font-family:courier new,monospace">MQ2<br>=ERROR REPORT==== 5-Feb-2014::11:14:49 ===<br>** Node rabbit@mq1 not responding **<br>** Removing (timedout) connection **<br>=INFO REPORT==== 5-Feb-2014::11:14:49 ===<br>
rabbit on node rabbit@mq1 down<br>=INFO REPORT==== 5-Feb-2014::11:14:51 ===<br>Mirrored-queue (queue 'managedAmqpOutboundSms5' in vhost '/'): Master <rabbit@mq2.2.277.0> saw deaths of mirrors <rabbit@mq1.3.10289.0> <br>
...<br>=ERROR REPORT==== 5-Feb-2014::11:14:52 ===<br>Mnesia(rabbit@mq2): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@mq1}<br>...<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>
Statistics database started.<br>=WARNING REPORT==== 5-Feb-2014::11:14:52 ===<br>Autoheal: we were selected to restart; winner is rabbit@mq1<br>=INFO REPORT==== 5-Feb-2014::11:14:52 ===<br>Stopping RabbitMQ<br>=INFO REPORT==== 5-Feb-2014::11:14:53 ===<br>
stopped TCP Listener on [::]:5672<br>=ERROR REPORT==== 5-Feb-2014::11:14:59 ===<br>Mnesia(rabbit@mq2): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@mq1}<br>=INFO REPORT==== 5-Feb-2014::11:15:13 ===<br>
Starting RabbitMQ 3.1.5 on Erlang R14B04<br>Copyright (C) 2007-2013 GoPivotal, Inc.<br>Licensed under the MPL. See <a href="http://www.rabbitmq.com/" target="_blank">http://www.rabbitmq.com/</a><br>...<br><br></div><div style="font-family:courier new,monospace">
<br></div><div style="font-family:courier new,monospace">Is there an obvious error in the RabbitMQ log that I should be looking at to find out what is happening? The other log files do not seem to provide any meaningful information.<br>
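<br>The Mnesia partition errors quoted above can be pulled out of a node's log with a short script. This is only a sketch: it assumes the log has already been read into a string and that the lines look like the excerpts in this message. The function name find_partition_events is made up for illustration and is not part of RabbitMQ.<br>

```python
import re

# Both patterns are modelled on the log excerpts quoted in this message.
# Report header, e.g. "=ERROR REPORT==== 5-Feb-2014::11:14:52 ==="
REPORT_RE = re.compile(r"=(\w+) REPORT==== (\S+) ===")
# Mnesia partition error, e.g.
# "mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@mq2}"
PARTITION_RE = re.compile(
    r"mnesia_event got \{inconsistent_database, (\w+), ([^}\s]+)\}"
)


def find_partition_events(log_text):
    """Return a list of (timestamp, kind, peer_node) tuples.

    The timestamp is taken from the most recent report header seen
    before the partition error, or None if no header preceded it.
    """
    events = []
    timestamp = None
    for line in log_text.splitlines():
        header = REPORT_RE.search(line)
        if header:
            # Remember the timestamp of the report we are now inside.
            timestamp = header.group(2)
            continue
        match = PARTITION_RE.search(line)
        if match:
            events.append((timestamp, match.group(1), match.group(2)))
    return events
```

Run against each node's log, this lists when each node reported a partition and which peer it named, which makes it easier to line the two logs up side by side.<br>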
</div><div style="font-family:courier new,monospace"> <br><br>Here's our system setup (in case it helps):<br></div><div style="font-family:courier new,monospace">- 2-node cluster on Linode<br>
</div><div style="font-family:courier new,monospace">- Ubuntu 12.04.3 instance<br></div><div style="font-family:courier new,monospace">- RabbitMQ 3.1.5<br></div><div style="font-family:courier new,monospace">
- management plugin enabled<br></div><div style="font-family:courier new,monospace">- both nodes communicate over private LAN IP addresses<br></div><div style="font-family:courier new,monospace">
- both nodes are used by our app servers, which communicate via spring-amqp (might be unrelated information)<br></div><div style="font-family:courier new,monospace"><br><br><br></div><div style="font-family:courier new,monospace">
Any hints or help with debugging the issue would be appreciated. <br><br><br>Thanks<span class="HOEnZb"><font color="#888888"><br><br clear="all"></font></span></div><span class="HOEnZb"><font color="#888888"><div>Johanes</div>
</font></span></div>
</div><br></div>