[rabbitmq-discuss] RabbitMQ Cluster - Please Help !
ran mizrachi
ranmizrachi at gmail.com
Tue Dec 27 14:20:36 GMT 2011
My two-node cluster in production keeps breaking with these error messages:
=ERROR REPORT==== 23-Dec-2011::04:21:34 ===
** Node rabbit@rabbitmq02 not responding **
** Removing (timedout) connection **
=INFO REPORT==== 23-Dec-2011::04:21:35 ===
node rabbit@rabbitmq02 lost 'rabbit'
=ERROR REPORT==== 23-Dec-2011::04:21:49 ===
Mnesia(rabbit@rabbitmq01): ** ERROR ** mnesia_event got
{inconsistent_database, running_partitioned_network, rabbit@rabbitmq02}
I tried to simulate the problem by killing the connection between the two
nodes with "tcpkill". The cluster disconnected and, surprisingly, the two
nodes did not try to reconnect!
When the cluster breaks, the haproxy load balancer still marks both nodes as
active and sends requests to both of them, although they are no longer
clustered.
My questions:
1. If the nodes are configured to work as a cluster, why don't they try to
reconnect after a network failure?
2. How can I detect a broken cluster and automatically shut down one of the
nodes?
(I get consistency problems when the two nodes run separately.)
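To make question 2 concrete, here is a minimal sketch of one possible detection step: scanning a node's log text for the Mnesia partition event shown in the log excerpt above. The function name `find_partitioned_peers` and the idea of reading the log as plain text are assumptions for illustration only; this is not a RabbitMQ facility, and what to do once a partition is detected (for example, stopping one node) is a separate decision not shown here.

```python
import re

# Signature of the Mnesia partition event, copied from the log excerpt in
# this post. The captured group is the peer node named in the event.
PARTITION_RE = re.compile(
    r"\{inconsistent_database,\s*running_partitioned_network,\s*([^}]+)\}"
)


def find_partitioned_peers(log_text):
    """Return peer node names from partition events, in order, no duplicates.

    Hypothetical helper: it only inspects text that the caller has already
    read from a node's log file.
    """
    peers = []
    for match in PARTITION_RE.finditer(log_text):
        peer = match.group(1).strip()
        if peer not in peers:
            peers.append(peer)
    return peers


if __name__ == "__main__":
    sample = (
        "Mnesia(rabbit@rabbitmq01): ** ERROR ** mnesia_event got\n"
        "{inconsistent_database, running_partitioned_network, "
        "rabbit@rabbitmq02}\n"
    )
    print(find_partitioned_peers(sample))
```

A monitoring job could run such a check periodically on each node and raise an alert when the returned list is non-empty.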
This is urgent, please help!