[rabbitmq-discuss] RabbitMQ Cluster - Please Help !

ran mizrachi ranmizrachi at gmail.com
Tue Dec 27 14:20:36 GMT 2011


My two-node cluster in production keeps breaking with these error messages:

=ERROR REPORT==== 23-Dec-2011::04:21:34 ===
** Node rabbit@rabbitmq02 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 23-Dec-2011::04:21:35 ===
node rabbit@rabbitmq02 lost 'rabbit'

=ERROR REPORT==== 23-Dec-2011::04:21:49 ===
Mnesia(rabbit@rabbitmq01): ** ERROR ** mnesia_event got
{inconsistent_database, running_partitioned_network, rabbit@rabbitmq02}


I tried to simulate the problem by killing the connection between the two
nodes using "tcpkill".
The cluster disconnected and, surprisingly, the two nodes do not try
to reconnect!

When the cluster breaks, the haproxy load balancer still marks both nodes as
active and sends requests to both of them,
although they are no longer in a cluster.

My Questions:

1. If the nodes are configured to work as a cluster, why don't they try
to reconnect after a network failure?

2. How can I detect a broken cluster and automatically shut down one of
the nodes?
(I have consistency problems when the two nodes run separately.)
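
A minimal sketch of what the detection half of question 2 might look like,
assuming that `rabbitmqctl cluster_status` prints a `{running_nodes,[...]}`
term and that the cluster should contain the two nodes named in the log
above. The names EXPECTED_NODES, parse_running_nodes, missing_nodes and
check_local_node are illustrative, not part of RabbitMQ.

```python
import re
import subprocess

# Assumption: the two node names that appear in the log excerpt.
EXPECTED_NODES = {"rabbit@rabbitmq01", "rabbit@rabbitmq02"}


def parse_running_nodes(status_text):
    """Return the node names listed in the {running_nodes,[...]} term."""
    match = re.search(r"\{running_nodes,\s*\[([^\]]*)\]", status_text)
    if match is None:
        return set()
    # Names may be wrapped across lines or quoted with single quotes.
    names = (name.strip().strip("'") for name in match.group(1).split(","))
    return {name for name in names if name}


def missing_nodes(status_text, expected=EXPECTED_NODES):
    """Return the expected nodes that the local node does not see as running."""
    return set(expected) - parse_running_nodes(status_text)


def check_local_node():
    """Run rabbitmqctl on this host; return 0 if healthy, 1 if nodes are missing."""
    result = subprocess.run(
        ["rabbitmqctl", "cluster_status"],
        capture_output=True,
        text=True,
        check=True,
    )
    missing = missing_nodes(result.stdout)
    if missing:
        print("cluster looks partitioned, missing:", ", ".join(sorted(missing)))
        return 1
    print("all expected nodes are running")
    return 0
```

A periodic job on each host could call check_local_node() and act on the
return value, for example by running `rabbitmqctl stop_app` on the node
chosen to step down. Which node that should be is a policy decision the
sketch does not make.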


Urgent, please help!