[rabbitmq-discuss] Problem with Clustering

Matthew Sackman matthew at lshift.net
Thu Jan 7 10:38:30 GMT 2010


Hi David,

You're using the latest default code, yes?

We've just changed the behaviour in this case - previously, yes, the
queue would be recreated. That was incorrect because it causes many
problems when the failed node comes back up: you end up with the same
queue on both nodes, each with a different idea of what should be in the
queue. Exciting, if undesirable, things happen in this case.

Thus, if a queue is declared and it turns out that the queue already
exists but lives on a node that is down, we return a 404 (NOT_FOUND),
because what it's really saying is "the node on which this queue exists
can't be found".
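To make the rule concrete, here is a toy model of the declare decision. It is a hypothetical sketch, not RabbitMQ's actual code: the `Cluster` class, `declare_queue` function and node names are made up for illustration. It only assumes that each queue has a single home node and uses the AMQP 0-9-1 reply codes 200 (reply-success) and 404 (NOT_FOUND).

```python
from dataclasses import dataclass, field

REPLY_SUCCESS = 200  # AMQP 0-9-1 reply-success
NOT_FOUND = 404      # AMQP 0-9-1 NOT_FOUND

# Hypothetical, simplified cluster state: which nodes are up, and which
# node each queue lives on. Not RabbitMQ's real data structures.
@dataclass
class Cluster:
    nodes_up: set[str]
    queue_home: dict[str, str] = field(default_factory=dict)

def declare_queue(cluster: Cluster, name: str, local_node: str) -> int:
    """Sketch of the declare rule described above."""
    home = cluster.queue_home.get(name)
    if home is None:
        # Queue doesn't exist anywhere yet: create it on the declaring node.
        cluster.queue_home[name] = local_node
        return REPLY_SUCCESS
    if home not in cluster.nodes_up:
        # Queue exists, but its node is down. Recreating it here would leave
        # two diverging copies once that node comes back, so refuse instead.
        return NOT_FOUND
    # Queue exists on a reachable node: the declare succeeds.
    return REPLY_SUCCESS
```

With `Cluster(nodes_up={"rabbit@a", "rabbit@b"})`, declaring `"q"` from `rabbit@a` succeeds and homes the queue there. After `rabbit@a` is removed from `nodes_up`, declaring `"q"` from `rabbit@b` returns 404 and leaves the queue's home unchanged.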

From your code, I see you're declaring the queue durable. This really
reinforces the issue: if it's durable, then persistent messages
shouldn't be lost, and yet if you want to be able to recreate the queue
on the other node, it'll start empty, at which point, logically, the
contents of the queue have been lost.
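The durability argument can be illustrated with another toy model of the old "recreate elsewhere" behaviour. This is a hypothetical sketch: the per-node `stores` dict, the `recreate_queue` helper, and the queue and message names are invented for illustration. It assumes only that each node keeps its own on-disk copy of the queues it hosts.

```python
# Hypothetical model: each node has its own message store on local disk.
# "orders" is a durable queue homed on rabbit@a, holding persistent messages.
stores: dict[str, dict[str, list[str]]] = {
    "rabbit@a": {"orders": ["msg-1", "msg-2"]},
    "rabbit@b": {},
}

def recreate_queue(node: str, name: str) -> list[str]:
    """Old behaviour (sketch): declare a fresh, empty copy on a surviving node."""
    stores[node][name] = []
    return stores[node][name]

# rabbit@a goes down and a client redeclares "orders" against rabbit@b.
new_copy = recreate_queue("rabbit@b", "orders")
# new_copy is empty: from the client's point of view the persistent messages
# are gone. When rabbit@a returns, two "orders" queues with different
# contents exist in the cluster.
```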

In general, clustering should not be used for HA purposes. If you wish
to achieve HA, active/passive HA can be built from shared disk storage,
heartbeat/pacemaker and perhaps a TCP load balancer in front. Make sure
you set the node names to localhost, and point both rabbit instances at
the same Mnesia directory on the shared storage. When the passive node
comes up, it will recover everything from the storage.

Matthew
