[rabbitmq-discuss] 3.0.4 - Losing messages in a HA cluster

Emile Joubert emile at rabbitmq.com
Tue Apr 30 11:30:13 BST 2013


Hi Jason,

On 26/04/13 15:38, Jason McIntosh wrote:
> So to confirm my understanding here.  Message 1 was synched to both
> queues before I stopped the disc node.  So when I shut down the disc
> node, the ram node became the "master".  As such, it should have had
> message 1 and 2.  When I brought up the disc node it is a slave and had
> its messages essentially reset.  Per the docs: "As such, when a slave
> rejoins a mirrored-queue, it throws away any durable local contents it
> already has and starts empty."  So when I now publish message 2, that
> goes to the ram node, which is now the master, and we have 2 messages
> total (disc node still off).

That all sounds correct, as per your first message.

> The question becomes then - is there any way to recover the messages the
> RAM node had if the disc node comes back and the ram node subsequently
> failed?

Nodes joining the cluster have their queue contents reset. These
messages cannot be recovered.

To avoid message loss you will need to make sure that there is always a
synchronised slave. Either add another slave or make sure at least one
of the existing slaves is synchronised. At present a slave becomes
synchronised only once the queue contents that predate it have been
replaced by the actions of producers and consumers; a future version of
the broker will allow slaves to be synchronised manually.
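
As a rough illustration of the point above, here is a toy Python model of
a single mirrored queue. It is a sketch only, not RabbitMQ code: the class
MirroredQueue, the function replay, the node names and the message names
m1..m3 are all made up for the example, and the promotion rule (prefer a
synchronised slave, else any survivor) is a simplification.

```python
class MirroredQueue:
    """Toy model of one mirrored queue (illustrative only, not RabbitMQ code)."""

    def __init__(self, master):
        self.master = master
        self.contents = {master: []}  # node name -> messages held by that node

    def join(self, node):
        # A slave that (re)joins discards its local contents and starts empty.
        self.contents[node] = []

    def publish(self, msg):
        # New publishes reach the master and every running slave.
        for msgs in self.contents.values():
            msgs.append(msg)

    def consume(self):
        # Consuming removes the head message from the master and from any
        # slave that also holds it.
        msg = self.contents[self.master].pop(0)
        for node, msgs in self.contents.items():
            if node != self.master and msg in msgs:
                msgs.remove(msg)
        return msg

    def synchronised_slaves(self):
        # In this model a slave is synchronised when it holds exactly
        # what the master holds.
        head = self.contents[self.master]
        return [n for n, msgs in self.contents.items()
                if n != self.master and msgs == head]

    def fail(self, node):
        synced = [n for n in self.synchronised_slaves() if n != node]
        del self.contents[node]
        if node == self.master:
            # Simplification: prefer a synchronised slave, else any survivor.
            candidates = synced or list(self.contents)
            self.master = candidates[0] if candidates else None


def replay(drain_old_messages):
    """Replay the scenario from this thread; return the messages that are lost."""
    q = MirroredQueue("disc")
    q.join("ram")
    q.publish("m1")   # both nodes hold m1
    q.fail("disc")    # ram is promoted to master
    q.publish("m2")   # only ram holds m2
    q.join("disc")    # disc rejoins empty and unsynchronised
    q.publish("m3")   # reaches both nodes
    delivered = []
    if drain_old_messages:
        delivered = [q.consume(), q.consume()]  # m1 and m2 reach a consumer
    q.fail("ram")     # disc is promoted with whatever it holds
    surviving = q.contents[q.master]
    return [m for m in ("m1", "m2", "m3")
            if m not in delivered and m not in surviving]


print(replay(drain_old_messages=False))  # ['m1', 'm2']
print(replay(drain_old_messages=True))   # []
```

In the first run the RAM node fails while the rejoined disc node is still
unsynchronised, so m1 and m2 are gone. In the second run consumers have
drained the older messages first, the slave has caught up, and nothing is
lost.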

> Do I need to remove the HA policy, then bring up the RAM node to get
> the RAM node messages (since it was master prior to its restart)?

No. See http://www.rabbitmq.com/ha.html#start-stop

> if I had 3 nodes in a cluster, and 1 node went down, but
> the other 2 nodes each got an even distribution of the queues

If you had 3 nodes and all queues were mirrored to all nodes then you
could shut down two of those nodes without losing messages, provided all
slaves were synchronised to start with.
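
The condition can be written down as a small check. This is only a sketch
under simplifying assumptions: the function name contents_survive and the
node names a, b, c are hypothetical, and it ignores messages published
while nodes are going down as well as the order in which nodes are
restarted.

```python
def contents_survive(holds_full_copy, stopped):
    """Return True if some node with a full copy of the queue is still running.

    holds_full_copy maps node name -> True for the master and for every
    synchronised slave, False for an unsynchronised slave.
    """
    return any(full for node, full in holds_full_copy.items()
               if node not in stopped)


all_synced = {"a": True, "b": True, "c": True}
one_lagging = {"a": True, "b": True, "c": False}

print(contents_survive(all_synced, {"a", "b"}))   # True
print(contents_survive(one_lagging, {"a", "b"}))  # False
```

With every slave synchronised, any two of the three nodes can stop. If the
only surviving node is an unsynchronised slave, the contents are lost.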

> I've started testing on the new 3.1 nightly

This version allows slave nodes to be synchronised manually.
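
A minimal sketch of triggering that from a script follows. It assumes the
command is exposed as "rabbitmqctl sync_queue" with an optional "-p vhost"
argument, as in the released 3.1 tooling; a nightly build may differ, so
check the rabbitmqctl usage output first. The helper names and the queue
name my_queue are hypothetical.

```python
import subprocess


def sync_queue_command(queue, vhost=None):
    """Build the argv for a manual sync of one mirrored queue.

    Assumes `rabbitmqctl sync_queue [-p vhost] queue`; verify against the
    rabbitmqctl version you are running.
    """
    argv = ["rabbitmqctl", "sync_queue"]
    if vhost is not None:
        argv += ["-p", vhost]
    argv.append(queue)
    return argv


def sync_queue(queue, vhost=None):
    # Runs the command on the local node; raises CalledProcessError on failure.
    subprocess.run(sync_queue_command(queue, vhost), check=True)


print(sync_queue_command("my_queue"))
```

Only the command line is built and printed here; call sync_queue() on a
broker node to actually run it.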




-Emile