[rabbitmq-discuss] 3.0.4 - Losing messages in a HA cluster

Fri Apr 26 15:38:17 BST 2013

So, to confirm my understanding here: message 1 was synced to both queues
before I stopped the disc node.  When I shut down the disc node, the ram
node became the "master".  As such, it should have had messages 1 and 2.
When I brought up the disc node, it rejoined as a slave and had its
messages essentially reset.  Per the docs: "As such, when a slave rejoins a
mirrored-queue, it throws away any durable local contents it already has
and starts empty."  So when I publish message 2, it goes to the ram
node, which is now the master, and we have 2 messages in total (disc node
still off).
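
The sequence above can be replayed with a toy model.  This is only a sketch
and not RabbitMQ code: the MirroredQueue class and replay_scenario function
are hypothetical names, message 2 is assumed to be published while the disc
node is down, and the case where every node is stopped is ignored.

```python
class MirroredQueue:
    """Toy model of one mirrored queue; an illustration, not RabbitMQ code."""

    def __init__(self, nodes):
        # Running nodes in join order, each with its local copy of the queue.
        self.contents = {node: [] for node in nodes}

    @property
    def master(self):
        # The eldest running node acts as master (None if nothing is running).
        return next(iter(self.contents), None)

    def publish(self, message):
        # A publish reaches every node that is running at that moment.
        for messages in self.contents.values():
            messages.append(message)

    def stop(self, node):
        # A stopped node leaves the queue; if it was the master, the eldest
        # remaining slave takes over because it is now first in join order.
        del self.contents[node]

    def start(self, node):
        # Per the docs quoted above, a rejoining slave starts empty.
        self.contents[node] = []


def replay_scenario():
    queue = MirroredQueue(["disc", "ram"])
    queue.publish("message 1")  # held by both nodes
    queue.stop("disc")          # ram becomes master
    queue.publish("message 2")  # held by ram only
    queue.start("disc")         # disc rejoins as an empty slave
    queue.stop("ram")           # disc is promoted with nothing to serve
    return queue


if __name__ == "__main__":
    q = replay_scenario()
    print(q.master, q.contents)
```

In this model the disc node ends up as master with an empty queue, which
matches the loss described in the docs when no slave is synchronised.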

The question then becomes: is there any way to recover the messages the
RAM node had if the disc node comes back and the ram node subsequently
fails?  Granted, this is an unlikely situation, but I'm trying to plan for
recovery scenarios here :)  Do I need to remove the HA policy, then bring up
the RAM node to get the RAM node's messages (since it was master prior to
its restart)?  Since I can't shut off the disc node in this case, I'm just
trying to figure out how to recover.  I can see taking this a step further
- if I had 3 nodes in a cluster, and 1 node went down, but the other 2
nodes each got an even distribution of the queues... hrmm, that would make
it somewhat interesting to recover.  It sounds like I'd almost have to shut
down yet one more node, start up the original node, make sure that its
messages get consumed, and then start the third node.
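
One way to avoid ending up in that position is to check, before stopping a
node, that every mirrored queue has at least one synchronised slave, as the
reply quoted below suggests.  A minimal sketch, assuming each queue is
described by a dict shaped like the management plugin's queue listing: the
field names slave_nodes and synchronised_slave_nodes are an assumption to
verify against your broker version, and the queue and node names are made
up for illustration.

```python
def queues_without_synced_slave(queues):
    """Return the names of queues that have no synchronised slave.

    `queues` is a list of dicts; the "synchronised_slave_nodes" key is
    assumed to list the slaves that hold the master's full contents.
    """
    names = []
    for queue in queues:
        if not queue.get("synchronised_slave_nodes"):
            names.append(queue["name"])
    return names


# Hypothetical sample data: both queues have their master on the ram node.
sample = [
    {
        "name": "orders",
        "node": "rabbit@ram",
        "slave_nodes": ["rabbit@disc"],
        "synchronised_slave_nodes": [],
    },
    {
        "name": "audit",
        "node": "rabbit@ram",
        "slave_nodes": ["rabbit@disc"],
        "synchronised_slave_nodes": ["rabbit@disc"],
    },
]

if __name__ == "__main__":
    print(queues_without_synced_slave(sample))
```

With this sample, only "orders" is reported: stopping rabbit@ram now would
lose whatever that queue's master alone is holding.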

I've started testing on the new 3.1 nightly - I'm hoping it makes this a
lot cleaner!
Thanks!
Jason

On Fri, Apr 26, 2013 at 3:44 AM, Emile Joubert <emile at rabbitmq.com> wrote:

>
>
> Hi Jason,
>
> On 25/04/13 21:09, Jason McIntosh wrote:
> > I have 2 nodes in an rabbit cluster, one disc, one ram node.  I'm seeing
> > messages get lost
>
> The explanation can be found here:
> http://www.rabbitmq.com/ha.html#behaviour
>
>  "should there be no slave that is synchronised with the master,
>   messages that only the master held will be lost."
>
> Messages can get lost if you recycle nodes in the cluster faster than it
> takes for the queue contents to be replaced entirely with new messages.
> The solution is to wait for slaves to become synchronised before
> restarting each node.
>
> The next version of RabbitMQ will feature manual eager synchronisation
> which will allow slaves to catch up on old messages held only on the
> master node. You can obtain this feature now by installing a nightly
> build of the broker, or wait for the next release.
>
>
> -Emile
>

-- 
Jason McIntosh
http://mcintosh.poetshome.com/blog/
573-424-7612