[rabbitmq-discuss] HA queue disappears when a node rejoins the cluster

Wed Aug 29 18:08:04 BST 2012

Hi Francesco,

Argh... I was so happy that this bug appeared to have been reproduced.

I would happily give you a script to repro, except that the logic is
intertwined with our own infrastructure. So instead, here's a brain dump of
everything I can fathom would be relevant.

Each node (rabbit@play, rabbit@play2, rabbit@util) runs on its own
separate Ubuntu 10.04 64-bit VM. The management and tracing plugins are
enabled on all nodes. All nodes are disc nodes.

Clustering is performed by a config file that's the same on all nodes:

----
[
    {rabbit, [{cluster_nodes, [rabbit@play,rabbit@play2,rabbit@util] },
              {disk_free_limit, 104857600}
             ]
    },
    {mnesia, [{debug, trace}
             ]
    }
].
----
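
A quick way to confirm the starting state (all three nodes up, all disc
nodes) is to ask the management API. This is only a sketch: the base URL,
the port 55672 and the guest/guest credentials are assumptions for
illustration, not something taken from my setup scripts. It relies on the
"name", "type" and "running" fields returned by /api/nodes.

```python
import base64
import json
import urllib.request

# Node names as listed in the cluster_nodes config.
EXPECTED_NODES = ("rabbit@play", "rabbit@play2", "rabbit@util")


def summarize_nodes(nodes, expected=EXPECTED_NODES):
    """Compare a parsed /api/nodes response against the expected cluster."""
    by_name = {n["name"]: n for n in nodes if n.get("name") in expected}
    return {
        "missing": sorted(set(expected) - set(by_name)),
        "not_running": sorted(k for k, n in by_name.items()
                              if not n.get("running")),
        "not_disc": sorted(k for k, n in by_name.items()
                           if n.get("type") != "disc"),
    }


def fetch_nodes(base_url="http://play:55672", user="guest", password="guest"):
    """Fetch /api/nodes. URL, port and credentials are assumed values."""
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = urllib.request.Request(
        base_url + "/api/nodes",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling summarize_nodes(fetch_nodes()) should give three empty lists when
the cluster is in the state described here.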

All queues are created as HA and durable.
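
For reference, here is roughly what such a declaration could look like
through the management HTTP API. It is a sketch under assumptions: it
assumes the queues are mirrored to all nodes with the "x-ha-policy"
argument set to "all", and the helper name is made up for illustration.
It only builds the request and does not send it.

```python
import json
from urllib.parse import quote


def ha_queue_declaration(name, vhost="/"):
    """Build (method, path, body) for declaring a durable HA queue.

    Assumption: mirroring is requested via the x-ha-policy queue argument.
    """
    path = "/api/queues/{}/{}".format(quote(vhost, safe=""),
                                      quote(name, safe=""))
    body = {
        "durable": True,
        "auto_delete": False,
        "arguments": {"x-ha-policy": "all"},
    }
    return "PUT", path, json.dumps(body)
```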

Start from a fully operational cluster with all nodes running and a set
of ~6 empty queues, then:

   - Pick a queue to work with (say, "foo")
   - Using the management UI, select the queue tab and publish 4 messages
   to "foo", delivery-mode = persistent. The message content doesn't matter;
   I use "abc" for the content.
   - Using rabbitmqctl, bring the first node (play) down. In the management
   UI, note that the queues are now backed by only two nodes (play2, util).
   - Using rabbitmqctl, bring the first node (play) back up. In the
   management UI, note that the "foo" queue is synced on (play2, util), but
   not on play. (UI shows "+1 +1")
   - Using the management UI, retrieve all the messages from the queue in
   question, but with the requeue flag enabled. After this, the queue appears
   synced on all nodes ("+2").
   - Using rabbitmqctl, bring down the 2nd node (play2). If you wait long
   enough, you should see that the "foo" queue has vanished. However, all
   other queues remain. You should also have logs similar to what I included
   previously.
   - On the off chance that "foo" doesn't disappear within 10-15 seconds,
   use rabbitmqctl to start the second node up again. See if "foo" has
   vanished.
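
The steps above can be sketched as a plan of HTTP requests and rabbitmqctl
invocations. This is not the script I actually run. Several things here are
assumptions: that "bring down/up" means stop_app/start_app, that publishing
goes through the default exchange addressed as "amq.default", and that the
"get" body takes a "requeue" flag (the field names may differ between
management plugin versions). The function names are made up for
illustration, and the code only builds the steps without executing them.

```python
from urllib.parse import quote

VHOST = quote("/", safe="")  # default vhost, URL-encoded as %2F


def publish_request(queue, payload="abc"):
    """Publish one persistent message via the default exchange (assumed)."""
    path = "/api/exchanges/{}/amq.default/publish".format(VHOST)
    body = {
        "properties": {"delivery_mode": 2},  # 2 = persistent
        "routing_key": queue,
        "payload": payload,
        "payload_encoding": "string",
    }
    return "POST", path, body


def get_with_requeue_request(queue, count):
    """Retrieve up to `count` messages and put them back on the queue."""
    path = "/api/queues/{}/{}/get".format(VHOST, quote(queue, safe=""))
    body = {"count": count, "requeue": True, "encoding": "auto"}
    return "POST", path, body


def rabbitmqctl(node, action):
    """Command line for a rabbitmqctl action against a given node."""
    return ["rabbitmqctl", "-n", node, action]


def repro_plan(queue="foo", messages=4):
    """Ordered list of steps matching the description above."""
    plan = [publish_request(queue) for _ in range(messages)]
    plan.append(rabbitmqctl("rabbit@play", "stop_app"))    # first node down
    plan.append(rabbitmqctl("rabbit@play", "start_app"))   # and back up
    plan.append(get_with_requeue_request(queue, messages)) # "resync" step
    plan.append(rabbitmqctl("rabbit@play2", "stop_app"))   # second node down
    return plan
```

After the last step, wait 10-15 seconds and check whether "foo" is still
listed; if it is, add a start_app for rabbit@play2 and check again.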

Regarding the resync logic I have (HTTP API to retrieve all messages, with
requeue), I'm not wedded to that. I'm just looking for some way to quickly
resync all contents, preferably without reading and re-publishing all the
messages myself. If there's some way to let the broker do most (all?) of the
work, I'm all ears. My observation was that the "retrieve with requeue"
seemed to work as intended, but you're obviously the expert on this.

On Wed, Aug 29, 2012 at 3:56 AM, Francesco Mazzoli
<francesco at rabbitmq.com> wrote:

> Hi Matt,
>
> At Wed, 22 Aug 2012 15:06:42 -0700,
> Matt Pietrek wrote:
> > I then take down the play node and start it back up. Afterwards, I force
> > everything to be synchronized by doing a management API "get messages"
> > with requeue=True. When this completes, everything shows up synched as
> > expected.
>
> This should not happen.  Messages are requeued at the original position in
> the queue, see <http://www.rabbitmq.com/semantics.html>, and thus that has
> no effect on the syncing of slaves.  Republishing would.
>
> I tried to reproduce the problem following exactly what you did, with no
> success.  Can you describe, in detail, your setup and the steps you're
> taking to reproduce that?  The most convenient thing would be to automate
> the procedure in a script.
>
> --
> Francesco * Often in error, never in doubt