[rabbitmq-discuss] Cluster Pathology
Dmitriy Samovskiy
dmitriy.samovskiy at cohesiveft.com
Wed Feb 11 01:39:08 GMT 2009
Hi Jason,
Jason J. W. Williams wrote:
>
> Setup A:
>
> * Consumer 1 attached to MQ node A and creates queue and binding.
> * Consumer 2 attaches to MQ node B and creates queue and binding (same
> as Consumer 1 and therefore no-op'd).
> * Producer 1 attaches to MQ node B (it also creates queue and binding
> same as Consumer 1...no-op'd) and publishes messages. Connection to MQ
> node B is persistent.
> * Consumer 1 dies.
> * MQ node A dies. Queues are not recreated on node B, and produced
> messages are black holed. Queues are not re-created because Consumer 2
> and Producer 1 are not notified by node B to reconnect or any other
> error (therefore their reconnect/recreate queue code never gets
> triggered).
Have you tried restoring node A? When it went down, it might still have had some messages in the
queue. And since the contents of queues are not replicated across nodes, nobody except node A
itself knows about them.
When you restore it, maybe rabbit will magically detect that the queue has since been
re-declared on another node and migrate the unconsumed messages there? Or not...
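A toy sketch of that concern, in Python. This is not RabbitMQ code: the `Node` and `Cluster` classes and `demo_lost_messages` are invented purely for illustration, and the model simply assumes (as guessed above) that the cluster shares where a queue lives but not what it contains.

```python
class Node:
    """Toy broker node: holds the message contents of the queues it owns."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.contents = {}  # queue name -> list of messages stored on this node


class Cluster:
    """Toy cluster: queue location is shared, queue contents are not."""

    def __init__(self, *nodes):
        self.nodes = {n.name: n for n in nodes}
        self.owner = {}  # queue name -> name of the node that holds it

    def declare(self, queue, via):
        current = self.owner.get(queue)
        if current is not None and self.nodes[current].up:
            return current  # queue already exists on a live node: no-op
        # Otherwise (re)create the queue, empty, on the node we are attached to.
        self.owner[queue] = via
        self.nodes[via].contents.setdefault(queue, [])
        return via

    def publish(self, queue, msg):
        owner = self.nodes[self.owner[queue]]
        if not owner.up:
            return False  # owning node is down: the message is black-holed
        owner.contents[queue].append(msg)
        return True


def demo_lost_messages():
    a, b = Node("A"), Node("B")
    cluster = Cluster(a, b)
    cluster.declare("foo", via="A")
    cluster.publish("foo", "m1")     # stored on node A only
    a.up = False                     # node A dies with "m1" still queued
    cluster.declare("foo", via="B")  # queue "foo" is re-created, empty, on B
    cluster.publish("foo", "m2")
    return a.contents["foo"], b.contents["foo"]
```

In this model the message published before the failure stays stranded in node A's copy of "foo", and the re-declared queue on B only ever sees what is published afterwards. Whether a restored node A would hand its old contents over is exactly the open question.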
> Setup B:
>
> * Same as Setup A except:
> * Producer 1 attaches to MQ node A.
> * MQ node A and Consumer 1 fail. Producer 1 reconnects to node B and
> recreates queues and bindings. Messages Producer 1 publishes are
> placed in the recreated queue. However, Consumer 2 never is handed
> messages by node B (which it has been persistently attached to) after
> the recreation of the queue.
Maybe because Consumer 2 is still attached to a queue on a node that is down? I
suspect that when you create a binding by name, rabbit resolves the name string to its
internal locator for a specific queue, which in this case is on node A, which is down. I
would guess that if you restart the consumer, it will attach to the newly created queue.
But again, the question remains of how one can get messages from the old queue named "foo"
when a new queue "foo" now exists on another node.
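The guess about name resolution can be illustrated with a small Python model. It is only a sketch of the hypothesis, not of how RabbitMQ actually works internally: `ToyBroker`, `ToyConsumer`, and the `(node, name)` "locator" tuples are all hypothetical names made up for this example.

```python
class ToyBroker:
    """Toy model: a queue name resolves to a (node, name) locator."""

    def __init__(self):
        self.locators = {}  # queue name -> current locator, e.g. ("A", "foo")
        self.messages = {}  # locator -> list of pending messages
        self.down = set()   # names of nodes that are currently down

    def declare(self, name, node):
        loc = self.locators.get(name)
        if loc is None or loc[0] in self.down:
            # No live queue by that name: create a fresh one on `node`.
            loc = (node, name)
            self.locators[name] = loc
            self.messages[loc] = []
        return loc

    def publish(self, name, msg):
        self.messages[self.locators[name]].append(msg)


class ToyConsumer:
    def __init__(self, broker, name, node):
        self.broker, self.name, self.node = broker, name, node
        self.subscribe()

    def subscribe(self):
        # The name is resolved once, here, and the locator is cached.
        self.locator = self.broker.declare(self.name, self.node)

    def poll(self):
        if self.locator[0] in self.broker.down:
            return []  # the old queue is unreachable, and no error is raised
        pending = self.broker.messages[self.locator]
        delivered = list(pending)
        pending.clear()
        return delivered


def demo_stale_consumer():
    broker = ToyBroker()
    broker.declare("foo", node="A")                    # Consumer 1 creates "foo" on A
    consumer2 = ToyConsumer(broker, "foo", node="B")   # no-op; resolves to ("A", "foo")
    broker.down.add("A")                               # node A fails
    broker.declare("foo", node="B")                    # Producer 1 re-creates "foo" on B
    broker.publish("foo", "m1")
    before = consumer2.poll()   # still pointing at the dead queue on A
    consumer2.subscribe()       # "restart": the name is resolved again
    after = consumer2.poll()
    return before, after
```

Under these assumptions, Consumer 2 receives nothing until it subscribes again, at which point the name "foo" resolves to the new queue on node B. The model says nothing about the messages left in the old queue on A.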
- Dmitriy