[rabbitmq-discuss] Active/Active failover and lost messages

Matthew Sackman matthew at rabbitmq.com
Fri Nov 4 16:10:23 GMT 2011


Hi Konstantin,

On Thu, Nov 03, 2011 at 02:19:51PM -0700, Konstantin Kalin wrote:
> I'm playing with RabbitMQ in Active/Active mode (mirrored queues). I
> think it's a very good product. Currently I have an issue and I could
> not find the reason for it. I would like to understand whether I am
> using RabbitMQ incorrectly.
> My setup is:
>  a) 3 cluster nodes (2 disc, 1 ram)
>  b) 50 publishers/consumers. All publishers and consumers are
> distributed between nodes.
>  c) Message publishing rate is about 15 messages per second per
> publisher.
>  d) Publishers work with "Publisher confirms" mode
>  e) Consumers work with "autoack=false"
>  f) A publisher or client knows about all nodes in the cluster and
> does failover to another node if there is an issue with current
> connection.

That all sounds fine, but you don't mention whether all 50 publishers
publish to the same queue, with 50 consumers consuming from that queue,
or whether each publisher+consumer pair has its own queue between them.

> If I stop a node, some consumers don't get all messages (even if a
> consumer is connected to another node). About 2-3 messages are
> lost.

Without understanding your topology better, I'm not quite sure how to
interpret that.

> I did several tests and found a correlation. Only consumers getting
> "ConsumerCancelledException" lose messages. Other consumers don't
> lose any messages even if they are connected to the node I stop.
> Could you please advise what I need to check to find the cause of the
> issue?

OK, so it looks like you're using the Java client? Are you also using
the QueueingConsumer? If not, it's possible that you've been sent some
messages but they're being overtaken by the exception in some way. If
you use the QueueingConsumer, that shouldn't happen. That said, even if
this were occurring, I'd expect (for other reasons) such messages to be
redelivered to you when you consume from the queue again.
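
One rough way to tell genuine loss from late redelivery is to compare
the message IDs you published with the ones each consumer actually saw.
The sketch below assumes each message carries a unique ID that your test
harness records on both sides; the LossAudit class and its method names
are made up for illustration and are not part of the Java client.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for a test harness; not part of the RabbitMQ client.
class LossAudit {

    // IDs that were published but never seen by the consumer.
    static Set<String> missing(List<String> published, List<String> received) {
        Set<String> result = new LinkedHashSet<>(published);
        result.removeAll(received);
        return result;
    }

    // IDs seen more than once, e.g. because the broker redelivered them.
    static Set<String> duplicates(List<String> received) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String id : received) {
            if (!seen.add(id)) {   // add() returns false if already present
                dups.add(id);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> published = Arrays.asList("a", "b", "c", "d");
        List<String> received = Arrays.asList("a", "b", "b", "d");
        System.out.println("missing:    " + missing(published, received));
        System.out.println("duplicates: " + duplicates(received));
    }
}
```

If an ID shows up as a duplicate rather than as missing, the message was
redelivered and not lost.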

I've recently improved our MulticastMain Java example so that it copes
transparently with the ConsumerCancelledException (though I haven't
actually used it to verify the absence of message loss).

http://hg.rabbitmq.com/rabbitmq-java-client/file/15f36113ffd3/test/src/com/rabbitmq/examples/MulticastMain.java
from line 442 onwards may be of use. Oh yes, while doing that I
discovered that QueueingConsumers are not reusable - you really do have
to create a new one whenever you resubscribe. That bit me for a while...
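
The shape of that resubscribe loop, with a fresh consumer per
subscription, is roughly as below. This is a self-contained model and
not the MulticastMain code: FakeQueue, OneShotConsumer and
CancelledException are hypothetical stand-ins for the queue,
QueueingConsumer and ConsumerCancelledException, so that it runs
without a broker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Model of "create a new consumer on every resubscribe".
// All types here are illustrative stand-ins, not the RabbitMQ client API.
class ResubscribeSketch {
    // Marker event simulating the broker cancelling the consumer.
    static final String CANCEL = "<cancel>";

    static class CancelledException extends Exception {}

    static class FakeQueue {
        final Deque<String> events;
        int subscriptions = 0;

        FakeQueue(String... events) {
            this.events = new ArrayDeque<>(Arrays.asList(events));
        }

        // Every subscription hands out a brand-new consumer object.
        OneShotConsumer subscribe() {
            subscriptions++;
            return new OneShotConsumer(this);
        }
    }

    static class OneShotConsumer {
        private final FakeQueue queue;
        private boolean cancelled = false;

        OneShotConsumer(FakeQueue queue) { this.queue = queue; }

        // Returns the next message, or null once the queue is drained.
        // Once cancelled, this instance stays cancelled for good.
        String nextDelivery() throws CancelledException {
            if (cancelled) throw new CancelledException();
            String event = queue.events.poll();
            if (CANCEL.equals(event)) {
                cancelled = true;
                throw new CancelledException();
            }
            return event;
        }
    }

    static List<String> consumeAll(FakeQueue queue) {
        List<String> received = new ArrayList<>();
        OneShotConsumer consumer = queue.subscribe();
        while (true) {
            try {
                String msg = consumer.nextDelivery();
                if (msg == null) break;        // nothing left to consume
                received.add(msg);
            } catch (CancelledException e) {
                // Do not reuse the old consumer: resubscribe with a new one.
                consumer = queue.subscribe();
            }
        }
        return received;
    }

    public static void main(String[] args) {
        FakeQueue q = new FakeQueue("m1", CANCEL, "m2", "m3");
        // prints "[m1, m2, m3] via 2 subscriptions"
        System.out.println(consumeAll(q) + " via " + q.subscriptions + " subscriptions");
    }
}
```

Reusing the old consumer in the catch block would just throw again on
the next call, which is the trap described above.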


Best wishes,

Matthew

