[rabbitmq-discuss] Pause minority cluster with publisher confirms losing messages
Miguel Araujo Pérez
miguel.araujo.perez at gmail.com
Wed Jun 4 08:40:28 BST 2014
Hello,
We've set up a 3-node RabbitMQ 3.3.1-1 cluster in pause minority mode.
We are running some tests to make sure we don't lose any messages when a
node of the cluster goes down.
So I've set up a small Python script that uses py-amqp to publish messages
with publisher confirms. The queue is durable and mirrored to all nodes
through a policy. I run the script as 3 separate processes, each one
publishing to a different node in a loop at one message per second, and
each message identifies the publisher that produced it. Once all three
publishers are running, I log into node3 and add iptables rules that block
its connections to the other 2 RabbitMQ nodes. It takes the cluster around
a minute to decide that a node is down and for node3 to stop the Rabbit
application. The publishers on nodes 1 and 2 keep working without issues,
while publisher3 blocks right after node3 is cut off. That is what I would
expect: node3 cannot confirm messages because it can't see the other 2
nodes.
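A minimal sketch of a publisher like the one described above, assuming py-amqp 2.x or later, where `amqp.Connection(..., confirm_publish=True)` makes `basic_publish` wait for the broker's ack and raise if the message is nacked. The JSON body with `publisher` and `seq` fields, the queue name `test_queue`, and the helper names `make_body`, `run_publisher` and `pyamqp_publish_fn` are hypothetical. The broker call is passed in as a callable so the confirm bookkeeping can be exercised without a cluster.

```python
import json
import time
from typing import Callable, List


def make_body(publisher: str, seq: int) -> bytes:
    """Encode the publisher id and a per-publisher sequence number (hypothetical format)."""
    return json.dumps({"publisher": publisher, "seq": seq}).encode("utf-8")


def run_publisher(publisher: str,
                  publish: Callable[[bytes], None],
                  count: int,
                  interval: float = 1.0) -> List[int]:
    """Publish up to `count` messages; return the sequence numbers that were confirmed.

    `publish` is expected to block until the broker confirms the message and to
    raise on a nack or a closed connection. A sequence number is recorded only
    after publish() returns, so the list holds exactly what the client saw acked.
    """
    confirmed: List[int] = []
    for seq in range(count):
        try:
            publish(make_body(publisher, seq))
        except Exception as exc:  # closed connection, nack, etc.
            print(f"{publisher}: stopped at seq {seq}: {exc!r}")
            break
        confirmed.append(seq)
        time.sleep(interval)  # one message per `interval` seconds
    return confirmed


def pyamqp_publish_fn(host: str, queue: str = "test_queue") -> Callable[[bytes], None]:
    """Build a `publish` callable on top of py-amqp (assumes py-amqp >= 2.0).

    Assumes the durable, policy-mirrored queue already exists.
    """
    import amqp  # imported lazily so the bookkeeping above runs without it

    conn = amqp.Connection(host=host, confirm_publish=True)
    conn.connect()
    channel = conn.channel()

    def publish(body: bytes) -> None:
        # delivery_mode=2 marks the message persistent; with confirm_publish=True
        # basic_publish blocks until basic.ack arrives and raises on basic.nack.
        channel.basic_publish(amqp.Message(body, delivery_mode=2),
                              exchange="", routing_key=queue)

    return publish
```

For example, `run_publisher("publisher3", pyamqp_publish_fn("node3:5672"), count=600)` would run one of the three processes against node3. Keeping the returned list of confirmed sequence numbers is what allows a later comparison against what was actually consumed.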
The issue is that sometimes, after a while, publisher3 resumes and keeps
publishing messages, and according to the library it receives acks for them.
This goes on for 6-8 seconds until an exception is raised because the
connection is closed (node3 stops Rabbit). However, those "acked" messages
are not in the queue when I consume it later to see what's inside. Other
times it works as I would expect, and no further messages are enqueued once
the iptables rules take effect.
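The "acked but missing" check can be sketched as a set difference per publisher. This assumes each message body is JSON with hypothetical `publisher` and `seq` fields, and that each publisher kept a list of the sequence numbers it saw confirmed; `find_lost` is a hypothetical helper, not part of py-amqp or php-amqplib.

```python
import json
from typing import Dict, Iterable, List, Set


def find_lost(confirmed: Dict[str, Iterable[int]],
              consumed_bodies: Iterable[bytes]) -> Dict[str, List[int]]:
    """Return, per publisher, the sequence numbers that were confirmed but never consumed."""
    # Index everything that was drained from the queue by publisher.
    seen: Dict[str, Set[int]] = {}
    for body in consumed_bodies:
        msg = json.loads(body)
        seen.setdefault(msg["publisher"], set()).add(msg["seq"])

    # Anything the publisher saw acked but the consumer never received is lost.
    lost: Dict[str, List[int]] = {}
    for publisher, seqs in confirmed.items():
        missing = sorted(set(seqs) - seen.get(publisher, set()))
        if missing:
            lost[publisher] = missing
    return lost
```

An empty result means every confirmed message survived the partition; a non-empty entry for publisher3 would pin down exactly which acks were received for messages that never made it into the queue.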
At first I thought this could be a library issue, so I ported the code to PHP
using the official php-amqplib, and exactly the same thing happens. My theory
is that node3, after trying to coordinate with the other 2 nodes, sometimes
runs as a partition of its own for a few seconds. During that window it
confirms messages, and only then does the pause minority policy kick in and
stop Rabbit.
To be honest, I'm open to suggestions on what to try. We cannot afford to
lose messages in any situation.
Thanks, Cheers
Miguel