[rabbitmq-discuss] Pause minority cluster with publisher confirms losing messages

Miguel Araujo Pérez miguel.araujo.perez at gmail.com
Wed Jun 4 08:40:28 BST 2014


Hello,

We've set up a RabbitMQ 3.3.1-1 cluster of 3 nodes in pause_minority mode. 
We are running some tests to make sure we don't lose any messages when a 
node of the cluster goes down. 

So I've set up a little Python script that uses py-amqp to queue messages 
with publisher confirms. The queue is durable and mirrored to all nodes 
through a policy. I run 3 separate processes, one publishing to each node 
in a loop, one message every second, each message identifying the 
publisher that produced it. Once all 3 publishers are running, I log in to 
node3 and add iptables rules that block its connections to the other 2 
RabbitMQ nodes. It takes the cluster around a minute to decide that one 
node is down and for node3 to stop its Rabbit process. The publishers to 
nodes 1 and 2 keep working without issues, whereas publisher3 blocks right 
after the iptables rules go in on node3. That is what I would expect, since 
node3 cannot confirm the message while it doesn't see the other 2 nodes. 
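
For reference, the publisher loop is roughly the following. This is only a 
minimal sketch, not the actual script: `run_publisher` and 
`publish_confirmed` are hypothetical names, the JSON body layout is an 
assumption, and `publish_confirmed` stands in for whatever client call 
blocks until the broker acks the message and raises on a nack or a closed 
connection.

```python
import json
import time


def run_publisher(publish_confirmed, publisher_id, count,
                  interval=1.0, sleep=time.sleep):
    """Publish `count` messages; return the sequence numbers that were acked.

    `publish_confirmed` is a hypothetical callable supplied by the caller.
    It must block until the broker confirms the message and raise an
    exception if the message is nacked or the connection is lost.
    """
    confirmed = []
    for seq in range(count):
        # Each message identifies its publisher (assumed JSON layout).
        body = json.dumps({"publisher": publisher_id, "seq": seq})
        try:
            publish_confirmed(body)
        except Exception:
            # Not confirmed (e.g. connection closed): stop publishing.
            break
        confirmed.append(seq)
        sleep(interval)
    return confirmed
```

Keeping the list of confirmed sequence numbers per publisher is what lets 
me compare later against what is actually in the queue.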

The issue is that sometimes, after a while, publisher3 resumes and 
continues publishing messages and, according to the library, receiving 
acks for them. This goes on for a period of 6-8 seconds until an exception 
is raised because the connection is closed (node3 stops Rabbit). However, 
those "acked" messages aren't in the queue when I consume it later to see 
what's inside. Other times it works as I would expect and no further 
message is enqueued after the iptables rules take effect.
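
The check for missing messages amounts to a set difference between what 
was acked and what was consumed. A minimal sketch, assuming each body is 
JSON carrying a publisher name and a sequence number (the `find_lost` name 
and the body layout are illustrative, not taken from the real scripts):

```python
import json


def find_lost(confirmed_bodies, consumed_bodies):
    """Return sorted (publisher, seq) pairs that were acked but never consumed."""
    def key(body):
        msg = json.loads(body)
        return (msg["publisher"], msg["seq"])

    consumed = {key(b) for b in consumed_bodies}
    confirmed = {key(b) for b in confirmed_bodies}
    # Anything the broker confirmed that is absent from the queue is lost.
    return sorted(confirmed - consumed)
```

An empty result means every acked message was found in the queue; anything 
else is a message that was confirmed and then lost.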

So I thought this could be a library issue and ported the code to PHP 
using the official php-amqplib, and the exact same thing happens. My theory 
is that sometimes node3, after trying to coordinate with the other 2 nodes, 
runs as its own partition for a few seconds; during those seconds it 
confirms messages, and then the pause_minority handling kicks in and stops 
Rabbit. 

To be honest, I'm open to suggestions on what to try. We cannot afford 
to lose messages in any situation. 

Thanks, Cheers
Miguel