[rabbitmq-discuss] Network partition queue issues

Simon MacMullen simon at rabbitmq.com
Mon May 12 10:12:13 BST 2014


There are definitely several bugs which have been fixed in mirrored 
queues and in autoheal since 3.2.4 which might account for what you saw. 
If you send me logs privately I can look to see if there are signs of 
any of the bugs we have fixed.

Cheers, Simon

On 09/05/2014 12:26, Jon Bergli Heier wrote:
> Hi,
>
> Yesterday we had two unfortunate network partitions on one of our two-node
> clusters, about an hour apart. After the second partition, when we restarted
> one of the nodes, we encountered some issues:
>
> 1) one of the queues disappeared, including any queued messages.
> 2) two other queues appear hung and don't respond to anything.
>
> We've managed to resolve 1) by recreating the queue and bindings and
> republishing the messages, but 2) is still a problem, as we can't do anything
> with these queues, not even delete them (the management interface and API just
> hang when trying to delete). Any consumers also appear to hang when
> interacting with these queues. Restarting the entire cluster also didn't help.
>
> Is there any way to prevent 1), and can we somehow solve 2) without resetting
> the entire cluster? We're currently running RabbitMQ 3.2.4. I have logs
> available, but I'd rather not post these publicly since there's some sensitive
> data in there.
>
> FTR we're using autoheal and all queues on the cluster have ha-mode=all and
> ha-sync-mode=automatic. Also, according to heartbeat, the nodes only appear to
> have lost contact for a few seconds (nothing seems to be logged during the
> second split).
>
> Thanks,
> Jon
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
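
For readers unfamiliar with the setup described in the quoted message, the
mirroring settings (ha-mode=all, ha-sync-mode=automatic) are normally applied
as a policy. The following is a minimal sketch of the request one could send
to the management plugin's HTTP API to define such a policy. The policy name
"ha-all", the pattern ".*" and the vhost "/" are assumptions made for
illustration and are not taken from the thread. The code only builds the
request path and JSON body; it does not contact a broker.

```python
import json
from urllib.parse import quote


def build_ha_policy_request(vhost="/", name="ha-all", pattern=".*"):
    """Return (path, json_body) for a PUT that defines a mirroring policy.

    vhost, name and pattern are hypothetical defaults chosen for this
    sketch; the original message does not state them.
    """
    # The vhost and policy name are path segments, so "/" must be
    # percent-encoded as %2F.
    path = "/api/policies/{}/{}".format(
        quote(vhost, safe=""), quote(name, safe="")
    )
    body = {
        "pattern": pattern,
        # The two keys below are the settings named in the quoted message.
        "definition": {"ha-mode": "all", "ha-sync-mode": "automatic"},
        "apply-to": "queues",
    }
    return path, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    path, body = build_ha_policy_request()
    print("PUT", path)
    print(body)
```

Autoheal itself is not a policy. It is selected through the
cluster_partition_handling setting in the broker configuration, so it is not
covered by this sketch.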


