[rabbitmq-discuss] loss of messages on a cluster + durable queues + mirrored queues

Thu Jan 10 23:29:24 GMT 2013

Alex, I believe what you are seeing is because mirrored queues do not 
automatically sync.

   - When you restart node1, its copy of the queue has 0 messages and is not 
     synced with node2. Because node1 was offline, the durable messages in its 
     queue are considered outdated (node2 is more likely to be up to date) and 
     are discarded. Node1 is now a slave of node2, but the messages already 
     on node2 are not copied to node1.
   - When you then restart node2, node1, which holds 0 messages, becomes the 
     master.
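
To make the sequence concrete, here is a toy model in Python. It is only a 
sketch of the behaviour described above, not RabbitMQ's implementation: the 
class ToyMirroredQueue and the function replay_steps are made up for 
illustration, and the model assumes a node always rejoins as an empty slave 
and that nothing is ever copied to it afterwards.

```python
class ToyMirroredQueue:
    """Toy model: one message list per running node, no syncing on rejoin."""

    def __init__(self, nodes):
        self.running = {node: [] for node in nodes}
        self.master = nodes[0]

    def publish(self, message):
        # A new message is replicated to every node that is currently running.
        for messages in self.running.values():
            messages.append(message)

    def stop(self, node):
        del self.running[node]
        if node == self.master:
            # Promote the first remaining mirror, if there is one.
            self.master = next(iter(self.running), None)

    def start(self, node):
        # The returning node discards its old contents and joins empty.
        self.running[node] = []
        if self.master is None:
            self.master = node

    def depth(self):
        # Queue depth as reported by the current master.
        return len(self.running[self.master]) if self.master else 0


def replay_steps():
    """Replay steps 1-4 from the report and record the depth after each."""
    queue = ToyMirroredQueue(["node1", "node2"])
    depths = []
    queue.publish("m1")
    queue.publish("m2")
    depths.append(queue.depth())   # step 1: both nodes hold 2 messages
    queue.stop("node1")
    depths.append(queue.depth())   # step 2: node2 promoted, still 2
    queue.start("node1")
    depths.append(queue.depth())   # step 3: node1 rejoins empty, master has 2
    queue.stop("node2")
    depths.append(queue.depth())   # step 4: empty node1 promoted
    return depths


print(replay_steps())  # prints [2, 2, 2, 0]
```

The final 0 is the empty node1 being promoted, which matches what step 4 of 
the report shows.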

This can feel odd, because you often bounce one box and then the other for 
maintenance or similar.  Before bouncing the second node, make sure that 
every message that was in the queue when the first node was bounced has 
been drained.  You can verify this in a few ways, but the easiest is to 
peek at the Admin console: your queue will show -1 (or -n for n nodes) 
rather than +1, which tells you that the mirrors are not synced.
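
The same check can be scripted from the output of the command used in the 
report, rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids. 
The sketch below assumes one tab-separated line per queue with the columns 
in exactly that order; the function name unsynced_mirrors and the sample 
strings are illustrative, so adjust them to what your rabbitmqctl prints.

```python
import re

# Erlang pids are printed between angle brackets, e.g. <'rabbit@host'.1.253.0>
PID_RE = re.compile(r"<[^>]+>")


def unsynced_mirrors(listing):
    """Map each queue name to the slave pids that are not yet synchronised."""
    result = {}
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skips "Listing queues ..." and "...done."
        name, _master_pid, slaves, synced = fields
        missing = sorted(set(PID_RE.findall(slaves)) - set(PID_RE.findall(synced)))
        if missing:
            result[name] = missing
    return result


# Sample text modelled on step 3 of the report (assumed to be tab-separated).
STEP3 = (
    "Listing queues ...\n"
    "ha.teste\t<'rabbit@vid-mq02-mia'.2.252.0>\t"
    "[<'rabbit@vid-mq01-mia'.1.253.0>]\t[]\n"
    "...done.\n"
)

print(unsynced_mirrors(STEP3))
```

An empty result means every listed mirror is synchronised. A non-empty one 
means that stopping the current master would promote a mirror that lacks 
some of the messages.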

On Wednesday, January 9, 2013 1:57:54 PM UTC-5, Alexandre Bunn wrote:
>
> Good afternoon
>
> I have a 2-node cluster with mirrored queues, and I set 
> delivery_mode = 2 when publishing messages to the cluster. I lose 
> messages when I follow the steps below (for testing purposes).
>
> Step 0 - Node1 (master) Node2 (slave)
>
> # rabbitmqctl cluster_status
> Cluster status of node 'rabbit@vid-mq01-mia' ...
> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>  {running_nodes,['rabbit@vid-mq02-mia','rabbit@vid-mq01-mia']},
>  {partitions,[]}]
> ...done.
>
> # rabbitmqctl list_policies
> Listing policies ...
> /       ha-all  ^ha\\.  {"ha-mode":"all"}       0
> ...done.
>
> # rabbitmqctl list_queues name durable policy
> Listing queues ...
> ha.teste        true    ha-all
> ...done.
>
>
> - Step 1 - I write 2 messages to the ha.teste queue, 1 via each member of 
> the cluster, and the messages are mirrored as expected. There are now two 
> messages on each node (rabbitmqctl list_queues)
>
> -- node1
> # rabbitmqctl list_queues
> Listing queues ...
> ha.teste        2
> ...done.
> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
> Listing queues ...
> ha.teste        <'rabbit@vid-mq01-mia'.3.253.0> 
> [<'rabbit@vid-mq02-mia'.2.252.0>]       [<'rabbit@vid-mq02-mia'.2.252.0>]
> ...done.
>
> -- node2
> # rabbitmqctl list_queues
> Listing queues ...
> ha.teste        2
> ...done.
> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
> Listing queues ...
> ha.teste        <'rabbit@vid-mq01-mia'.3.253.0> 
> [<'rabbit@vid-mq02-mia'.2.252.0>]       [<'rabbit@vid-mq02-mia'.2.252.0>]
> ...done.
>
> - Step 2 - I stop rabbitmq-server on node1 and node2 takes over as 
> master. The two messages are still on node2
>
> -- Logs on node2
> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave 
> <'rabbit@vid-mq02-mia'.2.252.0> to master
>
> -- Status of the cluster on node2
>
> # rabbitmqctl cluster_status
> Cluster status of node 'rabbit@vid-mq02-mia' ...
> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>  {running_nodes,['rabbit@vid-mq02-mia']},
>  {partitions,[]}]
> ...done.
>
> -- Status of the ha.teste queue on node2
>
> # rabbitmqctl list_queues
> Listing queues ...
> ha.teste        2
> ...done.
> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
> Listing queues ...
> ha.teste        <'rabbit@vid-mq02-mia'.2.252.0> []      []
> ...done.
>
> - Step 3 - I start rabbitmq-server on node1; node2 is still master and 
> node1 is running as slave. There are two messages on node1 and 
> node2
>
> -- Log on node2
> =INFO REPORT==== 9-Jan-2013::18:40:46 ===
> rabbit on node 'rabbit@vid-mq01-mia' up
>
> -- Status of the cluster on node1
>
> # rabbitmqctl cluster_status
> Cluster status of node 'rabbit@vid-mq01-mia' ...
> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>  {running_nodes,['rabbit@vid-mq02-mia','rabbit@vid-mq01-mia']},
>  {partitions,[]}]
> ...done.
>
> -- Status of the ha.teste queue on node2
> # rabbitmqctl list_queues
> Listing queues ...
> ha.teste        2
> ...done.
>
> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
> Listing queues ...
> ha.teste        <'rabbit@vid-mq02-mia'.2.252.0> 
> [<'rabbit@vid-mq01-mia'.1.253.0>]       []
> ...done.
>
> - Step 4 - I stop rabbitmq-server on node2 and node1 takes over as 
> master, but the messages have disappeared
>
> -- Logs on node1
>
> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave 
> <'rabbit@vid-mq01-mia'.1.253.0> to master
>
> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
> rabbit on node 'rabbit@vid-mq02-mia' down
>
> -- Cluster status on node1
>
> # rabbitmqctl cluster_status
> Cluster status of node 'rabbit@vid-mq01-mia' ...
> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>  {running_nodes,['rabbit@vid-mq01-mia']},
>  {partitions,[]}]
> ...done.
>
> -- Status of the ha.teste queue on node1
>
> # rabbitmqctl list_queues
> Listing queues ...
> ha.teste        0
> ...done.
>
> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
> Listing queues ...
> ha.teste        <'rabbit@vid-mq01-mia'.1.253.0> []      []
> ...done.
>
> Is this expected? What do we have to do on the server side or client side 
> (python script) to make the messages really durable? All the servers are 
> running CentOS 6 x86_64 and rabbitmq 3.0.1-1.
>
> Thanks
>
> Alexandre Bunn
>