[rabbitmq-discuss] loss of messages on a cluster + durable queues + mirrored queues

Alexandre Bunn albunn at gmail.com
Fri Jan 11 10:51:38 GMT 2013


I've found a forum thread 
(http://comments.gmane.org/gmane.comp.networking.rabbitmq.general/12438) where a 
user reported the same situation, and another user answered:

"

You can run "rabbitmqctl list_queues name slave_pids
synchronised_slave_pids" to see if the cluster is synchronized or not
but I can tell you right now that they will not be. In step 3 when you
start vm1 back up and it is the slave. It is telling you that there
are 10 messages in the vm2 queue. When you talk to a broker in a
cluster it will talk to the master queue. VM1 will not be synchronized
until all 10 messages are read out of the vm2 queue, because rabbitmq
mirrored clusters do not read old messages that are already in the
master queue. The slave reads the tail of the new message being sent
to the master and expects that once it has been long enough then it
will catch up to the same state as the master.
http://www.rabbitmq.com/ha.html "Unsynchronised Slaves" I think does a
good job of explaining it.
"


Does RabbitMQ still behave this way? Is there any configuration I can use 
to solve this issue?

Thanks

Alexandre

On Thursday, January 10, 2013 9:29:24 PM UTC-2, Scott Brown 
wrote:
>
> Alex, I believe what you are seeing happens because mirrored queues do not 
> automatically sync.
>
>    - When you restart node1 it has 0 messages and is not synced with 
>    node2.  Since node1 was offline, the durable messages in its queue are 
>    considered outdated (node2 is more likely to be up to date) and are 
>    discarded.  Node1 is now a slave of node2, but the messages on node2 are 
>    not copied to node1.
>    - Then, when you restart node2, node1 (with 0 messages) becomes the 
>    master.
>    
> This can feel odd, because you often bounce one box and then the other for 
> maintenance or similar.  You need to make sure that all the messages that were 
> in the queue when one node was bounced have been drained before bouncing the 
> other node.  You can verify this in a few ways, but the easiest is to peek at 
> the Admin console: your queue will show -1 (or -n for n nodes) rather than +1, 
> which tells you that the mirrors are not synced.
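>
> A rough way to automate that check before bouncing the second node is sketched 
> below (a Python illustration only, not a drop-in tool; it shells out to 
> rabbitmqctl and uses your ha.teste queue name):
>
> import subprocess
> import time
>
> def safe_to_bounce(queue="ha.teste"):
>     """True once the queue is drained or its mirror is synchronised."""
>     out = subprocess.check_output(
>         ["rabbitmqctl", "list_queues", "name", "messages",
>          "synchronised_slave_pids"]).decode()
>     for line in out.splitlines():
>         parts = line.split("\t")
>         if len(parts) == 3 and parts[0] == queue:
>             messages, synced = int(parts[1]), parts[2].strip()
>             return messages == 0 or synced not in ("", "[]")
>     return False
>
> # Poll until it is safe to stop the other node.
> while not safe_to_bounce():
>     time.sleep(5)
> print("ha.teste is drained or synchronised; safe to bounce the other node")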
>
>
> On Wednesday, January 9, 2013 1:57:54 PM UTC-5, Alexandre Bunn wrote:
>>
>> Good afternoon
>>
>> I have a 2-node cluster with mirrored queues, and I'm setting 
>> delivery_mode = 2 when publishing messages to the cluster. I lose messages 
>> when I follow the steps below (a test scenario).
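>>
>> (For illustration, publishing a persistent message with pika looks roughly 
>> like this; the host name and payload are placeholders, not my exact script.)
>>
>> import pika
>>
>> # Connect to one of the cluster nodes (host name is a placeholder).
>> connection = pika.BlockingConnection(
>>     pika.ConnectionParameters(host="vid-mq01-mia"))
>> channel = connection.channel()
>>
>> # The queue is durable and its name matches the ha-all policy pattern.
>> channel.queue_declare(queue="ha.teste", durable=True)
>>
>> # delivery_mode = 2 marks the message as persistent.
>> channel.basic_publish(
>>     exchange="",
>>     routing_key="ha.teste",
>>     body="test message",
>>     properties=pika.BasicProperties(delivery_mode=2))
>>
>> connection.close()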
>>
>> Step 0 - Node1 (master) Node2 (slave)
>>
>> # rabbitmqctl cluster_status
>> Cluster status of node 'rabbit at vid-mq01-mia' ...
>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>  {running_nodes,['rabbit at vid-mq02-mia','rabbit at vid-mq01-mia']},
>>  {partitions,[]}]
>> ...done.
>>
>> # rabbitmqctl list_policies
>> Listing policies ...
>> /       ha-all  ^ha\\.  {"ha-mode":"all"}       0
>> ...done.
>>
>> # rabbitmqctl list_queues name durable policy
>> Listing queues ...
>> ha.teste        true    ha-all
>> ...done.
>>
>>
>> - Step 1 - I publish 2 messages to the ha.teste queue, 1 through each member 
>> of the cluster, and the messages are mirrored as expected. At this point there 
>> are two messages on each node (rabbitmqctl list_queues).
>>
>> -- node1
>> # rabbitmqctl list_queues
>> Listing queues ...
>> ha.teste        2
>> ...done.
>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>> Listing queues ...
>> ha.teste        <'rabbit at vid-mq01-mia'.3.253.0> 
>> [<'rabbit at vid-mq02-mia'.2.252.0>]       [<'rabbit at vid-mq02-mia'.2.252.0>]
>> ...done.
>>
>> -- node2
>> # rabbitmqctl list_queues
>> Listing queues ...
>> ha.teste        2
>> ...done.
>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>> Listing queues ...
>> ha.teste        <'rabbit at vid-mq01-mia'.3.253.0> 
>> [<'rabbit at vid-mq02-mia'.2.252.0>]       [<'rabbit at vid-mq02-mia'.2.252.0>]
>> ...done.
>>
>> - Step 2 - I stop rabbitmq-server on node1 and node2 takes over as the 
>> master. At this point the two messages are still on node2.
>>
>> -- Logs on node2
>> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave 
>> <'rabbit at vid-mq02-mia'.2.252.0> to master
>>
>> -- Status of the cluster on node2
>>
>> # rabbitmqctl cluster_status
>> Cluster status of node 'rabbit at vid-mq02-mia' ...
>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>  {running_nodes,['rabbit at vid-mq02-mia']},
>>  {partitions,[]}]
>> ...done.
>>
>> -- Status of the ha.teste queue on node2
>>
>> # rabbitmqctl list_queues
>> Listing queues ...
>> ha.teste        2
>> ...done.
>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>> Listing queues ...
>> ha.teste        <'rabbit at vid-mq02-mia'.2.252.0> []      []
>> ...done.
>>
>> - Step 3 - I start rabbitmq-server on node1; node2 is still the master and 
>> node1 is running as a slave. At this point the two messages are reported on 
>> both node1 and node2.
>>
>> -- Log on node2
>> =INFO REPORT==== 9-Jan-2013::18:40:46 ===
>> rabbit on node 'rabbit at vid-mq01-mia' up
>>
>> -- Status of the cluster on node1
>>
>> # rabbitmqctl cluster_status
>> Cluster status of node 'rabbit at vid-mq01-mia' ...
>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>  {running_nodes,['rabbit at vid-mq02-mia','rabbit at vid-mq01-mia']},
>>  {partitions,[]}]
>> ...done.
>>
>> -- Status of the ha.teste queue on node2
>> # rabbitmqctl list_queues
>> Listing queues ...
>> ha.teste        2
>> ...done.
>>
>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>> Listing queues ...
>> ha.teste        <'rabbit at vid-mq02-mia'.2.252.0> 
>> [<'rabbit at vid-mq01-mia'.1.253.0>]       []
>> ...done.
>>
>> - Step 4 - I stop rabbitmq-server on node2 and node1 takes over as the 
>> master, but the messages have disappeared.
>>
>> -- Logs on node1
>>
>> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
>> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave 
>> <'rabbit at vid-mq01-mia'.1.253.0> to master
>>
>> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
>> rabbit on node 'rabbit at vid-mq02-mia' down
>>
>> -- Cluster status on node1
>>
>> # rabbitmqctl cluster_status
>> Cluster status of node 'rabbit at vid-mq01-mia' ...
>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>  {running_nodes,['rabbit at vid-mq01-mia']},
>>  {partitions,[]}]
>> ...done.
>>
>> -- Status of the ha.teste queue on node1
>>
>> # rabbitmqctl list_queues
>> Listing queues ...
>> ha.teste        0
>> ...done.
>>
>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>> Listing queues ...
>> ha.teste        <'rabbit at vid-mq01-mia'.1.253.0> []      []
>> ...done.
>>
>> Is this expected? What do we have to do on the server side or the client side 
>> (a Python script) to make the messages really durable? All the servers are 
>> running CentOS 6 x86_64 and RabbitMQ 3.0.1-1.
>>
>> Thanks
>>
>> Alexandre Bunn
>>
>

