[rabbitmq-discuss] loss of messages on a cluster + durable queues + mirrored queues

Fri Jan 11 14:32:53 GMT 2013

In general your queues should not be "stopped", so this would not be a 
problem.  In practice there can be reasons it happens.  I would look first 
at minimizing the amount of time a message spends in a queue.  If you have 
error queues that hold messages, don't use RabbitMQ to store them.  Read 
them off and store them in a database.

If you absolutely feel you must be able to sync queues, the best approach I 
know of is to create a process that reads each message and requeues it.  A 
requeued message is a new publish, so it reaches the unsynchronised mirror 
as well.  You are likely to lose message order if new messages are coming 
in at the same time.
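
The read-and-requeue process could be sketched roughly as below. This is a
minimal sketch, not a tested tool: it assumes a channel object exposing the
method names of pika 1.x's BlockingChannel (queue_declare, basic_get,
basic_publish, basic_ack); older client versions name some parameters
differently. requeue_all and FakeChannel are hypothetical names, and
FakeChannel is only an in-memory stand-in so the loop can be tried without a
broker.

```python
from collections import deque
from types import SimpleNamespace


def requeue_all(channel, queue):
    """Read each message currently in `queue` and publish it back to the same queue."""
    # A passive declare only inspects the queue; message_count is its depth right now.
    depth = channel.queue_declare(queue=queue, passive=True).method.message_count
    moved = 0
    # Bound the loop by the starting depth so republished messages are not read again.
    for _ in range(depth):
        method, properties, body = channel.basic_get(queue=queue, auto_ack=False)
        if method is None:
            break  # queue emptied early, e.g. another consumer took messages
        # The default exchange ("") routes by queue name.
        channel.basic_publish(exchange="", routing_key=queue,
                              body=body, properties=properties)
        # Ack the original only after the copy has been handed to the broker.
        channel.basic_ack(delivery_tag=method.delivery_tag)
        moved += 1
    return moved


class FakeChannel:
    """Hypothetical in-memory stand-in for a channel, for a dry run only."""

    def __init__(self, bodies):
        self.messages = deque(bodies)
        self.next_tag = 0
        self.acked = []

    def queue_declare(self, queue, passive=False):
        count = len(self.messages)
        return SimpleNamespace(method=SimpleNamespace(message_count=count))

    def basic_get(self, queue, auto_ack=False):
        if not self.messages:
            return None, None, None
        self.next_tag += 1
        return SimpleNamespace(delivery_tag=self.next_tag), None, self.messages.popleft()

    def basic_publish(self, exchange, routing_key, body, properties=None):
        self.messages.append(body)

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)


if __name__ == "__main__":
    ch = FakeChannel([b"first", b"second"])
    print(requeue_all(ch, "ha.teste"), list(ch.messages))
```

Against a real broker, anything published by other clients while the loop runs
lands between the requeued messages, which is where the ordering is lost.
Without publisher confirms, a failed republish followed by the ack could also
drop a message, so treat this as an illustration only.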

On Friday, January 11, 2013 5:51:38 AM UTC-5, Alexandre Bunn wrote:
>
> I found a forum thread (
> http://comments.gmane.org/gmane.comp.networking.rabbitmq.general/12438) in 
> which a user reported the same situation and another user answered:
>
> "
>
> You can run "rabbitmqctl list_queues name slave_pids
> synchronised_slave_pids" to see if the cluster is synchronized or not
> but I can tell you right now that they will not be. In step 3, when you
> start vm1 back up, it is the slave. It is telling you that there
> are 10 messages in the vm2 queue. When you talk to a broker in a
> cluster it will talk to the master queue. VM1 will not be synchronized
> until all 10 messages are read out of the vm2 queue, because rabbitmq
> mirrored clusters do not copy old messages that are already in the
> master queue. The slave only receives the new messages being sent
> to the master and expects that, once enough time has passed, it
> will catch up to the same state as the master.
> http://www.rabbitmq.com/ha.html "Unsynchronised Slaves" I think does a
> good job of explaining it.
> "
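
The check described in the quote can be automated by parsing that command's
output. The sketch below assumes the output is tab-separated with the three
columns name, slave_pids and synchronised_slave_pids, and that pids look like
the <...> tokens shown in this thread; unsynchronised_mirrors is a
hypothetical helper name.

```python
import re

# A pid is printed as <'node'.a.b.c>; grab every such token in a column.
PID_RE = re.compile(r"<[^>]+>")


def unsynchronised_mirrors(output):
    """Map queue name -> slave pids that are missing from synchronised_slave_pids."""
    result = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skips "Listing queues ..." and "...done."
        name, slaves, synced = fields
        slave_pids = PID_RE.findall(slaves)
        synced_pids = set(PID_RE.findall(synced))
        lagging = [pid for pid in slave_pids if pid not in synced_pids]
        if lagging:
            result[name] = lagging
    return result


# Sample shaped like the output in this thread after a node rejoins.
SAMPLE = (
    "Listing queues ...\n"
    "ha.teste\t[<'rabbit@vid-mq01-mia'.1.253.0>]\t[]\n"
    "...done.\n"
)

if __name__ == "__main__":
    print(unsynchronised_mirrors(SAMPLE))
```

To feed it live data, capture the command's stdout, for example with
subprocess.run([...], capture_output=True, text=True).stdout. An empty result
means every listed slave also appears in the synchronised column.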
>
>
> Does RabbitMQ still behave this way? Is there any configuration 
> I can apply to solve this issue?
>
> Thanks
>
> Alexandre
>
> On Thursday, January 10, 2013 9:29:24 PM UTC-2, Scott Brown wrote:
>>
>> Alex, I believe what you are seeing is because mirrored queues do not 
>> automatically sync.
>>
>>    - When you restart node1 it has 0 messages and is not synced with 
>>    node2.  Since node1 was offline, the durable messages in its queue are 
>>    considered outdated, because node2 is more likely to be up to date, and 
>>    are discarded.  Node1 is now a slave of node2, but the messages in node2 
>>    are not copied to node1.
>>    - Then when you restart node2, node1 with 0 messages becomes the 
>>    master.
>>    
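
The two bullets above can be replayed with a toy model. This is only a sketch
of the behaviour described in this thread, not RabbitMQ's implementation:
MirroredQueue and its methods are hypothetical names, a rejoining node starts
as an empty mirror, and the oldest remaining mirror is promoted when the
master stops.

```python
class MirroredQueue:
    """Toy model of a mirrored queue in which new mirrors start empty."""

    def __init__(self, nodes):
        # The first node is the master; every node starts empty and in sync.
        self.contents = {node: [] for node in nodes}  # running node -> messages
        self.order = list(nodes)                      # master first, then mirrors by age

    @property
    def master(self):
        return self.order[0]

    def publish(self, message):
        # Every running node receives newly published messages.
        for node in self.order:
            self.contents[node].append(message)

    def stop(self, node):
        # Removing the master leaves the oldest remaining mirror at the front.
        self.order.remove(node)
        del self.contents[node]

    def start(self, node):
        # A rejoining node drops its old contents and joins as an empty mirror;
        # messages already held by the master are not copied to it.
        self.contents[node] = []
        self.order.append(node)

    def depth(self):
        # Depth as reported by the current master.
        return len(self.contents[self.master])


def replay():
    """Replay steps 1-4 and return (master, depth) after each step."""
    queue = MirroredQueue(["node1", "node2"])
    history = []
    queue.publish("m1")
    queue.publish("m2")
    history.append((queue.master, queue.depth()))  # step 1
    queue.stop("node1")
    history.append((queue.master, queue.depth()))  # step 2
    queue.start("node1")
    history.append((queue.master, queue.depth()))  # step 3
    queue.stop("node2")
    history.append((queue.master, queue.depth()))  # step 4
    return history


if __name__ == "__main__":
    for step, state in enumerate(replay(), start=1):
        print(step, state)
```

In this model the depth seen in step 3 is still 2 because the count comes from
the master (node2), even though node1 itself holds nothing. The messages only
vanish once node2 stops and the empty mirror is promoted.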
>> This can feel odd, because often you bounce one box and then the other 
>> for maintenance or similar.  You need to make sure that all messages in the 
>> queue when one node is bounced have been drained before bouncing the other 
>> node.  You can verify this in a few ways, but the easiest is to just peek 
>> at the Admin console and your queue will show -1 (or -n for n nodes) rather 
>> than +1.  This tells you that the mirrors are not synced.
>>
>>
>> On Wednesday, January 9, 2013 1:57:54 PM UTC-5, Alexandre Bunn wrote:
>>>
>>> Good afternoon
>>>
>>> I have a two-node cluster with mirrored queues and I'm setting 
>>> delivery_mode = 2 when publishing messages to the cluster. I'm losing 
>>> messages when I follow the steps below (for testing purposes).
>>>
>>> Step 0 - Node1 (master) Node2 (slave)
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit@vid-mq01-mia' ...
>>> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>>>  {running_nodes,['rabbit@vid-mq02-mia','rabbit@vid-mq01-mia']},
>>>  {partitions,[]}]
>>> ...done.
>>>
>>> # rabbitmqctl list_policies
>>> Listing policies ...
>>> /       ha-all  ^ha\\.  {"ha-mode":"all"}       0
>>> ...done.
>>>
>>> # rabbitmqctl list_queues name durable policy
>>> Listing queues ...
>>> ha.teste        true    ha-all
>>> ...done.
>>>
>>>
>>> - Step 1 - I write 2 messages to the ha.teste queue, 1 via each member of 
>>> the cluster, and the messages are mirrored as expected. Both nodes report 
>>> two messages (rabbitmqctl list_queues)
>>>
>>> -- node1
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste        2
>>> ...done.
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste        <'rabbit@vid-mq01-mia'.3.253.0> 
>>> [<'rabbit@vid-mq02-mia'.2.252.0>]       [<'rabbit@vid-mq02-mia'.2.252.0>]
>>> ...done.
>>>
>>> -- node2
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste        2
>>> ...done.
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste        <'rabbit@vid-mq01-mia'.3.253.0> 
>>> [<'rabbit@vid-mq02-mia'.2.252.0>]       [<'rabbit@vid-mq02-mia'.2.252.0>]
>>> ...done.
>>>
>>> - Step 2 - I stop rabbitmq-server on node1 and node2 becomes the 
>>> master. The two messages are still on node2
>>>
>>> -- Logs on node2
>>> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave 
>>> <'rabbit@vid-mq02-mia'.2.252.0> to master
>>>
>>> -- Status of the cluster on node2
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit@vid-mq02-mia' ...
>>> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>>>  {running_nodes,['rabbit@vid-mq02-mia']},
>>>  {partitions,[]}]
>>> ...done.
>>>
>>> -- Status of the ha.teste queue on node2
>>>
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste        2
>>> ...done.
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste        <'rabbit@vid-mq02-mia'.2.252.0> []      []
>>> ...done.
>>>
>>> - Step 3 - I start rabbitmq-server on node1; node2 is still the master and 
>>> node1 is running as a slave. Both node1 and node2 report the two 
>>> messages
>>>
>>> -- Log on node2
>>> =INFO REPORT==== 9-Jan-2013::18:40:46 ===
>>> rabbit on node 'rabbit@vid-mq01-mia' up
>>>
>>> -- Status of the cluster on node1
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit@vid-mq01-mia' ...
>>> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>>>  {running_nodes,['rabbit@vid-mq02-mia','rabbit@vid-mq01-mia']},
>>>  {partitions,[]}]
>>> ...done.
>>>
>>> -- Status of the ha.teste queue on node2
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste        2
>>> ...done.
>>>
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste        <'rabbit@vid-mq02-mia'.2.252.0> 
>>> [<'rabbit@vid-mq01-mia'.1.253.0>]       []
>>> ...done.
>>>
>>> - Step 4 - I stop rabbitmq-server on node2 and node1 becomes the 
>>> master, but the messages have disappeared
>>>
>>> -- Logs on node1
>>>
>>> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
>>> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave 
>>> <'rabbit@vid-mq01-mia'.1.253.0> to master
>>>
>>> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
>>> rabbit on node 'rabbit@vid-mq02-mia' down
>>>
>>> -- Cluster status on node1
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit@vid-mq01-mia' ...
>>> [{nodes,[{disc,['rabbit@vid-mq01-mia','rabbit@vid-mq02-mia']}]},
>>>  {running_nodes,['rabbit@vid-mq01-mia']},
>>>  {partitions,[]}]
>>> ...done.
>>>
>>> -- Status of the ha.teste queue on node1
>>>
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste        0
>>> ...done.
>>>
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste        <'rabbit@vid-mq01-mia'.1.253.0> []      []
>>> ...done.
>>>
>>> Is this expected? What do we have to do on the server side or client side 
>>> (Python script) to make the messages really durable? All the servers are 
>>> running CentOS 6 x86_64 and RabbitMQ 3.0.1-1.
>>>
>>> Thanks
>>>
>>> Alexandre Bunn
>>>
>>