[rabbitmq-discuss] loss of messages on a cluster + durable queues + mirrored queues
Scott Brown
scott.brown at secondsupper.com
Fri Jan 11 14:32:53 GMT 2013
In general your queues should not be "stopped", so this should not be a
problem, although in practice there can be reasons it happens. I would look
first at minimizing the amount of time a message spends in a queue. If you
have error queues that hold messages, don't use Rabbit to store them; read
them off and store them in a database.
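A minimal sketch of the "read them off and store them in a database" approach, using Python's standard sqlite3 module. The consumer side is an assumption: `drain_error_queue` is a hypothetical stand-in for whatever loop pulls messages off your error queue, and the table layout is illustrative only. Acknowledging each message on the broker only after the database commit succeeds would mean a crash leaves the message in RabbitMQ rather than losing it.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple


def drain_error_queue() -> Iterator[Tuple[str, bytes]]:
    # Hypothetical stand-in for a consumer reading the error queue;
    # yields (routing_key, body) pairs.
    yield ("orders.failed", b'{"id": 1}')
    yield ("orders.failed", b'{"id": 2}')


def archive_messages(conn: sqlite3.Connection,
                     messages: Iterable[Tuple[str, bytes]]) -> int:
    """Insert (routing_key, body) pairs into an error_messages table.

    Returns the number of rows written. The inserts share one transaction,
    so either all messages are stored or none are.
    """
    count = 0
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS error_messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " routing_key TEXT NOT NULL,"
            " body BLOB NOT NULL)"
        )
        for routing_key, body in messages:
            conn.execute(
                "INSERT INTO error_messages (routing_key, body) VALUES (?, ?)",
                (routing_key, body),
            )
            count += 1
    return count


conn = sqlite3.connect(":memory:")
stored = archive_messages(conn, drain_error_queue())
rows = conn.execute(
    "SELECT routing_key, body FROM error_messages ORDER BY id").fetchall()
```

Once the messages live in a table they can be inspected, queried, and replayed on your own schedule instead of sitting in a mirrored queue across node restarts.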
If you feel you absolutely must be able to sync queues, the best approach I
know of is a process that reads each message off the queue and requeues it.
You are likely to lose message order if new messages are coming in.
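The requeue idea can be sketched with a toy in-memory model (hypothetical, not RabbitMQ code) of a master plus one mirror that joined late and, as described in this thread, only receives messages published after it joined. Republishing the backlog pushes every old message through the publish path, so the mirror ends up with a full copy; the second run shows a new message overtaking the requeued ones, which is the ordering loss mentioned above.

```python
from collections import deque
from typing import Iterable


class ToyMirroredQueue:
    """Toy model (not RabbitMQ code): a master queue plus one late-joining mirror.

    The mirror only receives messages published after it joined, so its
    contents are always a suffix of the master's contents.
    """

    def __init__(self, existing: Iterable[str]):
        self.master = deque(existing)  # backlog that predates the mirror
        self.mirror = deque()          # the mirror starts empty

    def publish(self, msg: str) -> None:
        self.master.append(msg)
        self.mirror.append(msg)

    def get(self) -> str:
        # The head message is on the mirror only if the mirror holds everything.
        on_mirror = len(self.mirror) == len(self.master)
        msg = self.master.popleft()
        if on_mirror:
            self.mirror.popleft()
        return msg

    def synchronised(self) -> bool:
        return list(self.master) == list(self.mirror)


def requeue_existing(q: ToyMirroredQueue, arrivals: Iterable[str] = ()) -> None:
    """Read every message that was queued when we started and publish it again.

    `arrivals` simulates other clients publishing while the requeue runs.
    """
    backlog = len(q.master)
    for msg in arrivals:
        q.publish(msg)
    for _ in range(backlog):
        q.publish(q.get())


quiet = ToyMirroredQueue(["m1", "m2"])
requeue_existing(quiet)
print(quiet.synchronised(), list(quiet.master))  # True ['m1', 'm2']

busy = ToyMirroredQueue(["m1", "m2"])
requeue_existing(busy, arrivals=["new"])
print(busy.synchronised(), list(busy.master))    # True ['new', 'm1', 'm2']
```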
On Friday, January 11, 2013 5:51:38 AM UTC-5, Alexandre Bunn wrote:
>
> I found a forum thread
> (http://comments.gmane.org/gmane.comp.networking.rabbitmq.general/12438) in
> which a user reported the same situation, and another user answered:
>
> "
>
> You can run "rabbitmqctl list_queues name slave_pids
> synchronised_slave_pids" to see whether the cluster is synchronized, but I
> can tell you right now that it will not be. In step 3, when you start vm1
> back up, it comes back as the slave. It is telling you that there are 10
> messages in the vm2 queue because, whichever broker in the cluster you talk
> to, it talks to the master queue. VM1 will not be synchronized until all 10
> messages have been read out of the vm2 queue, because RabbitMQ mirrored
> queues do not copy old messages that are already in the master queue to a
> new slave. The slave only receives the new messages sent to the master, and
> expects that once enough time has passed it will catch up to the same state
> as the master. The "Unsynchronised Slaves" section of
> http://www.rabbitmq.com/ha.html does a good job of explaining it, I think.
> "
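A small sketch of automating that check: it parses the output of `rabbitmqctl list_queues name slave_pids synchronised_slave_pids`, assuming the columns are tab-separated and the pids look like the ones in this thread, and reports every mirror pid that appears in `slave_pids` but not in `synchronised_slave_pids`. The function name is hypothetical.

```python
import re
from typing import Dict, List

PID_RE = re.compile(r"<[^>]*>")


def unsynchronised_mirrors(output: str) -> Dict[str, List[str]]:
    """Map each queue name to the mirror pids that are not yet synchronised.

    Expects the output of
    `rabbitmqctl list_queues name slave_pids synchronised_slave_pids`,
    assumed to be tab-separated with one queue per line.
    """
    result: Dict[str, List[str]] = {}
    for line in output.splitlines():
        cols = line.split("\t")
        if len(cols) != 3:
            continue  # skip "Listing queues ..." and "...done." lines
        name, slaves, synced = cols
        synced_pids = set(PID_RE.findall(synced))
        missing = [pid for pid in PID_RE.findall(slaves) if pid not in synced_pids]
        if missing:
            result[name] = missing
    return result


sample = (
    "Listing queues ...\n"
    "ha.teste\t[<'rabbit@vid-mq01-mia'.1.253.0>]\t[]\n"
    "...done.\n"
)
print(unsynchronised_mirrors(sample))
# {'ha.teste': ["<'rabbit@vid-mq01-mia'.1.253.0>"]}
```

An empty result means every mirror listed for every queue is also in its synchronised list.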
>
>
> Does RabbitMQ still behave this way? Is there any configuration I can use
> to solve this issue?
>
> Thanks
>
> Alexandre
>
> On Thursday, January 10, 2013 9:29:24 PM UTC-2, Scott Brown wrote:
>>
>> Alex, I believe what you are seeing is because mirrored queues do not
>> automatically sync.
>>
>> - When you restart node1 it has 0 messages and is not synced with
>> node2. Since node1 was offline, the durable messages in its queue are
>> considered outdated (node2 is more likely to be up to date) and are
>> discarded. Node1 is now a slave of node2, but the messages in node2 are
>> not copied to node1.
>> - Then when you restart node2, node1 with 0 messages becomes the
>> master.
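The sequence above can be reproduced with a toy model (hypothetical, not RabbitMQ's implementation) that encodes only the two rules described: a restarted node discards its old contents and is not sent the master's backlog, and when the master stops, a running mirror is promoted. It deliberately does not model stopping every node at once.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    running: bool = True
    messages: List[str] = field(default_factory=list)


class ToyCluster:
    """Toy model of the mirrored-queue behaviour described above.

    Rules: a restarted node comes back as an empty, unsynchronised mirror,
    and stopping the master promotes a running mirror.
    """

    def __init__(self, *names: str):
        self.nodes: Dict[str, Node] = {n: Node(n) for n in names}
        self.master = names[0]

    def publish(self, msg: str) -> None:
        # New messages reach every running member (master and mirrors).
        for node in self.nodes.values():
            if node.running:
                node.messages.append(msg)

    def stop(self, name: str) -> None:
        self.nodes[name].running = False
        if name == self.master:
            running = [n for n, node in self.nodes.items() if node.running]
            if not running:
                raise RuntimeError("toy model does not cover a full-cluster stop")
            self.master = running[0]

    def start(self, name: str) -> None:
        node = self.nodes[name]
        node.running = True
        # Old contents are treated as stale and discarded; the master's
        # backlog is not copied over.
        node.messages = []

    def queue_length(self) -> int:
        return len(self.nodes[self.master].messages)


cluster = ToyCluster("node1", "node2")
cluster.publish("m1")
cluster.publish("m2")      # step 1: both nodes hold 2 messages
cluster.stop("node1")      # step 2: node2 promoted, still 2 messages
cluster.start("node1")     # step 3: node1 rejoins as an empty mirror
print(cluster.master, cluster.queue_length())  # node2 2
cluster.stop("node2")      # step 4: node1 promoted with nothing in it
print(cluster.master, cluster.queue_length())  # node1 0
```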
>>
>> This can feel odd, because you often bounce one box and then the other
>> for maintenance or similar. You need to make sure that all the messages in
>> the queue when the first node was bounced have been drained before you
>> bounce the other node. You can verify this in a few ways, but the easiest
>> is to peek at the admin console: your queue will show -1 (or -n for n
>> nodes) rather than +1, which tells you that the mirrors are not synced.
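One way to wait for that condition before bouncing the second node, sketched under assumptions: `fetch_status` is a hypothetical callable returning, per queue, the `slave_pids` and `synchronised_slave_pids` lists (for example parsed from `rabbitmqctl list_queues name slave_pids synchronised_slave_pids`), and the timeout and polling interval are arbitrary. The sleep and clock functions are parameters so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical status shape: {queue name: (slave_pids, synchronised_slave_pids)}
Status = Dict[str, Tuple[List[str], List[str]]]


def wait_until_synchronised(fetch_status: Callable[[], Status],
                            timeout: float = 60.0,
                            interval: float = 1.0,
                            sleep: Callable[[float], None] = time.sleep,
                            clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until every mirror of every queue is synchronised.

    Returns True once that holds, or False if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if all(set(slaves) <= set(synced) for slaves, synced in status.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated status: the mirror becomes synchronised on the third poll.
responses = iter([
    {"ha.teste": (["<mirror>"], [])},
    {"ha.teste": (["<mirror>"], [])},
    {"ha.teste": (["<mirror>"], ["<mirror>"])},
])
print(wait_until_synchronised(lambda: next(responses), sleep=lambda _: None))
# True
```

Only when this returns True would it be reasonably safe, under the behaviour described in this thread, to bounce the node holding the master.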
>>
>>
>> On Wednesday, January 9, 2013 1:57:54 PM UTC-5, Alexandre Bunn wrote:
>>>
>>> Good afternoon
>>>
>>> I have a 2-node cluster with mirrored queues and I set delivery_mode = 2
>>> when publishing messages to the cluster. I am losing messages when I
>>> follow the steps below (for testing purposes).
>>>
>>> Step 0 - Node1 (master) Node2 (slave)
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit at vid-mq01-mia' ...
>>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>> {running_nodes,['rabbit at vid-mq02-mia','rabbit at vid-mq01-mia']},
>>> {partitions,[]}]
>>> ...done.
>>>
>>> # rabbitmqctl list_policies
>>> Listing policies ...
>>> / ha-all ^ha\\. {"ha-mode":"all"} 0
>>> ...done.
>>>
>>> # rabbitmqctl list_queues name durable policy
>>> Listing queues ...
>>> ha.teste true ha-all
>>> ...done.
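Policy patterns are regular expressions matched against queue names. The `policy` column shows `ha-all` applied to `ha.teste`, which suggests the doubled backslash in the `list_policies` output is display escaping and the effective pattern is `^ha\.`. Assuming that, this sketch shows which queue names the policy would cover (the example names other than ha.teste are made up):

```python
import re
from typing import Iterable, List

# Assumed effective pattern of the ha-all policy (listed as ^ha\\. above).
HA_ALL_PATTERN = re.compile(r"^ha\.")


def covered_by_ha_all(queue_names: Iterable[str]) -> List[str]:
    """Return the queue names the ha-all policy pattern would match."""
    return [name for name in queue_names if HA_ALL_PATTERN.search(name)]


print(covered_by_ha_all(["ha.teste", "hateste", "orders.ha"]))  # ['ha.teste']
```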
>>>
>>>
>>> - Step 1 - I publish 2 messages to the ha.teste queue, one on each member
>>> of the cluster, and the messages are mirrored as expected. Both nodes
>>> report two messages (rabbitmqctl list_queues):
>>>
>>> -- node1
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste 2
>>> ...done.
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste <'rabbit at vid-mq01-mia'.3.253.0>
>>> [<'rabbit at vid-mq02-mia'.2.252.0>] [<'rabbit at vid-mq02-mia'.2.252.0>]
>>> ...done.
>>>
>>> -- node2
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste 2
>>> ...done.
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste <'rabbit at vid-mq01-mia'.3.253.0>
>>> [<'rabbit at vid-mq02-mia'.2.252.0>] [<'rabbit at vid-mq02-mia'.2.252.0>]
>>> ...done.
>>>
>>> - Step 2 - I stop rabbitmq-server on node1 and node2 takes over as
>>> master. The two messages are still on node2.
>>>
>>> -- Logs on node2
>>> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave
>>> <'rabbit at vid-mq02-mia'.2.252.0> to master
>>>
>>> -- Status of the cluster on node2
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit at vid-mq02-mia' ...
>>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>> {running_nodes,['rabbit at vid-mq02-mia']},
>>> {partitions,[]}]
>>> ...done.
>>>
>>> -- Status of the ha.teste queue on node2
>>>
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste 2
>>> ...done.
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste <'rabbit at vid-mq02-mia'.2.252.0> [] []
>>> ...done.
>>>
>>> - Step 3 - I start rabbitmq-server on node1 again; node2 is still the
>>> master and node1 runs as a slave. The queue still reports the two
>>> messages on node1 and node2.
>>>
>>> -- Log on node2
>>> =INFO REPORT==== 9-Jan-2013::18:40:46 ===
>>> rabbit on node 'rabbit at vid-mq01-mia' up
>>>
>>> -- Status of the cluster on node1
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit at vid-mq01-mia' ...
>>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>> {running_nodes,['rabbit at vid-mq02-mia','rabbit at vid-mq01-mia']},
>>> {partitions,[]}]
>>> ...done.
>>>
>>> -- Status of the ha.teste queue on node2
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste 2
>>> ...done.
>>>
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste <'rabbit at vid-mq02-mia'.2.252.0>
>>> [<'rabbit at vid-mq01-mia'.1.253.0>] []
>>> ...done.
>>>
>>> - Step 4 - I stop rabbitmq-server on node2 and node1 takes over as
>>> master, but the messages have disappeared.
>>>
>>> -- Logs on node1
>>>
>>> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
>>> Mirrored-queue (queue 'ha.teste' in vhost '/'): Promoting slave
>>> <'rabbit at vid-mq01-mia'.1.253.0> to master
>>>
>>> =INFO REPORT==== 9-Jan-2013::18:43:42 ===
>>> rabbit on node 'rabbit at vid-mq02-mia' down
>>>
>>> -- Cluster status on node1
>>>
>>> # rabbitmqctl cluster_status
>>> Cluster status of node 'rabbit at vid-mq01-mia' ...
>>> [{nodes,[{disc,['rabbit at vid-mq01-mia','rabbit at vid-mq02-mia']}]},
>>> {running_nodes,['rabbit at vid-mq01-mia']},
>>> {partitions,[]}]
>>> ...done.
>>>
>>> -- Status of the ha.teste queue on node1
>>>
>>> # rabbitmqctl list_queues
>>> Listing queues ...
>>> ha.teste 0
>>> ...done.
>>>
>>> # rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids
>>> Listing queues ...
>>> ha.teste <'rabbit at vid-mq01-mia'.1.253.0> [] []
>>> ...done.
>>>
>>> Is this expected? What do we have to do on the server side or the client
>>> side (Python script) to make the messages really durable? All the servers
>>> are running CentOS 6 x86_64 and RabbitMQ 3.0.1-1.
>>>
>>> Thanks
>>>
>>> Alexandre Bunn
>>>
>>