<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
We are trying to solve a problem involving queues. I posted the problem on Stackoverflow and got a very quick response from <a href="http://stackoverflow.com/users/1397341/alexis">Alexis</a>, which clears up a lot of things, but I have a few more questions.</p>
<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
<font size="4">The problem as stated on <a href="http://stackoverflow.com/questions/11241837/synchronize-one-queue-instance-with-multiple-redis-instances">Stackoverflow</a>:</font></p><p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
<strong style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;font-weight:bold">The Scenario:</strong></p>
<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
We have multiple geographically distributed nodes, each of which should have queues collecting messages for its location. We then want to send the collected data from every queue on every node to the corresponding queue at a central location. On the central node, we will pull the data out of the queues (from the other nodes), process it and store it persistently.</p>
<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
Constraints:</p><ul style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:30px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);list-style-position:initial;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
<li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Data is very important to us. Therefore, we have to make sure that we are not losing data in any case.</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Therefore, we need persistent queues on every node so that even if the node goes down for some random reason, when we bring it up we have the collected data safe with us and we can send it to the central node where it can be processed.</li>
<li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Similarly, if the central node goes down, the data must remain at all the other nodes so that when the central node comes up we can send all the data to the central node for processing.</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Also, the data on the central node must not get duplicated or stored again. That is, data collected on one of the nodes should be stored on the central node only once.</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
The data that we are collecting is very important to us and the order of data delivery to the central node is not an issue.</li></ul><p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
<strong style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;font-weight:bold">Our Solution:</strong></p>
<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
We have considered a couple of solutions, and I am going to describe the one that we thought would be the best. A possible solution (in our opinion) is to use Redis to maintain queues everywhere, because Redis provides persistent storage. Then perhaps have a daemon running on each of the geographically separated nodes, which reads the data from the queue and sends it to the central node. On receiving the data, the central node sends an ACK to the node it received the data from (because data is very important to us), and on receiving the ACK, the node deletes the data from its queue. Of course, there will be a timeout period within which the ACK must be received.</p>
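<p>The flow described above can be sketched roughly as follows. This is only a minimal, in-memory illustration in Python, not a working protocol: the class names (<code>CentralNode</code>, <code>NodeForwarder</code>), the message IDs and the use of <code>TimeoutError</code> to model a missing ACK are all assumptions made for the example, and the <code>deque</code> merely stands in for a persistent queue such as a Redis list.</p>

```python
import uuid
from collections import deque


class CentralNode:
    """Hypothetical stand-in for the central node: store first, then ACK."""

    def __init__(self):
        self.stored = []

    def receive(self, message):
        self.stored.append(message)
        # The ACK is modelled as the ID of the message that was stored.
        return message["id"]


class NodeForwarder:
    """Hypothetical daemon logic for one geographically separated node."""

    def __init__(self, central):
        self.central = central
        # Stand-in for the persistent local queue (assumption: in memory here).
        self.queue = deque()

    def collect(self, payload):
        """Append a new message with a unique ID to the local queue."""
        self.queue.append({"id": str(uuid.uuid4()), "payload": payload})

    def forward_once(self):
        """Send the oldest message; delete it only when the matching ACK arrives.

        Returns True if a message was acknowledged and removed, else False.
        """
        if not self.queue:
            return False
        message = self.queue[0]  # peek only, the message stays queued
        try:
            ack = self.central.receive(message)
        except TimeoutError:
            # No ACK within the timeout: keep the message for a later retry.
            return False
        if ack == message["id"]:
            self.queue.popleft()
            return True
        return False
```

<p>The important design choice in this sketch is that the node only peeks at the head of its queue and removes the message after the ACK, so a crash or timeout between sending and acknowledging leaves the data on the node. The price is that the same message may be sent more than once.</p>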
<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
<strong style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;font-weight:bold">The Problem:</strong></p>
<p style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);clear:both;word-wrap:break-word;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
We think the solution above will work, but we don't want to implement the whole synchronization protocol ourselves, simply because we might get it wrong. We could not find this kind of synchronization built into Redis, so we are open to other message queues such as RabbitMQ (which is AMQP-based) or ZeroMQ. Again, we could not figure out whether we can do this with these solutions.</p>
<ul style="margin-top:0px;margin-right:0px;margin-bottom:1em;margin-left:30px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);list-style-position:initial;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px;text-align:left">
<li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Do these Message Queues or any other data store provide features that can be the solution to our problem? If yes, then how?</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
If not, then is our solution good enough?</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Can anyone suggest a better solution?</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
Can there be a better way to do this?</li><li style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;vertical-align:baseline;background-image:initial;background-color:transparent;word-wrap:break-word">
What would be the best way to make it fail safe?</li></ul><div style="text-align:left"><font face="Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif"><span style="line-height:16px">---</span></font></div>
<div style="text-align:left"><font face="Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif"><span style="line-height:16px"><br></span></font></div><div style="text-align:left"><font face="Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif" size="4"><span style="line-height:16px">Response from Alexis:</span></font></div>
<div style="text-align:left"><span style="font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;font-size:13px;line-height:16px;background-color:rgb(255,255,255)"><br></span></div><div style="text-align:left">
<span style="font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;font-size:13px;line-height:16px;background-color:rgb(255,255,255)">You could do this with RabbitMQ by setting up the central node (or cluster of nodes) to be a consumer of messages from the other nodes, and using the message acknowledgement feature. This feature means that the central node(s) can ack delivery, so that other nodes only delete messages after the ack. See for example: </span><a href="http://www.rabbitmq.com/tutorials/tutorial-two-python.html" rel="nofollow" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;border-top-width:0px;border-right-width:0px;border-bottom-width:0px;border-left-width:0px;border-style:initial;border-color:initial;font-size:13px;vertical-align:baseline;background-image:initial;background-color:rgb(255,255,255);color:rgb(74,107,130);text-decoration:none;font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;line-height:16px">http://www.rabbitmq.com/tutorials/tutorial-two-python.html</a></div>
<div style="text-align:left"><br></div><div style="text-align:left">---</div><div style="text-align:left"><br></div><div style="text-align:left">The response from Alexis actually solves most of our problems, but there is one more thing to consider.</div>
<div style="text-align:left"><br></div><div style="text-align:left"><b>I stated this constraint in the original post:</b></div><div style="text-align:left"><span style="font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;font-size:13px;line-height:17px;background-color:rgb(250,250,250)">"Also, the data on the central node must not get duplicated or stored again. That is, data collected on one of the nodes should be stored on the central node only once."</span></div>
<div style="text-align:left"><span style="font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;font-size:13px;line-height:17px;background-color:rgb(250,250,250)"><br></span></div><div style="text-align:left">
<span style="font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;font-size:13px;line-height:17px;background-color:rgb(250,250,250)">How do I ensure this? Consider the scenario in which the ACK is not delivered because of network issues. What happens then? The queue still does not know whether the work was completed. Does the message get locked in that case, or does another worker pick it up? If another worker picks it up, will the same data be processed twice? How do we deal with this situation?</span></div>
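<div style="text-align:left"><br></div><div style="text-align:left">To make the scenario concrete, here is a minimal Python sketch of the lost-ACK case. Everything in it is an assumption for illustration: the message ID format, the <code>DedupStore</code> class and the idea of keying the central store by message ID are not features of any particular queue, and the dictionary stands in for whatever persistent store the central node uses.</div>

```python
class DedupStore:
    """Hypothetical central store keyed by message ID."""

    def __init__(self):
        self.rows = {}

    def store(self, message_id, payload):
        """Return True if the message was newly stored, False if it was a repeat."""
        if message_id in self.rows:
            return False
        self.rows[message_id] = payload
        return True


def deliver(store, message_id, payload, ack_lost):
    """Simulate one delivery attempt.

    The central node always stores the message; the sender only learns about
    it if the ACK is not lost on the way back. Returns True if the ACK arrived.
    """
    store.store(message_id, payload)
    return not ack_lost


def send_until_acked(store, message_id, payload, lost_acks):
    """Resend the same message until an ACK is seen.

    `lost_acks` is the number of initial attempts whose ACK is dropped.
    Returns the total number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        if deliver(store, message_id, payload, ack_lost=lost_acks >= attempts):
            return attempts


store = DedupStore()
attempts = send_until_acked(store, "node-7:42", "sensor reading", lost_acks=2)
print(attempts, len(store.rows))
```

<div style="text-align:left">In this sketch the message is delivered three times because the first two ACKs are dropped, yet the store ends up holding it once, since the repeated deliveries carry the same ID. Whether a real queue redelivers in this way, and where such an ID check should live, is exactly what I am unsure about.</div>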
<div style="text-align:left"><span style="font-family:Arial,'Liberation Sans','DejaVu Sans',sans-serif;font-size:13px;line-height:17px;background-color:rgb(250,250,250)"><br></span></div><div style="text-align:left">
<font face="Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif"><span style="line-height:17px">Thanks in advance,</span></font></div><div style="text-align:left"><font face="Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif"><span style="line-height:17px">Vaidik</span></font></div>