Matthew,

An excellent response and thank you for it! Yes, difficult it is!

It raises a somewhat philosophical discussion around where the onus of
guaranteeing such things as 'exactly once' delivery is placed, i.e., on
the client side or on the server side. The JMS standard offers
exactly-once delivery, whereby the onus is on the server (the JMS
implementation) and not on the client.

What I am trying to say is that, in my opinion, client programs should
be as 'simple' as possible, with the servers doing all the hard work.
This is what the JMS standard forces on implementors and, perhaps to a
lesser extent today, so does AMQP.

Note: the word 'server' is horribly overloaded these days. It is used
here to indicate the software with which clients, i.e., producers and
consumers, communicate.

Oh well, off to librabbitmq and some example programs written in COBOL...

Cheers, John

On Thu, Aug 5, 2010 at 13:22, Matthew Sackman <matthew@rabbitmq.com> wrote:
Hi Mike,

On Tue, Aug 03, 2010 at 04:43:56AM -0400, Mike Petrusis wrote:
> In reviewing the mailing list archives, I see various threads which
> state that ensuring "exactly once" delivery requires deduplication by
> the consumer. For example the following:
>
> "Exactly-once requires coordination between consumers, or idempotency,
> even when there is just a single queue. The consumer, broker or network
> may die during the transmission of the ack for a message, thus causing
> retransmission of the message (which the consumer has already seen and
> processed) at a later point."
> http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-July/004237.html
>
> In the case of competing consumers which pull messages from the same
> queue, this will require some sort of shared state between consumers
> to de-duplicate messages (assuming the consumers are not idempotent).
>
> Our application is using RabbitMQ to distribute tasks across multiple
> workers residing on different servers; this adds to the cost of
> sharing state between the workers.
>
> Another message in the email archive mentions that "You can guarantee
> exactly-once delivery if you use transactions, durable queues and
> exchanges, and persistent messages, but only as long as any failing
> node eventually recovers."

All of the above is sort of wrong. You can never *guarantee* exactly
once. (There's always some argument about whether receiving message
duplicates but relying on idempotency counts as achieving exactly once;
I don't feel it does, and why should become clearer further on...)

The problem is publishers. If the server on which RabbitMQ is running
crashes after committing a transaction containing publishes, it's
possible the commit-ok message may get lost. The publishers then still
think they need to republish, so they wait until the broker comes back
up and republish. This can happen an arbitrary number of times: the
publishers connect, start a transaction, publish messages, commit the
transaction, the commit-ok gets lost, and the publishers repeat the
whole process.
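
In code, that publisher loop comes out something like this - a minimal
sketch in Python with the pika client (queue name, connection details
and retry policy are all invented for illustration):

    # Sketch only: transactional publish that retries until a commit-ok
    # is actually seen. The message_id stays stable across republishes;
    # the republish-count header is the trick discussed further down.
    import pika

    params = pika.ConnectionParameters('localhost')

    def publish_until_committed(body, msg_id):
        republish_count = 0
        while True:
            try:
                conn = pika.BlockingConnection(params)
                ch = conn.channel()
                ch.queue_declare(queue='tasks', durable=True)
                ch.tx_select()                 # open a transaction
                ch.basic_publish(
                    exchange='',
                    routing_key='tasks',
                    body=body,
                    properties=pika.BasicProperties(
                        delivery_mode=2,       # persistent
                        message_id=msg_id,     # same on every attempt
                        headers={'republish-count': republish_count}))
                ch.tx_commit()                 # the commit-ok can be lost...
                conn.close()
                return
            except pika.exceptions.AMQPError:
                # ...and then we can't tell whether the commit happened,
                # so we must republish - which is where duplicates arise.
                republish_count += 1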

As a result, you need to detect duplicates on the clients. This is
really the barrier to making all operations idempotent: you never know
how many copies of a message there will be, and thus you never know
when it's safe to remove messages from your dedup cache. Things like
redis apparently have the means to delete entries after an amount of
time, which would at least allow you to stop the database eating up all
the RAM in the universe, but there's still the possibility that after
the entry's been deleted, another duplicate will come along which you
now won't detect as a duplicate.
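
For concreteness, such a dedup cache might look like this - a sketch in
Python against redis (the key naming and the one-week TTL are guesses,
and the caveat above applies: once a key expires, a late duplicate
slips through undetected):

    # Shared dedup cache with expiring entries. SET with NX and EX is
    # atomic, so several competing consumers can use it safely.
    import redis

    r = redis.Redis()
    DEDUP_TTL = 7 * 24 * 3600  # assume no republishes after a week

    def first_time_seen(msg_id):
        # True only for the first consumer to record this message id.
        # After DEDUP_TTL seconds the key vanishes, so a duplicate
        # arriving later would wrongly look new - the unavoidable hole.
        return r.set('dedup:' + msg_id, 1, nx=True, ex=DEDUP_TTL) is not None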

This isn't just a problem with RabbitMQ: in any messaging system, if
any message can be lost, you cannot achieve exactly-once semantics. The
best you can hope for is a probability, with a large number of 9s, that
you will be able to detect all the duplicates. But that's the best you
can achieve.

Scaling horizontally is thus more tricky because, as you say, you may
now have multiple consumers which each receive one copy of a message.
The dedup database would then have to be distributed, and with high
message rates this might well become prohibitive because of the amount
of network traffic due to transactions between the consumers.

> What's the recommended way to deal with the potential of duplicate messages?

Currently, there is no "recommended" way. If you have a single
consumer, it's quite easy: something like tokyocabinet should be more
than sufficiently performant. For multiple consumers, you're currently
going to have to look at some sort of distributed database.
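
For the single-consumer case, even the Python standard library will do;
here sqlite3 stands in for tokyocabinet (the one-column schema is
invented - the point is just that an insert which fails on a duplicate
key is itself the dedup test):

    # Local, single-consumer dedup store. sqlite3 is used purely
    # because it ships with Python; tokyocabinet would do the same job.
    import sqlite3

    db = sqlite3.connect('seen.db')
    db.execute('CREATE TABLE IF NOT EXISTS seen (msg_id TEXT PRIMARY KEY)')

    def first_time_seen(msg_id):
        try:
            with db:  # commits on success, rolls back on error
                db.execute('INSERT INTO seen VALUES (?)', (msg_id,))
            return True
        except sqlite3.IntegrityError:  # duplicate primary key
            return False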

> Is this a rare enough edge case that most people just ignore it?

No idea. But one way of making your life easier is for the producer to
send slightly different messages on every republish (they would still,
obviously, need to have the same message id). That way, if you receive
a msg with "republish count" == 0, you know it's the first copy, so you
can insert asynchronously into your shared database and then act on the
message. You only need to do a query on the database whenever you
receive a msg with "republish count" > 0. Thus you can tune your
database for inserts and hopefully save some work: the common case will
be the first copy, and lookups will be exceedingly rare.
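
On the consumer, that scheme comes out roughly as follows (a sketch:
record_async() is an assumed fire-and-forget insert into the shared
database, seen_before() the corresponding lookup, and process() your
actual work):

    # pika-style consumer callback for the "republish count" scheme:
    # only known republishes pay for a database lookup.
    def on_message(ch, method, properties, body):
        count = (properties.headers or {}).get('republish-count', 0)
        if count == 0 or not seen_before(properties.message_id):
            # First copy (or a republish whose first copy never showed
            # up - the hairy case discussed just below).
            record_async(properties.message_id)
            process(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)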

The question then is: if you've received a msg with republish count > 0
but there are no entries in the database, what do you do? The republish
shouldn't have overtaken the first publish (though if consumers
disconnected without acking, or requeued messages, it could have), but
you need to cause some sort of synchronisation between all the
consumers to ensure none of them is in the process of adding to the
database. It all gets a bit hairy at this point.

Thus if your message rate is low, you're much safer doing the insert
and select on every message. If that's too expensive, you're going to
have to think very hard indeed about how to avoid races between
different consumers each thinking they're responsible for acting on the
same message.

This stuff isn't easy.

Matthew
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq-discuss@lists.rabbitmq.com
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

--
---
John Apps
(49) 171 869 1813