[rabbitmq-discuss] Exchange Feature request: Drop Duplicates

Simon MacMullen simon at rabbitmq.com
Tue Nov 12 09:48:37 GMT 2013


The trouble is, exchanges are meant to be stateless. It's possible to 
introduce some state into an exchange, but we have to choose between 
having per-node state (in which case dedup only works per-node), or 
having cluster-global state (where we either funnel all messages through 
one node in the cluster before they get routed to queues, or distribute 
the state around the cluster, turning every update into an expensive 
two-phase commit).

So this is doable but it's not obvious where compromises should be made. 
And as Matthias sort of pointed out, duplication can still happen due to 
redelivery, so this has to be an optimisation rather than something that 
guarantees duplicates won't happen.
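
To make the per-node trade-off concrete, here is a minimal sketch of a 
time-windowed duplicate filter keyed on message_id. It is not RabbitMQ 
code: the DedupWindow class and its methods are hypothetical names 
invented for illustration, and each instance stands in for the private 
state one cluster node would hold.

```python
import time


class DedupWindow:
    """Remembers message ids for `window_ms` milliseconds (hypothetical helper)."""

    def __init__(self, window_ms, clock=time.monotonic):
        self.window_s = window_ms / 1000.0
        self.clock = clock
        self.seen = {}  # message_id -> time the id was first seen

    def is_duplicate(self, message_id):
        now = self.clock()
        # Forget ids whose window has expired (simple, not efficient).
        self.seen = {m: t for m, t in self.seen.items()
                     if now - t < self.window_s}
        if message_id in self.seen:
            return True
        self.seen[message_id] = now
        return False


# Two cluster nodes, each with its own private state.
node_a = DedupWindow(window_ms=5000)
node_b = DedupWindow(window_ms=5000)

first = node_a.is_duplicate("msg-1")   # first sighting on node A
second = node_a.is_duplicate("msg-1")  # replica arriving at node A
other = node_b.is_duplicate("msg-1")   # replica arriving at node B
```

Node A catches the second copy, but node B has never seen the id and 
lets it through, which is the sense in which per-node state only 
dedups per-node.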

Having said all that, it wouldn't be hideously difficult to implement, 
so I might give it a go. Depends on whether anybody else would find such 
a feature useful...

Cheers, Simon

On 11/11/2013 19:28, Laing, Michael wrote:
> Yes - that's actually what we do currently, using Cassandra, and it
> scales well.
>
> And we also do it in memory, at the retail level, and it is very fast as
> well.
>
> I am just trying to shave a millisecond off at the retail level.
>
> Cheers,
>
> Michael
>
>
> On Mon, Nov 11, 2013 at 2:22 PM, Matthias Reik <maze at reik.se> wrote:
>
>     Even though it sounds like a nice feature, it is probably difficult
>     to really implement, if not done on the client side. The duplicates
>     might happen when delivering to the client side, but on the client
>     side it should be quite easy to do the filtering:
>     * get a message from the queue,
>     * check against memcached (couchbase, or some other cache
>     technology) whether the messageID exists.
>     * Add the new message to memcached (can be done with the previous step)
>     * Set the timeout in memcached to your window size.
>
>     This should be straightforward, would scale up to quite a lot of
>     messages, and should remove (depending on your window size) all
>     duplicates.
>
>     Is there a good reason why you wouldn't want to do this on the
>     client side as described?
>
>     Cheers
>     Matthias
>
>     PS: as a caching technology you could of course do your own
>     in-memory-solution but that's probably more work than to use an
>     out-of-the-box solution.
>
>
>     On 2013-11-11 12:35 , Laing, Michael wrote:
>>     In our scenarios, messages are ultimately delivered to a 'retail'
>>     rabbitmq instance to be delivered to a client. The pipelines that
>>     process and deliver messages are purposefully redundant, hence
>>     there may be multiple replicas of each message 'racing' to the
>>     endpoint.
>>
>>     Usually, the replicas are resolved before getting to the retail
>>     rabbit. When components fail, however, duplicates can leak through
>>     during a small window of time. We eliminate those duplicates at
>>     the retail layer by looking at each message_id. Ultimately, our
>>     client contract allows duplicates as well in case one slips by.
>>
>>     It seems to me that this is a generic issue.
>>
>>     What would be useful in our case, and hopefully for many others,
>>     would be a 'Duplicate Message ID Window' in milliseconds, as an
>>     exchange attribute.
>>
>>     If non-zero, the exchange would drop any message with a duplicate
>>     message_id that appeared within the specified window of time,
>>     possibly routing it to the alternate exchange, if set.
>>
>>     In our case, a window of a few seconds, perhaps up to a minute
>>     would suffice.
>>
>>     Thanks,
>>
>>     Michael
>>
>>
>>
>>     _______________________________________________
>>     rabbitmq-discuss mailing list
>>     rabbitmq-discuss at lists.rabbitmq.com
>>     https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>
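
The client-side filtering steps Matthias lists in the quoted message 
can be sketched roughly as follows. This is an illustration under 
stated assumptions, not anyone's production code: TtlCache is a 
hypothetical in-memory stand-in for memcached, and its add() method 
mimics the memcached "add" command, which stores a key only if it is 
absent. That lets the existence check and the insert happen in one 
step. Messages are modelled as (message_id, body) tuples.

```python
import time


class TtlCache:
    """In-memory stand-in for memcached (an assumption, for illustration)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.expiry = {}  # key -> time at which the key expires

    def add(self, key, ttl_s):
        """Store key only if absent or expired; return True if stored."""
        now = self.clock()
        expires = self.expiry.get(key)
        if expires is not None and expires > now:
            return False  # key is still live, so this is a duplicate
        # A real cache would evict expired keys itself; this stand-in
        # simply overwrites them when the same key shows up again.
        self.expiry[key] = now + ttl_s
        return True


def filter_duplicates(messages, cache, window_s):
    """Yield only messages whose message_id was not seen in the window."""
    for message_id, body in messages:
        if cache.add(message_id, window_s):
            yield message_id, body


incoming = [("id-1", "a"), ("id-2", "b"), ("id-1", "a")]
delivered = list(filter_duplicates(incoming, TtlCache(), window_s=60))
```

Here the second copy of "id-1" is dropped because it arrives inside 
the 60 second window. With several consumers the cache would need to be 
shared between them, which is where an external store comes in.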


