[rabbitmq-discuss] Exchange Feature request: Drop Duplicates

Tue Nov 12 10:54:15 GMT 2013

Yes I can see the point about statelessness.

It seems to me that in a messaging fabric, it is generally useful to have
ways of dampening duplicates.

It occurred to me this morning that federation uses hop counts - in some
topologies, esp. with planned redundancy, this does not work so well, and
perhaps a feature like this would help.
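
To illustrate the hop-count point, here is a toy model of hop-limited forwarding over redundant paths. It is only a sketch: the function name `deliveries`, the `links` map and the node names are made up for illustration, and it does not reflect how the federation plugin is actually implemented.

```python
from collections import deque


def deliveries(links, source, target, max_hops):
    """Count copies of one message that reach target within max_hops."""
    copies = 0
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            copies += 1          # the target consumes the copy
            continue
        if hops == max_hops:
            continue             # hop limit reached, copy is discarded
        for nxt in links.get(node, []):
            queue.append((nxt, hops + 1))
    return copies


# Two deliberately redundant paths from A to D (hypothetical topology).
redundant = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
copies_at_d = deliveries(redundant, "A", "D", max_hops=2)
```

In this model the hop limit bounds how far a message travels, but it does not stop one copy per path from arriving at the endpoint.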

Michael

On Tue, Nov 12, 2013 at 4:48 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> The trouble is, exchanges are meant to be stateless. So it's possible to
> introduce some state into an exchange, but we have to choose between having
> per-node state (in which case dedup only works per-node), or having
> cluster-global state (where we either funnel all messages through one node
> in the cluster before they get routed to queues, or distribute the state
> around the cluster, making updates into expensive 2PC).
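
The per-node versus cluster-global trade-off can be sketched with a toy model. The `Node` class and its `seen` set are hypothetical names for illustration, not RabbitMQ internals, and the shared set below stands in for state that a real cluster would have to coordinate across nodes.

```python
class Node:
    """Toy cluster node that remembers which message IDs it has accepted."""

    def __init__(self, name, seen=None):
        self.name = name
        # A private set models per-node state; passing in a shared set
        # models cluster-global state.
        self.seen = set() if seen is None else seen

    def accept(self, message_id):
        """Return True if the message is new to this node's state."""
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        return True


# Per-node state: the same ID published via two nodes is accepted twice.
n1, n2 = Node("n1"), Node("n2")
per_node = [n1.accept("m1"), n2.accept("m1")]

# Cluster-global state: the second publish is recognised as a duplicate.
shared = set()
g1, g2 = Node("g1", shared), Node("g2", shared)
cluster_global = [g1.accept("m1"), g2.accept("m1")]
```

The shared set is free here only because everything runs in one process. Between real nodes, keeping it consistent is the expensive part described above.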
>
> So this is doable but it's not obvious where compromises should be made.
> And as Matthias sort of pointed out, duplication can still happen due to
> redelivery, so this has to be an optimisation rather than something that
> guarantees duplicates won't happen.
>
> Having said all that, it wouldn't be hideously difficult to implement, so
> I might give it a go. Depends on whether anybody else would find such a
> feature useful...
>
> Cheers, Simon
>
>
> On 11/11/2013 19:28, Laing, Michael wrote:
>
>> Yes - that's actually what we do currently, using Cassandra, and it
>> scales well.
>>
>> And we also do it in memory, at the retail level, and it is very fast as
>> well.
>>
>> I am just trying to shave a millisecond off at the retail level.
>>
>> Cheers,
>>
>> Michael
>>
>>
>> On Mon, Nov 11, 2013 at 2:22 PM, Matthias Reik <maze at reik.se> wrote:
>>
>>     Even though it sounds like a nice feature, it is probably difficult
>>     to really implement if not done on the client side. The duplicates
>>     might happen when delivering to the client side, but on the client
>>     side it should be quite easy to do the filtering:
>>     * Get a message from the queue.
>>     * Check against memcached (Couchbase, or some other cache
>>       technology) whether the messageID exists.
>>     * Add the new message to memcached (can be done with the previous
>>       step).
>>     * Set the timeout in memcached to your window size.
>>
>>     This should be straightforward, would scale up to quite a lot of
>>     messages, and should remove (depending on your window size) all
>>     duplicates.
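
A minimal sketch of the client-side steps above. It assumes an in-memory stand-in (`TtlCache`, a hypothetical class) instead of a real memcached client, so that it runs on its own. Its `add` mimics add-if-absent with an expiry, which is how the "check" and "add" steps collapse into one call. `make_dedup_handler` and `on_message` are likewise illustrative names.

```python
import time


class TtlCache:
    """In-memory stand-in for a cache offering add-if-absent with expiry."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}      # key -> time at which the entry expires
        self._clock = clock    # injectable so the window can be tested

    def add(self, key, ttl_seconds):
        """Store key only if absent or expired; return True if stored."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False       # still inside the window: a duplicate
        self._expiry[key] = now + ttl_seconds
        return True


def make_dedup_handler(cache, window_seconds, handle):
    """Wrap handle() so that recently seen message IDs are skipped."""

    def on_message(message_id, body):
        # add() fails when the ID is already cached, so one call both
        # checks for the ID and records it with the window as timeout.
        if not cache.add(message_id, window_seconds):
            return False       # duplicate, not handled
        handle(body)
        return True

    return on_message
```

This stand-in never purges expired keys, so it is only suitable for illustration. A real cache would evict them. As noted elsewhere in the thread, redelivery can still produce duplicates outside the window.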
>>
>>     Is there a good reason why you wouldn't want to do this on the
>>     client side as described?
>>
>>     Cheers
>>     Matthias
>>
>>     PS: as a caching technology you could of course build your own
>>     in-memory solution, but that's probably more work than using an
>>     out-of-the-box solution.
>>
>>
>>     On 2013-11-11 12:35 , Laing, Michael wrote:
>>
>>>     In our scenarios, messages are ultimately delivered to a 'retail'
>>>     rabbitmq instance to be delivered to a client. The pipelines that
>>>     process and deliver messages are purposefully redundant, hence
>>>     there may be multiple replicas of each message 'racing' to the
>>>     endpoint.
>>>
>>>     Usually, the replicas are resolved before getting to the retail
>>>     rabbit. When components fail, however, duplicates can leak through
>>>     during a small window of time. We eliminate those duplicates at
>>>     the retail layer by looking at each message_id. Ultimately, our
>>>     client contract allows duplicates as well in case one slips by.
>>>
>>>     It seems to me that this is a generic issue.
>>>
>>>     What would be useful in our case, and hopefully for many others,
>>>     would be a 'Duplicate Message ID Window' in milliseconds, as an
>>>     exchange attribute.
>>>
>>>     If non-zero, the exchange would drop any message with a duplicate
>>>     message_id that appeared within the specified window of time,
>>>     possibly routing it to the alternate exchange, if set.
>>>
>>>     In our case, a window of a few seconds, perhaps up to a minute
>>>     would suffice.
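
The proposed behaviour can be modelled roughly as follows. This is a sketch of the idea only, not an existing RabbitMQ feature or API: `DedupExchange`, `window_ms`, `alternate` and `publish` are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
class DedupExchange:
    """Toy model of an exchange with a duplicate message_id window."""

    def __init__(self, window_ms, alternate=None):
        self.window_ms = window_ms   # 0 disables duplicate dropping
        self.alternate = alternate   # optional list that receives duplicates
        self.routed = []             # messages routed normally
        self._first_seen = {}        # message_id -> arrival time in ms

    def publish(self, message_id, body, now_ms):
        """Route the message; return False if dropped as a duplicate."""
        if self.window_ms > 0:
            self._evict(now_ms)
            if message_id in self._first_seen:
                if self.alternate is not None:
                    self.alternate.append((message_id, body))
                return False
            self._first_seen[message_id] = now_ms
        self.routed.append((message_id, body))
        return True

    def _evict(self, now_ms):
        # Forget IDs whose window has fully elapsed.
        cutoff = now_ms - self.window_ms
        stale = [m for m, t in self._first_seen.items() if t <= cutoff]
        for m in stale:
            del self._first_seen[m]
```

The eviction scan is linear in the number of remembered IDs, which is fine for a sketch. An ordered structure keyed by arrival time would be the obvious refinement for a window of up to a minute at high message rates.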
>>>
>>>     Thanks,
>>>
>>>     Michael
>>>
>>>
>>>
>>>     _______________________________________________
>>>     rabbitmq-discuss mailing list
>>>     rabbitmq-discuss at lists.rabbitmq.com
>>>     https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>>
>