[rabbitmq-discuss] One Producer, X Consumers where X can change

Tim Watson tim at rabbitmq.com
Mon Jan 14 09:37:19 GMT 2013


Hi,

On 01/14/2013 08:34 AM, Shadowalker wrote:
> Hi again,
> Been doing a lot of googling on the queue/topic/listening-for-consumed-
> messages count and found this on ActiveMQ:
>
> http://activemq.apache.org/cms/handling-advisory-messages.html
>
> It allows one to check the count of consumers currently listening on a
> queue/topic.
>

I would not recommend an architecture for distributed resource tracking 
based on that. What happens if a consumer is temporarily disconnected 
when you perform the check, but reconnects after (or whilst) the rest of 
the participants are being updated? You've introduced even more 
possibilities for race conditions than before.

What I would suggest is that you carefully consider whether you actually 
need synchronous communications here, as messaging based architectures 
inherently de-couple producers from consumers, yet you've repeatedly 
attempted to force some 'awareness' of consumers into the producer 
whilst discussing this design. I would like to posit that this reveals 
an 'impedance mismatch' between your requirements and the inherently 
disconnected nature of a queue based solution. Of course distributed 
locking is often implemented using asynchronous communication protocols, 
but this is usually done at a *much lower protocol level* - I'd suggest 
researching Paxos or similar distributed consensus algorithms to get an 
idea of what's involved in designing a reliable solution to this kind of 
problem.

> Is there anything like this in RabbitMQ?

Not that I know of, although you could use the HTTP APIs to track 
consumers; that is, as I mentioned above, subject to lots of *nasty* 
race conditions. You *could* also look at using Tony G's presence 
exchange (https://github.com/tonyg/presence-exchange) to track bindings 
- although this would complicate your topology quite a lot, it might 
make tracking the various participants feasible, provided you use a 
known set of binding keys.
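
Just to make the HTTP-API approach (and its fundamental flaw) concrete, 
here's a minimal Python sketch. It assumes the management plugin is 
running on localhost:15672 with the default guest/guest credentials, and 
a hypothetical queue called 'delete-references' on the default vhost:

    import requests

    # '%2F' is the default vhost '/' URL-encoded; the queue name is a
    # made-up example.
    resp = requests.get(
        'http://localhost:15672/api/queues/%2F/delete-references',
        auth=('guest', 'guest'))
    resp.raise_for_status()
    print('consumers currently attached:', resp.json()['consumers'])

    # NB: this count is stale the moment you read it - a consumer can
    # connect or disconnect between this call and whatever decision you
    # make next, which is exactly the race condition described above.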

> This might allow me to create a listener that would only send a 
> message to notify the first manager that the references were removed.

I'm not clear on how that helps!? I did have a bit of an early start 
this morning though... ;)

> Another could be to define the "delete references" message to live 
> for x consumptions (x being the number of listeners on the "delete 
> references" queue) and add an advisory listener on the deletion of the 
> message from the queue to process deletion of the initial data.

That doesn't help at all unless you've actually tracked the number of 
acquired messages in the first place. Plus you *can* do that without 
'detecting' the number of consumers. You just insist on getting a 
'make-ref' message from the consumer (with some unique id) before 
incrementing the reference count. There's no real difference between 
*detecting* the consumer's connection/channel and providing a ref/lock 
acquisition queue, except that the latter is probably more structured, 
architecturally clearer and quite likely to be more reliable.
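
As a rough sketch of what I mean (Python/pika, with the queue name and 
message format invented purely for illustration):

    import uuid
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = conn.channel()
    ch.queue_declare(queue='ref-acquisition', durable=True)

    # Each consumer announces itself with a unique ref *before* it
    # starts consuming. The producer consumes from 'ref-acquisition'
    # and increments its reference count once per unique id - no need
    # to 'detect' connections or channels at all.
    ref_id = str(uuid.uuid4())
    ch.basic_publish(exchange='',
                     routing_key='ref-acquisition',
                     body=ref_id,
                     properties=pika.BasicProperties(delivery_mode=2))
    conn.close()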

Even if you used ActiveMQ's detection functionality or RabbitMQ's 
management HTTP APIs, the fundamental problem of race conditions 
wouldn't go away. Before we go much further discussing various ways you 
can design a solution - and I *am* interested in this discussion BTW - 
please read 
http://en.wikipedia.org/wiki/Byzantine_fault_tolerance#Byzantine_failures and 
make sure you've understood the consequences of nodes *just 
disappearing* and then *maybe* coming back later on.

You've also still not explained what the consequences of losing track 
of resources actually are. If one of your nodes dies, has any state 
been persisted when it comes back to life, and will that state be used 
to try and re-acquire or release the 'lock count' for this resource? 
What happens if your node sends an 'acquire' request asynchronously, 
then starts to write the resource/lock state to its own local database 
and dies (e.g., the machine crashes) before committing the transaction? 
Because the 'acquire' request was not synchronous, the master now thinks 
that your node holds the lock, whilst the node does *not* think the 
same. If you bring the node back online and it then starts asking for 
the resource lock, you're breaking the contract for lock acquisition on 
that node unless you're willing to make 'acquire' idempotent, which has 
its own pitfalls. If you don't make 'acquire' idempotent, then 
acquisition will fail. If you try to handle this by making 'acquire' 
re-entrant and then try to release the node's lock, the master will be 
confused, as it thinks you hold the lock twice, and the *lost lock 
acquisition* will never be released.
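
To make that trade-off concrete, here's a toy in-memory sketch (Python, 
everything invented for illustration) of the two 'acquire' semantics:

    # Toy lock master, purely to illustrate the trade-off above.
    class LockMaster:
        def __init__(self):
            self.holders = {}  # node_id -> acquisition count

        def acquire_reentrant(self, node_id):
            # A replayed 'acquire' from a recovered node bumps the
            # count, so a single 'release' no longer suffices and the
            # lost acquisition is never cleaned up.
            self.holders[node_id] = self.holders.get(node_id, 0) + 1

        def acquire_idempotent(self, node_id):
            # A replayed 'acquire' is a no-op, at the cost of silently
            # masking a genuine attempt to acquire twice.
            self.holders.setdefault(node_id, 1)

        def release(self, node_id):
            self.holders[node_id] -= 1
            if not self.holders[node_id]:
                del self.holders[node_id]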

tl;dr: this is not a simple problem.

Cheers,
Tim


