[rabbitmq-discuss] Fully reliable setup impossible?

Fri May 23 12:16:39 BST 2014

On 22/05/14 20:04, Steffen Daniel Jensen wrote:
> Yes, I know. But I am not asking for unreasonably much, IMO :-)
> I am aware of the CAP theorem, but I don't see how it is in violation. I
> am willing to live with eventual consistency.

Ah, right.

> What I mean when I say "reliable" is: All subscribers at the time of
> publish will eventually get the message.
>
> That should be possible, assuming that all live inconsistent nodes will
> eventually rejoin (without dumping messages). I know this is not the
> case in rabbitmq, but it is definitely theoretically possible. I guess
> this is what is usually referred to as eventual consistency.

So why did you originally say:

> It must be a cluster because we want clients to be able to connect to
> each node transparently. Federation is not an option.

...because it really sounds like federation is what you want :-)

>     Yes, see http://www.rabbitmq.com/nettick.html
>
>
> Thank you! (!)
> I have been looking for that one. But I am surprised to see that it is
> actually 60sec. Then I really don't understand how I could have seen so
> many clusters ending up partitioned.
>
> Do you know what the consequence of doubling it might be?

It will take twice as long to conclude that a remote node that is no 
longer connected has actually gone away. Until then, things can block.
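
To put rough numbers on that: Erlang's distribution layer ticks peers
four times per net_ticktime, so an unresponsive node is noticed somewhere
between 0.75x and 1.25x the configured value. A small sketch of the
arithmetic (the function name is made up, purely for illustration):

```python
def detection_window(net_ticktime):
    """Return (min, max) seconds before an unresponsive node is declared down.

    Erlang ticks peers every net_ticktime / 4 seconds, so detection falls
    within net_ticktime minus or plus 25%.
    """
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


for ticktime in (60, 120):
    low, high = detection_window(ticktime)
    print(f"net_ticktime={ticktime}s: node declared down after {low:.0f}-{high:.0f}s")
```

So doubling the default moves the window from roughly 45-75 seconds to
roughly 90-150 seconds.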

> RabbitMQ writes:
> Increasing the net_ticktime across all nodes in a cluster will make the
> cluster more resilient to short network outages, but it will take
> longer for remaining nodes to detect crashed nodes.
>
> More specifically, I wonder what happens during the time a node is
> actually cut off in its own network, but before it finds out. In our
> setup all publishes have all-HA subscriber queues, with publisher
> confirms. So I would expect a distributed agreement that the message
> has been persisted.

Yes.

> Will a
> publisher confirm then block until the node decides that other nodes are
> down, and then succeed?

Yes.
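
A toy model of that timing, just to make it concrete. This is not
RabbitMQ code; the function and its arguments are made up for
illustration, and it assumes the confirm goes out once every mirror has
either acknowledged or been declared down:

```python
def confirm_time(mirror_ack_times, declared_down_at):
    """Time at which the publisher sees the confirm (toy model).

    mirror_ack_times: seconds until each mirror acknowledges, or None if
                      that mirror is unreachable (partitioned away).
    declared_down_at: seconds until unreachable mirrors are declared down,
                      which is governed by net_ticktime.
    """
    waits = [
        declared_down_at if ack is None else ack
        for ack in mirror_ack_times
    ]
    # The confirm is held back by the slowest mirror still considered alive.
    return max(waits, default=0.0)


# Healthy cluster: confirm arrives as soon as the slowest mirror acks.
print(confirm_time([0.01, 0.02], declared_down_at=60))
# One mirror partitioned away: the publisher blocks until it is declared down.
print(confirm_time([0.01, None], declared_down_at=60))
```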

> The duplication is ok -- but assuming that rabbit is usually empty, it
> won't really happen, I think.
> But -- I am sure that rabbit does not guarantee exactly once delivery
> anyway.
> For that reason, we will build in idempotency for critical messages.
>
> Ordering can always get scrambled when nacking consumed messages, so we
> are not assuming ordering either.

OK.
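
For what it's worth, consumer-side idempotency can be as simple as
remembering which message IDs have already been handled. A minimal
sketch, assuming publishers set a unique message ID on each message; the
class and handler names are hypothetical, and a real deployment would
want a persistent, bounded store rather than an in-memory set:

```python
class IdempotentHandler:
    """Wrap a handler so that redelivered messages are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in-memory only; an assumption for this sketch

    def handle(self, message_id, body):
        """Return True if the message was processed, False if a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the ID only after the handler succeeds, so a crash
        # mid-processing leads to a retry rather than a lost message.
        self._seen.add(message_id)
        return True


processed = []
consumer = IdempotentHandler(processed.append)
consumer.handle("msg-1", "charge customer")
consumer.handle("msg-1", "charge customer")  # redelivery after a partition
consumer.handle("msg-2", "send receipt")
print(processed)
```

The duplicate "msg-1" is dropped, so only the two distinct messages reach
the wrapped handler.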

> About the CAP theorem in relation to rabbit.
> Reliable messaging (zero message loss) is often preferred in
> SOA settings. I wonder why vmware/pivotal/... chose not to prioritize
> this guarantee. The federation setup aims for it, but it is a little
> too weak in its synchronization. It would be preferable if it had a
> possibility of communicating consumption of messages. Then one could
> mirror queues between up/down-stream exchanges, and have even more
> "availability".

I'm not sure what you're talking about here. Federation certainly should 
be able to ensure zero message loss! (Assuming you leave it in 
"on-confirm" ack-mode).

So when you say "if it had a possibility of communicating consumption of 
messages" you're talking about a sort of eventually consistent federated 
mirrored queue? I have wondered about producing such a thing. But as 
usual the list of things we could do is large, and the resources small. 
And I suspect the cost of producing it would be quite large, mostly due 
to the need to somehow reunify the queues after a partition.
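
Coming back to ack-mode: it is set per federation upstream. A sketch
that builds the upstream definition and prints the matching rabbitmqctl
command; the upstream name and URI are placeholders, not anything from
your setup:

```python
import json
import shlex

# Placeholder values; substitute your own upstream name and broker URI.
upstream_name = "my-upstream"
definition = {
    "uri": "amqp://upstream.example.com",
    # "on-confirm" (the default) acks upstream only after the downstream
    # broker has confirmed the message; the other documented modes,
    # "on-publish" and "no-ack", give up safety in exchange for speed.
    "ack-mode": "on-confirm",
}

command = "rabbitmqctl set_parameter federation-upstream {} {}".format(
    shlex.quote(upstream_name), shlex.quote(json.dumps(definition))
)
print(command)
```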

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, Pivotal