[rabbitmq-discuss] Fully reliable setup impossible?

Thu May 22 20:04:47 BST 2014

Hi Simon,

Thank you for your reply.

>> We have two data centers connected closely by LAN.
>> We are interested in a *reliable cluster* setup. It must be a cluster
>> because we want clients to be able to connect to each node
>> transparently. Federation is not an option.
>>
>
> I hope you realise that you are asking for a lot here! You should read up
> on the CAP theorem if you have not already done so.
>

Yes, I know. But I am not asking for an unreasonable amount, IMO :-)
I am aware of the CAP theorem, but I don't see how my requirements violate
it. I am willing to live with eventual consistency.

>> 1. It happens that the firewall/switch is restarted, and maybe a few
>> ping messages are lost.
>> 2. The setup should survive a data center crash.
>> 3. All queues are durable and mirrored, all messages are persisted, all
>> publishes are confirmed.
>> There are 3 cluster-recovery settings:
>> a) ignore: A cross data center network breakdown would cause message
>> loss on the node that is restarted in order to rejoin.
>> b) pause_minority: If we choose the same number of nodes in each data
>> center, the whole cluster will pause. If we don't, only the data center
>> with the most nodes can survive.
>> c) autoheal: If the cluster detects a network partition, messages may
>> be lost when the nodes rejoin.
>> [I would really like a resync-setting similar to the one described below]
>> Question 1: Is it even possible to have a fully reliable setup in such a
>> setting?
>>
>
> Depends how you define "fully reliable". If you want Consistency (i.e.
> mirrored queues), Availability (i.e. neither data centre pauses) and
> Partition tolerance (no loss of data from either side if the network goes
> down between them) then I'm afraid you can't.
>

What I mean when I say "reliable" is: All subscribers at the time of
publish will eventually get the message.

That should be possible, assuming that all live inconsistent nodes will
eventually rejoin (without dumping messages). I know this is not the case
in rabbitmq, but it is definitely theoretically possible. I guess this is
what is usually referred to as eventual consistency.
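
The pause_minority trade-off quoted above comes down to simple majority
arithmetic. A minimal sketch, assuming the documented rule that a node pauses
when the nodes it can still see (itself included) are half or fewer of the
whole cluster; the function name is made up for illustration:

```python
def surviving_sides(side_sizes):
    """Return the indices of the partition sides that keep running.

    Assumption: a side keeps running only if it holds a strict majority
    of all cluster nodes; every other side pauses.
    """
    total = sum(side_sizes)
    return [i for i, size in enumerate(side_sizes) if 2 * size > total]


# Same number of nodes in each data center: no side has a majority,
# so the whole cluster pauses.
print(surviving_sides([2, 2]))  # []

# Uneven split: only the larger data center keeps running.
print(surviving_sides([3, 2]))  # [0]
```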

>> In reality we probably won't have actual network partitions, and it will
>> most probably only be a very short network downtime.
>> Question 2: Is it possible to adjust how long it takes rabbitmq to
>> decide "node down"?
>>
>
> Yes, see http://www.rabbitmq.com/nettick.html

Thank you! (!)
I have been looking for that one. But I am surprised to see that the
default is actually 60 seconds. Then I really don't understand how I could
have seen so many clusters ending up partitioned.

Do you know what the consequence of doubling it might be?

RabbitMQ writes:
Increasing the net_ticktime across all nodes in a cluster will make the
cluster more resilient to short network outages, but it will take longer
for remaining nodes to detect crashed nodes.
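
To get a feel for what doubling would mean, here is a rough sketch of the
detection window. It assumes the rule from the Erlang kernel documentation
(not stated in this thread) that an unresponsive node is detected somewhere
between net_ticktime minus a quarter and net_ticktime plus a quarter; the
function name is made up for illustration:

```python
def detection_window(net_ticktime):
    """Return (min_seconds, max_seconds) until an unresponsive node is
    considered down, assuming the +/- 25% window around net_ticktime."""
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


for ticktime in (60, 120):
    low, high = detection_window(ticktime)
    print(f"net_ticktime={ticktime}s: node down after {low:.0f}s to {high:.0f}s")
```

Under that assumption the default of 60 should ride out outages shorter than
about 45 seconds, and 120 should ride out about 90 seconds, at the price of
waiting up to 150 seconds before a really crashed node is noticed.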

More specifically, I wonder what happens during the time a node is actually
cut off in its own partition, but before it finds out. In our setup all
publishes go to all-HA (mirrored) subscriber queues, with publisher
confirms. So I would expect a distributed agreement that the message has
been persisted. Will a publisher confirm then block until the node decides
that the other nodes are down, and then succeed?

>> It is much better to have a halted rabbitmq for some seconds than to
>> have message loss.
>> Question 3: Assume that we are using the ignore setting, and that we
>> have only two nodes in the cluster. Would the following be a full
>> recovery with zero message loss?
>> 0. Decide which node survives, Ns, and which should be restarted, Nr.
>> 1. Refuse all connections to Nr except from a special recovery
>> application. (One could change the ip, so all running services can't
>> connect or similar)
>> 2. Consume and republish all messages from Nr to Ns.
>> 3. Restart Nr
>> Then the cluster should be up-and-running again.
>>
>
> That sounds like it would work. You're losing some availability and
> consistency, and your message ordering will change. You have a pretty good
> chance of duplicating lots of messages too (any that were in the queues
> when the partition happened). Assuming you're happy with that it sounds
> reasonable.
>

The duplication is ok -- but assuming that rabbit is usually empty, it
won't really happen, I think.
But -- I am sure that rabbit does not guarantee exactly once delivery
anyway.
For that reason, we will build in idempotency for critical messages.
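
A minimal sketch of what such idempotency could look like on the consumer
side. It assumes every critical message carries a unique id assigned by the
publisher; the class and the in-memory seen-set are hypothetical, and a real
service would need durable storage shared between consumers:

```python
class IdempotentHandler:
    """Wrap a handler so a duplicated or redelivered message runs only once.

    Assumption: message_id is unique per logical message and survives
    republishing.
    """

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in memory only, for illustration

    def handle(self, message_id, body):
        """Run the handler unless this id was already processed.

        Returns True if the message was processed, False if it was skipped.
        """
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the id only after the handler succeeded, so a failed
        # attempt can be retried on redelivery.
        self._seen.add(message_id)
        return True


processed = []
handler = IdempotentHandler(processed.append)
handler.handle("m-1", "charge order 17")
handler.handle("m-1", "charge order 17")  # duplicate after recovery, skipped
print(processed)  # ['charge order 17']
```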

Ordering can always get scrambled when consumed messages are nacked, so we
are not assuming ordering either.
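
Step 2 of the recovery procedure from Question 3 (consume everything from Nr
and republish it to Ns) could be sketched roughly as below. The queue class is
an in-memory stand-in so the example runs on its own; it is not a RabbitMQ
client API, and peek/ack/publish are hypothetical names. The point is only
the order of operations: a message is acked on Nr after Ns has confirmed it.

```python
from collections import deque


class InMemoryQueue:
    """Stand-in for a broker queue, used only to make the sketch runnable."""

    def __init__(self, messages=()):
        self._messages = deque(messages)

    def peek(self):
        """Return the next unacked message, or None if the queue is empty."""
        return self._messages[0] if self._messages else None

    def ack(self):
        """Remove the message last returned by peek()."""
        self._messages.popleft()

    def publish(self, message):
        """Store a message; the return value models a publisher confirm."""
        self._messages.append(message)
        return True

    def messages(self):
        return list(self._messages)


def drain_and_republish(source, target):
    """Move every message from source (on Nr) to target (on Ns)."""
    moved = 0
    while (message := source.peek()) is not None:
        if not target.publish(message):
            raise RuntimeError("not confirmed by Ns; message stays on Nr")
        source.ack()  # only after the confirm, so nothing is lost
        moved += 1
    return moved


nr = InMemoryQueue(["m3", "m4"])        # what Nr held at the partition
ns = InMemoryQueue(["m1", "m2", "m3"])  # Ns already has m3
print(drain_and_republish(nr, ns))      # 2
print(ns.messages())                    # ['m1', 'm2', 'm3', 'm3', 'm4']
```

The second m3 on Ns is the kind of duplicate Simon mentions, which is why the
consumers have to be idempotent.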

About the CAP theorem in relation to rabbit:
Reliable messaging (zero message loss) is often preferred in SOA settings.
I wonder why vmware/pivotal/... chose not to prioritize this guarantee. The
federation setup aims at it, but it is a little too weak in its
synchronization. It would be preferable if it could communicate the
consumption of messages. Then one could mirror queues between
up/down-stream exchanges, and have even more "availability". One would
definitely give up consistency a little further, but it would be possible
to have the setup above, I think. I know it definitely doesn't come
out-of-the-box, and it is not a part of AMQP, AFAIK, but it seems possible.

Thank you, Simon!

-- S