[rabbitmq-discuss] Fully reliable setup impossible?
Ron Cordell
ron.cordell at gmail.com
Thu May 22 21:04:20 BST 2014
It seems that one of your assumptions is that a cluster would operate
across data centers. This is not recommended for RabbitMQ and we don't use
it that way - we use shovels between clusters since we, like you, can deal
with eventual consistency.
Our clusters are similar to what you describe. For us, (almost) all queues
are persistent and mirrored because we can't tolerate message loss. We have
seen significant sensitivity to partitioning in Windows OS-based clusters
under very heavy load; we do not see this on Linux, where we can also run
at least twice the load.
There is an F5 in front of the cluster, but it doesn't do load balancing;
it just acts as a persistent router. We've found that by directing all
traffic to one node and letting it replicate to other nodes we don't have
to deal with issues if a network partition occurs, and they *do* occur
(earlier this week one of the virtual NICs stopped on one of the Rabbit
nodes, for example). The F5 will detect this very quickly and divert
traffic to another node if necessary. (Note to others - we've found that
this arrangement scales significantly better than round robin load
balancing for mirrored persistent queues).
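
As a rough illustration of that routing policy (this is not F5
configuration, just the selection logic), the sketch below always picks the
first healthy node in a fixed priority list and only moves on when that node
fails its health check. The node names and the is_healthy callback are
hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_node(nodes: Sequence[str],
              is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


# Hypothetical node names, listed in priority order.
NODES = ["rabbit1", "rabbit2", "rabbit3"]
down = set()


def healthy(node: str) -> bool:
    return node not in down


# All nodes healthy: every connection goes to the same node.
assert pick_node(NODES, healthy) == "rabbit1"

# The preferred node fails (for example its NIC stops): traffic moves on.
down.add("rabbit1")
assert pick_node(NODES, healthy) == "rabbit2"
```

Unlike round robin, every publisher and consumer lands on the same node
until that node is seen to fail.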
We have clusters in each data center; for clusters that need to replicate
to a different data center there is the shovel. However, there is a chance
of message loss if the application sends the message to the cluster in one
DC and that entire DC is hit by a meteor before the message can be
delivered to the other data center. For most scenarios, however, the
messages will be eventually delivered when the DC comes back up.
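
To make that loss window concrete, here is a toy model (plain Python, not
the shovel plugin, and all names are made up for illustration). It assumes a
message leaves the source cluster only after the destination has stored it.

```python
from collections import deque


class Cluster:
    """Toy stand-in for a RabbitMQ cluster in one data center."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # messages accepted but not yet forwarded
        self.up = True


def shovel(source, dest, link_up=True):
    """Forward messages while both ends and the link are up.

    A message is removed from the source only after the destination
    has stored it, so an interrupted transfer never drops a message.
    Returns the number of messages moved.
    """
    moved = 0
    while link_up and source.up and dest.up and source.queue:
        dest.queue.append(source.queue[0])  # deliver to the destination first
        source.queue.popleft()              # then acknowledge at the source
        moved += 1
    return moved


dc1, dc2 = Cluster("dc1"), Cluster("dc2")
dc1.queue.extend(["m1", "m2"])

# Link between the data centers is down: nothing moves, nothing is lost.
assert shovel(dc1, dc2, link_up=False) == 0
assert list(dc1.queue) == ["m1", "m2"]

# Link comes back: the backlog is delivered.
assert shovel(dc1, dc2) == 2
assert list(dc2.queue) == ["m1", "m2"]
```

In this model the only way to lose m1 and m2 is for dc1 to be destroyed
while they are still sitting in its queue, which is the meteor case.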
Hope that helps a little...
Cheers,
-ronc
On Thu, May 22, 2014 at 12:04 PM, Steffen Daniel Jensen <
steffen.daniel.jensen at gmail.com> wrote:
> Hi Simon,
>
> Thank you for your reply.
>
>>
>> We have two data centers connected closely by LAN.
>>> We are interested in a *reliable cluster* setup. It must be a cluster
>>> because we want clients to be able to connect to each node
>>> transparently. Federation is not an option.
>>>
>>
>> I hope you realise that you are asking for a lot here! You should read up
>> on the CAP theorem if you have not already done so.
>>
>
> Yes, I know. But I am not asking for an unreasonable amount, IMO :-)
> I am aware of the CAP theorem, but I don't see how this violates it. I
> am willing to live with eventual consistency.
>
> 1. It happens that the firewall/switch is restarted, and maybe a few
>>> ping messages are lost.
>>> 2. The setup should survive data center crash
>>> 3. All queues are durable and mirrored, all messages are persisted, all
>>> publishes are confirmed
>>> There are 3 cluster-recovery settings
>>> a) ignore: A cross data center network break-down would cause message
>>> loss on the node that is restarted in order to rejoin.
>>> b) pause_minority: If we choose the same number of nodes in each data
>>> center, the whole cluster will pause. If we don't, only the data center
>>> with the most nodes can survive.
>>> c) autoheal: If the cluster detects a network partition, there is a
>>> potential for message loss when the nodes rejoin.
>>> [I would really like a resync-setting similar to the one described below]
>>> Question 1: Is it even possible to have a fully reliable setup in such a
>>> setting?
>>>
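
The arithmetic behind (b) can be sketched in a few lines. This assumes the
rule as the RabbitMQ documentation states it: a node pauses when the nodes it
can still reach, itself included, make up at most half of the cluster. The
function name is made up for illustration.

```python
def pauses(visible_nodes: int, total_nodes: int) -> bool:
    """True if a node that sees `visible_nodes` nodes (itself included)
    out of `total_nodes` is in a minority, i.e. sees at most half."""
    return 2 * visible_nodes <= total_nodes


# Two data centers with 2 nodes each: after a split each side sees 2 of 4,
# so both sides pause and the whole cluster stops.
assert pauses(2, 4)

# 3 nodes in one data center and 2 in the other: only the larger side survives.
assert not pauses(3, 5)
assert pauses(2, 5)
```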
>>
>> Depends how you define "fully reliable". If you want Consistency (i.e.
>> mirrored queues), Availability (i.e. neither data centre pauses) and
>> Partition tolerance (no loss of data from either side if the network goes
>> down between them) then I'm afraid you can't.
>>
>
> What I mean when I say "reliable" is: All subscribers at the time of
> publish will eventually get the message.
>
> That should be possible, assuming that all live inconsistent nodes will
> eventually rejoin (without dumping messages). I know this is not the case
> in rabbitmq, but it is definitely theoretically possible. I guess this is
> what is usually referred to as eventual consistency.
>
>
>> In reality we probably won't have actual network partitions, and it will
>>> most probably only be a very short network downtime.
>>> Question 2: Is it possible to adjust how long it takes rabbitmq to
>>> decide "node down"?
>>>
>>
>> Yes, see http://www.rabbitmq.com/nettick.html
>
>
> Thank you! (!)
> I have been looking for that one. But I am surprised to see that it is
> actually 60sec. Then I really don't understand how I could have seen so
> many clusters ending up partitioned.
>
> Do you know what the consequence of doubling it might be?
>
> RabbitMQ writes:
> Increasing the net_ticktime across all nodes in a cluster will make the
> cluster more resilient to short network outages, but it will take longer
> for remaining nodes to detect crashed nodes.
>
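
For a feel of what doubling does: the Erlang kernel documentation gives the
actual detection time T as TickTime - TickTime/4 < T < TickTime + TickTime/4.
A small sketch of that arithmetic (the function name is made up):

```python
from typing import Tuple


def detection_window(net_ticktime: float) -> Tuple[float, float]:
    """Return the (min, max) bounds in seconds within which an unresponsive
    node is declared down, per the +/- 25% rule in the Erlang kernel docs."""
    quarter = net_ticktime / 4
    return (net_ticktime - quarter, net_ticktime + quarter)


# Default of 60 seconds: a dead node is detected after 45 to 75 seconds.
assert detection_window(60) == (45.0, 75.0)

# Doubled to 120 seconds: detection takes 90 to 150 seconds.
assert detection_window(120) == (90.0, 150.0)
```

So doubling the setting lets the cluster ride out outages of up to roughly
90 seconds, at the cost of taking up to 150 seconds to notice a node that
has really crashed.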
> More specifically I wonder what happens in the time a node is actually in
> its own network, but before it finds out. In our setup all publishes have
> all-HA subscriber queues, with publisher confirm. So I will expect a
> distributed agreement that the msg has been persisted. Will a publisher
> confirm then block until the node decides that other nodes are down, and
> then succeed?
>
> It is much better to have a halted rabbitmq for some seconds than to
>>> have message loss.
>>> Question 3: Assume that we are using the ignore setting, and that we
>>> have only two nodes in the cluster. Would the following be a full
>>> recovery with zero message loss?
>>> 0. Decide which node survives, Ns, and which should be restarted, Nr.
>>> 1. Refuse all connections to Nr except from a special recovery
>>> application. (One could change the ip, so all running services can't
>>> connect or similar)
>>> 2. Consume and republish all message from Nr to Ns.
>>> 3. Restart Nr
>>> Then the cluster should be up-and-running again.
>>>
>>
>> That sounds like it would work. You're losing some availability and
>> consistency, and your message ordering will change. You have a pretty good
>> chance of duplicating lots of messages too (any that were in the queues
>> when the partition happened). Assuming you're happy with that it sounds
>> reasonable.
>>
>
> The duplication is ok -- but assuming that rabbit is usually empty, it
> won't really happen, I think.
> But -- I am sure that rabbit does not guarantee exactly once delivery
> anyway.
> For that reason, we will build in idempotency for critical messages.
>
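
A minimal sketch of what such idempotency could look like on the consumer
side, assuming every message carries a unique id set by the publisher (for
example in the AMQP message-id property). The class and its names are
hypothetical, and the in-memory set stands in for whatever durable store a
real service would need.

```python
from typing import Callable, Set


class IdempotentHandler:
    """Wrap a handler so that a redelivered or duplicated message is
    processed at most once, keyed by its message id."""

    def __init__(self, handler: Callable[[bytes], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # assumption: ids fit in memory

    def handle(self, message_id: str, body: bytes) -> bool:
        """Process the message unless its id was already handled.
        Returns True if the handler ran, False for a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the id only after the handler succeeded, so a failed
        # attempt can be retried on redelivery.
        self._seen.add(message_id)
        return True


processed = []
consumer = IdempotentHandler(processed.append)

assert consumer.handle("id-1", b"charge card") is True
assert consumer.handle("id-1", b"charge card") is False  # duplicate ignored
assert processed == [b"charge card"]
```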
> Ordering can always get scrambled when consumed messages are nacked, so
> we are not assuming ordering either.
>
>
> About the CAP theorem in relation to rabbit.
> Reliable messaging (zero message loss) is often preferred in
> SOA settings. I wonder why vmware/pivotal/... chose not to prioritize this
> guarantee. The federation setup aims for it, but it is a little too weak
> in its synchronization. It would be preferable if it had a way of
> communicating consumption of messages. Then one could mirror queues between
> up/down-stream exchanges, and have even more "availability". One would
> definitely give up consistency a little further, but it would be possible
> to have the setup above, I think. I know it definitely doesn't come
> out-of-the-box, and it is not a part of AMQP, AFAIK, but it seems possible.
>
> Thank you, Simon!
>
> -- S