[rabbitmq-discuss] How robust is clustering, and under what conditions?

Eugene Kirpichov ekirpichov at gmail.com
Thu Nov 15 13:42:10 GMT 2012


Actually, why couldn't RabbitMQ use a majority quorum to determine which
node is the most up to date?

Assume 3 nodes: A, B, C. Assume that A is the leader.

Stop A: now B and C elect a leader and, say, they elect B.
Stop B: now C knows it's no longer part of a quorum and it just sits there
unresponsive. B doesn't behave like a leader either (if it was just
partitioned and not killed). Ok.
Start A: A and C now form a quorum with C as the most up-to-date member;
they elect C as the leader and A synchronizes from C.
Start B: B synchronizes from C too.

This seems implementable and I believe that's what replicated databases
like Galera do; is it just difficult, or is there a theoretical issue
related to RabbitMQ specifically, or are you ruling out this option because
it requires at least 3 nodes to actually be H/A?
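
To make this concrete, here is a pure sketch in Python of the rule I mean
(made-up node names and version counters, nothing taken from RabbitMQ's
code): a node only acts while it can see a strict majority, and the most
up-to-date reachable node wins the election.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    version: int   # how far this node's copy of the state has advanced
    up: bool

def elect_leader(cluster):
    """Refuse to act without a strict majority; otherwise the most
    up-to-date reachable node becomes the leader."""
    reachable = [n for n in cluster if n.up]
    if 2 * len(reachable) <= len(cluster):
        return None                      # no quorum: sit there, don't risk split-brain
    return max(reachable, key=lambda n: n.version)

a, b, c = Node("A", 10, True), Node("B", 10, True), Node("C", 10, True)
cluster = [a, b, c]

a.up = False                             # stop A: B and C still have a majority
print(elect_leader(cluster).name)        # -> B (or C; they are equally up to date)

b.up = False                             # stop B: C alone is not a majority
print(elect_leader(cluster))             # -> None, C just waits

a.up, c.version = True, 12               # start A; C has moved on in the meantime
print(elect_leader(cluster).name)        # -> C, and A synchronizes from it

(With two nodes this rule can never make progress after a failure - that's
the price of requiring a majority - which is why it only buys you anything
with three nodes or more.)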


On Thu, Nov 15, 2012 at 5:28 PM, Eugene Kirpichov <ekirpichov at gmail.com>wrote:

> Hi Simon,
>
> Thank you, it all makes sense now.
>
> So, we can say "either reboot one node at a time, or - if you're rebooting
> all of them - make sure they start in the reverse of the order they were
> stopped in, or simultaneously within a 30-second window".
>
> Can we also say "if something bad happened, kill -9 all rabbits, then
> start them within a 30-second window"? [I'm talking kill -9 because in
> some cases with a messed-up startup order, rabbitmqctl stop also hangs]
>
>
> On Thu, Nov 15, 2012 at 4:33 PM, Simon MacMullen <simon at rabbitmq.com>wrote:
>
>> On 15/11/12 12:04, Eugene Kirpichov wrote:
>>
>>> Is RabbitMQ HA and clustering sufficiently reliable to use it in
>>> scenarios where the network is good, but nodes can reboot at any time?
>>>
>>
>> We believe so.
>>
>>
>>> My understanding was that this is what "HA" is supposed to mean, but
>>> then I read this:
>>>
>>> http://stackoverflow.com/questions/8654053/rabbitmq-cluster-is-not-reconnecting-after-network-failure
>>>
>>
>> This one was a network partition - clusters don't handle partitions well.
>>
>>> http://rabbitmq.1065348.n5.nabble.com/Cluster-nodes-stop-start-order-can-lead-to-failures-td21965.html
>>>
>>
>> This one is the stop-start ordering problem (discussed below).
>>
>>> http://rabbitmq.1065348.n5.nabble.com/Cluster-busting-shut-off-all-nodes-at-the-same-time-td22971.html
>>>
>>
>> As was this.
>>
>>> http://rabbitmq.1065348.n5.nabble.com/Repairing-a-a-crashed-cluster-td22466.html
>>>
>>
>> This one was unclear ("something happened"), but I took the question to
>> be about removing a node from a cluster when that node cannot come up. This
>> is handled badly in 2.x, but 3.0 will have a rabbitmqctl subcommand to do
>> that.
>>
>>> http://grokbase.com/t/rabbitmq/rabbitmq-discuss/125nxzf5nh/highly-available-cluster
>>>
>>
>> This is another stop-start ordering problem.
>>
>>
>>> And now I'm not so sure. It seems that there are a lot of scenarios
>>> where merely rebooting the nodes in some order brings the cluster into a
>>> state from which there is no automatic way out.
>>>
>>
>> So the most common problem you cited above looks like this (let's suppose
>> we have a two-node cluster, AB, for simplicity):
>>
>> 1) Stop B
>> 2) Stop A
>> 3) Start B
>> 4) Start A
>>
>> 3) will fail. More precisely, it will wait for 30 seconds to see if 4)
>> happens, and if not then it will fail.
>>
>> Why? Well, a lot could have happened between 1) and 2). You could have
>> declared or deleted all sorts of queues, changed everybody's password, all
>> sorts of things. B has no way to know; it was down.
>>
>> It *can't* (responsibly) start up by itself. So it has to wait around for
>> A to become available.
>>
>> To be more general, the last node to be stopped has to be the first one
>> to be started. No other node knows what has happened in the meantime!
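>>
>> To sketch the shape of that rule (just an illustration in Python, not
>> the actual startup code): imagine each node remembers which cluster
>> members were still running when it shut down, and only a node that saw
>> nobody else running may boot on its own.
>>
>> def start_node(running_when_i_stopped, reachable_now, timeout=30):
>>     # The node that stopped last saw an empty cluster, so it alone
>>     # knows the final state and may start immediately.
>>     if not running_when_i_stopped:
>>         return "start immediately"
>>     missing = running_when_i_stopped - reachable_now
>>     if missing:
>>         # Otherwise wait for the nodes that outlived us; give up
>>         # after the timeout rather than start from stale state.
>>         return "wait up to %ds for %s, then fail" % (timeout, sorted(missing))
>>     return "start, syncing from the nodes that outlived us"
>>
>> # The AB example above:
>> print(start_node(set(), set()))    # A: start immediately
>> print(start_node({"A"}, set()))    # B alone: wait up to 30s for ['A'], then fail
>> print(start_node({"A"}, {"A"}))    # B once A is back: start, syncing from it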
>>
>>
>>> Questions:
>>> 1) Is there a set of assumptions or procedures under which I can be
>>> *certain* that my RabbitMQ cluster will actually tolerate unexpected
>>> node failures? Maybe something like "no more than 1 node down at the
>>> same time", or "at least X seconds between reboots", or "after a node
>>> reboots, restart all rabbit instances" or "have at most 2 nodes" etc.?
>>> I'm asking because I need to at least document this to my customers.
>>>
>>
>> * Avoid network partitions. You can recover from them (see
>> http://next.rabbitmq.com/partitions.html) but they're a good way to
>> introduce problems.
>>
>> * If you stop all nodes, the first (disc) node to start should be the
>> last one that stopped.
>>
>> * If you have RAM nodes, start them after you've started some disc nodes.
>>
>>
>>> 2) To what degree are the issues described in those threads fixed in the
>>> next release of RabbitMQ - 3.0.0, and how soon is it expected to be
>>> production-ready?
>>>
>>
>> 3.0.0 will not remove this stop-start ordering constraint. I don't see
>> how anything can.
>>
>> However, it will have some enhancements to make clustering problems
>> easier to detect and fix (such as removing a dead node without its
>> cooperation, and making sure you don't get into a state where nodes
>> disagree on whether they are clustered with each other), and it will
>> also detect and warn more clearly about network partitions.
>>
>> It should be available any day now.
>>
>> Cheers, Simon
>>
>> --
>> Simon MacMullen
>> RabbitMQ, VMware
>>
>
>
>
> --
> Eugene Kirpichov
> http://www.linkedin.com/in/eugenekirpichov
> We're hiring! http://tinyurl.com/mirantis-openstack-engineer
>



-- 
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
We're hiring! http://tinyurl.com/mirantis-openstack-engineer