[rabbitmq-discuss] Clustered startup with multiple queues and multiple masters

Matt Pietrek mpietrek at skytap.com
Mon Jun 18 19:16:40 BST 2012


Francesco,

Thanks very much for the detailed reply. It was extremely helpful.

A few more clarifying questions and background.

All of our nodes are disk-based.
Version upgrades aren't an issue. We plan to keep all nodes in lock step.

Question 1
----------------
When you suggested starting a cluster after an abrupt, unplanned
shutdown of all nodes, you said:

> 1) Start one disc node. If it hangs waiting for the table, try the next one until one works.

I stumbled across this in my own earlier experimentation. The
question is: do I risk message loss by starting first a node that
joined the cluster late and therefore doesn't hold the full set of
messages that the other nodes have?
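
For concreteness, the kind of sequence I have in mind is roughly the
following, where the host names are just placeholders and the check for
the rabbit app in the status output is my rough guess at "it finished
booting rather than hanging on the tables":

    # Try each disc node in turn; move on if it doesn't finish booting.
    for host in play play2 play3; do
        ssh "$host" 'rabbitmq-server -detached'
        sleep 30
        if ssh "$host" 'rabbitmqctl status' 2>/dev/null | grep -q '{rabbit,'; then
            echo "$host came up cleanly; start the remaining nodes against it"
            break
        fi
        ssh "$host" 'rabbitmqctl stop' || true   # assume it was stuck waiting for tables
    done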

Question 2
---------------
Related to the above scenario, is there any danger, after an unplanned
shutdown, in simply letting all the nodes start in parallel and letting
Mnesia's table-loading wait sort out the order? It seems to work OK in
my limited testing so far, but I don't know whether we're risking data
loss.
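
In other words, something along these lines, with no attempt to order
the nodes (host names are again placeholders):

    # Kick off all the brokers at roughly the same time and let
    # Mnesia's table-loading wait decide who effectively comes up first.
    for host in play play2 play3; do
        ssh "$host" 'rabbitmq-server -detached' &
    done
    wait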


Question 3
---------------
You said:

> In other words, it's up to you to restart them so that the node with the most up-to-date mnesia is started first

Is there any information recorded anywhere (e.g. in the logs) that
would indicate which node has the "most up-to-date" Mnesia database? I
see messages like:

 > Mirrored-queue (queue 'charon' in vhost '/'): Promoting slave
<rabbit@play2.1.273.0> to master

but I don't know whether they necessarily correlate with which node has
the most up-to-date Mnesia database.
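
For what it's worth, this is roughly how I've been looking, assuming
the stock Ubuntu package log location (the path will differ on other
setups):

    # Look for slave promotions and Mnesia-related messages on each node.
    grep -h 'Promoting slave' /var/log/rabbitmq/rabbit@*.log
    grep -hi 'mnesia' /var/log/rabbitmq/rabbit@*.log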


Thanks very much for your help on this,

Matt

On Wed, Jun 13, 2012 at 2:54 AM, Francesco Mazzoli
<francesco at rabbitmq.com> wrote:
> Hi,
>
>> As I understand from other messages on this forum, in a clustered
>> setup, the last node shut down should be the first node set up. Again
>> (in my possibly incorrect assumption), this is because Rabbit and/or
>> Mnesia may wait for what they believe to be the previous master to
>> come up first.
>
> That's correct: this is because mnesia wants to make sure that the
> node with the most up-to-date dataset starts up first, so that we
> avoid diverging tables.
>
>> Now, consider a situation like this, where there are N queues that are
>> mastered on different brokers (e.g., rabbit@play, rabbit@play2). If we
>> pulled the power cord on all these machines, what should the node
>> startup order be?
>
> If you shut down the nodes abruptly, rabbit won't complain when you
> start the nodes back in whatever order, because it won't know which
> nodes were running at the time of shutdown (that information is
> normally recorded to a file during the shutdown sequence). In other
> words, it's up to you to restart them so that the node with the most
> up-to-date mnesia is started first (if mnesia decides that we're not
> the most up-to-date one, it will hang waiting for the table copies on
> the other nodes, which are offline).
>
>> And at the risk of asking a broader question, what is the recommended
>> approach to restarting from a catastrophic power failure where all
>> nodes go down within a very short period of time?
>
> I would say that the safest thing to do here is:
>
>  1) Start one disc node. If it hangs waiting for the table, try the
>     next one until one works. If none works, things are ugly, and
>     I can think of ways of fixing them manually but that's more
>     complicated (and dangerous)
>  2) Start another node without starting rabbit (you can do
>     that setting the RABBITMQ_NODE_ONLY env variable)
>  3) Reset it, force_cluster it to the disc node you brought up,
>     and then reset it again. This will make the disc node believe that
>     the original node has left the cluster (see the command sketch
>     after this list).
>  4) Once you have done this for each node, you will be left with only
>     one node which is not in a cluster, and you can cluster your nodes
>     back to that one.
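>
> Roughly, steps 2 and 3 boil down to something like the following on
> each of the other nodes, where rabbit@survivor stands for the disc
> node you brought up in step 1 (check the exact rabbitmqctl syntax for
> your version):
>
>     # Start the Erlang node only, without booting the rabbit app.
>     RABBITMQ_NODE_ONLY=true rabbitmq-server -detached
>     rabbitmqctl reset
>     rabbitmqctl force_cluster rabbit@survivor
>     rabbitmqctl reset    # survivor now believes this node has left the cluster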
>
> This is pretty ugly but it's the only safe way in all situations, due
> to the possibility of the nodes performing upgrades. If you're sure
> that the nodes won't need to upgrade (e.g. same version of rabbit and
> erlang) you can perform step 1 and then just start the other nodes
> normally later, and it should be OK. Someone else in the team might
> have a better idea, but I don't :).
>
> By the way, we're working hard on making this process, and clustering
> in general, simpler and safer, so things should be better in the
> future.
>
> Francesco
>
> At Tue, 12 Jun 2012 10:29:52 -0700,
> Matt Pietrek wrote:
>>
>> Looking for some clarification here.
>>
>> As I understand from other messages on this forum, in a clustered
>> setup, the last node shut down should be the first node set up. Again
>> (in my possibly incorrect assumption), this is because Rabbit and/or
>> Mnesia may wait for what they believe to be the previous master to
>> come up first. By starting up the "master" first, any blocking/waiting
>> can be avoided. In addition, message loss can be avoided by preventing
>> a prior out-of-sync slave from becoming the master.
>>
>> Now, consider a situation like this, where there are N queues that are
>> mastered on different brokers (e.g., rabbit@play, rabbit@play2). If we
>> pulled the power cord on all these machines, what should the node
>> startup order be?
>>
>> real_cm         rabbit@play   +2  HA D  Active  0 0 0
>> aliveness-test  rabbit@play             Active  0 0 0
>> carbon          rabbit@play   +2  HA D  Idle    0 0 0
>> cmcmd           rabbit@play   +2  HA D  Idle    0 0 0
>> fake_cm         rabbit@play2  +2  HA D  Idle    0 0 0
>> fake_mu_queue   rabbit@play2  +2  HA D  Idle    0 0 0
>> fake_service_2  rabbit@play   +2  HA D  Idle    0 0 0
>> random          rabbit@play   +2  HA D  Idle
>>
>> And at the risk of asking a broader question, what is the recommended
>> approach to restarting from a catastrophic power failure where all
>> nodes go down within a very short period of time?
>>
>> In our experiments with RabbitMQ 2.8.2, Ubuntu 10.04 and Erlang R13B03,
>> it's a total crap shoot whether the cluster comes back up or hangs
>> with all nodes stuck at the "starting database...." point.
>>
>> Thanks,
>>
>> Matt

