[rabbitmq-discuss] Mirror queue recovery questions
Matthew Sackman
matthew at rabbitmq.com
Tue Sep 6 18:07:32 BST 2011
Hi Elias,
On Tue, Sep 06, 2011 at 09:59:26AM -0700, Elias Levy wrote:
> I did not get a response to my questions, so let me try again.
Sorry about that - it managed to slip down a crack...
> What happens to persisted messages in mirrored queue slaves if they are
> restarted and they don't find a queue master?
Assuming that at least one node that is meant to have a copy of the
queue on it survives, there'll always be a master. Thus other nodes
coming up will throw away their own contents, and start receiving new
content from the master. Note there is currently no eager
resynchronisation, as the documentation (hopefully) makes clear.
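
As a rough illustration of the rejoin behaviour described above, here is
a toy model in Python. The names ToyMirror, ToyMirroredQueue, publish and
rejoin are hypothetical and are not RabbitMQ APIs; the sketch only assumes
what is stated above: a restarting node throws away its own contents and
then receives only new content from the master.

```python
# Toy model only: these classes are hypothetical, not RabbitMQ APIs.
class ToyMirror:
    def __init__(self, name):
        self.name = name
        self.messages = []


class ToyMirroredQueue:
    def __init__(self, master):
        self.master = master
        self.slaves = []

    def publish(self, msg):
        # New messages reach the master and every slave currently attached.
        for node in [self.master] + self.slaves:
            node.messages.append(msg)

    def rejoin(self, node):
        # A restarting node discards whatever it had persisted ...
        node.messages.clear()
        # ... and is not eagerly resynchronised with the master's backlog.
        self.slaves.append(node)


master = ToyMirror("rabbit@a")
slave = ToyMirror("rabbit@b")
slave.messages = ["m1", "m2"]         # contents persisted before the restart
master.messages = ["m1", "m2", "m3"]  # the master carried on without it
queue = ToyMirroredQueue(master)

queue.rejoin(slave)
queue.publish("m4")
print(slave.messages)   # ['m4']
print(master.messages)  # ['m1', 'm2', 'm3', 'm4']
```

In this model the rejoined slave only catches up once the older messages
have been consumed from the master.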
In the event that the entire cluster is stopped, mnesia normally
requires that the last node to go down comes up first. Indeed, the
other nodes in the cluster may very well refuse to start up (they'll
just block in the "starting database" step) until the nodes that were
still alive when they went down reappear.
Thus, again, as the documentation says, as nodes fall off, the master
will continue to migrate until the mirrored queue has just one node left
which is its master. Due to mnesia, it's very likely that this node will
_have_ to come up before any of the nodes that died earlier can start
up.
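
The migration described above can be sketched with another toy model.
The function simulate_shutdown and the node names are hypothetical, and
the sketch assumes that the oldest surviving mirror is the one promoted
when the master goes away; it is not how Rabbit or mnesia implement this.

```python
# Toy model only: hypothetical names, not RabbitMQ or mnesia APIs.
def simulate_shutdown(mirrors, shutdown_order):
    """Return (last_master, masters_seen) as nodes stop one at a time.

    mirrors: node names, oldest first; mirrors[0] is the initial master.
    shutdown_order: the same node names in the order they are stopped.
    """
    alive = list(mirrors)
    masters_seen = [alive[0]]
    for node in shutdown_order:
        alive.remove(node)
        if alive and alive[0] != masters_seen[-1]:
            # The master went away, so a surviving mirror is promoted
            # (assumption: the oldest surviving mirror is chosen).
            masters_seen.append(alive[0])
    return masters_seen[-1], masters_seen


last_master, history = simulate_shutdown(
    ["rabbit@a", "rabbit@b", "rabbit@c"],
    ["rabbit@a", "rabbit@b", "rabbit@c"],
)
print(history)      # ['rabbit@a', 'rabbit@b', 'rabbit@c']
print(last_master)  # rabbit@c
```

Under these assumptions the last master is always the last node stopped,
which is also the node that would need to come up first.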
However, in catastrophic cases, such as instantaneous power loss to the
whole cluster, subsequent startup may well not be as orderly. In this
case, it's possible that the first node to recover that contained a
mirror of the queue, regardless of whether it was a master or slave,
takes the role of master. All other mirrors will then become slaves.
> When using mirrored queues, must we ensure that the last node shutdown that
> was participating in a mirrored queue, and thus the one that became master
> last, is the one that is restarted first, so that the last master is online
> when slaves join the cluster?
Yes, but as I say, the effect of mnesia (the distributed database that
Rabbit uses) is likely to enforce the correct startup ordering itself
(at least, it's quite good at that on recent versions of Erlang).
I hope that helps.
Matthew