[rabbitmq-discuss] Mirrored queue failover

Matthew Sackman matthew at rabbitmq.com
Thu Apr 5 12:26:33 BST 2012


Hi Kats,

(Just popping this back on the mailing list in case others are seeing
the same problem)

On Thu, Apr 05, 2012 at 05:47:36PM +0900, Katsushi Fukui wrote:
> But today I rebuilt a new 3-node cluster and (unfortunately) got the same situation again. Logs attached.
> 
> The results of list_queues are odd. rabbit3 now has an error, and list_queues on that node shows different results.
> rabbit1:
> # ./rabbitmqctl list_queues name durable pid slave_pids synchronised_slave_pids
> Listing queues ...
> que1	true	<rabbit@rabbit1.1.578.0>	[<rabbit@rabbit2.2.856.0>]	[<rabbit@rabbit2.2.856.0>]
> ...done.
> 
> rabbit2:
> # ./rabbitmqctl list_queues name durable pid slave_pids synchronised_slave_pids
> Listing queues ...
> que1	true	<rabbit@rabbit1.1.578.0>	[<rabbit@rabbit2.2.856.0>]	[<rabbit@rabbit2.2.856.0>]
> ...done.
> 
> rabbit3:
> # ./rabbitmqctl list_queues name durable pid slave_pids
> Listing queues ...
> que1	true	<rabbit@rabbit1.1.578.0>	[<rabbit@rabbit3.3.705.0>, <rabbit@rabbit2.2.856.0>]
> ...done.
> 
> Please check the logs rabbit1-3-script.log.
> 
> If I stop rabbit1 now, que1 loses all its slaves, like this:
> Listing queues ...
> que1    <rabbit@rabbit2.2.856.0>        []
> ...done.

I think this is actually a misreporting issue: the errors in the logs
suggest that the problem lies in querying the slaves for their status,
not that the slaves don't exist. That doesn't prove the slave on rabbit3
*does* exist, but the errors don't show that it doesn't, if you see
what I mean.

Could you repeat the test and, when you get to the same situation (i.e.
a slave seems to have vanished), stop both of the other nodes and then
check the logs of the "phantom" node?

So in your example above, you stopped rabbit1, which should promote
rabbit2 to master (and there should be log entries indicating that).
Then, even though it looks like there's no slave on rabbit3, try
stopping rabbit2 too, and see whether the queue still exists on rabbit3,
e.g. with rabbitmqctl -n rabbit@rabbit3 list_queues. Also check the logs
of rabbit3 again for messages about the promotion of a slave to master.
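
For what it's worth, the sequence above could be scripted roughly as
follows. This is only a sketch under a few assumptions: the node names
are the ones from your mail, the nodes are stopped with "rabbitmqctl
stop" (your mail doesn't say how you stopped them), and the CTL and LOG
variables and the failover_check function are names made up here. The
log path in particular is a guess and depends on your install. By
default the script only prints the commands it would run; set
CTL=./rabbitmqctl to run them for real.

```shell
#!/bin/sh
# Sketch only. Node names come from the thread; CTL, LOG and
# failover_check are hypothetical names introduced for illustration.
# Dry run by default: CTL just echoes the rabbitmqctl commands.
CTL="${CTL:-echo rabbitmqctl}"
# Assumed location of rabbit3's log; adjust to your installation.
LOG="${LOG:-/var/log/rabbitmq/rabbit@rabbit3.log}"

failover_check() {
    # 1. Stop the current master; rabbit2 should be promoted.
    $CTL -n rabbit@rabbit1 stop
    $CTL -n rabbit@rabbit2 list_queues name pid slave_pids

    # 2. Stop rabbit2 too, even if no slave is listed on rabbit3.
    $CTL -n rabbit@rabbit2 stop

    # 3. Does the queue still exist on the "phantom" node?
    $CTL -n rabbit@rabbit3 list_queues name pid slave_pids

    # 4. Look for promotion messages in rabbit3's log. The exact wording
    #    of the log entry is not assumed, hence the loose pattern.
    if [ -r "$LOG" ]; then
        grep -i 'promot' "$LOG" || echo "no promotion messages in $LOG"
    else
        echo "log file $LOG not readable; check rabbit3's log by hand"
    fi
}

failover_check
```

If que1 still shows up in step 3 with a pid on rabbit3, the slave was
there all along and only the reporting was wrong.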

Matthew

