[rabbitmq-discuss] Mirrored queue failover

Katsushi Fukui ka.fukui at ms.scsk.jp
Thu Apr 19 06:41:59 BST 2012


Hi Matthew,

I found some interesting behavior, and the logs to go with it, when I repeatedly restarted a node in the cluster.
I built a new cluster and checked the logs of the mirrored queue again. The master of the mirrored queue is now on rabbit1, and the slaves are on rabbit2 and rabbit3. When rabbit3 is stopped, the log on rabbit1 shows:
=INFO REPORT==== 19-Apr-2012::11:18:58 ===
Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit@rabbit1.2.594.0> saw deaths of mirrors <rabbit@rabbit3.3.229.0>

=INFO REPORT==== 19-Apr-2012::11:18:59 ===
rabbit on node rabbit@rabbit3 down


This means that que1 detected the death of its slave on rabbit3 and that the node went down. But if I restart the slaves over and over, the log sometimes looks like this:
=INFO REPORT==== 19-Apr-2012::11:56:26 ===
Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit@rabbit1.2.594.0> saw deaths of mirrors <rabbit@rabbit3.3.229.0>

=INFO REPORT==== 19-Apr-2012::11:56:26 ===
rabbit on node rabbit@rabbit3 down

=INFO REPORT==== 19-Apr-2012::11:56:35 ===
rabbit on node rabbit@rabbit3 up

=INFO REPORT==== 19-Apr-2012::11:56:38 ===
rabbit on node rabbit@rabbit2 down

=INFO REPORT==== 19-Apr-2012::11:59:32 ===
rabbit on node rabbit@rabbit2 up


Although rabbit2 was stopped, que1 did not report the death of its mirror. rabbit2 is now up again, but que1 has only one slave, on rabbit3. Next, I stopped rabbit3, and the log shows:
=INFO REPORT==== 19-Apr-2012::12:00:04 ===
Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit@rabbit1.2.594.0> saw deaths of mirrors <rabbit@rabbit2.3.229.0> <rabbit@rabbit3.1.227.0>

=INFO REPORT==== 19-Apr-2012::12:00:04 ===
rabbit on node rabbit@rabbit3 down

que1 detected the deaths of two slaves in a single report. Finally, I restarted rabbit2 and rabbit3, and que1 had two slaves again.
It looks like there are cases in which a mirrored queue cannot detect the failure of its slaves.
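
To make this easier to spot in a long log, here is a rough Python sketch that scans the master's log for "node down" events that have no mirror-death report for the same node. It is only an illustration based on the log lines shown above: the function name and the regular expressions are my own, it assumes the death report is logged before the "down" line (as in the excerpts here), and it does not distinguish between queues. A node that held no mirror at all would also be flagged, so treat the result as a hint rather than proof.

```python
import re

# Matches a pid such as "<rabbit@rabbit3.3.229.0>" and captures name and host.
# " at " is accepted as well, because list archives often rewrite "@" that way.
PID_RE = re.compile(r"<(\w+)(?:@| at )([\w.-]+?)\.\d+\.\d+\.\d+>")
# Captures everything after "saw deaths of mirrors" (the list of dead mirrors).
DEATHS_RE = re.compile(r"saw deaths of mirrors (.*)")
# Matches "rabbit on node rabbit@rabbit3 down" (or "up").
NODE_RE = re.compile(r"rabbit on node (\w+)(?:@| at )([\w.-]+) (up|down)")


def unreported_mirror_losses(log_text):
    """Return nodes that went down without a preceding mirror-death report."""
    reported = set()  # nodes named in a death report since they last came up
    missed = []
    for line in log_text.splitlines():
        deaths = DEATHS_RE.search(line)
        if deaths:
            for name, host in PID_RE.findall(deaths.group(1)):
                reported.add(f"{name}@{host}")
            continue
        event = NODE_RE.search(line)
        if not event:
            continue
        node = f"{event.group(1)}@{event.group(2)}"
        if event.group(3) == "down" and node not in reported:
            missed.append(node)
        elif event.group(3) == "up":
            # A restarted node starts with a clean slate.
            reported.discard(node)
    return missed
```

Fed the second log excerpt above (11:56:26 to 11:59:32), this should report only rabbit@rabbit2, which matches what I saw: rabbit2 went down but que1 never logged the death of its mirror.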

Kats


> Thank you, Matthew,
>
> I created an exchange "ex1" bound to que1 and, as you suggested, published a message using a publisher connected to rabbit3.
> The publish went through without returning an error, but the log on rabbit3 shows this error:
> =ERROR REPORT==== 17-Apr-2012::11:12:30 ===
> Discarding message {'$gen_cast',{deliver,{delivery,false,false,<0.3061.0>,{basic_message,{resource,<<1 byte>>,exchange,<<3 bytes>>},[<<5 bytes>>],{content,60,{'P_basic',<<10 bytes>>,undefined,undefined,2,0,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<15 bytes>>,rabbit_framing_amqp_0_9_1,[<<12 bytes>>]},<<16 bytes>>,true},undefined},flow}} from <0.3061.0> to <0.229.0> in an old incarnation (3) of this node (2)
>
> The result of list_queues is:
> # ./rabbitmqctl list_queues name messages slave_pids synchronised_slave_pids
> Listing queues ...
> que1 1 [<rabbit@rabbit2.1.229.0>] [<rabbit@rabbit2.1.229.0>]
> ...done.
>
> I wonder whether this problem occurs only in my environment.
>
>
>
> (2012/04/16 22:16), Matthew Sackman wrote:
>> Hi,
>>
>> I'm afraid I really don't know what to suggest. It looks like somehow
>> Erlang is not properly noticing rabbit3 coming back up, and getting
>> itself very confused. I don't know why this would be.
>>
>> You say you're already using 2.8.1 and R15B, so I can't really just
>> suggest upgrading. Perhaps one further thing to test is when rabbit3
>> comes back up, connect a publisher to it (specifically rabbit3 rather
>> than any other node) and try to publish to an exchange routing to the
>> problematic queue. I'd be curious to know whether those messages
>> actually end up in the queue or not.
>>
>> For some reason, whilst rabbit3 seems to know about 1 and 2, 1 and 2 are
>> a long way from convinced they know about 3. Something is very unhappy
>> in the cluster, and I've really no idea why.
>>
>> Matthew
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>
