[rabbitmq-discuss] Mirrored queue failover

Katsushi Fukui ka.fukui at ms.scsk.jp
Tue May 8 10:09:13 BST 2012


I also checked this issue with v2.8.2, but the situation has not improved.

Kats


(2012/04/19 14:41), Katsushi Fukui wrote:
> Hi Matthew,
>
> I found some interesting behavior and log output when I repeatedly restarted a node of the cluster.
> I built a new cluster and checked the logs of the mirrored queue again. The master of the mirrored queue is now on rabbit1, and the slaves are on rabbit2 and rabbit3. When rabbit3 is stopped, the log on rabbit1 shows:
> =INFO REPORT==== 19-Apr-2012::11:18:58 ===
> Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit@rabbit1.2.594.0> saw deaths of mirrors <rabbit@rabbit3.3.229.0>
>
> =INFO REPORT==== 19-Apr-2012::11:18:59 ===
> rabbit on node rabbit@rabbit3 down
>
>
> This means that que1 detected the death of the queue slave on rabbit3 and that the node went down. But if I restart the slaves over and over, the logs sometimes look like this:
> =INFO REPORT==== 19-Apr-2012::11:56:26 ===
> Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit@rabbit1.2.594.0> saw deaths of mirrors <rabbit@rabbit3.3.229.0>
>
> =INFO REPORT==== 19-Apr-2012::11:56:26 ===
> rabbit on node rabbit@rabbit3 down
>
> =INFO REPORT==== 19-Apr-2012::11:56:35 ===
> rabbit on node rabbit@rabbit3 up
>
> =INFO REPORT==== 19-Apr-2012::11:56:38 ===
> rabbit on node rabbit@rabbit2 down
>
> =INFO REPORT==== 19-Apr-2012::11:59:32 ===
> rabbit on node rabbit@rabbit2 up
>
>
> Although rabbit2 was stopped, que1 did not report the death of its mirror. rabbit2 is now up again, but que1 has only one slave, on rabbit3. Next, I stopped rabbit3, and the logs show:
> =INFO REPORT==== 19-Apr-2012::12:00:04 ===
> Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit@rabbit1.2.594.0> saw deaths of mirrors <rabbit@rabbit2.3.229.0> <rabbit@rabbit3.1.227.0>
>
> =INFO REPORT==== 19-Apr-2012::12:00:04 ===
> rabbit on node rabbit@rabbit3 down
>
> que1 detected the deaths of two slaves. Finally, I restarted rabbit2 and rabbit3, after which que1 had two slaves again.
> It looks like there are cases where a mirrored queue cannot detect the failure of its slaves.
>
> Kats
>
>
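For anyone who wants to look for the same pattern in their own logs, here is a rough Python sketch of the check described above. It is not part of RabbitMQ; the function name and the sample constant are made up for illustration. It assumes the log lines look like the excerpts quoted above, and that the master's "saw deaths of mirrors" entry is written before the matching "rabbit on node ... down" entry, as it is in those excerpts. Because the list archive rewrites "@" as " at ", both spellings of the node name are accepted.

```python
import re

# Node name inside a pid such as <rabbit@rabbit3.3.229.0>.
# The archive renders "@" as " at ", so both forms are accepted.
PID_RE = re.compile(r"<(\S+?)(?:@| at )(\S+?)\.\d+\.\d+\.\d+>")
DEATHS_RE = re.compile(r"saw deaths of mirrors (.*)")
NODE_RE = re.compile(r"rabbit on node (\S+?)(?:@| at )(\S+) (up|down)")

# Log excerpt copied from the message above (timestamps omitted).
SAMPLE_LOG = """\
Mirrored-queue (queue 'que1' in vhost '/'): Master <rabbit at rabbit1.2.594.0> saw deaths of mirrors <rabbit at rabbit3.3.229.0>
rabbit on node rabbit at rabbit3 down
rabbit on node rabbit at rabbit3 up
rabbit on node rabbit at rabbit2 down
rabbit on node rabbit at rabbit2 up
"""


def unreported_mirror_deaths(log_text):
    """Return nodes that went down with no prior mirror-death report.

    Hypothetical helper: it only understands the two kinds of log
    lines quoted in this thread and ignores everything else.
    """
    reported = set()   # nodes whose mirror death the master has logged
    unreported = []    # nodes that went down without such a log entry
    for line in log_text.splitlines():
        deaths = DEATHS_RE.search(line)
        if deaths:
            # Only the pids after "saw deaths of mirrors" are mirrors;
            # the master pid earlier in the line is skipped.
            for name, host in PID_RE.findall(deaths.group(1)):
                reported.add(f"{name}@{host}")
            continue
        event = NODE_RE.search(line)
        if not event:
            continue
        node = f"{event.group(1)}@{event.group(2)}"
        if event.group(3) == "down":
            if node not in reported:
                unreported.append(node)
        else:
            # Node is back up: a later stop needs a fresh report.
            reported.discard(node)
    return unreported


if __name__ == "__main__":
    print(unreported_mirror_deaths(SAMPLE_LOG))
```

On the excerpt above this flags only rabbit2, which matches what the logs show: its stop at 11:56:38 has no accompanying "saw deaths of mirrors" entry.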
>> Thank you, Matthew,
>>
>> I created an exchange "ex1" bound to que1 and, as you suggested, published a message from a publisher connected to rabbit3.
>> The publish succeeded without returning an error, but the log on rabbit3 shows this error:
>> =ERROR REPORT==== 17-Apr-2012::11:12:30 ===
>> Discarding message {'$gen_cast',{deliver,{delivery,false,false,<0.3061.0>,{basic_message,{resource,<<1 byte>>,exchange,<<3 bytes>>},[<<5 bytes>>],{content,60,{'P_basic',<<10 bytes>>,undefined,undefined,2,0,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},<<15 bytes>>,rabbit_framing_amqp_0_9_1,[<<12 bytes>>]},<<16 bytes>>,true},undefined},flow}} from <0.3061.0> to <0.229.0> in an old incarnation (3) of this node (2)
>>
>> The result of list_queues is:
>> # ./rabbitmqctl list_queues name messages slave_pids synchronised_slave_pids
>> Listing queues ...
>> que1 1 [<rabbit@rabbit2.1.229.0>] [<rabbit@rabbit2.1.229.0>]
>> ...done.
>>
>> I wonder whether this problem occurs only in my environment.
>>
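To make the list_queues check above repeatable, the output can be compared against the set of nodes that are supposed to host a slave. The following Python sketch parses text in the shape shown above (name, messages, slave_pids, synchronised_slave_pids). The function names and the list of expected nodes are assumptions for illustration, not anything RabbitMQ provides. Columns are split on any whitespace, so queue names containing spaces are not handled.

```python
import re

# One result row: name, message count, then two bracketed pid lists.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)\s+\[(.*?)\]\s+\[(.*?)\]\s*$")
# Node name inside a pid; the archive renders "@" as " at ".
PID_RE = re.compile(r"<(\S+?)(?:@| at )(\S+?)\.\d+\.\d+\.\d+>")

# Output copied from the message above.
SAMPLE_OUTPUT = """\
Listing queues ...
que1 1 [<rabbit at rabbit2.1.229.0>] [<rabbit at rabbit2.1.229.0>]
...done.
"""


def pid_nodes(pid_list):
    """Return the set of node names found in a bracketed pid list."""
    return {f"{name}@{host}" for name, host in PID_RE.findall(pid_list)}


def missing_mirrors(output, expected_slave_nodes):
    """Map queue name -> expected slave nodes with no slave pid listed.

    Hypothetical helper: expected_slave_nodes is supplied by the
    caller; nothing here asks the broker where mirrors should live.
    """
    missing = {}
    for line in output.splitlines():
        row = LINE_RE.match(line.strip())
        if not row:
            continue  # skips "Listing queues ..." and "...done."
        name, _messages, slaves, _synchronised = row.groups()
        absent = set(expected_slave_nodes) - pid_nodes(slaves)
        if absent:
            missing[name] = sorted(absent)
    return missing


if __name__ == "__main__":
    # Assumption: slaves are expected on rabbit2 and rabbit3.
    print(missing_mirrors(SAMPLE_OUTPUT, ["rabbit@rabbit2", "rabbit@rabbit3"]))
```

With slaves expected on rabbit2 and rabbit3, the sample reports que1 as missing its mirror on rabbit3, which is the node the publisher was connected to.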



