[rabbitmq-discuss] Delay between minority detected and stopped server

Malte Schirmacher mas at crosscan.com
Tue Aug 13 16:45:24 BST 2013


On 13/08/13 16:20, Simon MacMullen wrote:
> Hi.

Hi Simon,

> It can happen when a node gets two nodedown notifications in rapid
> succession, both of which push it into a minority.

Shouldn't this behavior be fixed quickly?
Suppose the publisher is running on the same machine as the broker and
someone pulls the network cable out of this box.
The node is then pushed into a minority instantaneously, but it could still
accept messages via 127.0.0.1, leading to inconsistency between the
mirrored queues, which is exactly what we tried to prevent by using pause_minority...
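
To make clear what I mean by "pushed into minority", here is a minimal
sketch of the decision as I understand it. It assumes the rule is "a node
is in a minority when it can reach half or fewer of the cluster nodes,
counting itself". The function name in_minority is my own illustration and
not part of RabbitMQ.

```python
def in_minority(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a node that can reach `reachable_nodes` members
    (counting itself) out of `cluster_size` is in a minority.

    Assumption: "minority" means half or fewer of the cluster nodes.
    """
    if not 1 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 1 and cluster_size")
    return 2 * reachable_nodes <= cluster_size


# Three-node cluster with two peers gone: the survivor sees only itself.
print(in_minority(1, 3))  # True
# Three-node cluster with one peer gone: the remaining two keep running.
print(in_minority(2, 3))  # False
```

With the cable pulled, the box sees only itself, so the first case applies
from the moment the other nodes are reported down.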

> I'm not able to say what happened to make it take that long to pause,
> but be aware that the pause is (nearly) a complete shutdown so it's not
> inconceivable it could take some time. Was there anything in the logs
> between 13-Aug-2013::11:02:31 and 13-Aug-2013::11:03:46?

Nope, the log was complete.
While I do not know a way to reproduce this behavior, it happened again this
afternoon, and I made sure I could still publish messages via 127.0.0.1,
as this log shows:

=WARNING REPORT==== 13-Aug-2013::13:37:20 ===
Cluster minority status detected - awaiting recovery

=ERROR REPORT==== 13-Aug-2013::13:37:20 ===
Error in process <0.872.0> on node 'rabbit@rabbit-test-1' with exit value:
{badarg,[{erlang,register,[rabbit_outside_app_process,<0.872.0>],[]},{rabbit_node_monitor,'-run_outside_applications/1-fun-0-',1,[{file,"src/rabbit_node_monitor.erl"},{line,391}]}]}


=INFO REPORT==== 13-Aug-2013::13:37:53 ===
accepting AMQP connection <0.989.0> (127.0.0.1:45364 -> 127.0.0.1:5672)

=WARNING REPORT==== 13-Aug-2013::13:38:06 ===
closing AMQP connection <0.989.0> (127.0.0.1:45364 -> 127.0.0.1:5672):
connection_closed_abruptly

=INFO REPORT==== 13-Aug-2013::13:38:38 ===
stopped STOMP TCP Listener on [::]:61613

=INFO REPORT==== 13-Aug-2013::13:38:38 ===
stopped TCP Listener on 127.0.0.1:5672

=INFO REPORT==== 13-Aug-2013::13:38:38 ===
stopped TCP Listener on 192.168.123.136:5672


Again, this log is complete.
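
To put a number on the delay, the gap between the minority warning and
the listeners stopping can be computed from the timestamps in the two
logs. This is a small sketch that assumes the timestamp format shown in
these log excerpts and an English locale for the month name. The helper
name delay_seconds is only illustrative.

```python
from datetime import datetime

# Timestamp layout as it appears in the log excerpts, e.g. 13-Aug-2013::13:37:20
LOG_TS_FORMAT = "%d-%b-%Y::%H:%M:%S"


def delay_seconds(detected: str, stopped: str) -> float:
    """Seconds between the minority warning and the listeners stopping."""
    start = datetime.strptime(detected, LOG_TS_FORMAT)
    end = datetime.strptime(stopped, LOG_TS_FORMAT)
    return (end - start).total_seconds()


# This afternoon's log (rabbit-test-1)
print(delay_seconds("13-Aug-2013::13:37:20", "13-Aug-2013::13:38:38"))  # 78.0
# This morning's log from my first mail (rabbit-test-3)
print(delay_seconds("13-Aug-2013::11:02:31", "13-Aug-2013::11:03:46"))  # 75.0
```

So in both cases the broker stayed reachable for more than a minute after
it had detected the minority.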


> When you say "the aforementioned bug" which are you talking about?

I meant the bug fixed by the patch I linked:
http://hg.rabbitmq.com/rabbitmq-server/rev/be0b06386a8c
Its tag suggests it belongs to bug #25700.


> Cheers, Simon

>
> On 13/08/2013 10:29AM, Malte Schirmacher wrote:
>> Hi,
>>
>> we are using rabbitmq 3.1.4 with this bugfix [1] applied.
>>
>> Playing around with the clustering features, I came across the following
>> situation.
>> I killed 2 out of 3 machines from the cluster. As expected the remaining
>> machine detected its minority and due to the patch it stopped itself.
>> But there was a delay between the minority detection and stopping the
>> server as the following log entries show:
>>
>> =WARNING REPORT==== 13-Aug-2013::11:02:31 ===
>> Cluster minority status detected - awaiting recovery
>>
>> =ERROR REPORT==== 13-Aug-2013::11:02:31 ===
>> Error in process <0.2836.0> on node 'rabbit@rabbit-test-3' with exit value:
>> {badarg,[{erlang,register,[rabbit_outside_app_process,<0.2836.0>],[]},{rabbit_node_monitor,'-run_outside_applications/1-fun-0-',1,[{file,"src/rabbit_node_monitor.erl"},{line,391}]}]}
>>
>>
>> =INFO REPORT==== 13-Aug-2013::11:03:46 ===
>> stopped STOMP TCP Listener on [::]:61613
>>
>> =INFO REPORT==== 13-Aug-2013::11:03:46 ===
>> stopped TCP Listener on [::]:5672
>>
>>
>> So far I have been unable to reproduce this situation. Still, I'm
>> worried, as I have somewhat lost confidence in the clustering abilities of
>> rabbitmq due to the aforementioned bug (lost messages, lost queues, lost
>> HA-policies, you name it).
>>
>> Anyone able to tell me what went wrong here?
>>
>> Thanks in advance
>>     malte
>>
>>
>> [1] http://hg.rabbitmq.com/rabbitmq-server/rev/be0b06386a8c
>> --
>> Geschaeftsanschrift/Business Address: crosscan GmbH | Ruhrstraße 48 |
>> 58452 Witten | Germany
>> Support: +49.2302.28232-22 Phone: +49.2302.28232-00 Fax:
>> +49.2302.28232-09 Geschaeftsfuehrung/Management Board: Philip Lehmann,
>> Erwin Berg, Ulrich Kellner
>> Sitz Witten, Amtsgericht Bochum, HRB 8036/Registered Office Witten,
>> Commercial Register of the Bochum County Court, HRB 8036
>> UST-ID-Nr./VAT-IdNo.: DE234398770
>>
>



