[rabbitmq-discuss] Delay between minority detected and stopped server

Malte Schirmacher mas at crosscan.com
Tue Aug 13 10:29:35 BST 2013


Hi,

We are using RabbitMQ 3.1.4 with this bugfix [1] applied.

While playing around with the clustering features I came across the
following situation.
I killed 2 out of 3 machines in the cluster. As expected, the remaining
machine detected that it was in the minority and, thanks to the patch,
stopped itself. But there was a delay between the minority detection and
the server stopping, as the following log entries show:

=WARNING REPORT==== 13-Aug-2013::11:02:31 ===
Cluster minority status detected - awaiting recovery

=ERROR REPORT==== 13-Aug-2013::11:02:31 ===
Error in process <0.2836.0> on node 'rabbit@rabbit-test-3' with exit value:
{badarg,[{erlang,register,[rabbit_outside_app_process,<0.2836.0>],[]},
         {rabbit_node_monitor,'-run_outside_applications/1-fun-0-',1,
          [{file,"src/rabbit_node_monitor.erl"},{line,391}]}]}


=INFO REPORT==== 13-Aug-2013::11:03:46 ===
stopped STOMP TCP Listener on [::]:61613

=INFO REPORT==== 13-Aug-2013::11:03:46 ===
stopped TCP Listener on [::]:5672
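
To put a number on the gap, here is a minimal sketch that parses the
report headers shown above and prints the difference in seconds. The
helper name report_time and the regular expression are made up for
illustration, and the header layout is assumed to be exactly as it
appears in this log excerpt.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the report headers above.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed header layout: "=WARNING REPORT==== 13-Aug-2013::11:02:31 ==="
HEADER = re.compile(r"(\d{1,2})-([A-Za-z]{3})-(\d{4})::(\d{2}):(\d{2}):(\d{2})")


def report_time(header):
    """Hypothetical helper: extract the timestamp from one report header."""
    match = HEADER.search(header)
    if match is None:
        raise ValueError("no timestamp found in: %r" % header)
    day, month, year, hour, minute, second = match.groups()
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second))


# Headers copied from the log entries above.
detected = report_time("=WARNING REPORT==== 13-Aug-2013::11:02:31 ===")
stopped = report_time("=INFO REPORT==== 13-Aug-2013::11:03:46 ===")

delay_seconds = int((stopped - detected).total_seconds())
print(delay_seconds)
```

For the entries above this comes to 75 seconds between the minority
warning and the first listener being stopped.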


I have not been able to reproduce this situation since. Still, I'm
worried, as I have somewhat lost confidence in the clustering abilities of
RabbitMQ due to the aforementioned bug (lost messages, lost queues, lost
HA policies, you name it).

Can anyone tell me what went wrong here?

Thanks in advance
   malte


[1] http://hg.rabbitmq.com/rabbitmq-server/rev/be0b06386a8c


