[rabbitmq-discuss] cluster buster

Mark Ward ward.mark at gmail.com
Wed Oct 17 17:04:34 BST 2012


Tim's reply to my earlier question about the inet_dist_listen settings 
suggests they might be affecting my testing, since the cluster in the tests 
described below was using a single port.  I have repeated a test with 7 
ports in the range and still received the same results.  I am in the 
process of running this test again with about 9 ports in the range.

The thread:
http://rabbitmq.1065348.n5.nabble.com/understanding-the-inet-dist-listen-min-inet-dist-listen-max-tc22644.html
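
For reference, a 7-port distribution range in the same configuration file 
might look like the sketch below.  The upper bound 55027 is an assumption 
chosen for illustration (55021 through 55027 inclusive is 7 ports); the 
range actually used in the retest is not stated in this message.

```erlang
%% Sketch only: kernel section with a 7-port range for Erlang distribution.
%% 55021 is the port from the original setup; 55027 is an assumed upper bound.
[ {kernel, [{inet_dist_listen_min, 55021},
            {inet_dist_listen_max, 55027}]}
].
```

The rabbit section of the original configuration would sit alongside the 
kernel section unchanged.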


On Wednesday, October 17, 2012 10:53:13 AM UTC-5, Mark Ward wrote:
>
> Hello,
> I am testing what happens to the cluster when nodes are shut down.  The 
> results of my testing have not been positive.  I hope to be advised 
> whether I have made a mistake in my cluster setup or whether something 
> really is going wrong.
>
> The issue: when I stop a node by stopping its RabbitMQ service through 
> Windows Services, the cluster does fail over the queue but does not 
> continue delivering messages to the subscriber.  The subscriber also does 
> not receive any errors.  The client's connection still shows a running 
> status, but no data flows.  I am able to reproduce this.
>
> I have added two test run results to this email.  I have more tests and 
> their results saved if more information is required, and I can also rerun 
> tests as needed.
>
> I have the following puny virtual machines for my cluster.
> 3 Virtual Machines
> • Windows 2008 R2 sp1 64-bit
> • 1 GB of RAM
> • 16 GB of drive space free
> RabbitMQ: 2.8.7
> Erlang: R15B02
> RabbitMQ is set up as follows on each of the nodes:
> [ {kernel, [{inet_dist_listen_min, 55021},
>             {inet_dist_listen_max, 55021}]},
>   {rabbit, [{cluster_nodes, ['rabbit@RIOBARON-1', 'rabbit@RIOOVERLORD-1',
>                              'rabbit@CUST1-MASTER']},
>             {vm_memory_high_watermark, 0.2}]}
> ].
>
> Virtual Host: MoBunnyMoProblems
> Queue: MoData, mirrored (all) and durable
> Admin is set up on all nodes.
> Using the .NET client version: 2.8.7 (nuget download)
> The publisher uses SelectConfirm with messages set to persistent.
> The subscriber uses BasicConsume with BasicQos set depending on the test 
> (10, 1000, 5).
> Message size was 3,196 bytes.
> 100,000 messages were sent in each test.
>
>
> ---------------------------------------------------------------------------------------------
> Test 01:
> Assumptions: shut down a cluster node.  A new server will become master 
> of the queue.  The subscriber should continue to receive messages.  
> Reconnection is possible and duplicate messages are possible.
> Result: Another node took ownership of the stopped node's queue.  The 
> subscriber failed to continue after the node was shut down.  The 
> connection was never dropped.
>
> Test Notes
> Before testing, make sure all servers in the cluster are active and see 
> each other.  Admin confirmed this with all nodes showing green.
> Publish 100k messages of 3,196 bytes each with no subscriber active.
> When all messages have been published, turn on the subscriber.
> While the subscriber is actively downloading messages, shut down the node 
> that is the master of the queue.
> The queue will fail over to the next oldest node.
> The subscriber connection will not be interrupted but will stop receiving 
> data.
>
> riobaron-1: the connection server.  The publisher will send to this 
> server.  The subscriber will connect to this server.
> riooverlord-1: MoData queue master.  queue was mirrored on riobaron-1 and 
> cust1-master.  Stats server.
> cust1-master: mirror: second oldest server in the cluster.  
>
> During publishing, the high watermarks for memory and drive space were 
> never reached.
>
> After riooverlord-1 was shut down, by stopping the RabbitMQ service in 
> the Windows Service manager, cust1-master was promoted to MoData queue 
> master.  Delivery to the subscriber stopped, but the connection was still 
> active.  The timeout on the connection was set to 25 seconds.  The 
> connection never timed out, even after letting it sit for 24 hours.
>
> Results of log files:
> riobaron-1's log file
> =INFO REPORT==== 16-Oct-2012::08:02:15 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' down
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:15 ===
> Mnesia('rabbit at RIOBARON-1'): ** ERROR ** mnesia_event got 
> {inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
>
> cust1-master's log file
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Statistics database started.
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Slave 
> <'rabbit at CUST1-MASTER'.2.255.0> saw deaths of mirrors 
> <'rabbit at RIOOVERLORD-1'.1.220.0> 
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Promoting slave 
> <'rabbit at CUST1-MASTER'.2.255.0> to master
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' down
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave 
> <'rabbit at CUST1-MASTER'.2.253.0> saw deaths of mirrors 
> <'rabbit at RIOOVERLORD-1'.1.219.0> 
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Promoting 
> slave <'rabbit at CUST1-MASTER'.2.253.0> to master
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:18 ===
> Mnesia('rabbit at CUST1-MASTER'): ** ERROR ** mnesia_event got 
> {inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
>
> riooverlord-1's log file
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: rabbitmq_management
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: rabbitmq_management_agent
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: rabbit
>     exited: stopped
>     type: permanent
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:16 ===
> Error in process <0.20312.2> on node 'rabbit at RIOOVERLORD-1' with exit 
> value: 
> {badarg,[{mnesia_tm,commit_participant,6,[{file,"mnesia_tm.erl"},{line,1750}]}]}
>
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: mnesia
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: os_mon
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> Halting Erlang VM
>
>
>
> ---------------------------------------------------------------------------------------------
> Test 02:
> Assumption: send data into the cluster.  Shut down the master node of the 
> queue.  The subscriber should continue to receive messages.
> Result: a new node was promoted to master of the queue upon the shutdown.  
> The subscriber did not receive data after the promotion.  The connection 
> was still running and the client reported no errors.
>
> Test Notes:
> All nodes in the cluster were online and green.  The queue was mirrored on 
> all nodes.
> The queue starts off with 0 messages.
>
> riobaron-1: connection server.  The publisher connects to this server.  
> The subscriber connects to this server.  Youngest server in the cluster.
> cust1-master: stats server and master of the MoData queue.  Oldest server.
> riooverlord-1: second oldest server.
>
> After the publisher completed, the subscriber was turned on.  While the 
> subscriber was connected and actively receiving data, the riobaron-1 
> server was shut down.  The subscriber reconnected to riooverlord-1 and 
> data delivery to the subscriber resumed.  During the subscription the 
> cust1-master node was shut down.  The subscription stopped; the 
> connection remained active and running.
> riooverlord-1 became the new master of the MoData queue.
>
> I waited on the cluster, but the connection remained active with no data 
> received from the cluster.  When the client is shut down and makes a new 
> connection, delivery resumes.
>
> Results from log files:
>
> riobaron-1: 
> =INFO REPORT==== 17-Oct-2012::08:15:35 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: rabbitmq_management
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: rabbitmq_management_agent
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: rabbit
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: mnesia
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: os_mon
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> Halting Erlang VM
>
> cust1-master:
> =INFO REPORT==== 17-Oct-2012::08:15:12 ===
> Statistics database started.
>
>
> =INFO REPORT==== 17-Oct-2012::08:15:32 ===
> rabbit on node 'rabbit at RIOBARON-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:15:38 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:16:03 ===
> Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node 
> 'rabbit at RIOOVERLORD-1': <3151.343.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:16:03 ===
> Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node 
> 'rabbit at RIOBARON-1': <3150.354.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:49 ===
> rabbit on node 'rabbit at RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:27:11 ===
> rabbit on node 'rabbit at RIOBARON-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:05 ===
> rabbit on node 'rabbit at RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     application: rabbitmq_management
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     application: rabbitmq_management_agent
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     application: rabbit
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     application: mnesia
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     application: os_mon
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
> Halting Erlang VM
>
> riooverlord-1:
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> rabbit on node 'rabbit at RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave 
> <'rabbit at RIOOVERLORD-1'.3.343.0> saw deaths of mirrors 
> <'rabbit at RIOBARON-1'.2.354.0> 
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Slave 
> <'rabbit at RIOOVERLORD-1'.3.258.0> saw deaths of mirrors 
> <'rabbit at RIOBARON-1'.2.252.0> 
>
>
>
>