[rabbitmq-discuss] cluster buster
Mark Ward
ward.mark at gmail.com
Wed Oct 17 17:04:34 BST 2012
Tim's reply to my earlier question about the inet_dist_listen settings
suggests my testing might be affected, because the cluster in the tests
described below was using a single port. I have since performed a test with
7 ports in the range and still received the same results. I am in the
process of doing this test again with about 9 ports in the range.
The thread:
http://rabbitmq.1065348.n5.nabble.com/understanding-the-inet-dist-listen-min-inet-dist-listen-max-tc22644.html
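
A 7-port range would be configured along the lines of the sketch below. The bounds 55021 and 55027 are assumed purely for illustration (the exact range used in the retest is not stated), and only the kernel section is shown; the rabbit section would stay as in the single-port configuration quoted below.

```erlang
%% rabbitmq.config (sketch): kernel section only.
%% Assumed bounds: 55021..55027 inclusive gives 7 distribution ports.
[ {kernel, [{inet_dist_listen_min, 55021},
            {inet_dist_listen_max, 55027}]}
].
```

Every port in the chosen range would presumably need to be open between all cluster nodes for inter-node connections to succeed.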
On Wednesday, October 17, 2012 10:53:13 AM UTC-5, Mark Ward wrote:
>
> Hello,
> I am testing what happens to the cluster when nodes are shut down. The
> results of my testing have not been positive. I hope to be advised either
> that I have made a mistake in my cluster setup or that something is really
> going wrong.
>
> The issue is that when I stop a node using Windows Services, the cluster
> does fail over the queue but does not continue delivering messages to the
> subscriber. The subscriber also does not receive any errors. The client's
> connection still has a running status but no data flows. I am able to
> reproduce this error.
>
> I have added two test run results to this email. I have more tests and
> their results saved if more information is required. I can also rerun the
> tests.
>
> I have the following puny virtual machines for my cluster.
> 3 Virtual Machines
> • Windows 2008 R2 sp1 64-bit
> • 1 GB of RAM
> • 16 GB of drive space free
> RabbitMQ: 2.8.7
> Erlang: R15B02
> RabbitMQ is set up as follows on each of the nodes:
> [ {kernel, [{inet_dist_listen_min, 55021}, {inet_dist_listen_max, 55021}]},
>   {rabbit, [{cluster_nodes, ['rabbit@RIOBARON-1', 'rabbit@RIOOVERLORD-1',
>                              'rabbit@CUST1-MASTER']},
>             {vm_memory_high_watermark, 0.2}]}
> ].
>
> Virtual Host: MoBunnyMoProblems
> Queue: MoData, mirrored (all) and durable
> Admin is set up on all nodes.
> Using the .NET client version 2.8.7 (NuGet download).
> The publisher uses ConfirmSelect with messages set to persistent.
> The subscriber uses BasicConsume with BasicQos set depending on the test
> (10, 1000, 5).
> Message size has been 3,196 bytes.
> 100,000 messages sent in each test.
>
>
> ---------------------------------------------------------------------------------------------
> Test 01:
> Assumptions: shut down a cluster node. A new server will become master of
> the queue. The subscriber should continue to receive messages. Reconnection
> is possible and duplicate messages are possible.
> Result: another node took ownership of the stopped node's queue. The
> subscriber failed to continue after the node was shut down. The connection
> was never dropped.
>
> Test Notes:
> Before testing, make sure all servers in the cluster are active and see
> each other. Admin confirmed this with all nodes showing green.
> Publish 100k messages, 3,196 bytes in size, with no subscriber active.
> When all messages have been published, turn on the subscriber.
> While the subscriber is actively downloading messages, shut down the node
> that is the master of the queue.
> The queue will fail over to the next oldest node.
> The subscriber connection will not be interrupted but will stop receiving
> data.
>
> riobaron-1: the connection server. The publisher will send to this
> server. The subscriber will connect to this server.
> riooverlord-1: MoData queue master. queue was mirrored on riobaron-1 and
> cust1-master. Stats server.
> cust1-master: mirror: second oldest server in the cluster.
>
> During publishing, the high watermarks for memory and drive space were
> never reached.
>
> After riooverlord-1 was shut down, using the Windows Service manager to stop
> the RabbitMQ service, cust1-master was promoted to MoData queue master.
> Data stopped flowing to the subscriber but the connection was still active.
> The timeout on the connection was set to 25 seconds. The connection never
> timed out, even after letting it sit for 24 hours.
>
> Results of log files:
> riobaron-1's log file
> =INFO REPORT==== 16-Oct-2012::08:02:15 ===
> rabbit on node 'rabbit@RIOOVERLORD-1' down
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:15 ===
> Mnesia('rabbit@RIOBARON-1'): ** ERROR ** mnesia_event got
> {inconsistent_database, bad_decision, 'rabbit@RIOOVERLORD-1'}
>
> cust1-master's log file
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Statistics database started.
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Slave
> <'rabbit@CUST1-MASTER'.2.255.0> saw deaths of mirrors
> <'rabbit@RIOOVERLORD-1'.1.220.0>
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Promoting slave
> <'rabbit@CUST1-MASTER'.2.255.0> to master
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> rabbit on node 'rabbit@RIOOVERLORD-1' down
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave
> <'rabbit@CUST1-MASTER'.2.253.0> saw deaths of mirrors
> <'rabbit@RIOOVERLORD-1'.1.219.0>
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Promoting
> slave <'rabbit@CUST1-MASTER'.2.253.0> to master
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:18 ===
> Mnesia('rabbit@CUST1-MASTER'): ** ERROR ** mnesia_event got
> {inconsistent_database, bad_decision, 'rabbit@RIOOVERLORD-1'}
>
> riooverlord-1's log file
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> application: rabbitmq_management
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> application: rabbitmq_management_agent
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> application: rabbit
> exited: stopped
> type: permanent
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:16 ===
> Error in process <0.20312.2> on node 'rabbit@RIOOVERLORD-1' with exit
> value:
> {badarg,[{mnesia_tm,commit_participant,6,[{file,"mnesia_tm.erl"},{line,1750}]}]}
>
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> application: mnesia
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> application: os_mon
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> Halting Erlang VM
>
>
>
> ---------------------------------------------------------------------------------------------
> Test 02:
> Assumption: send data into the cluster, then shut down the node that is
> master of the queue. The subscriber should continue to receive messages.
> Result: a new node was promoted to master of the queue upon the shutdown.
> The subscriber did not receive data after the promotion. The connection was
> still running and the client reported no errors.
>
> Test Notes:
> All nodes in the cluster were online and green. The queue was mirrored on
> all nodes.
> The queue starts off with 0 messages in it.
>
> riobaron-1: connection server. The publisher connected to this server.
> The subscriber connects to this server. Youngest server in the cluster.
> cust1-master: stats server and master of the MoData queue. Oldest server.
> riooverlord-1: second oldest server.
>
> After the publisher completed, the subscriber was turned on. While the
> subscriber was connected and actively receiving data, the riobaron-1 server
> was shut down. The subscriber reconnected to riooverlord-1 and data resumed
> flowing to the subscriber. During the subscription the cust1-master node
> was shut down. The subscription stopped. The connection remained active and
> running.
> riooverlord-1 became the new master of the MoData queue.
>
> I waited on the cluster, but the connection remained active with no data
> received from the cluster. When the client is shut down and makes a new
> connection, delivery resumes.
>
> Results from log files:
>
> riobaron-1:
> =INFO REPORT==== 17-Oct-2012::08:15:35 ===
> rabbit on node 'rabbit@RIOOVERLORD-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> application: rabbitmq_management
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> application: rabbitmq_management_agent
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> application: rabbit
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> application: mnesia
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> application: os_mon
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> Halting Erlang VM
>
> cust1-master:
> =INFO REPORT==== 17-Oct-2012::08:15:12 ===
> Statistics database started.
>
>
> =INFO REPORT==== 17-Oct-2012::08:15:32 ===
> rabbit on node 'rabbit@RIOBARON-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:15:38 ===
> rabbit on node 'rabbit@RIOOVERLORD-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:16:03 ===
> Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node
> 'rabbit@RIOOVERLORD-1': <3151.343.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:16:03 ===
> Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node
> 'rabbit@RIOBARON-1': <3150.354.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:49 ===
> rabbit on node 'rabbit@RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:27:11 ===
> rabbit on node 'rabbit@RIOBARON-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:05 ===
> rabbit on node 'rabbit@RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> application: rabbitmq_management
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> application: rabbitmq_management_agent
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
> application: rabbit
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
> application: mnesia
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
> application: os_mon
> exited: stopped
> type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
> Halting Erlang VM
>
> riooverlord-1:
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> rabbit on node 'rabbit@RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave
> <'rabbit@RIOOVERLORD-1'.3.343.0> saw deaths of mirrors
> <'rabbit@RIOBARON-1'.2.354.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Slave
> <'rabbit@RIOOVERLORD-1'.3.258.0> saw deaths of mirrors
> <'rabbit@RIOBARON-1'.2.252.0>
>
>
>
>