[rabbitmq-discuss] cluster buster

Mark Ward ward.mark at gmail.com
Wed Oct 17 18:01:06 BST 2012


Simon,

This was exactly what I needed.  I have tested my client and I was able to 
reconnect on the fault and resume subscription.

-Mark
 
-------------------------------------------------------------
Oct 17, 2012; 11:39am   Simon MacMullen-2
The method QueueingBasicConsumer.OnCancel() gets invoked whenever your 
consumer gets cancelled for *any* reason - including if you asked for it 
via BasicCancel(), or if the connection was closed. 

Instead you probably want to override HandleBasicCancel(string) (which 
by default invokes OnCancel()). HandleBasicCancel(string) is only 
invoked in the case where the server caused the cancellation - either 
because the queue failed over, or it was deleted. 

Cheers, Simon 
---------------------------------------

On Wednesday, October 17, 2012 10:53:13 AM UTC-5, Mark Ward wrote:
>
> Hello,
> I am testing what happens to the cluster when nodes are shutdown.  The 
> results of my testing have not been positive.  I hope to either be advised 
> if I have a mistake in my cluster setup or if something is really going 
> wrong.
>
> The issue is when I shutdown a server using the Windows Services to stop a 
> node the cluster does fail over the queue but does not continue delivering 
> messages to the subscriber.  The subscriber also does not receive any 
> errors.  The client's connection still has a running status but no data 
> flows.  I am able to reproduce this error.
>
> I have added two test run results to this email. I have more tests and the 
> results saved if more information is required.  I can also rerun future 
> tests.
>
> I have the following puny virtual machines for my cluster.
> 3 Virtual Machines
> • Windows 2008 R2 sp1 64-bit
> • 1 GB of RAM
> • 16 GB of drive space free
> RabbitMQ: 2.8.7
> Erlang: R15B02
> RabbitMQ is setup as the following on each of the nodes:
> [ {kernel, [{inet_dist_listen_min, 55021}, {inet_dist_listen_max, 55021} 
> ] },
> {rabbit, 
> [{cluster_nodes, ['rabbit at RIOBARON-1', 'rabbit at RIOOVERLORD-1', 
> 'rabbit at CUST1-MASTER']}
> ,{vm_memory_high_watermark, 0.2}]
> } 
> ].
>
> Virtual Host: MoBunniesMoProblems
> Queue: MoData: Mirror(all) and Durable
> Admin is setup on all nodes.
> Using the .NET client version: 2.8.7 (nuget download)
> The publisher uses SelectConfirm with messages set to persistent.
> The subscriber uses BasicConsume with BasicQos set depending on the test. 
> (10, 1000, 5)
> Message size has been 3,196 bytes
> 100,000 messages sent in each test.
>
>
> ---------------------------------------------------------------------------------------------
> Test 01:
> Assumptions: shutdown cluster node.  A new server will be master of the 
> queue.  subscriber should continue to receive messages.  reconnection is 
> possible and duplicate messages are possible.
> Result: A node took ownership of the node's queue.  The subscriber failed 
> to continue after the node was shutdown.  connection was never dropped.
>
> Test Notes
> Before testing make sure all servers in the cluster are active and see 
> each other.  Admin confirmed this with all green for nodes.
> Publish 100k messages 3,196 in size with no subscriber active.
> When all messages have been published Turn on subscriber.
> During subscriber's active download of the messages shut down the node 
> that is the master of the queue.
> Queue will fail over to the next oldest node.
> Subscriber connection will not be inturrupted but will stop receiving data.
>
> riobaron-1: the connection server.  The publisher will send to this 
> server.  The subscriber will connect to this server.
> riooverlord-1: MoData queue master.  queue was mirrored on riobaron-1 and 
> cust1-master.  Stats server.
> cust1-master: mirror: second oldest server in the cluster.  
>
> During the publishing the high watermark for memory and drive space was 
> never reached.
>
> After riooverlord-1 was shutdown using Windows Service manager to shutdown 
> the rabbitMQ service.  cust1-master was promoted as the MoData queue master.
> Subscriber data send stopped but the connection was still active.  Timeout 
> on the connection was set to 25 seconds.  The connection never timed out 
> even after letting it sit for 24 hours.
>
> Results of log files:
> riobaron-1's log file
> =INFO REPORT==== 16-Oct-2012::08:02:15 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' down
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:15 ===
> Mnesia('rabbit at RIOBARON-1'): ** ERROR ** mnesia_event got 
> {inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
>
> cust1-master's log file
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Statistics database started.
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Slave 
> <'rabbit at CUST1-MASTER'.2.255.0> saw deaths of mirrors 
> <'rabbit at RIOOVERLORD-1'.1.220.0> 
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Promoting slave 
> <'rabbit at CUST1-MASTER'.2.255.0> to master
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' down
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave 
> <'rabbit at CUST1-MASTER'.2.253.0> saw deaths of mirrors 
> <'rabbit at RIOOVERLORD-1'.1.219.0> 
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:18 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Promoting 
> slave <'rabbit at CUST1-MASTER'.2.253.0> to master
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:18 ===
> Mnesia('rabbit at CUST1-MASTER'): ** ERROR ** mnesia_event got 
> {inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
>
> riooverlord-1's log file
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: rabbitmq_management
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: rabbitmq_management_agent
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: rabbit
>     exited: stopped
>     type: permanent
>
>
> =ERROR REPORT==== 16-Oct-2012::08:02:16 ===
> Error in process <0.20312.2> on node 'rabbit at RIOOVERLORD-1' with exit 
> value: 
> {badarg,[{mnesia_tm,commit_participant,6,[{file,"mnesia_tm.erl"},{line,1750}]}]}
>
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: mnesia
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     application: os_mon
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 16-Oct-2012::08:02:16 ===
> Halting Erlang VM
>
>
>
> ---------------------------------------------------------------------------------------------
> Test 02:
> Assumption: send data into cluster. shutdown cluster master node of the 
> queue.  subscriber should continue to receive messages.
> Result: a new node promoted to master of the queue upon the shutdown. 
>  subscriber did not receive data after the promotion.  connection was still 
> running and no errors from the client.
>
> Test Notes:
> All nodes in the cluster were online and green.  The queue was mirrored on 
> all nodes.
> queue starts off with 0 messages in the queue.
>
> riobaron-1: connection server.  publisher connected to this server. 
>  subscriber connects to this server.  youngest server in the cluster
> cust1-master: stats server and master of MoData queue.   oldest server
> riooverlord: second oldest server.
>
> After the publisher completed the subscriber was turned on.  During the 
> subscribers connection and active data being received the riobaron-1 server 
> was shutdown.  The subscriber reconnected to riooverlord-1.  Data resumed 
> to send to the subscriber.  During the subscription the cust1-master node 
> was shutdown.  Subscription stopped.  connection remained active and 
> running.
> rioverlord-1 become the new master of the MoData queue.
>
> Waited on the cluster but the connection remained active but no data 
> received from the cluster.  When the client is shutdown and makes a new 
> connection sending will resume.
>
> Results from log files:
>
> riobaron-1: 
> =INFO REPORT==== 17-Oct-2012::08:15:35 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: rabbitmq_management
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: rabbitmq_management_agent
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: rabbit
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: mnesia
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     application: os_mon
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:46 ===
> Halting Erlang VM
>
> cust1-master:
> =INFO REPORT==== 17-Oct-2012::08:15:12 ===
> Statistics database started.
>
>
> =INFO REPORT==== 17-Oct-2012::08:15:32 ===
> rabbit on node 'rabbit at RIOBARON-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:15:38 ===
> rabbit on node 'rabbit at RIOOVERLORD-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:16:03 ===
> Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node 
> 'rabbit at RIOOVERLORD-1': <3151.343.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:16:03 ===
> Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node 
> 'rabbit at RIOBARON-1': <3150.354.0>
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:49 ===
> rabbit on node 'rabbit at RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:27:11 ===
> rabbit on node 'rabbit at RIOBARON-1' up
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:05 ===
> rabbit on node 'rabbit at RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> Stopping Rabbit
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     application: rabbitmq_management
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     application: rabbitmq_management_agent
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> stopped TCP Listener on 0.0.0.0:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:44 ===
> stopped TCP Listener on [::]:5672
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     application: rabbit
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     application: mnesia
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     application: os_mon
>     exited: stopped
>     type: permanent
>
>
> =INFO REPORT==== 17-Oct-2012::08:46:45 ===
> Halting Erlang VM
>
> riooverlord-1:
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> rabbit on node 'rabbit at RIOBARON-1' down
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave 
> <'rabbit at RIOOVERLORD-1'.3.343.0> saw deaths of mirrors 
> <'rabbit at RIOBARON-1'.2.354.0> 
>
>
> =INFO REPORT==== 17-Oct-2012::08:24:47 ===
> Mirrored-queue (queue 'Q' in vhost '/'): Slave 
> <'rabbit at RIOOVERLORD-1'.3.258.0> saw deaths of mirrors 
> <'rabbit at RIOBARON-1'.2.252.0> 
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121017/840b1f80/attachment.htm>


More information about the rabbitmq-discuss mailing list