[rabbitmq-discuss] cluster buster

Simon MacMullen simon at rabbitmq.com
Wed Oct 17 17:39:04 BST 2012


The method QueueingBasicConsumer.OnCancel() gets invoked whenever your 
consumer gets cancelled for *any* reason - including if you asked for it 
via BasicCancel(), or if the connection was closed.

Instead you probably want to override HandleBasicCancel(string) (which 
by default invokes OnCancel()). HandleBasicCancel(string) is only 
invoked in the case where the server caused the cancellation - either 
because the queue failed over, or it was deleted.

Cheers, Simon

On 17/10/12 17:27, Mark Ward wrote:
> Thanks Simon,
>
> This is putting me on the right track to resolve this in my client.  I
> am using the .NET client and if I am understanding it correctly to
> handle the cancel extension I need to extend the QueueingBasicConsumer
> and override the OnCancel() method to provide my clients access to the
> cancel extension?
>
> http://www.rabbitmq.com/consumer-cancel.html
>
> http://www.rabbitmq.com/releases/rabbitmq-dotnet-client/v1.4.0/rabbitmq-dotnet-client-1.4.0-net-2.0-htmldoc/type-RabbitMQ.Client.QueueingBasicConsumer.html
>
> -Mark
>
>
> -------------------------------------------
> Oct 17, 2012; 11:10am   Simon MacMullen-2
> On 17/10/12 16:53, Mark Ward wrote:
>  > The issue is when I shutdown a server using the Windows Services to stop
>  > a node the cluster does fail over the queue but does not continue
>  > delivering messages to the subscriber.  The subscriber also does not
>  > receive any errors.  The client's connection still has a running status
>  > but no data flows.  I am able to reproduce this error.
>
> Have you read this:
>
> http://www.rabbitmq.com/ha.html#behaviour
>
> "Clients that were consuming from the mirrored-queue and support our
> Consumer Cancellation Notifications extension will receive a
> notification that their subscription to the mirrored-queue has been
> abruptly cancelled. At this point they should re-consume from the queue,
> which will pick up the new master. The reason for sending this
> notification is that informing clients of the loss of the master is
> essential: otherwise the client may continue to issue acknowledgements
> for messages they were sent by the old, failed master, and not expect
> that they might be about to see the same messages again, this time sent
> by the new master. Of course, clients that were connected to the failed
> node will find their connections failed, and will need to reconnect to a
> surviving node of the cluster."
>
> ?
>
> If the consumer is connected to a node other than the one that failed,
> it will not be disconnected, but it will be sent a cancellation
> notification and will have to reissue basic.consume.
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>
> On Wednesday, October 17, 2012 10:53:13 AM UTC-5, Mark Ward wrote:
>
>     Hello,
>     I am testing what happens to the cluster when nodes are shutdown.
>       The results of my testing have not been positive.  I hope to
>     either be advised if I have a mistake in my cluster setup or if
>     something is really going wrong.
>
>     The issue is when I shutdown a server using the Windows Services to
>     stop a node the cluster does fail over the queue but does not
>     continue delivering messages to the subscriber.  The subscriber also
>     does not receive any errors.  The client's connection still has a
>     running status but no data flows.  I am able to reproduce this error.
>
>     I have added two test run results to this email. I have more tests
>     and the results saved if more information is required.  I can also
>     rerun future tests.
>
>     I have the following puny virtual machines for my cluster.
>     3 Virtual Machines
>     •Windows 2008 R2 sp1 64-bit
>     •1 GB of RAM
>     •16 GB of drive space free
>     RabbitMQ: 2.8.7
>     Erlang: R15B02
>     RabbitMQ is setup as the following on each of the nodes:
>     [{kernel, [{inet_dist_listen_min, 55021}, {inet_dist_listen_max,
>     55021} ] },
>     {rabbit,
>     [{cluster_nodes, ['rabbit at RIOBARON-1', 'rabbit at RIOOVERLORD-1',
>     'rabbit at CUST1-MASTER']}
>     ,{vm_memory_high_watermark, 0.2}]
>     }
>     ].
>
>     Virtual Host: MoBunniesMoProblems
>     Queue: MoData: Mirror(all) and Durable
>     Admin is setup on all nodes.
>     Using the .NET client version: 2.8.7 (nuget download)
>     The publisher uses SelectConfirm with messages set to persistent.
>     The subscriber uses BasicConsume with BasicQos set depending on the
>     test. (10, 1000, 5)
>     Message size has been 3,196 bytes
>     100,000 messages sent in each test.
>
>     ---------------------------------------------------------------------------------------------
>     Test 01:
>     Assumptions: shutdown cluster node.  A new server will be master of
>     the queue.  subscriber should continue to receive messages.
>       reconnection is possible and duplicate messages are possible.
>     Result: A node took ownership of the node's queue.  The subscriber
>     failed to continue after the node was shutdown.  connection was
>     never dropped.
>
>     Test Notes
>     Before testing make sure all servers in the cluster are active and
>     see each other.  Admin confirmed this with all green for nodes.
>     Publish 100k messages 3,196 in size with no subscriber active.
>     When all messages have been published Turn on subscriber.
>     During subscriber's active download of the messages shut down the
>     node that is the master of the queue.
>     Queue will fail over to the next oldest node.
>     Subscriber connection will not be inturrupted but will stop
>     receiving data.
>
>     riobaron-1: the connection server.  The publisher will send to this
>     server.  The subscriber will connect to this server.
>     riooverlord-1: MoData queue master.  queue was mirrored on
>     riobaron-1 and cust1-master.  Stats server.
>     cust1-master: mirror: second oldest server in the cluster.
>
>     During the publishing the high watermark for memory and drive space
>     was never reached.
>
>     After riooverlord-1 was shutdown using Windows Service manager to
>     shutdown the rabbitMQ service.  cust1-master was promoted as the
>     MoData queue master.
>     Subscriber data send stopped but the connection was still active.
>       Timeout on the connection was set to 25 seconds.  The connection
>     never timed out even after letting it sit for 24 hours.
>
>     Results of log files:
>     riobaron-1's log file
>     =INFO REPORT==== 16-Oct-2012::08:02:15 ===
>     rabbit on node 'rabbit at RIOOVERLORD-1' down
>
>
>     =ERROR REPORT==== 16-Oct-2012::08:02:15 ===
>     Mnesia('rabbit at RIOBARON-1'): ** ERROR ** mnesia_event got
>     {inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
>
>     cust1-master's log file
>
>     =INFO REPORT==== 16-Oct-2012::08:02:18 ===
>     Statistics database started.
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:18 ===
>     Mirrored-queue (queue 'Q' in vhost '/'): Slave
>     <'rabbit at CUST1-MASTER'.2.255.0> saw deaths of mirrors
>     <'rabbit at RIOOVERLORD-1'.1.220.0>
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:18 ===
>     Mirrored-queue (queue 'Q' in vhost '/'): Promoting slave
>     <'rabbit at CUST1-MASTER'.2.255.0> to master
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:18 ===
>     rabbit on node 'rabbit at RIOOVERLORD-1' down
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:18 ===
>     Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave
>     <'rabbit at CUST1-MASTER'.2.253.0> saw deaths of mirrors
>     <'rabbit at RIOOVERLORD-1'.1.219.0>
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:18 ===
>     Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'):
>     Promoting slave <'rabbit at CUST1-MASTER'.2.253.0> to master
>
>
>     =ERROR REPORT==== 16-Oct-2012::08:02:18 ===
>     Mnesia('rabbit at CUST1-MASTER'): ** ERROR ** mnesia_event got
>     {inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
>
>     riooverlord-1's log file
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     Stopping Rabbit
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>          application: rabbitmq_management
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>          application: rabbitmq_management_agent
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     stopped TCP Listener on 0.0.0.0:5672 <http://0.0.0.0:5672>
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     stopped TCP Listener on [::]:5672
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>          application: rabbit
>          exited: stopped
>          type: permanent
>
>
>     =ERROR REPORT==== 16-Oct-2012::08:02:16 ===
>     Error in process <0.20312.2> on node 'rabbit at RIOOVERLORD-1' with
>     exit value:
>     {badarg,[{mnesia_tm,commit_participant,6,[{file,"mnesia_tm.erl"},{line,1750}]}]}
>
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>          application: mnesia
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>          application: os_mon
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 16-Oct-2012::08:02:16 ===
>     Halting Erlang VM
>
>
>     ---------------------------------------------------------------------------------------------
>     Test 02:
>     Assumption: send data into cluster. shutdown cluster master node of
>     the queue.  subscriber should continue to receive messages.
>     Result: a new node promoted to master of the queue upon the
>     shutdown.  subscriber did not receive data after the promotion.
>       connection was still running and no errors from the client.
>
>     Test Notes:
>     All nodes in the cluster were online and green.  The queue was
>     mirrored on all nodes.
>     queue starts off with 0 messages in the queue.
>
>     riobaron-1: connection server.  publisher connected to this server.
>       subscriber connects to this server.  youngest server in the cluster
>     cust1-master: stats server and master of MoData queue.   oldest server
>     riooverlord: second oldest server.
>
>     After the publisher completed the subscriber was turned on.  During
>     the subscribers connection and active data being received the
>     riobaron-1 server was shutdown.  The subscriber reconnected to
>     riooverlord-1.  Data resumed to send to the subscriber.  During the
>     subscription the cust1-master node was shutdown.  Subscription
>     stopped.  connection remained active and running.
>     rioverlord-1 become the new master of the MoData queue.
>
>     Waited on the cluster but the connection remained active but no data
>     received from the cluster.  When the client is shutdown and makes a
>     new connection sending will resume.
>
>     Results from log files:
>
>     riobaron-1:
>     =INFO REPORT==== 17-Oct-2012::08:15:35 ===
>     rabbit on node 'rabbit at RIOOVERLORD-1' up
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     Stopping Rabbit
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>          application: rabbitmq_management
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>          application: rabbitmq_management_agent
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     stopped TCP Listener on 0.0.0.0:5672 <http://0.0.0.0:5672>
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     stopped TCP Listener on [::]:5672
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>          application: rabbit
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>          application: mnesia
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>          application: os_mon
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:46 ===
>     Halting Erlang VM
>
>     cust1-master:
>     =INFO REPORT==== 17-Oct-2012::08:15:12 ===
>     Statistics database started.
>
>
>     =INFO REPORT==== 17-Oct-2012::08:15:32 ===
>     rabbit on node 'rabbit at RIOBARON-1' up
>
>
>     =INFO REPORT==== 17-Oct-2012::08:15:38 ===
>     rabbit on node 'rabbit at RIOOVERLORD-1' up
>
>
>     =INFO REPORT==== 17-Oct-2012::08:16:03 ===
>     Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node
>     'rabbit at RIOOVERLORD-1': <3151.343.0>
>
>
>     =INFO REPORT==== 17-Oct-2012::08:16:03 ===
>     Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node
>     'rabbit at RIOBARON-1': <3150.354.0>
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:49 ===
>     rabbit on node 'rabbit at RIOBARON-1' down
>
>
>     =INFO REPORT==== 17-Oct-2012::08:27:11 ===
>     rabbit on node 'rabbit at RIOBARON-1' up
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:05 ===
>     rabbit on node 'rabbit at RIOBARON-1' down
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     Stopping Rabbit
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>          application: rabbitmq_management
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>          application: rabbitmq_management_agent
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     stopped TCP Listener on 0.0.0.0:5672 <http://0.0.0.0:5672>
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:44 ===
>     stopped TCP Listener on [::]:5672
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>          application: rabbit
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>          application: mnesia
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>          application: os_mon
>          exited: stopped
>          type: permanent
>
>
>     =INFO REPORT==== 17-Oct-2012::08:46:45 ===
>     Halting Erlang VM
>
>     riooverlord-1:
>
>     =INFO REPORT==== 17-Oct-2012::08:24:47 ===
>     rabbit on node 'rabbit at RIOBARON-1' down
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:47 ===
>     Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave
>     <'rabbit at RIOOVERLORD-1'.3.343.0> saw deaths of mirrors
>     <'rabbit at RIOBARON-1'.2.354.0>
>
>
>     =INFO REPORT==== 17-Oct-2012::08:24:47 ===
>     Mirrored-queue (queue 'Q' in vhost '/'): Slave
>     <'rabbit at RIOOVERLORD-1'.3.258.0> saw deaths of mirrors
>     <'rabbit at RIOBARON-1'.2.252.0>
>
>
>
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>


-- 
Simon MacMullen
RabbitMQ, VMware


More information about the rabbitmq-discuss mailing list