[rabbitmq-discuss] cluster buster
Mark Ward
ward.mark at gmail.com
Wed Oct 17 16:53:13 BST 2012
Hello,
I am testing what happens to the cluster when nodes are shutdown. The
results of my testing have not been positive. I hope to either be advised
if I have a mistake in my cluster setup or if something is really going
wrong.
The issue is when I shutdown a server using the Windows Services to stop a
node the cluster does fail over the queue but does not continue delivering
messages to the subscriber. The subscriber also does not receive any
errors. The client's connection still has a running status but no data
flows. I am able to reproduce this error.
I have added two test run results to this email. I have more tests and the
results saved if more information is required. I can also rerun future
tests.
I have the following puny virtual machines for my cluster.
3 Virtual Machines
• Windows 2008 R2 sp1 64-bit
• 1 GB of RAM
• 16 GB of drive space free
RabbitMQ: 2.8.7
Erlang: R15B02
RabbitMQ is setup as the following on each of the nodes:
[ {kernel, [{inet_dist_listen_min, 55021}, {inet_dist_listen_max, 55021} ]
},
{rabbit,
[{cluster_nodes, ['rabbit at RIOBARON-1', 'rabbit at RIOOVERLORD-1',
'rabbit at CUST1-MASTER']}
,{vm_memory_high_watermark, 0.2}]
}
].
Virtual Host: MoBunniesMoProblems
Queue: MoData: Mirror(all) and Durable
Admin is setup on all nodes.
Using the .NET client version: 2.8.7 (nuget download)
The publisher uses SelectConfirm with messages set to persistent.
The subscriber uses BasicConsume with BasicQos set depending on the test.
(10, 1000, 5)
Message size has been 3,196 bytes
100,000 messages sent in each test.
---------------------------------------------------------------------------------------------
Test 01:
Assumptions: shutdown cluster node. A new server will be master of the
queue. subscriber should continue to receive messages. reconnection is
possible and duplicate messages are possible.
Result: A node took ownership of the node's queue. The subscriber failed
to continue after the node was shutdown. connection was never dropped.
Test Notes
Before testing make sure all servers in the cluster are active and see each
other. Admin confirmed this with all green for nodes.
Publish 100k messages 3,196 in size with no subscriber active.
When all messages have been published Turn on subscriber.
During subscriber's active download of the messages shut down the node that
is the master of the queue.
Queue will fail over to the next oldest node.
Subscriber connection will not be inturrupted but will stop receiving data.
riobaron-1: the connection server. The publisher will send to this server.
The subscriber will connect to this server.
riooverlord-1: MoData queue master. queue was mirrored on riobaron-1 and
cust1-master. Stats server.
cust1-master: mirror: second oldest server in the cluster.
During the publishing the high watermark for memory and drive space was
never reached.
After riooverlord-1 was shutdown using Windows Service manager to shutdown
the rabbitMQ service. cust1-master was promoted as the MoData queue master.
Subscriber data send stopped but the connection was still active. Timeout
on the connection was set to 25 seconds. The connection never timed out
even after letting it sit for 24 hours.
Results of log files:
riobaron-1's log file
=INFO REPORT==== 16-Oct-2012::08:02:15 ===
rabbit on node 'rabbit at RIOOVERLORD-1' down
=ERROR REPORT==== 16-Oct-2012::08:02:15 ===
Mnesia('rabbit at RIOBARON-1'): ** ERROR ** mnesia_event got
{inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
cust1-master's log file
=INFO REPORT==== 16-Oct-2012::08:02:18 ===
Statistics database started.
=INFO REPORT==== 16-Oct-2012::08:02:18 ===
Mirrored-queue (queue 'Q' in vhost '/'): Slave
<'rabbit at CUST1-MASTER'.2.255.0> saw deaths of mirrors
<'rabbit at RIOOVERLORD-1'.1.220.0>
=INFO REPORT==== 16-Oct-2012::08:02:18 ===
Mirrored-queue (queue 'Q' in vhost '/'): Promoting slave
<'rabbit at CUST1-MASTER'.2.255.0> to master
=INFO REPORT==== 16-Oct-2012::08:02:18 ===
rabbit on node 'rabbit at RIOOVERLORD-1' down
=INFO REPORT==== 16-Oct-2012::08:02:18 ===
Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave
<'rabbit at CUST1-MASTER'.2.253.0> saw deaths of mirrors
<'rabbit at RIOOVERLORD-1'.1.219.0>
=INFO REPORT==== 16-Oct-2012::08:02:18 ===
Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Promoting
slave <'rabbit at CUST1-MASTER'.2.253.0> to master
=ERROR REPORT==== 16-Oct-2012::08:02:18 ===
Mnesia('rabbit at CUST1-MASTER'): ** ERROR ** mnesia_event got
{inconsistent_database, bad_decision, 'rabbit at RIOOVERLORD-1'}
riooverlord-1's log file
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
Stopping Rabbit
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
application: rabbitmq_management
exited: stopped
type: permanent
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
application: rabbitmq_management_agent
exited: stopped
type: permanent
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
stopped TCP Listener on 0.0.0.0:5672
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
stopped TCP Listener on [::]:5672
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
application: rabbit
exited: stopped
type: permanent
=ERROR REPORT==== 16-Oct-2012::08:02:16 ===
Error in process <0.20312.2> on node 'rabbit at RIOOVERLORD-1' with exit
value:
{badarg,[{mnesia_tm,commit_participant,6,[{file,"mnesia_tm.erl"},{line,1750}]}]}
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
application: mnesia
exited: stopped
type: permanent
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
application: os_mon
exited: stopped
type: permanent
=INFO REPORT==== 16-Oct-2012::08:02:16 ===
Halting Erlang VM
---------------------------------------------------------------------------------------------
Test 02:
Assumption: send data into cluster. shutdown cluster master node of the
queue. subscriber should continue to receive messages.
Result: a new node promoted to master of the queue upon the shutdown.
subscriber did not receive data after the promotion. connection was still
running and no errors from the client.
Test Notes:
All nodes in the cluster were online and green. The queue was mirrored on
all nodes.
queue starts off with 0 messages in the queue.
riobaron-1: connection server. publisher connected to this server.
subscriber connects to this server. youngest server in the cluster
cust1-master: stats server and master of MoData queue. oldest server
riooverlord: second oldest server.
After the publisher completed the subscriber was turned on. During the
subscribers connection and active data being received the riobaron-1 server
was shutdown. The subscriber reconnected to riooverlord-1. Data resumed
to send to the subscriber. During the subscription the cust1-master node
was shutdown. Subscription stopped. connection remained active and
running.
rioverlord-1 become the new master of the MoData queue.
Waited on the cluster but the connection remained active but no data
received from the cluster. When the client is shutdown and makes a new
connection sending will resume.
Results from log files:
riobaron-1:
=INFO REPORT==== 17-Oct-2012::08:15:35 ===
rabbit on node 'rabbit at RIOOVERLORD-1' up
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
Stopping Rabbit
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
application: rabbitmq_management
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
application: rabbitmq_management_agent
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
stopped TCP Listener on 0.0.0.0:5672
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
stopped TCP Listener on [::]:5672
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
application: rabbit
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
application: mnesia
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
application: os_mon
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:24:46 ===
Halting Erlang VM
cust1-master:
=INFO REPORT==== 17-Oct-2012::08:15:12 ===
Statistics database started.
=INFO REPORT==== 17-Oct-2012::08:15:32 ===
rabbit on node 'rabbit at RIOBARON-1' up
=INFO REPORT==== 17-Oct-2012::08:15:38 ===
rabbit on node 'rabbit at RIOOVERLORD-1' up
=INFO REPORT==== 17-Oct-2012::08:16:03 ===
Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node
'rabbit at RIOOVERLORD-1': <3151.343.0>
=INFO REPORT==== 17-Oct-2012::08:16:03 ===
Adding mirror of queue 'MoData' in vhost 'MoBunnyMoProblems' on node
'rabbit at RIOBARON-1': <3150.354.0>
=INFO REPORT==== 17-Oct-2012::08:24:49 ===
rabbit on node 'rabbit at RIOBARON-1' down
=INFO REPORT==== 17-Oct-2012::08:27:11 ===
rabbit on node 'rabbit at RIOBARON-1' up
=INFO REPORT==== 17-Oct-2012::08:46:05 ===
rabbit on node 'rabbit at RIOBARON-1' down
=INFO REPORT==== 17-Oct-2012::08:46:44 ===
Stopping Rabbit
=INFO REPORT==== 17-Oct-2012::08:46:44 ===
application: rabbitmq_management
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:46:44 ===
application: rabbitmq_management_agent
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:46:44 ===
stopped TCP Listener on 0.0.0.0:5672
=INFO REPORT==== 17-Oct-2012::08:46:44 ===
stopped TCP Listener on [::]:5672
=INFO REPORT==== 17-Oct-2012::08:46:45 ===
application: rabbit
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:46:45 ===
application: mnesia
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:46:45 ===
application: os_mon
exited: stopped
type: permanent
=INFO REPORT==== 17-Oct-2012::08:46:45 ===
Halting Erlang VM
riooverlord-1:
=INFO REPORT==== 17-Oct-2012::08:24:47 ===
rabbit on node 'rabbit at RIOBARON-1' down
=INFO REPORT==== 17-Oct-2012::08:24:47 ===
Mirrored-queue (queue 'MoData' in vhost 'MoBunnyMoProblems'): Slave
<'rabbit at RIOOVERLORD-1'.3.343.0> saw deaths of mirrors
<'rabbit at RIOBARON-1'.2.354.0>
=INFO REPORT==== 17-Oct-2012::08:24:47 ===
Mirrored-queue (queue 'Q' in vhost '/'): Slave
<'rabbit at RIOOVERLORD-1'.3.258.0> saw deaths of mirrors
<'rabbit at RIOBARON-1'.2.252.0>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20121017/59a63a76/attachment.htm>
More information about the rabbitmq-discuss
mailing list