[rabbitmq-discuss] 3.1.1 - Errors during failover

Rensen, Nathanael nathanael.rensen at zetta.com.au
Sun Jun 9 06:03:42 BST 2013


While testing a fail-over scenario with RabbitMQ 3.1.1 I have repeatedly encountered errors, sometimes resulting in durable queues vanishing.

The cluster consists of two brokers, with LVS / keepalived directing clients to a functional broker. There are 10 mirrored queues, each with ha-sync-mode = automatic.

A script shuts down and restarts each broker in turn using 'service rabbitmq-server {start|stop}', so that there is always one broker running, and leaves at least 30 seconds between each start / stop. I expect this test to be able to run indefinitely without destabilising the cluster, but I have not managed more than a few dozen fail-overs without some error occurring. I'm hoping someone has some insight or suggestions on how to stabilise this environment.
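For anyone wanting to reproduce this, the loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual test script: the host names are the two brokers named in this post, while running the command over passwordless ssh, the function names, and the `rounds` parameter are all assumptions made for the example.

```python
import subprocess
import time
from itertools import cycle, islice

BROKERS = ["zg-dev-mq-002", "zg-dev-mq-003"]  # broker host names from this post
PAUSE_SECONDS = 30  # minimum gap between each start / stop


def failover_steps(brokers, rounds):
    """Yield (host, action) pairs: stop one broker, start it again, then move to the next.

    Only one broker is ever stopped at a time, so the other keeps running.
    """
    for host in islice(cycle(brokers), rounds):
        yield host, "stop"
        yield host, "start"


def run_over_ssh(host, action):
    # Assumption: the test host can ssh to each broker with enough
    # privilege to run the init script non-interactively.
    subprocess.run(["ssh", host, "service", "rabbitmq-server", action], check=True)


def run_failover_test(brokers, rounds, run=run_over_ssh,
                      pause=time.sleep, pause_seconds=PAUSE_SECONDS):
    """Execute every step, pausing after each start / stop."""
    for host, action in failover_steps(brokers, rounds):
        run(host, action)
        pause(pause_seconds)


if __name__ == "__main__":
    # Dry run: print the steps instead of touching any broker.
    run_failover_test(BROKERS, rounds=2,
                      run=lambda host, action: print(host, action),
                      pause=lambda seconds: None)
```

Passing `run` and `pause` as parameters keeps the ordering logic checkable without a live cluster; dropping both overrides would run the real commands with the 30 second pause.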

I have included basic environment details below and attached logs from both brokers showing one example. In this case zg-dev-mq-003 was stopped at 11:32:21 and went through what appears to be a clean shutdown:

=INFO REPORT==== 9-Jun-2013::11:33:22 ===
Halting Erlang VM

zg-dev-mq-002 detected the other broker down and promoted itself to master. Then after accepting connections from clients it logged an error as shown below:

=INFO REPORT==== 9-Jun-2013::11:33:22 ===
rabbit on node 'rabbit@zg-dev-mq-003' down
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
accepting AMQP connection <0.427.0> (10.0.72.36:61434 -> 172.17.0.73:5672)
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
accepting AMQP connection <0.430.0> (10.0.72.36:61435 -> 172.17.0.73:5672)
=ERROR REPORT==== 9-Jun-2013::11:33:22 ===
** Generic server <0.418.0> terminating
** Last message in was {'$gen_cast',
                        {delete_and_terminate,
                         {badarg,
                          [{ets,insert_new,
                            [360523,
                             {{<<10,71,177,42,66,240,207,204,251,26,181,155,
                                 246,83,172,137>>,
                               <<120,196,170,245,109,158,126,84,92,250,21,193,
                                 123,113,128,48>>},
                              -1}],
                            []},
                           {rabbit_msg_store,client_update_flying,3,[]},
                           {rabbit_msg_store,'-remove/2-lc$^0/1-0-',2,[]},
                           {rabbit_msg_store,remove,2,[]},
                           {rabbit_variable_queue,
                            '-with_immutable_msg_store_state/3-fun-0-',2,[]},
                           {rabbit_variable_queue,with_msg_store_state,3,[]},
                           {rabbit_variable_queue,
                            with_immutable_msg_store_state,3,[]},
                           {rabbit_variable_queue,'-ack/2-lc$^0/1-0-',2,
                            []}]}}}
etc
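When going through long broker logs for reports like the one above, a small filter over the report headers can help locate each error. The sketch below is only an illustration and assumes the header layout seen in the excerpts in this post ("=LEVEL REPORT==== day-Mon-year::HH:MM:SS ==="); the function names and the sample text are made up for the example.

```python
import re
from datetime import datetime

# Matches headers such as "=ERROR REPORT==== 9-Jun-2013::11:33:22 ===".
HEADER = re.compile(
    r"^=(?P<level>[A-Z]+) REPORT==== "
    r"(?P<stamp>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ==="
)


def report_headers(lines):
    """Yield (level, timestamp) for every report header found in lines."""
    for line in lines:
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group("stamp"), "%d-%b-%Y::%H:%M:%S")
            yield match.group("level"), stamp


def error_times(lines):
    """Return the timestamps of all ERROR reports, in log order."""
    return [stamp for level, stamp in report_headers(lines) if level == "ERROR"]


# Hypothetical sample, shaped like the excerpts in this post.
SAMPLE_LOG = """\
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
rabbit on node 'rabbit@zg-dev-mq-003' down
=ERROR REPORT==== 9-Jun-2013::11:33:22 ===
** Generic server <0.418.0> terminating
""".splitlines()

if __name__ == "__main__":
    for stamp in error_times(SAMPLE_LOG):
        print("error report at", stamp.isoformat())
```

Comparing these timestamps with the times at which the test script stopped or started a broker shows which fail-over each error belongs to.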

Environment details (same for both brokers):

[root@zg-dev-mq-002]# uname -a
Linux zg-dev-mq-002.zettagrid.local 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@zg-dev-mq-002]# cat /etc/centos-release
CentOS release 6.4 (Final)

[root@zg-dev-mq-002]# yum list installed | egrep 'rabbit|erlang'
esl-erlang.x86_64          R16B-2        @/esl-erlang-R16B-2.x86_64
esl-erlang-compat.noarch   R14B-1.el6    @/esl-erlang-compat-R14B-1.el6.noarch
rabbitmq-server.noarch     3.1.1-1       @/rabbitmq-server-3.1.1-1.noarch

Thanks very much,

Nathanael


-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.zip
Type: application/x-zip-compressed
Size: 331237 bytes
Desc: logs.zip
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130609/34895bfb/attachment.bin>

