[rabbitmq-discuss] 3.1.1 - Errors during failover
Rensen, Nathanael
nathanael.rensen at zetta.com.au
Mon Jun 10 11:32:02 BST 2013
I've attached the sasl log from mq-002. Sorry I didn't include that originally.
Thanks for taking a look.
Nathanael
Simon MacMullen wrote:
Hi. Looking at the logs it seems like the message store on mq-002 crashed / shut down unexpectedly, but there's no information about this in the log. Do you have the corresponding sasl log?
Cheers, Simon
On 09/06/13 06:03, Rensen, Nathanael wrote:
While testing a fail-over scenario with RabbitMQ 3.1.1 I have repeatedly encountered errors, sometimes resulting in durable queues vanishing.
The cluster consists of two brokers, with LVS / keepalived directing clients to a functional broker. There are 10 mirrored queues, each with ha-sync-mode = automatic. A script shuts down one broker or the other in turn using 'service rabbitmq-server {start|stop}', so that there is always one broker running, with at least 30 seconds between each start / stop. I expect this test to be able to run indefinitely without destabilising the cluster; however, I have not been able to achieve more than a few dozen fail-overs without some error occurring. I'm hoping someone may have some insight or suggestions as to how to stabilise this environment.
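For readers who want to reproduce the test, the alternating stop / start procedure could be sketched roughly as below. This is an illustration under assumptions, not the script actually used: reaching each broker over ssh as root, and the function names failover_steps, run_step and run_test, are made up for the example. Only the 'service rabbitmq-server {start|stop}' command, the two host names and the 30-second spacing come from the description above.

```python
import itertools
import subprocess
import time

# The two brokers in the cluster.
BROKERS = ["zg-dev-mq-002", "zg-dev-mq-003"]
PAUSE_SECONDS = 30  # minimum gap between each start / stop


def failover_steps(brokers, cycles):
    """Yield (host, action) pairs so that one broker is always running.

    Each cycle stops one broker and starts it again before the other
    broker is touched.
    """
    victims = itertools.cycle(brokers)
    for _ in range(cycles):
        host = next(victims)
        yield host, "stop"
        yield host, "start"


def run_step(host, action):
    # Assumption: the brokers are reachable over ssh as root.
    subprocess.check_call(
        ["ssh", "root@" + host, "service", "rabbitmq-server", action])


def run_test(cycles, run=run_step, sleep=time.sleep):
    """Run the given number of stop/start cycles, pausing after each step."""
    for host, action in failover_steps(BROKERS, cycles):
        run(host, action)
        sleep(PAUSE_SECONDS)
```

Calling run_test(50) would attempt 50 stop / start cycles. The run and sleep parameters exist only so the sequence can be checked without touching a real broker.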
I have included basic environment details below and attached logs from both brokers showing one example. In this case zg-dev-mq-003 was stopped at 11:32:21 and went through what appears to be a clean shutdown:
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
Halting Erlang VM
zg-dev-mq-002 detected the other broker down and promoted itself to master. Then after accepting connections from clients it logged an error as shown below:
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
rabbit on node 'rabbit@zg-dev-mq-003' down
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
accepting AMQP connection <0.427.0> (10.0.72.36:61434 -> 172.17.0.73:5672)
=INFO REPORT==== 9-Jun-2013::11:33:22 ===
accepting AMQP connection <0.430.0> (10.0.72.36:61435 -> 172.17.0.73:5672)
=ERROR REPORT==== 9-Jun-2013::11:33:22 ===
** Generic server <0.418.0> terminating
** Last message in was {'$gen_cast',
    {delete_and_terminate,
     {badarg,
      [{ets,insert_new,
        [360523,
         {{<<10,71,177,42,66,240,207,204,251,26,181,155,
             246,83,172,137>>,
           <<120,196,170,245,109,158,126,84,92,250,21,193,
             123,113,128,48>>},
          -1}],
        []},
       {rabbit_msg_store,client_update_flying,3,[]},
       {rabbit_msg_store,'-remove/2-lc$^0/1-0-',2,[]},
       {rabbit_msg_store,remove,2,[]},
       {rabbit_variable_queue,
        '-with_immutable_msg_store_state/3-fun-0-',2,[]},
       {rabbit_variable_queue,with_msg_store_state,3,[]},
       {rabbit_variable_queue,
        with_immutable_msg_store_state,3,[]},
       {rabbit_variable_queue,'-ack/2-lc$^0/1-0-',2,
        []}]}}}
etc
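For anyone sifting through the attached logs, a small script can pull out the error reports that follow a "rabbit on node ... down" event. The following is a rough sketch only: the function names parse_reports and errors_after_node_down are made up for illustration, and it assumes the report header format seen in the excerpts above (=LEVEL REPORT==== timestamp ===), with the message either on the same line as the header or on the lines that follow it.

```python
import re

# Header line of a report, e.g.
# "=ERROR REPORT==== 9-Jun-2013::11:33:22 ===".
# Any text after the closing "===" is treated as the start of the body.
HEADER = re.compile(
    r"^=(?P<level>[A-Z]+) REPORT==== (?P<stamp>\S+) ===\s*(?P<rest>.*)$")


def parse_reports(lines):
    """Group log lines into (level, timestamp, body_lines) tuples."""
    reports = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            rest = match.group("rest")
            reports.append((match.group("level"), match.group("stamp"),
                            [rest] if rest else []))
        elif reports and line.strip():
            # Continuation line: attach it to the most recent report.
            reports[-1][2].append(line.rstrip("\n"))
    return reports


def errors_after_node_down(lines):
    """Return (timestamp, body) for ERROR reports logged after a node went down."""
    seen_down = False
    found = []
    for level, stamp, body in parse_reports(lines):
        text = " ".join(body)
        if (level == "INFO" and text.startswith("rabbit on node")
                and text.endswith("down")):
            seen_down = True
        elif level == "ERROR" and seen_down:
            found.append((stamp, body))
    return found
```

Passing the lines of a broker log, for example errors_after_node_down(open(path)) with path pointing at the log file, would list each error report logged after the first such event.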
Environment details (same for both brokers):
[root@zg-dev-mq-002]# uname -a
Linux zg-dev-mq-002.zettagrid.local 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@zg-dev-mq-002]# cat /etc/centos-release
CentOS release 6.4 (Final)
[root@zg-dev-mq-002]# yum list installed | egrep 'rabbit|erlang'
esl-erlang.x86_64          R16B-2      @/esl-erlang-R16B-2.x86_64
esl-erlang-compat.noarch   R14B-1.el6  @/esl-erlang-compat-R14B-1.el6.noarch
rabbitmq-server.noarch     3.1.1-1     @/rabbitmq-server-3.1.1-1.noarch
Thanks very much,
Nathanael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit@zg-dev-mq-002-sasl.zip
Type: application/x-zip-compressed
Size: 7786 bytes
Desc: rabbit@zg-dev-mq-002-sasl.zip
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130610/6aae20b9/attachment.bin>