[rabbitmq-discuss] 3.1.1 - Errors during failover

Simon MacMullen simon at rabbitmq.com
Mon Jun 10 15:19:22 BST 2013


Hi, thanks. This is definitely an odd-looking error. Can you tell us more 
about what you're doing? Are you just starting / stopping nodes, or is 
there messaging activity going on at the same time (and if so, what)?

Cheers, Simon

On 10/06/13 11:32, Rensen, Nathanael wrote:
> I've attached the sasl log from mq-002. Sorry I didn't include that originally.
>
> Thanks for taking a look.
>
> Nathanael
>
>
> Simon MacMullen wrote:
>
> Hi. Looking at the logs it seems like the message store on mq-002 crashed / shut down unexpectedly, but there's no information about this in the log. Do you have the corresponding sasl log?
>
> Cheers, Simon
>
>
> On 09/06/13 06:03, Rensen, Nathanael wrote:
> While testing a fail-over scenario with RabbitMQ 3.1.1 I have repeatedly encountered errors, sometimes resulting in durable queues vanishing.
>
> The cluster consists of two brokers using LVS / keepalived in order to connect clients to a functional broker. There are 10 mirrored queues, each of which has ha-sync-mode = automatic. A script is used to shut down one broker or the other in turn using 'service rabbitmq-server {start|stop}', such that there is always one broker running and leaving at least 30 seconds between each start / stop. I am expecting that this test should be able to run indefinitely without destabilising the cluster, however I have not been able to achieve more than a few dozen fail-overs without some error occurring. I'm hoping someone may have some insight or suggestions as to how to stabilise this environment.
>
> I have included basic environment details below and attached logs from both brokers showing one example. In this case zg-dev-mq-003 was stopped at 11:32:21 and went through what appears to be a clean shutdown:
>
> =INFO REPORT==== 9-Jun-2013::11:33:22 === Halting Erlang VM
>
> zg-dev-mq-002 detected the other broker down and promoted itself to master. Then after accepting connections from clients it logged an error as shown below:
>
> =INFO REPORT==== 9-Jun-2013::11:33:22 === rabbit on node 'rabbit@zg-dev-mq-003' down
> =INFO REPORT==== 9-Jun-2013::11:33:22 === accepting AMQP connection <0.427.0> (10.0.72.36:61434 -> 172.17.0.73:5672)
> =INFO REPORT==== 9-Jun-2013::11:33:22 === accepting AMQP connection <0.430.0> (10.0.72.36:61435 -> 172.17.0.73:5672)
> =ERROR REPORT==== 9-Jun-2013::11:33:22 ===
> ** Generic server <0.418.0> terminating
> ** Last message in was {'$gen_cast',
>                           {delete_and_terminate,
>                            {badarg,
>                             [{ets,insert_new,
>                               [360523,
>                                {{<<10,71,177,42,66,240,207,204,251,26,181,155,
>                                    246,83,172,137>>,
>                                  <<120,196,170,245,109,158,126,84,92,250,21,193,
>                                    123,113,128,48>>},
>                                 -1}],
>                               []},
>                              {rabbit_msg_store,client_update_flying,3,[]},
>                              {rabbit_msg_store,'-remove/2-lc$^0/1-0-',2,[]},
>                              {rabbit_msg_store,remove,2,[]},
>                              {rabbit_variable_queue,
>                               '-with_immutable_msg_store_state/3-fun-0-',2,[]},
>                              {rabbit_variable_queue,with_msg_store_state,3,[]},
>                              {rabbit_variable_queue,
>                               with_immutable_msg_store_state,3,[]},
>                              {rabbit_variable_queue,'-ack/2-lc$^0/1-0-',2,
>                               []}]}}}
> etc
>
> Environment details (same for both brokers):
>
> [root@zg-dev-mq-002]# uname -a
> Linux zg-dev-mq-002.zettagrid.local 2.6.32-358.2.1.el6.x86_64 #1 SMP Wed Mar 13 00:26:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> [root@zg-dev-mq-002]# cat /etc/centos-release
> CentOS release 6.4 (Final)
>
> [root@zg-dev-mq-002]# yum list installed | egrep 'rabbit|erlang'
> esl-erlang.x86_64             R16B-2           @/esl-erlang-R16B-2.x86_64
> esl-erlang-compat.noarch      R14B-1.el6       @/esl-erlang-compat-R14B-1.el6.noarch
> rabbitmq-server.noarch        3.1.1-1          @/rabbitmq-server-3.1.1-1.noarch
>
> Thanks very much,
>
> Nathanael
>
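
For anyone wanting to reproduce this, the fail-over test described in the quoted report (alternately stopping and starting each broker with 'service rabbitmq-server {start|stop}', always leaving one broker up, with at least 30 seconds between actions) could be sketched roughly as below. This is not the script from the report, which was not posted. Reaching the brokers over ssh, the function names, and the exact stop/start ordering are all assumptions made for illustration.

```python
import itertools
import subprocess
import time

# Host names as they appear in the report; adjust for your own cluster.
BROKERS = ["zg-dev-mq-002", "zg-dev-mq-003"]
MIN_GAP_SECONDS = 30  # the report leaves at least 30 s between start / stop


def failover_steps(brokers, cycles):
    """Yield (host, action) pairs, one stop/start pair per cycle.

    Assumed ordering: stop a broker, start it again, then move on to the
    next broker, so that at most one broker is down at any time.
    """
    for host in itertools.islice(itertools.cycle(brokers), cycles):
        yield host, "stop"
        yield host, "start"


def run(brokers=BROKERS, cycles=4, gap=MIN_GAP_SECONDS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it and wait."""
    for host, action in failover_steps(brokers, cycles):
        # Assumption: the brokers are reachable over ssh as a user that is
        # allowed to run the init script.
        cmd = ["ssh", host, "service", "rabbitmq-server", action]
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
            time.sleep(gap)
```

Calling run() with the defaults only prints the commands it would issue; run(dry_run=False) executes them and sleeps for the gap after each one. Running a publisher and consumer against the LVS address at the same time would add the messaging activity asked about above.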


-- 
Simon MacMullen
RabbitMQ, Pivotal

