[rabbitmq-discuss] Queue disappears during partition/autoheal
Matt Pietrek
mpietrek at skytap.com
Tue Apr 15 23:09:24 BST 2014
This is RabbitMQ 3.2.4, running in a 2-node cluster with all queues in HA.
There was a queue called cmcmd, declared like this:
    arguments: x-ha-policy: all
    durable: true
At some point we saw a network partition (see below). It appears that
Autoheal eventually worked, but afterwards the cmcmd queue wasn't on the
broker.
Here's the Autoheal sequence (note the big time gap waiting for the
sea5m1mq1 shutdown):
-----------
21:57 mpietrek at foo:/$ grep heal 2014-04-14-mq.log
2014-04-14 18:02:35 sea5m1mq2 [info] rabbit at sea5m1mq2.log: Autoheal request sent to rabbit at sea5m1mq1
2014-04-14 18:02:35 sea5m1mq2 [info] rabbit at sea5m1mq2.log: Autoheal: I am the winner, waiting for [rabbit at sea5m1mq1] to stop
2014-04-14 18:02:35 sea5m1mq2 [info] rabbit at sea5m1mq2.log: Autoheal: final node has stopped, starting...
2014-04-14 18:57:38 sea5m1mq1 [info] rabbit at sea5m1mq1.log: Autoheal request received from rabbit at sea5m1mq2
2014-04-14 18:57:38 sea5m1mq1 [info] rabbit at sea5m1mq1.log: Autoheal decision
2014-04-14 18:57:38 sea5m1mq1 [info] rabbit at sea5m1mq1.log: Autoheal: we were selected to restart; winner is rabbit at sea5m1mq2
------------
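To put a number on the time gap in the grep output above, here is a minimal Python sketch. It assumes the two nodes' log timestamps are directly comparable (no clock skew between them), which the logs themselves don't confirm. The variable and function names are only for illustration.

```python
from datetime import datetime

# Timestamps copied from the grep output above.
winner_waiting = "2014-04-14 18:02:35"  # sea5m1mq2: "waiting for [rabbit at sea5m1mq1] to stop"
loser_received = "2014-04-14 18:57:38"  # sea5m1mq1: "Autoheal request received"

FMT = "%Y-%m-%d %H:%M:%S"


def gap_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


gap = gap_seconds(winner_waiting, loser_received)
minutes, seconds = divmod(int(gap), 60)
print(f"gap: {minutes} min {seconds} s")
```

Under that assumption the gap works out to roughly 55 minutes between the request being sent and the other node logging that it received it.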
And the rabbit at sea5m1mq2 log spew:
=ERROR REPORT==== 14-Apr-2014::18:02:30 ===
** Node rabbit at sea5m1mq1 not responding **
** Removing (timedout) connection **
=INFO REPORT==== 14-Apr-2014::18:02:30 ===
rabbit on node rabbit at sea5m1mq1 down
=ERROR REPORT==== 14-Apr-2014::18:02:30 ===
Mnesia(rabbit at sea5m1mq2): ** ERROR ** mnesia_event got
{inconsistent_database, running_partitioned_network, rabbit at sea5m1mq1}
=INFO REPORT==== 14-Apr-2014::18:02:30 ===
Statistics database started.
=INFO REPORT==== 14-Apr-2014::18:02:30 ===
Autoheal request sent to rabbit at sea5m1mq1
=ERROR REPORT==== 14-Apr-2014::18:02:30 ===
** Generic server <0.204.0> terminating
** Last message in was {mnesia_locker,rabbit at sea5m1mq1,granted}
** When Server state == {state,2,{from,<0.302.0>,#Ref<0.0.1372.163190>}}
** Reason for termination ==
** {unexpected_info,{mnesia_locker,rabbit at sea5m1mq1,granted}}
=ERROR REPORT==== 14-Apr-2014::18:02:30 ===
** Generic server <0.302.0> terminating
** Last message in was {'DOWN',#Ref<0.0.0.2733>,process,<2782.309.0>,
noconnection}
** When Server state == {state,
{0,<0.302.0>},
{{0,<2782.309.0>},#Ref<0.0.0.2733>},
{{0,<2782.309.0>},#Ref<0.0.0.2734>},
{resource,<<"/">>,queue,<<"cmcmd">>},
rabbit_mirror_queue_coordinator,
{1,
[{{0,<0.302.0>},
{view_member,
{0,<0.302.0>},
[],
{0,<2782.309.0>},
{0,<2782.309.0>}}},
{{0,<2782.309.0>},
{view_member,
{0,<2782.309.0>},
[],
{0,<0.302.0>},
{0,<0.302.0>}}}]},
0,
[{{0,<0.302.0>},{member,{[],[]},0,0}},
{{0,<2782.309.0>},{member,{[],[]},0,0}}],
[<0.301.0>],
{[],[]},
[],0,undefined,
#Fun<rabbit_misc.execute_mnesia_transaction.1>}
** Reason for termination ==
** {noproc,{gen_server2,call,
[<0.204.0>,
{submit,#Fun<rabbit_misc.6.116010224>,<0.302.0>},
infinity]}}
=ERROR REPORT==== 14-Apr-2014::18:02:30 ===
** Generic server <0.203.0> terminating
** Last message in was {mnesia_locker,rabbit at sea5m1mq1,granted}
** When Server state == {state,1,undefined}
** Reason for termination ==