[rabbitmq-discuss] Missing Durable Queues on Startup

Chris Larsen clarsen@llnw.com
Tue May 8 01:45:37 BST 2012


Hello, we ran into an odd situation today where RabbitMQ seemed to start
properly but didn't load most of the durable queues from the Mnesia
database. Running stop_app and then start_app brought back some of the
queues, but not all of them. When we discovered a few hours later that some
queues were still missing, another stop_app/start_app cycle finally brought
the rest online. Has anyone run into a similar situation?
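
In case it helps anyone reproduce this, here is roughly the recovery cycle
we ran on the affected node (a sketch; /my_app is the vhost name from our
logs below, so adjust for your own setup):

    # Restart the RabbitMQ application without restarting the Erlang VM
    rabbitmqctl stop_app
    rabbitmqctl start_app

    # Then see which durable queues actually came back
    rabbitmqctl list_queues -p /my_app name durable messages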



Here are some notes about our setup, with a few entries from the logs
below. We have six machines in the cluster, split into pairs running DRBD
and Pacemaker for failover. A glitchy switch caused one of these pairs to
split-brain, and both MQ resources wound up on the same physical host. DRBD
seemed to be fine, and it was only after we resolved the split-brain that we
noticed the missing queues. There weren't any errors in the startup_log or
startup_err files. We're not using HA in Rabbit itself; the queues are
simply declared durable (with persistent messages) on each node in the
cluster.
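
(By "DRBD seemed to be fine" I mean the usual checks looked healthy; a
sketch of what we looked at on each half of the pair:)

    # One-shot Pacemaker view of resources and where they are running
    crm_mon -1

    # DRBD connection/disk state; after the split-brain was resolved this
    # was back to Connected / UpToDate on both halves of the pair
    cat /proc/drbd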



We had a number of messages in the SASL logs mentioning "nodedown", so I
wonder whether the MQ instances simply didn't join the cluster properly the
first couple of times but finally did on the last try. I didn't check the
status of the cluster nodes on each node (as suggested elsewhere) between
restarts, but I'll give that a try if it happens again. Thanks for your help!
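
For the record, the per-node check I mean is just this (a sketch; the exact
output shape may differ on other versions):

    # Run on each node between restarts; the node list should match on
    # every node, and a node present under nodes but absent from
    # running_nodes would explain durable queues not coming back
    rabbitmqctl cluster_status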



RabbitMQ 2.5.1

Erlang R13B03

Ubuntu Server 64-bit (kernel 2.6.38-10)

drbd 8.3.7



=ERROR REPORT==== 7-May-2012::10:11:30 ===
Mnesia('rabbit2@host2'): ** ERROR ** Mnesia on 'rabbit2@host2' could not
connect to node(s) ['rabbit1@host1']

=INFO REPORT==== 7-May-2012::10:11:30 ===
Limiting to approx 32668 file handles (29399 sockets)

=INFO REPORT==== 7-May-2012::10:12:46 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 7-May-2012::10:12:46 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 7-May-2012::10:12:46 ===
msg_store_persistent: rebuilding indices from scratch

=INFO REPORT==== 7-May-2012::10:12:46 ===
started TCP Listener on 192.168.1.1:5672



=ERROR REPORT==== 7-May-2012::10:13:42 ===
Mnesia('rabbit2@host2'): ** ERROR ** mnesia_event got
{inconsistent_database, starting_partitioned_network, 'rabbit1@host3'}

=ERROR REPORT==== 7-May-2012::10:13:42 ===
Mnesia('rabbit2@host2'): ** ERROR ** mnesia_event got
{inconsistent_database, starting_partitioned_network, 'rabbit2@host4'}



=SUPERVISOR REPORT==== 7-May-2012::10:13:32 ===
     Supervisor: {<0.11398.2442>,rabbit_channel_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.11400.2442>},
                  {name,channel},
                  {mfa,{rabbit_channel,start_link,
                           [1,<0.11368.2442>,<0.11399.2442>,<0.11368.2442>,
                            rabbit_framing_amqp_0_9_1,
                            {user,<<"my_app">>,true,
                                rabbit_auth_backend_internal,
                                {internal_user,<<"my_app">>,
                                    <<199,64,175,52,127,65,248,9,70,171,15,9,5,
                                      122,73,4,195,147,238,67>>,
                                    true}},
                            <<"/my_app">>,[],<0.11366.2442>,
                            #Fun<rabbit_channel_sup.0.15412730>]}},
                  {restart_type,intrinsic},
                  {shutdown,4294967295},
                  {child_type,worker}]





=CRASH REPORT==== 7-May-2012::10:13:33 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.25562.2442>
    registered_name: []
    exception exit: {{badmatch,
                         {error,
                             [{<7748.8396.531>,
                               {exit,{nodedown,'rabbit1@host1'},[]}}]}},
                     [{rabbit_channel,terminate,2},
                      {gen_server2,terminate,3},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/3
    ancestors: [<0.25560.2442>,<0.25544.2442>,<0.25542.2442>,
                rabbit_tcp_client_sup,rabbit_sup,<0.124.0>]
    messages: []
    links: [<0.25560.2442>]
    dictionary: [{{exchange_stats,
                      {resource,<<"/my_app">>,exchange,<<"service.exchange">>}},
                  [{confirm,6},{publish,6}]},
                 {{queue_exchange_stats,
                      {<0.253.0>,
                       {resource,<<"/my_app">>,exchange,<<"data.exchange">>}}},
                  [{confirm,6},{publish,6}]},
                 {delegate,delegate_4},
                 {{monitoring,<0.253.0>},true},
                 {{exchange_stats,
                      {resource,<<"/my_app">>,exchange,<<"data.exchange">>}},
                  [{confirm,6},{publish,6}]},
                 {guid,{{11,<0.25562.2442>},11}}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 11357
  neighbours:



=SUPERVISOR REPORT==== 7-May-2012::10:13:33 ===
     Supervisor: {<0.25560.2442>,rabbit_channel_sup}
     Context:    child_terminated
     Reason:     {{badmatch,
                      {error,
                          [{<7748.8396.531>,
                            {exit,{nodedown,'rabbit1@host1'},[]}}]}},
                  [{rabbit_channel,terminate,2},
                   {gen_server2,terminate,3},
                   {proc_lib,wake_up,3}]}
     Offender:   [{pid,<0.25562.2442>},
                  {name,channel},
                  {mfa,{rabbit_channel,start_link,
                           [1,<0.25545.2442>,<0.25561.2442>,<0.25545.2442>,
                            rabbit_framing_amqp_0_9_1,
                            {user,<<"my_app">>,true,
                                rabbit_auth_backend_internal,
                                {internal_user,<<"my_app">>,
                                    <<199,64,175,52,127,65,248,9,70,171,15,9,5,
                                      122,73,4,195,147,238,67>>,
                                    true}},
                            <<"/my_app">>,[],<0.25543.2442>,
                            #Fun<rabbit_channel_sup.0.15412730>]}},
                  {restart_type,intrinsic},
                  {shutdown,4294967295},
                  {child_type,worker}]