<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><meta name="Generator" content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style></head><body lang="EN-US" link="blue" vlink="purple"><div class="WordSection1"><p class="MsoNormal">Hello, we ran into an odd situation today where RabbitMQ seemed to start properly but didn&#39;t load most of the durable queues from Mnesia. Running stop_app followed by start_app brought back some of the queues, but not all. A few hours later we discovered that some queues were still missing, and a second stop_app/start_app cycle brought the rest of the queues online. Has anyone run into a similar situation?</p>
<p class="MsoNormal"> </p><p class="MsoNormal">Here are some notes about our setup, followed by a few entries from the logs. We have 6 machines in the cluster, split into pairs running DRBD and Pacemaker for failover. A glitchy switch caused one of these pairs to split-brain, and both MQ resources wound up on the same physical host. DRBD seemed to be fine; it was only after we resolved the split-brain that we noticed the missing queues. There weren&#39;t any errors in the startup_log or startup_err files. We’re not using HA in RabbitMQ itself; the queues are just persistent and durable on each node in the cluster.</p>
<p class="MsoNormal"> </p><p class="MsoNormal">We had a number of messages in the SASL logs with “nodedown”, so I wonder whether the MQ instances simply didn’t join the cluster properly the first couple of times but finally did on the last try. I didn’t check the cluster status on each node (as suggested elsewhere) between restarts, but I’ll give that a try if it happens again. Thanks for your help!</p>
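<p class="MsoNormal"> </p><p class="MsoNormal">In case it helps anyone tally these messages, here is a rough Python sketch that counts the “nodedown” and inconsistent_database entries per remote node in a SASL log. The function name summarize_sasl_log and both regular expressions are my own assumptions, based only on the entries pasted below; they are not part of RabbitMQ and may need adjusting for other log layouts.</p>
<pre>
```python
import re
from collections import Counter

# Assumed pattern: an Erlang tuple such as {nodedown,'rabbit1@host1'}.
NODEDOWN_RE = re.compile(r"\{nodedown,\s*'([^']+)'\}")

# Assumed pattern: a Mnesia event such as
# {inconsistent_database, starting_partitioned_network, 'rabbit1@host3'}.
# The first group is the event context, the second is the remote node.
PARTITION_RE = re.compile(r"\{inconsistent_database,\s*(\w+),\s*'([^']+)'\}")


def summarize_sasl_log(text):
    """Count nodedown and inconsistent_database mentions per remote node."""
    nodedown = Counter(NODEDOWN_RE.findall(text))
    partitions = Counter(node for _context, node in PARTITION_RE.findall(text))
    return {
        "nodedown": dict(nodedown),
        "inconsistent_database": dict(partitions),
    }
```
</pre>
<p class="MsoNormal">To use it, read the SASL log file into a string and pass it to summarize_sasl_log. The result is a dictionary of per-node counts, which should make it easier to see which peers each node lost contact with around a restart.</p>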
<p class="MsoNormal"> </p><p class="MsoNormal">RabbitMQ 2.5.1</p><p class="MsoNormal">Erlang R13B03</p><p class="MsoNormal">Ubuntu Server 64-bit 2.6.38-10</p><p class="MsoNormal">DRBD 8.3.7</p><p class="MsoNormal"> </p><p class="MsoNormal">
=ERROR REPORT==== 7-May-2012::10:11:30 ===</p><p class="MsoNormal">Mnesia(&#39;rabbit2@host2&#39;): ** ERROR ** Mnesia on &#39;rabbit2@host2&#39; could not connect to node(s) [&#39;rabbit1@host1&#39;]</p><p class="MsoNormal">
 </p><p class="MsoNormal">=INFO REPORT==== 7-May-2012::10:11:30 ===</p><p class="MsoNormal">Limiting to approx 32668 file handles (29399 sockets)</p><p class="MsoNormal"> </p><p class="MsoNormal">=INFO REPORT==== 7-May-2012::10:12:46 ===</p>
<p class="MsoNormal">msg_store_transient: using rabbit_msg_store_ets_index to provide index</p><p class="MsoNormal"> </p><p class="MsoNormal">=INFO REPORT==== 7-May-2012::10:12:46 ===</p><p class="MsoNormal">msg_store_persistent: using rabbit_msg_store_ets_index to provide index</p>
<p class="MsoNormal"> </p><p class="MsoNormal">=WARNING REPORT==== 7-May-2012::10:12:46 ===</p><p class="MsoNormal">msg_store_persistent: rebuilding indices from scratch</p><p class="MsoNormal"> </p><p class="MsoNormal">=INFO REPORT==== 7-May-2012::10:12:46 ===</p>
<p class="MsoNormal">started TCP Listener on 192.168.1.1:5672</p><p class="MsoNormal"> </p><p class="MsoNormal">=ERROR REPORT==== 7-May-2012::10:13:42 ===</p><p class="MsoNormal">Mnesia(&#39;rabbit2@host2&#39;): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, &#39;rabbit1@host3&#39;}</p>
<p class="MsoNormal"> </p><p class="MsoNormal">=ERROR REPORT==== 7-May-2012::10:13:42 ===</p><p class="MsoNormal">Mnesia(&#39;rabbit2@host2&#39;): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, &#39;rabbit2@host4&#39;}</p>
<p class="MsoNormal"> </p><p class="MsoNormal">=SUPERVISOR REPORT==== 7-May-2012::10:13:32 ===</p><p class="MsoNormal">     Supervisor: {&lt;0.11398.2442&gt;,rabbit_channel_sup}</p><p class="MsoNormal">     Context:    shutdown</p>
<p class="MsoNormal">     Reason:     reached_max_restart_intensity</p><p class="MsoNormal">     Offender:   [{pid,&lt;0.11400.2442&gt;},</p><p class="MsoNormal">                  {name,channel},</p><p class="MsoNormal">                  {mfa,</p>
<p class="MsoNormal">                      {rabbit_channel,start_link,</p><p class="MsoNormal">                          [1,&lt;0.11368.2442&gt;,&lt;0.11399.2442&gt;,&lt;0.11368.2442&gt;,</p><p class="MsoNormal">                           rabbit_framing_amqp_0_9_1,</p>
<p class="MsoNormal">                           {user,&lt;&lt;&quot;my_app&quot;&gt;&gt;,true,</p><p class="MsoNormal">                               rabbit_auth_backend_internal,</p><p class="MsoNormal">                               {internal_user,&lt;&lt;&quot;my_app&quot;&gt;&gt;,</p>
<p class="MsoNormal">                                   &lt;&lt;199,64,175,52,127,65,248,9,70,171,15,9,5,</p><p class="MsoNormal">                                     122,73,4,195,147,238,67&gt;&gt;,</p><p class="MsoNormal">
                                   true}},</p><p class="MsoNormal">                           &lt;&lt;&quot;/my_app&quot;&gt;&gt;,[],&lt;0.11366.2442&gt;,</p><p class="MsoNormal">                           #Fun&lt;rabbit_channel_sup.0.15412730&gt;]}},</p>
<p class="MsoNormal">                  {restart_type,intrinsic},</p><p class="MsoNormal">                  {shutdown,4294967295},</p><p class="MsoNormal">                  {child_type,worker}]</p><p class="MsoNormal"> </p>
<p class="MsoNormal"> </p><p class="MsoNormal">=CRASH REPORT==== 7-May-2012::10:13:33 ===</p><p class="MsoNormal">  crasher:</p><p class="MsoNormal">    initial call: gen:init_it/6</p><p class="MsoNormal">    pid: &lt;0.25562.2442&gt;</p>
<p class="MsoNormal">    registered_name: []</p><p class="MsoNormal">    exception exit: {{badmatch,</p><p class="MsoNormal">                         {error,</p><p class="MsoNormal">                             [{&lt;7748.8396.531&gt;,</p>
<p class="MsoNormal">                               {exit,</p><p class="MsoNormal">                                   {nodedown,&#39;rabbit1@host1&#39;},</p><p class="MsoNormal">                                   []}}]}},</p>
<p class="MsoNormal">                     [{rabbit_channel,terminate,2},</p><p class="MsoNormal">                      {gen_server2,terminate,3},</p><p class="MsoNormal">                      {proc_lib,wake_up,3}]}</p><p class="MsoNormal">
      in function  gen_server2:terminate/3</p><p class="MsoNormal">    ancestors: [&lt;0.25560.2442&gt;,&lt;0.25544.2442&gt;,&lt;0.25542.2442&gt;,</p><p class="MsoNormal">                  rabbit_tcp_client_sup,rabbit_sup,&lt;0.124.0&gt;]</p>
<p class="MsoNormal">    messages: []</p><p class="MsoNormal">    links: [&lt;0.25560.2442&gt;]</p><p class="MsoNormal">    dictionary: [{{exchange_stats,</p><p class="MsoNormal">                       {resource,&lt;&lt;&quot;/my_app&quot;&gt;&gt;,exchange,</p>
<p class="MsoNormal">                           &lt;&lt;&quot;service.exchange&quot;&gt;&gt;}},</p><p class="MsoNormal">                   [{confirm,6},{publish,6}]},</p><p class="MsoNormal">                  {{queue_exchange_stats,</p>
<p class="MsoNormal">                       {&lt;0.253.0&gt;,</p><p class="MsoNormal">                        {resource,&lt;&lt;&quot;/my_app&quot;&gt;&gt;,exchange,</p><p class="MsoNormal">                            &lt;&lt;&quot;data.exchange&quot;&gt;&gt;}}},</p>
<p class="MsoNormal">                   [{confirm,6},{publish,6}]},</p><p class="MsoNormal">                  {delegate,delegate_4},</p><p class="MsoNormal">                  {{monitoring,&lt;0.253.0&gt;},true},</p><p class="MsoNormal">
                  {{exchange_stats,</p><p class="MsoNormal">                       {resource,&lt;&lt;&quot;/my_app&quot;&gt;&gt;,exchange,</p><p class="MsoNormal">                           &lt;&lt;&quot;data.exchange&quot;&gt;&gt;}},</p>
<p class="MsoNormal">                   [{confirm,6},{publish,6}]},</p><p class="MsoNormal">                  {guid,{{11,&lt;0.25562.2442&gt;},11}}]</p><p class="MsoNormal">    trap_exit: true</p><p class="MsoNormal">    status: running</p>
<p class="MsoNormal">    heap_size: 987</p><p class="MsoNormal">    stack_size: 24</p><p class="MsoNormal">    reductions: 11357</p><p class="MsoNormal">  neighbours:</p><p class="MsoNormal"> </p><p class="MsoNormal">=SUPERVISOR REPORT==== 7-May-2012::10:13:33 ===</p>
<p class="MsoNormal">     Supervisor: {&lt;0.25560.2442&gt;,rabbit_channel_sup}</p><p class="MsoNormal">     Context:    child_terminated</p><p class="MsoNormal">     Reason:     {{badmatch,</p><p class="MsoNormal">                      {error,</p>
<p class="MsoNormal">                          [{&lt;7748.8396.531&gt;,</p><p class="MsoNormal">                            {exit,{nodedown,&#39;rabbit1@host1&#39;},[]}}]}},</p><p class="MsoNormal">                  [{rabbit_channel,terminate,2},</p>
<p class="MsoNormal">                   {gen_server2,terminate,3},</p><p class="MsoNormal">                   {proc_lib,wake_up,3}]}</p><p class="MsoNormal">     Offender:   [{pid,&lt;0.25562.2442&gt;},</p><p class="MsoNormal">
                  {name,channel},</p><p class="MsoNormal">                  {mfa,</p><p class="MsoNormal">                      {rabbit_channel,start_link,</p><p class="MsoNormal">                          [1,&lt;0.25545.2442&gt;,&lt;0.25561.2442&gt;,&lt;0.25545.2442&gt;,</p>
<p class="MsoNormal">                           rabbit_framing_amqp_0_9_1,</p><p class="MsoNormal">                           {user,&lt;&lt;&quot;my_app&quot;&gt;&gt;,true,</p><p class="MsoNormal">                               rabbit_auth_backend_internal,</p>
<p class="MsoNormal">                               {internal_user,&lt;&lt;&quot;my_app&quot;&gt;&gt;,</p><p class="MsoNormal">                                   &lt;&lt;199,64,175,52,127,65,248,9,70,171,15,9,5,</p><p class="MsoNormal">
                                     122,73,4,195,147,238,67&gt;&gt;,</p><p class="MsoNormal">                                   true}},</p><p class="MsoNormal">                           &lt;&lt;&quot;/my_app&quot;&gt;&gt;,[],&lt;0.25543.2442&gt;,</p>
<p class="MsoNormal">                           #Fun&lt;rabbit_channel_sup.0.15412730&gt;]}},</p><p class="MsoNormal">                  {restart_type,intrinsic},</p><p class="MsoNormal">                  {shutdown,4294967295},</p>
<p class="MsoNormal">                  {child_type,worker}]</p><p class="MsoNormal"> </p></div></body></html>