<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 12pt;
font-family:Calibri
}
--></style></head>
<body class='hmmessage'><div dir='ltr'><div><div>Hi,</div><div><br></div><div>While testing my two-node cluster ('rabbit@existing-machine' + 'rabbit@new-machine'), I hit a fatal situation in which a queue was "lost" on one of the nodes ('rabbit@new-machine') after I restarted that node. Before the restart, everything worked. I had also previously restarted the nodes without any issue until this error appeared.</div><div><br></div><div>1. I'm using <span style="font-size: 12pt;">RabbitMQ 3.2.1 on Erlang R16B02 on Windows 7 on both machines.</span></div><div><span style="font-size: 12pt;">2. The config for both nodes uses:</span></div><div><div> <span class="Apple-tab-span" style="white-space: pre;"> </span> {cluster_nodes, {['rabbit@existing-machine'], disc}},</div><div> <span class="Apple-tab-span" style="white-space: pre;"> </span> {cluster_partition_handling, autoheal}</div></div><div><span style="font-size: 12pt;">3. A JavaScript SockJS-STOMP consumer running on my Android phone was connected to </span><span style="font-size: 12pt;">'rabbit@new-machine'</span><span style="font-size: 12pt;"> and subscribed to queue "queue01".</span></div><div><span style="font-size: 12pt;">4. </span><span style="font-size: 12pt;">I used Ncat (command line) to connect to </span><span style="font-size: 12pt;">'rabbit@existing-machine'</span><span style="font-size: 12pt;"> and send messages to "</span><span style="font-size: 12pt;">queue01", which I should then be able to read on my Android phone if cluster mirroring works.</span></div><div><span style="font-size: 12pt;"><br>After restarting </span><span style="font-size: 12pt;">'rabbit@new-machine'</span><span style="font-size: 12pt;">, the following error appeared in the </span><span style="font-size: 12pt;">'rabbit@new-machine'</span><span style="font-size: 12pt;"> log. The JavaScript consumer has code to re-subscribe to the queue if it encounters an error or disconnects, but this didn't appear to work in this case. 
"queue01" was missing from the port </span><span style="font-size: 12pt;">15672</span><span style="font-size: 12pt;"> management web console.</span></div></div><div><br></div><div><div>=INFO REPORT==== 15-Dec-2013::01:42:31 ===</div><div>Server startup complete; 10 plugins started.</div><div> * amqp_client</div><div> * cowboy</div><div> * mochiweb</div><div> * rabbitmq_management</div><div> * rabbitmq_management_agent</div><div> * rabbitmq_stomp</div><div> * rabbitmq_web_dispatch</div><div> * rabbitmq_web_stomp</div><div> * sockjs</div><div> * webmachine</div><div><br></div><div>=ERROR REPORT==== 15-Dec-2013::01:48:47 ===</div><div>connection &lt;0.473.0&gt;, channel 1 - soft error:</div><div>{amqp_error,not_found,</div><div> "home node 'rabbit@existing-machine' of durable queue 'queue01' in vhost '/' is down or inaccessible",</div><div> 'queue.declare'}</div><div><br></div><div>=ERROR REPORT==== 15-Dec-2013::01:48:47 ===</div><div>STOMP error frame sent:</div><div>Message: not_found</div><div>Detail: "NOT_FOUND - home node 'rabbit@existing-machine' of durable queue 'queue01' in vhost '/' is down or inaccessible\n"</div><div>Server private detail: none</div></div><div><br></div><div>On the other hand, the log on 'rabbit@existing-machine' didn't show anything unusual, and I was still able to see "queue01" on the <span style="font-size: 12pt;">port </span><span style="font-size: 12pt;">15672</span><span style="font-size: 12pt;"> management web console for the existing-machine.</span></div><div><br></div><div><div>=INFO REPORT==== 15-Dec-2013::00:19:44 ===</div><div>Server startup complete; 10 plugins started.</div><div> * amqp_client</div><div> * cowboy</div><div> * mochiweb</div><div> * rabbitmq_management</div><div> * rabbitmq_management_agent</div><div> * rabbitmq_stomp</div><div> * rabbitmq_web_dispatch</div><div> * rabbitmq_web_stomp</div><div> * sockjs</div><div> * webmachine</div><div><br></div><div>=INFO REPORT==== 15-Dec-2013::00:29:57 ===</div><div>accepting 
STOMP connection &lt;0.491.0&gt; (127.0.0.1:1207 -> 127.0.0.1:61613)</div><div><br></div><div>=INFO REPORT==== 15-Dec-2013::01:39:52 ===</div><div>Statistics database started.</div><div><br></div><div>=INFO REPORT==== 15-Dec-2013::01:39:54 ===</div><div>rabbit on node rabbit@new-machine down</div><div><br></div><div>=INFO REPORT==== 15-Dec-2013::01:42:20 ===</div><div>rabbit on node rabbit@new-machine up</div></div><div><br></div><div><br></div><div>Other than the queue problem, I was able to add and delete virtual hosts on '<span style="font-size: 12pt;">rabbit@existing-machine', and the changes were reflected in the '</span><span style="font-size: 12pt;">rabbit@new-machine' </span><span style="font-size: 12pt;">management web console.</span></div><div><br></div><div><span style="font-size: 12pt;">Finally, I was only able to recover from the queue problem by restarting RabbitMQ on</span><span style="font-size: 12pt;"> </span><span style="font-size: 12pt;">'</span><span style="font-size: 12pt;">rabbit@existing-machine'.</span></div><div><span style="font-size: 12pt;"><br></span></div><div><span style="font-size: 12pt;">Rgds,</span></div><div><span style="font-size: 12pt;">Joshua</span></div><div><br></div><div> </div> </div></body>
</html>