[rabbitmq-discuss] Home node of durable queue is down or inaccessible

Sun Dec 15 08:59:18 GMT 2013

Hi,
While testing my two-node cluster ('rabbit@existing-machine' + 'rabbit@new-machine'), I ran into a fatal situation in which a queue was "lost" on one of the nodes ('rabbit@new-machine') after I restarted that node. Before the restart, everything worked. I had also been able to restart the nodes without any issue until this error appeared.
1. I'm using RabbitMQ 3.2.1 on Erlang R16B02 on Windows 7 on both machines.
2. The config for both nodes uses:
       {cluster_nodes, {['rabbit@existing-machine'], disc}},
       {cluster_partition_handling, autoheal}
3. A JavaScript SockJS/STOMP consumer running on my Android phone was connected to 'rabbit@new-machine' and subscribed to queue "queue01".
4. I used ncat (command line) to connect to 'rabbit@existing-machine' and send messages to "queue01", which I expected to read on my Android phone if cluster mirroring worked.
After restarting 'rabbit@new-machine', the following error appeared in that node's log. The JavaScript consumer has code to re-subscribe to the queue if it encounters an error or disconnects, but this didn't appear to work in this case. "queue01" was also missing from the management web console (port 15672).

=INFO REPORT==== 15-Dec-2013::01:42:31 ===
Server startup complete; 10 plugins started.
 * amqp_client
 * cowboy
 * mochiweb
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_stomp
 * rabbitmq_web_dispatch
 * rabbitmq_web_stomp
 * sockjs
 * webmachine

=ERROR REPORT==== 15-Dec-2013::01:48:47 ===
connection <0.473.0>, channel 1 - soft error:
{amqp_error,not_found,
            "home node 'rabbit@existing-machine' of durable queue 'queue01' in vhost '/' is down or inaccessible",
            'queue.declare'}

=ERROR REPORT==== 15-Dec-2013::01:48:47 ===
STOMP error frame sent:
Message: not_found
Detail: "NOT_FOUND - home node 'rabbit@existing-machine' of durable queue 'queue01' in vhost '/' is down or inaccessible\n"
Server private detail: none

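To make the re-subscribe behaviour mentioned above concrete, here is a minimal sketch of that kind of retry logic. It is not the actual consumer code: the `keepSubscribed` helper, the `connect(onConnected, onError)` factory, the client's `subscribe` method, and the option names are all hypothetical stand-ins for whatever STOMP client library is in use.

```javascript
// Hypothetical sketch, not a real STOMP library API.
// `connect(onConnected, onError)` is an assumed factory: it calls
// onConnected(client) once a session is up, or onError(reason) when an
// ERROR frame arrives (e.g. NOT_FOUND) or the connection drops.
function keepSubscribed(connect, destination, onMessage, {
  retryDelayMs = 5000,                        // assumed delay between attempts
  maxAttempts = Infinity,                     // total connection attempts allowed
  schedule = (fn, ms) => setTimeout(fn, ms),  // injectable so it can be tested
  onGiveUp = () => {},                        // called once attempts run out
} = {}) {
  let attempts = 0;

  function attempt() {
    attempts += 1;
    connect(
      (client) => {
        // Re-issue the subscription every time a session is established.
        client.subscribe(destination, onMessage);
      },
      (reason) => {
        // Any error or disconnect triggers another attempt, up to the limit.
        if (attempts < maxAttempts) {
          schedule(attempt, retryDelayMs);
        } else {
          onGiveUp(reason);
        }
      }
    );
  }

  attempt();
}
```

With a loop like this, each retry simply repeats the same subscription, so in the situation above it would presumably keep receiving the same NOT_FOUND error for as long as the broker reports the queue's home node as down or inaccessible.
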
On the other hand, the log on 'rabbit@existing-machine' didn't show anything unusual, and I could still see "queue01" in the management web console (port 15672) on existing-machine.

=INFO REPORT==== 15-Dec-2013::00:19:44 ===
Server startup complete; 10 plugins started.
 * amqp_client
 * cowboy
 * mochiweb
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_stomp
 * rabbitmq_web_dispatch
 * rabbitmq_web_stomp
 * sockjs
 * webmachine

=INFO REPORT==== 15-Dec-2013::00:29:57 ===
accepting STOMP connection <0.491.0> (127.0.0.1:1207 -> 127.0.0.1:61613)

=INFO REPORT==== 15-Dec-2013::01:39:52 ===
Statistics database started.

=INFO REPORT==== 15-Dec-2013::01:39:54 ===
rabbit on node rabbit@new-machine down

=INFO REPORT==== 15-Dec-2013::01:42:20 ===
rabbit on node rabbit@new-machine up

Apart from the queue problem, I was able to add and delete virtual hosts on 'rabbit@existing-machine', and the changes were reflected in the management web console of 'rabbit@new-machine'.
In the end, I was only able to recover from the queue problem by restarting RabbitMQ on 'rabbit@existing-machine'.
Rgds,
Joshua
