[rabbitmq-discuss] queue gets removed from a node

Tue Aug 30 16:06:03 BST 2011

On 30/08/2011 12:50, Matthew Sackman wrote:
> Hi Cezary,
>
> On Thu, Aug 25, 2011 at 01:09:47PM +0100, Cezary Siwek wrote:
>> I'm facing an issue with my rabbitmq cluster where a queue
>> disappears from a node.
>> I have 2 nodes in a cluster working in "disk mode".
>> Node1 - only consumes messages from the queue.
>> Node2 - only publishes messages to the queue.
>>
>> The consumer script on Node1 creates a durable queue and waits for
>> messages.
>> Everything works fine until some no-activity time. I can't say how
>> long it needs to be but usually after more than 24h the queue
>> disappears from Node2 and the consumer stops receiving messages. The
>> list_queues command shows the queue exists on Node1 but not on
>> Node2.
>> I've done a packet trace when it happened and I can see some packets
>> are being exchanged between nodes. Also cluster_status commands
>> shows that both nodes are up and running in the cluster.
>> When I try to declare_queue on Node2 I get "NOT_FOUND - no queue
>> 'msgs' in vhost '/vhost1'".
>> All I need to do to have the Node2 running is to run stop_app and start_app.
>>
>> Both nodes are sitting behind firewalls (in separate networks) but
>> both firewalls have been configured to pass all the traffic between
>> these two boxes.
>> It happens on my dev platform. On production I don't think I will
>> ever have such long quiet periods but I need to find out what is
>> causing this.
> Hmm, this is odd. It suggests that the two nodes have lost contact with
> each other which is why Node2 is responding with the NOT_FOUND when you
> try to redeclare the queue. However, if at this point the cluster_status
> on both nodes is suggesting everything is clustered and happy, then this
> is very odd indeed. Could you check that you can reproduce this set of
> circumstances?
>
> Also, do you have the logs for both nodes during the no-activity time?
> I'm curious whether there are entries in there that suggest the cluster
> has split apart. If they're large, then maybe send them to us off-list.
>
>

Hi Matthew,

Thanks for your response.

Yes, you are right. This happens when the nodes lose contact with each
other. I have reproduced the issue by breaking the link between the two
nodes. The queue on Node2 gets deleted when Node1 becomes unavailable. I
was wrong to say that the cluster_status command shows the cluster is up
and running. In fact, the two nodes stay detached until
stop_app/start_app is executed.
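
For reference, the stop_app/start_app workaround can be scripted. This
is only a minimal sketch in Python, assuming rabbitmqctl is on the PATH
and that its -n option is used to pick the target node. The function
names and the node name rabbit@node2 are placeholders I made up for
illustration, and the script does not detect the partition by itself:

```python
import subprocess


def restart_app_commands(node=None):
    """Build the rabbitmqctl command lines: stop_app, then start_app.

    `node` is an optional Erlang node name (hypothetical example:
    "rabbit@node2"); when omitted, rabbitmqctl talks to its default node.
    """
    base = ["rabbitmqctl"]
    if node is not None:
        base = base + ["-n", node]
    return [base + ["stop_app"], base + ["start_app"]]


def restart_app(node=None, runner=subprocess.check_call):
    """Run both commands in order; check_call raises if either one fails."""
    for cmd in restart_app_commands(node):
        runner(cmd)


# Example (run on the detached node, here assumed to be Node2):
#   restart_app("rabbit@node2")
```

The runner argument is only there so the command list can be checked
without touching a real broker.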

I know that a rabbitmq cluster is not designed to work over unreliable
links, but is it correct behavior for a queue to be deleted from the
node? At the very least, the node should allow the producer to re-create
the queue and keep all the messages until the connection between the
nodes is back up.
Is there any way to re-establish the connection between the two nodes
when the link comes back up?

Regards,