[rabbitmq-discuss] When rabbitmq is clustered with one other node we see a very slow dequeue of messages

GENTLING Gregory Gregory.GENTLING at us.thalesgroup.com
Wed Dec 4 01:52:24 GMT 2013


When RabbitMQ is clustered with one other node, we see very slow dequeuing of messages. The scenario is simple: Node A and Node B form a cluster, configured with the autoheal partition-handling option and the default net_ticktime. Steps to reproduce:


(These are all local connections)

(1)    Connect client A1 to Node A

a.       Client A1 creates a topic exchange

b.      Client A1 publishes at 1 msg/sec

(2)    Connect client A2 to Node A

a.       Client A2 listens for the messages in the exchange

(3)    Connect client B to Node B   (this is important: the issue does not occur without this remote client)

a.       Client B listens for the messages in the exchange

(4)    Pull the plug on Node B (you will not see the issue with a graceful shutdown). Alternatively, use "route" to make Node B unroutable from Node A

a.       If you just kill RabbitMQ, you will not see the issue either

(5)    Wait for net_ticktime to elapse (or until Node A's log shows Node B being removed from the cluster)

(6)    Client A2 no longer receives messages at 1 msg/sec; it falls considerably behind but recovers in about 10 minutes.
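
One way to put a number on "falls considerably behind" in step (6) is to embed the publish time in each message and compute the lag on receipt. The following is only a sketch: the helper names and the JSON message layout are hypothetical, the broker client wiring is left out, and it assumes publisher and consumer share a clock (plausible here, since both A1 and A2 connect locally to Node A).

```python
import json
import time

PUBLISH_INTERVAL_S = 1.0  # step (1b): the publisher sends 1 msg/sec


def make_body(seq, now=None):
    """Build a message body carrying a sequence number and the publish time."""
    sent_at = time.time() if now is None else now
    return json.dumps({"seq": seq, "sent_at": sent_at}).encode("utf-8")


def lag_seconds(body, now=None):
    """Return the time elapsed between publish and receipt of this message."""
    received_at = time.time() if now is None else now
    msg = json.loads(body.decode("utf-8"))
    return received_at - msg["sent_at"]


def backlog_estimate(lag, interval=PUBLISH_INTERVAL_S):
    """Rough number of messages queued behind this one at a steady publish rate."""
    return max(0, int(lag // interval))
```

Client A1 would send make_body(seq) as the message payload, and client A2 would log lag_seconds(body) in its delivery callback. A lag that grows after step (5) and then shrinks again would match the behaviour described above.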

We have two environments, each a pair of Node A and Node B, with slightly different network setups. We see this issue on one and not on the other, so it cannot always be reproduced.
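
For comparing the two environments, the cluster settings described at the top would look roughly like the fragment below in a classic rabbitmq.config file. This is an assumption about how the nodes are configured, not a copy of our actual file: it takes the heal option to be cluster_partition_handling set to autoheal, and it spells out 60 seconds, the Erlang default for net_ticktime, which would normally just be left unset.

```erlang
%% rabbitmq.config (sketch; assumed layout, not the real file)
[
  {rabbit, [
    %% automatic recovery after a network partition
    {cluster_partition_handling, autoheal}
  ]},
  {kernel, [
    %% Erlang distribution tick time in seconds (60 is the default)
    {net_ticktime, 60}
  ]}
].
```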

Other issues observed in this state:

*         rabbitmqctl cluster_status, list_queues, list_connections, and list_exchanges all hang; rabbitmqctl status does not hang

*         declareQueue, declareExchange, declareExchangePassive all hang

*         disabling autoheal does not help

*         tested with both Erlang 5.9 and 5.10.3 (ERTS versions)

*         tested with both RabbitMQ 3.1.3 and 3.1.5, same issue in both

*         we do not see this issue with a direct exchange

*         nothing in vmstat out of the ordinary, CPU is not pegged, system is not thrashing
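
The hanging rabbitmqctl commands listed above can be checked without blocking a terminal by running each one under a timeout. A minimal sketch, assuming rabbitmqctl is on the PATH of the user running it; the probe helper, its outcome labels, and the 10-second timeout are hypothetical choices, not part of any RabbitMQ tooling.

```python
import subprocess

# Subcommands named in this report; "status" is the one that still returns.
SUBCOMMANDS = ["status", "cluster_status", "list_queues",
               "list_connections", "list_exchanges"]


def probe(cmd, timeout_s=10.0):
    """Run cmd and classify the outcome as 'ok', 'failed', 'hung', or 'missing'."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung"      # subprocess.run kills the child once the timeout expires
    except FileNotFoundError:
        return "missing"   # executable not found on PATH
    return "ok" if result.returncode == 0 else "failed"


def probe_rabbitmqctl(timeout_s=10.0):
    """Probe each rabbitmqctl subcommand from the list above."""
    return {sub: probe(["rabbitmqctl", sub], timeout_s) for sub in SUBCOMMANDS}


if __name__ == "__main__":
    for sub, outcome in probe_rabbitmqctl().items():
        print(f"rabbitmqctl {sub}: {outcome}")
```

Run on Node A after step (5), this would be expected to report "hung" for everything except status while the problem lasts.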


Things we have ruled out:

*         iptables, tested with no rules

*         SELinux, tested in permissive mode

*         Java drivers


Same thing as described here:

http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-June/027674.html

Thank you,

Greg Gentling
Principal Software Architecture - Avant CommonApps
Thales Avionics, Inc.
In-Flight Entertainment and Connectivity
Irvine, CA 92618
949-595-4943

