[rabbitmq-discuss] Weird behaviour with mirrored queues

Tim Watson tim at rabbitmq.com
Mon Sep 2 08:50:03 BST 2013


Hi Dan,

Some questions and points of clarification (inline, below) are in order first, I think...

On 2 Sep 2013, at 02:07, Dan_b wrote:
> We have a 4 node rabbitmq cluster with 2 master/slave pairs.

RabbitMQ nodes do not exhibit master/slave characteristics - those apply only to individual queues (where the master process resides on a specific node, and replicas/slaves on others).
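
As a quick way to see this per-queue, something like the following (run from any node; exact column names can vary a little between releases, so check `rabbitmqctl list_queues' in your version's docs) will show where each queue's master process lives, where its slaves are, and which policy it has picked up - the master's pid embeds the node name:

    rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids policy
    # pid prints as e.g. <rabbit@mq1.3.1234.0>, i.e. the master is on rabbit@mq1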

>  We have our
> queues set up using a nodes policy, and we use priorities to have a catch
> all mirror policy, e.g:
> 
> ha_appx          ^.+appx_.+    {"ha-mode":"nodes","ha-params":["rabbit@mq3","rabbit@mq4"],"ha-sync-mode":"manual"}    10
> ha_appcatchall   ^.+           {"ha-mode":"nodes","ha-params":["rabbit@mq1","rabbit@mq2"],"ha-sync-mode":"manual"}    1
> 

Just to clarify then: any queue whose name matches the `appx_' regex will be mirrored across the nodes rabbit@mq3 and rabbit@mq4, whereas everything else will be mirrored across rabbit@mq1 and rabbit@mq2.

> If we then add another policy for app_y with priority 10 the majority of the
> queues for app_y disappear(from rabbitmqctl list_queues).

To be clear - you are saying that you already have some (existing) queues that have names of the form app_y* that were declared on one of your nodes. You add another ha-mode policy that matches names of that form, and subsequently when running rabbitmqctl, you can no longer see these in list_queues - is that right? What exactly does the app_y policy that you're setting look like?
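
For example - and this is purely a guess at what your policy might look like, with a made-up name and regex, and note that older releases take the priority as a trailing argument rather than via --priority - something along these lines:

    rabbitmqctl set_policy --priority 10 ha_appy '^.+appy_.+' \
      '{"ha-mode":"nodes","ha-params":["rabbit@mq3","rabbit@mq4"],"ha-sync-mode":"manual"}'

Pasting the exact command you ran (or the full `rabbitmqctl list_policies' output afterwards) would help.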

>  The bindings
> still exist

I would expect that if the bindings still exist then the queues are still there, even if they're not visible.

> ... stopping and restarting the consumers (which create the
> queues if they don't exist) doesn't help.

That's not really telling us much, since queue.declare is idempotent and won't fail even if the queue already exists.

>  The consumers still connect fine,
> even for the queues that don't exist.

Have you determined in any other way (besides `rabbitmqctl list_queues') that these queues no longer exist? Do you have the management plugin installed and if so, can you see them in the web UI? If your consumers have consumer cancel notifications enabled, are they getting fired at all?
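
If the management plugin is running you can also ask its HTTP API directly - adjust the host, port, credentials and the (made-up) queue name to suit; 15672 and guest/guest are just the defaults:

    # 200 plus a JSON body if the queue exists on the default vhost, 404 if not
    curl -i -u guest:guest http://localhost:15672/api/queues/%2f/appy_example_queue

    # and to see which queues actually have consumers attached right now
    rabbitmqctl list_consumers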

>  If the queue does exist the consumer
> doesn't consume from that queue.  It almost seems like the queue is bouncing
> around between the policies somehow.
> 

That seems very unlikely. What exactly does the policy matching regex look like, and can you give a specific example of one of the queue names that you expect to match, which is also exhibiting this behaviour?

> If we create the priority 10 policies first there are no problems (all the
> queues are created on rabbit at mq1 so the master is still being moved in that
> scenario).

Hmm. Perhaps there is some bug at work here that only presents itself when the policy is created after the queue is declared. Are publishers and/or consumers already using the queue when the policy is introduced (at runtime), or not?

You mention "... so the master is still being moved in that scenario ..." - what exactly do you mean by this? Are you saying that at the point when the policy is introduced, the queue process is running on a node which is *not* part of the app_y policy's ha-params array? That is the only reason a queue process that's already running (i.e., has been declared on a certain node) would be relocated when a policy is added or changed. If that is the case, how did you identify where the queue process (which will become the master) was originally running?

>  We can obviously work around by deleting the priority 1 policy
> if we want to move queues, but after the issue has occurred the only way I
> can find to fix it is to completely reset the cluster.

Why do you want to "move queues" at all? What exactly are you trying to achieve here? Are you deliberately using these policies to relocate queues that were running on a node that is not part of the ha-params of a new policy, and if so, why? That's not to say nothing could be wrong here: when the `nodes' policy is introduced, if the existing (and previously unmirrored) queue is running elsewhere, the mirrors ought to be synchronised *first*, and the master should only migrate once we're sure no message loss will ensue.
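
Incidentally, since you're running with "ha-sync-mode":"manual", it's also worth checking whether the slaves ever become synchronised at all (the synchronised_slave_pids column from the list_queues invocation above), and whether kicking off a manual sync changes anything - assuming a 3.1+ broker where this command exists, and substituting a real queue name:

    rabbitmqctl sync_queue some_appy_queue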

>  The nodes do not
> shutdown cleanly, they just hang without stopping rabbit.
> 

That sounds worrying. Can you provide log files (both the rabbit log *and* the sasl logs) from all of the nodes involved in these operations please?
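
On most distribution packages these live under /var/log/rabbitmq by default (the paths will differ if you have set RABBITMQ_LOGS / RABBITMQ_SASL_LOGS or installed somewhere non-standard):

    /var/log/rabbitmq/rabbit@mq1.log        # main broker log
    /var/log/rabbitmq/rabbit@mq1-sasl.log   # sasl (crash / supervisor) reports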

Cheers,
Tim
 


