[rabbitmq-discuss] rabbitmq connections blocking but memory is below watermark
Simon MacMullen
simon at rabbitmq.com
Fri Mar 1 17:21:45 GMT 2013
On 01/03/13 14:39, xzhang84 at gmail.com wrote:
> 1. After one node of the cluster goes over the watermark, while the
> other nodes work fine, why can't my application connect to the RabbitMQ
> cluster? PS: we use spring-amqp & spring-rabbit version 1.1.0.RELEASE
Yes. When *any* node of the cluster goes over the watermark, *all* nodes
in the cluster stop accepting messages. This is because any node can
accept messages destined for queues on the node that is under memory
pressure, and those messages would add to that node's memory load.
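(As an aside, the watermark itself is configurable. A sketch of a rabbitmq.config fragment, in the classic Erlang-term format, using the standard vm_memory_high_watermark setting:)

```erlang
[
  {rabbit, [
    %% Block publishers cluster-wide once this node uses more than
    %% 40% of installed RAM (0.4 is the shipped default).
    {vm_memory_high_watermark, 0.4}
  ]}
].
```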
> 2. For what reason would a node go down when it is over the watermark?
Well, it shouldn't. I would guess the node went down because it was out
of memory. While blocking publishers is a good way to stop using more
memory, it's not perfect: a node could still use memory for other
reasons, and other operating system processes could exhaust memory
anyway. The (sasl?) logs may give a hint.
> 3. Why, after restarting the node, are there still blocked connections,
> even though rabbitmqctl shows them all in the running state?
Oh gosh.
So it looks like when a cluster node that is *not* running the
management database crashes, the management database does not clean up
records of connections and channels that were on that node. That's quite
an obvious bug, and has existed since RabbitMQ 2.2.0 without anyone
before you complaining, so congratulations.
Most of the effects you describe are caused by management listing
connections which no longer really exist, which is why you saw a
discrepancy between management and rabbitmqctl.
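To make the discrepancy concrete: the "ghosts" are exactly the connections the management database still lists but rabbitmqctl no longer reports. A hypothetical illustration (the connection names and helper are invented; in practice the two lists would come from the management HTTP API's /api/connections and from `rabbitmqctl list_connections name`):

```python
# Hypothetical helper: find "ghost" connections that the management
# database still remembers but that rabbitmqctl no longer reports.

def find_ghost_connections(mgmt_conns, ctl_conns):
    """Return connections known to management but absent from rabbitmqctl."""
    live = set(ctl_conns)
    return [name for name in mgmt_conns if name not in live]

# Invented sample data: management still remembers a connection that was
# on the crashed node, while rabbitmqctl only sees the live ones.
mgmt = ["10.0.0.1:5672 -> app1", "10.0.0.2:5672 -> app2", "10.0.0.3:5672 -> app3"]
ctl = ["10.0.0.1:5672 -> app1", "10.0.0.3:5672 -> app3"]

print(find_ghost_connections(mgmt, ctl))  # → ['10.0.0.2:5672 -> app2']
```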
We'll fix this bug in the next release, but in the meantime you can invoke:
$ rabbitmqctl eval 'exit(global:whereis_name(rabbit_mgmt_db), bang).'
This will forcibly kill the management database, after which it will
restart and reconstruct itself, clearing the ghost connections.
Cheers, Simon
--
Simon MacMullen
RabbitMQ, VMware