[rabbitmq-discuss] rabbitmq connections blocking but memory is below watermark
Simon MacMullen
simon at rabbitmq.com
Fri Mar 1 17:21:45 GMT 2013
On 01/03/13 14:39, xzhang84 at gmail.com wrote:
> 1. After one node of the cluster goes over the watermark, while the
> other nodes work fine, why can't my application connect to the RabbitMQ
> cluster? PS: we use spring-amqp & spring-rabbit version 1.1.0.RELEASE
Yes. When *any* node of the cluster goes over the watermark, *all* nodes
in the cluster stop accepting messages. This is because any node can
accept messages destined for queues on the node that is under memory
pressure, and those messages would add to that node's memory load.
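(As an aside, the watermark itself is configurable. A sketch of a rabbitmq.config fragment, in the classic Erlang-term format, using the standard vm_memory_high_watermark setting:)

```erlang
[
  {rabbit, [
    %% Block publishers cluster-wide once this node uses more than
    %% 40% of installed RAM (0.4 is the shipped default).
    {vm_memory_high_watermark, 0.4}
  ]}
].
```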
> 2. For what reason would a node go down when it is over the watermark?
Well, it shouldn't. I would guess the node went down because it was out
of memory. While blocking publishers is a good way to stop using more
memory, it's not perfect: a node could still use memory for other
reasons, and other operating system processes could exhaust memory
anyway. The (sasl?) logs may give a hint.
> 3. Why, after restarting the node, are there still blocked connections,
> even though rabbitmqctl shows them all in the running state?
Oh gosh.
So it looks like when a cluster node that is *not* running the
management database crashes, the management database does not clean up
records of connections and channels that were on that node. That's quite
an obvious bug, and has existed since RabbitMQ 2.2.0 without anyone
before you complaining, so congratulations.
Most of the effects you describe are caused by management listing
connections which no longer really exist, which is why you saw a
discrepancy between management and rabbitmqctl.
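To make the discrepancy concrete: the "ghosts" are exactly the connections the management database still lists but rabbitmqctl no longer reports. A hypothetical illustration (the connection names and helper are invented; in practice the two lists would come from the management HTTP API's /api/connections and from `rabbitmqctl list_connections name`):

```python
# Hypothetical helper: find "ghost" connections that the management
# database still remembers but that rabbitmqctl no longer reports.

def find_ghost_connections(mgmt_conns, ctl_conns):
    """Return connections known to management but absent from rabbitmqctl."""
    live = set(ctl_conns)
    return [name for name in mgmt_conns if name not in live]

# Invented sample data: management still remembers a connection that was
# on the crashed node, while rabbitmqctl only sees the live ones.
mgmt = ["10.0.0.1:5672 -> app1", "10.0.0.2:5672 -> app2", "10.0.0.3:5672 -> app3"]
ctl = ["10.0.0.1:5672 -> app1", "10.0.0.3:5672 -> app3"]

print(find_ghost_connections(mgmt, ctl))  # → ['10.0.0.2:5672 -> app2']
```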
We'll fix this bug in the next release, but in the meantime you can invoke:
$ rabbitmqctl eval 'exit(global:whereis_name(rabbit_mgmt_db), bang).'
This will forcibly kill the management database, after which it will
restart and reconstruct itself, clearing the ghost connections.
Cheers, Simon
--
Simon MacMullen
RabbitMQ, VMware