Marek Majkowski majek04 at gmail.com
Fri Feb 4 11:53:42 GMT 2011

On Thu, Feb 3, 2011 at 20:36, Ivan Sanchez <s4nchez at gmail.com> wrote:
>  Yesterday I noticed some very strange behaviour in one of our
> rabbitmq cluster nodes. Its queues became unresponsive, and running
> "rabbitmqctl list_connections" showed that all the connections
> were either "blocking" or "blocked". The documentation doesn't mention
> these states. Does anyone know what they mean?
>  To give a bit of context: we noticed this problem when a good portion
> of our clients stopped receiving messages. Looking at all the servers,
> we found one that was using too much CPU and also swapping to disk.
> This node was the only one showing this behaviour, but it seemed like
> this problem compromised the whole cluster. We haven't touched these
> servers for ages (they are dedicated to rabbitmq) and the system load
> was well within normal levels. We use simple DNS round-robin for
> clients to connect to the cluster and none of our messages are
> persistent, so seeing swapping really scared me.
>  Even after restarting the whole cluster a few times, the problem
> persisted, always on the same server. We even tried force_reset on all
> nodes, but that didn't help either. Things only went back to normal
> after we removed the problematic node from the cluster. Now my task is to
> figure out what the problem could be.
>  Has anyone had experience with this kind of behaviour? I'm even
> considering a hardware problem, but so far I haven't found anything
> indicating that this is the case.


When RabbitMQ uses more memory than it should,
it stops accepting new messages to avoid crashing.

In AMQP, publishing messages is asynchronous, and
it's illegal for the broker to 'reject' a published message.

Instead, when RabbitMQ is under memory pressure,
it stops reading data from TCP connections that try to
do 'basic_publish'.

That's what the 'blocked' connection state means.
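
To make the effect on the publisher concrete, here is a rough Python sketch (not RabbitMQ code) of what happens when the server simply stops reading a socket. It assumes a local socketpair() is a fair stand-in for the TCP connection, and uses a non-blocking socket so the script reports the point where a publisher's writes would stall instead of hanging. The function name fill_unread_socket is made up for illustration.

```python
import socket


def fill_unread_socket(chunk_size: int = 4096) -> int:
    """Write into a socket whose peer never reads; return the number of
    bytes accepted before the kernel buffers fill and a write would block."""
    # A local socket pair stands in for the publisher's TCP connection.
    publisher, broker = socket.socketpair()
    publisher.setblocking(False)  # raise instead of hanging when buffers fill
    sent = 0
    chunk = b"x" * chunk_size
    try:
        while True:
            try:
                sent += publisher.send(chunk)
            except BlockingIOError:
                # The "broker" end never calls recv(), so the buffers are now
                # full: a blocking publisher would stall right here.
                return sent
    finally:
        publisher.close()
        broker.close()


if __name__ == "__main__":
    print(f"publisher wrote {fill_unread_socket()} bytes before blocking")
```

Nothing is rejected or dropped: the publisher just stops making progress once the buffers between it and the server are full, which is why its connection shows up as blocked rather than erroring out.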

It's perfectly legal to open a second connection and
consume messages on it, as long as that connection doesn't do 'basic_publish'.
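
A toy model of that rule, with hypothetical names (ToyBroker, on_frame, clear_alarm) that are not RabbitMQ internals: while a memory alarm is active, only a connection that sends basic_publish stops being read, and a connection that only consumes and acks carries on. It also assumes reading resumes for every paused connection once the alarm clears.

```python
class ToyBroker:
    """Hypothetical model of publisher blocking under a memory alarm."""

    def __init__(self) -> None:
        self.memory_alarm = False
        self.paused: set[str] = set()  # connections we've stopped reading

    def on_frame(self, conn: str, method: str) -> str:
        if conn in self.paused:
            # Socket isn't being read, so this frame stays in the buffers.
            return "blocked"
        if self.memory_alarm and method == "basic_publish":
            self.paused.add(conn)  # stop reading from this publisher
            return "blocked"
        return "ok"

    def clear_alarm(self) -> None:
        self.memory_alarm = False
        self.paused.clear()  # resume reading from every paused connection

    def is_reading(self, conn: str) -> bool:
        return conn not in self.paused


if __name__ == "__main__":
    broker = ToyBroker()
    broker.memory_alarm = True
    print(broker.on_frame("publisher", "basic_publish"))  # blocked
    print(broker.on_frame("consumer", "basic_ack"))       # ok
```

Keeping publishing and consuming on separate connections is what lets consumers drain queues (and so free memory) while publishers are held back.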

Please check your RabbitMQ logs for memory alarms,
check that the memory high watermark is set properly,
and make sure you have enough memory for your Rabbit.
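
For a rough sense of where the limit sits: assuming the watermark is the default vm_memory_high_watermark fraction of 0.4 of installed RAM, the threshold is just that fraction times total memory. The helper names below are hypothetical, and the strict greater-than comparison is an assumption, so treat this as an approximation of the check rather than RabbitMQ's exact logic.

```python
# Assumed default vm_memory_high_watermark: a fraction of installed RAM.
DEFAULT_WATERMARK = 0.4


def memory_threshold(total_ram_bytes: int, watermark: float = DEFAULT_WATERMARK) -> int:
    """Approximate bytes the node may use before the memory alarm fires."""
    return int(total_ram_bytes * watermark)


def alarm_triggered(used_bytes: int, total_ram_bytes: int,
                    watermark: float = DEFAULT_WATERMARK) -> bool:
    """True if usage is above the threshold (strict '>' is an assumption)."""
    return used_bytes > memory_threshold(total_ram_bytes, watermark)


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(memory_threshold(4 * GiB))          # 1717986918, i.e. 1.6 GiB
    print(alarm_triggered(2 * GiB, 4 * GiB))  # True
```

On a 4 GiB box the default puts the limit at 1.6 GiB, so a node that is swapping while below that point suggests the machine as a whole is short of memory.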


