[rabbitmq-discuss] RabbitMQ timing out under small load

Thu Jan 8 00:51:40 GMT 2009

Greg,

On Wed, Jan 7, 2009 at 10:56 PM, gfodor <gfodor at gmail.com> wrote:
> That said, I was having connectivity problems on both of them for sure, from
> running consumers through list_queues. At the end of the day the thing I
> would have been able to determine is basically if one node was dropping them
> less frequently than others (ie, node 2 may have just simply been
> terminating them upon connection, whereas the other one may only have
> terminated them once the connection tried to access a queue on node 2.)

What exactly do you mean by connectivity problems? What I am trying to
isolate with this question is whether the Rabbit instance was
unresponsive or whether you were having general network issues.

> Honestly I am not sure where the queues lived, since I don't think RabbitMQ
> can really tell you directly.

There is no API call to tell you the node (mainly because this should
be transparent to the API consumer), but if you use the Erlang shell,
you can get the PID of the queue, and from that Erlang can tell you on
which node the queue resides. I'm not going to go into the details of
this because it would involve a learning curve and I'm trying to
narrow down the possibilities, but suffice it to say that one of the
nice things about Erlang/OTP is all of the features that allow you to
poke around in a running system.

> I was not able to drain them by performing get operations due to the
> constant timeouts and the fact that the get operations were taking 2-3
> seconds a piece. I ended up fixing it by performing a delete queue
> operations and then creating the queue over again.

This may be OT but have you tried purging the queue?

The timeouts that you are seeing *may* be due to network latency, but
I can't know for sure. For example, if you connect to a node in the
cluster and issue a basic.get for a queue that exists on a different
node, internally this will send a message to a remote process and wait
for the response, with a default timeout of 5 seconds.
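
To make that call-and-wait pattern concrete, here is a minimal Python
sketch. It is only an analogy, not RabbitMQ's actual code: the names
call and make_server are made up for illustration, and the only detail
taken from the above is the 5-second default.

```python
import queue
import threading
import time

DEFAULT_TIMEOUT = 5.0  # seconds; mirrors the 5-second default mentioned above


def call(server, request, timeout=DEFAULT_TIMEOUT):
    """Send a request to `server` and block until it replies or the timeout expires."""
    reply_box = queue.Queue(maxsize=1)
    # Run the (possibly remote, possibly slow) handler in the background.
    worker = threading.Thread(
        target=lambda: reply_box.put(server(request)), daemon=True
    )
    worker.start()
    try:
        return reply_box.get(timeout=timeout)
    except queue.Empty:
        # The handler may still be running; the caller just stopped waiting.
        raise TimeoutError(f"no reply within {timeout}s") from None


def make_server(delay):
    """Hypothetical stand-in for a queue process that takes `delay` seconds to answer."""
    def server(request):
        time.sleep(delay)
        return ("ok", request)
    return server
```

The point is that the caller gives up after the timeout even though
the process on the other side may be alive and still working, which is
why latency between nodes alone could produce the symptom you describe.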

> Here's a potentially more productive question: if I am able to get to the
> point where I am sure that rabbitMQ is causing the problem and not some
> external TCP factors, is there any other way to see what RabbitMQ is doing
> internally other than by looking at the log?

In general, yes. As I mentioned before, one of the features of OTP is
all of the built-in monitoring tools, which, if you know how to use
them, can help troubleshoot a running Erlang application. But I don't
want to just say, do this, do that, poke around there, because there
is a learning curve associated with doing this and we may be able to
get to the bottom of this issue for you. It would be a bit like telling
a non-Unix person to go to the shell and type in "ps -elf | grep foo".

As mentioned earlier, it would be helpful if you were able to
reproduce this in some way.

Ben