[rabbitmq-discuss] list_queues times out

Tyler Williams tyler at echonest.com
Thu Jun 10 20:23:23 BST 2010


Matthew,

On Jun 10, 2010, at 2:52 PM, Matthew Sackman wrote:

> Hi Tyler,
> 
> On Thu, Jun 10, 2010 at 02:17:56PM -0400, Tyler Williams wrote:
>> I'm having a problem where after running for a while, and with a very large number of messages (10M+, using about 4G RAM), rabbit will get into a state where it is still working, but I cannot list queues. I've been using the rabbitmq-status plugin, and this also stops working, with this message in the logs when I hit the status page:
>> 
>> =ERROR REPORT==== 8-Jun-2010::01:29:00 ===
>> {mochiweb_socket_server,235,
>>    {child_error,
>>        {timeout,{gen_server2,call,[rabbit_status_web,get_context]}}}}
>> 
>> Can I do anything else to debug this? Is this a known issue?
> 

Thanks for your reply.

> That's a coding error in rabbit_status. It should not be doing any
> gen_server2:calls without setting the optional timeout to infinity.
> 
> I would advise not using rabbit_status if this issue is causing rabbit
> to crash.
> 

I don't think this is *causing* rabbit to crash; it just shows up in the logs after rabbit has 'crashed' or gone into this weird state.

> The rabbitmqctl correctly sets timeouts and will not crash.
> 
> There are times when rabbit can be unresponsive and can take a long time
> to respond to even rabbitmqctl - on the order of several minutes. This
> is easily considered a bug, and fixing this sort of issue is on our todo
> list. However, unresponsiveness does not mean anything's gone wrong -
> especially on the 21673 branch. Rabbit should always be able to recover
> and eventually move to a state where it can accept more
> messages.
> 

I've run rabbitmqctl list_queues and waited for hours with no response. I still see activity in the logs, and I can still connect to rabbit and do stuff, but I don't get any output from list_queues.
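
For anyone scripting this check, here is a minimal sketch of how the wait on list_queues could be bounded instead of left running for hours. It is only an illustration under assumptions: the helper names run_with_timeout and list_queues_bounded and the 120-second limit are made up for this example, and the script assumes rabbitmqctl is on the PATH.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout as text.

    Returns None if the command has not finished within timeout_s seconds;
    in that case subprocess.run kills the child process before raising.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


def list_queues_bounded(timeout_s=120):
    """Hypothetical wrapper: ask rabbitmqctl for queue names and depths.

    The 120-second default is an arbitrary assumption, not a recommended value.
    """
    return run_with_timeout(
        ["rabbitmqctl", "list_queues", "name", "messages"], timeout_s
    )
```

A None result only says that the command did not answer in time; it does not distinguish a slow broker from a stuck one, so it is best logged together with a timestamp and compared against the broker logs.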

> That said, if you do have 4GB RAM and over 10M messages, you may need to
> start using the rabbitmq-toke plugin which cuts down the per-message RAM
> cost to very little (in combination with the 21673 branch), by using
> tokyo cabinet for a particular index table which otherwise can take a
> couple of hundred bytes per entry. Basically, if you see the high
> watermark set in the logs, and then a while later Rabbit reaches a point
> where it is inactive (no CPU and no disk use) but the high watermark has
> not been cleared, then it's likely you need to start investigating the
> toke plugin.
> 

Ok, I'll look into this. This is not exactly the failure mode we're seeing though, because in our case, rabbit is still using cpu and disk. It even clears and resets the watermark. I just can't list the queues at all.

As an addendum, after this happened today, I restarted rabbitmq, and it recovered its previous queues without error. After about 5 minutes of use, though, it died, this time with an error in the log, which I've posted here: http://pastebin.com/BAaGpJab

> Best wishes,
> 
> Matthew

Thanks again for your help.

--tyler