[rabbitmq-discuss] RabbitMQ hanging on "starting queue supervisor and queue recovery" after system went down (using new persister)

Wed Aug 18 12:10:38 BST 2010

Hi Christian,

On Mon, Aug 16, 2010 at 03:18:57PM -0700, Christian Legnitto wrote:
> I've been running default with the new persister. I went away for the weekend and saw that my RabbitMQ instance died (not sure what happened, doesn't look to be in logs). In any case, I went to restart the server and it was hanging at "starting queue supervisor and queue recovery". The VM this is on isn't speedy but I let it go for ~30 mins. I moved the mnesia db out of the way, tried again, and it started instantly.

Hmm. Any idea what it was doing - was the disk thrashing or the CPU very
busy? After a crash, rabbit has to do various checks on startup to
validate the state of the messages in the queues, and these can be time
consuming. However, I think the last time I benchmarked this, it
processed on the order of hundreds of thousands of messages per second.
OTOH, that was on an 8-core machine with oodles of RAM and decent hard
drives.
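
As a rough sanity check on those numbers, here is a back-of-the-envelope
sketch in Python. The figures are assumptions, not measurements: 3 queues
of 100k messages each (your upper estimate) and a recovery rate of 100k
messages per second (the low end of what I recall from that benchmark, on
much faster hardware). The helper estimate_recovery_seconds is purely
illustrative and is not part of Rabbit.

```python
def estimate_recovery_seconds(queues, messages_per_queue, messages_per_second):
    """Rough time to check every queued message once at a given rate."""
    total_messages = queues * messages_per_queue
    return total_messages / messages_per_second


# Assumed figures, not measurements: 3 queues, 100k messages each.
fast = estimate_recovery_seconds(3, 100_000, 100_000)

# Same workload on a machine assumed to be 100x slower.
slow = estimate_recovery_seconds(3, 100_000, 1_000)

print(f"fast machine: {fast:.0f} s, slow VM: {slow:.0f} s")
```

Under these assumptions the checks take about 3 seconds on the fast
machine and about 5 minutes on a VM a hundred times slower, so the ~30
minutes you saw would not be explained by the checks alone.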

I'd be curious as to what it was doing.

> I can send the mnesia db, but it is very large. I didn't have many queues (3?) but each probably had thousands (or even 100k+) of messages queued up.

That might be useful, though the startup/recovery process prunes data at
various points, so the data you have now may well differ from the data
that was there when you restarted Rabbit. You didn't happen to take a
backup *before* restarting Rabbit, did you?

Matthew