[rabbitmq-discuss] HA - missing or incompletely replicated queues

Mon Nov 7 23:19:40 GMT 2011

On Mon, Nov 07, 2011 at 10:29:36PM +0000, Ashley Brown wrote:
> Have got a similar problem running tests this evening, although this time
> *without* any shutdowns (3 nodes all running fine since full restart). I've
> had some queues get fairly long (1/2 million messages or so, 3 of them),
> due to broken consumers. The management interface stopped updating message
> counts as CPU use hit the roof. I was testing so decided to delete the
> queues and start again. I stopped the producers and consumers running, so
> the queues were in a steady state and CPU came down. Deleting most queues
> worked fine, apart from the last one. The management interface just hangs
> (Chrome offers to kill the page), whether I try to delete or purge the
> queue. As can be seen below, beam processes are hogging two of the cores on
> the 'master' for this queue:
> 
> http://www.evernote.com/shard/s53/sh/1ba2b59b-913f-4a00-bf3a-03067c0a1f29/faa3fc9c1b250b10b5d8491c367aa991
> 
> Beam processes are also hogging one core on the slave nodes.
> 
> The logs show all three nodes are switching between being clear of the
> watermark and over it, every few minutes.
> 
> Watermark is 5.9GB, management interface shows the slaves < 2GB, the master
> on 4.4GB. (Log snippet at the end of the email).

Hmm. There's nothing wrong with the entries in the logs. I'm not sure
whether it's just "doing what it's meant to be doing" or whether there's
a bug - sure, the hogging of CPU is undesirable, but I don't know
whether the cause of that is a bug or whether it's just some suboptimal
algorithm.

Could you describe exactly what your tests do so that we can try to
reproduce this?

Matthew