[rabbitmq-discuss] HA - missing or incompletely replicated queues

Mon Nov 7 22:29:36 GMT 2011

> > Trying to check if this is a plugin bug with list_queues
> > (list_queues name slave_pids synchronised_slave_pids), I can't get the
> > cluster to list them at all, it just sits there (for the last 15
> > minutes).
>
> Curious. Are any of the nodes hammering CPU or disk? Were any of the
> queues particularly long before shutdown?

I've hit a similar problem running tests this evening, although this time
*without* any shutdowns (all 3 nodes have been running fine since a full
restart). Three of the queues got fairly long (half a million messages or
so each) because of broken consumers. The management interface stopped
updating message counts as CPU use hit the roof. Since I was only testing, I
decided to delete the queues and start again. I stopped the producers and
consumers, so the queues were in a steady state and CPU came down. Deleting
most of the queues worked fine, apart from the last one. The management
interface just hangs (Chrome offers to kill the page) whether I try to
delete or purge that queue. As can be seen below, beam processes are hogging
two of the cores on the 'master' for this queue:

http://www.evernote.com/shard/s53/sh/1ba2b59b-913f-4a00-bf3a-03067c0a1f29/faa3fc9c1b250b10b5d8491c367aa991

Beam processes are also hogging one core on the slave nodes.

The logs show all three nodes switching between being over the memory
watermark and clear of it every few minutes.

The watermark is 5.9GB; the management interface shows the slaves at under
2GB and the master at 4.4GB. (Log snippet at the end of the email).

=INFO REPORT==== 7-Nov-2011::22:20:01 ===
vm_memory_high_watermark set. Memory used:6970525768 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:20:01 ===
    alarm_handler: {set,{{vm_memory_high_watermark,clusterrabbit@hermes02},
                         []}}

=INFO REPORT==== 7-Nov-2011::22:20:03 ===
vm_memory_high_watermark clear. Memory used:5018976928 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:20:03 ===
    alarm_handler: {clear,{vm_memory_high_watermark,clusterrabbit@hermes02}}

=INFO REPORT==== 7-Nov-2011::22:22:00 ===
vm_memory_high_watermark set. Memory used:6583230408 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:22:00 ===
    alarm_handler: {set,{{vm_memory_high_watermark,clusterrabbit@hermes02},
                         []}}

=INFO REPORT==== 7-Nov-2011::22:22:02 ===
vm_memory_high_watermark clear. Memory used:5025939688 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:22:02 ===
    alarm_handler: {clear,{vm_memory_high_watermark,clusterrabbit@hermes02}}

=INFO REPORT==== 7-Nov-2011::22:24:50 ===
vm_memory_high_watermark set. Memory used:6587145104 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:24:50 ===
    alarm_handler: {set,{{vm_memory_high_watermark,clusterrabbit@hermes02},
                         []}}

=INFO REPORT==== 7-Nov-2011::22:24:51 ===
vm_memory_high_watermark clear. Memory used:5029821304 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:24:51 ===
    alarm_handler: {clear,{vm_memory_high_watermark,clusterrabbit@hermes02}}
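
In case it helps anyone reading along, here is a rough Python sketch for summarising the flapping in a log like the one above: it pairs each "set" with the following "clear" and reports how long the alarm was raised and how much memory was released in between. The function names (parse_events, alarm_windows) are my own and not part of RabbitMQ, the regular expressions assume the exact log layout shown above, and the date parsing assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Watermark entries copied from the log above (alarm_handler reports omitted).
SAMPLE = """\
=INFO REPORT==== 7-Nov-2011::22:20:01 ===
vm_memory_high_watermark set. Memory used:6970525768 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:20:03 ===
vm_memory_high_watermark clear. Memory used:5018976928 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:22:00 ===
vm_memory_high_watermark set. Memory used:6583230408 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:22:02 ===
vm_memory_high_watermark clear. Memory used:5025939688 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:24:50 ===
vm_memory_high_watermark set. Memory used:6587145104 allowed:6296176230

=INFO REPORT==== 7-Nov-2011::22:24:51 ===
vm_memory_high_watermark clear. Memory used:5029821304 allowed:6296176230
"""

# Assumed layout: a report header line followed by the watermark message.
HEADER = re.compile(r"=INFO REPORT==== (\d+-\w+-\d{4}::\d{2}:\d{2}:\d{2}) ===")
EVENT = re.compile(
    r"vm_memory_high_watermark (set|clear)\. Memory used:(\d+) allowed:(\d+)"
)


def parse_events(text):
    """Return (timestamp, state, used_bytes, allowed_bytes) per watermark entry."""
    events = []
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # Month abbreviation parsing ("Nov") assumes an English locale.
            stamp = datetime.strptime(header.group(1), "%d-%b-%Y::%H:%M:%S")
            continue
        event = EVENT.match(line)
        if event and stamp is not None:
            events.append(
                (stamp, event.group(1), int(event.group(2)), int(event.group(3)))
            )
    return events


def alarm_windows(events):
    """Pair each 'set' with the next 'clear'.

    Returns (start, seconds_raised, bytes_released) for each pair.
    """
    windows = []
    start = None
    for stamp, state, used, _allowed in events:
        if state == "set":
            start = (stamp, used)
        elif state == "clear" and start is not None:
            seconds = (stamp - start[0]).total_seconds()
            windows.append((start[0], seconds, start[1] - used))
            start = None
    return windows


events = parse_events(SAMPLE)
windows = alarm_windows(events)
for began, seconds, released in windows:
    print(f"{began:%H:%M:%S}  raised for {seconds:.0f}s  released {released / 2**30:.2f} GiB")
```

Run against the full log from each node, this should make it easier to compare how often and for how long each one crosses the watermark.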