[rabbitmq-discuss] HA - missing or incompletely replicated queues

Ashley Brown ashley at spider.io
Mon Nov 7 23:46:26 GMT 2011


> Could you describe accurately what your tests are so that we can try and
> reproduce?

I'm afraid the tests are of the system overall, and I haven't been able to
produce a minimal test case so far. This evening's test involved dropping
backed-up messages into the queues quickly while draining them slowly.
Effectively:

5 HA queues, replicated to all nodes. 3 nodes in the cluster, 15GB
machines, high water mark at 5.9GB.

Start producers and consumers, with the consumers running slowly, allowing
the queues to build to 500k messages.

Stop producers and consumers, delete queues. Deletions take a long time,
although I've also seen this with non-HA queues: it can take tens of
minutes to delete a queue with 250k+ messages in it.
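
For anyone trying to reproduce, the declare and delete steps above might be
sketched roughly as follows. This is only an illustration, not the actual
test harness: it assumes a pika-style blocking channel, the queue names are
hypothetical, and it relies on the "x-ha-policy" queue argument used by the
2.6-series brokers to mirror a queue to all nodes.

```python
import time

# Queue argument asking the broker to mirror the queue to all nodes
# (assumption: a broker version that honours "x-ha-policy").
HA_ARGS = {"x-ha-policy": "all"}


def declare_ha_queues(channel, names):
    """Declare each named queue as an HA queue mirrored to all nodes."""
    for name in names:
        channel.queue_declare(queue=name, arguments=HA_ARGS)


def timed_delete(channel, names):
    """Delete each queue and return the seconds each deletion took."""
    timings = {}
    for name in names:
        start = time.monotonic()
        channel.queue_delete(queue=name)
        timings[name] = time.monotonic() - start
    return timings
```

With pika, channel would come from something like
pika.BlockingConnection(pika.ConnectionParameters(host)).channel(), and the
timings from timed_delete show which queues are slow to go away.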

Previously we've had queues in a steady state, with approximately 25,000
unacked messages (they take several minutes to process and aren't acked
until complete). We then kill some nodes off, forcing the messages to be
requeued and replayed on the slaves. It all gets out of sync after that.
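
The ack-on-completion pattern described above could look something like
this. Again a sketch only: it assumes pika's callback signature of
(channel, method, properties, body), and process is a hypothetical stand-in
for the real, multi-minute work.

```python
def make_late_ack_handler(process):
    """Build a consumer callback that acks only after processing finishes."""

    def on_message(channel, method, properties, body):
        # The message stays unacked for as long as process() runs. If it
        # raises, no ack is sent and the exception propagates to the caller.
        process(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```

Because nothing is acked until process returns, a node failure while
messages are in flight leaves all of them to be requeued, which is the
situation where things go out of sync for us.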

I might be able to give you a better test case once we've pushed our non-HA
release out and I have a bit more time.