[rabbitmq-discuss] RabbitMQ broker crashing under heavy load with mirrored queues

Tue Jan 10 16:27:52 GMT 2012

Hi Venkat,

I'm glad things are better under 2.7.1.

> I have one question, referring to http://www.rabbitmq.com/ha.html:
>> As a result of the requeuing, clients that re-consume from the queue
>> must be aware that they are likely to subsequently receive messages
>> that they have seen previously

This is an accurate quote, and it is still true. Acknowledgements are sent only
to the master and are then copied to the slaves, so if the master goes down
before some acknowledgements have been forwarded, the slaves will not know
about them. If you are lucky (and it appears that you are, unless you are
using automatic acknowledgements), then none of the acknowledgements are lost
(or none are required!).
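To make the acknowledgement point concrete, here is a toy model in Python. It
is a hypothetical simplification, not how RabbitMQ is implemented: the class
and method names are invented for illustration, and acknowledgements are
assumed to be copied to a single slave in batches.

```python
# Toy model only (hypothetical names, not RabbitMQ internals): the master
# records an ack at once but copies acks to the slave only when
# forward_acks() runs. If the master dies in between, the slave still
# thinks those messages are outstanding and will redeliver them.


class MirroredQueueModel:
    def __init__(self, messages):
        self.master_unacked = list(messages)
        self.slave_unacked = list(messages)  # every publish reaches all mirrors
        self.pending_acks = []  # acked at the master, not yet copied to the slave

    def ack(self, msg):
        self.master_unacked.remove(msg)
        self.pending_acks.append(msg)

    def forward_acks(self):
        for msg in self.pending_acks:
            self.slave_unacked.remove(msg)
        self.pending_acks.clear()

    def fail_master(self):
        # The promoted slave redelivers everything it still believes is unacked.
        return list(self.slave_unacked)


q = MirroredQueueModel(["m1", "m2", "m3"])
q.ack("m1")
q.forward_acks()        # the slave learns that m1 was acked
q.ack("m2")             # the master dies before this ack is forwarded
print(q.fail_master())  # ['m2', 'm3']: m2 is a duplicate, m3 was never acked
```

With automatic acknowledgements there is nothing to forward, which is why that
case cannot produce this kind of duplicate.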

> You will notice that there are two lines displaying 1999; this is because
> two messages were lost. Otherwise you see 2000 messages processed
> from each thread.
> 
> From this, does it mean that I don't have to worry about duplicate
> messages due to requeuing?

No, it doesn't mean that. If your consumers use explicit acknowledgements,
then when the master fails the promoted slave may redeliver some messages
that had already been acknowledged, as well as the ones that had not.
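A common way to cope with such redeliveries is to make the consumer
idempotent. The sketch below is one hypothetical approach, assuming every
message carries a unique ID set by the publisher (for example in the AMQP
message-id property); the class and parameter names are illustrative only,
and no client library is involved.

```python
# Hypothetical sketch: skip messages whose ID has already been processed.
# Assumes publishers set a unique ID per message; the broker does not do this.
from collections import OrderedDict


class DedupingHandler:
    def __init__(self, process, max_remembered=100_000):
        self._process = process     # application callback taking the body
        self._seen = OrderedDict()  # message_id -> None, oldest first
        self._max = max_remembered

    def handle(self, message_id, body):
        """Process body unless message_id was seen before; True if processed."""
        if message_id in self._seen:
            return False  # a redelivered duplicate, e.g. after failover
        self._process(body)
        self._seen[message_id] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


processed = []
handler = DedupingHandler(processed.append)
handler.handle("id-1", "hello")
handler.handle("id-1", "hello")  # the redelivered copy is skipped
print(processed)  # ['hello']
```

The remembered IDs live in memory only, so they are forgotten if the consumer
restarts; stronger guarantees would need a durable store.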

What interests me are the lost messages. If I understand correctly, messages
are published to the master and to all the slaves simultaneously, so the
failure of the master shouldn't lose any messages.

Having said that, you haven't told us which broker your test apps connect to.
If they were connected to the master at the time, then what do they do when
the master fails?  Do they automatically reconnect (I presume this is in
the tests' logs)? Do they resend the last message (which will have failed
because the connection will have been dropped)?

If they do not resend, then this could be the source of the lost messages
-- they were not sent in the first place.
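For the test apps, a publisher that reconnects and resends on failure might
look like the sketch below. The `connect` callable and the `publish` method on
the object it returns are placeholders for whatever client library the tests
use; they are assumptions, not a real RabbitMQ client API.

```python
# Hypothetical sketch: `connect()` returns an object with a publish(message)
# method that raises ConnectionError if the connection drops. Both are
# placeholders, not a real client API.
import time


class ResendingPublisher:
    def __init__(self, connect, retries=3, delay_s=0.5):
        self._connect = connect
        self._retries = retries
        self._delay_s = delay_s
        self._conn = None

    def publish(self, message):
        """Publish, reconnecting and resending on ConnectionError."""
        for attempt in range(self._retries + 1):
            try:
                if self._conn is None:
                    self._conn = self._connect()  # e.g. via HAProxy to a live node
                self._conn.publish(message)
                return
            except ConnectionError:
                self._conn = None  # drop the dead connection
                if attempt == self._retries:
                    raise  # the message was NOT sent; the caller must know
                time.sleep(self._delay_s)
```

Note that a publish that raised an error may still have reached the broker,
so resending can itself produce duplicates; this is another reason for
consumers to tolerate them.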

Please can you explain just a little more about the test thread connection
history, and to which broker they are connected?  I would expect that, if
they are connected to the slave, then you won't see any lost messages in
this test scenario.

> I am assuming that HAProxy
> was not quick enough to detect the crashed NodeA, and thus those
> 2 or 3 messages were routed to the crashed NodeA. Please correct me if I am
> wrong.

I don't think this is the problem, as messages are published to all the
brokers mirroring the queue.

> The other thing that I wanted to bring to your attention (it
> doesn't bother me) is as follows:
> I start the cluster with NodeA and then join NodeB to the
> cluster.
> If I run rabbitmqctl report on NodeA, it throws an error saying that
> NodeA is down (when it is really not down). But it works fine on
> NodeB.

This is interesting, too. Can you supply us with the complete output from
rabbitmqctl status for both nodes, and explain exactly what you mean by
'run rabbitmqctl on NodeA'?

Thank you for reporting these issues.

Steve Powell  (a curious bunny)
----------some more definitions from the SPD----------
avoirdupois (phr.) 'Would you like peas with that?'
distribute (v.) To denigrate an award ceremony.
definite (phr.) 'It's hard of hearing, I think.'
modest (n.) The most mod.

On 9 Jan 2012, at 23:39, Venkat wrote:

> Hi Steve I have run some tests using RabbitMQ 2.7.1 please find the
> following:
...(elided)