[rabbitmq-discuss] When are persistent messages flushed to disk?

Matthew Sackman matthew at rabbitmq.com
Fri Sep 2 16:25:39 BST 2011


Hi Iain,

On Fri, Sep 02, 2011 at 10:14:17AM -0500, Iain Hull wrote:
> I have a simple test that pushes 1,000,000 messages to the broker as
> fast as possible.  Once this is finished I kill the broker.  
> 
> * When the messages are small (< 100 bytes), only 250,000 of the
> messages survive.  I can see in the Web console that the number of
> messages in the queue rises very slowly compared to the rate at which
> the client sends new messages.
> * When the messages are larger (1 KB), all or most of the messages are
> persisted.  I can see in the Web console that the number of messages
> in the queue closely matches the number the client has sent.
> 
> I know that my test is a little unrealistic, but we are planning to
> use RabbitMQ to smooth out bursts of activity on our ESB caused by
> batch updates.  I expect most messages to arrive in large bursts and
> to be less than 1 KB.  I am trying to understand the nature of the
> window in which persistent messages could be lost.

If you are not using publisher confirms, then you have no way of telling
whether the packets have even reached the broker, let alone whether the
broker has done anything with those messages. Many of those messages
might still be sitting in the TCP stack on the client's OS.

> * Is it time based?  Flushed to disk every n milliseconds?
> * Is it memory based?  Flushed after a buffer of a certain size is
> filled?  (Our messages will be small, < 1 KB.)
> * Does the rate at which messages arrive change the window?
> * Is most of the buffering at the socket level or inside RabbitMQ?

The answer to nearly all of those is "yes".

Rabbit will request some TCP buffers. From time to time, when it runs
low on memory and is waiting for the disks to catch up, it'll stop
draining those buffers in order to halt producers.

To avoid doing lots of tiny writes, we do a lot of buffering internally
so that we can drive the hard disks as fast as possible.

We also have timers to make sure that once a message has reached a
queue, it'll be written and fsync'd to disk promptly. That said, a queue
can only go so fast, and so messages can back up as they're passed
between processes within Rabbit.

All of which is essentially why we introduced publisher confirms (well,
that, plus the fact that transactions are almost never what you actually
want - think of publisher confirms as existing to solve the fact that
you can't pipeline transactions). So I'd recommend you read
http://www.rabbitmq.com/extensions.html#confirms and use them.
Architect your clients so that until they receive confirmation that the
broker has taken responsibility for a message (which, for a persistent
message sent to durable queues, normally means the message has been
fsync'd to disk), the publishing client remains responsible for that
message, and should be prepared to republish it should its connection
to the broker fail before the confirm arrives.
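
To make that concrete, here's a minimal sketch in Java (the queue name,
message bodies and counts are invented for the example, and connection
settings and error handling are omitted): publish persistent messages on
a channel in confirm mode, and hold on to every message until the broker
acks it.

import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import com.rabbitmq.client.*;

public class ConfirmedPublisher {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel ch = conn.createChannel();
        // Durable queue: the queue itself survives a broker restart.
        ch.queueDeclare("batch-updates", true, false, false, null);

        // Messages published but not yet confirmed, keyed by the
        // channel's publish sequence number.
        final ConcurrentNavigableMap<Long, byte[]> unconfirmed =
                new ConcurrentSkipListMap<Long, byte[]>();

        ch.confirmSelect(); // put the channel into confirm mode
        ch.addConfirmListener(new ConfirmListener() {
            public void handleAck(long seqNo, boolean multiple) {
                // The broker has taken responsibility for this message
                // (or, if multiple, for everything up to and including it).
                if (multiple) unconfirmed.headMap(seqNo, true).clear();
                else unconfirmed.remove(seqNo);
            }
            public void handleNack(long seqNo, boolean multiple) {
                // The broker could NOT take responsibility; the entries
                // are still in 'unconfirmed' and must be republished.
            }
        });

        for (int i = 0; i < 1000000; i++) {
            byte[] body = ("message " + i).getBytes();
            unconfirmed.put(ch.getNextPublishSeqNo(), body);
            // PERSISTENT_BASIC sets deliveryMode=2, so the message will
            // be written to disk once it reaches the durable queue.
            ch.basicPublish("", "batch-updates",
                            MessageProperties.PERSISTENT_BASIC, body);
        }
        // If the connection dies, whatever is left in 'unconfirmed' is
        // still the publisher's responsibility to republish.
    }
}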

Now of course, the confirm itself could be lost in flight, and you'll
then get duplicates - but there are almost always ways of getting
duplicates: you normally have to choose between "at least once" and "at
most once" delivery, and "exactly once" is generally unobtainable except
in very restricted circumstances. That means you may have to make sure
your clients perform idempotent operations, or can at least detect
duplicates, but that's all standard stuff.
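
As an example of duplicate detection: have the publisher stamp each
message with a unique id (the messageId property will do) and have the
consumer remember which ids it has already processed. A deliberately
naive sketch - the 'seen' set here lives in memory, so it grows without
bound and won't survive a consumer restart, and process() is a stand-in
for your application logic:

import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import com.rabbitmq.client.*;

public class DedupConsumer {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        final Channel ch = conn.createChannel();
        ch.queueDeclare("batch-updates", true, false, false, null);

        // Ids of messages we have already processed.
        final Set<String> seen = Collections.newSetFromMap(
                new ConcurrentHashMap<String, Boolean>());

        ch.basicConsume("batch-updates", false, new DefaultConsumer(ch) {
            @Override
            public void handleDelivery(String consumerTag, Envelope env,
                                       AMQP.BasicProperties props,
                                       byte[] body) throws IOException {
                // The publisher must set a unique messageId on each
                // message, e.g. UUID.randomUUID().toString().
                if (seen.add(props.getMessageId())) {
                    process(body); // only ever runs once per id
                }
                ch.basicAck(env.getDeliveryTag(), false);
            }
        });
    }

    private static void process(byte[] body) {
        // application logic goes here
    }
}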

Hope that's of help,

Matthew

