[rabbitmq-discuss] Claim on new ocamlmq broker...

Tue Jun 15 00:18:49 BST 2010

On Mon, Jun 14, 2010 at 04:01:38PM -0700, mfp wrote:
> >> "RabbitMQ did not guarantee that persistent messages had been saved to
> >> disk before sending the message receipt, which could lead to data
> >> loss"
> 
> Does that mean that these comments by Matthew Sackman (who AFAIK works for
> LShift and is a RabbitMQ developer) no longer apply?

After the buyout of RabbitMQ by SpringSource/VMware, I am now employed
by VMware and continue to work full time on RabbitMQ.

> > When you publish a message with delivery mode 2 you are *not* _guaranteed_
> > that it hits disk. Publishing is an async operation and you get no
> > confirmation that it goes to disk. The new persister does very aggressive
> > caching in order to avoid doing lots of tiny and expensive writes. As
> > such, there will frequently be times where if you restart the broker,
> > you will lose several (maybe hundreds) of messages.
> 
> Note that I'm referring to what happens in case of a hard RabbitMQ/system
> crash. The behavior described by Matthew Sackman is consistent with what I
> observed in the tests I did before writing ocamlmq: RabbitMQ accepting
> persistent messages at fairly high rates, with quickly growing memory usage
> and no disk activity.

The behaviour you describe follows intentionally from the design of
AMQP. Yes, you could decide to write every message to disk and fsync
it, but if you do that then you'll have utterly atrocious performance.
Anything less than this leaves open the possibility of data loss in the
event of a hard system crash. How much data can be lost is, in the case
of AMQP, left to the client to decide: they can vary the size of the
transactions as they wish. If they can tolerate at most one message
being lost, then they must tx.commit after every publish. This will
likely result in an fsync per message, and performance will be very
poor, but the point is that it is the client who decides how much data
can be lost.
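
As a rough illustration of that client-side choice, here is a minimal
Python sketch that commits after every batch_size publishes. The
method names (tx_select, basic_publish, tx_commit) are modelled on the
AMQP tx.select / basic.publish / tx.commit methods as a client library
such as pika exposes them; treat them, the RecordingChannel stand-in,
the "work" routing key and the helper name as assumptions made for
this example, not as RabbitMQ code.

```python
class RecordingChannel:
    """Hypothetical stand-in for a real AMQP channel; it only records calls."""

    def __init__(self):
        self.calls = []

    def tx_select(self):
        self.calls.append("tx.select")

    def basic_publish(self, exchange, routing_key, body, properties=None):
        self.calls.append("basic.publish")

    def tx_commit(self):
        self.calls.append("tx.commit")


def publish_in_transactions(channel, bodies, batch_size,
                            exchange="", routing_key="work",
                            properties=None):
    """Publish every body, committing after each batch_size messages.

    Returns the number of commits issued. With batch_size == 1 there is
    one commit per message: the smallest window of uncommitted messages,
    and the largest number of commits.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    channel.tx_select()  # put the channel into transactional mode
    commits = 0
    pending = 0
    for body in bodies:
        channel.basic_publish(exchange=exchange, routing_key=routing_key,
                              body=body, properties=properties)
        pending += 1
        if pending == batch_size:
            channel.tx_commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        channel.tx_commit()
        commits += 1
    return commits
```

With a real client the properties argument would carry delivery mode 2
for persistent messages. A larger batch_size means fewer commits (and
so, presumably, fewer fsyncs) at the price of more uncommitted messages
at risk in a hard crash.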

Without transactions, publishing is an async activity, and whilst you
can indicate that the message should be written to disk if it ends up in
a durable queue, this is merely guidance: we use quite big buffers in
some carefully chosen places (in the new persister branch) because of
the likelihood that the message will have been delivered to a consumer
and acknowledged quite quickly. As a result, by writing lazily rather
than eagerly, we eliminate three disk writes for each such message (msg
published, msg delivered, msg acknowledged).
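
The saving from lazy writing can be sketched with a toy model in
Python. This is not the persister's actual logic: the event format,
the flush_every parameter and both function names are assumptions made
purely for illustration. The eager model writes once per event; the
lazy model buffers published messages in memory and only touches disk
for messages that are still unacknowledged when the buffer is flushed.

```python
def eager_writes(events):
    """Eager model: every publish, deliver and ack is one disk write."""
    return len(events)


def lazy_writes(events, flush_every):
    """Lazy model: count disk writes when the buffer is flushed every
    flush_every events.

    events is a list of (kind, msg_id) pairs, where kind is one of
    "publish", "deliver" or "ack".
    """
    buffered = set()  # published, not yet written to disk
    on_disk = set()   # written to disk, not yet acknowledged
    writes = 0
    for i, (kind, msg_id) in enumerate(events, start=1):
        if kind == "publish":
            buffered.add(msg_id)
        elif msg_id in buffered:
            # Still only in memory: a deliver changes in-memory state,
            # and an ack drops the message before it ever reaches disk.
            if kind == "ack":
                buffered.discard(msg_id)
        elif msg_id in on_disk:
            # Already persisted, so the state change must be recorded.
            writes += 1
            if kind == "ack":
                on_disk.discard(msg_id)
        if i % flush_every == 0:
            # Flush: one write per message still sitting in the buffer.
            writes += len(buffered)
            on_disk |= buffered
            buffered.clear()
    return writes
```

In this model a message that is published, delivered and acknowledged
between two flushes costs no disk writes at all, whereas the eager
model pays three; with flush_every set to 1 the two models agree.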

I have no desire to pass comment on ocamlmq. RabbitMQ tries to solve a
very broad category of messaging patterns and requirements. By no means
is it perfect - for some it is too slow, whilst for others it does not
offer enough guarantees. But AMQP does have the advantage of giving a
great amount of flexibility to the user, which is why we believe
RabbitMQ is a sound, general-purpose solution for a very broad class of
messaging needs. Careful use of the client libraries and the features
made available by AMQP is frequently sufficient to satisfy most needs.

Matthew