[rabbitmq-discuss] Persister log size calculation and rolling persister log frequency?

Matthew Sackman matthew at lshift.net
Fri Apr 2 17:30:15 BST 2010


Hi John,

On Thu, Apr 01, 2010 at 06:53:57PM +0200, John Apps wrote:
> I see that the size of the log varies tremendously, as does the frequency
> with which it is 'rolled'.
> The size is presumably determined by the number of messages pending, i.e.,
> not yet acknowledged?

Oh the logic there is quite complex. It's something like "it shouldn't be
rolled too often, and we also have a minimum number of operations that
have to go into it before we roll, and probably we should not roll
whilst we're really busy". The last time Matthias and I looked at that code,
it took the two of us some hours to figure out what was going on. The new
persister is a fair bit more predictable in this regard.

> QueuesOverviewMessagesTransactionsParametersVhostNameConsumersMemoryCountMsg
> readyMsg unackCountAcks uncommitedDurableAuto deleteArguments/
> amq.gen-EIK8rQQfEX9IBiqLwF6h7Q==10MB22702271227falsetrue[]

I know some people aren't a fan of semantically significant spaces in
programming languages, but they're generally favoured in English...

> The log file at
> the above moment in time is 128KB. (I paused the producer long enough for
> the receiver to catch up.)
> 
> QueuesOverviewMessagesTransactionsParametersVhostNameConsumersMemoryCountMsg
> readyMsg unackCountAcks uncommitedDurableAuto deleteArguments/
> amq.gen-EIK8rQQfEX9IBiqLwF6h7Q==1105MB2493432483431000158falsetrue[]
> 
> The log file at the above moment is some 111,529KB, the rolled over log file
> about the same size.
> 
> What I am wondering about is
> 
>    1. when does the roll-over take place?

Not totally sure - it's not completely clear from the code what the goals
were. Tony G is probably the only person who can answer this sufficiently.

>    2. what role does the dump_log_write_threshold parameter play here? (It
>    is set to 1,000 in these tests)

That's an mnesia parameter and has nothing to do with the persister log
that Rabbit uses.
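
For reference, here is roughly where that setting lives - it is an
application environment parameter of mnesia itself, so it only affects how
often mnesia dumps its own transaction log. This is just an illustration,
not anything the persister reads:

    %% dump_log_write_threshold is a configuration parameter of the mnesia
    %% application; it controls mnesia's own log dumping, not Rabbit's
    %% persister log.
    %%
    %% On the command line:
    %%   erl -mnesia dump_log_write_threshold 1000
    %%
    %% Or programmatically, before mnesia is started:
    ok = application:set_env(mnesia, dump_log_write_threshold, 1000).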

>    3. why is the previous version deleted before the current one is rolled
>    over? (GFS would appear to be the algorithm which should be used in order
>    not to lose anything)

It isn't, in a way that could lose anything. At the point a snapshot is
taken, the data in the current file is complete and correct. Any .previous
file left over from an earlier roll-over is removed, and then
disk_log:reopen/3 is called.
http://www.erlang.org/doc/man/disk_log.html#reopen-3
"The reopen functions first rename the log file to File and then re-create a
new log file."

There should be no possibility of data loss during this process.
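
As a rough sketch of that sequence (the function and variable names here
are illustrative, not the actual rabbit_persister code):

    %% Illustrative roll-over sequence, assuming LogHandle is an open
    %% disk_log and LogFile is the path of the current persister log.
    roll_over(LogHandle, LogFile) ->
        Previous = LogFile ++ ".previous",
        %% remove any snapshot left over from the last roll-over
        case filelib:is_file(Previous) of
            true  -> ok = file:delete(Previous);
            false -> ok
        end,
        %% disk_log:reopen/2,3 renames the current log file to Previous and
        %% then re-creates a fresh, empty log under the original name, so
        %% the data is always in either the current or the renamed file.
        %% (The /3 variant additionally writes a header to the new file.)
        ok = disk_log:reopen(LogHandle, Previous).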

>    4. the new version of the log file is created with a very small initial
>    allocation? Is there a way to influence this size?  (yes, I know, modern
>    file systems should be able to happily expand very fast, and I have asked
>    this question before. However, the time taken to 'rename' the current to
>    previous and create a new log causes quite significant waves on the client
>    side. These are noticeable even to the human eye.)

Renaming a file should be an instantaneous operation: nothing other than
metadata needs writing, and depending on your filesystem, that may not even
hit the disk. We do not do any preallocation - for filesystems that support
holes it gains nothing, and for those that don't it's very expensive (HFS
is a good example here). The persister files are only ever appended to -
there is no rewriting in place - so preallocation won't help here.

>    5. The severe delay is only experienced by the receiver. Here are a few
>    samples:
>    - recving rate: 507 msg/s
>       - recving rate: 506 msg/s
>       - recving rate: 475 msg/s
>       - recving rate: 377 msg/s
>       - *recving rate: 5 msg/s*

Yup, that's the cost of writing out the snapshot again. What you're seeing
here is a bug that is one of the main reasons the new persister exists:
with the current persister, the same data gets written out over and over
again. If you have one long queue which is doing no work, and one short,
fast queue, then whenever the persister decides to take a new snapshot, it
writes out the entire contents of the long queue as well.
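
To make the cost concrete, here is an illustrative sketch (assumed shapes,
not the real rabbit_persister code) of why snapshot time grows with the
total number of pending messages rather than with recent activity:

    %% Illustrative only: on every snapshot the old persister serialises
    %% the full pending contents of every durable queue, so an idle queue
    %% holding many messages is rewritten each time any other queue
    %% triggers a roll-over.
    snapshot(Queues, LogHandle) ->
        lists:foreach(
          fun({QName, PendingMsgs}) ->
                  %% cost is O(total pending messages), not O(new work)
                  ok = disk_log:log(LogHandle, {queue_state, QName, PendingMsgs})
          end, Queues).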

The new persister is much much better in this regard (and several others
too). In particular, messages are only rewritten on disk for GC reasons,
and there is a finite bound on the number of times a message will be
rewritten. In short, I'd recommend you compile, test and compare with the
new persister branch (bug21673).

Matthew



