[rabbitmq-discuss] Queue durability

Matthew Sackman matthew at lshift.net
Mon Mar 1 10:38:08 GMT 2010

On Sat, Feb 27, 2010 at 12:20:07PM -0600, tsuraan wrote:
> > Do you know why your systems are running out of memory? It certainly
> > shouldn't be due to rabbit, if you have the memory thresholds configured
> > correctly.
> They're running out of memory because they have too much stuff running
> on them that uses memory :)  Rabbit is one of the processes
> (apparently the hungriest single process), but it's far from the only
> process.  To be honest, I'm not sure why it's the hungriest process
> either, but perhaps my other thread will help resolve that.

The vm_memory_high_watermark threshold sets the point at which
channel.flow is raised. Internally, we aim to use 0.4 of that threshold,
to leave some space for GC and general buffer room, given the async
nature of the flow controls.
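To make the arithmetic concrete, here's a small Python sketch of how those two fractions relate (0.4 of physical RAM for the watermark, and 0.4 of the watermark as the internal target). The function names are mine for illustration, not anything in Rabbit:

```python
def vm_memory_high_watermark(total_ram_mb, fraction=0.4):
    """Memory level at which rabbit raises channel.flow (default 0.4 of RAM)."""
    return total_ram_mb * fraction

def internal_target(watermark_mb, fraction=0.4):
    """Level rabbit aims to stay at internally, leaving headroom for GC
    and for the async nature of the flow-control signalling."""
    return watermark_mb * fraction

# e.g. a box where the watermark works out at 3151MB:
watermark = vm_memory_high_watermark(7877.5)   # 3151.0
target = internal_target(watermark)            # 1260.4
```

So well before the watermark itself is hit, rabbit is already trying to push data out to disk.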

As Matthias said, disks are not fast. Rabbit will happily accept
messages far faster than it can write them to disk, hence the need for
the channel.flow mechanism. If you need to maximise the speed at which
you're writing to disk (and you should use some disk-monitoring tools to
work out how fast rabbit is actually writing; also, use the deadline
scheduler, not cfq), then you will probably want to give an SSD to
Rabbit alone, and have nothing else write to that disk. Even just
giving Rabbit its own hard disk is pretty beneficial. Running rabbit on
the same box as other services, all hitting the same disks at the same
time, is not a good idea if you really care about performance.
Also, OS caching is a very good idea. Try to make sure you've always
got a couple of GB of RAM free for the OS to use as disk cache. On my
8GB box here, the default 0.4 vm_memory_high_watermark threshold gives
rabbit 3151MB, and pretty much the remainder is used as disk cache. As
a result, most of the time, reads can be satisfied from RAM, avoiding
expensive disk seeks (one of the reasons why SSDs are so attractive).
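If you want to check how much of that spare RAM the OS is actually using as page cache, here's a quick Linux-specific sketch that parses /proc/meminfo-style output (the sample text below is made up for illustration):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of MB."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            out[key] = int(fields[0]) // 1024  # kB -> MB
    return out

sample = """MemTotal:        8178728 kB
MemFree:          204052 kB
Cached:          4312040 kB"""

info = parse_meminfo(sample)
# On a healthy rabbit box, Cached should be a sizeable chunk of MemTotal.
print(info["Cached"], "MB of", info["MemTotal"], "MB used as page cache")
```

On a real box you'd read the text from open("/proc/meminfo") instead of the sample string.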

> > You certainly shouldn't be losing entire queues, and if you do I would
> > at the very least expect there to be some errors in the logs.

Indeed. The existence or otherwise of a queue really has nothing to do
with the new persister - it's still stored in mnesia, just as it was before.

You should make sure you update to the latest bug21673 at least weekly
if not more frequently.

