[rabbitmq-discuss] of queues, memory and heartbeats...

Tue Sep 1 09:19:00 BST 2009

Anthony,

On Tue, Sep 1, 2009 at 8:36 AM, Anthony<anthony-rabbitmq at hogan.id.au> wrote:
> We're in a particular situation where occasionally a client will be
> disconnected from the server (often through some unclean network break, such
> as the random stuff that can happen with 3G data links and virtual
> machines), but the queues associated with that client don't die and continue
> to grow.
>
> Eventually some of the queues grow so large, that we start seeing txamqp
> (the client side library we use) start whining that it doesn't know what to
> do with "channel_flow" - presumably this is the memory based throttling
> kicking in (though I don't recall explicitly enabling it - unless it's
> something auto-enabled by default in the .deb packages?)...

Although this post was written a while ago, the behavior it describes
w.r.t. channel.flow is still current:

http://hopper.squarespace.com/blog/2008/11/9/flow-control-in-rabbitmq.html

However, the actual client library needs to be aware of the
channel.flow command (which is a reverse RPC from the client's
perspective). IIRC txamqp is based on the qpid python client - it
might be worth looking at that client's support for channel.flow (or
just ask the txamqp guys because I might be wrong).

> Even though we haven't explicitly enabled it, if we do start to see
> channel_flow messages, this would be the memory based throttling, correct?

Yes. Exceeding a high water mark will trigger an alarm which then
sends out the channel flow command.

> Besides rabbit noticing when a TCP connection dies - is there any
> "heartbeat" functionality built into rabbit and/or the AMQP spec that would
> actually test whether or not a connection is active, and if it isn't, kill
> the relevant queues?

There are heartbeats as part of the protocol, but they would only
detect a dead connection and get it closed. This *may* kill queues, for
example if they are not marked as durable.

> Do I as the person who monitors the server configure
> this, or do the programmers who write our clients need to incorporate
> something into their code?

ATM the best way to monitor queues and connections is via rabbitmqctl,
which is an admin-centric tool. There is very little exposed in the
client libraries on this matter.
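
As a rough illustration of the rabbitmqctl route, here is a minimal
Python sketch that reads queue depths from "rabbitmqctl list_queues
name messages" and reports the queues above a limit. The helper names
(parse_list_queues, oversized, list_queues) and the sample queue names
are made up for this example, and the parser assumes tab-separated
"name<TAB>count" lines, skipping any banner or header lines around
them. What to do with an oversized queue is left to the caller.

```python
import subprocess


def parse_list_queues(output):
    """Turn `rabbitmqctl list_queues name messages` output into {name: depth}.

    Assumes one "name<TAB>messages" pair per line; anything else
    (banners, headers, trailers) is ignored.
    """
    queues = {}
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[1].strip().isdigit():
            queues[fields[0]] = int(fields[1])
    return queues


def oversized(queues, limit):
    """Return the names of queues holding more than `limit` messages, sorted."""
    return sorted(name for name, depth in queues.items() if depth > limit)


def list_queues():
    """Run rabbitmqctl (must be on PATH, with sufficient privileges)."""
    result = subprocess.run(
        ["rabbitmqctl", "list_queues", "name", "messages"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_list_queues(result.stdout)


# Hypothetical sample output, for illustration only.
SAMPLE_OUTPUT = "Listing queues ...\nclient.a\t12\nclient.b\t250000\n...done.\n"

if __name__ == "__main__":
    print(oversized(parse_list_queues(SAMPLE_OUTPUT), limit=100000))
```

Run from cron, something like oversized(list_queues(), limit=100000)
would give the monitoring side a list of runaway queues to act on.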

> In calculating the "5%" of memory available threshold, what would rabbit be
> considering memory.. Real? Swap? Virtual? All of the above?
> Ie.
> # cat /proc/meminfo | grep -i "total\|free"
> MemTotal:       514056 kB
> MemFree:        144860 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:       514056 kB
> LowFree:        144860 kB
> SwapTotal:     1044184 kB
> SwapFree:       838244 kB
> VmallocTotal:   442360 kB
> HugePages_Total:     0
> HugePages_Free:      0

The rabbit_memsup_linux module parses these values out of /proc/meminfo:

'MemTotal', 'MemFree', 'Buffers', 'Cached'

such that

MemUsed = MemTotal - MemFree - Buffers - Cached
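
To make the arithmetic concrete, here is a small Python sketch of the
same calculation. It is not the Erlang module itself, just a
re-expression of the formula above; the function names are invented
for the example. Your grep filtered out the Buffers and Cached lines,
so the values used for those two fields below are placeholders, not
figures from your machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value}.

    Values are taken as the first integer after the colon
    (kB for the memory fields used here).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def mem_used_kb(meminfo):
    """MemUsed = MemTotal - MemFree - Buffers - Cached (all in kB)."""
    return (
        meminfo["MemTotal"]
        - meminfo["MemFree"]
        - meminfo["Buffers"]
        - meminfo["Cached"]
    )


# MemTotal/MemFree are from the quoted output; Buffers and Cached are
# placeholder values because they were not shown in the original mail.
SAMPLE = """\
MemTotal:       514056 kB
MemFree:        144860 kB
Buffers:         20000 kB
Cached:         100000 kB
SwapTotal:     1044184 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(mem_used_kb(info), "kB used of", info["MemTotal"], "kB")
```

Note that the swap fields do not enter into the formula at all.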

> As a failsafe against runaway queues, I'm thinking of implementing something
> that culls queues over a given order of magnitude on a timed basis, or
> triggered by a lower memory threshold than rabbit's..

You might be interested in the new persister that is coming out in the
next release (there are many posts on this - google the archive for
queue paging). With this, you won't be bounded by memory any more.

On a similar note, recently we implemented a queue drain command as
part of the BQL plugin (which is dependent on the plugin mechanism
that is part of the next release) - for one system I just wrote a cron
job that drained the queue to a disk log every hour.

HTH,

Ben