[rabbitmq-discuss] RabbitMQ memory management

Edwin Fine rabbitmq-discuss_efine at usa.net
Fri Sep 12 18:36:16 BST 2008


Thanks for clarifying. Disclaimer: what follows is not an attack on Rabbit,
or you, or anything like that. It's simply explaining my situation without
judgment and asking for suggestions.

I now understand that in the worst-case scenario as discussed, the Rabbit
Erlang VM could crash due to out of memory conditions. What I should then do
is probably set the rabbit node up with a heartbeat so that in the extremely
unlikely event of this happening, at least the system will auto-recover. All
persistent messages will be on disk, so nothing crucial will be lost.
Presumably, if there are a grillion (that's a metric SI term for "a lot of"
;)) persistent messages in Mnesia for a specific queue, RabbitMQ won't try
to load these all into memory and cause the crash to repeat itself.
*That* would be unfortunate.

I think it would be a really solid addition to RabbitMQ to limit the number
of messages kept in memory and hence reduce considerably the likelihood of
out of memory conditions. It is very difficult for me to know exactly when I
will need this feature, because I don't yet know the volumes that could be
hitting the system when it goes into full production. It's currently in
limited production. However, the sooner the better. In my opinion (and I
really like RabbitMQ, so don't get me wrong), the lack of this feature
severely limits the usefulness of RabbitMQ in a store-and-forward
scenario. Basically, it means that, barring another solution, I would have
to store the messages myself in a database and dequeue them myself. This
means I would have to write a mini-queuing system myself, which I was
trying to avoid. In particular, I didn't want to write it because Mnesia
has a serious limitation: if your table is a set, it cannot be disk-only,
which means... that all the messages in the database are kept in memory!
There are some ways around this, but I just didn't want to have to handle
these issues myself.
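Just to show what I mean, the kind of bounded-memory queue I am asking for
is not a huge amount of code in principle. Here is a rough sketch (plain
Python, nothing RabbitMQ- or Mnesia-specific, and every name in it is
invented for illustration): it keeps at most a fixed number of messages in
RAM and spills the overflow to per-message files on disk, preserving FIFO
order.

```python
import os
import pickle
import tempfile
from collections import deque

class OverflowQueue:
    """FIFO queue that keeps at most `mem_limit` messages in RAM
    and spills the rest to individual files on disk."""

    def __init__(self, mem_limit, spool_dir=None):
        self.mem_limit = mem_limit
        self.spool_dir = spool_dir or tempfile.mkdtemp(prefix="spool-")
        self.memory = deque()   # head of the queue (oldest messages)
        self.next_write = 0     # sequence number of the next spilled file
        self.next_read = 0      # sequence number of the next file to reload

    def _path(self, seq):
        return os.path.join(self.spool_dir, "%012d.msg" % seq)

    def put(self, msg):
        # Once spooling has started, new messages must also go to disk
        # to preserve FIFO order, even if RAM has free slots.
        if self.next_write > self.next_read or len(self.memory) >= self.mem_limit:
            with open(self._path(self.next_write), "wb") as f:
                pickle.dump(msg, f)
            self.next_write += 1
        else:
            self.memory.append(msg)

    def get(self):
        if not self.memory and self.next_read < self.next_write:
            self._refill()
        return self.memory.popleft()   # raises IndexError if empty

    def _refill(self):
        # Reload the oldest spilled messages back into RAM, in order.
        while len(self.memory) < self.mem_limit and self.next_read < self.next_write:
            path = self._path(self.next_read)
            with open(path, "rb") as f:
                self.memory.append(pickle.load(f))
            os.remove(path)
            self.next_read += 1

    def __len__(self):
        return len(self.memory) + (self.next_write - self.next_read)
```

Of course a real implementation would need batching, an fsync policy, and
crash recovery, which is exactly the sort of work I was hoping not to do
myself.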

Look, it's not your fault that I didn't understand that RabbitMQ keeps all
the queued messages in memory until they are delivered (although this
should be prominently featured in the documentation if it is not already;
I haven't seen it there). But now I do.

The system that RabbitMQ is part of is an SMS message aggregator. It
stores and forwards on the order of a million SMS text messages and other
kinds of messages per day. I had a choice when designing the system: either
write my own database queuing mechanism, or find a messaging and queuing
system. I had used WebSphereMQ for very similar systems in the past, so I
was averse to reinventing the wheel. When I evaluated RabbitMQ, I saw that
it did persistent messaging and made an erroneous assumption that it worked
the same way as WebSphereMQ in how it dealt with persistent messages. My
testing never showed any different, but I probably never stressed RabbitMQ
to the point where it would have shown up.

In fact, I originally had a problem, which I posted on this mailing list,
where my consumers were being overwhelmed by the speed at which RabbitMQ was
pushing messages to them. This caused the messages to be buffered in the
consumer's Erlang queue, which made the consumer's memory usage go through
the roof. There was no way to throttle the sender because QoS was not
implemented, so I changed my consumers to use a basic.get. This pushed the
problem back into RabbitMQ's camp, which I thought would solve it, because
I believed everything was going to disk and did not know it was being
shadowed in memory too. So it didn't solve the problem; it simply moved it
to a different part of the system.
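To make the push-versus-pull dynamic concrete, here is a toy simulation
(plain Python, no broker involved; the function and the numbers are
invented). When the broker pushes an entire backlog at once, the
consumer's buffer has to hold every outstanding message; when the consumer
pulls one message per message it processes (the basic.get style), the
buffer never exceeds one.

```python
from collections import deque

def peak_consumer_buffer(mode, backlog=1000):
    """Toy model of draining a backlog of messages.  In "push" mode the
    broker shoves every pending message into the consumer's buffer at
    once; in "pull" mode the consumer fetches one message per message it
    processes.  Returns the peak size of the consumer's buffer."""
    broker = deque(range(backlog))
    buffer = deque()
    peak = 0
    done = 0
    while done < backlog:
        if mode == "push":
            while broker:                 # unthrottled delivery
                buffer.append(broker.popleft())
        elif broker:                      # pull: fetch only one
            buffer.append(broker.popleft())
        peak = max(peak, len(buffer))
        if buffer:                        # process a single message
            buffer.popleft()
            done += 1
    return peak
```

A prefetch limit on the push side (the QoS that wasn't implemented) would
bound the buffer the same way without the per-message round trip.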

I put a great deal of effort into ensuring that the consumers stay running
at all times, but the thing that is out of my control is whether they can
keep up with the messages that are being put into the system. The reason for
this is that the consumers deliver messages to various URLs over HTTP, and
the server behind the URL might be offline for any period of time, or
delivery might be very slow. This can cause the messages to back up if they
are being added at a high rate. Messages are added at a high rate sometimes
when a client wants to send a batch of tens of thousands of messages in a
store and forward mode.

So... in the absence of the "feature" discussed above, do you have any
suggestions as to how I can dig myself out of this hole (without using a
different m&q system)? :)


On Fri, Sep 12, 2008 at 12:27 PM, Ben Hood <0x6e6562 at gmail.com> wrote:

> Edwin,
> On Fri, Sep 12, 2008 at 2:31 AM, Edwin Fine
> <rabbitmq-discuss_efine at usa.net> wrote:
> > I think you misunderstood my message. I was saying RabbitMQ should be
> > able to store a non-memory-limited number of persistent messages in
> > the absence of a consumer draining the queue. So to be exact, these
> > would be non-transactional persistent messages. You replied to me
> > about non-persistent messages, which I know are memory-limited. I am
> > confused, because I was at one time (when I was doing my due
> > diligence) convinced that RabbitMQ had no practical limit (other than
> > any set by Mnesia and hard disk space) for storing non-transactional
> > persistent messages. I am simply concerned that perhaps I was
> > mistaken and was asking you to elaborate.
> Thanks for the clarification.
> The status quo is that irrespective of whether we are talking about
> persistent or non-persistent messages, messages are queued up in
> memory until they are drained. The difference between persistent and
> non-persistent is that with persistent messages a replica of each
> message is written to disk. This disk copy will only be used in a
> recovery situation for messages that have not been acknowledged.
> Obviously the limit to which you carry on queuing messages without
> draining them depends on the physical resources available to the
> Erlang VM and how virtual memory is utilized, which in turn depends on
> your OS and production setup.
> So to draw a line in the sand, you could go through and calibrate your
> setup with respect to the point at which it cannot allocate any more
> memory. By doing so, you've worked out approximately what your absolute
> bottom line will be. If, for example, your expected capacity is a small
> fraction of this known limit for your particular setup, and in
> practice you would notice undrained messages long before you hit the
> limit, you *might* decide that for all intents and purposes this is an
> acceptable and manageable risk.
> Whilst doing your testing, you may not have encountered any problems,
> because you may not have pushed it to the limit. All I am saying is
> that in doing stress testing for the scalability improvements I am
> working on, I have pushed Rabbit to its limits at various stages and
> am just informing you of the current theoretical worst-case scenario -
> at the moment, queue depth is bounded by memory and there is no
> overflow facility.
> However, you may be more risk averse and not want to entertain the
> possibility of exhausting your system resources, ever.
> In that case, as you have rightfully pointed out, an obvious
> improvement to Rabbit is to overflow to disk when a certain queue
> depth is reached.
> This could apply to persistent and non-persistent messages alike.
> I'll take this down as an enhancement suggestion which may be factored
> into the work that we have planned to make queues pluggable. An
> alternative is to incorporate this into the current queue
> implementation, but whether we do this would basically depend on what
> resources we have available and how time-critical it is.
> Another related improvement includes implementing message expiry.
> Please let us know what your time scales are for when you absolutely
> need this in production.
> HTH,
> Ben