[rabbitmq-discuss] RabbitMQ memory management
efine at usa.net
Sun Sep 14 12:43:53 BST 2008
Actually, we have a bit of a different use case. Our company (my client, as
it were) provides SMS aggregation services. Essentially, we provide a
unified interface for SMS, MMS, USSD and other types of messages on behalf
of our customers, who are content providers. The main value add is that they
only have to deal with us and our interface, and not the interfaces of
multiple cellular companies. Without disclosing confidential information, I
can tell you that the way it works is as follows:
1. SMS and other kinds of messages (MMS, USSD, and others) sent by the
content providers (i.e. MT messages, not from cell phones) arrive at a
non-Erlang front end where they are captured into a database.
2. The messages are forwarded to the appropriate cell phone companies.
3. At the same time, MO messages are sent from the cell phone companies
which are destined for the content providers (our customers). These MO
messages go through the Erlang portion of the system.
4. Status notifications are also sent to the Erlang portion of the system
as the messages go through various processing states (e.g. Queued,
Acknowledged, Delivered, Failed, etc.).
5. The front-end sends these status notifications as fast as the Erlang
back-end can take them. The back end then splits the messages into multiple
RabbitMQ queues based on content and provider (e.g. SMS for company X goes
to one queue, while MMS for company Y will go to another). Each queue has a
consumer that delivers the status messages to a content-provider's web
site/URL. The status notifications are essentially the final step in the
process, confirming what the content provider initially sent. It is
necessary to split the initial transmission from the status messages like
this because delivery could take minutes or hours in some cases, and the
sender can't wait around for that long to get a response. So the request is
sent in one operation to us, and the actual response is sent in a separate
operation back to them.
6. The rationale behind having queues like this is to avoid a bottleneck
where, if there were just one delivery process and a URL choked up, all
deliveries would be held back unnecessarily.
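The per-(message-type, provider) split described in point 6 can be sketched in plain Python. This is only an illustration of the partitioning logic, not the real system; queue names like `status.sms.companyX` are my invention:

```python
# Hypothetical sketch of the per-(message-type, provider) split: each
# combination maps to its own queue, so one slow or offline delivery URL
# only stalls its own queue, never the others. Names are illustrative.
from collections import defaultdict

def queue_name(msg_type, provider):
    """Derive a queue name from message type and content provider."""
    return f"status.{msg_type}.{provider}"

def partition(messages):
    """Group status notifications by their target queue."""
    queues = defaultdict(list)
    for msg in messages:
        queues[queue_name(msg["type"], msg["provider"])].append(msg)
    return queues

batch = [
    {"type": "sms", "provider": "companyX", "status": "Delivered"},
    {"type": "mms", "provider": "companyY", "status": "Queued"},
    {"type": "sms", "provider": "companyX", "status": "Failed"},
]
split = partition(batch)
# Two queues: status.sms.companyX (2 messages), status.mms.companyY (1)
```

Each resulting queue would then get its own consumer, which is exactly what keeps a choked URL from holding back everyone else's deliveries.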
7. If one of the URLs is offline, or incorrectly specified, the
associated queue will build up persistent messages until the situation is
rectified. This is where the scenario we discussed could come into play.
Where the real problem comes in is that sometimes a client will send a batch
of a few hundred thousand messages. Ideally, this batch would be queued up
by Rabbit and drained as the system is able to process the load. It may even
be kept for off-peak times.
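A minimal sketch of the ack-on-success behaviour that point 7 relies on: a notification leaves the queue only after the provider's URL accepts it, so an offline URL simply lets its queue accumulate. `post_to_url` here is a stand-in for the real HTTP delivery, and the in-memory deque stands in for the broker queue:

```python
# Simulated consumer loop: "ack" (drop from queue) only on successful
# delivery, otherwise requeue the message at the back for a later retry.
# post_to_url is a hypothetical stand-in for the real HTTP POST.
from collections import deque

def drain(queue, post_to_url, max_attempts=5):
    """Deliver messages; on failure, requeue and try again later."""
    delivered, attempts = [], 0
    while queue and attempts < max_attempts:
        msg = queue.popleft()
        attempts += 1
        if post_to_url(msg):       # URL accepted the notification: "ack"
            delivered.append(msg)
        else:                      # URL offline/misconfigured: "nack"
            queue.append(msg)      # message stays queued until rectified
    return delivered

q = deque(["Queued", "Acknowledged", "Delivered"])
failures = iter([False, True, True, True])  # URL down for the first attempt
got = drain(q, lambda m: next(failures))
# The first message is requeued once; all three are eventually delivered
```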
8. I was hoping that I could use Rabbit the way I used to use MQ, which
is as a database-backed queue. Now that I understand I cannot, I must make
other design decisions. The batches, for example, will have to be stored in
files or a database, and trickled into the system at the correct rate. Now
here is the second kicker: without any flow control, it is not trivial to
figure out what the optimum rate is. Too slow, and the batch does not
complete quickly enough. Too fast, and I risk excessively loading the Rabbit
node.
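One way to do the trickling from point 8 is a simple token bucket. This sketch only shows the mechanism; picking the right rate is still the hard part, and `publish` is a hypothetical stand-in for the actual send to Rabbit:

```python
# Token-bucket rate limiter for trickling a stored batch into the broker.
# The rate still has to be tuned by hand (or by feedback from the broker);
# this only demonstrates the enforcement mechanism.
import time

class TokenBucket:
    """Allow at most `rate` publishes/second, with bursts up to `burst`."""
    def __init__(self, rate, burst):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def trickle(batch, publish, bucket):
    """Feed a batch into the broker no faster than the bucket allows."""
    sent = 0
    for msg in batch:
        while not bucket.try_acquire():
            time.sleep(0.01)       # back off until a token is available
        publish(msg)
        sent += 1
    return sent

bucket = TokenBucket(rate=1000.0, burst=10)
sent = trickle(list(range(10)), lambda m: None, bucket)
# All 10 messages fit inside the initial burst, so no waiting occurs
```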
9. Distributing the load across multiple Rabbit nodes may solve an
individual node's memory issues, but it will place more pressure on the overall
system's memory load. This will have cost implications because additional
hardware will need to be purchased, and additional complexity added to
distribute traffic to, and manage and monitor, the additional nodes. Sure,
we have os-mon and SNMP and all that, but it has to be set up and
configured, and ultimately someone has to sit and watch that. With more
nodes, it just becomes more of an administrative burden, especially if
traffic-wise, a single node would do the job just fine, but because of
implementation-specific behavior, it will not be good enough without
incurring some risk.
10. The bottom line is that having all the persistent data resident in
memory is a regrettable situation for the reasons outlined above. It is a
situation I accept, but I wish you, the developers, to be fully aware of its
consequences; not so that you feel bad or get beaten up, but simply so that
you can take them into account in your decision-making as appropriate.
Thanks for your time. I hope the above information gives you a better feel
for what I am trying to achieve, and perhaps will generate some more useful
thoughts about how I can do so using your very excellent product (which I am
committed to using, by the way).
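For what it's worth, Ben's back-of-envelope numbers from the quoted thread below can be made concrete. The 200-byte per-message figure (payload plus headers/overhead) is my assumption, not a measurement:

```python
# Ben's figures: ~2.5 million ~160-byte messages overfed a single node,
# with a degenerate inflow of 1 million messages/day. The 200-byte
# per-message figure (160-byte payload plus assumed overhead) is a guess.
msg_bytes = 200                # assumed payload + routing headers
capacity_msgs = 2_500_000      # observed single-node breaking point
inflow_per_day = 1_000_000     # degenerate case from the thread

resident_mb = capacity_msgs * msg_bytes / 1024 / 1024
days_to_saturation = capacity_msgs / inflow_per_day
# Roughly 477 MB resident, and about 2.5 days before the node is overfed
```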
On Sun, Sep 14, 2008 at 7:05 AM, Ben Hood <0x6e6562 at gmail.com> wrote:
> On Sat, Sep 13, 2008 at 10:07 AM, Alexis Richardson
> <alexis.richardson at cohesiveft.com> wrote:
> > Note that case (a) is solved by 2 above. Add more nodes. How often
> would you
> > have to add more nodes? Due to 1, you can work this out based on your
> > size. For almost all use cases the consumers will have to lag
> > producers by several
> > days. Think about it. And don't forget you can add more consumers.
> Good point.
> The main reason why I asked Edwin about his realistic expectations
> surrounding volumetrics was to find out what the breaking point was
> for a simple OTS Rabbit installation to do a *very* naive reality check.
> So in the absence of better knowledge, I just thought to myself that
> an SMS is roughly 160 bytes long (160 chars with an encoding that is
> something less than 8 bit/char plus some routing headers) and just
> created an infinite loop to publish them. A single instance of Rabbit
> got overfed after publishing 2.5 million of these messages (on a
> simple pizzabox setup).
> So under the assumption that you may also use more than one logical
> queue (by way of natural application partitioning), you may be
> spreading the total system load over multiple queues that reside in
> memory on different nodes.
> In the degenerate case that you send 1 million messages per day to a
> single instance, you still have a day and a bit to find some way to
> drain the queues. Presumably, if no SMS's were getting delivered to
> the downstream consumers over the course of a day, somebody would
> start to care about the fact that the system wasn't actually doing
> anything. This person would still have a fair amount of time to find
> out what is going on and drain the queues before resource consumption
> becomes acute.