[rabbitmq-discuss] Implementing User Messaging Using RabbitMQ

Fri Jan 25 02:48:43 GMT 2013

Hi, Chris...

Answers inline.  Apologies for such a slow response to your question.
It was information dense enough that finding time to respond carefully
took a little while.

> High-Level Requirements
> =======================
> - users can send messages to another user, several users, a group, or
> everyone
>   - users and groups are managed in a proprietary user management system
> - no chat rooms-- each user just has a message feed or "timeline" of
>   incoming & outgoing messages
> - message history must be preserved (in a database)

A natural way to do this would be to, somewhere early in your routing,
shunt every message being published through an initial filtering
exchange.  One binding would carry it off to a queue from which a
worker/consumer would pull the messages as work items, and update
whatever your back-end data store is.  Another binding could carry
your messages onward to another exchange, whose type and binding keys
would be chosen to give you whatever routing you want for messages for
their 'real' (i.e. not just for persistence) handling.
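
As a rough sketch of that topology, assuming a pika-style channel
object in Python: the exchange and queue names below are invented for
illustration, and the choice of a fanout exchange up front and a direct
exchange behind it is just one possibility.

```python
def declare_ingress_topology(channel):
    """Declare a filtering exchange that feeds an archive queue and a
    downstream routing exchange.

    `channel` is assumed to behave like a pika BlockingChannel.
    All exchange and queue names here are hypothetical.
    """
    # Every publisher sends to this exchange first.
    channel.exchange_declare(
        exchange="msg.ingress", exchange_type="fanout", durable=True)

    # Binding 1: a durable work queue drained by the persistence worker.
    channel.queue_declare(queue="msg.archive", durable=True)
    channel.queue_bind(queue="msg.archive", exchange="msg.ingress")

    # Binding 2: an exchange-to-exchange binding (a RabbitMQ extension)
    # that carries the same messages on to the "real" routing exchange.
    channel.exchange_declare(
        exchange="msg.delivery", exchange_type="direct", durable=True)
    channel.exchange_bind(destination="msg.delivery", source="msg.ingress")
```

With pika you would pass in the channel obtained from a
BlockingConnection.  The message keeps its original routing key when it
crosses the exchange-to-exchange binding, so the downstream exchange
can still route on it.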

> - if a user is offline, the message should be forwarded to the
> user's email

Depending on how you decide to represent the notion of
online/offline-ness in your app there are a number of ways you might
do this.  One idea would be to use 'alternate exchanges' as described
here:

http://www.rabbitmq.com/ae.html

In brief, when you configure an AE for an exchange, then any message
that the latter exchange can't route to a queue will be published to
the specified AE instead.  If your user's queues are created on
demand, and set to auto-delete so that they go away when users
disconnect, and that's a sufficiently strong notion of
"on-line-ness" for you, this allows messages for which there's no
recipient to be sent off somewhere else.  You can then put them in
another work queue, that a consumer process in your system can pull
out and convert into emails.
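
A minimal sketch of that arrangement, again assuming a pika-style
channel.  The names "users.direct", "users.offline" and "email.outbox"
are hypothetical; the "alternate-exchange" argument is the one
described on the page linked above.

```python
def declare_offline_fallback(channel):
    """Declare a user-routing exchange whose unroutable messages fall
    through to an email work queue. All names are hypothetical."""
    # Catch-all for messages that no online user's queue could take.
    channel.exchange_declare(
        exchange="users.offline", exchange_type="fanout", durable=True)
    channel.queue_declare(queue="email.outbox", durable=True)
    channel.queue_bind(queue="email.outbox", exchange="users.offline")

    # The main exchange names the catch-all as its alternate exchange.
    channel.exchange_declare(
        exchange="users.direct", exchange_type="direct", durable=True,
        arguments={"alternate-exchange": "users.offline"})


def attach_online_user(channel, username):
    """Create the per-user queue when a user logs in."""
    queue = "user." + username
    # auto_delete: the broker removes the queue once its last consumer
    # goes away, which is what stands in for "offline" here.
    channel.queue_declare(queue=queue, auto_delete=True)
    channel.queue_bind(
        queue=queue, exchange="users.direct", routing_key=username)
    return queue
```

Note that an auto-delete queue is only removed after it has had at
least one consumer, so the web app needs to start consuming from the
queue for the logout case to behave as described.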

> - 5000 concurrent users sending a few messages every few minutes
> (20,000 users total in system).

If there's one user per queue, 5000 queues isn't really a terrifying
number, since the processes that represent queues are fairly
lightweight.  Performance will be affected by such factors as whether
the queues are durable, whether persistence is requested for messages, etc.

> In order to store every message in a database (for historical
> purposes), all messages are initially sent to a direct exchange with a
> worker queue.  Consumers of the worker queue (let's call them
> messaging service workers) receive the message and store it to the
> database.  Then they distribute the message to the intended recipients
> (this is the part I think I need most help with).

Sounds reasonable so far, and well aligned with what I said near the
start of this post.

Now to look at the options you outline, and try to add some hopefully
useful remarks...

> Message Distribution Option 1: Direct Exchange w/ Queue Per User
> =================================================================
> Message is sent to a direct exchange with username as the routing key.
> For every logged in user, the webapp declares a queue and binds with
> the username.

Entirely reasonable way to represent logged-in-ness and get the right
messages to the right places.

> Issue 1: How to support groups?  The messaging service worker could
> resolve the group to a list of users and post the message multiple
> times (once for each user/routing key)-- but this would not scale very
> well.

And indeed, you may be able to use AMQP mechanisms to implement this
within your message fabric.  For example, some entity in your system
with suitable permissions could create a fanout exchange per group and
bind it to the queues corresponding to the users in the group, so that
messages published to the group would be broadcast to all members.
An advantage of this is that resource usage will be more modest, as
internally the broker can de-duplicate the representations of all of
the 'copies' of the message sent group-wide.  Alas, if the sender was
one of those users, he'd also hear his own message.
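
A sketch of the fanout-exchange-per-group idea, assuming a pika-style
channel and assuming the member queues already exist (binding to a
queue that doesn't exist is an error).  The "group." naming prefix and
both function names are made up for the example.

```python
def declare_group(channel, group, member_queues):
    """Create a fanout exchange for `group` and bind each member's
    existing queue to it. Names are hypothetical."""
    exchange = "group." + group
    channel.exchange_declare(
        exchange=exchange, exchange_type="fanout", durable=True)
    for queue in member_queues:
        # Fanout exchanges ignore the routing key, so none is given.
        channel.queue_bind(queue=queue, exchange=exchange)
    return exchange


def publish_to_group(channel, group, body):
    """Publish once; the broker delivers to every bound member queue."""
    channel.basic_publish(
        exchange="group." + group, routing_key="", body=body)
```

The publisher sends a single message regardless of group size, which
is the point of pushing the group expansion into the broker.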

If a group member receiving his own messages is something you can't
tolerate in the client side code handling that group member, you could
imagine using one of the other exchange types, along with some routing
key chicanery to be more selective.  See if the previous paragraph's
idea agrees with you, and if you find it lacking, we can chip away at
it a bit more.

> Or we could create individual queues for every group too, but
> this increases complexity in the web app (and later in our other
> clients) since they need to determine what groups they need to listen
> for and who to distribute to.

This is another option.  If the idea of fanout-exchange-per-group
isn't agreeable, you could consider this, although the former is
probably simpler.

> Issue 2: Is 5000 queues a bit much?  Especially when they are all
> ultimately going to the same consumer (in the webapp case)?  This
> isn't the only use of RabbitMQ, so there will be other
> queues/exchanges/bindings as well...

Probably not.

> Message Distribution Option 2: Topic Exchange w/ Single Queue per Webapp
> ========================================================================
>
> Message is sent to topic exchange with routing key containing ALL
> addressed users (i.e., "|usera|userb|userc|").  Webapp declares single
> queue and a separate binding to the exchange for each user (i.e.,
> "#|usera|#", "#|userb|#", etc.).  This is similar to Option 1, but
> with the benefit that each message only has to be sent once.

Right, and this is along the lines of the alternate solution I hinted
at above.

> Issue 1: Routing key can get huge (20000 names if addressed to "everyone"
> group).
> Issue 2: It just seems like the wrong use of topic exchange.

I'd worry mainly that this could get very awkward to manage and push a
lot of complexity into your application, although it's not an a priori
insane use of the topic exchange conceptually, given that topic
exchanges are intended to route messages based on pattern matches
against the routing keys stamped on them.  I may not be following your
description, but you may find that the actual matching behavior isn't
quite what your scheme needs.
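
To make the matching behavior concrete, here is a simplified
pure-Python model of topic matching (not the broker's actual code).
Topic keys are split into words on ".", "*" matches exactly one word,
and "#" matches zero or more words.  The wildcards only count when
they are a whole word, so a "|"-delimited key behaves as one literal
word.

```python
def topic_matches(binding_key, routing_key):
    """Simplified model of topic-exchange matching on dot-separated
    words. Illustrative only; edge cases such as empty keys are not
    modelled."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            # Pattern exhausted: succeed only if all words are consumed.
            return w == len(words)
        if pattern[p] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


# The '|'-delimited scheme is a single literal word, so it never matches:
print(topic_matches("#|usera|#", "|usera|userb|userc|"))
# A dot-delimited variant of the same idea does match:
print(topic_matches("#.usera.#", "usera.userb.userc"))
```

Even with dots, the routing key is an AMQP short string limited to 255
bytes, so a key listing thousands of recipients would not fit.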

> Message Distribution Option 3: Fanout Exchange w/ Single Queue per Webapp
> =========================================================================
>
> Message is sent to fanout exchange.  Webapp listens on fanout exchange
> and simply discards all messages for users that aren't logged in.

Actually, if a not-logged-in user has no queue that he owns bound to
that exchange, your web app doesn't even have to do that much.  The
unroutable messages will be quietly dropped.

> Issue 1: Doesn't scale well (or even if it does, there is lots of
> waste filtering on the client side).

I'd worry more about the waste filtering.  The amount of message
duplication you'd think that this scenario requires doesn't really
happen, as Rabbit's internals aren't totally naive about how to handle
such a case.

> Issue 2: Not at all secure (although, for now, we won't try to solve
> security).

I was actually going to ask about that... You may want to carefully
consult the table in:

http://www.rabbitmq.com/access-control.html

which states which of the AMQP commands require which permissions.
Also, keep in mind that rather than there being ACLs, which list who
can do things to a given broker resource, Rabbit works the other way.
A user gets, for each of the three permissions (configure, write and
read), a regular expression matching the resource *names* to which
that permission applies.  Different AMQP commands require different
permissions on different resources, e.g. to bind a queue you must
be able to write to the queue and read from the exchange.
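
A small Python model of how such a check works, to show the shape of
it.  The permission patterns for "alice" are invented, and the use of
an unanchored search (hence the explicit ^ and $) reflects my
understanding of the broker's behavior rather than its actual code.

```python
import re

# Hypothetical permission set for one user: one pattern per permission.
ALICE = {
    "configure": r"^user\.alice$",
    "write": r"^(user\.alice|users\.direct)$",
    "read": r"^(user\.alice|users\.direct|group\..*)$",
}


def has_permission(perms, permission, resource_name):
    """True if the user's pattern for `permission` matches the name."""
    return re.search(perms[permission], resource_name) is not None


def can_bind_queue(perms, queue, exchange):
    # Binding needs write on the queue and read on the exchange.
    return (has_permission(perms, "write", queue)
            and has_permission(perms, "read", exchange))


def can_publish(perms, exchange):
    # Publishing needs write on the exchange.
    return has_permission(perms, "write", exchange)


print(can_bind_queue(ALICE, "user.alice", "users.direct"))
print(can_bind_queue(ALICE, "user.bob", "users.direct"))
```

With a naming convention like this, one pattern per user can confine
each user to binding and consuming only from his own queue.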

This model works fine for most enterprise uses of Rabbit and leads to
reasonable user management.  It can break down in the fringes though
if you have lots of ephemeral users, or users with sort of dubious
trust relationships between them, or users that can vouch for other
entities, etc.  In cases such as that, one might solve one's problem
by writing custom authorization backends, although the decision is
best not taken too lightly.

> Message Distribution Option 4: Direct Exchange w/ Single Queue per Webapp
> ========================================================================
> Webapp must notify messaging service worker as users log in and out--
> and of the single queue on which it wants messages.  When a message
> needs to be distributed, messaging service worker checks to see which
> users are logged in where.  It sends message to the queue of each
> webapp that has logged in users that the message is addressed to
> (message itself contains "to:" list so webapp can send to right user
> UI).
>
> Issue 1: Messaging Service Worker now has a lot more state and
> processing messages is a lot heavier.  Need to make sure it stays in
> synch with webapp.

Agreed.  It's also not totally clear to me from what you've said that
you really need to do this, and, as you point out, it's going to be
non-trivial to implement and get right.

> Issue 2: Messaging Service Worker is now doing the routing-- which
> just seems wrong since that is what RabbitMQ is so good at!

Absolutely true.  Which probably leads us to favor the proposals
enumerated earlier. :-)

> Message Distribution Option 5: Fanout Exchanges for Groups
> ==========================================================
> Same as option 1 (queue per user), but also create a fanout exchange
> for every group.  The individual user queues need to be bound to the
> fanout exchange corresponding to each group they belong to.
>
> Issue 1: Could result in a lot of exchanges (not sure how many
> groups there are).

Worry not here: Exchanges are cheap in their implementation.  Unlike
queues, each of which has a live, resource-consuming Erlang process
babysitting it, an exchange is essentially just metadata inside
Rabbit's internal database which defines the exchange's configuration.
The actual work of routing and decision making that happens "in" an
exchange is really done on the channel process associated with the
publisher who's pushing stuff into the exchange.

> Issue 2: Still seems like a lot of queues too (5000).

But that's not that frightening a number probably.  An Erlang VM on
well provisioned hardware can accommodate many Erlang processes, which
are sort of lightweight 'green threads' that get distributed across a
pool of real OS threads when they're scheduled.

> Message Distribution Option 6: Custom Exchange
> ==============================================
> Create our own exchange that allows consumers to bind with usernames.
> When a message is posted to the exchange, it knows how to read the
> "to:" list from it, can even resolve groups, and then sends to the
> correct queues based on the bindings.
>
> Issue 1: We don't have any Erlang experience.
>
> Issue 2: Resolving groups in an Erlang custom exchange could be tricky
> (not sure about the APIs to our User Management).

If your needs really do turn out to be too exotic for the stuff we
talked about earlier you could consider this, although it is work.
There are nice examples of how to write a custom exchange in the
Manning book "RabbitMQ in Action" and we can likely point you at some
other examples like the 'presence exchange' and others on GitHub.

One would of course have to get a bit comfortable in Erlang to do so.

> Whatever the correct approach is, how do we determine if a user is
> "offline" and we need to forward the message to their email?  If we
> are using queue-per-user, then we can set the mandatory flag, I guess?
> If we are using another approach, what then?

Start by taking a peek at the documentation I point to above and see
if you can get what you want out of the alternate exchange mechanism.
It seems quite plausible to me that it will get you all or most of the
way there.  The mandatory flag will also catch the case where there's
no queue into which your message can be routed, but in the case of a
fanout exchange, the presence of any bound queue will keep it from
firing.  Also, your publisher then has to be somewhat aware of what's
going on when the mandatory-flagged publish fails, since the broker
returns the message to it.  You might find that bouncing stuff to an
alternate exchange, where some consumer/worker transforms those
messages into emails, moves the tricky, case-wise work out of your
publisher and into a consumer that pretty much only does one simple thing.
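
A sketch of such a single-purpose consumer, assuming a pika-style
channel and a queue (here called "email.outbox", a made-up name) that
is bound to the alternate exchange.  `send_email` is a placeholder for
whatever turns a message body into an email in your system.

```python
def make_email_forwarder(send_email):
    """Build a consumer callback that emails each message, then acks.

    `send_email` is a hypothetical callable taking the message body.
    """
    def on_message(channel, method, properties, body):
        send_email(body)
        # Ack only after the email hand-off succeeded, so a crash
        # before this point leaves the message on the queue.
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return on_message


def start_email_worker(channel, send_email, queue="email.outbox"):
    """Consume the offline-message queue until interrupted."""
    channel.basic_consume(
        queue=queue,
        on_message_callback=make_email_forwarder(send_email))
    channel.start_consuming()
```

The publisher stays unaware of who is online; it publishes normally
and this worker picks up whatever could not be delivered.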

> Thanks to anyone who can help point me in the right direction (or
> steer me away from the wrong direction)...  I'm not familiar enough
> yet with RabbitMQ performance characteristics to know what is a bad
> idea, and not familiar enough yet with its many features to know what
> is a good idea... ;-)

It sounds like you're off to a good start.  You're already turning
over an entirely reasonable set of stones looking for pragmatic ways
to map your needs on to how Rabbit and AMQP work.

Take some time to digest the commentary above, and please return to
the list as your thinking evolves, other questions spring to mind,
etc.

Best regards,
Jerry
