[rabbitmq-discuss] Exchanges, Routing, and AMQP

Thu Nov 13 22:56:33 GMT 2008

I came up with a reasonable design for this a couple of weeks ago, but  
I originally wanted to allow the passions to die down a little before  
responded, and then The Real World (e.g. where I'm not paid to blather  
on about MOM) interfered.

The solution is actually somewhat simple conceptually. Essentially,  
assume that you have the following organization:

- Exchange (namespace for messages)
feeds into
- Router (distributor of messages)
feeds into
- Queue (deliverer of messages)

Each one of these is a 0..n relationship.

The key difference here is that an exchange feeds messages as-is to  
all relevant router instances. And you have one router instance per  
TYPE of routing logic (e.g. topic, fanout, direct, least-full).  
Binding results in a modification to a router instance to support that  
binding, which might result in an additional exchange-->router binding.

So let's consider the degenerate case where there are no subscribers,  
but a publisher onto Exchange1. A new subscriber comes in.
1) Consumer asserts Queue1. This doesn't exist, so it's constructed.
2) Consumer asserts a binding between Queue1 to Exchange1 using Direct  
routing on namespace Foo.Bar.Baz.
    2a) There is no Direct Router instantiated, so a new instance is  
instantiated
    2b) DirectRouter instance is bound to receive all messages from  
Exchange1
    2c) DirectRouter is configured to deliver Foo.Bar.Baz to Queue1.
3) Message flows from Exchange1 to DirectRouter, which determines  
whether or not to deliver to Queue1 based on the message topic.

In this case, you have an O(nRoutingTypes) operation on delivery (the  
number of different router instances configured to listen to messages  
from that exchange), but this in C could be extremely optimized by  
passing a ref-counted pointer to the message from the Exchange to the  
Router, meaning that you don't even have a memory copy involved. O(n)  
on copies of pointers I don't consider meaningful. In addition, you  
have whatever logic the router needs to map to its queues. But that's  
non-unique, because that complexity you need no matter what.

So let's assume we have the same scenario as at the end of Step 3  
above, but now we add some LeastFull logic.
4) Consumer asserts Queue2. This doesn't exist, so it's constructed.
5) Consumer asserts Queue3. This doesn't exist, so it's constructed.
6) Consumer asserts a binding between Queue2 to Exchange1 using  
LeastFull routing with QueueGroup parameter set to #1
    6a) There is no LeastFull Router instantiated, so a new instance  
is instantiated
    6b) LeastFull Router instance is bound to receive all messages  
from Exchange1
    6c) LeastFull Router is configured to add Queue2 to the queue  
group parameter #1
7) Consumer asserts a binding between Queue3 to Exchange1 using  
LeastFull routing with QueueGroup parameter set to #1
    7a) Already existing LeastFull Router is configured to add Queue3  
to the queue group parameter #1
8) Message is delivered to Exchange1
    8a) Message is delivered to DirectRouter, which processes as per  
(3) above
    8b) Message is delivered to LeastFullRouter, which chooses either  
Queue2 or Queue3 to deliver based on existing queue size

Again, you have the message internally delivered (through pointer copy  
with ref counting) twice (O(nRoutingTypes==2)). LeastFullRouter has an  
O(nQueueGroups * nQueuesPerGroup) operation, which in this case is  
O(1*2).

The key thing here is that the act of binding arbitrarily establishes  
a router instance and binds that instance to an exchange, and that the  
act of delivery between an exchange and a router is deliver-all-to- 
all. This is going to be a fast operation, and then any additional  
complexity is up to the internal operation of the router.

You end up being able to support extremely fast (e.g. Direct) routing  
types with fast delivery, and slower (e.g. Topic) routing types with  
more complex logic, but only take that cost if a binding is done, and  
not have it affect the delivery latency on the fast delivery modes.

You are definitely right, in that no particular router would be able  
to make global decisions with respect to bindings that haven't been  
made to it. However, if a particular queue hasn't said that it's  
interested in receiving messages subject to a particular set of  
routing logic, why should that routing logic be able to affect  
delivery to that queue? It sounds almost like you're arguing that if I  
bind Exchange2 to Queue4, that should be allowed to affect flow from  
Exchange1 to Queue2. That shouldn't be allowed: the client in charge  
of Queue2 needs to have the ability to control flow of messages to  
Queue2. So the global operation needs to be within the confines of  
cooperating clients, who have explicitly said that they want to allow  
global decisions to be made.

Yes, I know code speaks more highly than words, but this isn't  
something that I think I have time to code up, and I think the basic  
conceptual framework that a fast solution potentially exists is one  
which allows the spec to move forward.

Also, I know this is a little rambling. If this isn't clear, let's  
pick apart individual parts and tomorrow I should have more time to  
clarify what's unclear.

Kirk

(original context below. Digest subscribers, apologies, but I wanted  
to maintain full history here and not line-by-line it as it's a  
complex argument).

On 24 Oct 2008, at 14:50, Ben Hood wrote:

> Kirk,
>
> On Tue, Oct 21, 2008 at 5:40 PM, Kirk Wylie <kirk at kirkwylie.com>  
> wrote:
>> Well, no, actually, that's not true. That's only true of a naive
>> implementation of a binding-based system. The RabbitMQ-based
>> implementation where you're modelling all the primitive AMQP concepts
>> as direct Erlang processes would definitely suffer from this problem,
>> but I'm sure you could work out a system where invoking a binding
>> dynamically modifies the routing table of the exchange.
>
> Don't get me wrong, the thing I that I would love the most is to be
> proven wrong on this point.
>
> The reason why I say this is because I have an interest in pushing
> back the boundaries with Rabbit so I would quite humbly accept the
> suggestion of an algorithm that allows you to match routes in
> sublinear time over an arbitrarily large routing table whilst
> deferring the specification of the algorithm to the bindings.
>
> So if you really think that I do not possess the level of abstraction
> required to think beyond the implementation I am currently working on,
> please set me straight.
>
> And lets set a few things straight, lest untruths manifest themselves
> as technical arguments :-)
>
> - Rabbit does not model every AMQP concept as a process - bindings or
> routes are not modeled as processes;
> - What do you mean excatly by "invoking a binding"?
> - The routing table is dynamically modified by issuing the Bind and
> Unbind commands to the broker.
>
> Setting the rebuttal aside now to concentrate on the real design and
> engineering concerns, do you think you can come up with a simple model
> on paper for this?
>
> Here's a suggestion of where one *may* start (requires a fixed width  
> font):
>
>                        -----------
>                        | Message |
>                        | key = x |
>                        -----------
>                             |
>                             |
>                        ------------
>                        | Exchange |
>                        | name = y |
>                        ------------
>                             |
>        ------------------------------------------
>        |                    |                   |
> ------------------  ------------------  ------------------
> | Binding        |  | Binding        |  | Binding        |
> | key = foo      |  | key = bar      |  | key = baz      |
> | algorithm = A1 |  | algorithm = A2 |  | algorithm = A3 |
> ------------------  ------------------  ------------------
>
> My point is that by deferring the algorithm to the bindings you are
> removing any potential of using root metadata in when you compute the
> routing graph.
>
> Although this is slightly OT, this is reminiscent of the way that svn
> stores metadata on a per-directory basis as opposed to storing it in
> the root - the difference being the ease to which you can compute the
> transitive closure of the graph.
>
>>> Furthermore, you can bring the consumer driven-ness into the realm  
>>> of
>>> your app by having a convention that prevents a producer from
>>> declaring exchanges - if the exchange doesn't exist, then the  
>>> producer
>>> will not be able to send messages to it.
>>
>> Then the "createExchange" call shouldn't be an assertion, it should
>> check for a pre-defined configuration point.
>
> You can currently use the passive flag to achieve this.
>
> Ben