[rabbitmq-discuss] Exchanges, Routing, and AMQP
Kirk Wylie
kirk at kirkwylie.com
Thu Nov 13 22:56:33 GMT 2008
I came up with a reasonable design for this a couple of weeks ago, but
I originally wanted to allow the passions to die down a little before
responded, and then The Real World (e.g. where I'm not paid to blather
on about MOM) interfered.
The solution is actually somewhat simple conceptually. Essentially,
assume that you have the following organization:
- Exchange (namespace for messages)
feeds into
- Router (distributor of messages)
feeds into
- Queue (deliverer of messages)
Each one of these is a 0..n relationship.
The key difference here is that an exchange feeds messages as-is to
all relevant router instances. And you have one router instance per
TYPE of routing logic (e.g. topic, fanout, direct, least-full).
Binding results in a modification to a router instance to support that
binding, which might result in an additional exchange-->router binding.
So let's consider the degenerate case where there are no subscribers,
but a publisher onto Exchange1. A new subscriber comes in.
1) Consumer asserts Queue1. This doesn't exist, so it's constructed.
2) Consumer asserts a binding between Queue1 to Exchange1 using Direct
routing on namespace Foo.Bar.Baz.
2a) There is no Direct Router instantiated, so a new instance is
instantiated
2b) DirectRouter instance is bound to receive all messages from
Exchange1
2c) DirectRouter is configured to deliver Foo.Bar.Baz to Queue1.
3) Message flows from Exchange1 to DirectRouter, which determines
whether or not to deliver to Queue1 based on the message topic.
In this case, you have an O(nRoutingTypes) operation on delivery (the
number of different router instances configured to listen to messages
from that exchange), but this in C could be extremely optimized by
passing a ref-counted pointer to the message from the Exchange to the
Router, meaning that you don't even have a memory copy involved. O(n)
on copies of pointers I don't consider meaningful. In addition, you
have whatever logic the router needs to map to its queues. But that's
non-unique, because that complexity you need no matter what.
So let's assume we have the same scenario as at the end of Step 3
above, but now we add some LeastFull logic.
4) Consumer asserts Queue2. This doesn't exist, so it's constructed.
5) Consumer asserts Queue3. This doesn't exist, so it's constructed.
6) Consumer asserts a binding between Queue2 to Exchange1 using
LeastFull routing with QueueGroup parameter set to #1
6a) There is no LeastFull Router instantiated, so a new instance
is instantiated
6b) LeastFull Router instance is bound to receive all messages
from Exchange1
6c) LeastFull Router is configured to add Queue2 to the queue
group parameter #1
7) Consumer asserts a binding between Queue3 to Exchange1 using
LeastFull routing with QueueGroup parameter set to #1
7a) Already existing LeastFull Router is configured to add Queue3
to the queue group parameter #1
8) Message is delivered to Exchange1
8a) Message is delivered to DirectRouter, which processes as per
(3) above
8b) Message is delivered to LeastFullRouter, which chooses either
Queue2 or Queue3 to deliver based on existing queue size
Again, you have the message internally delivered (through pointer copy
with ref counting) twice (O(nRoutingTypes==2)). LeastFullRouter has an
O(nQueueGroups * nQueuesPerGroup) operation, which in this case is
O(1*2).
The key thing here is that the act of binding arbitrarily establishes
a router instance and binds that instance to an exchange, and that the
act of delivery between an exchange and a router is deliver-all-to-
all. This is going to be a fast operation, and then any additional
complexity is up to the internal operation of the router.
You end up being able to support extremely fast (e.g. Direct) routing
types with fast delivery, and slower (e.g. Topic) routing types with
more complex logic, but only take that cost if a binding is done, and
not have it affect the delivery latency on the fast delivery modes.
You are definitely right, in that no particular router would be able
to make global decisions with respect to bindings that haven't been
made to it. However, if a particular queue hasn't said that it's
interested in receiving messages subject to a particular set of
routing logic, why should that routing logic be able to affect
delivery to that queue? It sounds almost like you're arguing that if I
bind Exchange2 to Queue4, that should be allowed to affect flow from
Exchange1 to Queue2. That shouldn't be allowed: the client in charge
of Queue2 needs to have the ability to control flow of messages to
Queue2. So the global operation needs to be within the confines of
cooperating clients, who have explicitly said that they want to allow
global decisions to be made.
Yes, I know code speaks more highly than words, but this isn't
something that I think I have time to code up, and I think the basic
conceptual framework that a fast solution potentially exists is one
which allows the spec to move forward.
Also, I know this is a little rambling. If this isn't clear, let's
pick apart individual parts and tomorrow I should have more time to
clarify what's unclear.
Kirk
(original context below. Digest subscribers, apologies, but I wanted
to maintain full history here and not line-by-line it as it's a
complex argument).
On 24 Oct 2008, at 14:50, Ben Hood wrote:
> Kirk,
>
> On Tue, Oct 21, 2008 at 5:40 PM, Kirk Wylie <kirk at kirkwylie.com>
> wrote:
>> Well, no, actually, that's not true. That's only true of a naive
>> implementation of a binding-based system. The RabbitMQ-based
>> implementation where you're modelling all the primitive AMQP concepts
>> as direct Erlang processes would definitely suffer from this problem,
>> but I'm sure you could work out a system where invoking a binding
>> dynamically modifies the routing table of the exchange.
>
> Don't get me wrong, the thing I that I would love the most is to be
> proven wrong on this point.
>
> The reason why I say this is because I have an interest in pushing
> back the boundaries with Rabbit so I would quite humbly accept the
> suggestion of an algorithm that allows you to match routes in
> sublinear time over an arbitrarily large routing table whilst
> deferring the specification of the algorithm to the bindings.
>
> So if you really think that I do not possess the level of abstraction
> required to think beyond the implementation I am currently working on,
> please set me straight.
>
> And lets set a few things straight, lest untruths manifest themselves
> as technical arguments :-)
>
> - Rabbit does not model every AMQP concept as a process - bindings or
> routes are not modeled as processes;
> - What do you mean excatly by "invoking a binding"?
> - The routing table is dynamically modified by issuing the Bind and
> Unbind commands to the broker.
>
> Setting the rebuttal aside now to concentrate on the real design and
> engineering concerns, do you think you can come up with a simple model
> on paper for this?
>
> Here's a suggestion of where one *may* start (requires a fixed width
> font):
>
> -----------
> | Message |
> | key = x |
> -----------
> |
> |
> ------------
> | Exchange |
> | name = y |
> ------------
> |
> ------------------------------------------
> | | |
> ------------------ ------------------ ------------------
> | Binding | | Binding | | Binding |
> | key = foo | | key = bar | | key = baz |
> | algorithm = A1 | | algorithm = A2 | | algorithm = A3 |
> ------------------ ------------------ ------------------
>
> My point is that by deferring the algorithm to the bindings you are
> removing any potential of using root metadata in when you compute the
> routing graph.
>
> Although this is slightly OT, this is reminiscent of the way that svn
> stores metadata on a per-directory basis as opposed to storing it in
> the root - the difference being the ease to which you can compute the
> transitive closure of the graph.
>
>>> Furthermore, you can bring the consumer driven-ness into the realm
>>> of
>>> your app by having a convention that prevents a producer from
>>> declaring exchanges - if the exchange doesn't exist, then the
>>> producer
>>> will not be able to send messages to it.
>>
>> Then the "createExchange" call shouldn't be an assertion, it should
>> check for a pre-defined configuration point.
>
> You can currently use the passive flag to achieve this.
>
> Ben
More information about the rabbitmq-discuss
mailing list