[rabbitmq-discuss] Millions of Queues

Wed Feb 18 00:39:16 GMT 2009

Thanks for your reply, Matthias!

Matthias Radestock wrote:
> That should be possible on a single machine with a decent amount of 
> memory - just.
>
Yes, we got almost there with our 2 GB of RAM.  The 
interesting question for us is how to grow beyond this when our needs 
increase.

>> ideally we would have 20+ million and 80+ million respectively.
>> Basically we want infinite scalability along the # of queues axis.
>
> One problem you are going to run into here is that while queue 
> processes reside on single nodes (and hence adding more nodes gives 
> you more room to host the queue processes), all the routing info - 
> i.e. queue, exchange and binding records - is held in memory on each 
> node. 80 million binding records are unlikely to fit into physical 
> memory.
>
Really?  Each node contains a complete copy of the entire mnesia 
database?  For some reason I thought that it did partitioning of the data.

>> We repeated the experiment with a cluster of two machines and 
>> achieved basically the same result -- except the "primary" machine 
>> (the one ScalabilityTest was interacting with) had most of its memory 
>> consumed, and the "secondary" machine had 40% of its memory consumed 
>> by beam processes.  Would we have achieved better results by manually 
>> targeting ScalabilityTest at the secondary machine as well?
>
> What you are observing here follows directly from the explanation 
> above - the queue and binding records will consume memory on both 
> machines whereas the queue processes will only consume memory on the 
> machine on which they were created. So by targeting ScalabilityTest at 
> both machines you'd be able to balance the memory usage.
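
A toy model of the memory split described above may help. It is only a sketch: the function name `simulate`, the round-robin choice of target node, and the "two records per queue" figure (one queue record plus one binding record) are assumptions for illustration, and the counts are unitless since no per-record or per-process sizes are given in this thread.

```python
def simulate(num_nodes, num_queues, targets):
    """Return a (routing_records, queue_processes) pair for each node.

    targets: indices of the nodes the client declares queues on, used
    round-robin. Counts are unitless; no byte sizes are assumed.
    """
    records = [0] * num_nodes
    processes = [0] * num_nodes
    for i in range(num_queues):
        home = targets[i % len(targets)]
        # Queue and binding records are held in memory on every node.
        for n in range(num_nodes):
            records[n] += 2  # assumed: one queue record + one binding record
        # The queue process lives only on the node where it was created.
        processes[home] += 1
    return list(zip(records, processes))


if __name__ == "__main__":
    print(simulate(2, 1000, [0]))     # all queues declared via node 0
    print(simulate(2, 1000, [0, 1]))  # declarations alternate between nodes
```

In this model, targeting only node 0 gives `[(2000, 1000), (2000, 0)]`, while targeting both nodes gives `[(2000, 500), (2000, 500)]`: the record count per node is the same either way, and only the queue processes are balanced.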

OK, we'll try running the test against both hosts.  One thing that we'd 
really like is for the client to not have too much knowledge of the 
configuration of the cluster; so it would connect to some well-known 
host which would then make sure the queues created by the client are 
made on a host that has spare capacity.  I imagine that connection.redirect 
would be helpful here.  What I'm a little mystified about is that the 
clustering guide implies that rabbit already does this sort of load 
balancing -- this was why we didn't think it was necessary to run 
ScalabilityTest on multiple hosts.  But, now that I think about it, 
ScalabilityTest only opens up one connection for the entire duration of 
the test so it's not weird that said connection only points at one host.

>> Are we doing something wrong in our setup here?  What's the maximum 
>> number of queues that has been achieved by anyone on this list, and 
>> how did you get there?
>
> I am pretty sure Ben managed to get to 100s of thousands of queues and 
> bindings in his tests on a single node.
>
>> I see that in this email: 
>> http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002150.html 
>> , Ben Hood mentions that routing complexity is O(n), where n is the 
>> number of bindings, which doesn't bode well for our particular 
>> application.  Assuming that I'm interpreting that correctly, is there 
>> anything we can do to tackle that problem to enable huge numbers of 
>> queues?
>
> Ben was referring to topic exchanges. For direct exchanges the routing 
> cost is linear (or possibly O(n * log n)) in the number of *matching* 
> bindings. Does your application definitely require the use of topic 
> exchanges or would direct exchanges be sufficient?
>
Direct exchanges are perfectly sufficient for our use case; so that's 
wonderful news!
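
To illustrate why routing cost for a direct exchange can depend only on the *matching* bindings, here is a minimal in-memory sketch. It is not RabbitMQ's implementation; the class `DirectExchange` and the queue and key names are hypothetical, and the only point is that bindings indexed by routing key can be looked up without touching bindings for other keys.

```python
from collections import defaultdict


class DirectExchange:
    """Toy direct exchange: bindings are indexed by exact routing key."""

    def __init__(self):
        self._by_key = defaultdict(set)  # routing key -> set of queue names

    def bind(self, queue, routing_key):
        self._by_key[routing_key].add(queue)

    def route(self, routing_key):
        # Only the bindings for this key are visited; all other
        # bindings are never examined. Sorting is for stable output.
        return sorted(self._by_key.get(routing_key, ()))


if __name__ == "__main__":
    ex = DirectExchange()
    ex.bind("q.alice", "room.1")
    ex.bind("q.bob", "room.1")
    ex.bind("q.bob", "room.2")
    print(ex.route("room.1"))  # queues bound to room.1
    print(ex.route("room.3"))  # no bindings for this key
```

The sort in `route` is where an n * log n term in the number of matching bindings would come from in this sketch; a set could be returned directly if order does not matter.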

> Also, regarding the 20/80 million queues/bindings, would it be 
> possible to partition these s.t. rather than having a single RabbitMQ 
> cluster with that number of queues/bindings you could have n 
> individual RabbitMQ servers, each with 1/nth of the queues/bindings? 
> That might involve your producers having to publish messages to more 
> than one broker, and consumers consuming from more than one broker, 
> but depending on the exact nature of your app that may not be too 
> arduous.

Yes, we could do this, but it kind of defeats the point of having a 
cluster if we have to do much of the partitioning manually.  To be clearer 
about our use case, we want to create a chat room system, where each 
user is a member of K rooms.  The way we're thinking of modeling this is 
that each room is represented as a (direct) routing key, and each user 
has an individual queue which is then bound to K routing keys.  There's 
not really any pattern between users and rooms, so if we partitioned the 
queues (and thus users) among clusters, any message sent to an 
individual room would have to get sent to every cluster on the (small) 
chance that a listening user was in that cluster.  It'd work for a 
while, but it can't possibly scale indefinitely.  Is there perhaps a 
better way to structure this application?
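
The fan-out concern above can be sketched as follows. This is a toy model under stated assumptions: users are assigned to brokers by `user_id % num_partitions` (a hypothetical scheme, not something we have built), and `publish_cost` simply compares a blind broadcast to every broker against the number of brokers that actually host a member of the room.

```python
def partition_of(user_id, num_partitions):
    """Hypothetical assignment of a user's queue to a broker."""
    return user_id % num_partitions


def brokers_with_members(memberships, room, num_partitions):
    """Set of brokers hosting at least one queue bound to `room`."""
    return {
        partition_of(user, num_partitions)
        for user, rooms in memberships.items()
        if room in rooms
    }


def publish_cost(memberships, room, num_partitions):
    """Return (brokers_sent_to, brokers_with_a_listener).

    Without knowing where members live, the publisher must send the
    message to every broker.
    """
    useful = brokers_with_members(memberships, room, num_partitions)
    return num_partitions, len(useful)


if __name__ == "__main__":
    # user id -> set of rooms (routing keys) the user's queue is bound to
    memberships = {0: {"a"}, 1: {"a", "b"}, 2: {"b"}, 3: {"c"}}
    print(publish_cost(memberships, "a", 4))
```

With these four users on four brokers, a message to room "a" is sent to 4 brokers but only 2 of them host a listening user; the gap grows as brokers are added while room sizes stay the same.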