[rabbitmq-discuss] Stress testing RabbitMQ

Alvaro Videla videlalvaro at gmail.com
Sun May 15 18:16:55 BST 2011


Matthias,

I didn't suffer from the add/remove binding problem because in my tests I was just creating the bindings (which, yes, was really slow: about 30 seconds for 5000 bindings) and then routing messages.

So as I said, I was just *stressing* the routing part. That is, I didn't have any new bindings once I started publishing messages.
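
In case it helps to picture it, the test was roughly shaped like the sketch below. This is illustrative only, written against the Erlang AMQP client; the module name, exchange name, routing key and the "declare one server-named queue per binding" setup are assumptions of the sketch, not the exact code I ran.

-module(stress_test).
-export([run/2]).

-include_lib("amqp_client/include/amqp_client.hrl").

%% Illustrative only: declare one exchange, bind N server-named queues
%% to it up front, then publish M messages in a tight loop. All of the
%% binding work happens before the first publish.
run(NumBindings, NumMessages) ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{}),
    {ok, Ch}   = amqp_connection:open_channel(Conn),
    X = <<"stress-test">>,
    #'exchange.declare_ok'{} =
        amqp_channel:call(Ch, #'exchange.declare'{exchange = X,
                                                  type     = <<"direct">>}),
    %% the slow part: roughly 30 seconds for 5000 bindings in my runs
    {BindMicros, _} = timer:tc(fun () -> bind_queues(Ch, X, NumBindings) end),
    %% from here on no bindings are added or removed, only routing
    {PubMicros, _}  = timer:tc(fun () -> publish(Ch, X, NumMessages) end),
    io:format("bind: ~p ms, publish: ~p ms~n",
              [BindMicros div 1000, PubMicros div 1000]),
    amqp_connection:close(Conn).

bind_queues(Ch, X, N) ->
    [begin
         #'queue.declare_ok'{queue = Q} =
             amqp_channel:call(Ch, #'queue.declare'{}),
         amqp_channel:call(Ch, #'queue.bind'{queue       = Q,
                                             exchange    = X,
                                             routing_key = <<"rk">>})
     end || _ <- lists:seq(1, N)].

publish(Ch, X, N) ->
    Msg = #amqp_msg{payload = <<"hello">>},
    [amqp_channel:cast(Ch, #'basic.publish'{exchange    = X,
                                            routing_key = <<"rk">>}, Msg)
     || _ <- lists:seq(1, N)].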

Regarding the caching… yes, I know it can be pretty hard to manage. I've already had many nightmares doing caching in web apps. In RabbitMQ's case, if you are routing many thousands of messages per second while bindings are coming and going, I think things can start to get out of sync pretty easily.
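
Just to make the staleness problem concrete, the kind of per-channel cache I was playing with looks roughly like this. It is only a sketch: bindings_version/1 and lookup_bindings/2 are hypothetical helpers (the latter standing in for the real lookup, e.g. an mnesia:dirty_select over the bindings table), not functions that exist in the broker.

%% Sketch only, not RabbitMQ code. The idea: keep the routing result in
%% the calling process's dictionary, tagged with a version number that
%% is assumed to be bumped on every bind/unbind for the exchange, and
%% fall back to the real lookup whenever the entry is missing or stale.
route(Exchange, RoutingKey) ->
    Key     = {binding_cache, Exchange, RoutingKey},
    Version = bindings_version(Exchange),           %% hypothetical helper
    case get(Key) of
        {Version, Destinations} ->                  %% hit, still current
            Destinations;
        _ ->                                        %% miss or stale entry
            Destinations = lookup_bindings(Exchange, RoutingKey),
            put(Key, {Version, Destinations}),
            Destinations
    end.

The hard part is exactly the invalidation: every channel process holds its own copy, so a bind or unbind has to become visible to all of them before the next message is routed, otherwise the visibility guarantees break, and the caches themselves can eat a lot of memory.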

Anyway… it was a lot of fun poking around the rabbit internals. I really like all the Erlang specs you have there; they make it really easy to follow along with the code, understand which parameters to expect, and so on.

Regards,

-Alvaro

On May 15, 2011, at 6:58 PM, Matthias Radestock wrote:

> Alvaro,
> 
> Alvaro Videla wrote:
>> Yes, the binding add/removal is slow. In that case I thought that it
>> is more common to be routing messages than adding/removing bindings.
> 
> Given that you were trying to optimise massive fanout, the binding count will be high, so the O(n^2) time complexity on add/remove really hurts and imposes a limit on the fanout degree which is way lower than it would be otherwise. I am surprised you haven't come up against this in your tests.
> 
>> Also my goal was to easily get an implementation to see if the routing
>> part could be improved by skipping the mnesia:dirty_select. One
>> reason I thought Mnesia could be a problem was that I started
>> getting logs like: Mnesia(rabbit@mrhyde): ** WARNING ** Mnesia is
>> overloaded…
> 
> These warnings are issued when mnesia's write-behind logic gets too far, er, behind. They are generally harmless. More importantly, though, they cannot originate from dirty_select/read operations. So if you are seeing these warnings in your tests, they are most likely due to binding addition/removal, not routing.
> 
>> My reasoning was that scanning the table every time a message is
>> routed could be slower than just returning some kind of cached list of
>> bindings, which is what I tried to do.
> 
> Caching routing results in, say, the channel's process dictionary can indeed yield significant performance gains for some use cases. We have experimented with that, but trying to apply caching generally turns out to be hard. The problem, unsurprisingly, is keeping such caches up to date, particularly when trying to maintain AMQP's quite strict visibility guarantees. Another problem is keeping the caches from taking up too much memory.
> 
> 
> Regards,
> 
> Matthias.

Sent from my Nokia 1100
