[rabbitmq-discuss] [Minimum Air Induction] Introducing Shovel: An AMQP Relay

Sat Sep 20 20:18:18 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 20, 2008, at 11:17 AM, Ben Hood wrote:

> What difference does it make when you do multiple publishes within  
> the same TX?

Well, a transaction every 100-500 messages is basically as fast as no
transaction at all, with the difference that the connection is used in
bursts rather than continuously. But when I tried one transaction per
message, the end result was about 30-40 messages per second, which is
just too slow. It does tell me, though, that I can have up to about
30-40 transactions per second and still keep a very high throughput
(at about 100 messages per transaction).
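
The estimate above is just commit rate times batch size. A quick sketch
of the arithmetic, assuming the ceiling of 30-40 commits per second
measured with one message per transaction still holds with larger
batches (the function name is made up for illustration):

```python
def throughput(commits_per_sec, msgs_per_tx):
    """Messages per second when every commit carries msgs_per_tx messages."""
    return commits_per_sec * msgs_per_tx

# Figures from the measurements above: 30-40 commits/s, 100 messages each.
for commits in (30, 40):
    print(commits, "tx/s x 100 msgs/tx =", throughput(commits, 100), "msgs/s")
```

So 100 messages per transaction would give roughly 3000-4000 messages
per second, if the assumption holds.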

>> One thing that I wasn't able to understand is what does a
>> transaction give me in rabbitmq?
>
> It provides an atomic barrier for sending messages - when you get the
> commit.ok back from the broker, you know that the messages you have
> sent have been routed to the queues that the routing key matches on.

Ok so I would rewrite the interaction in the following way:

1. mochiweb generates messages and publishes them without a transaction.
2. shovel reads up to 500 messages, starts a transaction with the remote
   node, sends all of them, and commits the transaction.
3. after the transaction is committed, shovel acks all of those messages
   to the server it read them from with a single multiple ack.
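
A minimal sketch of steps 2 and 3, to make the ordering explicit. The
two classes are in-memory stand-ins invented for this example, not the
API of any real AMQP client or of shovel itself; only the order of
operations (publish, commit, then ack) is the point.

```python
from collections import deque

class FakeSource:
    """In-memory stand-in for the local broker (hypothetical)."""
    def __init__(self, bodies):
        self.pending = deque(enumerate(bodies, start=1))  # (delivery_tag, body)
        self.unacked = {}

    def get(self):
        # Hand out the next message and remember it until it is acked.
        if not self.pending:
            return None
        tag, body = self.pending.popleft()
        self.unacked[tag] = body
        return tag, body

    def ack_multiple(self, up_to_tag):
        # Acknowledge every outstanding delivery with tag <= up_to_tag.
        for tag in [t for t in self.unacked if t <= up_to_tag]:
            del self.unacked[tag]

class FakeRemote:
    """In-memory stand-in for the remote broker with tx semantics (hypothetical)."""
    def __init__(self):
        self.staged = []
        self.committed = []

    def publish(self, body):
        self.staged.append(body)

    def commit(self):
        # Everything staged becomes visible at once.
        self.committed.extend(self.staged)
        self.staged = []

def relay_batch(source, remote, max_batch=500):
    """Move up to max_batch messages; ack locally only after the remote commit."""
    last_tag = None
    count = 0
    while count < max_batch:
        item = source.get()
        if item is None:
            break
        last_tag, body = item
        remote.publish(body)
        count += 1
    if count:
        remote.commit()                # step 2: one commit for the whole batch
        source.ack_multiple(last_tag)  # step 3: one ack for the whole batch
    return count
```

A crash between `commit()` and `ack_multiple()` would leave the batch
committed remotely but still unacked locally, so it would be sent again.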

There is still the duplicates problem, but this is a minor issue right
now IMHO; at least it's solvable. (It wouldn't hurt to have 2PC :P)
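
One possible way to handle duplicates, sketched under the assumption
that every message carries a unique id set by the publisher (for
instance in the AMQP `message-id` property). The class and its names are
hypothetical, and the in-memory set is unbounded, so a real version
would need expiry or persistent storage.

```python
class DedupConsumer:
    """Calls handler once per message id and drops repeats."""
    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # ids already handled; grows without bound here

    def on_message(self, message_id, body):
        if message_id in self.seen:
            return False   # duplicate from a redelivered batch, skip it
        self.handler(body)
        self.seen.add(message_id)
        return True

rows = []
consumer = DedupConsumer(rows.append)
consumer.on_message("m1", b"line 1")
consumer.on_message("m1", b"line 1")  # redelivery, ignored
print(len(rows))
```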

>> In the event of rabbitmq crashing I would like the whole thing to
>> crash so that I'm sure that there won't be lines generated without
>> being also handled. This is the embedded rabbitmq of course.
>
> IIRC you can have the OTP application do this for you, provided
> Mochiweb is packaged as an OTP application. Rabbit and Shovel are both
> OTP apps themselves.

Yes, mochiweb is also packaged as an OTP application, so I suppose this
won't be a problem at all.

>>> Maybe you also want to compress stuff if you're sending it over a
>>> WAN.
>>
>> Yes, one thing that I was thinking is to just gzip the body of the
>> message myself before sending it, but I haven't looked into rabbitmq
>> to see if it already supports this feature.
>
> Not really. Rabbit treats the payload as an opaque object.

Ok, so gzipping the body of each message is also part of the plan. It
can be done on my side and shouldn't be too slow, although it will add
some overhead.
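
Since the broker treats the payload as opaque, compression has to happen
on both ends in application code. A minimal sketch with Python's
standard `gzip` module; the function names and the sample log line are
made up for the example.

```python
import gzip

def pack(body: bytes) -> bytes:
    """Compress a message body before publishing it."""
    return gzip.compress(body)

def unpack(payload: bytes) -> bytes:
    """Restore the original body on the consumer side."""
    return gzip.decompress(payload)

# Made-up sample of roughly 1KB of repetitive log text.
line = b"GET /pixel.gif?id=12345 HTTP/1.1 200 " * 28
packed = pack(line)
print(len(line), "->", len(packed), "bytes")
assert unpack(packed) == line
```

The consumer has to know the body is gzipped; one option would be to
flag it in the AMQP `content-encoding` property, but that is a
convention between publisher and consumer, not something the broker
interprets.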

> Rabbit will nuke it. Logging the message to disk is done by the queue
> process, so if nothing gets routed, nothing gets persisted.

Ok, so I need durable queues, and I will wait for the overflow-to-disk
functionality.

>> I suppose my tests weren't too accurate then now... is a persistent
>> message much slower than a non persistent one?
>
> Yep. Don't know what the exact factor is though.

Ouch... I'll try later today or tomorrow to see what this value actually
is without shovel in the middle. I also wonder which header I should use
to make a message persistent; I'll look into this, I guess.
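
For reference, in AMQP the relevant basic property is `delivery-mode`:
1 means non-persistent, 2 means persistent. The sketch below only builds
a plain dict to show the value; the helper name is hypothetical, and the
exact keyword your client library expects (often something like
`delivery_mode`) is an assumption to check against its documentation.

```python
NON_PERSISTENT = 1  # AMQP delivery-mode value for transient messages
PERSISTENT = 2      # AMQP delivery-mode value for persistent messages

def make_properties(persistent: bool) -> dict:
    """Build the message properties to pass along with a publish."""
    return {"delivery_mode": PERSISTENT if persistent else NON_PERSISTENT}

print(make_properties(True))
```

The queue also has to be declared durable for a persistent message to
survive a broker restart.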

>> Because I obtained wonderful numbers from messages not explicitly
>> marked as being persistent, like 8000 messages per second, with the
>> bottleneck being in the saturated network, on the write side of the
>> connection and about 3-4K messages per second on the read side with
>> the bottleneck being the python client most probably. So would these
>> numbers confirm themselves pretty much or are they simply completely
>> wrong?
>
> It really depends how you set things up, but those numbers do look OK.

They look OK in the sense that even persistent messages can reach those
rates? If so, then I'm already more than happy. In the test each message
was about 1KB, which should be a bit bigger than the actual ones, but
not by much.

> I think that you need to work out what ingress requirements you have
> (this will be determined by the capacity of the http server) and what
> egress you need (so as to avoid stuff queuing up too much).
>
> Remember that ATM Rabbit does not implement QoS, so your egress will
> be bound by the slowest consumer.

What if I have multiple consumers on a queue? Does speed scale almost
linearly?

My ingress is, at the extreme, about 1000-1500 requests per second on
each mochiweb server. Let's say that in the absolute best case we have
8-9 webservers that reach 800-1000 req/sec and all talk to something
remote. (Before even reaching that point we would need a connection
between the remote data center and the main one that is wide enough for
all that information, but this issue aside.) So basically I would expect
the embedded RabbitMQ to be able to take about 1500-2000 persistent
messages per second and then, network notwithstanding, transmit those in
a few transactions (3-4 per second, or maybe some more) to the central
location, which will have multiple consumers. I can see having one
central RabbitMQ per webserver (or one for every 2 webservers, if it can
handle the load), and each central RabbitMQ would have some consumers
that basically write stuff into a database (which would be the
bottleneck here).
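
A quick check of what those figures imply, using only the numbers from
the paragraph above (the helper name is made up for illustration):

```python
def msgs_per_tx(msgs_per_sec, tx_per_sec):
    """Batch size needed to carry msgs_per_sec with tx_per_sec commits."""
    return msgs_per_sec / tx_per_sec

# Per embedded broker: 1500-2000 msgs/s moved in 3-4 transactions/s.
smallest = msgs_per_tx(1500, 4)
largest = msgs_per_tx(2000, 3)
print("batch size between", round(smallest), "and", round(largest))

# Best case across the whole site: 8-9 webservers at 800-1000 req/s each.
print("aggregate between", 8 * 800, "and", 9 * 1000, "req/s")
```

That puts each transaction at a few hundred messages, in the same range
as the 500-message batch, and the combined stream arriving at the
central location at several thousand requests per second.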

>> At least I need about 2500-3000 requests per second because, given
>> the constraint with memory bound queues, the component should be as
>> fast as the webserver otherwise the messages start to pile up.
>
> Sounds sensible. At some stage we will get around to landing QoS and
> queue overflowing.

You have been really helpful, thanks a lot!

- --
Valentino Volonghi aka Dialtone
Now running MacOS X 10.5
Home Page: http://www.twisted.it
http://www.adroll.com

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkjVTHoACgkQ9Llz28widGXaGwCgxM23RdoClHWTjijW0qW0igYD
V68AoIffM1kltK7OKr/gfnyUUigCMvqP
=y5TK
-----END PGP SIGNATURE-----