[rabbitmq-discuss] Scalability?

Fri May 7 23:07:38 BST 2010


Matthias Radestock wrote:
> Wayne,
>
> Wayne Van Den Handel wrote:
>> I tried the high water mark setting and it now prevents too much 
>> memory from being used (barely) but as my test continued it 
>> eventually had a hard crash for another reason. I was hitting it with 
>> 2 producers throttled to 1000 messages each to a separate queue and 5 
>> consumers. 
>
> Just to clarify ... the producer limits the publishing rate by 
> checking on the queue size with a passive queue.declare, and only 
> sends more messages when the size is less than 1000, right?
>
> How often is that check performed?
This check is done each time prior to adding another message to the 
queue. It ensures there are never more than 1000 messages in the 
queue. I did this so as not to run out of memory. If the producer 
finds the queue "full", it sleeps for 10 seconds before checking again 
and adding more records.
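
For reference, the throttling loop amounts to roughly the following. 
This is a minimal sketch, not my actual code: it assumes a pika-style 
channel object (queue_declare returning a frame with 
method.message_count, plus basic_publish), and the names 
publish_throttled, max_depth and wait_seconds are made up for 
illustration.

```python
import time


def publish_throttled(channel, queue, bodies, max_depth=1000,
                      wait_seconds=10, sleep=time.sleep):
    """Publish each body, pausing while the queue holds max_depth messages.

    `channel` is assumed to behave like a pika channel (an assumption,
    not necessarily the client library used in the test).
    """
    for body in bodies:
        while True:
            # passive=True only inspects the queue and never creates it;
            # the broker answers with a not_found error if it is missing.
            ok = channel.queue_declare(queue=queue, passive=True)
            if ok.method.message_count < max_depth:
                break
            # Queue is "full": wait before checking again.
            sleep(wait_seconds)
        # Default exchange routes directly to the queue named by routing_key.
        channel.basic_publish(exchange="", routing_key=queue, body=body)
```

Note that the count returned by a passive declare covers messages 
that are ready for delivery, so messages already handed to a consumer 
but not yet acknowledged are not included in the check.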
>
>> There were always 1000 messages each in the 2 queues and after 15 
>> minutes one of the producers crashed with an error (from the logs) 
>> {amqp_error,not_found,"no queue 'xxx' in vhost '/'",'queue.declare'}. 
>> I figured this was a timeout or something as the queue did exist and 
>> was being emptied by 3 different consumers actively.
>
> How many queues do you see when running "rabbitmqctl list_queues"?
There are 9 queues. Only 2 have messages and are being used in this 
test. Both of them always hold ~1000 records. The producers are 100x 
faster than the consumers. In production there would be a few 
producers but 50+ consumers.
>
>> 5 seconds later the entire process died with an error (from the logs) 
>> "exception on TCP connection ... connection_closed_abruptly" and then 
>> "closing TCP connection".
>
> Which process died? The producers/consumers? Rabbit?
One of the 2 producer Python processes died, and then 5 seconds later 
Rabbit died.
>
> Are there any other interesting messages in the rabbit.log or 
> rabbit-sasl.log?
There are no other interesting messages.
>
> If you could
> 1) clear the log files,
> 2) restart rabbit,
> 3) do a complete run of your tests until they fail,
> 4) put the logs somewhere we can see them
I am rerunning and will post the logs if/when it crashes.
>
> then we may be able to determine the cause of the problem from looking 
> at the logs.
>
> And, as Colin suggests, if you can post the code then we can try to 
> reproduce the problem.
My code is reading data from a MySQL database as the source so this 
"scenario" is not portable.
>
>> I am new to queuing but frankly I don't know what to do. My 
>> experience with this and other MQ products is less than compelling. 
>> Does this take a ton of tweaking to get stable? How do people manage 
>> to push 500k messages a day?
>
> How many messages did you actually send in your above test? Rabbit can 
> easily route tens of thousands of messages a second. So if your test 
> was running for 15 minutes and the producers/consumers weren't a 
> bottleneck then rabbit will have processed tens of millions of 
> messages in that time.
Each message is a batch of 10000 records from a database query (6 
fields per record), pickled with Python. Rabbit's performance is great 
relative to everything else going on (reading from MySQL, Python, 
Cassandra writes); stability is the issue. Could it be the size of 
the messages? Each queue with 1000 messages shows a size of 2,914,608 
in rabbitmqctl.
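
To get a feel for the per-message size, the pickled payload of one 
batch can be measured before it is published. The sketch below uses 
made-up rows (the real ones come from MySQL), and batch_payload is a 
hypothetical helper name, so the number it prints says nothing about 
my actual messages.

```python
import pickle


def batch_payload(records):
    """Serialize one batch of records into a single message body."""
    return pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)


# Made-up stand-in for one query result: 10000 rows of 6 fields each.
records = [(i, "name%d" % i, i * 0.5, "2010-05-07", i % 7, None)
           for i in range(10000)]

payload = batch_payload(records)
print(len(payload), "bytes in one message body")
```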
>
> Regards,
>
> Matthias.
>
Thanks for your help!

Wayne