[rabbitmq-discuss] Scalability?

Sat May 8 02:27:01 BST 2010

Here are the logs, exactly the same as I reported before, from right before 
it crashed (my Python code) after 15-20 min of running. The Rabbit process is 
still running, taking up 79% of memory and 100% of a single core's CPU, 
even though the high water mark was set to 10%. Trying to connect 
with Python now fails with socket error 111, connection refused. The 
Python client crashed after publishing a little less than 6000 messages 
to the queue.
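
For a sense of scale, the figures mentioned in this thread can be converted to messages per second. The sketch below is plain arithmetic on numbers taken from the messages (a little under 6000 messages in 15-20 minutes, the 500k messages a day target, and "tens of thousands of messages a second"). The 20,000 msg/s value is an assumed stand-in for that last figure, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(messages, seconds):
    """Average message rate over an interval."""
    return messages / seconds


# Target mentioned in the thread: 500k messages a day.
daily_target = per_second(500_000, SECONDS_PER_DAY)

# Observed run: a little under 6000 messages in 15-20 minutes.
observed_low = per_second(6_000, 20 * 60)
observed_high = per_second(6_000, 15 * 60)

# ASSUMPTION: 20,000 msg/s as one example of "tens of thousands a second".
ASSUMED_BROKER_RATE = 20_000
broker_15_min = ASSUMED_BROKER_RATE * 15 * 60

print(f"500k/day target : {daily_target:.1f} msg/s")
print(f"observed run    : {observed_low:.1f} to {observed_high:.1f} msg/s")
print(f"15 min at assumed broker rate: {broker_15_min:,} messages")
```

On these numbers the failing run was publishing at roughly the rate needed for 500k messages a day, and several orders of magnitude below the assumed broker rate.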

===
Rolling persister log to 
"/var/lib/rabbitmq/mnesia/rabbit/rabbit_persister.LOG.previous"

=ERROR REPORT==== 7-May-2010::22:13:07 ===
connection <0.9530.0> (running), channel 1 - error:
{amqp_error,not_found,"no queue 'Instance.4' in vhost '/'",'queue.declare'}

=WARNING REPORT==== 7-May-2010::22:13:08 ===
exception on TCP connection <0.9530.0> from 10.4.0.151:46046
connection_closed_abruptly

=INFO REPORT==== 7-May-2010::22:13:08 ===
closing TCP connection <0.9530.0> from 10.4.0.151:46046

Matthias Radestock wrote:
> Wayne,
>
> Wayne Van Den Handel wrote:
>> I tried the high water mark setting and it now prevents too much 
>> memory from being used (barely), but as my test continued it 
>> eventually had a hard crash for another reason. I was hitting it with 
>> 2 producers, each throttled to 1000 messages on its own queue, and 5 
>> consumers. 
>
> Just to clarify ... the producer limits the publishing rate by 
> checking on the queue size with a passive queue.declare, and only 
> sends more messages when the size is less than 1000, right?
>
> How often is that check performed?
>
>> There were always 1000 messages each in the 2 queues and after 15 
>> minutes one of the producers crashed with an error (from the logs) 
>> {amqp_error,not_found,"no queue 'xxx' in vhost '/'",'queue.declare'}. 
>> I figured this was a timeout or something as the queue did exist and 
>> was being emptied by 3 different consumers actively.
>
> How many queues do you see when running "rabbitmqctl list_queues"?
>
>> 5 seconds later the entire process died with an error (from the logs) 
>> "exception on TCP connection ... connection_closed_abruptly" and then 
>> "closing TCP connection".
>
> Which process died? The producers/consumers? Rabbit?
>
> Are there any other interesting messages in the rabbit.log or 
> rabbit-sasl.log?
>
> If you could
> 1) clear the log files,
> 2) restart rabbit,
> 3) do a complete run of your tests until they fail,
> 4) put the logs somewhere we can see them
>
> then we may be able to determine the cause of the problem from looking 
> at the logs.
>
> And, as Colin suggests, if you can post the code then we can try to 
> reproduce the problem.
>
>> I am new to queuing but frankly I don't know what to do. My 
>> experience with this and other MQ products is less than compelling. 
>> Does this take a ton of tweaking to get stable? How do people manage 
>> to push 500k messages a day?
>
> How many messages did you actually send in your above test? Rabbit can 
> easily route tens of thousands of messages a second. So if your test 
> was running for 15 minutes and the producers/consumers weren't a 
> bottleneck then rabbit will have processed tens of millions of 
> messages in that time.
>
> Regards,
>
> Matthias.
>
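
The throttling scheme described in the quoted message (check the queue size with a passive queue.declare, publish only while it is under 1000) could look roughly like the sketch below. It is not the code from the failing test. It assumes a pika-style blocking channel whose `queue_declare(queue=..., passive=True)` returns a frame exposing `.method.message_count`; other Python clients return the count differently. The names `queue_depth`, `throttled_publish`, `limit` and `poll_interval` are made up for illustration. To keep the number of passive declares low, it checks the depth once and then publishes up to the remaining headroom before checking again.

```python
import time


def queue_depth(channel, queue_name):
    """Return the ready-message count via a passive queue.declare.

    ASSUMPTION: pika-style channel; the returned frame exposes
    .method.message_count. A passive declare on a missing queue fails
    with not_found and the broker closes the channel.
    """
    frame = channel.queue_declare(queue=queue_name, passive=True)
    return frame.method.message_count


def throttled_publish(channel, queue_name, messages, limit=1000,
                      poll_interval=0.5, sleep=time.sleep):
    """Publish messages, pausing while the queue holds `limit` or more.

    Assumes this is the only producer for the queue, so the depth can
    only shrink between checks.
    """
    sent = 0
    budget = 0  # messages we may still publish before re-checking depth
    for body in messages:
        while budget <= 0:
            budget = limit - queue_depth(channel, queue_name)
            if budget <= 0:
                sleep(poll_interval)  # queue is full; let consumers drain it
        # Default exchange: routing key is the queue name.
        channel.basic_publish(exchange="", routing_key=queue_name, body=body)
        budget -= 1
        sent += 1
    return sent
```

With this shape the answer to "how often is that check performed?" is at most once per `limit` messages while consumers keep up, and once per `poll_interval` while the queue is full.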

-- 
Wayne Van Den Handel, DataRaker Inc

Phone:  703.996.4891
Mobile: 305.849.1794
Skype:  wayne.van.den.handel
Email:  wvandenhandel at dataraker.com