[rabbitmq-discuss] Scalability?
Wayne Van Den Handel
wvandenhandel at dataraker.com
Thu May 6 18:49:50 BST 2010
I am evaluating RabbitMQ for parallelization on top of a Cassandra data
store. I created a simple test scenario: a set of queues is fed by a
single Python publisher, and 3-4 Python consumer applications take the
data from the queues and load it into Cassandra. The whole scenario was
easy to set up and runs great for about 10 minutes, at which point
RabbitMQ uses up all available memory and crashes.

I then discovered passive queue declaration (which also reports how many
messages a queue holds) and now only add more work to a queue when it
has fewer than 1000 messages, which easily fit into memory. I started my
test again and still blew RabbitMQ up in 10 minutes. I watched the admin
console the entire time, and there were never more than 1000 messages
across all queues at any given moment. Watching top, I see RabbitMQ take
up more and more memory over time. It seems it can only process 30-40k
messages in total/aggregate before it crashes, even though there are
never more than 1000 messages queued at one time.
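In case it helps, here is roughly what my throttling check looks like. The 1000-message cap is an arbitrary choice on my part, and the live-broker calls are shown commented so the decision logic itself is plain (a sketch, not my exact production code):

```python
# Sketch of the throttled publish check (py-amqplib).
# The broker interaction is commented out; the threshold test is a
# plain function.

MAX_BACKLOG = 1000  # arbitrary cap: only publish while the backlog is below this


def should_publish(message_count, max_backlog=MAX_BACKLOG):
    """Return True when the queue backlog is below the cap."""
    return message_count < max_backlog


# Against a live broker it would be used like this (py-amqplib's
# queue_declare returns (queue, message_count, consumer_count)):
#
#   name, message_count, consumers = chan.queue_declare(
#       queue="dr_load.1", passive=True)   # passive=True: inspect, don't create
#   if should_publish(message_count):
#       chan.basic_publish(msg, exchange="dr_load",
#                          routing_key="Instance.1")
```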
Am I missing something here? The product seems very easy to use and
works great, but it appears completely unscalable. Is RabbitMQ not meant
for high data volumes/traffic? What would better serve this purpose? We
need something on top of Cassandra to provide high-volume
parallelization. I understand that queues can currently only hold what
fits in memory (when will that be fixed?), but even that limit doesn't
explain this, since memory is never given back.
Environment:
CentOS 5.4 64 Bit
RabbitMQ v1.7.2-1.el5 installed from yum
py-amqplib
Create Queue
chan.queue_declare(queue="dr_load.1", durable=True, exclusive=False,
auto_delete=False)
chan.exchange_declare(exchange="dr_load", type="direct", durable=True,
auto_delete=False)
chan.queue_bind(queue="dr_load.1", exchange="dr_load",
routing_key="Instance.1")
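For the 3-4 parallel consumers, I repeat the declare/bind pattern above once per instance. A small helper keeps the queue name and routing key in sync (a sketch; the naming scheme just mirrors my snippet, and the broker calls are commented):

```python
# Sketch: one queue and routing key per consumer instance, bound to the
# dr_load direct exchange declared above.


def binding_for(instance):
    """Queue name and routing key for a given instance number."""
    return ("dr_load.%d" % instance, "Instance.%d" % instance)


# Against a live broker:
#
#   for i in range(1, 5):  # 4 consumer instances
#       queue, key = binding_for(i)
#       chan.queue_declare(queue=queue, durable=True, exclusive=False,
#                          auto_delete=False)
#       chan.queue_bind(queue=queue, exchange="dr_load", routing_key=key)
```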
Publish Data
chan.basic_publish(msg, exchange="dr_load", routing_key="Instance.1",
mandatory=True)
Consume Data
msg = chan.basic_get(queue="dr_load.1")  # queue name, not the routing key
if msg is not None:                      # basic_get returns None when empty
    # ... load into Cassandra ...
    chan.basic_ack(msg.delivery_tag)
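I have also tried sketching a push-style consumer with basic_consume and a prefetch cap instead of polling basic_get, in case the unacked backlog matters here. The load_into_cassandra function below is a placeholder for my real insert code, and the broker calls are commented (a sketch under those assumptions):

```python
# Sketch of a push-style consumer (py-amqplib) with a prefetch cap.

processed = []


def load_into_cassandra(body):
    """Placeholder for the real Cassandra write."""
    processed.append(body)


def on_message(msg):
    load_into_cassandra(msg.body)
    msg.channel.basic_ack(msg.delivery_tag)  # ack only after the write succeeds


# Against a live broker:
#
#   chan.basic_qos(prefetch_size=0, prefetch_count=50, a_global=False)
#   chan.basic_consume(queue="dr_load.1", callback=on_message)
#   while chan.callbacks:
#       chan.wait()
```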
Thanks!
--
Wayne Van Den Handel, DataRaker Inc
Phone: 703.996.4891
Mobile: 305.849.1794
Skype: wayne.van.den.handel
Email: wvandenhandel at dataraker.com