[rabbitmq-discuss] Problems with rabbit and hoping I can get some help

Mon Feb 13 15:47:08 GMT 2012

Hi, Dathan:

If you're not seeing the attendant memory alarms in your Rabbit logs,
then the state you're describing isn't the TCP back-pressure we
speculated about earlier.

When you get into one of the states where you say Rabbit is "blocked," 
what happens if you:

1. Try to create a new connection (perhaps using a client other
than the Python one you're using)

2. Just try telnetting to port 5672 on the broker, and banging out
some junk.  Does the broker answer the incoming telnet connection?
Does it puke an AMQP error promptly to your console?

3. Does the broker respond when you use rabbitmqctl to ask it 
questions about its state?

4. Are you running the management plugin?  If so, is the broker
healthy enough to bring up the management web UI and let you poke
around?

Depending on the answers to these, I might wonder if we've got a
client-side problem, either in the library you're using, or perhaps
a quirk in how it's being used...
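
Checks 1 and 2 above can be scripted if telnet isn't handy. What follows
is only a minimal sketch: the names probe_broker and classify_reply are
made up for illustration, and it assumes the broker speaks AMQP 0-9-1, in
which case a healthy listener would normally answer a bad protocol header
by sending back its own header and closing the socket.

```python
import socket

# Header an AMQP 0-9-1 broker sends back when it rejects what it received.
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def classify_reply(reply):
    """Describe what came back after sending junk to the broker port."""
    if reply is None:
        return "accepted the connection but sent nothing before the timeout"
    if reply.startswith(b"AMQP"):
        return "rejected the junk promptly with an AMQP protocol header"
    if reply == b"":
        return "closed the connection without sending anything"
    return "sent something unexpected: %r" % (reply,)


def probe_broker(host, port=5672, timeout=5.0):
    """Connect, send 8 bytes of junk, and report how the listener reacts."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "could not connect: %s" % exc
    with sock:
        try:
            sock.sendall(b"junkjunk")  # same size as a real protocol header
            reply = sock.recv(64)
        except socket.timeout:
            reply = None  # connected, but the broker stayed silent
        except OSError as exc:
            return "connection failed after connect: %s" % exc
    return classify_reply(reply)
```

Running probe_broker("webnode1") while the broker is "blocked" and again
while it is healthy should show whether the listener itself is stuck. A
prompt AMQP header in both cases would point more toward the client side.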

Best regards,
Jerry

----- Original Message -----
From: "Dathan Pattishall" <dathan at schoolfeed.com>
To: "Jerry Kuch" <jerryk at vmware.com>
Cc: rabbitmq-discuss at lists.rabbitmq.com
Sent: Tuesday, January 31, 2012 1:19:29 PM
Subject: Re: [rabbitmq-discuss] Problems with rabbit and hoping I can get some help

CPU is rather Idle, 
6% CPU system on average 
8% CPU User on average 

No CPU wait IO 

so plenty of CPU / plenty of Memory 

On Tue, Jan 31, 2012 at 1:13 PM, Jerry Kuch < jerryk at vmware.com > wrote: 

Hi, Dathan... 

For dropping messages, you might consider setting message TTLs, but that 
may not give you quite what you want in all cases. 
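
As a rough illustration of the TTL idea: a per-queue message TTL is
requested with the "x-message-ttl" queue argument, given in milliseconds.
The helper name ttl_queue_arguments below is hypothetical, and the sketch
only builds the arguments table; how you pass it depends on your client
library.

```python
def ttl_queue_arguments(ttl_ms):
    """Build queue-declare arguments that expire messages after ttl_ms.

    ttl_ms is the time-to-live in milliseconds and must be a positive int.
    """
    if isinstance(ttl_ms, bool) or not isinstance(ttl_ms, int) or ttl_ms <= 0:
        raise ValueError("ttl_ms must be a positive integer (milliseconds)")
    return {"x-message-ttl": ttl_ms}
```

With a Python client such as pika (an assumption, any client that accepts
a queue arguments table would do), this would be passed along the lines
of channel.queue_declare(queue="events",
arguments=ttl_queue_arguments(30000)). Every declarer of the same queue
has to use the same arguments, or the broker will refuse the declaration.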

What does the CPU consumption of your Rabbit node look like when you're 
seeing these pauses? If you wait, do they relent, with things getting 
moving again? 

Best regards, 
Jerry 

----- Original Message ----- 
From: "Dathan Pattishall" < dathan at schoolfeed.com > 

To: "Jerry Kuch" < jerryk at vmware.com > 
Cc: rabbitmq-discuss at lists.rabbitmq.com 
Sent: Tuesday, January 31, 2012 1:05:39 PM 
Subject: Re: [rabbitmq-discuss] Problems with rabbit and hoping I can get some help 

Hi Jerry, 

I neglected to mention that I am not hitting the memory-based flow control, according to the memory alarm stat from 
rabbitmqadmin.py list nodes 

What threshold would the TCP back pressure logic hit? I assume when the memory limit is reached rabbit pushes back on the publishers? If that is the case, Rabbit did not use more than 2G of RAM out of the 5G allowed for it. 

Also, is there a way to tell rabbit to drop the messages instead of blocking? 

On Tue, Jan 31, 2012 at 12:56 PM, Jerry Kuch < jerryk at vmware.com > wrote: 

Hi, Dathan... 

What are your consumers doing with the published messages? If you use rabbitmqctl or 
the management plugin to look at what's going on in your queues, do you see messages 
accumulating but not being delivered? Or delivered but not ACKed? If messages are 
building up (either undelivered or unACKed) faster than consumers are draining them, 
you might be hitting memory-based flow control, which will use TCP back pressure to 
stop the publishers. 

See here for more information: 

http://www.rabbitmq.com/memory.html 

To get an idea whether this is happening to you, check out your queue contents as 
suggested above, and see if memory alarms are being set in your rabbit logs... 
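
One way to script the queue check, as a sketch only: it assumes you feed
it the tab-separated output of
"rabbitmqctl list_queues name messages_ready messages_unacknowledged".
The function name find_backlogged_queues and the threshold of 1000 are
arbitrary choices for illustration.

```python
def find_backlogged_queues(listing, threshold=1000):
    """Return (name, ready, unacked) for queues at or above threshold.

    `listing` is text with one queue per line, tab-separated as:
    name, messages_ready, messages_unacknowledged.
    """
    backlog = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip banner lines such as "Listing queues ..."
        name, ready, unacked = fields
        if not (ready.isdigit() and unacked.isdigit()):
            continue  # skip anything that is not a pair of counts
        ready, unacked = int(ready), int(unacked)
        if ready + unacked >= threshold:
            backlog.append((name, ready, unacked))
    return backlog
```

Queues with a large messages_ready count are accumulating undelivered
messages, while a large messages_unacknowledged count means messages were
delivered but not ACKed.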

Best regards, 
Jerry 

----- Original Message ----- 
From: "Dathan Pattishall" < dathan at schoolfeed.com > 
To: rabbitmq-discuss at lists.rabbitmq.com 
Sent: Tuesday, January 31, 2012 12:52:55 PM 
Subject: [rabbitmq-discuss] Problems with rabbit and hoping I can get some help 

Let me first describe my setup. 

[root at webnode1]# rabbitmqadmin.py show overview 
+--------------------+-----------------+--------------------+------------------+ 
| management_version | node | statistics_db_node | statistics_level | 
+--------------------+-----------------+--------------------+------------------+ 
| 2.7.1 | rabbit at webnode1 | rabbit at webnode1 | fine | 
+--------------------+-----------------+--------------------+------------------+ 

RabbitMQ's producers are PHP 5.3.8 processes using the AMQP extension ( http://www.php.net/manual/en/book.amqp.php ). Each apache process can produce a rabbit message; I am producing around 1000 messages a second on a c1.xlarge instance at EC2. 

My erlang version is 

/usr/local/bin/erl -v 
Erlang R15B (erts-5.9) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false] 

The PROBLEM: 

After about 40 mins of rabbit accepting messages, all connections block, causing a rather bad error on the front ends and killing traffic. Turning rabbit off and restarting the web servers forces a recovery. 

Stats from Rabbit: 

Roughly 5000 queues are made 
Roughly 3600 exchanges are made 
Each exchange can have at most 1200 queues bound to it. 
Each queue is set up for auto-delete, and so are the exchanges, with delivery type 1. 
All data passed is JSON 

The consumer is Node, and it's keeping up with the consumption 

RabbitMQ memlimit is around 5.3G 
RabbitMQ mem used hits around 1.9G when it freezes producers 
RabbitMQ proc used hits around 220K 
RabbitMQ fd_total is 50K 
RabbitMQ socks_total is around 45K and Socks used is 4K 
mem_ets hits 100M // not sure what this is 

Any idea what is going on? What limit am I hitting? Why does RabbitMQ block? How can I detect that I am about to hit a block state? Any suggestions or requests for additional data would be great. 
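
On detecting an approaching limit: one rough approach is to compare each
"used" figure against its limit and warn past some fraction. The sketch
below assumes a node record shaped like what the management plugin
reports, with field names such as mem_used / mem_limit and sockets_used /
sockets_total. Those names, the function name resource_headroom, and the
0.8 warning level are all assumptions for illustration.

```python
# (used, total) field pairs to compare; the names are assumed, see above.
RESOURCE_PAIRS = [
    ("mem_used", "mem_limit"),
    ("fd_used", "fd_total"),
    ("sockets_used", "sockets_total"),
    ("proc_used", "proc_total"),
]


def resource_headroom(node, warn_at=0.8):
    """Map each 'used' field to (fraction_of_limit, is_warning).

    Pairs with a missing or zero value are skipped rather than guessed.
    """
    report = {}
    for used_key, total_key in RESOURCE_PAIRS:
        used = node.get(used_key)
        total = node.get(total_key)
        if used is None or not total:
            continue
        ratio = used / float(total)
        report[used_key] = (ratio, ratio >= warn_at)
    return report
```

Fed the approximate figures quoted above (1.9G of 5.3G memory, 4K of 45K
sockets), this puts memory at roughly 36% of its limit and sockets at
roughly 9%, so neither is close to a threshold. The process and file
descriptor figures above lack a matching limit or usage value, so the
sketch skips them.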

_______________________________________________ 
rabbitmq-discuss mailing list 
rabbitmq-discuss at lists.rabbitmq.com 
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss