[rabbitmq-discuss] Often failed to produce message due to TIMEOUT WAITING FOR ACK

Matthias Radestock matthias at rabbitmq.com
Wed Sep 26 08:17:28 BST 2012


Wong,

On 26/09/12 07:18, Wong Kam Hoong wrote:
> *_Master Node Usage_*
> Memory : 300M
> CPU %  : Average 30%
>
> _*Slave Node Usage*_
> Memory : 130M
> CPU %  : Average 30%
>
> there are only few messages during that time, e.g.
> 2012-09-26 03:28:32 => 4 messages

If the CPU is averaging 30% when the load is so low then the machines 
must be doing something else too. I notice from...

> =INFO REPORT==== 26-Sep-2012::03:29:37 ===
> accepting AMQP connection <0.8897.18> (192.168.0.100:43836 -> 192.168.0.100:5672)

...that your clients are on the same machine as one of the nodes. Could 
it be that the machine was busy with other tasks, such as your application?

Also, 3:29 in the morning is just the sort of time that I'd expect 
systems to run various housekeeping tasks which could hammer the CPU or 
disk. Those can result in significant delays, particularly when your 
messages are persistent.

I recommend increasing the timeout substantially, e.g. to 1 minute. That 
way if the problem really is just due to load it will go away.
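
To illustrate what a longer wait looks like on the publishing side, here is a
minimal sketch in Python. It is not the API of any particular client library:
ConfirmWaiter, on_ack and demo are hypothetical names, and the ack is simulated
with a timer. The only point is that the wait takes a configurable timeout,
which you would set to something like 60 seconds.

```python
import threading

# Assumption: one minute, following the suggestion above.
CONFIRM_TIMEOUT_SECONDS = 60.0


class ConfirmWaiter:
    """Hypothetical helper that blocks a publisher until an ack arrives."""

    def __init__(self):
        self._acked = threading.Event()

    def on_ack(self):
        # Call this from whatever confirm/ack callback your client provides.
        self._acked.set()

    def wait(self, timeout=CONFIRM_TIMEOUT_SECONDS):
        # Returns True if the ack arrived in time, False on timeout.
        return self._acked.wait(timeout)


def demo(ack_delay, timeout):
    """Simulate a broker that acks after `ack_delay` seconds."""
    waiter = ConfirmWaiter()
    timer = threading.Timer(ack_delay, waiter.on_ack)
    timer.start()
    try:
        return waiter.wait(timeout)
    finally:
        timer.cancel()  # harmless if the timer has already fired
```

With a short timeout a slow ack is reported as a failure, while the same ack
succeeds once the timeout is longer than the delay. If the timeouts persist
even at one minute, load alone is unlikely to be the explanation.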

>> Are you using mirrored/HA queues at all?
> Yes, I'm using HA in all queues.

We are aware of two bugs in 2.8.x that can cause confirms to go missing. 
The first involves setting x-message-ttl=0 on queues and using the 
'immediate' flag on publish. So if you are doing that then, well, don't 
:). The second involves (re)starting of nodes. So check whether any 
of the nodes in your cluster (re)started around the time of the problem.
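
One way to do that check is to pull out the log reports that fall near the
time of the failure and read through them. The sketch below assumes the report
header format shown in the log excerpt you quoted; reports_near and the
10-minute window are hypothetical choices, and it does not try to recognise
restart messages itself, it only narrows down what you need to read.

```python
import re
from datetime import datetime, timedelta

# Matches headers such as "=INFO REPORT==== 26-Sep-2012::03:29:37 ===".
REPORT_HEADER = re.compile(
    r"^=(?P<level>[A-Z]+) REPORT==== "
    r"(?P<stamp>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$"
)


def reports_near(log_text, when, window_minutes=10):
    """Return (timestamp, level, first body line) for reports near `when`."""
    window = timedelta(minutes=window_minutes)
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = REPORT_HEADER.match(line.strip())
        if not match:
            continue
        # %b expects English month abbreviations (C/English locale assumed).
        stamp = datetime.strptime(match.group("stamp"), "%d-%b-%Y::%H:%M:%S")
        if abs(stamp - when) <= window:
            body = lines[i + 1].strip() if i + 1 < len(lines) else ""
            hits.append((stamp, match.group("level"), body))
    return hits
```

Run it against the log of every node in the cluster, passing the time of a
failed publish, e.g. datetime(2012, 9, 26, 3, 28, 32), and look for anything
in the results that indicates a node stopping or starting.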

Regards,

Matthias.

