Hi Matthias,<div><br></div><div>Thanks for your advice. After I set the timeout to 1 minute the problem went away, but 1 minute seems too long.</div><div><br></div><div>Regards,</div><div>Wong</div><div><br><br><div class="gmail_quote">
On Wed, Sep 26, 2012 at 3:17 PM, Matthias Radestock <span dir="ltr">&lt;<a href="mailto:matthias@rabbitmq.com" target="_blank">matthias@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Wong,<br>
<br>
On 26/09/12 07:18, Wong Kam Hoong wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<b><u>Master Node Usage</u></b><div class="im"><br>
Memory: 300M<br>
CPU: 30% on average<br>
<br></div>
<b><u>Slave Node Usage</u></b><div class="im"><br>
Memory: 130M<br>
CPU: 30% on average<br>
<br></div><div class="im">
There were only a few messages during that time, e.g.<br>
2012-09-26 03:28:32 => 4 messages<br>
</div></blockquote>
<br>
If the CPU is averaging 30% when the load is this low, then the machines must be doing something else too. I notice from...<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
=INFO REPORT==== 26-Sep-2012::03:29:37 ===<br>
accepting AMQP connection &lt;0.8897.18&gt; (192.168.0.100:43836 -&gt; 192.168.0.100:5672)<br></div>
</blockquote>
<br>
...that your clients are on the same machine as one of the nodes. Could it be that the machine was busy with other tasks, such as your application?<br>
<br>
Also, 3:29 in the morning is just the sort of time when I'd expect systems to run various housekeeping tasks, which can hammer the CPU or disk. Those can cause significant delays, particularly when your messages are persistent.<br>
<br>
I recommend increasing the timeout substantially, e.g. to 1 minute. That way, if the problem really is just due to load, it will go away.<div class="im"><br>
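One way to live with a generous confirm timeout without blocking the publisher for a full minute is to track outstanding confirms yourself and only flag the ones that are overdue. The sketch below is a hypothetical helper (<code>ConfirmTracker</code> is not part of any RabbitMQ client library). It assumes the AMQP 0-9-1 publisher-confirm semantics: after confirm.select, delivery tags on a channel count up from 1, and an ack or nack with multiple=true settles every tag up to and including the one given. Hooking it to a real client's ack/nack callbacks is left as an assumption.

```python
import time
from typing import Callable, Dict, List


class ConfirmTracker:
    """Hypothetical helper: tracks unconfirmed publishes on ONE channel and
    reports those that have waited longer than timeout_s seconds."""

    def __init__(self, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.next_seq = 1  # assumed: delivery tags start at 1 after confirm.select
        self.pending: Dict[int, float] = {}  # sequence number -> publish time

    def on_publish(self) -> int:
        """Record a publish and return the sequence number it should get."""
        seq = self.next_seq
        self.pending[seq] = self.clock()
        self.next_seq += 1
        return seq

    def _settle(self, tag: int, multiple: bool) -> List[int]:
        # With multiple=True, every pending tag up to and including `tag` is settled.
        if multiple:
            done = [s for s in self.pending if s <= tag]
        else:
            done = [tag] if tag in self.pending else []
        for s in done:
            del self.pending[s]
        return sorted(done)

    def on_ack(self, tag: int, multiple: bool = False) -> List[int]:
        return self._settle(tag, multiple)

    def on_nack(self, tag: int, multiple: bool = False) -> List[int]:
        # The caller decides whether nacked messages are republished or reported.
        return self._settle(tag, multiple)

    def overdue(self) -> List[int]:
        """Sequence numbers still unconfirmed after timeout_s seconds."""
        now = self.clock()
        return sorted(s for s, t in self.pending.items()
                      if now - t > self.timeout_s)
```

With this approach the timeout can stay at 60 seconds, as suggested above, because the publisher keeps sending; only messages still unconfirmed after the deadline are reported (and could then be republished or logged).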
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Are you using mirrored/HA queues at all?<br>
Yes, I'm using HA for all queues.<br>
</blockquote>
<br></div>
We are aware of two bugs in 2.8.x that can cause confirms to go missing. One involves setting x-message-ttl=0 on queues while publishing with the 'immediate' flag. So if you are doing that then, well, don't :). The second involves (re)starting nodes, so check whether any of the nodes in your cluster (re)started around the time of the problem.<br>
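To check whether a node (re)started around the time confirms went missing, one could scan each node's log for reports near that timestamp. The sketch below is a hypothetical helper, <code>reports_near</code>, that relies only on the report header format visible in the log excerpt above (=INFO REPORT==== 26-Sep-2012::03:29:37 ===). The exact text a node writes on startup varies between versions, so the search pattern is an assumption you would adapt to your own logs.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# Header format taken from the excerpt: =INFO REPORT==== 26-Sep-2012::03:29:37 ===
HEADER = re.compile(r"^=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$")


def reports_near(log_text: str, around: datetime, window: timedelta,
                 pattern: str) -> List[Tuple[datetime, str]]:
    """Hypothetical helper: return (timestamp, body) for every report within
    `window` of `around` whose body matches `pattern` (case-insensitive)."""
    needle = re.compile(pattern, re.IGNORECASE)
    results: List[Tuple[datetime, str]] = []
    stamp = None
    body: List[str] = []

    def flush() -> None:
        # Emit the report collected so far if it is in the window and matches.
        if stamp is not None and abs(stamp - around) <= window:
            text = "\n".join(body).strip()
            if needle.search(text):
                results.append((stamp, text))

    for line in log_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            flush()
            stamp = datetime.strptime(m.group(2), "%d-%b-%Y::%H:%M:%S")
            body = []
        elif stamp is not None:
            body.append(line)
    flush()
    return results
```

For example, reports_near(log_text, datetime(2012, 9, 26, 3, 29), timedelta(minutes=30), r"start") would list reports mentioning "start" within half an hour of the problem; the log location and the pattern are both assumptions.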
<br>
Regards,<br>
<br>
Matthias.<br>
</blockquote></div><br></div>