Hi Matthias,<div><br></div><div>Thanks for your advice. After I set the timeout to 1 minute the problem went away, but 1 minute seems too long.</div><div><br></div><div>Regards,</div><div>Wong</div><div><br><br><div class="gmail_quote">

On Wed, Sep 26, 2012 at 3:17 PM, Matthias Radestock <span dir="ltr">&lt;<a href="mailto:matthias@rabbitmq.com" target="_blank">matthias@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Wong,<br>

<br>

On 26/09/12 07:18, Wong Kam Hoong wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

*_Master Node Usage_*<div class="im"><br>

Memory : 300M<br>

CPU %  : Average 30%<br>

<br></div>

*_Slave Node Usage_*<div class="im"><br>

Memory : 130M<br>

CPU %  : Average 30%<br>

<br></div><div class="im">

There are only a few messages during that time, e.g.<br>

2012-09-26 03:28:32 =&gt; 4 messages<br>

</div></blockquote>

<br>

If the CPU is averaging 30% when the load is so low then the machines must be doing something else too. I notice from...<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

=INFO REPORT==== 26-Sep-2012::03:29:37 ===<br>

accepting AMQP connection &lt;0.8897.18&gt; (192.168.0.100:43836 -&gt; 192.168.0.100:5672)<br></div>

</blockquote>

<br>

...that your clients are on the same machine as one of the nodes. Could it be that the machine was busy with other tasks, such as your application?<br>

<br>

Also, 3:29 in the morning is just the sort of time that I&#39;d expect systems to run various housekeeping tasks which could hammer the CPU or disk. Those can result in significant delays, particularly when your messages are persistent.<br>


<br>

I recommend increasing the timeout substantially, e.g. to 1 minute. That way if the problem really is just due to load it will go away.<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Are you using mirrored/HA queues at all?<br>

Yes, I&#39;m using HA in all queues.<br>

</blockquote>

<br></div>

We are aware of two bugs in 2.8.x that can cause confirms to go missing. The first involves setting x-message-ttl=0 on queues and using the &#39;immediate&#39; flag on publish. So if you are doing that then, well, don&#39;t :). The second involves (re)starting of nodes, so check whether any of the nodes in your cluster (re)started around the time of the problem.<br>


<br>

Regards,<br>

<br>

Matthias.<br>

</blockquote></div><br></div>
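<div>Since 1 minute works but feels too long, one way to pick a tighter value is to record how long confirms actually take. Below is a minimal sketch of that idea, not tied to any particular client library: <code>ConfirmWaiter</code>, <code>mark_published</code>, <code>mark_confirmed</code> and <code>demo</code> are hypothetical names, and the broker is simulated with a timer. In a real application the two <code>mark_*</code> calls would be driven by the client library's publish call and its confirm callback.</div>

```python
import threading
import time


class ConfirmWaiter:
    """Tracks one outstanding publish and waits for its confirm (hypothetical helper)."""

    def __init__(self):
        self._confirmed = threading.Event()
        self._sent_at = None
        self.latency = None  # seconds between publish and confirm, once known

    def mark_published(self):
        # Call right after handing the message to the client library.
        self._confirmed.clear()
        self.latency = None
        self._sent_at = time.monotonic()

    def mark_confirmed(self):
        # Call from the client library's confirm (ack) callback.
        self.latency = time.monotonic() - self._sent_at
        self._confirmed.set()

    def wait(self, timeout_seconds=60.0):
        # Returns True if the confirm arrived within the timeout, else False.
        return self._confirmed.wait(timeout_seconds)


def demo(confirm_delay, timeout_seconds):
    """Simulate a broker that confirms after confirm_delay seconds."""
    waiter = ConfirmWaiter()
    waiter.mark_published()
    timer = threading.Timer(confirm_delay, waiter.mark_confirmed)
    timer.start()
    ok = waiter.wait(timeout_seconds)
    timer.join()  # let the late confirm arrive so its latency is recorded
    return ok, waiter.latency
```

<div>Logging the recorded latency for a few days, including the 3:29 housekeeping window, would show whether the worst case is a few seconds or closer to the full minute, and the timeout could then be set somewhat above that observed worst case.</div>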