Just to follow up on this: the issue was caused by <a href="http://lwn.net/Articles/358910/" target="_blank">GRO</a> (generic receive offload) being enabled on the NICs of our load balancers, which are running an older version of LVS (prior to <a href="http://archive.linuxvirtualserver.org/html/lvs-users/2011-11/msg00024.html">2.6.39</a>).<div>
<br></div><div>In essence, this caused the data streams of the high-throughput publishers to be fragmented, so they fell further and further behind, which produced the send queue overflow we observed on the clients. We'd previously seen something similar with large HTTP streaming uploads.</div>
<div><div><br></div><div>GRO can be turned off using:<font face="courier new, monospace" color="#000000"> <span style="background-color:transparent;font-size:12px;line-height:19px">ethtool -K &lt;NIC&gt; gro off</span></font></div>
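<div><br></div><div>To check whether GRO is currently enabled before or after changing it, something along these lines may help. It is only a rough sketch: it assumes that <font face="courier new, monospace">ethtool -k</font> (lowercase, which lists the offload settings) prints a <font face="courier new, monospace">generic-receive-offload: on|off</font> line, and the helper names <font face="courier new, monospace">parse_gro_state</font> and <font face="courier new, monospace">gro_enabled</font> are made up for illustration.</div>

```python
import subprocess


def parse_gro_state(ethtool_output):
    """Return True/False for the GRO setting, or None if no GRO line is present.

    Assumes the `ethtool -k` output contains a line of the form
    "generic-receive-offload: on" (possibly with a suffix such as "[fixed]").
    """
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "generic-receive-offload":
            # Keep only the first word of the value, ignoring any suffix.
            words = value.split()
            return bool(words) and words[0] == "on"
    return None


def gro_enabled(nic):
    """Run `ethtool -k <nic>` and report the GRO setting for that interface.

    Requires ethtool to be installed; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["ethtool", "-k", nic],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gro_state(result.stdout)
```

<div>For example, <font face="courier new, monospace">gro_enabled("eth0")</font> would query an interface named eth0, which is just a placeholder for whatever the NIC is called on the load balancer.</div>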
<div><br></div><div>Cheers,</div><div>Brendan</div><div><br></div><div>(It's not the flow control, I know, I know ..)</div><div><br></div><div><br></div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Oct 26, 2012 at 2:41 PM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 26/10/12 11:23, Brendan Hay wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Yes, so what should the expected observation be - the client code<br>
carries on publishing into a black hole,<br>
which means the send queue on the client/peer socket should keep<br>
growing,<br>
</blockquote>
<br></div>
Well, the send queue is limited in size. So the publisher should block fairly quickly.<div class="im"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
and when the rabbit reader is<br>
issued new credits, it will lap it all up?<br>
</blockquote>
<br></div>
Yes.<div class="im"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
It just seems the 'flow' status in the UI stays on permanently, it<br>
doesn't seem to be toggling at high speed, just locked.<br>
</blockquote>
<br></div>
Well, the flow status *in the UI* is designed not to toggle at high speed, since that would not be very readable - it shows "flow" if the connection has been blocked in the last 5 seconds. This is driven by "last_blocked_age" and "last_blocked_by".<br>
<br>
The rabbitmqctl command results you posted show some connections which had blocked some time in the past, all more than 5 seconds ago. Unfortunately I forgot to ask you to add "state" to the list of columns, to determine if they were blocked now. If we see any connections that were blocked by flow control a long time ago, and are still blocked, then I'm concerned.<div class="HOEnZb">
<div class="h5"><br>
<br>
Cheers, Simon<br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, VMware<br>
</div></div></blockquote></div><br></div>