<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 30, 2012 at 2:38 AM, Emile Joubert <span dir="ltr">&lt;<a href="mailto:emile@rabbitmq.com" target="_blank">emile@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Chris,<br>

<div class="im"><br>

On 30/04/12 04:51, Chris Schmidt wrote:<br>

&gt; I&#39;m running RabbitMQ 2.7.1/R14B04 on RHEL 6.2 and am encountering scaling<br>

&gt; issues with (I believe) RMQ. I&#39;m interested in using the hipe_compile or<br>

&gt; other configurations to tune RMQ&#39;s throughput. Right now we get around<br>

&gt; 20k messages per second, but anything beyond that the queues begin to<br>

&gt; grow (there are a number of producer/consumer workers and around 15<br>

&gt; queues/exchanges they read/write to). Each worker is acknowledging<br>

&gt; messages as they are processed and sent on to the next process. The<br>

&gt; messages have variable size.<br>

<br>

</div>If the queues start to grow then you should focus on speeding up the<br>

consumers. If the consumers can&#39;t keep up then increasing the speed of<br>

the broker won&#39;t keep the queues short.<br>

<br>

Versions of the broker later than 2.7.1 feature internal flow control<br>

which helps to limit latency. Your problem might benefit from this.<br>

<div class="im"><br></div></blockquote><div><br></div><div>The workers are chained together A --&gt; B --&gt; C. Each worker type processes data and publishes it to an exchange; a downstream worker then picks the messages up from a queue. What I see is that the workers consume messages and then spend a large amount of time publishing to the next exchange. This causes the number of unacknowledged messages to grow until RMQ hits the memory limit and everything grinds almost to a halt. I am limiting the number of messages held within each worker to a max of 50k (using Java, basic queueing consumer with a blocking queue). Originally the internal queue of messages grew unbounded and the workers would die with an out of memory error.</div>
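<div>A minimal sketch of the per-worker cap described above, assuming a counting semaphore is an acceptable stand-in for the bounded blocking queue. The class InFlightLimiter and its method names are hypothetical and are not part of the RabbitMQ Java client. The idea is that a worker takes a permit before accepting a delivery and returns it only after the message has been published downstream and acknowledged, so slow publishing pushes back on consumption instead of growing the heap.</div>

```java
import java.util.concurrent.Semaphore;

/** Caps how many messages a worker may hold before publishing and acknowledging them. */
class InFlightLimiter {
    private final Semaphore permits;
    private final int capacity;

    InFlightLimiter(int capacity) {
        this.capacity = capacity;
        this.permits = new Semaphore(capacity);
    }

    /** Call before accepting a delivery; blocks while the worker is at capacity. */
    void beforeAccept() throws InterruptedException {
        permits.acquire();
    }

    /** Non-blocking variant: returns false if the worker is already at capacity. */
    boolean tryAccept() {
        return permits.tryAcquire();
    }

    /** Call once the message has been published downstream and acknowledged upstream. */
    void afterAck() {
        permits.release();
    }

    /** Number of messages the worker currently holds. */
    int inFlight() {
        return capacity - permits.availablePermits();
    }
}
```

<div>With a capacity of 50000 this would mirror the 50k limit mentioned above. A smaller cap should leave fewer unacknowledged messages on the broker, though the right value is something to measure rather than assume.</div>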

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

&gt;  I&#39;ve changed the frame_max setting but that doesn&#39;t appear to<br>

&gt; help. I&#39;ve tried increasing the number of workers across additional<br>

&gt; hardware as well, but Rabbit still seems to cap at 20k messages/sec. When<br>

<br>

</div>The maximum framesize allows you to choose between better latency (small<br>

framesize) or throughput (large framesize). If many messages are larger<br>

than 128 KB then increasing framesize may improve throughput (provided<br>

consumers can keep up).<br>

<br>

When you increased the number of workers, were you able to determine<br>

whether the load was effectively spread amongst all of them? The QoS<br>

prefetch count setting will help to ensure fair distribution. Try a<br>

small number (e.g. 10) as a starting point for tuning this value.<br>

<div class="im"><br></div></blockquote><div><br></div><div>I can see an even distribution of messages across the workers through the RMQ management console. As the number of workers increases, the number of messages consumed per worker drops equally. I&#39;ll try the QoS setting for consumption to see if that helps, but it does appear to be a fair distribution on the consumption side.</div>
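<div>To make the effect of the prefetch count concrete, here is a toy model of round-robin dispatch with a per-consumer limit on unacknowledged messages. It is a simplified sketch and not broker code: the class PrefetchModel and its dispatch method are hypothetical, and it assumes no acknowledgements arrive while messages are being handed out. A prefetch of 0 is treated as &quot;no limit&quot;, as in basic.qos.</div>

```java
/** Toy model of round-robin dispatch with a per-consumer prefetch limit (not broker code). */
class PrefetchModel {
    /**
     * Hands out up to `messages` deliveries to `consumers` in round-robin order,
     * skipping any consumer that already holds `prefetch` unacknowledged messages.
     * No acks arrive during the call, so the result is the backlog each consumer
     * ends up holding. prefetch == 0 means no limit.
     */
    static int[] dispatch(int messages, int consumers, int prefetch) {
        int[] unacked = new int[consumers];
        int remaining = messages;
        int next = 0;
        int skippedInARow = 0;
        // Stop when the queue is drained or every consumer is at its limit.
        while (remaining > 0 && skippedInARow < consumers) {
            boolean hasRoom = prefetch == 0 || unacked[next] < prefetch;
            if (hasRoom) {
                unacked[next]++;
                remaining--;
                skippedInARow = 0;
            } else {
                skippedInARow++;
            }
            next = (next + 1) % consumers;
        }
        return unacked;
    }
}
```

<div>In this model, dispatch(100, 3, 10) leaves each of the three consumers holding 10 messages while the other 70 stay in the queue, whereas dispatch(100, 3, 0) pushes all 100 out to the consumers regardless of how quickly they can work through them.</div>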

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

&gt; I set hipe_compile to true, the system states &#39;Not HiPE compiling: HiPE<br>

&gt; not found in this Erlang installation.&#39; What&#39;s odd is<br>

&gt; that erlang-hipe-R14B-04.1.el6.x86_64 is installed. Is there something<br>

</div>&gt; else that needs to be done for RHEL systems to enable hipe_compile? I<br>

<br>

That message means that your installation of Erlang lacks the hipe.beam<br>

file in the code loading path. The name of the package that includes<br>

this file depends on how Erlang was packaged in your system. Making use<br>

of HiPE won&#39;t address the core problem though.<br>

<br>

&gt; think that it may help get beyond the current problem. If that doesn&#39;t<br>

&gt; help, are there other settings or something I can look at to determine<br>

&gt; where the bottleneck is? The RMQ server is 60% idle, doesn&#39;t have a<br>

&gt; large amount of I/O wait, and doesn&#39;t seem to be saturating its network<br>

&gt; cards (the server has a bonded ethernet interface). The worker machines<br>

&gt; are relatively idle as well.<br>

<br>

Are *all* the workers idle, or are a small number taking all the load?<br>

Uneven worker load is a potential cause for the problem you describe and<br>

can be addressed using prefetch count:<br>

<br>

<a href="http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-count" target="_blank">http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-count</a><br>

<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>No, the workers are all processing messages, but the machines they run on are fairly idle (these are pretty beefy 12-core servers, so we have room to run more workers if necessary). I ran a profile and the majority of the time is spent in the basicPublish call. There&#39;s definitely a bottleneck here; I just haven&#39;t found it yet. I&#39;m going to verify that nothing in the network is preventing the RMQ server from communicating properly with the other servers.</div>
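<div>One way to put a number on the time spent publishing, independent of a full profiler run, is to wrap the call in a small timer. This is only a sketch: CallTimer is a hypothetical helper and is not part of the RabbitMQ Java client, and it is not thread-safe, so it assumes one instance per worker thread.</div>

```java
/** Accumulates wall-clock time spent inside a wrapped call, such as a publish. */
class CallTimer {
    private long totalNanos;
    private long calls;

    /** Runs the action and adds its duration to the running total, even if it throws. */
    void time(Runnable action) {
        long start = System.nanoTime();
        try {
            action.run();
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    /** Number of calls timed so far. */
    long calls() {
        return calls;
    }

    /** Mean duration per call in microseconds, or 0 if nothing has been timed. */
    double meanMicros() {
        return calls == 0 ? 0.0 : (totalNanos / 1_000.0) / calls;
    }
}
```

<div>Because basicPublish declares a checked IOException, the action passed to time would need to catch it and rethrow it unchecked. Comparing the mean publish time against the per-message processing time should show whether publishing alone accounts for the throughput ceiling.</div>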

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888">

<br>

-Emile<br>

<br>

<br>

</font></span></blockquote></div><br></div><div class="gmail_extra">Thanks!</div><div class="gmail_extra"><br></div><div class="gmail_extra"> Chris</div>