<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 30, 2012 at 2:38 AM, Emile Joubert <span dir="ltr"><<a href="mailto:emile@rabbitmq.com" target="_blank">emile@rabbitmq.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Chris,<br>
<div class="im"><br>
On 30/04/12 04:51, Chris Schmidt wrote:<br>
> I'm running RabbitMQ 2.7.1/R14B04 on RHEL 6.2 and am encountering scaling<br>
> issues with (I believe) RMQ. I'm interested in using the hipe_compile or<br>
> other configurations to tune RMQ's throughput. Right now we get around<br>
> 20k messages per second, but beyond that the queues begin to<br>
> grow (there are a number of producer/consumer workers and around 15<br>
> queues/exchanges they read/write to). Each worker is acknowledging<br>
> messages as they are processed and sent on to the next process. The<br>
> messages have variable size.<br>
<br>
</div>If the queues start to grow then you should focus on speeding up the<br>
consumers. If the consumers can't keep up then increasing the speed of<br>
the broker won't keep the queues short.<br>
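The arithmetic behind this can be sketched in a few lines of Java (the language of Chris's workers). This is a minimal illustration, not RabbitMQ code: the `Backlog` class and the 25k msg/s publish rate are hypothetical, and the rate was picked only to sit above the roughly 20k msg/s ceiling reported above. While publishers outpace consumers, the backlog grows by the difference every second, and broker speed does not enter the calculation.

```java
// Hypothetical toy model: how fast a backlog builds when publishers outpace consumers.
// Rates are in messages per second; the broker's own speed does not appear anywhere.
class Backlog {
    // Messages waiting after 'seconds' (never negative: a queue that keeps up stays empty).
    static long depthAfter(long publishPerSec, long consumePerSec, long seconds) {
        return Math.max(0L, (publishPerSec - consumePerSec) * seconds);
    }

    // Whole seconds until the backlog first reaches 'limit' messages, or -1 if it never grows.
    static long secondsUntil(long publishPerSec, long consumePerSec, long limit) {
        long growth = publishPerSec - consumePerSec;
        if (growth <= 0) {
            return -1L;
        }
        return (limit + growth - 1) / growth; // ceiling division
    }

    public static void main(String[] args) {
        // Hypothetical: consumers top out at 20k msg/s while producers offer 25k msg/s.
        System.out.println(depthAfter(25_000, 20_000, 60));          // prints 300000
        System.out.println(secondsUntil(25_000, 20_000, 1_000_000)); // prints 200
    }
}
```

Only raising the consume rate (or lowering the publish rate) changes the growth term.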
<br>
Versions of the broker later than 2.7.1 feature internal flow control<br>
which helps to limit latency. Your problem might benefit from this.<br>
<div class="im"><br></div></blockquote><div><br></div><div>The workers are chained together: A --> B --> C. Each worker type processes data and publishes it to an exchange, which routes it to a queue that a downstream worker consumes from. What I see is that the workers consume messages and then spend a large amount of time publishing to the next exchange. This causes the number of unacknowledged messages to grow until RMQ hits its memory limit and everything almost grinds to a halt. I am limiting the number of messages held within each worker to a maximum of 50k (Java, using a basic queueing consumer backed by a blocking queue). Originally the internal queue of messages grew without bound and the workers died with out-of-memory errors.</div>
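The bounded hand-off between chained workers can be modelled in-process with the Java standard library. This sketch does not talk to RabbitMQ: `BoundedPipeline` and its stage names are hypothetical, and each `ArrayBlockingQueue` plays the role of a per-worker buffer like the 50k limit above. When a downstream stage falls behind, `put()` blocks the stage feeding it, so memory stays bounded. With a live broker, the corresponding limit on unacknowledged deliveries per consumer is the basic.qos prefetch count that Emile suggests further down.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

// In-process model (no RabbitMQ involved) of a chained A --> B --> C pipeline.
// Every hand-off is a bounded ArrayBlockingQueue, so when a downstream stage is slow,
// put() blocks the upstream stage instead of letting a backlog grow without limit.
class BoundedPipeline {
    // Sentinel that each stage forwards and then stops on.
    static final String END = "__END__";

    // Starts a thread that takes from 'in', applies 'work', and puts the result into 'out'.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        UnaryOperator<String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();       // waits while 'in' is empty
                    if (END.equals(msg)) {
                        out.put(END);             // propagate shutdown downstream
                        return;
                    }
                    out.put(work.apply(msg));     // waits while 'out' is full (backpressure)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Sends 'count' messages through stages B and C; returns how many reach the sink.
    static int run(int count, int capacity) {
        BlockingQueue<String> toB = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> toC = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> sink = new ArrayBlockingQueue<>(capacity);
        Thread b = stage(toB, toC, s -> s + ":B");
        Thread c = stage(toC, sink, s -> s + ":C");
        // Stage A: the producer blocks whenever B's inbox already holds 'capacity' messages.
        Thread a = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    toB.put("msg-" + i);
                }
                toB.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        a.start();
        int received = 0;
        try {
            // Drain the sink concurrently so the bounded queues can keep moving.
            while (!END.equals(sink.take())) {
                received++;
            }
            a.join();
            b.join();
            c.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        // 100k messages flow through queues that never hold more than 1,000 each.
        System.out.println(run(100_000, 1_000)); // prints 100000
    }
}
```

The sentinel string is one simple way to shut the stages down in order; interrupting the threads would be an alternative.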
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
> I've changed the frame_max setting but that doesn't appear to<br>
> help. I've tried increasing the number of workers across additional<br>
> hardware as well, but Rabbit still seems to cap at 20k messages/sec. When<br>
<br>
</div>The maximum frame size lets you choose between better latency (small<br>
frame size) and better throughput (large frame size). If many messages are larger<br>
than 128 KB, then increasing the frame size may improve throughput (provided<br>
consumers can keep up).<br>
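A minimal Java sketch of why frame size mostly affects large messages. It follows the AMQP 0-9-1 framing rule that every frame carries 8 bytes of header and end marker around its payload; the `FrameMath` class is hypothetical, and the 131072-byte frame_max in `main` is an assumption matching the 128 KB figure above. A body smaller than one frame gains nothing from a larger frame_max, while a 1 MiB body at this frame size is split into nine body frames.

```java
// Sketch of AMQP 0-9-1 body framing. Every frame adds 8 bytes of overhead
// (1-byte type, 2-byte channel, 4-byte payload size, 1-byte frame-end marker),
// so one body frame carries at most frameMax - 8 bytes of message content.
class FrameMath {
    static final int FRAME_OVERHEAD = 8;

    // Number of body frames needed for a message body of 'bodySize' bytes.
    static long bodyFrames(long bodySize, int frameMax) {
        if (frameMax <= FRAME_OVERHEAD) {
            throw new IllegalArgumentException("frameMax must exceed the frame overhead");
        }
        long perFrame = frameMax - FRAME_OVERHEAD;
        return (bodySize + perFrame - 1) / perFrame; // ceiling division; 0 for an empty body
    }

    // Each published message also needs one method frame and one content header frame.
    static long totalFrames(long bodySize, int frameMax) {
        return 2 + bodyFrames(bodySize, frameMax);
    }

    public static void main(String[] args) {
        int frameMax = 131_072; // assumed frame_max of 128 KiB
        for (long size : new long[] {1_000L, 131_072L, 1_048_576L}) {
            System.out.println(size + " bytes -> " + totalFrames(size, frameMax) + " frames");
        }
    }
}
```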
<br>
When you increased the number of workers, were you able to determine<br>
whether the load was effectively spread amongst all of them? The QoS<br>
prefetch count setting will help to ensure fair distribution. Try a<br>
small number (e.g. 10) as a starting point for tuning this value.<br>
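The effect of a prefetch limit can be illustrated with a toy simulation in plain Java. It is an assumption-laden model, not RabbitMQ's dispatch code: `PrefetchSim`, the tick-based timing, and the consumer rates are all hypothetical. The simulated broker delivers round-robin, skipping any consumer that already holds `prefetch` unacknowledged messages.

```java
import java.util.Arrays;

// Toy discrete-time model (not RabbitMQ's dispatcher) of round-robin delivery to consumers
// that each acknowledge a fixed number of messages per tick.
class PrefetchSim {
    // Returns how many messages each consumer was delivered.
    // rates[i]: messages consumer i processes and acks per tick (must be > 0).
    // prefetch: maximum unacknowledged deliveries per consumer; 0 means unlimited.
    static int[] simulate(int messages, int[] rates, int prefetch) {
        if (rates.length == 0) throw new IllegalArgumentException("need at least one consumer");
        for (int r : rates) {
            if (r <= 0) throw new IllegalArgumentException("rates must be positive");
        }
        int n = rates.length;
        int[] unacked = new int[n];
        int[] delivered = new int[n];
        int queued = messages;
        int next = 0; // round-robin cursor
        while (queued > 0 || Arrays.stream(unacked).sum() > 0) {
            // Deliver one message at a time, skipping consumers already at their prefetch limit.
            while (queued > 0) {
                int chosen = -1;
                for (int k = 0; k < n; k++) {
                    int c = (next + k) % n;
                    if (prefetch == 0 || unacked[c] < prefetch) {
                        chosen = c;
                        break;
                    }
                }
                if (chosen < 0) break; // every consumer is at its limit until it acks
                unacked[chosen]++;
                delivered[chosen]++;
                queued--;
                next = (chosen + 1) % n;
            }
            // Each consumer then acks up to its per-tick rate.
            for (int c = 0; c < n; c++) {
                unacked[c] -= Math.min(rates[c], unacked[c]);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        int[] rates = {10, 1}; // hypothetical: one fast and one slow consumer
        System.out.println("no prefetch limit: " + Arrays.toString(simulate(1000, rates, 0)));
        System.out.println("prefetch of 10:    " + Arrays.toString(simulate(1000, rates, 10)));
    }
}
```

With one consumer ten times faster than the other, the unlimited run splits the 1000 messages evenly, whereas the prefetch-limited run hands most of them to the fast consumer. In this model, an even split does not by itself show that every consumer is keeping up.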
<div class="im"><br></div></blockquote><div><br></div><div>I can see an even distribution of messages across the workers in the RMQ management console. As the number of workers increases, the number of messages consumed per worker drops proportionally. I'll try the QoS setting on the consumers to see if that helps, but the distribution on the consumption side does appear to be fair.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
> I set hipe_compile to true, the system states 'Not HiPE compiling: HiPE<br>
> not found in this Erlang installation.' What's odd is<br>
> that erlang-hipe-R14B-04.1.el6.x86_64 is installed. Is there something<br>
</div>> else that needs to be done for RHEL systems to enable hipe_compile? I<br>
<br>
That message means that your installation of Erlang lacks the hipe.beam<br>
file in the code loading path. The name of the package that includes<br>
this file depends on how Erlang was packaged in your system. Making use<br>
of HiPE won't address the core problem, though.<br>
<br>
> think that it may help get beyond the current problem. If that doesn't<br>
> help, are there other settings or something I can look at to determine<br>
> where the bottleneck is? The RMQ server is 60% idle, doesn't have a<br>
> large amount of I/O wait, and doesn't seem to be saturating its network<br>
> cards (the server has a bonded ethernet interface). The worker machines<br>
> are relatively idle as well.<br>
<br>
Are *all* the workers idle, or are a small number taking all the load?<br>
Uneven worker load is a potential cause for the problem you describe and<br>
can be addressed using prefetch count:<br>
<br>
<a href="http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-count" target="_blank">http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-count</a><br>
<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>No, all of the workers are processing messages; the machines they run on are fairly idle, though (these are pretty beefy 12-core servers, so there is room to run more workers if necessary). I profiled the workers, and the majority of the time is spent in the basicPublish call. There's definitely a bottleneck here; I just haven't found it yet. I'm going to verify that nothing in the network is preventing the RMQ server from communicating properly with the other servers.</div>
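Because the profile points at basicPublish, it may help to look at the distribution of per-call publish latency rather than only the total time. A minimal sketch in plain Java: `LatencyProbe` is a hypothetical helper, and the stand-in workload in `main` would be replaced by a Runnable wrapping the worker's publish call. In this sketch's reading, a low median with a very large p99 or maximum would suggest publishes that occasionally stall (for example while the broker is blocking publishers at its memory limit), whereas a uniformly high latency would point at per-message cost.

```java
import java.util.Arrays;

// Times repeated calls to an operation and reports latency percentiles, so that steady
// slowness can be told apart from occasional long stalls.
class LatencyProbe {
    // Runs 'op' 'iterations' times and returns each call's duration in nanoseconds, sorted.
    static long[] sample(Runnable op, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    // Nearest-rank percentile (0 < p <= 100) of an ascending-sorted array.
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need data and 0 < p <= 100");
        }
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Stand-in workload; a worker would time its own publish call here instead.
        long[] nanos = sample(() -> Math.sqrt(System.nanoTime()), 10_000);
        System.out.println("p50 ns: " + percentile(nanos, 50));
        System.out.println("p99 ns: " + percentile(nanos, 99));
        System.out.println("max ns: " + nanos[nanos.length - 1]);
    }
}
```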
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888">
<br>
-Emile<br>
<br>
<br>
</font></span></blockquote></div><br></div><div class="gmail_extra">Thanks!</div><div class="gmail_extra"><br></div><div class="gmail_extra"> Chris</div>