[rabbitmq-discuss] Enabling hipe_compile in RHEL 6

Mon Apr 30 15:48:26 BST 2012

On Mon, Apr 30, 2012 at 2:38 AM, Emile Joubert <emile at rabbitmq.com> wrote:

> Hi Chris,
>
> On 30/04/12 04:51, Chris Schmidt wrote:
> > I'm running RabbitMQ 2.7.1/R14B04 on RHEL 6.2 and am encountering scaling
> > issues with (I believe) RMQ. I'm interested in using the hipe_compile or
> > other configurations to tune RMQ's throughput. Right now we get around
> > 20k messages per second, but anything beyond that the queues begin to
> > grow (there are a number of producer/consumer workers and around 15
> > queues/exchanges they read/write to). Each worker is acknowledging
> > messages as they are processed and sent on to the next process. The
> > messages have variable size.
>
> If the queues start to grow then you should focus on speeding up the
> consumers. If the consumers can't keep up then increasing the speed of
> the broker won't keep the queues short.
>
> Versions of the broker later than 2.7.1 feature internal flow control
> which helps to limit latency. Your problem might benefit from this.
>
>
The workers are chained together A --> B --> C. One worker type processes
data and sends it to an exchange, and a downstream worker picks it up from
a queue. What I see is that the workers are consuming messages and then
spending a large amount of time publishing to the next exchange. This
causes the number of unacknowledged messages to grow and eventually RMQ
hits the memory limit and everything grinds almost to a halt. I am limiting
the number of messages held within each worker to a max of 50k (using Java,
basic queueing consumer with a blocking queue). Originally the internal
queue of messages grew unbounded and the workers would die with an out of
memory error.
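
The bounded in-worker buffer described above can be sketched roughly as
follows. This is only an illustration of the idea, not the actual worker
code: the class name BoundedWorkerBuffer and its methods are hypothetical,
and it uses only java.util.concurrent rather than the RabbitMQ client.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed-capacity buffer between the thread that
// receives deliveries and the thread that processes them.
class BoundedWorkerBuffer {
    private final BlockingQueue<byte[]> buffer;

    BoundedWorkerBuffer(int capacity) {
        // ArrayBlockingQueue never grows past its capacity.
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Receiving side: waits up to timeoutMillis for free space.
    // Returns false if the buffer stayed full (or the thread was interrupted).
    boolean accept(byte[] body, long timeoutMillis) {
        try {
            return buffer.offer(body, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Processing side: waits up to timeoutMillis for a message.
    // Returns null if nothing arrived (or the thread was interrupted).
    byte[] next(long timeoutMillis) {
        try {
            return buffer.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BoundedWorkerBuffer b = new BoundedWorkerBuffer(2);
        System.out.println(b.accept(new byte[] {1}, 10)); // space available
        System.out.println(b.accept(new byte[] {2}, 10)); // space available
        System.out.println(b.accept(new byte[] {3}, 10)); // buffer is full
        b.next(10);                                       // frees one slot
        System.out.println(b.accept(new byte[] {3}, 10)); // space available again
    }
}
```

A cap like this protects the worker's heap, but the buffered messages are
still unacknowledged from the broker's point of view, so by itself it does
not reduce the memory the broker has to hold.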

> >  I've changed the frame_max setting but that doesn't appear to
> > help. I've tried increasing the number of workers across additional
> > hardware as well, but Rabbit still seems to cap at 20k messages/sec. When
>
> The maximum framesize allows you to choose between better latency (small
> framesize) or throughput (large framesize). If many messages are larger
> than 128 KB then increasing framesize may improve throughput (provided
> consumers can keep up).
>
> When you increased the number of workers, were you able to determine
> whether the load was effectively spread amongst all of them? The QoS
> prefetch count setting will help to ensure fair distribution. Try a
> small number (e.g. 10) as a starting point for tuning this value.
>
>
I can see an even distribution of messages across the workers through the
RMQ management console. As the number of workers increases, the number of
messages consumed per worker drops evenly. I'll try the QoS setting for
consumption to see if that helps, but it does appear to be a fair
distribution on the consumption side.
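
For reference, the effect of a prefetch count can be modelled without a
broker. The sketch below is a toy model under stated assumptions: the class
PrefetchWindow and its methods are hypothetical and only mimic the rule
that no more than prefetchCount deliveries may be unacknowledged at once.
With the RabbitMQ Java client, the setting itself would be applied with
channel.basicQos(10) before consuming.

```java
// Hypothetical model of a per-consumer prefetch window. It does not talk
// to a broker; it only tracks how many deliveries are awaiting an ack.
class PrefetchWindow {
    private final int prefetchCount;
    private int unacked = 0;

    PrefetchWindow(int prefetchCount) {
        this.prefetchCount = prefetchCount;
    }

    // Broker side: a delivery is allowed only while the window has room.
    synchronized boolean tryDeliver() {
        if (unacked >= prefetchCount) {
            return false; // window full: the message stays in the queue
        }
        unacked++;
        return true;
    }

    // Consumer side: each ack frees one slot in the window.
    synchronized boolean ack() {
        if (unacked == 0) {
            return false; // nothing outstanding to acknowledge
        }
        unacked--;
        return true;
    }

    synchronized int unacked() {
        return unacked;
    }

    public static void main(String[] args) {
        PrefetchWindow window = new PrefetchWindow(10);
        int delivered = 0;
        for (int i = 0; i < 15; i++) {
            if (window.tryDeliver()) {
                delivered++;
            }
        }
        System.out.println("delivered before any ack: " + delivered);
        for (int i = 0; i < 3; i++) {
            window.ack();
        }
        System.out.println("outstanding after 3 acks: " + window.unacked());
    }
}
```

The point of the model is that a slow consumer stops receiving new
messages once its window is full, so the unacknowledged backlog per
consumer is bounded by the prefetch count rather than by worker memory.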

> > I set hipe_compile to true, the system states 'Not HiPE compiling: HiPE
> > not found in this Erlang installation.' What's odd is
> > that erlang-hipe-R14B-04.1.el6.x86_64 is installed. Is there something
> > else that needs to be done for RHEL systems to enable hipe_compile? I
>
> That message means that your installation of Erlang lacks the hipe.beam
> file in the code loading path. The name of the package that includes
> this file depends on how Erlang was packaged in your system. Making use
> of HiPE won't address the core problem though.
>
> > think that it may help get beyond the current problem. If that doesn't
> > help, are there other settings or something I can look at to determine
> > where the bottleneck is? The RMQ server is 60% idle, doesn't have a
> > large amount of I/O wait, and doesn't seem to be saturating its network
> > cards (the server has a bonded ethernet interface). The worker machines
> > are relatively idle as well.
>
> Are *all* the workers idle, or are a small number taking all the load?
> Uneven worker load is a potential cause for the problem you describe and
> can be addressed using prefetch count:
>
> http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.qos.prefetch-count
>
>
No, the workers are processing messages, but the machine(s) in use are
fairly idle (these are pretty beefy 12-core servers, so we have room to run
more workers if necessary). I ran a profiler and the majority of the time is
spent in the basicPublish call. There's definitely a bottleneck here, I
just haven't found it yet. I'm going to verify that there isn't something
within the network causing the RMQ server to not be able to communicate
with the other servers appropriately.
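
One lightweight way to confirm a finding like this without a full profiler
is to wrap the suspect call in a timer. The sketch below is an assumption
about how that might look, not the profiling that was actually done:
CallTimer is a hypothetical helper, it is not thread-safe, and a real
publish call that throws a checked exception would need to be wrapped
before being passed in as a Runnable.

```java
// Hypothetical helper: accumulates wall-clock time spent in a wrapped call.
// Not thread-safe; use one instance per publishing thread.
class CallTimer {
    private long calls;
    private long totalNanos;
    private long maxNanos;

    // Runs the call and records how long it took, even if it throws.
    void time(Runnable call) {
        long start = System.nanoTime();
        try {
            call.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            calls++;
            totalNanos += elapsed;
            maxNanos = Math.max(maxNanos, elapsed);
        }
    }

    long calls() {
        return calls;
    }

    // Mean duration per call in microseconds (0 if nothing was timed).
    double meanMicros() {
        return calls == 0 ? 0.0 : (totalNanos / 1_000.0) / calls;
    }

    // Slowest single call in microseconds.
    double maxMicros() {
        return maxNanos / 1_000.0;
    }

    public static void main(String[] args) {
        CallTimer timer = new CallTimer();
        for (int i = 0; i < 1_000; i++) {
            // Stand-in for the publish call being measured.
            timer.time(() -> Math.sqrt(42.0));
        }
        System.out.println("calls: " + timer.calls());
        System.out.println("mean (us): " + timer.meanMicros());
        System.out.println("max (us): " + timer.maxMicros());
    }
}
```

A large gap between the mean and the maximum would suggest the publisher
is being blocked intermittently rather than being uniformly slow.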

>
> -Emile
>
>
>
Thanks!

 Chris