[rabbitmq-discuss] RabbitMQ / Erlang not running on all cores?

Matt Pietrek mpietrek at skytap.com
Fri Oct 25 22:00:10 BST 2013


Following up on this to close the loop on what happened, and hopefully
provide some useful information if anybody else runs into a similar problem.

Long story short, I was able to find/fix the problem without needing to
upgrade from RabbitMQ 3.0.2 and Erlang R15B01. (Although I hope to upgrade
soon.)

Our server had ~800 connections with associated subscriptions from clients.
Using top, I found a disproportionate amount of time spent in system code
on core 0. Using strace, I found that poll() was being called, specifying
~800 file descriptors (i.e., all our connections).

Turns out that Erlang by default doesn't use the much more efficient
epoll() mechanism. However, RabbitMQ's startup of Erlang includes the "+K
true" option, which tells Erlang to use epoll.

For reasons I won't go into, I was setting the RABBITMQ_SERVER_ERL_ARGS
environment variable, not realizing that what I put there *replaced* what
RabbitMQ passes to Erlang. I mistakenly believed it got *added* to the
Erlang start parameters.

Once I changed the RABBITMQ_SERVER_ERL_ARGS variable to include the
RabbitMQ defaults as well, things started working much better.
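
To make the fix concrete, here is a minimal sketch in Python (not anything
RabbitMQ ships) of building the variable so that custom flags are appended to
the defaults instead of replacing them. The only default confirmed in this
thread is "+K true"; the full list should be copied from the rabbitmq-server
script of the installed version. The function name and the "+S 4:4" flag are
hypothetical and only there for illustration.

```python
# Assumption: "+K true" is the only default this thread confirms.
# Copy the remaining defaults from your installed rabbitmq-server script.
ASSUMED_DEFAULT_ERL_ARGS = "+K true"


def build_server_erl_args(extra_args, defaults=ASSUMED_DEFAULT_ERL_ARGS):
    """Return a RABBITMQ_SERVER_ERL_ARGS value that keeps the defaults.

    The variable replaces RabbitMQ's own arguments, so the defaults have
    to be repeated explicitly in front of any custom flags.
    """
    parts = [defaults.strip(), extra_args.strip()]
    # Drop empty pieces so the result has no stray spaces.
    return " ".join(part for part in parts if part)


# Example: keep the defaults and add one (hypothetical) custom flag.
value = build_server_erl_args("+S 4:4")
```

The resulting string is what would be exported as RABBITMQ_SERVER_ERL_ARGS
before starting the broker.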

For what it's worth, it might be useful for RabbitMQ to log the Erlang
startup arguments and highlight anything that has been changed from the
defaults.
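
As a rough illustration of that kind of check, the following Python sketch
compares the effective argument string against the defaults and reports any
default that is missing. It is not RabbitMQ code. It assumes every default
flag takes exactly one value, and it uses "+K true" as the default list only
because that is the one default confirmed in this thread. The names are
hypothetical.

```python
# Assumption: "+K true" stands in for the full default list, which should
# be taken from the rabbitmq-server script of the installed version.
ASSUMED_DEFAULT_ERL_ARGS = "+K true"


def missing_defaults(effective_args, default_args=ASSUMED_DEFAULT_ERL_ARGS):
    """List default "flag value" pairs that are absent from effective_args."""
    effective = effective_args.split()
    defaults = default_args.split()
    # Adjacent token pairs in the effective arguments, e.g. ("+K", "true").
    effective_pairs = list(zip(effective, effective[1:]))
    missing = []
    # Walk the defaults two tokens at a time (flag, then its value).
    for flag, value in zip(defaults[0::2], defaults[1::2]):
        if (flag, value) not in effective_pairs:
            missing.append(f"{flag} {value}")
    return missing


# Example: an override that silently dropped kernel poll.
for item in missing_defaults("+S 4:4"):
    print(f"warning: default Erlang argument not set: {item}")
```

With the override from my case, this would have flagged the missing
"+K true" at startup instead of leaving it to be found with strace.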

Thanks Matthias, Michael and Zhibo for your help!


On Wed, Oct 23, 2013 at 12:08 AM, Matthias Radestock
<matthias at rabbitmq.com> wrote:

> Matt,
>
>
> On 22/10/13 23:44, Matt Pietrek wrote:
>
>> The load on our box (it's in production) has dropped a little bit, but
>> core 0 is still running nearly flat out.
>>
>
> If you are seeing *less* than 100% utilisation of one core then your
> system is *not* CPU bound.
>
> Check that RabbitMQ is actually a bottleneck here, rather than, say, your
> producers or consumers.
>
>
>> Are there other useful "eval" commands that I could use to ferret out
>> what's happening? This is a box in production and we have an upcoming
>> release that's only going to add more load on this box.
>>
>
> We could come up with some eval code to identify the top running
> processes. Or perhaps find a way to get etop to run (Erlang's equivalent of
> 'top'). But...
>
>
>> Also, we are running this in a cluster with another box, with all queues
>> in HA mode.
>>
>
> There have been tons of improvements *and bug fixes* to mirrored queues.
> Some relating to performance. So I strongly recommend you upgrade RabbitMQ
> to the latest version. And Erlang too, due to the SMP scheduler
> improvements I mentioned in my earlier email.
>
> If you still see performance anomalies after the upgrade then we can
> investigate further.
>
>
> Regards,
>
>
> Matthias.