[rabbitmq-discuss] Throughput observation with RabbitMQ-3.1.3 and Erlang R16B01 - Single Node and Cluster Node
Priyanki Vashi
vashi.priyanki at gmail.com
Thu Aug 1 20:06:03 BST 2013
Hi there,
I have run many small tests to understand how the throughput of a
RabbitMQ node scales with the number of cores as well as the number of
producers and consumers. Here are my observations. The tests cover both
single-node and cluster configurations.
Based on these observations, I would first like to confirm whether these
are the expected results or whether I can do something more to improve
throughput. Secondly, I have some specific questions about them, so
clarification would help me continue further.
Just so you know, I have also checked the performance statistics
published on the RabbitMQ site for version 2.8.1, and based on the
points there I tried different things: changing prefetch_count values,
non-persistent as well as persistent messages, DISK versus RAM nodes,
and so on.
My test configuration is as follows.
RabbitMQ version 3.1.3 with Erlang R16B01.
I have one virtual machine with 8 GB of RAM and 20 cores, dedicated
mainly to the Rabbit nodes, and another virtual machine with 8 GB of RAM
and 20 cores, dedicated mainly to my producers and consumers. My
producers and consumers are single-threaded (using the Python pika
library), so to find the system limit of RabbitMQ I scale up to 10
producers and 10 consumers, giving each of them one dedicated core. I
start linearly: first with 1P and 1C, and so on.
Since my interest is in benchmarking performance and finding the system
limits of RabbitMQ, my simulated producers and consumers do no
processing of messages after the consumer receives them. This means, I
believe, that I am producing as fast as the above VM configuration
supports, and likewise consuming as fast as I can.
I am using pika's non-blocking connection method (SelectConnection).
The message size is 100 bytes, and I have configured a 'direct' exchange.
I have also enabled publisher confirms as well as consumer acks, since I
am interested in reliable delivery and confirmation of messages all the
way up to the application layer.
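For reference, here is a minimal sketch of the confirm/ack pattern I am
describing (simplified to pika's BlockingConnection for readability; my
real scripts use SelectConnection, the 'bench' names and host are
placeholders, and exact keyword arguments vary between pika versions):

    # Sketch: publish with confirms, consume with explicit acks.
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters('rabbit-host'))
    ch = conn.channel()
    ch.confirm_delivery()  # broker confirms each publish on this channel
    ch.exchange_declare(exchange='bench', exchange_type='direct')
    ch.queue_declare(queue='bench_q')
    ch.queue_bind(queue='bench_q', exchange='bench', routing_key='bench')

    body = b'x' * 100      # 100-byte test payload
    ch.basic_publish(exchange='bench', routing_key='bench', body=body,
                     properties=pika.BasicProperties(delivery_mode=2))
    # With confirms enabled on a blocking channel, basic_publish does not
    # return until the broker has confirmed (or nacked) the message.

    def on_message(channel, method, properties, msg_body):
        channel.basic_ack(delivery_tag=method.delivery_tag)  # explicit ack

    ch.basic_consume(queue='bench_q', on_message_callback=on_message)
    ch.start_consuming()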
Here are my statistics.
Test-1) With Single Node configuration:
   The maximum throughput I can get is around 5000 msg/sec, with 1
publisher and 1 consumer and prefetch_count = 0.
Node type = Disk
   Both producer and consumer are given dedicated cores using the Linux
taskset command (if I leave core assignment to Linux, throughput is only
around 3500 msg/sec); core assignment for the Rabbit nodes is left to
Linux and not touched.
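The client pinning is done like this, for example (the core numbers are
arbitrary; each single-threaded client process gets its own core):

    taskset -c 2 python producer.py
    taskset -c 3 python consumer.py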
   Here the limiting factor is the publisher, since it loads its
assigned core to almost 100%. So, to get better throughput, I started
another producer on another dedicated core, along with a corresponding
consumer on its own dedicated core, and tried publishing to the same
queue as well as to a different queue.
   But I still see the overall throughput stay around the same value,
increasing only a little, divided roughly as follows between the
producers and consumers: P1 and P2 each publish at roughly 2500-2700
msg/sec, with the same rates on the consumption side, adding up to a
total of around 5000-5500 msg/sec.
   Even if I introduce a prefetch_count value, it hardly changes the
throughput.
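For completeness, this is the consumer-channel call I mean (100 is just
one of the values I tried):

    # Cap the number of unacknowledged messages the broker will push to
    # this consumer; prefetch_count=0 means no limit.
    channel.basic_qos(prefetch_count=100)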
   Also, I tried both persistent and non-persistent messages; the
throughput does not change much and is almost the same as listed above
for the DISK node.
So from this it feels that the maximum capacity of a single DISK node is
limited to about 5000 msg/sec when publisher confirms and consumer acks
are enabled in this version. I thought the main reason for this could be
server latency. Is that a correct understanding, or am I missing
something here?
My specific questions on these observations are as follows.
1) Is this the expected throughput scaling behaviour when the number of
producers and consumers increases linearly?
2) Can something be done to improve throughput in a single-node
configuration without changing the publisher confirm and consumer ack
configuration (i.e. keeping them enabled)?
3) How can server latency be estimated approximately? I thought that by
adding the round-trip times (RTT) for the publisher confirm and the
consumer ack, one could get the latency. Is that a correct
understanding? What is an effective method to measure the RTT?
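The naive measurement I can think of is timing a confirmed publish on a
blocking channel (a sketch, assuming confirms are already enabled on
'ch' as in the earlier snippet):

    import time

    t0 = time.time()
    ch.basic_publish(exchange='bench', routing_key='bench', body=b'x' * 100)
    rtt = time.time() - t0  # basic_publish blocks until the broker's
                            # confirm arrives, so this approximates one
                            # publish-plus-confirm round trip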
Test-2) With Cluster configuration:
First I tried a cluster with 1 DISK node and 1 RAM node.
When my producer and consumer connect to the DISK node, the statistics
are almost the same as in Test-1. I tried a single producer and a single
consumer, 2 publishers and 2 consumers, and so on; there was no
observable difference in throughput.
When my producers and consumers connect to the RAM node, I see the
following:
1P and 1C - throughput is around 4500-5000 msg/sec
2P and 2C - throughput is around 9000-10000 msg/sec
Beyond this, if I add producers and consumers, per-pair throughput
starts to drop a little, with the overall throughput peaking around
12000 msg/sec and each producer/consumer pair at about 4000 msg/sec.
So again I feel that after a certain number of producers and consumers,
server latency comes into the picture even for a RAM node and slowly
drags the throughput down: instead of gaining another 5000 msg/sec for
every added producer-consumer pair, the per-pair rate becomes roughly
4000, 3500, 3000 msg/sec.
After this I added a third node, a fourth node and so on to the cluster,
all also of type RAM. The maximum throughput I can get is around
22000-24500 msg/sec.
Changing prefetch_count or delivery_mode (from persistent to
non-persistent and vice versa) does not really make any big difference.
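For clarity, the only thing I change between those runs is the
delivery_mode in the message properties (same placeholder names as in
the earlier sketch):

    # delivery_mode=1 -> transient, delivery_mode=2 -> persistent
    ch.basic_publish(exchange='bench', routing_key='bench', body=body,
                     properties=pika.BasicProperties(delivery_mode=1))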
My specific questions on the Test-2 observations are as follows.
1) Why is there no linear increase in throughput with a DISK node, as is
seen with a RAM node?
2) At least for non-persistent messages, I believe DISK and RAM nodes
should behave similarly, but they do not. What are the main differences
in the way DISK and RAM nodes handle non-persistent messages?
3) What can be done to improve throughput in both tests?
4) Since I have a VM with 20 cores dedicated to RabbitMQ, how can I load
the CPU to its limit? With the current tests I can load the CPU to a
maximum of 800% at the throughputs mentioned above. The limiting factor
currently seems to be server latency, so how can I overcome that?
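In case it is relevant, this is roughly how I fan the single-threaded
clients out across cores from one script (a sketch; run_producer stands
in for my actual publish loop, and os.sched_setaffinity needs Python
3.3+ on Linux; taskset achieves the same from the shell):

    import os
    from multiprocessing import Process

    def run_producer(core):
        os.sched_setaffinity(0, {core})  # pin this worker to one core
        # ... connect and run the publish loop from the earlier sketch ...

    if __name__ == '__main__':
        workers = [Process(target=run_producer, args=(c,)) for c in range(10)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()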
Best Regards,
Priyanki.
On Tuesday, June 25, 2013 1:09:28 PM UTC+2, hyperthunk wrote:
>
> Reposting
>
> On 25 Jun 2013, at 09:38, Tim Watson <t... at rabbitmq.com>
> wrote:
>
> What does your publishing code look like? The figures below are expected
> in that the consumer can keep pace with the producer - it could hardly be
> expected to consume faster than messages are arriving in the queue(s). So
> the slowness is very likely on the producing side.
>
> Are you using persistent messages and either publisher confirms or
> transactions? If so, how often are you waiting on confirms/commits?
>
> With the official clients we typically see avg rates of 50-60kHz with
> non-persistent messages. Persistence slows things down a tad, as do
> confirms (and more so transactions) but even with persistent messages and
> confirms, rates >= 5kHz are expected. It /sounds/ like you might be
> publishing persistent messages with confirms enabled and waiting for a
> confirm (ack) from the broker for each message. That involves disk I/O on
> the server for each message plus network latency, effectively making
> publishing synchronous (and very slow by comparison).
>
> Cheers,
> Tim
>
> On 25 Jun 2013, at 08:44, Priyanki Vashi <vashi.p... at gmail.com>
> wrote:
>
> Hi there,
>
> I am doing a performance study of RabbitMQ-3.1.1, and this is my first
> time doing such a performance study with any messaging broker :))
>
> 1) I have thoroughly gone through 'RabbitMQ in Action' and learnt the
> important concepts.
>
> 2) Tried a single-node broker to get a feel for how it works, and then
> set up a four-node cluster (with two DISK and two RAM nodes). Also
> configured an HAProxy TCP load balancer so that I can provide a single
> port for connecting to the cluster.
>
> 3) I am simulating producers and consumers through Python scripts
> (using Python pika library methods to connect to the server, publish,
> subscribe, etc.)
>
> 4) My scripts are working fine, but where I am stuck is that no matter
> what I do, my throughput is always 300 msg/sec.
>
> 5) I have defined durable exchanges and queues.
>
> My final requirement is to run at least 10 to 15 producers and 60 to 70
> consumers simultaneously, and I want to start with a linear increase in
> the number of producers and consumers so that I can draw conclusions
> about throughput, fault handling, processor utilization and so on, but
> I am seriously stuck at the initial steps. This group's help would be
> really appreciated.
>
> I have started with the following scenarios, but no matter what I do my
> throughput remains more or less the same (300 msg/sec), except for
> Scenario-1.
>
> Scenario-1
> - 1 producer, no consumer, and no queue bound to the exchange
> - The producer runs in an infinite loop, publishing to one fanout
> exchange
> - Publisher confirms disabled
> - Publish rate: 6200 msg/sec (checked through the web management plugin)
>
> Tried Scenario-1 also with a 'direct' exchange, and it's the same
> publish rate.
> I know that Scenario-1 is not really useful, since there are no queues
> and ultimately the messages are dropped, but I tried it as part of the
> debugging process and saw the results mentioned above.
>
> Scenario-2
> - 1 producer and 1 consumer
> - The producer runs in an infinite loop, publishing to one direct
> exchange
> - The consumer has its own dedicated queue bound to the above exchange
> - Publisher confirms and consumer acks disabled
> - Throughput: 300 msg/sec (i.e. publish rate = 300 msg/sec and deliver
> rate = 300 msg/sec)
>
> Tried Scenario-2 also with a fanout exchange, and with publisher
> confirms and consumer acks enabled; still the same 300 msg/sec
> throughput.
>
> Scenario-3
> - 1 producer and 4 consumers
> - The producer runs in an infinite loop, publishing to four direct
> exchanges
> - Each consumer has its own dedicated queue bound to its respective
> exchange
> - Publisher confirms and consumer acks disabled
> - Throughput: 300 msg/sec (i.e. publish rate = 300 msg/sec and deliver
> rate = 300 msg/sec)
>
> Tried Scenario-3 also with fanout exchanges, and with publisher
> confirms and consumer acks enabled; still the same 300 msg/sec
> throughput.
>
> Tried configuring the prefetch_count parameter to 100 as well, but it
> still gives me the same throughput of 300 msg/sec.
> I am honestly going crazy with this.
>
> After seeing this behavior, I seriously suspect there is some serious
> limitation in my simulated producers and consumers.
> Has anyone else tried the Python pika client, and does anyone have an
> idea of the throughput achievable with it?
> Does anyone have a rough idea of throughput with RabbitMQ-3.1.1?
>
> I can also share my Python scripts if required, but I would really
> appreciate some light on this situation.
> Also, what points should I take care of in order to improve throughput?
>
> Best Regards,
> Priyanki
>