[rabbitmq-discuss] Lower delivery rate than publish rate - why?
Alvaro Videla
videlalvaro at gmail.com
Sun Dec 15 13:00:29 GMT 2013
Hi Mike,
Yes, RabbitMQ queues are designed for fast delivery of messages and for
being as empty as possible, as that blog post explains.
Another interesting blog post, about consumer strategies and basic.qos
settings is this one:
http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/#more-276
re multi ack: yes, that might help.
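To make that concrete, here is a rough, untested sketch of batching acks on the consumer side. Everything in it is made up for illustration (class name, batch size); the real call in the RabbitMQ Java client would be channel.basicAck(lastDeliveryTag, true), where the second argument is the "multiple" flag that acknowledges every unacked delivery up to and including that tag:

```java
// Untested sketch: count deliveries and signal when a single multiple-ack
// should be sent instead of one ack per message. The caller would then do
// channel.basicAck(acker.tagToAck(), true) on the real Channel.
public class BatchAcker {
    private final int batchSize;   // deliveries per ack (made-up tuning knob)
    private int pending = 0;
    private long lastDeliveryTag = -1;

    public BatchAcker(int batchSize) {
        this.batchSize = batchSize;
    }

    // Record a delivery; returns true when it is time to send a multiple-ack.
    public boolean record(long deliveryTag) {
        lastDeliveryTag = deliveryTag;
        pending++;
        if (pending >= batchSize) {
            pending = 0;
            return true;
        }
        return false;
    }

    // Delivery tag to pass to basicAck(tag, true) once record() returns true.
    public long tagToAck() {
        return lastDeliveryTag;
    }
}
```

This cuts the per-queue ack traffic roughly by the batch size, at the cost of up to batchSize redeliveries if a consumer dies mid-batch.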
Regards,
Alvaro
On Sat, Dec 14, 2013 at 2:15 AM, MikeTempleman <mike at meshfire.com> wrote:
> I realized that was a bad interpretation. Sorry. The exchange is just
> successfully routing all the messages to the target queues.
>
> After reading a number of posts and this blog entry (
> http://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/) I wonder if
> the issue is that each message is ack'd. It seemed that the issue occurred
> when I had a large backlog in the queues. When Rabbit is empty, performance
> is fine. When the consumers tried to run at much higher speeds, we
> encountered this cycling.
>
> We have run a brief test with no-ack, not on production, and the
> performance is excellent even under load. But this is not a viable solution
> (appservers crash; autoscaling shuts down servers that have prefetched
> messages and are still connected to rabbit) without a full redesign.
>
> Assuming each queue is only one thread (I assume it handles receipt,
> delivery, and ack cleanup) then I can understand what might happen when the
> consumers generate ~500 acks/s while new messages are coming in at a low
> 50-100/s rate on a specific queue. I will move events that tend to generate
> peaks out into their own queue and accept that that queue will be processed
> more slowly. As for splitting the real worker queue, I suppose I could
> create 2 or so static queues to divide the load.
>
> So what I think I can do is:
> 1. bump the default TCP buffer from 128KB to around 10MB. The added
> buffering may help a little.
> 2. see if I can find out how to set the multiple ack flag. We are using
> Grails, so maybe that just means creating a custom bean; I don't know.
> 3. create a couple of queues for lower-priority events, specifically events
> chosen to be less time-critical.
> 4. if all that doesn't work, create 4 queues for the high-priority events,
> randomly publish to those queues, and put consumers on each queue.
> 5. Also, upgrade the server to the latest version.
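Re point 4 above: if you do end up splitting the hot queue into a few static queues, the publish side can be as simple as picking a shard at random. A sketch, untested; the "user" prefix and shard count of 4 are made-up examples, and the returned name would be used as the queue name / routing key when publishing:

```java
import java.util.concurrent.ThreadLocalRandom;

// Untested sketch: choose one of N static queues at random for each publish.
// Each shard queue would get its own set of consumers.
public class QueueSharder {
    private final String prefix;
    private final int shards;

    public QueueSharder(String prefix, int shards) {
        this.prefix = prefix;
        this.shards = shards;
    }

    // e.g. "user-0" .. "user-3" for prefix "user", shards = 4
    public String pick() {
        return prefix + "-" + ThreadLocalRandom.current().nextInt(shards);
    }
}
```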
>
> Mike Templeman
>
> --
>
> *Mike Templeman*
> *Head of Development*
>
> T: @talkingfrog1950 <http://twitter.com/missdestructo>
> T: @Meshfire <http://twitter.com/meshfire>
>
>
>
> On Fri, Dec 13, 2013 at 1:42 PM, Mike Templeman <[hidden email]> wrote:
>
>> I noticed something else very odd.
>>
>> Currently, one queue has 43,000 messages backed up. But when
>> I look at the exchange (there is only one exchange) I see that the message
>> rate in exactly matches the message rate out.
>>
>> With such a huge backlog, why would that be? I would have thought that
>> the consumers (there are 16 total distributed across 4 systems for that
>> queue with a prefetch of 100) would run at a much higher steady state.
>>
>> This exchange also seems to cycle regularly. It appears to run from a low
>> of around 60/s in and out to a high of 500+/s in and out.
>>
>>
>>
>>
>> On Fri, Dec 13, 2013 at 10:40 AM, Mike Templeman <[hidden email]> wrote:
>>
>>> Also, observing the Connections screen on the web UI shows that no
>>> flow control has recently been turned on for any of the four current
>>> connections (four app servers).
>>>
>>>
>>>
>>>
>>> On Fri, Dec 13, 2013 at 10:17 AM, Mike Templeman <[hidden email]> wrote:
>>>
>>>> Hi Alvaro
>>>>
>>>> I would be more than happy to provide logs. But all they have in them
>>>> is connection and shutdown information. Nothing more. I have just enabled
>>>> tracing on the vhost and will send the logs shortly. We encounter this
>>>> issue when under load every day now.
>>>>
>>>> Let me tell you our architecture and deployment:
>>>>
>>>> RabbitMQ:
>>>>
>>>> - m1.large ec2 instance. Version: RabbitMQ 3.1.5, Erlang R14B04
>>>> - 23 queues (transaction and direct)
>>>> - 3 exchanges used: two fanout and one topic exchange
>>>> - Topic exchange overview is attached.
>>>> - 46 total channels.
>>>>
>>>>
>>>> AppServers
>>>>
>>>> - m1.large tomcat servers running grails application
>>>> - 2-7 servers at any one time.
>>>> - Consume + publish
>>>> - On busy queues, each server has 16 consumers with prefetch at 100
>>>> - message sizes on busy queues are ~4KB.
>>>> - publishing rate on the busiest queue ranges from 16/s to >100/s. (We
>>>> need to be able to support 1000/s.)
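A quick back-of-the-envelope on the numbers above: with 16 consumers, prefetch 100, and ~4KB messages, the broker can have on the order of 6.5MB of delivered-but-unacked messages in flight for that one queue. A trivial helper to play with the sizing (the 4096-byte average is an assumption based on the "~4KB" figure):

```java
// Untested sketch: bytes potentially in flight (delivered but unacked)
// for one queue, given consumers x prefetch x average message size.
public class PrefetchSizing {
    public static long inFlightBytes(int consumers, int prefetch, int avgMsgBytes) {
        return (long) consumers * prefetch * avgMsgBytes;
    }
}
```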
>>>>
>>>>
>>>> Each AppServer connects to a sharded mongodb cluster of 3 shards. Our
>>>> first suspicion was that something in Mongo or AWS was causing the periodic
>>>> delay, but AWS techs looked into our volume use and said we were only using
>>>> 25% of available bandwidth.
>>>>
>>>> At this moment, we have a modest publish rate (~50-60/s) but a backlog
>>>> of 50,000 messages for the queue "user". The attached 10-minute snapshot
>>>> of the queue shows the cycling.
>>>>
>>>> I turned on tracing but the results don't seem to be coming into the
>>>> log. Is there another way to enable reporting of flow control?
>>>>
>>>> Mike Templeman
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 13, 2013 at 6:03 AM, Alvaro Videla-2 [via RabbitMQ]
>>>> <[hidden email]> wrote:
>>>>
>>>>> Mike,
>>>>>
>>>>> Would you be able to provide more information to help us
>>>>> debug the problem?
>>>>>
>>>>> Tim (from the rabbitmq team) requested more info in order to try to
>>>>> find answers for this.
>>>>>
>>>>> For example, when consumption drops to zero, are there any logs on the
>>>>> rabbitmq server that might tell of a flow control mechanism being
>>>>> activated?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Alvaro
>>>>>
>>>>>
>>>>> On Fri, Dec 13, 2013 at 2:19 AM, MikeTempleman <[hidden email]> wrote:
>>>>>
>>>>> > Tyson
>>>>> >
>>>>> > Did you ever find an answer to this question? We are encountering
>>>>> virtually
>>>>> > the exact same problem.
>>>>> >
>>>>> > We have a variable number of servers set up as producers and
>>>>> consumers and
>>>>> > see our throughput drop to zero on a periodic basis. This is most
>>>>> severe
>>>>> > when there are a few hundred thousand messages on rabbit.
>>>>> >
>>>>> > Did you just drop Rabbit? Ours is running on an m1.large instance
>>>>> with RAID0
>>>>> > ephemeral drives, so size and performance of the disk subsystem is
>>>>> not an
>>>>> > issue (we are still in beta). We have spent untold hours tuning our
>>>>> sharded
>>>>> > mongodb subsystem only to find out that it is only being 25%
>>>>> utilized (at
>>>>> > least it will be blazing fast if we ever figure this out).
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>