[rabbitmq-discuss] Lower delivery rate than publish rate - why?

Alvaro Videla videlalvaro at gmail.com
Sat Dec 21 21:10:55 GMT 2013


Hi,

RabbitMQ will always prioritise sending messages to consumers that are
ready. So a slow consumer won't block others from receiving messages,
provided a basic.qos prefetch limit is set, i.e. it is not unlimited.

There's a better explanation of what makes a consumer "ready" here:
http://www.rabbitmq.com/consumer-priority.html
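
For reference, a minimal sketch of setting a per-channel prefetch limit
with the RabbitMQ Java client (the queue name "work", the localhost
broker, and the handler body are placeholders for illustration):

    import com.rabbitmq.client.*;

    public class PrefetchConsumer {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");   // assumption: broker on localhost
            Connection conn = factory.newConnection();
            Channel channel = conn.createChannel();

            // Cap unacknowledged deliveries on this channel at 50; without
            // this, RabbitMQ pushes messages as fast as the socket allows.
            channel.basicQos(50);

            boolean autoAck = false;
            channel.basicConsume("work", autoAck, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties props, byte[] body)
                        throws java.io.IOException {
                    // ... process the message, then ack it individually
                    getChannel().basicAck(envelope.getDeliveryTag(), false);
                }
            });
        }
    }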

Regards,

Alvaro


On Sat, Dec 21, 2013 at 6:02 PM, MikeTempleman <mike at meshfire.com> wrote:

> We are still seeing the cycling.
>
> I wanted to ask if the round-robin scheduling could be a factor.
>
> For example, say I have 4 servers with 5 channels each on the same topic
> queue, and each channel has a prefetch setting of 50. When a channel's
> consumer becomes blocked (slow db operation, overloaded with other
> requests, etc.), does that block delivery of messages to the other
> channels? If the delivery that refills the prefetch buffer is delayed, is
> there any impact on the client (i.e. does it become blocked)?
>
> In other words, is there a way for all channels on all servers to become
> blocked in a high-load situation, even if their prefetch buffers are full?
>
> --
>
> *Mike Templeman*
> *Head of Development*
>
> T: @talkingfrog1950 <http://twitter.com/missdestructo>
> T: @Meshfire <http://twitter.com/meshfire>
>
>
>
> On Wed, Dec 18, 2013 at 10:32 AM, Mike Templeman <[hidden email]> wrote:
>
>> Well, multi-ack didn't help very much. We can see some improvement, but
>> not enough to matter.
>>
>> We cannot use auto-ack because consumers (multiple per server) die
>> unexpectedly as the app servers are autoscaled. We have not built a fully
>> separated service yet (too hard to debug on development machines right
>> now). But could publisher confirms resolve the issue of servers dying with
>> n messages in their prefetch buffers?
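>>
>> For what it's worth, a minimal sketch of publisher confirms with the plain
>> Java client looks like the following (the exchange name, routing key and
>> payload are placeholders). Note that confirms only cover the publish side;
>> messages a dead consumer never acked are requeued by the broker anyway:
>>
>>     import com.rabbitmq.client.*;
>>
>>     public class ConfirmedPublisher {
>>         public static void main(String[] args) throws Exception {
>>             ConnectionFactory factory = new ConnectionFactory();
>>             factory.setHost("localhost");      // assumption: local broker
>>             Connection conn = factory.newConnection();
>>             Channel channel = conn.createChannel();
>>
>>             channel.confirmSelect();   // put the channel in confirm mode
>>
>>             byte[] payload = "{\"event\":\"example\"}".getBytes("UTF-8");
>>             channel.basicPublish("events", "user.update",  // placeholder names
>>                     MessageProperties.PERSISTENT_TEXT_PLAIN, payload);
>>
>>             // Block until the broker confirms it has taken responsibility
>>             // for the message (5 second timeout as an example).
>>             if (!channel.waitForConfirms(5000)) {
>>                 System.err.println("message was nacked by the broker");
>>             }
>>             conn.close();
>>         }
>>     }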
>>
>>
>>
>>
>>
>>
>> On Sun, Dec 15, 2013 at 5:02 AM, Alvaro Videla-2 [via RabbitMQ] <[hidden email]> wrote:
>>
>>> Hi Mike,
>>>
>>> Yes, RabbitMQ queues are designed for fast delivery of messages and for
>>> being as empty as possible, as that blog post explains.
>>>
>>> Another interesting blog post, about consumer strategies and basic.qos
>>> settings, is this one:
>>> http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/#more-276
>>>
>>> Re multi-ack: yes, that might help.
>>>
>>> Regards,
>>>
>>> Alvaro
>>>
>>>
>>> On Sat, Dec 14, 2013 at 2:15 AM, MikeTempleman <[hidden email]> wrote:
>>>
>>>> I realized that was a bad interpretation. Sorry. The exchange is just
>>>> successfully routing all the messages to the target queues.
>>>>
>>>> After reading a number of posts and this blog entry (
>>>> http://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/) I wonder
>>>> if the issue is that each message is ack'd. It seemed that the issue
>>>> occurred when I had a large backlog in the queues. When Rabbit is empty,
>>>> performance is fine. When the consumers tried to run at much higher speeds,
>>>> we encountered this cycling.
>>>>
>>>> We have run a brief test with no-ack, not on production, and the
>>>> performance is excellent even under load. But this is not a viable solution
>>>> (app servers crash, and autoscaling shuts down servers that have prefetched
>>>> messages and are still connected to Rabbit) without a full redesign.
>>>>
>>>> Assuming each queue is only one thread (I assume it handles receipt,
>>>> delivery, and ack cleanup), I can understand what might happen when the
>>>> consumers generate ~500 acks/s while new messages are coming in at a low
>>>> 50-100/s rate on a specific queue. I will move some events that tend to
>>>> generate peaks out into their own queue and accept that that queue will be
>>>> processed more slowly. As for splitting up the real worker queue, I suppose
>>>> I could create 2 or so static queues to divide the load.
>>>>
>>>> So what I think I can do is:
>>>> 1. bump the default TCP buffer from 128KB to around 10MB. The added
>>>> buffering may help a little.
>>>> 2. see if I can find out how to set the multiple-ack flag. We are using
>>>> Grails, so maybe that just means creating a custom bean; I don't know
>>>> (there's a rough sketch after this list).
>>>> 3. create a couple of queues for lower-priority events, specifically
>>>> events chosen to be less time-critical.
>>>> 4. if all that doesn't work, then probably create 4 queues for the
>>>> high-priority events, randomly publish to those queues, and put consumers
>>>> on each queue.
>>>> 5. Also, upgrade the server to the latest version.
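>>>>
>>>> For item 2, a minimal sketch of batched acknowledgements with the plain
>>>> Java client (the batch size of 50 and the handler are illustrative, and
>>>> wiring this through Grails/Spring would differ):
>>>>
>>>>     import com.rabbitmq.client.*;
>>>>
>>>>     // Consumer that acks every 50th delivery with multiple=true, which
>>>>     // acknowledges everything up to and including that delivery tag.
>>>>     public class BatchAckConsumer extends DefaultConsumer {
>>>>         private int unacked = 0;
>>>>
>>>>         public BatchAckConsumer(Channel channel) {
>>>>             super(channel);
>>>>         }
>>>>
>>>>         @Override
>>>>         public void handleDelivery(String consumerTag, Envelope envelope,
>>>>                                    AMQP.BasicProperties props, byte[] body)
>>>>                 throws java.io.IOException {
>>>>             // process(body) would go here -- hypothetical handler
>>>>             if (++unacked >= 50) {
>>>>                 getChannel().basicAck(envelope.getDeliveryTag(), true);
>>>>                 unacked = 0;
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>> The prefetch (basic.qos) has to be at least as large as the batch size,
>>>> otherwise delivery stalls waiting for acks that never come.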
>>>>
>>>> Mike Templeman
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 13, 2013 at 1:42 PM, Mike Templeman <[hidden email]> wrote:
>>>>
>>>>> I noticed something else very odd.
>>>>>
>>>>> Currently, one queue has 43,000 messages backed up. But when I look at
>>>>> the exchange (there is only one exchange) I see that the message rate in
>>>>> exactly matches the message rate out.
>>>>>
>>>>> With such a huge backlog, why would that be? I would have thought that
>>>>> the consumers (there are 16 total distributed across 4 systems for that
>>>>> queue with a prefetch of 100) would run at a much higher steady state.
>>>>>
>>>>> This exchange also seems to cycle regularly. It appears to run from a
>>>>> low of around 60/s in and out up to 500+/s in and out.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Dec 13, 2013 at 10:40 AM, Mike Templeman <[hidden email]> wrote:
>>>>>
>>>>>> Also, observing the Connections screen in the web UI shows that no
>>>>>> flow control has recently been turned on for any of the four current
>>>>>> connections (four app servers).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 13, 2013 at 10:17 AM, Mike Templeman <[hidden email]> wrote:
>>>>>>
>>>>>>> Hi Alvaro
>>>>>>>
>>>>>>> I would be more than happy to provide logs, but all they have in
>>>>>>> them is connection and shutdown information, nothing more. I have just
>>>>>>> enabled tracing on the vhost and will send the logs shortly. We encounter
>>>>>>> this issue under load every day now.
>>>>>>>
>>>>>>> Let me tell you our architecture and deployment:
>>>>>>>
>>>>>>> RabbitMQ:
>>>>>>>
>>>>>>>    - m1.large EC2 instance. Version: RabbitMQ 3.1.5, Erlang R14B04
>>>>>>>    - 23 queues (transaction and direct)
>>>>>>>    - 3 exchanges in use: two fanout and one topic (a topic exchange
>>>>>>>    overview is attached)
>>>>>>>    - 46 total channels
>>>>>>>
>>>>>>>
>>>>>>> App servers:
>>>>>>>
>>>>>>>    - m1.large Tomcat servers running a Grails application
>>>>>>>    - 2-7 servers at any one time
>>>>>>>    - Consume + publish
>>>>>>>    - On busy queues, each server has 16 consumers with prefetch at 100
>>>>>>>    - Message sizes on busy queues are ~4KB
>>>>>>>    - Publishing rates on the busiest queue range from 16/s to >100/s
>>>>>>>    (we need to be able to support 1000/s)
>>>>>>>
>>>>>>>
>>>>>>> Each app server connects to a sharded MongoDB cluster of 3 shards.
>>>>>>> Our first suspicion was that something in Mongo or AWS was causing the
>>>>>>> periodic delay, but AWS techs looked into our volume use and said we were
>>>>>>> only using 25% of available bandwidth.
>>>>>>>
>>>>>>> At this moment, we have a modest publish rate (~50-60/s) but a
>>>>>>> backlog of 50,000 messages for the queue "user". You can see a 10-minute
>>>>>>> snapshot of the queue and see the cycling.
>>>>>>>
>>>>>>> I turned on tracing, but the results don't seem to be coming into the
>>>>>>> log. Is there another way to enable reporting of flow control?
>>>>>>>
>>>>>>> Mike Templeman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 13, 2013 at 6:03 AM, Alvaro Videla-2 [via RabbitMQ] <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Mike,
>>>>>>>>
>>>>>>>> Would you be able to provide more information to help us debug the
>>>>>>>> problem?
>>>>>>>>
>>>>>>>> Tim (from the RabbitMQ team) requested more info in order to try to
>>>>>>>> find answers for this.
>>>>>>>>
>>>>>>>> For example, when consumption drops to zero, are there any logs on
>>>>>>>> the RabbitMQ server that might indicate a flow-control mechanism being
>>>>>>>> activated?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Alvaro
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Dec 13, 2013 at 2:19 AM, MikeTempleman <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> > Tyson
>>>>>>>> >
>>>>>>>> > Did you ever find an answer to this question? We are encountering
>>>>>>>> virtually
>>>>>>>> > the exact same problem.
>>>>>>>> >
>>>>>>>> > We have a variable number of servers set up as producers and
>>>>>>>> > consumers and see our throughput drop to zero on a periodic basis.
>>>>>>>> > This is most severe when there are a few hundred thousand messages
>>>>>>>> > on Rabbit.
>>>>>>>> >
>>>>>>>> > Did you just drop Rabbit? Ours is running on an m1.large instance
>>>>>>>> > with RAID0 ephemeral drives, so the size and performance of the disk
>>>>>>>> > subsystem are not an issue (we are still in beta). We have spent
>>>>>>>> > untold hours tuning our sharded MongoDB subsystem only to find out
>>>>>>>> > that it is only 25% utilized (at least it will be blazing fast if we
>>>>>>>> > ever figure this out).
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>

