[rabbitmq-discuss] Lower delivery rate than publish rate - why?

MikeTempleman mike at meshfire.com
Sat Dec 21 17:02:16 GMT 2013


We are still seeing the cycling.

I wanted to ask if the round-robin scheduling could be a factor.

For example, suppose I have 4 servers with 5 channels each on the same topic
queue, and each channel has a prefetch setting of 50. When a channel's consumer
becomes blocked (slow db operation, overloaded with other requests, etc.),
does that block delivery of messages to the other channels? And if the
delivery that fills the prefetch buffer is delayed, is there any impact on the
client (i.e. does it become blocked)?

In other words, is there a way for all channels on all servers to become
blocked in a high-load situation, even if their prefetch buffers are full?
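
To make the setup concrete, here is roughly how one of those channels is
configured, as a minimal sketch with the plain RabbitMQ Java client (our real
code is wired up through Grails beans; the host here is a placeholder and the
queue name "user" is the busy queue mentioned further down this thread):

    import com.rabbitmq.client.*;

    public class PrefetchConsumer {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");          // placeholder host
            Connection conn = factory.newConnection();

            // One of the 5 channels a server would open; this channel's
            // consumer gets a prefetch window of 50 unacknowledged messages.
            final Channel channel = conn.createChannel();
            channel.basicQos(50);

            channel.basicConsume("user", false, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties, byte[] body)
                        throws java.io.IOException {
                    // A slow db call here delays acks on this channel only;
                    // once 50 deliveries are unacked, the broker stops sending
                    // to this channel until some of them are acked.
                    channel.basicAck(envelope.getDeliveryTag(), false);
                }
            });
        }
    }

The question is whether a channel sitting at its prefetch limit like this can
hold up deliveries to the other channels and servers, or whether the broker
simply skips it.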

-- 

*Mike Templeman*
*Head of Development*

T: @talkingfrog1950 <http://twitter.com/missdestructo>
T: @Meshfire <http://twitter.com/meshfire>



On Wed, Dec 18, 2013 at 10:32 AM, Mike Templeman <mike at meshfire.com> wrote:

> Well, multi-ack didn't help very much. We can see some improvement, but not
> enough to matter.
>
> We cannot use auto-ack because consumers (multiple per server) die
> unexpectedly as the app servers are autoscaled. We have not built a fully
> separated service yet (too hard to debug on development machines right
> now). But could publisher confirms resolve the issue of servers dying with
> n messages in their prefetch buffers?
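>
> For reference, turning confirms on with the plain Java client is roughly the
> following (a minimal sketch; the host, exchange, and routing key are
> placeholders, and in our app this would be wired through Grails beans rather
> than written by hand):
>
>     import com.rabbitmq.client.*;
>
>     public class ConfirmedPublisher {
>         public static void main(String[] args) throws Exception {
>             ConnectionFactory factory = new ConnectionFactory();
>             factory.setHost("localhost");      // placeholder host
>             Connection conn = factory.newConnection();
>             Channel channel = conn.createChannel();
>
>             channel.confirmSelect();           // put the channel in confirm mode
>
>             channel.basicPublish("events", "user.update", null,
>                                  "payload".getBytes("UTF-8"));
>
>             // Block until the broker confirms it has taken responsibility
>             // for everything published on this channel, or time out.
>             channel.waitForConfirmsOrDie(5000);
>
>             channel.close();
>             conn.close();
>         }
>     }
>
> My understanding is that confirms cover the publish side (the broker
> acknowledging the publisher), which is why I'm unsure they address the
> consumer-side prefetch problem.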
>
>
>
> --
>
> *Mike Templeman *
> *Head of Development*
>
> T: @talkingfrog1950 <http://twitter.com/missdestructo>
> T: @Meshfire <http://twitter.com/meshfire>
>
>
>
> On Sun, Dec 15, 2013 at 5:02 AM, Alvaro Videla-2 [via RabbitMQ] <
> ml-node+s1065348n32095h82 at n5.nabble.com> wrote:
>
>> Hi Mike,
>>
>> Yes, RabbitMQ queues are designed for fast delivery of messages and work
>> best when kept as empty as possible, as that blog post explains.
>>
>> Another interesting blog post about consumer strategies and basic.qos
>> settings is this one:
>> http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/#more-276
>>
>> Re: multi-ack: yes, that might help.
>>
>> Regards,
>>
>> Alvaro
>>
>>
>> On Sat, Dec 14, 2013 at 2:15 AM, MikeTempleman <[hidden email]> wrote:
>>
>>> I realized that was a bad interpretation. Sorry. The exchange is just
>>> successfully routing all the messages to the target queues.
>>>
>>> After reading a number of posts and this blog entry (
>>> http://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/), I wonder
>>> if the issue is that each message is ack'd individually. The issue seemed
>>> to occur when I had a large backlog in the queues. When Rabbit is empty,
>>> performance is fine. When the consumers tried to run at much higher speeds,
>>> we encountered this cycling.
>>>
>>> We have run a brief test with no-ack (not on production), and the
>>> performance is excellent even under load. But without a full redesign this
>>> is not a viable solution (app servers crash, and autoscaling shuts down
>>> servers that have prefetched messages and are still connected to Rabbit).
>>>
>>> Assuming each queue is handled by a single thread (which I assume covers
>>> receipt, delivery, and ack cleanup), I can understand what might happen
>>> when the consumers generate ~500 acks/s while new messages are coming in at
>>> a low 50-100/s rate on a specific queue. I will move some events that
>>> tend to generate peaks out into their own queue and accept that that queue
>>> will be processed more slowly. As for splitting up the real worker queue, I
>>> suppose I could create 2 or so static queues to divide the load.
>>>
>>> So what I think I can do is:
>>> 1. Bump the default TCP buffer from 128KB to around 10MB; the added
>>> buffering may help a little.
>>> 2. See if I can find out how to set the multiple-ack flag (see the sketch
>>> after this list). We are using Grails, so maybe that is just a matter of
>>> creating a custom bean; I don't know.
>>> 3. Create a couple of queues for lower-priority events, specifically
>>> events chosen to be less time critical.
>>> 4. If all that doesn't work, then probably create 4 queues for the
>>> high-priority events, publish to those queues at random, and put consumers
>>> on each queue.
>>> 5. Also, upgrade the server to the latest version.
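>>>
>>> For item 2, the raw Java-client version of a multiple ack looks roughly
>>> like this -- just a sketch to show the flag, with a made-up batch size;
>>> how to get the same behaviour out of a Grails bean is the part I still
>>> need to work out:
>>>
>>>     import com.rabbitmq.client.*;
>>>
>>>     public class MultiAckConsumer {
>>>         public static void main(String[] args) throws Exception {
>>>             ConnectionFactory factory = new ConnectionFactory();
>>>             factory.setHost("localhost");              // placeholder host
>>>             Connection conn = factory.newConnection();
>>>             final Channel channel = conn.createChannel();
>>>
>>>             channel.basicQos(100);                     // prefetch, as in production
>>>             final int batchSize = 50;                  // made-up batch size
>>>
>>>             channel.basicConsume("user", false, new DefaultConsumer(channel) {
>>>                 private long count = 0;
>>>
>>>                 @Override
>>>                 public void handleDelivery(String consumerTag, Envelope envelope,
>>>                                            AMQP.BasicProperties properties, byte[] body)
>>>                         throws java.io.IOException {
>>>                     // ... process the message ...
>>>                     if (++count % batchSize == 0) {
>>>                         // multiple=true acks this delivery and every earlier
>>>                         // unacked delivery on this channel in one basic.ack
>>>                         channel.basicAck(envelope.getDeliveryTag(), true);
>>>                     }
>>>                 }
>>>             });
>>>         }
>>>     }
>>>
>>> A real consumer would also need to ack any leftover partial batch on a
>>> timer or at shutdown so those messages don't sit unacked forever.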
>>>
>>> Mike Templeman
>>>
>>> --
>>>
>>> *Mike Templeman*
>>> *Head of Development*
>>>
>>> T: @talkingfrog1950 <http://twitter.com/missdestructo>
>>> T: @Meshfire <http://twitter.com/meshfire>
>>>
>>>
>>>
>>> On Fri, Dec 13, 2013 at 1:42 PM, Mike Templeman <[hidden email]> wrote:
>>>
>>>> I noticed something else very odd.
>>>>
>>>> Currently, one queue has 43,000 messages backed up. But when I look at
>>>> the exchange (there is only one exchange), I see that the message rate in
>>>> exactly matches the message rate out.
>>>>
>>>> With such a huge backlog, why would that be? I would have thought that
>>>> the consumers (there are 16 total distributed across 4 systems for that
>>>> queue with a prefetch of 100) would run at a much higher steady state.
>>>>
>>>> This exchange also seems to cycle regularly. It appears to run from a
>>>> low of around 60/s in and out to 500+/s in and out.
>>>>
>>>>  --
>>>>
>>>> *Mike Templeman*
>>>> *Head of Development*
>>>>
>>>> T: @talkingfrog1950 <http://twitter.com/missdestructo>
>>>> T: @Meshfire <http://twitter.com/meshfire>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 13, 2013 at 10:40 AM, Mike Templeman <[hidden email]> wrote:
>>>>
>>>>> Also, the Connections screen on the web UI shows that no flow control
>>>>> has recently been turned on for any of the four current connections (four
>>>>> app servers).
>>>>>
>>>>> --
>>>>>
>>>>> *Mike Templeman *
>>>>> *Head of Development*
>>>>>
>>>>> T: @talkingfrog1950 <http://twitter.com/missdestructo>
>>>>> T: @Meshfire <http://twitter.com/meshfire>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Dec 13, 2013 at 10:17 AM, Mike Templeman <[hidden email]> wrote:
>>>>>
>>>>>> Hi Alvaro
>>>>>>
>>>>>> I would be more than happy to provide logs, but all they contain is
>>>>>> connection and shutdown information, nothing more. I have just enabled
>>>>>> tracing on the vhost and will send the logs shortly. We encounter this
>>>>>> issue under load every day now.
>>>>>>
>>>>>> Let me tell you our architecture and deployment:
>>>>>>
>>>>>> rabbitMQ:
>>>>>>
>>>>>>    - m1.large EC2 instance. Version: RabbitMQ 3.1.5, Erlang R14B04
>>>>>>    - 23 queues (transaction and direct)
>>>>>>    - 3 exchanges used: two fanout and one topic exchange
>>>>>>    - Topic exchange overview is attached.
>>>>>>    - 46 total channels.
>>>>>>
>>>>>>
>>>>>> AppServers
>>>>>>
>>>>>>    - m1.large Tomcat servers running a Grails application
>>>>>>    - 2-7 servers at any one time.
>>>>>>    - Consume + publish.
>>>>>>    - On busy queues, each server has 16 consumers with a prefetch of 100.
>>>>>>    - Message sizes on busy queues are ~4KB.
>>>>>>    - Publishing rate on the busiest queue ranges from 16/s to >100/s.
>>>>>>    (We need to be able to support 1000/s.)
>>>>>>
>>>>>>
>>>>>> Each AppServer connects to a sharded MongoDB cluster with 3 shards. Our
>>>>>> first suspicion was that something in Mongo or AWS was causing the periodic
>>>>>> delay, but AWS techs looked into our volume use and said we were only using
>>>>>> 25% of available bandwidth.
>>>>>>
>>>>>> At this moment, we have a modest publish rate (~50-60/s) but a
>>>>>> backlog of 50,000 messages for the queue "user". You can see a 10-minute
>>>>>> snapshot of the queue, which shows the cycling.
>>>>>>
>>>>>> I turned on tracing but the results don't seem to be coming into the
>>>>>> log. Is there another way to enable reporting of flow control?
>>>>>>
>>>>>> Mike Templeman
>>>>>>
>>>>>>
>>>>>>  --
>>>>>>
>>>>>> *Mike Templeman*
>>>>>> *Head of Development*
>>>>>>
>>>>>> T: @talkingfrog1950 <http://twitter.com/missdestructo>
>>>>>> T: @Meshfire <http://twitter.com/meshfire>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 13, 2013 at 6:03 AM, Alvaro Videla-2 [via RabbitMQ] <[hidden email]> wrote:
>>>>>>
>>>>>>> Mike,
>>>>>>>
>>>>>>> Would you be able to provide more information to help us debug the
>>>>>>> problem?
>>>>>>>
>>>>>>> Tim (from the RabbitMQ team) requested more info in order to try to
>>>>>>> find answers for this.
>>>>>>>
>>>>>>> For example, when consumption drops to zero, are there any logs on the
>>>>>>> RabbitMQ server that might indicate a flow control mechanism being
>>>>>>> activated?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Alvaro
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 13, 2013 at 2:19 AM, MikeTempleman <[hidden email]> wrote:
>>>>>>>
>>>>>>> > Tyson
>>>>>>> >
>>>>>>> > Did you ever find an answer to this question? We are encountering
>>>>>>> > virtually the exact same problem.
>>>>>>> >
>>>>>>> > We have a variable number of servers set up as producers and
>>>>>>> > consumers and see our throughput drop to zero on a periodic basis.
>>>>>>> > This is most severe when there are a few hundred thousand messages
>>>>>>> > on rabbit.
>>>>>>> >
>>>>>>> > Did you just drop Rabbit? Ours is running on an m1.large instance
>>>>>>> > with RAID0 ephemeral drives, so the size and performance of the disk
>>>>>>> > subsystem are not an issue (we are still in beta). We have spent
>>>>>>> > untold hours tuning our sharded MongoDB subsystem only to find out
>>>>>>> > that it is only being 25% utilized (at least it will be blazing fast
>>>>>>> > if we ever figure this out).
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>





