[rabbitmq-discuss] Lower delivery rate than publish rate - why?
MikeTempleman
mike at meshfire.com
Sat Dec 14 02:15:41 GMT 2013
I realized that was a bad interpretation. Sorry. The exchange is just
successfully routing all the messages to the target queues.
After reading a number of posts and this blog entry (
http://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/), I wonder if
the issue is that each message is individually ack'd. The problem seemed to
occur when I had a large backlog in the queues: when Rabbit is empty,
performance is fine, but when the consumers tried to run at much higher
speeds we encountered this cycling.
We ran a brief test with no-ack (not on production), and performance was
excellent even under load. But without a full redesign this is not a viable
solution: app servers crash, and autoscaling shuts down servers that have
prefetched messages and are still connected to Rabbit, so those deliveries
would be lost.
Assuming each queue is handled by a single thread of execution (which I
assume covers receipt, delivery, and ack cleanup), I can understand what
might happen when the consumers generate ~500 acks/s while new messages
arrive at only 50-100/s on a specific queue. I will move some events that
tend to generate peaks into their own queue and accept that that queue is
processed more slowly. As for splitting up the real worker queue, I suppose
I could create 2 or so static queues to divide the load.
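A publish-side sketch of that idea in Python: pick one of a fixed set of shard queues at random for each message. The queue names and the pika-style publish call are hypothetical, for illustration only.

```python
import random

# Hypothetical shard queue names; the real queues would be declared up front.
USER_QUEUES = ["user.0", "user.1", "user.2", "user.3"]

def pick_queue(queues, rng=random):
    """Choose a shard at random so publishes spread evenly across the queues."""
    return rng.choice(queues)

# At publish time (pika-style; needs a live broker, so shown as a comment):
# channel.basic_publish(exchange="", routing_key=pick_queue(USER_QUEUES), body=payload)
```

Each shard then gets its own queue process on the broker, so acks and deliveries for the busy workload are spread over several processes instead of one.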
So what I think I can do is:
1. Bump the default TCP buffer from 128KB to around 10MB; the added
buffering may help a little.
2. See if I can find out how to set the multiple-ack flag. We are using
Grails, so maybe that just means creating a custom bean; I don't know.
3. Create a couple of queues for lower-priority events, specifically events
chosen to be less time-critical.
4. If all that doesn't work, create 4 queues for the high-priority events,
publish to them at random, and attach consumers to each queue.
5. Upgrade the server to the latest version.
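For item 1, the broker-side buffers can be set through tcp_listen_options in rabbitmq.config; the sketch below uses the ~10MB figure from above (check the option names against the networking docs for your RabbitMQ version before deploying):

```erlang
[{rabbit, [
    {tcp_listen_options, [
        {sndbuf,  10485760},   %% ~10MB send buffer
        {recvbuf, 10485760}    %% ~10MB receive buffer
    ]}
]}].
```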
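For item 2, basic.ack carries a `multiple` flag that acknowledges every delivery up to and including the given delivery tag, so a consumer can ack in batches instead of per message. A minimal sketch of the batching logic in Python, with the actual broker call passed in as a plain callable (`ack_fn` would be something like pika's `channel.basic_ack`; the `BatchAcker` class itself is hypothetical):

```python
class BatchAcker:
    """Acknowledge every `batch_size`-th delivery with multiple=True,
    covering all earlier unacked deliveries in a single ack frame."""

    def __init__(self, ack_fn, batch_size=50):
        self.ack_fn = ack_fn          # e.g. the channel's basic_ack method
        self.batch_size = batch_size
        self.pending = 0              # deliveries seen since the last ack

    def on_delivery(self, delivery_tag):
        self.pending += 1
        if self.pending >= self.batch_size:
            self.ack_fn(delivery_tag, multiple=True)
            self.pending = 0
```

A real implementation would also flush on a timer, otherwise the last few messages stay unacked when the stream goes idle; and batch_size should stay comfortably below the prefetch count or the consumer will stall waiting for acks it never sends.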
Mike Templeman
--
*Mike Templeman*
*Head of Development*
T: @talkingfrog1950 <http://twitter.com/missdestructo>
T: @Meshfire <http://twitter.com/meshfire>
On Fri, Dec 13, 2013 at 1:42 PM, Mike Templeman <mike at meshfire.com> wrote:
> I noticed something else very odd.
>
> Currently, one queue has 43,000 messages backed up. But when I look at
> the exchange (there is only one exchange), I see that the message rate in
> exactly matches the message rate out.
>
> With such a huge backlog, why would that be? I would have thought that
> the consumers (16 in total, distributed across 4 systems for that queue,
> with a prefetch of 100) would run at a much higher steady state.
>
> This exchange also seems to cycle regularly, running from a low of around
> 60 messages/s in and out to 500+/s in and out.
>
>
>
>
> On Fri, Dec 13, 2013 at 10:40 AM, Mike Templeman <mike at meshfire.com> wrote:
>
>> Also, observing the Connections screen on the web UI shows that no flow
>> control has recently been turned on for any of the four current
>> connections (four app servers).
>>
>>
>>
>>
>> On Fri, Dec 13, 2013 at 10:17 AM, Mike Templeman <mike at meshfire.com> wrote:
>>
>>> Hi Alvaro
>>>
>>> I would be more than happy to provide logs. But all they have in them is
>>> connection and shutdown information. Nothing more. I have just enabled
>>> tracing on the vhost and will send the logs shortly. We encounter this
>>> issue when under load every day now.
>>>
>>> Let me tell you our architecture and deployment:
>>>
>>> rabbitMQ:
>>>
>>> - m1.large ec2 instance. Version: RabbitMQ 3.1.5, Erlang R14B04
>>> - 23 queues (transaction and direct)
>>> - 3 exchanges in use: two fanout and one topic
>>> - Topic exchange overview is attached.
>>> - 46 total channels.
>>>
>>>
>>> AppServers
>>>
>>> - m1.large tomcat servers running grails application
>>> - 2-7 servers at any one time.
>>> - Consume + publish
>>> - On busy queues, each server has 16 consumers with prefetch at 100
>>> - message sizes on busy queues are ~4KB.
>>> - publish rate on the busiest queue ranges from 16/s to >100/s (we
>>> need to be able to support 1000/s).
>>>
>>>
>>> Each AppServer connects to a sharded MongoDB cluster of 3 shards. Our
>>> first suspicion was that something in Mongo or AWS was causing the
>>> periodic delay, but AWS techs looked into our volume usage and said we
>>> were only using 25% of available bandwidth.
>>>
>>> At this moment, we have a modest publish rate (~50-60/s) but a backlog
>>> of 50,000 messages for the queue "user". The attached 10-minute snapshot
>>> of the queue shows the cycling.
>>>
>>> I turned on tracing, but the results don't seem to be coming into the
>>> log. Is there another way to enable reporting of flow control?
>>>
>>> Mike Templeman
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Dec 13, 2013 at 6:03 AM, Alvaro Videla-2 [via RabbitMQ] <
>>> ml-node+s1065348n32063h90 at n5.nabble.com> wrote:
>>>
>>>> Mike,
>>>>
>>>> Would you be able to provide more information to help us debug the
>>>> problem?
>>>>
>>>> Tim (from the RabbitMQ team) requested more info in order to try to
>>>> find answers for this.
>>>>
>>>> For example, when consumption drops to zero, are there any logs on the
>>>> rabbitmq server that might tell of a flow control mechanism being
>>>> activated?
>>>>
>>>> Regards,
>>>>
>>>> Alvaro
>>>>
>>>>
>>>> On Fri, Dec 13, 2013 at 2:19 AM, MikeTempleman <[hidden email]> wrote:
>>>>
>>>> > Tyson
>>>> >
>>>> > Did you ever find an answer to this question? We are encountering
>>>> virtually
>>>> > the exact same problem.
>>>> >
>>>> > We have a variable number of servers set up as producers and
>>>> > consumers and see our throughput drop to zero on a periodic basis.
>>>> > This is most severe when there are a few hundred thousand messages
>>>> > on Rabbit.
>>>> >
>>>> > Did you just drop Rabbit? Ours is running on an m1.large instance
>>>> > with RAID0 ephemeral drives, so the size and performance of the disk
>>>> > subsystem are not an issue (we are still in beta). We have spent
>>>> > untold hours tuning our sharded MongoDB subsystem only to find out
>>>> > that it is only 25% utilized (at least it will be blazing fast if we
>>>> > ever figure this out).
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>
>