[rabbitmq-discuss] Extremely uneven distribution of latencies under high load
ekirpichov at gmail.com
Mon Aug 1 12:27:21 BST 2011
Forgot to say: I'm on RabbitMQ 2.5.1 on Windows, with Erlang R14B02.
2011/8/1 Eugene Kirpichov <ekirpichov at gmail.com>:
> I've got a large cluster (several hundred machines) "load-testing" a
> smaller RabbitMQ cluster (4 machines: 1 on one rack, 3 on another; 8
> queues) in a roughly balanced and partitioned fashion (I probably don't
> even need an actual RabbitMQ cluster).
> I'm publishing about 6000 messages/s, each about 10kb in size, in
> round-robin to these 4 nodes.
> Messages are persistent and durable; I have publisher confirms turned
> on. Autoack is turned off; I have a cycle of "get, process, publish
> result, ack". Each message takes ~800ms to process.
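A back-of-the-envelope check of what these figures imply, as a minimal sketch. It assumes "10kb" means 10 kilobytes and that publishes are spread evenly over the 4 nodes; neither is stated above, and the variable names are illustrative only.

```python
# Rough load arithmetic from the figures quoted above.
# Assumptions (not stated in the post): "10kb" = 10 kilobytes (10 * 1024 bytes),
# and publishes are spread evenly across the 4 nodes.

publish_rate = 6000          # messages per second, cluster-wide
message_size = 10 * 1024     # bytes per message (assumed kilobytes)
nodes = 4
processing_time = 0.8        # seconds per message (~800 ms)

per_node_rate = publish_rate / nodes                 # messages/s per broker
inbound_bytes_per_s = publish_rate * message_size    # payload bytes/s, cluster-wide
# Little's law: messages in processing = arrival rate * time per message.
busy_consumers = publish_rate * processing_time

print(f"per-node publish rate: {per_node_rate:.0f} msg/s")
print(f"inbound payload: {inbound_bytes_per_s / 2**20:.1f} MiB/s")
print(f"messages in processing at steady state: {busy_consumers:.0f}")
```

If consumers keep pace, about 4800 messages are in processing at any instant, so several hundred machines would each need to run multiple consumers. That is an inference: the post does not say how many consumers run per machine, and the result messages published back are not counted here.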
> I'm experiencing things like "rabbitmq not giving a message to a
> consumer for several minutes", i.e. a large portion of my cluster is
> idle waiting for messages.
> RabbitMQ is also dropping connections rather frequently, but I'm reconnecting.
> The distribution of message waiting times is like "75% < 50ms, 90% <
> 1s, 99% < 60s, 100% < 180s".
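A summary of this shape can be computed from raw per-message waiting times roughly as follows. This is a minimal sketch: `fraction_below` and the sample values are hypothetical, not the actual measurements.

```python
# Summarise waiting times as "fraction of messages that waited less than T".
# The sample data is made up for illustration; it is not the measured data.

def fraction_below(waits_s, threshold_s):
    """Return the fraction of samples strictly below threshold_s (seconds)."""
    if not waits_s:
        raise ValueError("no samples")
    return sum(1 for w in waits_s if w < threshold_s) / len(waits_s)

# Hypothetical samples, in seconds.
waits = [0.01, 0.02, 0.03, 0.5, 45.0, 0.04, 0.02, 120.0]

for threshold in (0.05, 1.0, 60.0, 180.0):
    share = fraction_below(waits, threshold)
    print(f"{share:.0%} of messages waited < {threshold:g}s")
```

Computing the same fractions per consumer, rather than over all consumers at once, would separate the well-served consumers from the starved ones.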
> Some consumers are served very well (a constant stream of messages),
> some wait for minutes. Some have periods of both kinds.
> The ones that are served really well don't experience reconnects,
> though the poor ones also have only a few reconnects per minute at most.
> I also graphed the publish confirmations from RabbitMQ (actually the
> number of unack'd messages remaining) and I'm seeing that most of the
> brokers keep this number under a couple of thousand for all queues;
> however, one broker (the one alone on its rack) has it grow
> seemingly without bound, to ~25k and more, again on all queues.
> 1) Is this expected behavior?
> 2) How can this behavior be explained? Are there any
> performance-related quirks in clustered RabbitMQ?
> 3) Is it likely that I'll be able to get rid of this behavior and make
> it at least more predictable by e.g. reconnecting if I don't receive a
> message for, say, 15s?
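The idea in question 3 could be sketched as an idle watchdog around the get loop. This is only a sketch under assumptions: `IdleWatchdog` is a hypothetical helper, not part of any RabbitMQ client, and the actual polling and reconnecting are left to whatever client library is in use. Whether reconnecting helps at all is exactly what the question asks, so this is not a recommendation.

```python
import time

class IdleWatchdog:
    """Tracks time since the last received message (hypothetical helper)."""

    def __init__(self, idle_limit_s=15.0, clock=time.monotonic):
        self.idle_limit_s = idle_limit_s
        self._clock = clock
        self._last_message_at = clock()

    def message_received(self):
        # Call whenever a get returns a message.
        self._last_message_at = self._clock()

    def reconnected(self):
        # Call after a reconnect so the idle timer starts over.
        self._last_message_at = self._clock()

    def should_reconnect(self):
        # True once no message has arrived for at least idle_limit_s seconds.
        return self._clock() - self._last_message_at >= self.idle_limit_s
```

In the consume loop, one would call `message_received()` after each successful get, and when `should_reconnect()` returns true, drop the connection, reconnect, and call `reconnected()`.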
> Eugene Kirpichov
> Principal Engineer, Mirantis Inc. http://www.mirantis.com/
> Editor, http://fprog.ru/
Principal Engineer, Mirantis Inc. http://www.mirantis.com/