[rabbitmq-discuss] Consumer setup performance / starvation issue

flzz nominal at gmail.com
Mon Jun 20 19:25:09 BST 2011


Greetings, I'm in the process of vetting RabbitMQ for a high-throughput
event bus. In testing various durability cases I've observed some
behavior that I can't explain. I'll start by elaborating on the setup.

First, I'm using a very simple direct exchange producer/consumer
environment, a la http://www.rabbitmq.com/tutorials/tutorial-two-python.html

I have a single broker and a single worker machine, both physical boxes
with identical hardware. I am using Python and kombu for the
producers/consumers.

To keep things as simple as possible for testing, I'm publishing a
"Hello World!" message as fast as possible into the queue (I see upwards
of 20k messages/sec being published with 4 producers). The publishing
works well enough.

There are a few ways to create (what I think is) the problematic
condition, but the common thread with all of them is queued messages (on
a durable or non-durable queue).

The steps to reproduce this are as follows. They simulate a very
controlled condition of worker failure. The problem arises when
workers/consumers begin to establish new connections after all workers
have failed.

1. Start a single producer, let it queue 500,000 messages or so, then
   stop it.
2. Start up 16 workers to consume the queue (workers have a prefetch of
   1).

The first 7 to 8 workers will successfully set up and start processing
messages; the other 8 to 9 workers will sit in a wait state while trying
to set themselves up as consumers. Only after all the messages in the
queue are processed will the rest of the workers successfully establish
themselves as consumers.

Here is a log snippet generated by the benchmarking built into the
simple worker I'm using. It shows 8 different workers that were finally
able to establish themselves as consumers after the queue was fully
processed.

DEBUG:amqp_worker:create_consumer took 53064.931 ms
DEBUG:amqp_worker:create_consumer took 52985.335 ms
DEBUG:amqp_worker:create_consumer took 51774.898 ms
DEBUG:amqp_worker:create_consumer took 54275.732 ms
DEBUG:amqp_worker:set_qos took 0.437 ms
DEBUG:amqp_worker:register_callback took 0.011 ms
DEBUG:amqp_worker:create_consumer took 55493.148 ms
DEBUG:amqp_worker:create_consumer took 54205.868 ms
DEBUG:amqp_worker:create_consumer took 51856.420 ms
DEBUG:amqp_worker:create_consumer took 55415.933 ms
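
To put numbers on the log lines above, here is a minimal sketch that groups the timings by step and summarizes the create_consumer waits. The regular expression and the `timings_by_step` helper are illustrative assumptions, not part of the worker itself.

```python
import re
import statistics

# The log lines quoted above, verbatim.
LOG = """\
DEBUG:amqp_worker:create_consumer took 53064.931 ms
DEBUG:amqp_worker:create_consumer took 52985.335 ms
DEBUG:amqp_worker:create_consumer took 51774.898 ms
DEBUG:amqp_worker:create_consumer took 54275.732 ms
DEBUG:amqp_worker:set_qos took 0.437 ms
DEBUG:amqp_worker:register_callback took 0.011 ms
DEBUG:amqp_worker:create_consumer took 55493.148 ms
DEBUG:amqp_worker:create_consumer took 54205.868 ms
DEBUG:amqp_worker:create_consumer took 51856.420 ms
DEBUG:amqp_worker:create_consumer took 55415.933 ms
"""

# Assumed line shape: DEBUG:amqp_worker:<step> took <milliseconds> ms
PATTERN = re.compile(r"^DEBUG:amqp_worker:(\w+) took ([\d.]+) ms$")


def timings_by_step(text):
    """Group the millisecond timings found in text by step name."""
    steps = {}
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            steps.setdefault(match.group(1), []).append(float(match.group(2)))
    return steps


steps = timings_by_step(LOG)
waits = steps["create_consumer"]
print(
    f"{len(waits)} consumers: min {min(waits):.0f} ms, "
    f"max {max(waits):.0f} ms, mean {statistics.mean(waits):.0f} ms"
)
```

Every create_consumer call in the snippet took between roughly 52 and 55 seconds, while set_qos and register_callback completed in well under a millisecond.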

To be clear, create_consumer is effectively the same as:

    channel.basic_consume(callback, queue='task_queue')
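
For anyone reproducing the measurements, here is a minimal sketch of how "took N ms" lines in this form could be produced. It assumes Python's standard logging module with a logger named amqp_worker; the `timed` decorator and the `create_consumer` wrapper are hypothetical stand-ins, not the actual worker code.

```python
import functools
import logging
import time

log = logging.getLogger("amqp_worker")


def timed(func):
    """Log how long each call to func takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even if func raises, so stalled or failed calls show up.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            log.debug("%s took %.3f ms", func.__name__, elapsed_ms)
    return wrapper


@timed
def create_consumer(channel, callback, queue="task_queue"):
    # Hypothetical wrapper mirroring the basic_consume call quoted above.
    # Other client library versions may use a different argument order.
    return channel.basic_consume(callback, queue=queue)
```

With `logging.basicConfig(level=logging.DEBUG)`, the default format is `LEVEL:logger:message`, which matches the shape of the log lines shown earlier.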

At this point I should mention that I can't find any documentation that
would indicate this is normal.

Now for a few more details about what I've tried in order to eliminate
possible causes. I've used Kombu with the pyamqplib transport, pika
directly without Kombu, and pyamqplib directly without Kombu, against
RabbitMQ versions 2.4.1 and 2.5.0. All exhibited the same behavior as
described above. It really seems to be the broker itself. I've also
tried various config tweaks for Erlang and RabbitMQ, i.e., setting the
async thread pool to 0, disabling kernel polling, disabling all
management plugins/stats collection, and setting consumer prefetch to 0
as well as 10, all to no avail.

Another item worth mentioning is that rabbitmqctl list_channels exhibits
the same wait while workers are processing the queued messages. I can
provide further details and all the code I'm using for
producers/consumers if needed, but this issue can be reproduced with the
examples from http://www.rabbitmq.com/tutorials/tutorial-two-python.html
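
One way to check for the rabbitmqctl list_channels stall without tying up a shell is to run the command under a timeout. This is a minimal sketch using only the Python standard library; the `run_with_timeout` helper and the 10-second limit in the usage note are illustrative assumptions.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, stdout).

    finished is False if the command did not exit within timeout_s
    seconds, in which case the child process is killed.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, result.stdout
```

For example, `run_with_timeout(["rabbitmqctl", "list_channels"], 10)` would return `(False, "")` if the command is still waiting after 10 seconds.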

I really hope I'm missing something simple, any insights would be much
appreciated!

Here are some additional details about the broker OS, status & environment:

Ubuntu lucid 10.04.2

Application environment of node rabbit at broker2 ...
[{auth_backends,[rabbit_auth_backend_internal]},
 {auth_mechanisms,['PLAIN','AMQPLAIN']},
 {backing_queue_module,rabbit_variable_queue},
 {cluster_nodes,[]},
 {collect_statistics,coarse},
 {default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
 {default_user,<<"guest">>},
 {default_user_is_admin,true},
 {default_vhost,<<"/">>},
 {delegate_count,16},
 {frame_max,131072},
 {included_applications,[]},
 {msg_store_file_size_limit,16777216},
 {msg_store_index_module,rabbit_msg_store_ets_index},
 {persister_hibernate_after,10000},
 {persister_max_wrap_entries,500},
 {queue_index_max_journal_entries,262144},
 {server_properties,[]},
 {ssl_listeners,[]},
 {ssl_options,[]},
 {tcp_listen_options,[binary,
                      {packet,raw},
                      {reuseaddr,true},
                      {backlog,128},
                      {nodelay,true},
                      {exit_on_close,false}]},
 {tcp_listeners,[{"0.0.0.0",5672}]},
 {trace_vhosts,[]},
 {vm_memory_high_watermark,0.8}]
...done.

Status of node rabbit at broker2 ...
[{pid,15744},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","2.5.0"},
      {webmachine,"webmachine","1.7.0-rmq2.5.0-hg0c4b60a"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","2.5.0"},
      {amqp_client,"RabbitMQ AMQP Client","2.5.0"},
      {rabbit,"RabbitMQ","2.5.0"},
      {os_mon,"CPO  CXC 138 46","2.2.5"},
      {sasl,"SASL  CXC 138 11","2.1.9.2"},
      {rabbitmq_mochiweb,"RabbitMQ Mochiweb Embedding","2.5.0"},
      {mochiweb,"MochiMedia Web Server","1.3-rmq2.5.0-git9a53dbd"},
      {inets,"INETS  CXC 138 49","5.4"},
      {mnesia,"MNESIA  CXC 138 12","4.4.14"},
      {stdlib,"ERTS  CXC 138 10","1.17"},
      {kernel,"ERTS  CXC 138 10","2.14"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang R14A (erts-5.8) [source] [64-bit] [smp:16:16] [rq:16] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,57446128},
      {processes,12534864},
      {processes_used,12528176},
      {system,44911264},
      {atom,1317185},
      {atom_used,1283859},
      {binary,110448},
      {code,15152068},
      {ets,8861792}]}]
...done.

Cheers,
Etrik

