[rabbitmq-discuss] Clarification on semantics of publisher confirms
Simone Busoli
simone.busoli at gmail.com
Mon Jan 16 22:12:21 GMT 2012
Hi, while load-testing the Federation plugin with publisher confirms, I
noticed some behavior I wasn't expecting after the link between two brokers
with a federated exchange went down:
- The link stayed down for several hours, and around 100k messages
accumulated on the upstream broker
- Clients kept publishing messages to the upstream at a rate of 10/s
throughout the network failure
- The downstream had 5 channels, each with a consumer consuming from its
own queue
- Every queue/exchange is durable, messages are persistent, and autoack is
off on the clients
- Both the federation and the client channels use the default unbounded
prefetch (a client-side sketch of this setup follows the list)
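For reference, here is roughly what each consumer looks like; this is a
minimal pika sketch, with the host, queue name and callback body as
placeholders rather than our actual code:

    import pika

    # Placeholder host; each of the 5 consumers runs one channel like this.
    connection = pika.BlockingConnection(
        pika.ConnectionParameters("downstream-host"))
    channel = connection.channel()

    # Durable queue, as in the setup above (name is a placeholder).
    channel.queue_declare(queue="load.test.q1", durable=True)

    # We left prefetch unbounded; a basic_qos call like the one below
    # would instead bound the unacked messages outstanding per consumer.
    # channel.basic_qos(prefetch_count=100)

    def on_message(ch, method, properties, body):
        # ... process the message ...
        ch.basic_ack(delivery_tag=method.delivery_tag)  # autoack is off

    channel.basic_consume(queue="load.test.q1",
                          on_message_callback=on_message,
                          auto_ack=False)
    channel.start_consuming()

Bounding the prefetch with basic_qos is the obvious client-side knob, but
the question below is about what the brokers themselves do.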
Once the connection was restored I noticed several things:
- the upstream started delivering messages to the downstream, apparently
overwhelming it, since its CPU stayed at 100% for several minutes
- none of the clients connected to the downstream received anything for
quite some time; I'm not sure exactly when they started receiving messages
- the management UI kept showing lots of undelivered and unconfirmed
messages on the federation outbound queue
After around two hours the upstream broker finished delivering all the
messages to the downstream, and the downstream confirmed all of them. The
clients are currently still catching up with the backlog.
Any insight into what RabbitMQ was actually doing during this time is
appreciated, but I am specifically interested in how publisher confirms
behave in general. From the docs:
Persistent messages are confirmed when all queues have either delivered the
message and received an acknowledgement (if required), or persisted the
message
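To make the publishing side concrete, this is roughly how confirms are used
there; again a minimal pika sketch, with the host, exchange and routing key
as placeholders. With a BlockingConnection, confirm_delivery() puts the
channel into confirm mode and each basic_publish then waits for the
broker's confirm:

    import pika
    import pika.exceptions

    connection = pika.BlockingConnection(
        pika.ConnectionParameters("upstream-host"))
    channel = connection.channel()
    channel.confirm_delivery()  # put the channel into confirm mode

    # Durable exchange, as in the setup above (name is a placeholder).
    channel.exchange_declare(exchange="federated.ex",
                             exchange_type="topic",
                             durable=True)

    try:
        channel.basic_publish(
            exchange="federated.ex",
            routing_key="load.test",
            body=b"payload",
            properties=pika.BasicProperties(delivery_mode=2),  # persistent
            mandatory=True,
        )
        # Reaching this point means the broker confirmed the message.
    except pika.exceptions.UnroutableError:
        pass  # mandatory=True and no queue was bound: message returned
    except pika.exceptions.NackError:
        pass  # the broker refused responsibility for the message

Per the quoted rule, a persistent message's confirm should arrive once
every queue that received it has either persisted it, or delivered it and
seen it acked.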
What is not clear to me is whether one or more slow consumers, as in this
case, can slow down the entire federation because the downstream broker
waits for their acknowledgements of delivered messages, which they cannot
give promptly while they are still catching up with the backlog. So if the
federation uses publisher confirms, and the downstream does not confirm
messages to the upstream because the clients have not all acknowledged
them, then the upstream will also be slowed down and its outbound queue
will not be emptied until the consumers on the downstream ack their
messages. If this is the case, it seems a bit odd that slow consumers on
one broker can also affect what happens on another broker.
When and how does the broker decide whether to confirm messages because
they were "delivered and acked" or because they were "persisted"? I would
prefer it to confirm them when persisting them, rather than when delivering
them to clients which cannot acknowledge them in time.
Thanks