[rabbitmq-discuss] flow control issues

Marek Majkowski majek04 at gmail.com
Wed Sep 15 15:11:36 BST 2010


On Tue, Sep 14, 2010 at 09:07, romary.kremer at gmail.com
<romary.kremer at gmail.com> wrote:
>> The flow control was heavily modified between 1.8.1 and 2.0.0. In summary:
>> - 1.8.1 - we sent a Channel.flow AMQP message to every channel once
>>  rabbit reached the memory limit
>> - 2.0.0 - once we reach the memory limit, the connections from which we
>>  hear publishes are paused temporarily: we stop reading bytes from their
>>  tcp sockets. That 'stop' shouldn't take too long, as data should be
>>  swapped out to disk and memory pressure will drop pretty quickly.
>
> Do you mean that in 2.0.0 the Channel.flow AMQP message is no longer sent
> to the producers that are stopped temporarily? That would explain why
>        1) Channel.publish() can block on the client side when the
>        broker stops reading from the socket!
>        2) FlowListener.handleFlow() is no longer invoked on the registered
>        listener when the alarm handler is set or cleared.
> Are my deductions right?

Yes. You will never hear FlowListener.handleFlow(), and channel.publish
may block (though I would need to consult the sources to be sure).
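
To make it concrete, here is a rough, untested sketch of the publisher side
(host, exchange and routing key names are made up). Under 2.0.0 there is no
channel.flow round-trip any more: the broker simply stops reading from the
socket, so once the TCP buffers fill up the basicPublish call itself can
block inside the socket write:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class PausablePublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");   // assumption: broker on localhost
            Connection conn = factory.newConnection();
            Channel channel = conn.createChannel();

            byte[] payload = new byte[4096];   // arbitrary payload size
            while (true) {
                // While the memory alarm is on, the broker stops reading from
                // this connection's socket. The send buffers fill up and this
                // call can block right here - and no FlowListener.handleFlow()
                // callback is fired any more.
                channel.basicPublish("my-exchange", "my-key", null, payload);
            }
        }
    }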

> Do you have any figures to quantify "shouldn't take too long"? Are there
> any test reports available about that major change?

That's the question we really avoided :) Oh well. No, we haven't done any
'real' tests; it's only based on our intuition and experience. In most
cases the blocking goes away pretty quickly - usually after about 30
seconds, sometimes after a couple of minutes.

But it is possible to create a very pessimistic environment in which the
memory usage will not drop - and the connection could be stuck for a long
time (though that's pretty unlikely).

> Sorry if I wasn't clear in the previous post: we are already on 2.0.0 for
> both the broker and the client library.

Good.

>>> It looks like no listener is called back when the alarm handler is set
>>> or cleared, while the producers are still being paused / resumed
>>> as they are supposed to be.
>>
>> Interesting. Maybe we have a race there? Or maybe you're blocking
>> the main java client thread? (nothing blocking should be done from
>> the main thread)
>
> I am quite sure I am not blocking the main thread, nor the Connection
> thread. All the message-related logic is in a dedicated thread (some kind
> of ProducerGroup pool of threads, actually).
> Consumer callbacks run within the Connection thread, if I refer to
> the Javadoc!
>
> With the same code using library version 1.8.1, the callbacks were
> invoked when the alarm handler was set or cleared anyway.
>>
>>>> during long-running tests, we have encountered strange behaviour due to
>>>> flow control:
>>>>
>>>> The queue depth starts to increase linearly for about 2 hours; this is
>>>> coherent, since the message throughput of the single consumer
>>>> is not enough to absorb the message ingress. Memory occupation grows
>>>> faster as well, until the memory watermark is reached on the broker side.
>>
>> Are you sure your consumer is ACK-ing the messages it received?
>
> The consumer callback does ACK messages upon reception, one at a time
> (multiple == false).
> Is the basic.ack() method liable to be blocked by flow control, just like
> publish()?

Well, under the current implementation of flow control - yes. It's the
whole tcp/ip connection that gets blocked, so it will affect any command,
including basic.ack.

What we usually propose is to use one tcp/ip connection for publishing and
a separate one for receiving. Under memory pressure we only block the
publishers, so by using a connection solely for receiving you can be sure
it will never be blocked.
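
Something along these lines - a rough, untested sketch, where the host,
exchange and queue names are made up, and where DefaultConsumer could be
swapped for QueueingConsumer depending on what your client version offers:

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;

    import java.io.IOException;

    public class SplitConnections {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");   // assumption: broker on localhost

            // Connection used only for publishing - this is the one the
            // broker may stop reading from when the memory alarm goes off.
            Connection pubConn = factory.newConnection();
            Channel pubChannel = pubConn.createChannel();

            // Separate connection used only for consuming and acking. Since
            // no publishes travel over it, flow control never blocks it.
            Connection subConn = factory.newConnection();
            final Channel subChannel = subConn.createChannel();

            subChannel.basicConsume("my-queue", false,
                    new DefaultConsumer(subChannel) {
                @Override
                public void handleDelivery(String consumerTag,
                                           Envelope envelope,
                                           AMQP.BasicProperties properties,
                                           byte[] body) throws IOException {
                    // ack one message at a time (multiple == false)
                    subChannel.basicAck(envelope.getDeliveryTag(), false);
                }
            });

            // Publishes go over the other connection; only this side can
            // ever be blocked by the broker.
            pubChannel.basicPublish("my-exchange", "my-key", null,
                                    "hello".getBytes());
        }
    }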

>>>> From that point on, the producers are indeed paused, as a flow control
>>>> request has been issued by the broker, but the consumer seems to be
>>>> blocked as well. The queue level flattens at its top value until the end
>>>> of the test, even when memory occupation drops back under the threshold.
>>
>> That's how 1.8.1 behaves. In 2.0.0 we introduced swapping out big queues
>> to disk, so the memory usage shouldn't be dependent on the queue size.
>
> Good news, because we had identified 2 scenarios in which memory-based
> channel flow was triggered:
>
>        - the use of SSL
>        - the use of larger messages (4 kB, same ingress)
> Now I hope that the message size will no longer be such a determining
> factor for flow control, as long as consumers are able to handle these
> messages regularly.
>
>>
>>>> By registering the FlowListener callback, we have noticed that not all
>>>> of the producers are notified every time the alarm handler is set.
>>>> Does this mean that the broker applies some heuristic to try not to
>>>> block everybody every time?
>>>> Or does it mean that some of the channels have somehow been blacklisted
>>>> by the broker?
>>
>> No, in 1.8.1 the broker should send 'channel.flow' to all the channels.
>
> Strange, then - there must be something very weird going on.
>>
>>>> Could anybody explain how the blocking of consumers is supposed to be
>>>> implemented?
>>
>> The best description is probably here:
>>  http://www.rabbitmq.com/extensions.html#memsup
>>
>> But it covers 2.0.0. I'd suggest an upgrade to 2.0.0 and monitoring
>> not only queue size but also number of unacknowledged messages
>> ('Msg unack' in status plugin). This number should be near zero.
>>
> We are already on 2.0.0.
> Where can I find some documentation about the status plugin, anyway?

I'm afraid the old blog post is the only source:
http://www.lshift.net/blog/2009/11/30/introducing-rabbitmq-status-plugin

