[rabbitmq-discuss] flow control issues

Tue Sep 14 09:07:20 BST 2010

Thanks Marek for your reply.

On 10 Sept. 2010 at 15:06, Marek Majkowski wrote:

> Romary,
>
> First, thanks a lot for the feedback. For example, the information
> about SSL memory usage is indeed very interesting. (If that is a big
> problem for you, you can always fall back to the old technique of
> using stunnel.)

Stunnel is not being considered for now. We would rather hear your
opinion on, and knowledge of, a potential memory leak due to the use of
SSL (maybe from the Erlang stack itself). It seems that memory keeps
growing even when the broker has no activity other than open
connections. To confirm this, we ran a test in which we just opened
10,000 connections and did nothing for 2 hours except observe the
memory occupation on the broker side.

> The flow control was heavily modified between 1.8.1 and 2.0.0. In
> summary:
> - 1.8.1 - we sent a Channel.flow AMQP message to everyone once
>   rabbit reached the memory limit
> - 2.0.0 - once we reach the memory limit, the connections from which
>   we hear publishes are stopped temporarily. We stop receiving bytes
>   from the TCP sockets. That 'stop' shouldn't take too long, as data
>   should be swapped out to disk and memory pressure will drop pretty
>   quickly.

Do you mean that in 2.0.0 the Channel.flow AMQP message is no longer
sent to the producers that are stopped temporarily? That would explain
why:
	1) Channel.publish() can block on the client side when the broker
	stops reading from the socket!

	2) FlowListener.handleFlow() is no longer invoked on the registered
	listener when the alarm handler is set or cleared.
Are my deductions right?
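
To illustrate deduction 1) independently of RabbitMQ, here is a minimal
sketch in plain JDK Java (no AMQP client involved). It assumes a local
peer that accepts a TCP connection and then never reads, and it measures
how many bytes a non-blocking writer can push before the kernel buffers
fill up. The class and method names (BackpressureDemo,
bytesAcceptedBeforeStall) are hypothetical, and the 200 ms stall
threshold is an arbitrary choice. A blocking write, which is presumably
what a publish ends up doing, would simply hang at that point.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Sketch (hypothetical names): a peer that stops reading stalls the writer. */
class BackpressureDemo {

    /**
     * Connects to a local server that accepts but never reads, then writes
     * in non-blocking mode until no byte has been accepted for ~200 ms.
     * Returns the number of bytes the kernel buffered before the stall.
     */
    static long bytesAcceptedBeforeStall() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel neverRead = server.accept()) {   // accepted, never read
                client.configureBlocking(false);
                ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
                long total = 0;
                int idleRounds = 0;
                while (idleRounds < 20) {          // 20 x 10 ms without progress
                    chunk.clear();                 // offer the whole chunk again
                    int n = client.write(chunk);   // 0 means the send buffer is full
                    if (n > 0) {
                        total += n;
                        idleRounds = 0;
                    } else {
                        idleRounds++;
                        Thread.sleep(10);
                    }
                }
                return total;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes buffered before the writer stalled: "
                + bytesAcceptedBeforeStall());
    }
}
```

The exact byte count depends on the operating system's socket buffer
sizes, so only the fact that the writer eventually stalls is meaningful.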

Do you have any figures to quantify "shouldn't take too long"? Are
there any test reports available about that major evolution?

>
>
> On Wed, Sep 8, 2010 at 11:49, Romary Kremer  
> <romary.kremer at gmail.com> wrote:
>> I've started playing a bit with the latest release 2.0.0 and I'm
>> afraid that it looks like there are some regressions, or at least
>> some semantic updates.
>
> It's best if you upgrade both the server and the client library. Do
> you have any particular problems? A lot was changed in 2.0.0 but we
> think it's fully functional. If you find something that blocks you
> from migrating, you could report a bug.

Sorry if I wasn't clear in the previous post: we are already on 2.0.0
for both the broker and the client library.

>
>> It looks like no listener is called back when the alarm handler is
>> set or cleared, while the producers are still paused / resumed
>> as they are supposed to be.
>
> Interesting. Maybe we have a race there? Or maybe you're blocking
> the main java client thread? (nothing blocking should be done from
> the main thread)

I am quite sure I am not blocking the main thread, nor the Connection
thread. All the message-related logic is in a separate thread (some
kind of ProducerGroup pool of threads, actually).
Consumer callbacks run within the Connection thread, according to the
Javadoc!

With the same code using library version 1.8.1, the callbacks were
invoked whenever the alarm handler was set or cleared.

>
>>> During long-running tests, we have encountered strange behaviour
>>> due to flow control:
>>>
>>> The queue depth starts to increase linearly for about 2 hours; this
>>> is consistent, since the message throughput of the single consumer
>>> is not enough to absorb the message ingress. Memory occupation
>>> grows faster as well, until the memory watermark is reached on the
>>> broker side.
>
> Are you sure your consumer is ACK-ing the messages it received?

The Consumer callback does ACK messages upon reception, one at a time
(multiple == false).
Can the basic.ack() method be blocked under flow control, in the same
way as publish()?
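
For reference, the difference between acknowledging with multiple ==
false and multiple == true can be sketched as below. This is a plain
Java model of the bookkeeping only, not the RabbitMQ client API:
AckTracker and its methods are hypothetical names, and the sketch says
nothing about whether basic.ack() itself can block. The remaining count
corresponds to the number of unacknowledged messages.

```java
import java.util.TreeSet;

/** Sketch (hypothetical names): per-channel bookkeeping of unacked deliveries. */
class AckTracker {
    // Delivery tags received by the consumer but not yet acknowledged.
    private final TreeSet<Long> unacked = new TreeSet<>();

    /** Record a delivery that has not been acknowledged yet. */
    void delivered(long deliveryTag) {
        unacked.add(deliveryTag);
    }

    /**
     * Apply an ack. With multiple == false only the given tag is settled;
     * with multiple == true every outstanding tag up to and including it is.
     */
    void ack(long deliveryTag, boolean multiple) {
        if (multiple) {
            unacked.headSet(deliveryTag, true).clear();
        } else {
            unacked.remove(deliveryTag);
        }
    }

    /** Number of deliveries still waiting for an ack. */
    int unackedCount() {
        return unacked.size();
    }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker();
        for (long tag = 1; tag <= 5; tag++) {
            tracker.delivered(tag);
        }
        tracker.ack(2, false);                      // settles tag 2 only
        System.out.println(tracker.unackedCount()); // prints 4
        tracker.ack(4, true);                       // settles tags 1, 3 and 4
        System.out.println(tracker.unackedCount()); // prints 1
    }
}
```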

>
>>> From that point, the producers are indeed paused, as a flow control
>>> request has been issued by the broker, but the consumer seems to be
>>> blocked as well. The queue level flattens at its top value until
>>> the end of the test, even when memory occupation has dropped below
>>> the threshold.
>
> That's how 1.8.1 behaves. In 2.0.0 we introduced swapping out big
> queues to disk, so the memory usage shouldn't depend on the queue
> size.

Good news, because we had identified 2 scenarios in which memory-based
channel flow was triggered:

	- the use of SSL
	- the use of larger messages (4kb, same ingress)
Now I hope that the message size will not be such a determining factor
for flow control, as long as consumers are able to handle these
messages regularly.

>
>>> By registering the FlowListener callback, we have noticed that not
>>> all of the producers are notified every time the alarm handler is
>>> set. Does this mean that the broker applies some heuristic to try
>>> not to block everybody every time?
>>> Or does it mean that some of the channels have been somehow
>>> blacklisted by the broker?
>
> No, in 1.8.1 the broker should send 'channel.flow' to all the channels.
Strange. Then there must be something very weird going on.
>
>>> Could anybody explain how the blocking of consumers is supposed to
>>> be implemented?
>
> The best description is probably here:
>  http://www.rabbitmq.com/extensions.html#memsup
>
> But it covers 2.0.0. I'd suggest an upgrade to 2.0.0 and monitoring
> not only queue size but also number of unacknowledged messages
> ('Msg unack' in status plugin). This number should be near zero.
>
We are already on 2.0.0.
Where can I find some documentation about the status plugin, anyway?

Cheers, Romary.
> Cheers,
>  Marek Majkowski