[rabbitmq-discuss] flow control issues

romary.kremer at gmail.com romary.kremer at gmail.com
Wed Sep 15 19:11:47 BST 2010


On 15 Sep 2010, at 16:11, Marek Majkowski wrote:

> On Tue, Sep 14, 2010 at 09:07, romary.kremer at gmail.com
> <romary.kremer at gmail.com> wrote:
>>> The flow control was heavily modified between 1.8.1 and 2.0.0. In
>>> summary:
>>> - 1.8.1 - we sent a Channel.flow AMQP message to everyone once rabbit
>>>   reached the memory limit
>>> - 2.0.0 - once we reach the memory limit, the connections from which we
>>>   hear publishes are stopped temporarily. We stop receiving bytes from
>>>   the tcp sockets. That 'stop' shouldn't take too long, as data should
>>>   be swapped out to disk and memory pressure will drop pretty quickly.
>>
>> Do you mean that in 2.0.0 the Channel.flow AMQP message is no longer
>> sent to the producers that are stopped temporarily? That would explain
>> why
>>        1) Channel.publish() can block on the client side when the
>>        broker stops reading from the socket, and
>>        2) FlowListener.handleFlow() is no longer invoked on the
>>        registered listener when the alarm handler is set or cleared.
>> Are my deductions right?
>
> Yes. You will never hear "FlowListener.handleFlow()" and it may be
> possible for channel.publish to block (though I would need to consult
> the sources to be sure).

It seems to me that the FlowListener interface is likely to be deprecated
then, isn't it?
It does not really matter much for us anyway, since we were on the wrong
track using it.
Does this new implementation keep the broker compliant with the
specification, then?
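For context, this is roughly the pattern we had been relying on (a minimal
sketch only; the boolean "active" flag and the way the listener gets
attached to the channel are assumptions based on our 1.8.1 code, so check
your client version):

    import com.rabbitmq.client.FlowListener;

    // Sketch of the listener we used to register on our publishing
    // channels. Under 1.8.1, handleFlow told us to pause or resume
    // publishing; under 2.0.0 it is apparently never invoked any more,
    // since the broker throttles the TCP connection instead.
    public class PausingFlowListener implements FlowListener {
        private volatile boolean paused = false;

        public void handleFlow(boolean active) {
            // active == false meant "stop publishing" in the 1.8.1 scheme
            paused = !active;
        }

        public boolean isPaused() {
            return paused;
        }
    }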

>
>> Do you have any figures to quantify "shouldn't take too long"? Are
>> there any test reports available about that major evolution?
>
> That's the question we really avoided :) Oh, well. No, we haven't done
> any 'real' tests, it's only based on our intuition and experience. In
> most cases the blocking goes away pretty quickly - after 30 seconds
> usually, about two minutes sometimes.

This would be acceptable for our needs, but only if we can somehow
guarantee that as an upper bound!
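In the meantime, the workaround I am considering is to bound how long our
own producer threads wait, by running the publish on a worker and giving
up (or raising an alert) after a timeout. A minimal sketch, assuming
nothing beyond java.util.concurrent and Channel.basicPublish():

    import com.rabbitmq.client.Channel;
    import java.util.concurrent.*;

    public class BoundedPublisher {
        private final ExecutorService exec = Executors.newSingleThreadExecutor();

        // Returns true if the publish completed within the timeout. Note
        // that if the broker has stopped reading from the socket, the
        // publish may still go through later; this only bounds how long
        // *our* calling thread waits.
        boolean tryPublish(final Channel ch, final String exchange,
                           final String routingKey, final byte[] body,
                           long timeoutMillis) throws Exception {
            Future<Void> f = exec.submit(new Callable<Void>() {
                public Void call() throws Exception {
                    ch.basicPublish(exchange, routingKey, null, body);
                    return null;
                }
            });
            try {
                f.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // Flow control (or something else) is holding the publish up.
                return false;
            }
        }
    }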
>
> But it is possible to create a very pessimistic environment in which
> the memory usage will not drop - and the connection could be stuck for
> a long time (though it's pretty unlikely).

... Not that unlikely, considering my little experiment with the
MultiCastMain sample (see my previous reply about it for details): I get
a blocked connection 100 % of the time.
What would be, based on your knowledge and your intuition, "a very
pessimistic environment in which the memory usage will not drop"?

I think the experiment I have done with MultiCastMain may be the
beginning of an answer to that question, although I would never have
expected a single producer to have enough power to flood the broker.

>
>> Sorry if I wasn't clear in the previous post, we are already on 2.0.0
>> for both the broker and the client library.
>
> Good.
>
>>>> It looks like no listener is called back when the alarm handler is
>>>> set or cleared, while the producers are still paused / resumed as
>>>> they are supposed to be.
>>>
>>> Interesting. Maybe we have a race there? Or maybe you're blocking
>>> the main java client thread? (nothing blocking should be done from
>>> the main thread)
>>
>> I am quite sure I am not blocking the main thread, nor the Connection
>> thread. All the message-related logic runs in a dedicated thread (some
>> kind of ProducerGroup pool of threads, actually).
>> Consumer callbacks run within the Connection thread, if I refer to the
>> Javadoc!
>>
>> With the same code using library version 1.8.1, the callbacks were
>> invoked when the alarm handler was set or cleared anyway.
>>>
>>>>> during long running tests, we have encountered strange behaviour
>>>>> due to flow control:
>>>>>
>>>>> The queue depth starts to increase linearly for about 2 hours, which
>>>>> is consistent since the message throughput of the single consumer is
>>>>> not enough to absorb the message ingress. Memory occupation grows
>>>>> faster as well, until the memory watermark is reached on the broker
>>>>> side.
>>>
>>> Are you sure your consumer is ACK-ing the messages it received?
>>
>> The Consumer callback does ACK messages upon reception, one at a time
>> (multiple == false).
>> Is the basic.ack() method liable to be blocked by flow control, just
>> like publish()?
>
> Well, under the current implementation of flow control - yes. As it's
> the whole tcp/ip connection that gets blocked, it will affect any
> commands, including basic.ack.
>
> What we usually propose is to use one tcp/ip connection for receiving
> and a different one for publishing. On memory pressure we only block
> the publishers. Using a separate connection only for receiving, you may
> be sure it will never be blocked.

Weren't channels designed for that? In our environment, we have
(naively?) considered using channels to separate message production from
consumption.
Since we are targeting 10 000 peers doing both production and
consumption, multiplying the number of connections by 2 is not negligible
at all in terms of scalability.
Moreover, as I reported later on, we use SSL to authenticate the broker,
and we are still unclear about memory leaks induced by SSL connections.
Doubling the number of connections would not be negligible in terms of
memory occupation either.
In conclusion, we are not likely to implement our peers using 2
connections to the same broker.
What would you recommend to us then? And could you give us a better
understanding of the use case for channels?
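Just to check that I understand the layout you are suggesting, it would
look roughly like this with the Java client (a minimal sketch only; the
host and queue names are placeholders and the exact 2.0.0 method
signatures may differ slightly):

    import com.rabbitmq.client.*;

    public class SplitConnections {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("broker.example.com"); // placeholder host

            // Two separate TCP connections: only the publishing one can be
            // throttled by the broker's memory-based flow control.
            Connection pubConn = factory.newConnection();
            Connection subConn = factory.newConnection();

            Channel pubCh = pubConn.createChannel();
            final Channel subCh = subConn.createChannel();

            subCh.queueDeclare("demo-queue", true, false, false, null);

            // The consumer lives on its own connection, so its basic.ack
            // frames travel on a socket the broker never blocks.
            subCh.basicConsume("demo-queue", false, new DefaultConsumer(subCh) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties,
                                           byte[] body) throws java.io.IOException {
                    // ... process the message ...
                    subCh.basicAck(envelope.getDeliveryTag(), false); // one at a time
                }
            });

            // The publisher uses the other connection; this is the call
            // that may stall while the broker pages messages out to disk.
            pubCh.basicPublish("", "demo-queue", null, "hello".getBytes());
        }
    }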

>
>>>>> From that point, the producers are indeed paused, as a flow control
>>>>> request has been issued by the broker, but the consumer seems to be
>>>>> blocked as well. The queue level flattens at its top value until the
>>>>> end of the test, even when memory occupation drops below the
>>>>> threshold.
>>>
>>> That's how 1.8.1 behaves. In 2.0.0 we introduced swapping out big
>>> queues to disk, so the memory usage shouldn't be dependent on the
>>> queue size.
>>
>> Good news, because we had identified 2 scenarios in which memory-based
>> channel flow was triggered:
>>
>>        - the use of SSL
>>        - the use of larger messages (4 kB, same ingress)
>>
>> Now I hope that the message size will not be as much of a determinant
>> for flow control, as long as consumers are able to handle these
>> messages regularly.
>>
>>>
>>>>> By registering the FlowListener callback, we have noticed that not
>>>>> all of the producers are notified every time the alarm handler is
>>>>> set.
>>>>> Does this mean that the broker applies some heuristic to try not to
>>>>> block everybody every time?
>>>>> Or does it mean that some of the channels have somehow been
>>>>> blacklisted by the broker?
>>>
>>> No, in 1.8.1 the broker should send 'channel.flow' to all the channels.
>>
>> Strange then, there must be something very weird going on.
>>>
>>>>> Could anybody explain how the blocking of consumers is supposed to
>>>>> be implemented?
>>>
>>> The best description is probably here:
>>>  http://www.rabbitmq.com/extensions.html#memsup
>>>
>>> But it covers 2.0.0. I'd suggest an upgrade to 2.0.0 and monitoring
>>> not only queue size but also number of unacknowledged messages
>>> ('Msg unack' in status plugin). This number should be near zero.
>>>
>> We are already on 2.0.0.
>> Where can I find some documentation about the status plugin, anyway?
>
> I'm afraid the old blog post is the only source:
> http://www.lshift.net/blog/2009/11/30/introducing-rabbitmq-status-plugin

No worries, it was really straightforward to install after all.
For those who run into issues: just go to
/usr/lib/rabbitmq/lib/rabbitmq_server-2.0.0/plugins and drop in
	- mochiweb-2.0.0.ez
	- rabbitmq-mochiweb-2.0.0.ez
	- rabbit_status-2.0.0.ez
from there, and voila!
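On the monitoring side, until the plugin is wired into our test rig I can
at least poll the queue depth from the client. A minimal sketch using
queueDeclarePassive (the queue name is a placeholder; as I understand it
this reports ready messages only, not the 'Msg unack' figure the status
plugin shows):

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;

    public class QueueDepthProbe {
        // A passive declare returns the current message count without
        // creating or modifying the queue. It does not include
        // unacknowledged messages, so it complements the status plugin
        // rather than replacing it.
        static int readyMessages(Channel ch, String queueName)
                throws java.io.IOException {
            AMQP.Queue.DeclareOk ok = ch.queueDeclarePassive(queueName);
            return ok.getMessageCount();
        }
    }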

B.R,

Romary.