[rabbitmq-discuss] Connection blocked by "flow" for more than 600 seconds

Simon MacMullen simon at rabbitmq.com
Fri Oct 11 16:40:10 BST 2013


Ouch. Didn't realise it was that bad. So should we disrecommend R16B01 
in general?

Cheers, Simon

On 11/10/2013 12:53PM, Jesper Louis Andersen wrote:
> You are using R16B01. Upgrade to R16B02 at once! R16B01 has a bug which
> means that async worker processes are not getting used correctly (too
> many processes are hashed to the wrong async worker, more or less). This
> severely hits disk I/O on a busy machine.
>
> There are other problems with R16B01. It should be avoided if possible.
>
>
> On Fri, Oct 11, 2013 at 1:29 PM, Simon MacMullen <simon at rabbitmq.com> wrote:
>
>     OK, so your screenshot shows 750 queues and 753 connections. Was
>     this from the same time as you had ~10k file descriptors in use?
>     That sounds wrong.
>
>     I think your publishing connections are going into flow control
>     because there's a squeeze on file descriptors, which is causing the
>     queues to have to share a small number of file descriptors between
>     them - thus slowing them down.
>
>     If you do have far more file descriptors in use than queues +
>     connections, do you have any exotic plugins in use? What does "lsof
>     -lnp <pid of server process>" say?
>
>     Cheers, Simon
>
>
>     On 11/10/2013 3:22AM, Choo wrote:
>
>         Hi Simon,
>
>         As memory is plentiful, I found that the file descriptors had hit
>         the default limit, so I bumped the limit up to 5,120, and finally
>         to 10,240 on each node. It turned out that the file descriptors
>         came close to that limit too (around 10,086), and things started
>         to go downhill.
>
>         <http://rabbitmq.1065348.n5.nabble.com/file/n30402/ScreenShot.jpg>
>
>         I started the processes in reverse order: the subscriber side
>         first (1:42), then the bigger publishers later (1:45). The number
>         of published messages bounced up and down, then just after 1:48
>         most of the publishers were blocked.
>
>         There are now more than 350 blocked connections like those below
>         (and file descriptors are running at 7,558 + 4,647 on the 2 nodes):
>         10.95.212.11:33751 -> 10.95.212.13:5672  blocked  1261.558817  flow
>         10.95.212.11:33752 -> 10.95.212.13:5672  blocked  1326.324919  flow
>         10.95.212.11:33753 -> 10.95.212.13:5672  blocked  1326.45322   flow
>         10.95.212.11:33754 -> 10.95.212.13:5672  blocked  1278.581221  flow
>         10.95.212.11:33755 -> 10.95.212.13:5672  blocked  1312.584426  flow
>         10.95.212.11:33756 -> 10.95.212.13:5672  blocked  1279.623625  flow
>         10.95.212.11:33757 -> 10.95.212.13:5672  blocked  1294.492535  flow
>         10.95.212.11:33758 -> 10.95.212.13:5672  blocked  1276.134377  flow
>         10.95.212.11:33759 -> 10.95.212.13:5672  blocked  1292.862081  flow
>         10.95.212.11:33760 -> 10.95.212.13:5672  blocked  1290.695249  flow
>         10.95.212.11:33761 -> 10.95.212.13:5672  blocked  1255.599642  flow
>         10.95.212.11:33762 -> 10.95.212.13:5672  blocked  1284.984752  flow
>
>         Please kindly suggest.
>
>         Thank you and Best Regards,
>         Choo
>
>
>
>         --
>         View this message in context:
>         http://rabbitmq.1065348.n5.nabble.com/Connection-blocked-by-flow-for-more-than-600-seconds-tp30349p30402.html
>         Sent from the RabbitMQ mailing list archive at Nabble.com.
>         _________________________________________________
>         rabbitmq-discuss mailing list
>         rabbitmq-discuss at lists.rabbitmq.com
>         https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>
>     --
>     Simon MacMullen
>     RabbitMQ, Pivotal
>
>
>
>
>
> --
> J.
>
>
>

-- 
Simon MacMullen
RabbitMQ, Pivotal
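
The comparison suggested in the quoted 1:29 PM message (file descriptors in use versus queues + connections) can be sketched roughly as below. This is only an illustration, not something posted in the thread: the helper names are hypothetical, the /proc lookup assumes a Linux host, and the "about one descriptor per queue and per connection" baseline is an assumption read off that message rather than a documented rule. It is also not confirmed that the ~10,086 figure and the screenshot's 750 queues / 753 connections were taken at the same time.

```python
import os


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd.

    Assumption: Linux only, and the caller may read that directory
    (same user as the process, or root).
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_surplus(fds_in_use, queues, connections):
    """Descriptors in use beyond a rough baseline of queues + connections.

    Assumption: the baseline is only the rule of thumb implied in the
    thread, not an exact accounting of how the broker uses descriptors.
    """
    return fds_in_use - (queues + connections)


if __name__ == "__main__":
    # Figures quoted in the thread: ~10,086 descriptors, 750 queues,
    # 753 connections (possibly not measured at the same moment).
    print(fd_surplus(10086, 750, 753))  # 8583
```

A large positive surplus is what prompts the follow-up questions about plugins and the lsof output.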

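For a listing like the blocked connections quoted above, a small script can pick out the connections held in flow control for longer than some threshold. This is a minimal sketch, assuming whitespace-separated columns in the order shown in the quoted listing (client -> server, state, seconds blocked, blocked by); the names BlockedConnection, parse_listing and long_flow_blocks are hypothetical, and the 600-second default simply mirrors the subject line.

```python
import re
from typing import NamedTuple


class BlockedConnection(NamedTuple):
    client: str
    server: str
    state: str
    blocked_seconds: float
    blocked_by: str


# One row: "<client> -> <server> <state> <seconds> <blocked by>"
LINE_RE = re.compile(
    r"^\s*(\S+)\s*->\s*(\S+)\s+(\S+)\s+(\d+(?:\.\d+)?)\s+(\S+)\s*$"
)


def parse_listing(text):
    """Parse rows in the assumed column layout; skip lines that do not match."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        client, server, state, age, blocked_by = match.groups()
        rows.append(
            BlockedConnection(client, server, state, float(age), blocked_by)
        )
    return rows


def long_flow_blocks(rows, threshold_seconds=600.0):
    """Return connections blocked by flow for longer than the threshold."""
    return [
        row
        for row in rows
        if row.state == "blocked"
        and row.blocked_by == "flow"
        and row.blocked_seconds > threshold_seconds
    ]


# First three rows of the listing quoted in the thread.
SAMPLE = """\
10.95.212.11:33751 -> 10.95.212.13:5672  blocked  1261.558817  flow
10.95.212.11:33752 -> 10.95.212.13:5672  blocked  1326.324919  flow
10.95.212.11:33753 -> 10.95.212.13:5672  blocked  1326.45322   flow
"""

if __name__ == "__main__":
    stuck = long_flow_blocks(parse_listing(SAMPLE))
    print(len(stuck), "connections blocked by flow for more than 600 s")
```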
