[rabbitmq-discuss] Channel storms

carlhoerberg carl.hoerberg at gmail.com
Fri Sep 6 07:50:16 BST 2013


We sometimes see node-amqp clients creating thousands upon thousands of
channels, bringing a whole cluster to a halt. Normally we detect high
channel counts by polling the /api/connections endpoint, but when this
happens the channels don't even belong to a connection; only by polling
/api/channels can you see all of them (and when CPU usage is at 100% and
there are 20,000 channels, that takes hours), and even then the connection
is reported as "unknown"/null.
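Since the broker doesn't surface the per-connection channel count directly, the grouping can be done client-side from the /api/channels payload. A minimal sketch in Python (the field names and the "unknown" bucket follow the management plugin's JSON as I understand it; the threshold is an arbitrary example value, so verify both against your broker version):

```python
from collections import Counter

def channels_by_connection(channels):
    """Group a /api/channels payload by owning connection.

    Channels whose connection_details is empty or missing (the
    "unknown"/null case described above) land in an 'unknown' bucket.
    """
    counts = Counter()
    for ch in channels:
        details = ch.get("connection_details") or {}
        counts[details.get("name") or "unknown"] += 1
    return counts

def storming(channels, threshold=100):
    """Connections owning more channels than `threshold`."""
    return {conn: n
            for conn, n in channels_by_connection(channels).items()
            if n > threshold}

if __name__ == "__main__":
    # Toy payload shaped like the management API's JSON (assumption:
    # field names as in the RabbitMQ management plugin).
    sample = ([{"connection_details": {"name": "10.0.0.1:42852 -> 10.0.0.2:5672"}}] * 150
              + [{"connection_details": {}}] * 3)  # orphaned channels
    print(storming(sample))
```

Feeding this the decoded JSON from a periodic GET of /api/channels turns the manual inspection above into an automatic alert.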

https://www.dropbox.com/s/e0koajldzxkuqlo/channelstorm.png

When the cluster is overloaded like that it responds to nothing, not even
rabbitmqctl status or rabbitmqctl delete_user, so a full cluster restart is
required and then we can block the misbehaving client. 
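When the cluster is responsive enough to answer HTTP, the misbehaving connection can also be kicked through the management API rather than rabbitmqctl. A sketch of building that request (DELETE /api/connections/<name> per the management HTTP API; host, port and credentials here are placeholders, and connection names contain spaces and "->", so they must be percent-encoded):

```python
import base64
from urllib import parse, request

def close_connection_request(host, conn_name,
                             user="guest", password="guest"):
    """Build a DELETE request that forcibly closes one connection via
    the management API (default port 15672 assumed)."""
    url = "http://%s:15672/api/connections/%s" % (
        host, parse.quote(conn_name, safe=""))
    req = request.Request(url, method="DELETE")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req

# To actually send it:  request.urlopen(close_connection_request(...))
```

This only helps once you know which connection to target, which is exactly what the channel grouping problem above makes hard.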

This shows up a lot in the logs when this is happening: 

=ERROR REPORT==== 6-Sep-2013::00:25:52 ===
** Generic server <0.22899.21> terminating
** Last message in was {'$gen_cast',
                           {method,
                               {'queue.declare',0,<<>>,false,false,true,true,
                                   false,[]},
                               none,noflow}}
** When Server state == {ch,running,rabbit_framing_amqp_0_9_1,158,
                            <0.22259.21>,<0.22897.21>,<0.22259.21>,
                            <<"54.208.167.186:42852 -> 10.64.29.128:5672">>,
                            {token,<0.22898.21>,false},
                            none,1,
                            {[],[]},
                            {[],[]},
                            [],[],
                            {user,<<"urhyfncz">>,
                                [management],
                                rabbit_auth_backend_internal,
                                {internal_user,<<"urhyfncz">>,
                                    <<229,2,91,60,213,201,95,103,146,3,49,108,
                                      66,173,123,15,172,181,119,97>>,
                                    [management]}},
                            <<"urhyfncz">>,<<>>,
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                            {set,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                            {set,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                            <0.22257.21>,
                            {state,fine,5000,#Ref<0.0.9.213050>},
                            false,1,
                            {{0,nil},{0,nil}},
                            [],[],none}
** Reason for termination ==
** {{case_clause,not_found},
    [{rabbit_channel,handle_method,3},
     {rabbit_channel,handle_cast,2},
     {gen_server2,handle_msg,2},
     {proc_lib,init_p_do_apply,3}]}

=WARNING REPORT==== 6-Sep-2013::00:25:52 ===
Queue {resource,<<"urhyfncz">>,queue,<<"amq.gen-FQZtyp6mpRdmm3-on_LAOQ">>}
exclusive owner went away

So the problem is two-fold: the node-amqp lib obviously has a bug (I think
it doesn't close channels after itself by default), but also, RabbitMQ
allows this kind of behavior and doesn't report the high channel count on
connections, so we can't automatically detect it either :(



--
View this message in context: http://rabbitmq.1065348.n5.nabble.com/Channel-storms-tp29395.html
Sent from the RabbitMQ mailing list archive at Nabble.com.
