[rabbitmq-discuss] RabbitMQ (branch bug21673) running out of file descriptors

Andrey Smirnov smirnov.andrey at gmail.com
Thu Feb 18 08:02:42 GMT 2010


Forgot to add: at the time of crash we had more than 10k queues...
(Looks like more than file descriptor limit)

2010/2/18 Andrey Smirnov <smirnov.andrey at gmail.com>:
> Hello!
>
> We're running RabbitMQ from branch bug21673 (built yesterday). We have
> a lot of non-persisten queues (2k - 10k+) that are created/destroyed
> all the time. Some queues have unconsumed messaged and they will get
> destroyed with unconsumed messages.
>
> When file descriptor limit for RabbitMQ was left to default 1024 files
> it was dying in 2 hours. I've raised the limit to 10240 files and it
> worked for 16 hours and crashed.
>
> Crash for clients looks like "INTERNAL_ERROR" error on connections.
> RabbitMQ is still running and eating 100% of CPU. Restarting RabbitMQ
> helps to revive it.
>
> Adding log file excerpt (log is huge, it contains dump of all internal
> state, I have it if someone will be interested):
>
> =INFO REPORT==== 16-Feb-2010::04:42:05 ===
> Limiting to approx 10190 file handles
>
> =INFO REPORT==== 16-Feb-2010::04:42:05 ===
> Memory limit set to 3196MB.
>
> =INFO REPORT==== 16-Feb-2010::04:42:05 ===
> Using rabbit_msg_store_ets_index to provide index for message store
>
> =INFO REPORT==== 16-Feb-2010::04:42:05 ===
> started TCP Listener on 0.0.0.0:5672
>
> =INFO REPORT==== 16-Feb-2010::04:42:26 ===
> accepted TCP connection on 0.0.0.0:5672 from <redacted>:35525
>
> .....
>
> ERROR REPORT==== 17-Feb-2010::17:32:16 ===
> ** Generic server <0.7466.264> terminating
> ** Last message in was {'$gen_cast',
>                           {method,
>                               {'queue.declare',0,
>
> <<"qik.item.session.83c5e30a-180e-4f2a-b59d-c6bd7938faac">>,
>                                   false,false,false,false,false,[]},
>                               none}}
> ** When Server state == {ch,running,3,<0.1622.0>,<0.7464.264>,undefined,none,
>                            {set,0,16,16,8,80,48,
>                                 {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                  [],[]},
>                                 {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                   [],[]}}},
>                            1,
>                            {[],[]},
>                            {[],[]},
>                            <<"guest">>,<<"localhost">>,
>
> <<"qik.item.session.6c6a3c8e-38d7-4c69-bb1c-26ae3cc00d67">>,
>                            {dict,0,16,16,8,80,48,
>                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                   [],[]},
>                                  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>                                    [],[]}}}}
> ** Reason for termination ==
> ** {{badmatch,{error,{{badmatch,{error,emfile}},
>                      [{rabbit_queue_index,get_journal_handle,1},
>                       {rabbit_queue_index,load_journal,1},
>                       {rabbit_queue_index,init,1},
>                       {rabbit_variable_queue,init,1},
>                       {rabbit_amqqueue_process,init,1},
>                       {gen_server2,init_it,6},
>                       {proc_lib,init_p_do_apply,3}]}}},
>    [{rabbit_amqqueue,start_queue_process,1},
>     {rabbit_amqqueue,declare,4},
>     {rabbit_channel,handle_method,3},
>     {rabbit_channel,handle_cast,2},
>     {gen_server2,handle_msg,7},
>     {proc_lib,wake_up,3}]}
>
> =ERROR REPORT==== 17-Feb-2010::17:32:16 ===
> connection <0.1622.0> (running), channel 3 - error:
> {{badmatch,{error,{{badmatch,{error,emfile}},
>                   [{rabbit_queue_index,get_journal_handle,1},
>                    {rabbit_queue_index,load_journal,1},
>                    {rabbit_queue_index,init,1},
>                    {rabbit_variable_queue,init,1},
>                    {rabbit_amqqueue_process,init,1},
>                    {gen_server2,init_it,6},
>
> ....
>
> =ERROR REPORT==== 17-Feb-2010::17:33:08 ===
> Mnesia(rabbit at SM2): ** ERROR ** (core dumped to file:
> "/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733")
>  ** FATAL ** Cannot open log file
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL":
> {file_error,
>
>
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
>
>                        emfile}
>
> =ERROR REPORT==== 17-Feb-2010::17:33:08 ===
> ** gen_event handler rabbit_error_logger crashed.
> ** Was installed in error_logger
> ** Last event was: {error,<0.58.0>,
>                       {<0.61.0>,
>                        "Mnesia(~p): ** ERROR ** (core dumped to file:
> ~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
>                        [rabbit at SM2,
>
> "/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733",
>
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
>                         {file_error,
>
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
>                             emfile}]}}
> ** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
> ** Reason == {{function_clause,
>               [{gen,call,
>                 [{rex,{error,{node_not_running,rabbit at SM2}}},
>                  '$gen_call',
>                  {call,mnesia_lib,db_select_cont,
>                   [ram_copies,
>                    {rabbit_route,
>                     {binding,
>                      {resource,<<"localhost">>,exchange,
>                       <<"qik.exc.session">>},
>
> <<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>,
>                      {resource,<<"localhost">>,queue,
>
> <<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>},
>                      []},
>                     [],100,<<>>,[],0,0},
>                    [{'$1',[],['$1']}]],
>                   <0.0.0>},
>                  infinity]},
>                {gen_server,call,3},
>                {rpc,do_call,3},
>                {mnesia,do_dirty_rpc,5},
>                {mnesia,select_cont,3},
>                {mnesia,'-qlc_select/1-fun-0-',1},
>                {rabbit_exchange,'-match_bindings/2-fun-1-',10},
>    {rabbit_variable_queue,flush_journal,1},
>     {rabbit_amqqueue_process,handle_pre_hibernate,1}]}
>
> =ERROR REPORT==== 17-Feb-2010::17:33:08 ===
> Mnesia(rabbit at SM2): ** ERROR ** (core dumped to file:
> "/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733")
>  ** FATAL ** Cannot open log file
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL":
> {file_error,
>
>
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
>
>                        emfile}
>
> =ERROR REPORT==== 17-Feb-2010::17:33:08 ===
> ** gen_event handler rabbit_error_logger crashed.
> ** Was installed in error_logger
> ** Last event was: {error,<0.58.0>,
>                       {<0.61.0>,
>                        "Mnesia(~p): ** ERROR ** (core dumped to file:
> ~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
>                        [rabbit at SM2,
>
> "/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733",
>
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
>                         {file_error,
>
> "/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
>                             emfile}]}}
>
> ** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
> ** Reason == {{function_clause,
>               [{gen,call,
>                 [{rex,{error,{node_not_running,rabbit at SM2}}},
>                  '$gen_call',
>                  {call,mnesia_lib,db_select_cont,
>                   [ram_copies,
>                    {rabbit_route,
>                     {binding,
>                      {resource,<<"localhost">>,exchange,
>                       <<"qik.exc.session">>},
>
> <<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>,
>                      {resource,<<"localhost">>,queue,
>
> <<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>},
>                      []},
>                     [],100,<<>>,[],0,0},
>                    [{'$1',[],['$1']}]],
>                   <0.0.0>},
>                  infinity]},
>                {gen_server,call,3},
>                {rpc,do_call,3},
>                {mnesia,do_dirty_rpc,5},
>                {mnesia,select_cont,3},
>                {mnesia,'-qlc_select/1-fun-0-',1},
>                {rabbit_exchange,'-match_bindings/2-fun-1-',10},
>                {qlc,collect,1}]},
>              {gen_server,call,
>               [{rex,{error,{node_not_running,rabbit at SM2}}},
>                {call,mnesia_lib,db_select_cont,
>                 [ram_copies,
>                  {rabbit_route,
>
> .....
>
>
> ** Reason for termination ==
> ** {{badmatch,{error,emfile}},
>    [{rabbit_queue_index,get_segment_handle,1},
>     {rabbit_queue_index,append_journal_to_segment,2},
>     {rabbit_queue_index,'-flush_journal/1-fun-0-',3},
>     {lists,foldl,3},
>     {rabbit_queue_index,segment_fold,3},
>     {rabbit_queue_index,flush_journal,1},
>     {rabbit_variable_queue,flush_journal,1},
>     {rabbit_amqqueue_process,handle_pre_hibernate,1}]}
>
>
>
> --
> Andrey Smirnov
>



-- 
Andrey Smirnov,
phone. +7 (905) 769-83-20


More information about the rabbitmq-discuss mailing list