[rabbitmq-discuss] RabbitMQ (branch bug21673) running out of file descriptors

Andrey Smirnov smirnov.andrey at gmail.com
Thu Feb 18 07:58:34 GMT 2010


Hello!

We're running RabbitMQ from branch bug21673 (built yesterday). We have
a lot of non-persisten queues (2k - 10k+) that are created/destroyed
all the time. Some queues have unconsumed messaged and they will get
destroyed with unconsumed messages.

When file descriptor limit for RabbitMQ was left to default 1024 files
it was dying in 2 hours. I've raised the limit to 10240 files and it
worked for 16 hours and crashed.

Crash for clients looks like "INTERNAL_ERROR" error on connections.
RabbitMQ is still running and eating 100% of CPU. Restarting RabbitMQ
helps to revive it.

Adding log file excerpt (log is huge, it contains dump of all internal
state, I have it if someone will be interested):

=INFO REPORT==== 16-Feb-2010::04:42:05 ===
Limiting to approx 10190 file handles

=INFO REPORT==== 16-Feb-2010::04:42:05 ===
Memory limit set to 3196MB.

=INFO REPORT==== 16-Feb-2010::04:42:05 ===
Using rabbit_msg_store_ets_index to provide index for message store

=INFO REPORT==== 16-Feb-2010::04:42:05 ===
started TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 16-Feb-2010::04:42:26 ===
accepted TCP connection on 0.0.0.0:5672 from <redacted>:35525

.....

ERROR REPORT==== 17-Feb-2010::17:32:16 ===
** Generic server <0.7466.264> terminating
** Last message in was {'$gen_cast',
                           {method,
                               {'queue.declare',0,

<<"qik.item.session.83c5e30a-180e-4f2a-b59d-c6bd7938faac">>,
                                   false,false,false,false,false,[]},
                               none}}
** When Server state == {ch,running,3,<0.1622.0>,<0.7464.264>,undefined,none,
                            {set,0,16,16,8,80,48,
                                 {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]},
                                 {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]}}},
                            1,
                            {[],[]},
                            {[],[]},
                            <<"guest">>,<<"localhost">>,

<<"qik.item.session.6c6a3c8e-38d7-4c69-bb1c-26ae3cc00d67">>,
                            {dict,0,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]},
                                  {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]}}}}
** Reason for termination ==
** {{badmatch,{error,{{badmatch,{error,emfile}},
                      [{rabbit_queue_index,get_journal_handle,1},
                       {rabbit_queue_index,load_journal,1},
                       {rabbit_queue_index,init,1},
                       {rabbit_variable_queue,init,1},
                       {rabbit_amqqueue_process,init,1},
                       {gen_server2,init_it,6},
                       {proc_lib,init_p_do_apply,3}]}}},
    [{rabbit_amqqueue,start_queue_process,1},
     {rabbit_amqqueue,declare,4},
     {rabbit_channel,handle_method,3},
     {rabbit_channel,handle_cast,2},
     {gen_server2,handle_msg,7},
     {proc_lib,wake_up,3}]}

=ERROR REPORT==== 17-Feb-2010::17:32:16 ===
connection <0.1622.0> (running), channel 3 - error:
{{badmatch,{error,{{badmatch,{error,emfile}},
                   [{rabbit_queue_index,get_journal_handle,1},
                    {rabbit_queue_index,load_journal,1},
                    {rabbit_queue_index,init,1},
                    {rabbit_variable_queue,init,1},
                    {rabbit_amqqueue_process,init,1},
                    {gen_server2,init_it,6},

....

=ERROR REPORT==== 17-Feb-2010::17:33:08 ===
Mnesia(rabbit at SM2): ** ERROR ** (core dumped to file:
"/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733")
 ** FATAL ** Cannot open log file
"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL":
{file_error,


"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",

                        emfile}

=ERROR REPORT==== 17-Feb-2010::17:33:08 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.58.0>,
                       {<0.61.0>,
                        "Mnesia(~p): ** ERROR ** (core dumped to file:
~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
                        [rabbit at SM2,

"/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733",

"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
                         {file_error,

"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
                             emfile}]}}
** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {{function_clause,
               [{gen,call,
                 [{rex,{error,{node_not_running,rabbit at SM2}}},
                  '$gen_call',
                  {call,mnesia_lib,db_select_cont,
                   [ram_copies,
                    {rabbit_route,
                     {binding,
                      {resource,<<"localhost">>,exchange,
                       <<"qik.exc.session">>},

<<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>,
                      {resource,<<"localhost">>,queue,

<<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>},
                      []},
                     [],100,<<>>,[],0,0},
                    [{'$1',[],['$1']}]],
                   <0.0.0>},
                  infinity]},
                {gen_server,call,3},
                {rpc,do_call,3},
                {mnesia,do_dirty_rpc,5},
                {mnesia,select_cont,3},
                {mnesia,'-qlc_select/1-fun-0-',1},
                {rabbit_exchange,'-match_bindings/2-fun-1-',10},
    {rabbit_variable_queue,flush_journal,1},
     {rabbit_amqqueue_process,handle_pre_hibernate,1}]}

=ERROR REPORT==== 17-Feb-2010::17:33:08 ===
Mnesia(rabbit at SM2): ** ERROR ** (core dumped to file:
"/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733")
 ** FATAL ** Cannot open log file
"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL":
{file_error,


"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",

                        emfile}

=ERROR REPORT==== 17-Feb-2010::17:33:08 ===
** gen_event handler rabbit_error_logger crashed.
** Was installed in error_logger
** Last event was: {error,<0.58.0>,
                       {<0.61.0>,
                        "Mnesia(~p): ** ERROR ** (core dumped to file:
~p)~n ** FATAL ** Cannot open log file ~p: ~p~n",
                        [rabbit at SM2,

"/usr/local/sbin/MnesiaCore.rabbit at SM2_1266_456788_594733",

"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
                         {file_error,

"/var/lib/rabbitmq/mnesia/rabbit/rabbit_durable_route.DCL",
                             emfile}]}}

** When handler state == {resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}
** Reason == {{function_clause,
               [{gen,call,
                 [{rex,{error,{node_not_running,rabbit at SM2}}},
                  '$gen_call',
                  {call,mnesia_lib,db_select_cont,
                   [ram_copies,
                    {rabbit_route,
                     {binding,
                      {resource,<<"localhost">>,exchange,
                       <<"qik.exc.session">>},

<<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>,
                      {resource,<<"localhost">>,queue,

<<"qik.item.session.22ff9df4-c115-4cdf-a721-27f99eba9aaa">>},
                      []},
                     [],100,<<>>,[],0,0},
                    [{'$1',[],['$1']}]],
                   <0.0.0>},
                  infinity]},
                {gen_server,call,3},
                {rpc,do_call,3},
                {mnesia,do_dirty_rpc,5},
                {mnesia,select_cont,3},
                {mnesia,'-qlc_select/1-fun-0-',1},
                {rabbit_exchange,'-match_bindings/2-fun-1-',10},
                {qlc,collect,1}]},
              {gen_server,call,
               [{rex,{error,{node_not_running,rabbit at SM2}}},
                {call,mnesia_lib,db_select_cont,
                 [ram_copies,
                  {rabbit_route,

.....


** Reason for termination ==
** {{badmatch,{error,emfile}},
    [{rabbit_queue_index,get_segment_handle,1},
     {rabbit_queue_index,append_journal_to_segment,2},
     {rabbit_queue_index,'-flush_journal/1-fun-0-',3},
     {lists,foldl,3},
     {rabbit_queue_index,segment_fold,3},
     {rabbit_queue_index,flush_journal,1},
     {rabbit_variable_queue,flush_journal,1},
     {rabbit_amqqueue_process,handle_pre_hibernate,1}]}



-- 
Andrey Smirnov




More information about the rabbitmq-discuss mailing list