[rabbitmq-discuss] rabbit cluster keeps crashing

LoOoD gman at colo247.com
Thu Mar 25 01:07:09 GMT 2010


We've been having a problem with our rabbit servers. We have a cluster set up
with one disk node and two RAM nodes. All clients connect only to the RAM
nodes. The RAM nodes eventually stop accepting new connections.

The only fix we've found so far is to restart the affected RAM node.

We're running 1.7.2-1 (installed via the deb repo) on Ubuntu 8.10 64-bit.

Here are some crash reports from rabbit-sasl.log:

=CRASH REPORT==== 24-Mar-2010::17:17:49 ===
  crasher:
    pid: <0.8294.1>
    registered_name: []
    exception exit: {{{nodedown,rabbit@job3},
                      {gen_server2,call,[<6902.26064.102>,stat,infinity]}},
                     [{gen_server2,call,3},
                      {rabbit_misc,with_exit_handler,2},
                      {rabbit_channel,return_queue_declare_ok,3},
                      {rabbit_channel,handle_cast,2},
                      {gen_server2,handle_msg,7},
                      {proc_lib,init_p,5}]}
      in function  gen_server2:terminate/6
    initial call: gen:init_it(gen_server2,<0.8293.1>,<0.8293.1>,
                              rabbit_channel,
                              [1,<0.8283.1>,<0.8292.1>,<<"mtvmq">>,<<"/">>],
                              [])
    ancestors: [<0.8293.1>]
    messages: []
    links: [<0.8293.1>,<0.8292.1>]
    dictionary: [{permission_cache,
                     [{{resource,<<"/">>,queue,<<"worker.transcode_queue">>},
                        configure}]}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 23
    reductions: 2442
  neighbours:
    neighbour: [{pid,<0.8292.1>},
                  {registered_name,[]},
                  {initial_call,{rabbit_writer,mainloop,1}},
                  {current_function,{erlang,hibernate,3}},
                  {ancestors,[]},
                  {messages,[shutdown]},
                  {links,[<0.8294.1>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,runnable},
                  {heap_size,7},
                  {stack_size,0},
                  {reductions,17}]
=CRASH REPORT==== 24-Mar-2010::17:17:49 ===
  crasher:
    pid: <0.9336.0>
    registered_name: []
    exception exit: {{{nodedown,rabbit@job3},
                      {gen_server2,call,[<6902.26064.102>,stat,infinity]}},
                     [{gen_server2,call,3},
                      {rabbit_misc,with_exit_handler,2},
                      {rabbit_channel,return_queue_declare_ok,3},
                      {rabbit_channel,handle_cast,2},
                      {gen_server2,handle_msg,7},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/6
    initial call: gen_server2:wake_hib(<0.9335.0>,<0.9336.0>,
                                       {ch,running,1,<0.9333.0>,<0.9334.0>,
                                        <0.9337.0>,none,
                                        {sets,0,16,16,8,80,48,
                                         {[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]},
                                         {{[],[],[],[],[],[],[],[],[],[],[],
                                           [],[],[],[],[]}}},
                                        3,
                                        {[],[]},
                                        {[],[]},
                                        <<"mtvmq">>,<<"/">>,
                                        <<"worker.transcode_queue">>,
                                        {dict,1,16,16,8,80,48,
                                         {[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]},
                                         {{[],[],[],[],[],[],[],[],[],[],[],
                                           [],
                                           [[<<"W">>|
                                             {resource,<<"/">>,queue,
                                              <<"worker.transcode_queue">>}]],
                                           [],[],[]}}}},
                                       rabbit_channel,
                                       {{1269,476217,925545},
                                        {backoff,1460,1000,10000,
                                         {1635,29903,19828}}},
                                       {queue,[],[]},
                                       [])
    ancestors: [<0.9335.0>]
    messages: []
    links: [<0.9335.0>,<0.9334.0>]
    dictionary: [{permission_cache,
                     [{{resource,<<"/">>,queue,<<"worker.transcode_queue">>},
                        configure}]}]
    trap_exit: true
    status: running
    heap_size: 610
    stack_size: 23
    reductions: 20122
  neighbours:
    neighbour: [{pid,<0.9334.0>},
                  {registered_name,[]},
                  {initial_call,{rabbit_writer,mainloop,1}},
                  {current_function,{erlang,hibernate,3}},
                  {ancestors,[]},
                  {messages,[shutdown]},
                  {links,[<0.9336.0>]},
                  {dictionary,[]},
                  {trap_exit,false},
                  {status,runnable},
                  {heap_size,7},
                  {stack_size,0},
                  {reductions,352}]

=CRASH REPORT==== 24-Mar-2010::17:17:56 ===
  crasher:
    pid: <0.16307.0>
    registered_name: []
    exception exit: {{{nodedown,rabbit@job3},
                      {gen_server2,call,[<6902.26064.102>,stat,infinity]}},
                     [{gen_server2,call,3},
                      {rabbit_misc,with_exit_handler,2},
                      {rabbit_channel,return_queue_declare_ok,3},
                      {rabbit_channel,handle_cast,2},
                      {gen_server2,handle_msg,7},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/6
    initial call: gen_server2:wake_hib(<0.16306.0>,<0.16307.0>,
                                       {ch,running,1,<0.16304.0>,<0.16305.0>,
                                        <0.16308.0>,none,
                                        {sets,0,16,16,8,80,48,
                                         {[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]},
                                         {{[],[],[],[],[],[],[],[],[],[],[],
                                           [],[],[],[],[]}}},
                                        7,
                                        {[],[]},
                                        {[],[]},
                                        <<"mtvmq">>,<<"/">>,
                                        <<"worker.transcode_queue">>,
                                        {dict,1,16,16,8,80,48,
                                         {[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]},
                                         {{[],[],[],[],[],[],[],[],[],[],[],
                                           [],
                                           [[<<"W">>|
                                             {resource,<<"/">>,queue,
                                              <<"worker.transcode_queue">>}]],
                                           [],[],[]}}}},
                                       rabbit_channel,
                                       {{1269,476256,284302},
                                        {backoff,1998,1000,10000,
                                         {7558,3810,6674}}},
                                       {queue,[],[]},
                                       [])
    ancestors: [<0.16306.0>]
    messages: [{'EXIT',<0.16305.0>,normal}]
    links: [<0.16306.0>]
    dictionary: [{permission_cache,
                     [{{resource,<<"/">>,queue,<<"worker.transcode_queue">>},
                        configure}]}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 23
    reductions: 12119
  neighbours:

=CRASH REPORT==== 24-Mar-2010::17:17:56 ===
  crasher:
    pid: <0.13792.0>
    registered_name: []
    exception exit: {{{nodedown,rabbit@job3},
                      {gen_server2,call,[<6902.26064.102>,stat,infinity]}},
                     [{gen_server2,call,3},
                      {rabbit_misc,with_exit_handler,2},
                      {rabbit_channel,return_queue_declare_ok,3},
                      {rabbit_channel,handle_cast,2},
                      {gen_server2,handle_msg,7},
                      {proc_lib,wake_up,3}]}
      in function  gen_server2:terminate/6
    initial call: gen_server2:wake_hib(<0.13791.0>,<0.13792.0>,
                                       {ch,running,1,<0.13789.0>,<0.13790.0>,
                                        <0.13793.0>,none,
                                        {sets,0,16,16,8,80,48,
                                         {[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]},
                                         {{[],[],[],[],[],[],[],[],[],[],[],
                                           [],[],[],[],[]}}},
                                        5,
                                        {[],[]},
                                        {[],[]},
                                        <<"mtvmq">>,<<"/">>,
                                        <<"worker.transcode_queue">>,
                                        {dict,1,16,16,8,80,48,
                                         {[],[],[],[],[],[],[],[],[],[],[],[],
                                          [],[],[],[]},
                                         {{[],[],[],[],[],[],[],[],[],[],[],
                                           [],
                                           [[<<"W">>|
                                             {resource,<<"/">>,queue,
                                              <<"worker.transcode_queue">>}]],
                                           [],[],[]}}}},
                                       rabbit_channel,
                                       {{1269,476259,683048},
                                        {backoff,1067,1000,10000,
                                         {21120,18572,28318}}},
                                       {queue,[],[]},
                                       [])
    ancestors: [<0.13791.0>]
    messages: [{'EXIT',<0.13790.0>,normal}]
    links: [<0.13791.0>]
    dictionary: [{permission_cache,
                     [{{resource,<<"/">>,queue,<<"worker.transcode_queue">>},
                        configure}]}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 23
    reductions: 11759
  neighbours:

-- 
View this message in context: http://old.nabble.com/rabbit-cluster-keeps-crashing-tp28023134p28023134.html
Sent from the RabbitMQ mailing list archive at Nabble.com.




