[rabbitmq-discuss] Strange subtract_acks crash in RabbitMQ 3.1.0-1 (case_clause empty) on EC2

Karl Rieb karl.rieb at gmail.com
Sat May 18 04:20:01 BST 2013


Hi,

I am running a RabbitMQ cluster of 4 nodes in AWS EC2 on *c1.xlarge* instances 
using Ubuntu 12.04 LTS (kernel 3.2.0-38).  Periodically, the two nodes 
holding the largest queues crash and require me to restart the 
rabbitmq-server service.  Looking at the logs, I'm at a loss as to why the 
crash is occurring.  I have tried reproducing the crash manually by 
flooding my nodes with messages, but that doesn't seem to trigger the 
issue.  Below is output from various RabbitMQ logs:

# cat /var/log/rabbitmq/rabbit@ip-10-123-123-123-sasl.log
> =CRASH REPORT==== 18-May-2013::01:41:51 ===
>   crasher:
>     initial call: gen:init_it/6
>     pid: <0.301.0>
>     registered_name: []
>     exception exit: {{case_clause,{empty,{[],[]}}},
>                      [{rabbit_amqqueue_process,subtract_acks,3,[]},
>                       {rabbit_amqqueue_process,subtract_acks,4,[]},
>                       {rabbit_amqqueue_process,handle_cast,2,[]},
>                       {gen_server2,handle_msg,2,[]},
>                       {proc_lib,wake_up,3,
>                                 [{file,"proc_lib.erl"},{line,249}]}]}
>       in function  gen_server2:terminate/3
>     ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.207.0>]
>     messages: [{'$gen_cast',{resume,<0.676.0>}},
>                   {'$gen_cast',{ack,[27583,27606,27629,27652,27675,27698,
>                                      27721,27744,27767,27790,27813,27836,
>                                      27859,27882,27905,27928,27951,27974,
>                                      27997,28020,28043,28066,28089,28112,
>                                      28135,28158,28179,28200,28221,28242,
>                                      28263,28284,28305,28324,28342,28360,
>                                      28378,28396,28414,28432,28450,28468,
>                                      28486,28504,28522,28540,28558,28576,
>                                      28594,28612],
>                                     <0.737.0>}},
>                   {'$gen_cast',{resume,<0.737.0>}},
>                   {'$gen_cast',{ack,[27608,27631,27654,27677,27700,27723,
>                                      27746,27769,27792,27815,27838,27861,
>                                      27884,27907,27930,27953,27976,27999,
>                                      28022,28045,28068,28091,28114,28137,
>                                      28160,28181,28202,28223,28244,28265,
>                                      28286,28307,28326,28344,28362,28380,
>                                      28398,28416,28434,28452,28470,28488,
>                                      28506,28524,28542,28560,28578,28596,
>                                      28614],
>                                     <0.676.0>}},
>                   {'$gen_cast',{ack,[27578,27601,27624,27647,27670,27693,
>                                      27716,27739,27762,27785,27808,27831,
>                                      27854,27877,27900,27923,27946,27969,
>                                      27992,28015,28038,28061,28084,28107,
>                                      28130,28153,28174,28195,28216,28237,
>                                      28258,28279,28300,28321,28340,28358,
>                                      28376,28394,28412,28430,28448,28466,
>                                      28484,28502,28520,28538,28556,28574,
>                                      28592,28610],
>                                     <0.716.0>}},
>                   {'$gen_cast',{resume,<0.716.0>}},
>                   {'$gen_cast',{ack,[27582,27605,27628,27651,27674,27697,
>                                      27720,27743,27766,27789,27812,27835,
>                                      27858,27881,27904,27927,27950,27973,
>                                      27996,28019,28042,28065,28088,28111,
>                                      28134,28157,28178,28199,28220,28241,
>                                      28262,28283,28304,28323,28341,28359,
>                                      28377,28395,28413,28431,28449,28467,
>                                      28485,28503,28521,28539,28557,28575,
>                                      28593,28611],
>                                     <0.737.0>}}]
>     links: [<0.300.0>]
>     dictionary: [{{ch,<0.729.0>},
>                    {cr,<0.729.0>,#Ref<0.0.0.12296>,
>                        {[28715,28696,28677,28658,28639,28620,28602,28584,
>                          28566,28548,28530,28512,28494,28476,28458,28440,
>                          28422,28404,28386,28368,28350,28332,28313,28292,
>                          28271,28250,28229,28208,28187,28166,28143,28120,
>                          28097,28074,28051,28028,28005,27982,27959,27936,
>                          27913,27890,27867,27844,27821,27798,27775,27752,
>                          27729,27706,27683,27660,27637,27614],
>                         [27591]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.728.0>,active,{0,nil}},
>                        11}},
>                   {{credit_to,<7350.877.0>},22},
>                   {{ch,<0.608.0>},
>                    {cr,<0.608.0>,#Ref<0.0.0.9041>,
>                        {[28708,28689,28670,28651,28632,28607,28589,28571,
>                          28553,28535,28517,28499,28481,28463,28445,28427,
>                          28409,28391,28373,28355,28337,28318,28297,28276,
>                          28255,28234,28213,28192,28171,28150,28127,28104,
>                          28081,28058,28035,28012,27989,27966,27943,27920,
>                          27897,27874,27851,27828,27805,27782,27759,27736,
>                          27713,27690,27667,27644,27621,27598],
>                         [27575]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.607.0>,active,{0,nil}},
>                        38}},
>                   {{ch,<0.624.0>},
>                    {cr,<0.624.0>,#Ref<0.0.0.9436>,
>                        {[28722,28703,28684,28665,28646],[28627]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.623.0>,active,{0,nil}},
>                        36}},
>                   {{ch,<0.696.0>},
>                    {cr,<0.696.0>,#Ref<0.0.0.11414>,
>                        {[],[]},
>                        2,
>                        {[],[]},
>                        {qstate,<0.695.0>,active,{0,nil}},
>                        24}},
>                   {{ch,<0.564.0>},
>                    {cr,<0.564.0>,#Ref<0.0.0.7285>,
>                        {[28718,28699,28680,28661,28642],[28623]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.563.0>,active,{0,nil}},
>                        12}},
>                   {{ch,<0.644.0>},
>                    {cr,<0.644.0>,#Ref<0.0.0.10071>,
>                        {[28713,28694,28675,28656,28637,28618,28600,28582,
>                          28564,28546,28528,28510,28492,28474,28456,28438,
>                          28420,28402,28384,28366,28348,28330,28311,28290,
>                          28269,28248,28227,28206,28185,28164,28141,28118,
>                          28095,28072,28049,28026,28003,27980,27957,27934,
>                          27911,27888,27865,27842,27819,27796,27773,27750,
>                          27727,27704,27681,27658,27635,27612],
>                         [27589]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.643.0>,active,{0,nil}},
>                        11}},
>                   {{ch,<0.676.0>},
>                    {cr,<0.676.0>,#Ref<0.0.0.11309>,
>                        {[28614,28613,28596,28595,28578,28577,28560,28559,
>                          28542,28541,28524,28523,28506,28505,28488,28487,
>                          28470,28469,28452,28451,28434,28433,28416,28415,
>                          28398,28397,28380,28379,28362,28361,28344,28343,
>                          28326,28325,28307,28306,28286,28285,28265,28264,
>                          28244,28243,28223,28222,28202,28201,28181,28180,
>                          28160,28159,28137,28136,28114,28113,28091,28090,
>                          28068,28067,28045,28044,28022,28021,27999,27998,
>                          27976,27975,27953,27952,27930,27929,27907,27906,
>                          27884,27883,27861,27860,27838,27837,27815,27814,
>                          27792,27791,27769,27768,27746,27745,27723,27722,
>                          27700,27699,27677,27676,27654,27653,27631,27630,
>                          27608,27607,27585],
>                         [27584]},
>                        2,
>                        {[{<0.676.0>,
>                           {consumer,<<"amq.ctag-0cvDGmyaAbEW_j15AfhGHQ">>,
>                                     true}}],
>                         [{<0.676.0>,
>                           {consumer,<<"amq.ctag-YqU04SIfQv6P-QVueEjbRg">>,
>                                     true}}]},
>                        {qstate,<0.675.0>,suspended,{0,nil}},
>                        25}},
>                   {{xtype_to_module,direct},rabbit_exchange_type_direct},
>                   {{credit_to,<7350.833.0>},31},
>                   {{ch,<0.745.0>},
>                    {cr,<0.745.0>,#Ref<0.0.0.12674>,
>                        {[28716,28697,28678,28659,28640,28621,28603,28585,
>                          28567,28549,28531,28513,28495,28477,28459,28441,
>                          28423,28405,28387,28369,28351,28333,28314,28293,
>                          28272,28251,28230,28209,28188,28167,28144,28121,
>                          28098,28075,28052,28029,28006,27983,27960,27937,
>                          27914,27891,27868,27845,27822,27799,27776,27753,
>                          27730,27707,27684,27661,27638,27615],
>                         [27592]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.744.0>,active,{0,nil}},
>                        37}},
>                   {{ch,<0.660.0>},
>                    {cr,<0.660.0>,#Ref<0.0.0.10650>,
>                        {[28712,28693,28674,28655,28636,28617,28599,28581,
>                          28563,28545,28527,28509,28491,28473,28455,28437,
>                          28419,28401,28383,28365,28347,28329,28310,28289,
>                          28268,28247,28226,28205,28184,28163,28140,28117,
>                          28094,28071,28048,28025,28002,27979,27956,27933,
>                          27910,27887,27864,27841,27818,27795,27772,27749,
>                          27726,27703,27680,27657,27634,27611],
>                         [27588]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.659.0>,active,{0,nil}},
>                        11}},
>                   {{credit_to,<7350.837.0>},32},
>                   {{credit_to,<7350.881.0>},17},
>                   {{ch,<0.737.0>},
>                    {cr,<0.737.0>,#Ref<0.0.0.12429>,
>                        {[28612,28611,28594,28593,28576,28575,28558,28557,
>                          28540,28539,28522,28521,28504,28503,28486,28485,
>                          28468,28467,28450,28449,28432,28431,28414,28413,
>                          28396,28395,28378,28377,28360,28359,28342,28341,
>                          28324,28323,28305,28304,28284,28283,28263,28262,
>                          28242,28241,28221,28220,28200,28199,28179,28178,
>                          28158,28157,28135,28134,28112,28111,28089,28088,
>                          28066,28065,28043,28042,28020,28019,27997,27996,
>                          27974,27973,27951,27950,27928,27927,27905,27904,
>                          27882,27881,27859,27858,27836,27835,27813,27812,
>                          27790,27789,27767,27766,27744,27743,27721,27720,
>                          27698,27697,27675,27674,27652,27651,27629,27628,
>                          27606,27605,27583],
>                         [27582]},
>                        2,
>                        {[{<0.737.0>,
>                           {consumer,<<"amq.ctag-ITZqhulzHhUzx1uy1Eot3g">>,
>                                     true}}],
>                         [{<0.737.0>,
>                           {consumer,<<"amq.ctag-hpo5ejdoJRdLrT6JS42jGw">>,
>                                     true}}]},
>                        {qstate,<0.736.0>,suspended,{0,nil}},
>                        11}},
>                   {credit_blocked,[]},
>                   {{credit_to,<7350.854.0>},3},
>                   {{ch,<0.716.0>},
>                    {cr,<0.716.0>,#Ref<0.0.0.11820>,
>                        {[28610,28609,28592,28591,28574,28573,28556,28555,
>                          28538,28537,28520,28519,28502,28501,28484,28483,
>                          28466,28465,28448,28447,28430,28429,28412,28411,
>                          28394,28393,28376,28375,28358,28357,28340,28339,
>                          28321,28320,28300,28299,28279,28278,28258,28257,
>                          28237,28236,28216,28215,28195,28194,28174,28173,
>                          28153,28152,28130,28129,28107,28106,28084,28083,
>                          28061,28060,28038,28037,28015,28014,27992,27991,
>                          27969,27968,27946,27945,27923,27922,27900,27899,
>                          27877,27876,27854,27853,27831,27830,27808,27807,
>                          27785,27784,27762,27761,27739,27738,27716,27715,
>                          27693,27692,27670,27669,27647,27646,27624,27623,
>                          27601,27600,27578],
>                         [27577]},
>                        2,
>                        {[{<0.716.0>,
>                           {consumer,<<"amq.ctag-BTlS-GmbXNpcU-wUlMI60Q">>,
>                                     true}}],
>                         [{<0.716.0>,
>                           {consumer,<<"amq.ctag-loc5monF8M1pz_Bb8ypumw">>,
>                                     true}}]},
>                        {qstate,<0.715.0>,suspended,{0,nil}},
>                        44}},
>                   {{ch,<0.782.0>},
>                    {cr,<0.782.0>,#Ref<0.0.0.15649>,
>                        {[28714,28695,28676,28657,28638,28619,28601,28583,
>                          28565,28547,28529,28511,28493,28475,28457,28439,
>                          28421,28403,28385,28367,28349,28331,28312,28291,
>                          28270,28249,28228,28207,28186,28165,28142,28119,
>                          28096,28073,28050,28027,28004,27981,27958,27935,
>                          27912,27889,27866,27843,27820,27797,27774,27751,
>                          27728,27705,27682,27659,27636,27613],
>                         [27590]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.781.0>,active,{0,nil}},
>                        35}},
>                   {{credit_to,<7350.846.0>},22},
>                   {{ch,<0.684.0>},
>                    {cr,<0.684.0>,#Ref<0.0.0.11198>,
>                        {[28711,28692,28673,28654,28635,28616,28598,28580,
>                          28562,28544,28526,28508,28490,28472,28454,28436,
>                          28418,28400,28382,28364,28346,28328,28309,28288,
>                          28267,28246,28225,28204,28183,28162,28139,28116,
>                          28093,28070,28047,28024,28001,27978,27955,27932,
>                          27909,27886,27863,27840,27817,27794,27771,27748,
>                          27725,27702,27679,27656,27633,27610],
>                         [27587]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.683.0>,active,{0,nil}},
>                        11}},
>                   {{ch,<0.692.0>},
>                    {cr,<0.692.0>,#Ref<0.0.0.11088>,
>                        {[28710,28691,28672,28653,28634,28615,28597,28579,
>                          28561,28543,28525,28507,28489,28471,28453,28435,
>                          28417,28399,28381,28363,28345,28327,28308,28287,
>                          28266,28245,28224,28203,28182,28161,28138,28115,
>                          28092,28069,28046,28023,28000,27977,27954,27931,
>                          27908,27885,27862,27839,27816,27793,27770,27747,
>                          27724,27701,27678,27655,27632,27609],
>                         [27586]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.691.0>,active,{0,nil}},
>                        12}},
>                   {{ch,<0.648.0>},
>                    {cr,<0.648.0>,#Ref<0.0.0.10283>,
>                        {[28717,28698,28679,28660,28641],[28622]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.647.0>,active,{0,nil}},
>                        12}},
>                   {guid,{{2296174761,591959305,1562876388,1822892473},1}},
>                   {{ch,<0.720.0>},
>                    {cr,<0.720.0>,#Ref<0.0.0.12036>,
>                        {[28723,28704,28685,28666,28647],[28628]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.719.0>,active,{0,nil}},
>                        37}},
>                   {{credit_to,<7350.869.0>},39},
>                   {{ch,<0.636.0>},
>                    {cr,<0.636.0>,#Ref<0.0.0.9960>,
>                        {[28705,28686,28667,28648,28629,28604,28586,28568,
>                          28550,28532,28514,28496,28478,28460,28442,28424,
>                          28406,28388,28370,28352,28334,28315,28294,28273,
>                          28252,28231,28210,28189,28168,28147,28124,28101,
>                          28078,28055,28032,28009,27986,27963,27940,27917,
>                          27894,27871,27848,27825,27802,27779,27756,27733,
>                          27710,27687,27664,27641,27618,27595],
>                         [27572]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.635.0>,active,{0,nil}},
>                        15}},
>                   {{credit_to,<7350.850.0>},50},
>                   {{ch,<0.616.0>},
>                    {cr,<0.616.0>,#Ref<0.0.0.9300>,
>                        {[28719,28700,28681,28662,28643],[28624]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.615.0>,active,{0,nil}},
>                        12}},
>                   {{ch,<0.509.0>},
>                    {cr,<0.509.0>,#Ref<0.0.0.2243>,
>                        {[28302,28281,28260,28239,28218,28197,28176,28155,
>                          28132,28109,28086,28063,28040,28017,27994,27971],
>                         [27580,27603,27626,27649,27672,27695,27718,27741,
>                          27764,27787,27810,27833,27856,27879,27902,27925,
>                          27948]},
>                        3,
>                        {[],[]},
>                        {qstate,<0.508.0>,active,{0,nil}},
>                        36}},
>                   {{credit_to,<7350.873.0>},17},
>                   {{ch,<0.588.0>},
>                    {cr,<0.588.0>,#Ref<0.0.0.7808>,
>                        {[28721,28702,28683,28664,28645],[28626]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.587.0>,active,{0,nil}},
>                        12}},
>                   {{ch,<0.664.0>},
>                    {cr,<0.664.0>,#Ref<0.0.0.10772>,
>                        {[28720,28701,28682,28663,28644],[28625]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.663.0>,active,{0,nil}},
>                        13}},
>                   {fhc_age_tree,{0,nil}},
>                   {{ch,<0.708.0>},
>                    {cr,<0.708.0>,#Ref<0.0.0.11695>,
>                        {[28707,28688,28669,28650,28631,28606,28588,28570,
>                          28552,28534,28516,28498,28480,28462,28444,28426,
>                          28408,28390,28372,28354,28336,28317,28296,28275,
>                          28254,28233,28212,28191,28170,28149,28126,28103,
>                          28080,28057,28034,28011,27988,27965,27942,27919,
>                          27896,27873,27850,27827,27804,27781,27758,27735,
>                          27712,27689,27666,27643,27620,27597],
>                         [27574]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.707.0>,active,{0,nil}},
>                        35}},
>                   {{credit_to,<7350.829.0>},13},
>                   {{ch,<0.757.0>},
>                    {cr,<0.757.0>,#Ref<0.0.0.13150>,
>                        {[28706,28687,28668,28649,28630,28605,28587,28569,
>                          28551,28533,28515,28497,28479,28461,28443,28425,
>                          28407,28389,28371,28353,28335,28316,28295,28274,
>                          28253,28232,28211,28190,28169,28148,28125,28102,
>                          28079,28056,28033,28010,27987,27964,27941,27918,
>                          27895,27872,27849,27826,27803,27780,27757,27734,
>                          27711,27688,27665,27642,27619,27596],
>                         [27573]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.756.0>,active,{0,nil}},
>                        49}},
>                   {{ch,<0.572.0>},
>                    {cr,<0.572.0>,#Ref<0.0.0.7536>,
>                        {[28709,28690,28671,28652,28633,28608,28590,28572,
>                          28554,28536,28518,28500,28482,28464,28446,28428,
>                          28410,28392,28374,28356,28338,28319,28298,28277,
>                          28256,28235,28214,28193,28172,28151,28128,28105,
>                          28082,28059,28036,28013,27990,27967,27944,27921,
>                          27898,27875,27852,27829,27806,27783,27760,27737,
>                          27714,27691,27668,27645,27622,27599],
>                         [27576]},
>                        1,
>                        {[],[]},
>                        {qstate,<0.571.0>,active,{0,nil}},
>                        35}},
>                   {{credit_to,<7350.825.0>},2},
>                   {{credit_to,<7350.858.0>},3}]
>     trap_exit: true
>     status: running
>     heap_size: 121536
>     stack_size: 27
>     reductions: 9363545
>   neighbours:
>
> =SUPERVISOR REPORT==== 18-May-2013::01:41:51 ===
>      Supervisor: {local,rabbit_amqqueue_sup}
>      Context:    child_terminated
>      Reason:     {{case_clause,{empty,{[],[]}}},
>                   [{rabbit_amqqueue_process,subtract_acks,3,[]},
>                    {rabbit_amqqueue_process,subtract_acks,4,[]},
>                    {rabbit_amqqueue_process,handle_cast,2,[]},
>                    {gen_server2,handle_msg,2,[]},
>                    {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>      Offender:   [{pid,<0.301.0>},
>                   {name,rabbit_amqqueue},
>                   {mfa,
>                       {rabbit_amqqueue_process,start_link,
>                           [{amqqueue,
>                                {resource,<<"/">>,queue,
>                                    <<"document_queue">>},
>                                true,false,none,[],<0.7622.0>,[],[],undefined,
>                                []}]}},
>                   {restart_type,temporary},
>                   {shutdown,4294967295},
>                   {child_type,worker}]


Notice how the *case_clause* is *empty*.  The other rabbit log looks 
similar:

# cat /var/log/rabbitmq/rabbit@ip-10-123-123-123.log

=ERROR REPORT==== 18-May-2013::01:41:48 ===
** Generic server <0.301.0> terminating
** Last message in was {'$gen_cast',
                           {ack,
                               [27585,27584,27607,27630,27653,27676,27699,
                                27722,27745,27768,27791,27814,27837,27860,
                                27883,27906,27929,27952,27975,27998,28021,
                                28044,28067,28090,28113,28136,28159,28180,
                                28201,28222,28243,28264,28285,28306,28325,
                                28343,28361,28379,28397,28415,28433,28451,
                                28469,28487,28505,28523,28541,28559,28577,
                                28595,28613],
                               <0.676.0>}}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,
                           <<"document_queue">>},
                          true,false,none,[],<0.301.0>,[],[],undefined,[]},
                         none,true,rabbit_variable_queue,
                         {vqstate,
                          {0,{[],[]}},
                          {0,{[],[]}},
                          {delta,undefined,0,undefined},
                          {0,{[],[]}},
                          {0,{[],[]}},
*                          ... [ thousands of lines of state ] ... *

** Reason for termination ==
** {{case_clause,{empty,{[],[]}}},
    [{rabbit_amqqueue_process,subtract_acks,3,[]},
     {rabbit_amqqueue_process,subtract_acks,4,[]},
     {rabbit_amqqueue_process,handle_cast,2,[]},
     {gen_server2,handle_msg,2,[]},
     {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}


=ERROR REPORT==== 18-May-2013::01:41:51 ===
connection <0.430.0>, channel 22 - soft error:
{amqp_error,not_found,
            "no queue 'document_queue' in vhost '/'",
            'queue.declare'}

=ERROR REPORT==== 18-May-2013::01:41:51 ===
connection <0.384.0>, channel 12 - soft error:
{amqp_error,not_found,
            "no queue 'document_queue' in vhost '/'",
            'queue.declare'}

=ERROR REPORT==== 18-May-2013::01:41:51 ===
connection <0.384.0>, channel 15 - soft error:
{amqp_error,not_found,
            "no queue 'document_queue' in vhost '/'",
            'queue.declare'}

=ERROR REPORT==== 18-May-2013::01:41:51 ===
connection <0.373.0>, channel 23 - soft error:
{amqp_error,not_found,
            "no queue 'document_queue' in vhost '/'",
            'queue.declare'}

* ... [ repeated a bunch of times ] ...*
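
For readers less familiar with Erlang error terms: in a `case_clause` error the second element is the value that matched no branch, and `{empty,{[],[]}}` is what Erlang's `queue:out/1` returns for an empty queue. The Python sketch below is only a model of one way a forward-only scan over a FIFO of pending ack tags could run out of queue. The function body, the error type, and the idea that the third field of each `cr` record after the pid and ref holds pending ack tags are assumptions made for illustration, not RabbitMQ source. The sample tags are the first entries of the pending queue shown for `<0.676.0>` and of the last `ack` message in the error report.

```python
from collections import deque


def subtract_acks(acked, pending):
    """Remove each tag in `acked` from the FIFO `pending`, scanning forward only.

    Hypothetical model, not RabbitMQ code: tags skipped during the scan are set
    aside and only put back in front once every acked tag has been found.
    """
    pending = deque(pending)
    skipped = []
    for tag in acked:
        while True:
            if not pending:
                # Analogue of {case_clause,{empty,{[],[]}}}: the scan ran off
                # the end of the queue while still looking for `tag`.
                raise LookupError(f"pending queue empty while looking for {tag}")
            head = pending.popleft()
            if head == tag:
                break
            skipped.append(head)
    return skipped + list(pending)


# Oldest first: front list [27584], then the rear list read from its tail.
pending = [27584, 27585, 27607, 27608]

# Acks arriving in queue order are removed cleanly.
print(subtract_acks([27584, 27585, 27607], pending))  # [27608]

# The last ack message starts 27585, 27584: 27584 has already been skipped,
# so the scan never finds it and exhausts the queue.
try:
    subtract_acks([27585, 27584, 27607], pending)
except LookupError as err:
    print("scan failed:", err)
```

If the model is close to what the queue process does, the detail worth checking is that the last `ack` message lists 27585 before 27584, while the pending queue for that channel holds them in the opposite order.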


I also see this in my */var/log/syslog* with regard to *beam.smp*:

# cat /var/log/syslog
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176041] INFO: task 
> beam.smp:18971 blocked for more than 120 seconds.
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176053] "echo 0 > 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176060] beam.smp       
>  D ffff8801bfd93700     0 18971  18962 0x00000000
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176065] 
>  ffff8801b11cdcb8 0000000000000282 0000000000000000 ffffffffffffffe0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176070] 
>  ffff8801b11cdfd8 ffff8801b11cdfd8 ffff8801b11cdfd8 0000000000013700
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176075] 
>  ffff8801b3180000 ffff8801b35244a0 00007f500e2079e0 ffff8801b146bb80
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176089] 
>  [<ffffffff8165434f>] schedule+0x3f/0x60
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176094] 
>  [<ffffffff8106aef5>] exit_mm+0x85/0x130
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176097] 
>  [<ffffffff8106b10e>] do_exit+0x16e/0x450
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176102] 
>  [<ffffffff810797aa>] ? __dequeue_signal+0x6a/0xb0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176105] 
>  [<ffffffff8106b594>] do_group_exit+0x44/0xa0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176108] 
>  [<ffffffff8107c36c>] get_signal_to_deliver+0x21c/0x420
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176113] 
>  [<ffffffff81014825>] do_signal+0x45/0x130
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176116] 
>  [<ffffffff8105539d>] ? set_next_entity+0xad/0xd0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176120] 
>  [<ffffffff81014ad5>] do_notify_resume+0x65/0x80
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176124] 
>  [<ffffffff8165ea90>] int_signal+0x12/0x17
> *... [ repeated 9 more times with same timestamp ] ...*
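
To compare these kernel warnings against the RabbitMQ crash times, the hung-task lines can be pulled out of syslog mechanically. The following is a minimal sketch that assumes each warning sits on a single line in the real file (the wrapping above comes from the mail client); the regular expression and the `hung_tasks` helper are illustrative names, not part of any tool mentioned here.

```python
import re

# Matches the kernel's "task ... blocked for more than N seconds" warning as it
# appears in the syslog excerpt above, on one unwrapped line.
HUNG_TASK = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"\[\s*(?P<uptime>[\d.]+)\]\s"
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Yield (timestamp, task name, pid, threshold seconds) per warning line."""
    for line in lines:
        match = HUNG_TASK.match(line)
        if match:
            yield (match["stamp"], match["task"],
                   int(match["pid"]), int(match["secs"]))


sample = [
    "May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176041] "
    "INFO: task beam.smp:18971 blocked for more than 120 seconds.",
    "May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:",
]
for entry in hung_tasks(sample):
    print(entry)
```

Reading the real file would be `hung_tasks(open("/var/log/syslog"))`; the timestamps can then be lined up with the `=CRASH REPORT====` times in the SASL log.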


There are no strange CPU spikes on my EC2 node based on AWS monitoring. The 
only thing that seems to stand out is that my writes to my EBS volume spike 
during the crash (~90 write OPs/sec, 4 MiB/sec write bandwidth, ~30ms/op 
write latency).

My team and I are completely stumped and would appreciate any insight you 
might have into the crash reports.  Could this be caused by writes hanging 
on the EBS volume, since EBS volumes are essentially network-mounted?  
Is there a way to configure RabbitMQ to be more tolerant of these delays?


For debugging purposes, below is the status output of one of the nodes when 
it is healthy (so you can see the versions of everything I'm running):

# rabbitmqctl status
> Status of node 'rabbit@ip-10-123-123-123' ...
> [{pid,28978},
>  {running_applications,
>      [{rabbitmq_management,"RabbitMQ Management Console","3.1.0"},
>       {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.1.0"},
>       {webmachine,"webmachine","1.9.1-rmq3.1.0-git52e62bc"},
>       {mochiweb,"MochiMedia Web Server","2.3.1-rmq3.1.0-gitd541e9a"},
>       {rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.0"},
>       {rabbit,"RabbitMQ","3.1.0"},
>       {os_mon,"CPO  CXC 138 46","2.2.11"},
>       {inets,"INETS  CXC 138 49","5.9.4"},
>       {mnesia,"MNESIA  CXC 138 12","4.8"},
>       {amqp_client,"RabbitMQ AMQP Client","3.1.0"},
>       {xmerl,"XML parser","1.3.3"},
>       {sasl,"SASL  CXC 138 11","2.3.1"},
>       {stdlib,"ERTS  CXC 138 10","1.19.1"},
>       {kernel,"ERTS  CXC 138 10","2.16.1"}]},
>  {os,{unix,linux}},
>  {erlang_version,
>      "Erlang R16B (erts-5.10.1) [source-05f1189] [64-bit] [smp:8:8] [async-threads:30] [hipe] [kernel-poll:true]\n"},
>  {memory,
>      [{total,60179048},
>       {connection_procs,5274808},
>       {queue_procs,13107264},
>       {plugins,354272},
>       {other_proc,9741830},
>       {mnesia,150456},
>       {mgmt_db,12248},
>       {msg_index,34768},
>       {other_ets,1148248},
>       {binary,4905584},
>       {code,19565863},
>       {atom,703377},
>       {other_system,5180330}]},
>  {vm_memory_high_watermark,0.8},
>  {vm_memory_limit,5828434329},
>  {disk_free_limit,1000000000},
>  {disk_free,3953344512},
>  {file_descriptors,
>      [{total_limit,924},{total_used,6},{sockets_limit,829},{sockets_used,4}]},
>  {processes,[{limit,1048576},{used,348}]},
>  {run_queue,0},
>  {uptime,3796}]
> ...done.
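
As a quick sanity check on the healthy node, the figures in this status output can be turned into headroom numbers. The sketch below assumes that `vm_memory_limit` is the high watermark multiplied by the RAM the node detected; the flattened dictionary keys and the `headroom` helper are made up for illustration, and the values are copied from the output above.

```python
# Values copied from the rabbitmqctl status output; key names are illustrative.
status = {
    "vm_memory_high_watermark": 0.8,
    "vm_memory_limit": 5828434329,   # bytes
    "memory_total": 60179048,        # bytes
    "disk_free": 3953344512,         # bytes
    "disk_free_limit": 1000000000,   # bytes
    "fd_total_limit": 924,
    "fd_total_used": 6,
}


def headroom(s):
    """Derive rough headroom figures from a flattened status dictionary."""
    # Assumption: limit = watermark * detected RAM, so RAM = limit / watermark.
    ram_bytes = s["vm_memory_limit"] / s["vm_memory_high_watermark"]
    return {
        "implied_ram_gib": ram_bytes / 2**30,
        "memory_used_pct_of_limit": 100 * s["memory_total"] / s["vm_memory_limit"],
        "disk_free_over_limit": s["disk_free"] / s["disk_free_limit"],
        "fd_used_pct": 100 * s["fd_total_used"] / s["fd_total_limit"],
    }


for name, value in headroom(status).items():
    print(f"{name}: {value:.2f}")
```

On these numbers the node uses about 1% of its memory limit, has roughly four times the free disk limit available, and has under 1% of its file descriptors in use, so none of the built-in alarms would be expected to fire while it is healthy.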


Note that I am using Erlang R16B from the Erlang Solutions repo (esl-erlang 
package).  I was originally using Erlang R14 from the Ubuntu repos 
(erlang-base), but upgraded hoping it would resolve the issue 
(it did not).  I have also tried running RabbitMQ 3.0.4 without 
success, so the issue seems to affect both versions.

Thanks,
Karl