[rabbitmq-discuss] Strange subtract_acks crash in RabbitMQ 3.1.0-1 (case_clause empty) on EC2
Brett Cameron
brett.r.cameron at gmail.com
Sat May 18 04:48:10 BST 2013
Karl,
Wild speculation would be that messages have been consumed from a queue but
that queue (name = document_queue?) is for some reason missing in action
(gone, inaccessible, ...) when the consumer subsequently tries to ack the
message(s).
Brett
On Sat, May 18, 2013 at 3:20 PM, Karl Rieb <karl.rieb at gmail.com> wrote:
> Hi,
>
> I am running a RabbitMQ cluster of 4 nodes in AWS EC2 on *c1.xlarge *instances
> using Ubuntu 12.04 LTS (kernel 3.2.0-38). Periodically two of my nodes
> holding the largest queues will crash and require me to restart the
> rabbitmq-server service. Looking at the logs, I'm at a loss as to why the
> crash is occurring. I've have tried reproducing the crash manually by
> flooding my nodes with messages, but that doesn't seem to trigger the
> issue. Below is output from various RabbitMQ logs:
>
> # cat /var/log/rabbitmq/rabbit at ip-10-123-123-123-sasl.log
>> =CRASH REPORT==== 18-May-2013::01:41:51 ===
>> crasher:
>> initial call: gen:init_it/6
>> pid: <0.301.0>
>> registered_name: []
>> exception exit: {{case_clause,{empty,{[],[]}}},
>> [{rabbit_amqqueue_process,subtract_acks,3,[]},
>> {rabbit_amqqueue_process,subtract_acks,4,[]},
>> {rabbit_amqqueue_process,handle_cast,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,wake_up,3,
>> [{file,"proc_lib.erl"},{line,249}]}]}
>> in function gen_server2:terminate/3
>> ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.207.0>]
>> messages: [{'$gen_cast',{resume,<0.676.0>}},
>> {'$gen_cast',{ack,[27583,27606,27629,27652,27675,27698,
>> 27721,27744,27767,27790,27813,27836,
>> 27859,27882,27905,27928,27951,27974,
>> 27997,28020,28043,28066,28089,28112,
>> 28135,28158,28179,28200,28221,28242,
>> 28263,28284,28305,28324,28342,28360,
>> 28378,28396,28414,28432,28450,28468,
>> 28486,28504,28522,28540,28558,28576,
>> 28594,28612],
>> <0.737.0>}},
>> {'$gen_cast',{resume,<0.737.0>}},
>> {'$gen_cast',{ack,[27608,27631,27654,27677,27700,27723,
>> 27746,27769,27792,27815,27838,27861,
>> 27884,27907,27930,27953,27976,27999,
>> 28022,28045,28068,28091,28114,28137,
>> 28160,28181,28202,28223,28244,28265,
>> 28286,28307,28326,28344,28362,28380,
>> 28398,28416,28434,28452,28470,28488,
>> 28506,28524,28542,28560,28578,28596,
>> 28614],
>> <0.676.0>}},
>> {'$gen_cast',{ack,[27578,27601,27624,27647,27670,27693,
>> 27716,27739,27762,27785,27808,27831,
>> 27854,27877,27900,27923,27946,27969,
>> 27992,28015,28038,28061,28084,28107,
>> 28130,28153,28174,28195,28216,28237,
>> 28258,28279,28300,28321,28340,28358,
>> 28376,28394,28412,28430,28448,28466,
>> 28484,28502,28520,28538,28556,28574,
>> 28592,28610],
>> <0.716.0>}},
>> {'$gen_cast',{resume,<0.716.0>}},
>> {'$gen_cast',{ack,[27582,27605,27628,27651,27674,27697,
>> 27720,27743,27766,27789,27812,27835,
>> 27858,27881,27904,27927,27950,27973,
>> 27996,28019,28042,28065,28088,28111,
>> 28134,28157,28178,28199,28220,28241,
>> 28262,28283,28304,28323,28341,28359,
>> 28377,28395,28413,28431,28449,28467,
>> 28485,28503,28521,28539,28557,28575,
>> 28593,28611],
>> <0.737.0>}}]
>> links: [<0.300.0>]
>> dictionary: [{{ch,<0.729.0>},
>> {cr,<0.729.0>,#Ref<0.0.0.12296>,
>> {[28715,28696,28677,28658,28639,28620,28602,28584,
>> 28566,28548,28530,28512,28494,28476,28458,28440,
>> 28422,28404,28386,28368,28350,28332,28313,28292,
>> 28271,28250,28229,28208,28187,28166,28143,28120,
>> 28097,28074,28051,28028,28005,27982,27959,27936,
>> 27913,27890,27867,27844,27821,27798,27775,27752,
>> 27729,27706,27683,27660,27637,27614],
>> [27591]},
>> 1,
>> {[],[]},
>> {qstate,<0.728.0>,active,{0,nil}},
>> 11}},
>> {{credit_to,<7350.877.0>},22},
>> {{ch,<0.608.0>},
>> {cr,<0.608.0>,#Ref<0.0.0.9041>,
>> {[28708,28689,28670,28651,28632,28607,28589,28571,
>> 28553,28535,28517,28499,28481,28463,28445,28427,
>> 28409,28391,28373,28355,28337,28318,28297,28276,
>> 28255,28234,28213,28192,28171,28150,28127,28104,
>> 28081,28058,28035,28012,27989,27966,27943,27920,
>> 27897,27874,27851,27828,27805,27782,27759,27736,
>> 27713,27690,27667,27644,27621,27598],
>> [27575]},
>> 1,
>> {[],[]},
>> {qstate,<0.607.0>,active,{0,nil}},
>> 38}},
>> {{ch,<0.624.0>},
>> {cr,<0.624.0>,#Ref<0.0.0.9436>,
>> {[28722,28703,28684,28665,28646],[28627]},
>> 1,
>> {[],[]},
>> {qstate,<0.623.0>,active,{0,nil}},
>> 36}},
>> {{ch,<0.696.0>},
>> {cr,<0.696.0>,#Ref<0.0.0.11414>,
>> {[],[]},
>> 2,
>> {[],[]},
>> {qstate,<0.695.0>,active,{0,nil}},
>> 24}},
>> {{ch,<0.564.0>},
>> {cr,<0.564.0>,#Ref<0.0.0.7285>,
>> {[28718,28699,28680,28661,28642],[28623]},
>> 1,
>> {[],[]},
>> {qstate,<0.563.0>,active,{0,nil}},
>> 12}},
>> {{ch,<0.644.0>},
>> {cr,<0.644.0>,#Ref<0.0.0.10071>,
>> {[28713,28694,28675,28656,28637,28618,28600,28582,
>> 28564,28546,28528,28510,28492,28474,28456,28438,
>> 28420,28402,28384,28366,28348,28330,28311,28290,
>> 28269,28248,28227,28206,28185,28164,28141,28118,
>> 28095,28072,28049,28026,28003,27980,27957,27934,
>> 27911,27888,27865,27842,27819,27796,27773,27750,
>> 27727,27704,27681,27658,27635,27612],
>> [27589]},
>> 1,
>> {[],[]},
>> {qstate,<0.643.0>,active,{0,nil}},
>> 11}},
>> {{ch,<0.676.0>},
>> {cr,<0.676.0>,#Ref<0.0.0.11309>,
>> {[28614,28613,28596,28595,28578,28577,28560,28559,
>> 28542,28541,28524,28523,28506,28505,28488,28487,
>> 28470,28469,28452,28451,28434,28433,28416,28415,
>> 28398,28397,28380,28379,28362,28361,28344,28343,
>> 28326,28325,28307,28306,28286,28285,28265,28264,
>> 28244,28243,28223,28222,28202,28201,28181,28180,
>> 28160,28159,28137,28136,28114,28113,28091,28090,
>> 28068,28067,28045,28044,28022,28021,27999,27998,
>> 27976,27975,27953,27952,27930,27929,27907,27906,
>> 27884,27883,27861,27860,27838,27837,27815,27814,
>> 27792,27791,27769,27768,27746,27745,27723,27722,
>> 27700,27699,27677,27676,27654,27653,27631,27630,
>> 27608,27607,27585],
>> [27584]},
>> 2,
>> {[{<0.676.0>,
>> {consumer,<<"amq.ctag-0cvDGmyaAbEW_j15AfhGHQ">>,
>> true}}],
>> [{<0.676.0>,
>> {consumer,<<"amq.ctag-YqU04SIfQv6P-QVueEjbRg">>,
>> true}}]},
>> {qstate,<0.675.0>,suspended,{0,nil}},
>> 25}},
>> {{xtype_to_module,direct},rabbit_exchange_type_direct},
>> {{credit_to,<7350.833.0>},31},
>> {{ch,<0.745.0>},
>> {cr,<0.745.0>,#Ref<0.0.0.12674>,
>> {[28716,28697,28678,28659,28640,28621,28603,28585,
>> 28567,28549,28531,28513,28495,28477,28459,28441,
>> 28423,28405,28387,28369,28351,28333,28314,28293,
>> 28272,28251,28230,28209,28188,28167,28144,28121,
>> 28098,28075,28052,28029,28006,27983,27960,27937,
>> 27914,27891,27868,27845,27822,27799,27776,27753,
>> 27730,27707,27684,27661,27638,27615],
>> [27592]},
>> 1,
>> {[],[]},
>> {qstate,<0.744.0>,active,{0,nil}},
>> 37}},
>> {{ch,<0.660.0>},
>> {cr,<0.660.0>,#Ref<0.0.0.10650>,
>> {[28712,28693,28674,28655,28636,28617,28599,28581,
>> 28563,28545,28527,28509,28491,28473,28455,28437,
>> 28419,28401,28383,28365,28347,28329,28310,28289,
>> 28268,28247,28226,28205,28184,28163,28140,28117,
>> 28094,28071,28048,28025,28002,27979,27956,27933,
>> 27910,27887,27864,27841,27818,27795,27772,27749,
>> 27726,27703,27680,27657,27634,27611],
>> [27588]},
>> 1,
>> {[],[]},
>> {qstate,<0.659.0>,active,{0,nil}},
>> 11}},
>> {{credit_to,<7350.837.0>},32},
>> {{credit_to,<7350.881.0>},17},
>> {{ch,<0.737.0>},
>> {cr,<0.737.0>,#Ref<0.0.0.12429>,
>> {[28612,28611,28594,28593,28576,28575,28558,28557,
>> 28540,28539,28522,28521,28504,28503,28486,28485,
>> 28468,28467,28450,28449,28432,28431,28414,28413,
>> 28396,28395,28378,28377,28360,28359,28342,28341,
>> 28324,28323,28305,28304,28284,28283,28263,28262,
>> 28242,28241,28221,28220,28200,28199,28179,28178,
>> 28158,28157,28135,28134,28112,28111,28089,28088,
>> 28066,28065,28043,28042,28020,28019,27997,27996,
>> 27974,27973,27951,27950,27928,27927,27905,27904,
>> 27882,27881,27859,27858,27836,27835,27813,27812,
>> 27790,27789,27767,27766,27744,27743,27721,27720,
>> 27698,27697,27675,27674,27652,27651,27629,27628,
>> 27606,27605,27583],
>> [27582]},
>> 2,
>> {[{<0.737.0>,
>> {consumer,<<"amq.ctag-ITZqhulzHhUzx1uy1Eot3g">>,
>> true}}],
>> [{<0.737.0>,
>> {consumer,<<"amq.ctag-hpo5ejdoJRdLrT6JS42jGw">>,
>> true}}]},
>> {qstate,<0.736.0>,suspended,{0,nil}},
>> 11}},
>> {credit_blocked,[]},
>> {{credit_to,<7350.854.0>},3},
>> {{ch,<0.716.0>},
>> {cr,<0.716.0>,#Ref<0.0.0.11820>,
>> {[28610,28609,28592,28591,28574,28573,28556,28555,
>> 28538,28537,28520,28519,28502,28501,28484,28483,
>> 28466,28465,28448,28447,28430,28429,28412,28411,
>> 28394,28393,28376,28375,28358,28357,28340,28339,
>> 28321,28320,28300,28299,28279,28278,28258,28257,
>> 28237,28236,28216,28215,28195,28194,28174,28173,
>> 28153,28152,28130,28129,28107,28106,28084,28083,
>> 28061,28060,28038,28037,28015,28014,27992,27991,
>> 27969,27968,27946,27945,27923,27922,27900,27899,
>> 27877,27876,27854,27853,27831,27830,27808,27807,
>> 27785,27784,27762,27761,27739,27738,27716,27715,
>> 27693,27692,27670,27669,27647,27646,27624,27623,
>> 27601,27600,27578],
>> [27577]},
>> 2,
>> {[{<0.716.0>,
>> {consumer,<<"amq.ctag-BTlS-GmbXNpcU-wUlMI60Q">>,
>> true}}],
>> [{<0.716.0>,
>> {consumer,<<"amq.ctag-loc5monF8M1pz_Bb8ypumw">>,
>> true}}]},
>> {qstate,<0.715.0>,suspended,{0,nil}},
>> 44}},
>> {{ch,<0.782.0>},
>> {cr,<0.782.0>,#Ref<0.0.0.15649>,
>> {[28714,28695,28676,28657,28638,28619,28601,28583,
>> 28565,28547,28529,28511,28493,28475,28457,28439,
>> 28421,28403,28385,28367,28349,28331,28312,28291,
>> 28270,28249,28228,28207,28186,28165,28142,28119,
>> 28096,28073,28050,28027,28004,27981,27958,27935,
>> 27912,27889,27866,27843,27820,27797,27774,27751,
>> 27728,27705,27682,27659,27636,27613],
>> [27590]},
>> 1,
>> {[],[]},
>> {qstate,<0.781.0>,active,{0,nil}},
>> 35}},
>> {{credit_to,<7350.846.0>},22},
>> {{ch,<0.684.0>},
>> {cr,<0.684.0>,#Ref<0.0.0.11198>,
>> {[28711,28692,28673,28654,28635,28616,28598,28580,
>> 28562,28544,28526,28508,28490,28472,28454,28436,
>> 28418,28400,28382,28364,28346,28328,28309,28288,
>> 28267,28246,28225,28204,28183,28162,28139,28116,
>> 28093,28070,28047,28024,28001,27978,27955,27932,
>> 27909,27886,27863,27840,27817,27794,27771,27748,
>> 27725,27702,27679,27656,27633,27610],
>> [27587]},
>> 1,
>> {[],[]},
>> {qstate,<0.683.0>,active,{0,nil}},
>> 11}},
>> {{ch,<0.692.0>},
>> {cr,<0.692.0>,#Ref<0.0.0.11088>,
>> {[28710,28691,28672,28653,28634,28615,28597,28579,
>> 28561,28543,28525,28507,28489,28471,28453,28435,
>> 28417,28399,28381,28363,28345,28327,28308,28287,
>> 28266,28245,28224,28203,28182,28161,28138,28115,
>> 28092,28069,28046,28023,28000,27977,27954,27931,
>> 27908,27885,27862,27839,27816,27793,27770,27747,
>> 27724,27701,27678,27655,27632,27609],
>> [27586]},
>> 1,
>> {[],[]},
>> {qstate,<0.691.0>,active,{0,nil}},
>> 12}},
>> {{ch,<0.648.0>},
>> {cr,<0.648.0>,#Ref<0.0.0.10283>,
>> {[28717,28698,28679,28660,28641],[28622]},
>> 1,
>> {[],[]},
>> {qstate,<0.647.0>,active,{0,nil}},
>> 12}},
>> {guid,{{2296174761
>> ,591959305,1562876388,1822892473},1}},
>> {{ch,<0.720.0>},
>> {cr,<0.720.0>,#Ref<0.0.0.12036>,
>> {[28723,28704,28685,28666,28647],[28628]},
>> 1,
>> {[],[]},
>> {qstate,<0.719.0>,active,{0,nil}},
>> 37}},
>> {{credit_to,<7350.869.0>},39},
>> {{ch,<0.636.0>},
>> {cr,<0.636.0>,#Ref<0.0.0.9960>,
>> {[28705,28686,28667,28648,28629,28604,28586,28568,
>> 28550,28532,28514,28496,28478,28460,28442,28424,
>> 28406,28388,28370,28352,28334,28315,28294,28273,
>> 28252,28231,28210,28189,28168,28147,28124,28101,
>> 28078,28055,28032,28009,27986,27963,27940,27917,
>> 27894,27871,27848,27825,27802,27779,27756,27733,
>> 27710,27687,27664,27641,27618,27595],
>> [27572]},
>> 1,
>> {[],[]},
>> {qstate,<0.635.0>,active,{0,nil}},
>> 15}},
>> {{credit_to,<7350.850.0>},50},
>> {{ch,<0.616.0>},
>> {cr,<0.616.0>,#Ref<0.0.0.9300>,
>> {[28719,28700,28681,28662,28643],[28624]},
>> 1,
>> {[],[]},
>> {qstate,<0.615.0>,active,{0,nil}},
>> 12}},
>> {{ch,<0.509.0>},
>> {cr,<0.509.0>,#Ref<0.0.0.2243>,
>> {[28302,28281,28260,28239,28218,28197,28176,28155,
>> 28132,28109,28086,28063,28040,28017,27994,27971],
>> [27580,27603,27626,27649,27672,27695,27718,27741,
>> 27764,27787,27810,27833,27856,27879,27902,27925,
>> 27948]},
>> 3,
>> {[],[]},
>> {qstate,<0.508.0>,active,{0,nil}},
>> 36}},
>> {{credit_to,<7350.873.0>},17},
>> {{ch,<0.588.0>},
>> {cr,<0.588.0>,#Ref<0.0.0.7808>,
>> {[28721,28702,28683,28664,28645],[28626]},
>> 1,
>> {[],[]},
>> {qstate,<0.587.0>,active,{0,nil}},
>> 12}},
>> {{ch,<0.664.0>},
>> {cr,<0.664.0>,#Ref<0.0.0.10772>,
>> {[28720,28701,28682,28663,28644],[28625]},
>> 1,
>> {[],[]},
>> {qstate,<0.663.0>,active,{0,nil}},
>> 13}},
>> {fhc_age_tree,{0,nil}},
>> {{ch,<0.708.0>},
>> {cr,<0.708.0>,#Ref<0.0.0.11695>,
>> {[28707,28688,28669,28650,28631,28606,28588,28570,
>> 28552,28534,28516,28498,28480,28462,28444,28426,
>> 28408,28390,28372,28354,28336,28317,28296,28275,
>> 28254,28233,28212,28191,28170,28149,28126,28103,
>> 28080,28057,28034,28011,27988,27965,27942,27919,
>> 27896,27873,27850,27827,27804,27781,27758,27735,
>> 27712,27689,27666,27643,27620,27597],
>> [27574]},
>> 1,
>> {[],[]},
>> {qstate,<0.707.0>,active,{0,nil}},
>> 35}},
>> {{credit_to,<7350.829.0>},13},
>> {{ch,<0.757.0>},
>> {cr,<0.757.0>,#Ref<0.0.0.13150>,
>> {[28706,28687,28668,28649,28630,28605,28587,28569,
>> 28551,28533,28515,28497,28479,28461,28443,28425,
>> 28407,28389,28371,28353,28335,28316,28295,28274,
>> 28253,28232,28211,28190,28169,28148,28125,28102,
>> 28079,28056,28033,28010,27987,27964,27941,27918,
>> 27895,27872,27849,27826,27803,27780,27757,27734,
>> 27711,27688,27665,27642,27619,27596],
>> [27573]},
>> 1,
>> {[],[]},
>> {qstate,<0.756.0>,active,{0,nil}},
>> 49}},
>> {{ch,<0.572.0>},
>> {cr,<0.572.0>,#Ref<0.0.0.7536>,
>> {[28709,28690,28671,28652,28633,28608,28590,28572,
>> 28554,28536,28518,28500,28482,28464,28446,28428,
>> 28410,28392,28374,28356,28338,28319,28298,28277,
>> 28256,28235,28214,28193,28172,28151,28128,28105,
>> 28082,28059,28036,28013,27990,27967,27944,27921,
>> 27898,27875,27852,27829,27806,27783,27760,27737,
>> 27714,27691,27668,27645,27622,27599],
>> [27576]},
>> 1,
>> {[],[]},
>> {qstate,<0.571.0>,active,{0,nil}},
>> 35}},
>> {{credit_to,<7350.825.0>},2},
>> {{credit_to,<7350.858.0>},3}]
>> trap_exit: true
>> status: running
>> heap_size: 121536
>> stack_size: 27
>> reductions: 9363545
>> neighbours:
>>
>> =SUPERVISOR REPORT==== 18-May-2013::01:41:51 ===
>> Supervisor: {local,rabbit_amqqueue_sup}
>> Context: child_terminated
>> Reason: {{case_clause,{empty,{[],[]}}},
>> [{rabbit_amqqueue_process,subtract_acks,3,[]},
>> {rabbit_amqqueue_process,subtract_acks,4,[]},
>> {rabbit_amqqueue_process,handle_cast,2,[]},
>> {gen_server2,handle_msg,2,[]},
>>
>> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>> Offender: [{pid,<0.301.0>},
>> {name,rabbit_amqqueue},
>> {mfa,
>> {rabbit_amqqueue_process,start_link,
>> [{amqqueue,
>> {resource,<<"/">>,queue,
>> <<"document_queue">>},
>>
>> true,false,none,[],<0.7622.0>,[],[],undefined,
>> []}]}},
>> {restart_type,temporary},
>> {shutdown,4294967295},
>> {child_type,worker}]
>
>
> Notice how the *case_clause *is *empty*. The other rabbit log looks
> similar:
>
> # cat /var/log/rabbitmq/rabbit at ip-10-123-123-123.log
>
>
>
> =ERROR REPORT==== 18-May-2013::01:41:48 ===
>
> ** Generic server <0.301.0> terminating
>
> ** Last message in was {'$gen_cast',
>
> {ack,
>
> [27585,27584,27607,27630,27653,27676,27699,
>
> 27722,27745,27768,27791,27814,27837,27860,
>
> 27883,27906,27929,27952,27975,27998,28021,
>
> 28044,28067,28090,28113,28136,28159,28180,
>
> 28201,28222,28243,28264,28285,28306,28325,
>
> 28343,28361,28379,28397,28415,28433,28451,
>
> 28469,28487,28505,28523,28541,28559,28577,
>
> 28595,28613],
>
> <0.676.0>}}
>
> ** When Server state == {q,
>
> {amqqueue,
>
> {resource,<<"/">>,queue,
>
> <<"document_queue">>},
>
> true,false,none,[],<0.301.0>,[],[],undefined,[]},
>
> none,true,rabbit_variable_queue,
>
> {vqstate,
>
> {0,{[],[]}},
>
> {0,{[],[]}},
>
> {delta,undefined,0,undefined},
>
> {0,{[],[]}},
>
> {0,{[],[]}},
>
> * ... [ thousands of lines of state ] ... *
>
>
>
> ** Reason for termination ==
>
> ** {{case_clause,{empty,{[],[]}}},
>
> [{rabbit_amqqueue_process,subtract_acks,3,[]},
>
> {rabbit_amqqueue_process,subtract_acks,4,[]},
>
> {rabbit_amqqueue_process,handle_cast,2,[]},
>
> {gen_server2,handle_msg,2,[]},
>
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>
>
>> =ERROR REPORT==== 18-May-2013::01:41:51 ===
>
> connection <0.430.0>, channel 22 - soft error:
>
> {amqp_error,not_found,
>
> "no queue 'document_queue' in vhost '/'",
>
> 'queue.declare'}
>
>
>> =ERROR REPORT==== 18-May-2013::01:41:51 ===
>
> connection <0.384.0>, channel 12 - soft error:
>
> {amqp_error,not_found,
>
> "no queue 'document_queue' in vhost '/'",
>
> 'queue.declare'}
>
>
>> =ERROR REPORT==== 18-May-2013::01:41:51 ===
>
> connection <0.384.0>, channel 15 - soft error:
>
> {amqp_error,not_found,
>
> "no queue 'document_queue' in vhost '/'",
>
> 'queue.declare'}
>
>
>> =ERROR REPORT==== 18-May-2013::01:41:51 ===
>
> connection <0.373.0>, channel 23 - soft error:
>
> {amqp_error,not_found,
>
> "no queue 'document_queue' in vhost '/'",
>
> 'queue.declare'}
>
> * ... [ repeated a bunch of times ] ...*
>
>
> I also see this in my */var/log/syslog* with regards to *beam.smp*:
>
> # cat /var/log/syslog
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176041] INFO: task
>> beam.smp:18971 blocked for more than 120 seconds.
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176053] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176060] beam.smp
>> D ffff8801bfd93700 0 18971 18962 0x00000000
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176065]
>> ffff8801b11cdcb8 0000000000000282 0000000000000000 ffffffffffffffe0
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176070]
>> ffff8801b11cdfd8 ffff8801b11cdfd8 ffff8801b11cdfd8 0000000000013700
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176075]
>> ffff8801b3180000 ffff8801b35244a0 00007f500e2079e0 ffff8801b146bb80
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176089]
>> [<ffffffff8165434f>] schedule+0x3f/0x60
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176094]
>> [<ffffffff8106aef5>] exit_mm+0x85/0x130
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176097]
>> [<ffffffff8106b10e>] do_exit+0x16e/0x450
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176102]
>> [<ffffffff810797aa>] ? __dequeue_signal+0x6a/0xb0
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176105]
>> [<ffffffff8106b594>] do_group_exit+0x44/0xa0
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176108]
>> [<ffffffff8107c36c>] get_signal_to_deliver+0x21c/0x420
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176113]
>> [<ffffffff81014825>] do_signal+0x45/0x130
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176116]
>> [<ffffffff8105539d>] ? set_next_entity+0xad/0xd0
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176120]
>> [<ffffffff81014ad5>] do_notify_resume+0x65/0x80
>> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176124]
>> [<ffffffff8165ea90>] int_signal+0x12/0x17
>> *... [ repeated 9 more times with same timestamp ] ...*
>
>
> There are no strange CPU spikes on my EC2 node based on AWS monitoring.
> The only thing that seems to stand out is that my writes to my EBS volume
> spike during the crash (~90 write OPs/sec, 4 MiB/sec write bandwidth,
> ~30ms/op write latency).
>
> My team and I are completely stumped and would appreciate any insight you
> might have to the crash reports. Is this a problem with writes hanging
> when going against the EBS volume since EBS volumes are essentially network
> mounted? Is there a way to configure rabbit to be more tolerant of these
> delays?
>
>
> For debugging purposes, below is the status output of one of the nodes
> when it is healthy (so you can see the versions of everything i'm running):
>
> # rabbitmqctl status
>> Status of node 'rabbit at ip-10-123-123-123' ...
>> [{pid,28978},
>> {running_applications,
>> [{rabbitmq_management,"RabbitMQ Management Console","3.1.0"},
>> {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.1.0"},
>> {webmachine,"webmachine","1.9.1-rmq3.1.0-git52e62bc"},
>> {mochiweb,"MochiMedia Web Server","2.3.1-rmq3.1.0-gitd541e9a"},
>> {rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.0"},
>> {rabbit,"RabbitMQ","3.1.0"},
>> {os_mon,"CPO CXC 138 46","2.2.11"},
>> {inets,"INETS CXC 138 49","5.9.4"},
>> {mnesia,"MNESIA CXC 138 12","4.8"},
>> {amqp_client,"RabbitMQ AMQP Client","3.1.0"},
>> {xmerl,"XML parser","1.3.3"},
>> {sasl,"SASL CXC 138 11","2.3.1"},
>> {stdlib,"ERTS CXC 138 10","1.19.1"},
>> {kernel,"ERTS CXC 138 10","2.16.1"}]},
>> {os,{unix,linux}},
>> {erlang_version,
>> "Erlang R16B (erts-5.10.1) [source-05f1189] [64-bit] [smp:8:8]
>> [async-threads:30] [hipe] [kernel-poll:true]\n"},
>> {memory,
>> [{total,60179048},
>> {connection_procs,5274808},
>> {queue_procs,13107264},
>> {plugins,354272},
>> {other_proc,9741830},
>> {mnesia,150456},
>> {mgmt_db,12248},
>> {msg_index,34768},
>> {other_ets,1148248},
>> {binary,4905584},
>> {code,19565863},
>> {atom,703377},
>> {other_system,5180330}]},
>> {vm_memory_high_watermark,0.8},
>> {vm_memory_limit,5828434329},
>> {disk_free_limit,1000000000},
>> {disk_free,3953344512},
>> {file_descriptors,
>>
>> [{total_limit,924},{total_used,6},{sockets_limit,829},{sockets_used,4}]},
>> {processes,[{limit,1048576},{used,348}]},
>> {run_queue,0},
>> {uptime,3796}]
>> ...done.
>
>
> Note that I am using Erlang v16 from the Erlang Solutions repo (esl-erlang
> package). I originally was using Erlang v14 from Ubuntu repos
> (erlang-base), but decided to upgrade hoping it would resolve the issue
> (which it did not). I have also tried running RabbitMQ 3.0.4 without
> success. This issue seems to affect both versions.
>
> Thanks,
> Karl
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130518/0529f34a/attachment.htm>
More information about the rabbitmq-discuss
mailing list