[rabbitmq-discuss] Strange subtract_acks crash in RabbitMQ 3.1.0-1 (case_clause empty) on EC2
Karl Rieb
karl.rieb at gmail.com
Sat May 18 13:57:23 BST 2013
Brett Cameron <brett.r.cameron at ...> writes:
>
>
>
> Karl,Wild speculation would be that messages have been consumed from a
queue but that queue (name = document_queue?) is for some reason missing in
action (gone, inaccessible, ...) when the consumer subsequently tries to ack
the message(s).
> Brett
>
>
> On Sat, May 18, 2013 at 3:20 PM, Karl Rieb <karl.rieb <at> gmail.com>
wrote:Hi,
> I am running a RabbitMQ cluster of 4 nodes in AWS EC2 on c1.xlarge
instances using Ubuntu 12.04 LTS (kernel 3.2.0-38). Periodically two of my
nodes holding the largest queues will crash and require me to restart the
rabbitmq-server service. Looking at the logs, I'm at a loss as to why the
crash is occurring. I've have tried reproducing the crash manually by
flooding my nodes with messages, but that doesn't seem to trigger the issue.
Below is output from various RabbitMQ logs:
>
> # cat /var/log/rabbitmq/rabbit <at> ip-10-123-123-123-sasl.log
> =CRASH REPORT==== 18-May-2013::01:41:51 === crasher: initial call:
gen:init_it/6 pid: <0.301.0> registered_name: [] exception exit:
{{case_clause,{empty,{[],[]}}},
[{rabbit_amqqueue_process,subtract_acks,3,[]},
> {rabbit_amqqueue_process,subtract_acks,4,[]},
{rabbit_amqqueue_process,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]}, {proc_lib,wake_up,3,
> [{file,"proc_lib.erl"},{line,249}]}]}
in function gen_server2:terminate/3 ancestors:
[rabbit_amqqueue_sup,rabbit_sup,<0.207.0>] messages: [{'$gen_cast',
{resume,<0.676.0>}},
> {'$gen_cast',{ack,[27583,27606,27629,27652,27675,27698,
27721,27744,27767,27790,27813,27836,
27859,27882,27905,27928,27951,27974,
> 27997,28020,28043,28066,28089,28112,
28135,28158,28179,28200,28221,28242,
28263,28284,28305,28324,28342,28360,
> 28378,28396,28414,28432,28450,28468,
28486,28504,28522,28540,28558,28576,
28594,28612],
<0.737.0>}},
> {'$gen_cast',{resume,<0.737.0>}},
{'$gen_cast',{ack,[27608,27631,27654,27677,27700,27723,
27746,27769,27792,27815,27838,27861,
> 27884,27907,27930,27953,27976,27999,
28022,28045,28068,28091,28114,28137,
28160,28181,28202,28223,28244,28265,
> 28286,28307,28326,28344,28362,28380,
28398,28416,28434,28452,28470,28488,
28506,28524,28542,28560,28578,28596,
> 28614],
<0.676.0>}}, {'$gen_cast',{ack,
[27578,27601,27624,27647,27670,27693,
27716,27739,27762,27785,27808,27831,
> 27854,27877,27900,27923,27946,27969,
27992,28015,28038,28061,28084,28107,
28130,28153,28174,28195,28216,28237,
> 28258,28279,28300,28321,28340,28358,
28376,28394,28412,28430,28448,28466,
28484,28502,28520,28538,28556,28574,
> 28592,28610],
<0.716.0>}}, {'$gen_cast',{resume,<0.716.0>}},
{'$gen_cast',{ack,[27582,27605,27628,27651,27674,27697,
> 27720,27743,27766,27789,27812,27835,
27858,27881,27904,27927,27950,27973,
27996,28019,28042,28065,28088,28111,
> 28134,28157,28178,28199,28220,28241,
28262,28283,28304,28323,28341,28359,
28377,28395,28413,28431,28449,28467,
> 28485,28503,28521,28539,28557,28575,
28593,28611],
<0.737.0>}}] links: [<0.300.0>]
> dictionary: [{{ch,<0.729.0>}, {cr,
<0.729.0>,#Ref<0.0.0.12296>,
{[28715,28696,28677,28658,28639,28620,28602,28584,
28566,28548,28530,28512,28494,28476,28458,28440,
> 28422,28404,28386,28368,28350,28332,28313,28292,
28271,28250,28229,28208,28187,28166,28143,28120,
28097,28074,28051,28028,28005,27982,27959,27936,
> 27913,27890,27867,27844,27821,27798,27775,27752,
27729,27706,27683,27660,27637,27614],
[27591]}, 1, {[],[]},
> {qstate,<0.728.0>,active,{0,nil}},
11}}, {{credit_to,<7350.877.0>},22},
{{ch,<0.608.0>}, {cr,<0.608.0>,#Ref<0.0.0.9041>,
> {[28708,28689,28670,28651,28632,28607,28589,28571,
28553,28535,28517,28499,28481,28463,28445,28427,
28409,28391,28373,28355,28337,28318,28297,28276,
> 28255,28234,28213,28192,28171,28150,28127,28104,
28081,28058,28035,28012,27989,27966,27943,27920,
27897,27874,27851,27828,27805,27782,27759,27736,
> 27713,27690,27667,27644,27621,27598],
[27575]}, 1, {[],
[]}, {qstate,<0.607.0>,active,{0,nil}},
> 38}}, {{ch,<0.624.0>},
{cr,<0.624.0>,#Ref<0.0.0.9436>,
{[28722,28703,28684,28665,28646],[28627]}, 1,
> {[],[]}, {qstate,
<0.623.0>,active,{0,nil}}, 36}},
{{ch,<0.696.0>}, {cr,<0.696.0>,#Ref<0.0.0.11414>,
> {[],[]}, 2,
{[],[]}, {qstate,<0.695.0>,active,{0,nil}},
24}}, {{ch,<0.564.0>},
> {cr,<0.564.0>,#Ref<0.0.0.7285>,
{[28718,28699,28680,28661,28642],[28623]}, 1,
{[],[]}, {qstate,<0.563.0>,active,
{0,nil}},
> 12}}, {{ch,<0.644.0>},
{cr,<0.644.0>,#Ref<0.0.0.10071>,
{[28713,28694,28675,28656,28637,28618,28600,28582,
28564,28546,28528,28510,28492,28474,28456,28438,
> 28420,28402,28384,28366,28348,28330,28311,28290,
28269,28248,28227,28206,28185,28164,28141,28118,
28095,28072,28049,28026,28003,27980,27957,27934,
> 27911,27888,27865,27842,27819,27796,27773,27750,
27727,27704,27681,27658,27635,27612],
[27589]}, 1, {[],[]},
> {qstate,<0.643.0>,active,{0,nil}},
11}}, {{ch,<0.676.0>}, {cr,
<0.676.0>,#Ref<0.0.0.11309>,
{[28614,28613,28596,28595,28578,28577,28560,28559,
> 28542,28541,28524,28523,28506,28505,28488,28487,
28470,28469,28452,28451,28434,28433,28416,28415,
28398,28397,28380,28379,28362,28361,28344,28343,
> 28326,28325,28307,28306,28286,28285,28265,28264,
28244,28243,28223,28222,28202,28201,28181,28180,
28160,28159,28137,28136,28114,28113,28091,28090,
> 28068,28067,28045,28044,28022,28021,27999,27998,
27976,27975,27953,27952,27930,27929,27907,27906,
27884,27883,27861,27860,27838,27837,27815,27814,
> 27792,27791,27769,27768,27746,27745,27723,27722,
27700,27699,27677,27676,27654,27653,27631,27630,
27608,27607,27585], [27584]},
> 2, {[{<0.676.0>,
{consumer,<<"amq.ctag-0cvDGmyaAbEW_j15AfhGHQ">>,
true}}],
> [{<0.676.0>, {consumer,
<<"amq.ctag-YqU04SIfQv6P-QVueEjbRg">>,
true}}]}, {qstate,<0.675.0>,suspended,{0,nil}},
> 25}},
{{xtype_to_module,direct},rabbit_exchange_type_direct},
{{credit_to,<7350.833.0>},31}, {{ch,<0.745.0>},
{cr,<0.745.0>,#Ref<0.0.0.12674>,
> {[28716,28697,28678,28659,28640,28621,28603,28585,
28567,28549,28531,28513,28495,28477,28459,28441,
28423,28405,28387,28369,28351,28333,28314,28293,
> 28272,28251,28230,28209,28188,28167,28144,28121,
28098,28075,28052,28029,28006,27983,27960,27937,
27914,27891,27868,27845,27822,27799,27776,27753,
> 27730,27707,27684,27661,27638,27615],
[27592]}, 1, {[],
[]}, {qstate,<0.744.0>,active,{0,nil}},
> 37}}, {{ch,<0.660.0>},
{cr,<0.660.0>,#Ref<0.0.0.10650>,
{[28712,28693,28674,28655,28636,28617,28599,28581,
28563,28545,28527,28509,28491,28473,28455,28437,
> 28419,28401,28383,28365,28347,28329,28310,28289,
28268,28247,28226,28205,28184,28163,28140,28117,
28094,28071,28048,28025,28002,27979,27956,27933,
> 27910,27887,27864,27841,27818,27795,27772,27749,
27726,27703,27680,27657,27634,27611],
[27588]}, 1, {[],[]},
> {qstate,<0.659.0>,active,{0,nil}},
11}}, {{credit_to,<7350.837.0>},32},
{{credit_to,<7350.881.0>},17}, {{ch,<0.737.0>},
> {cr,<0.737.0>,#Ref<0.0.0.12429>,
{[28612,28611,28594,28593,28576,28575,28558,28557,
28540,28539,28522,28521,28504,28503,28486,28485,
> 28468,28467,28450,28449,28432,28431,28414,28413,
28396,28395,28378,28377,28360,28359,28342,28341,
28324,28323,28305,28304,28284,28283,28263,28262,
> 28242,28241,28221,28220,28200,28199,28179,28178,
28158,28157,28135,28134,28112,28111,28089,28088,
28066,28065,28043,28042,28020,28019,27997,27996,
> 27974,27973,27951,27950,27928,27927,27905,27904,
27882,27881,27859,27858,27836,27835,27813,27812,
27790,27789,27767,27766,27744,27743,27721,27720,
> 27698,27697,27675,27674,27652,27651,27629,27628,
27606,27605,27583], [27582]},
2, {[{<0.737.0>,
> {consumer,<<"amq.ctag-
ITZqhulzHhUzx1uy1Eot3g">>, true}}],
[{<0.737.0>, {consumer,
<<"amq.ctag-hpo5ejdoJRdLrT6JS42jGw">>,
> true}}]},
{qstate,<0.736.0>,suspended,{0,nil}}, 11}},
{credit_blocked,[]}, {{credit_to,<7350.854.0>},3},
> {{ch,<0.716.0>}, {cr,
<0.716.0>,#Ref<0.0.0.11820>,
{[28610,28609,28592,28591,28574,28573,28556,28555,
28538,28537,28520,28519,28502,28501,28484,28483,
> 28466,28465,28448,28447,28430,28429,28412,28411,
28394,28393,28376,28375,28358,28357,28340,28339,
28321,28320,28300,28299,28279,28278,28258,28257,
> 28237,28236,28216,28215,28195,28194,28174,28173,
28153,28152,28130,28129,28107,28106,28084,28083,
28061,28060,28038,28037,28015,28014,27992,27991,
> 27969,27968,27946,27945,27923,27922,27900,27899,
27877,27876,27854,27853,27831,27830,27808,27807,
27785,27784,27762,27761,27739,27738,27716,27715,
> 27693,27692,27670,27669,27647,27646,27624,27623,
27601,27600,27578], [27577]},
2, {[{<0.716.0>,
> {consumer,<<"amq.ctag-BTlS-GmbXNpcU-
wUlMI60Q">>, true}}],
[{<0.716.0>, {consumer,<<"amq.ctag-
loc5monF8M1pz_Bb8ypumw">>,
> true}}]},
{qstate,<0.715.0>,suspended,{0,nil}}, 44}},
{{ch,<0.782.0>}, {cr,<0.782.0>,#Ref<0.0.0.15649>,
> {[28714,28695,28676,28657,28638,28619,28601,28583,
28565,28547,28529,28511,28493,28475,28457,28439,
28421,28403,28385,28367,28349,28331,28312,28291,
> 28270,28249,28228,28207,28186,28165,28142,28119,
28096,28073,28050,28027,28004,27981,27958,27935,
27912,27889,27866,27843,27820,27797,27774,27751,
> 27728,27705,27682,27659,27636,27613],
[27590]}, 1, {[],
[]}, {qstate,<0.781.0>,active,{0,nil}},
> 35}}, {{credit_to,
<7350.846.0>},22}, {{ch,<0.684.0>}, {cr,
<0.684.0>,#Ref<0.0.0.11198>,
{[28711,28692,28673,28654,28635,28616,28598,28580,
> 28562,28544,28526,28508,28490,28472,28454,28436,
28418,28400,28382,28364,28346,28328,28309,28288,
28267,28246,28225,28204,28183,28162,28139,28116,
> 28093,28070,28047,28024,28001,27978,27955,27932,
27909,27886,27863,27840,27817,27794,27771,27748,
27725,27702,27679,27656,27633,27610],
[27587]},
> 1, {[],[]},
{qstate,<0.683.0>,active,{0,nil}}, 11}},
{{ch,<0.692.0>}, {cr,<0.692.0>,#Ref<0.0.0.11088>,
> {[28710,28691,28672,28653,28634,28615,28597,28579,
28561,28543,28525,28507,28489,28471,28453,28435,
28417,28399,28381,28363,28345,28327,28308,28287,
> 28266,28245,28224,28203,28182,28161,28138,28115,
28092,28069,28046,28023,28000,27977,27954,27931,
27908,27885,27862,27839,27816,27793,27770,27747,
> 27724,27701,27678,27655,27632,27609],
[27586]}, 1, {[],
[]}, {qstate,<0.691.0>,active,{0,nil}},
> 12}}, {{ch,<0.648.0>},
{cr,<0.648.0>,#Ref<0.0.0.10283>,
{[28717,28698,28679,28660,28641],[28622]}, 1,
> {[],[]}, {qstate,
<0.647.0>,active,{0,nil}}, 12}},
{guid,{{2296174761,591959305,1562876388,1822892473},1}},
> {{ch,<0.720.0>}, {cr,
<0.720.0>,#Ref<0.0.0.12036>,
{[28723,28704,28685,28666,28647],[28628]}, 1,
{[],[]},
> {qstate,<0.719.0>,active,{0,nil}},
37}}, {{credit_to,<7350.869.0>},39},
{{ch,<0.636.0>}, {cr,<0.636.0>,#Ref<0.0.0.9960>,
> {[28705,28686,28667,28648,28629,28604,28586,28568,
28550,28532,28514,28496,28478,28460,28442,28424,
28406,28388,28370,28352,28334,28315,28294,28273,
> 28252,28231,28210,28189,28168,28147,28124,28101,
28078,28055,28032,28009,27986,27963,27940,27917,
27894,27871,27848,27825,27802,27779,27756,27733,
> 27710,27687,27664,27641,27618,27595],
[27572]}, 1, {[],
[]}, {qstate,<0.635.0>,active,{0,nil}},
> 15}}, {{credit_to,
<7350.850.0>},50}, {{ch,<0.616.0>}, {cr,
<0.616.0>,#Ref<0.0.0.9300>,
{[28719,28700,28681,28662,28643],[28624]},
> 1, {[],[]},
{qstate,<0.615.0>,active,{0,nil}}, 12}},
{{ch,<0.509.0>}, {cr,<0.509.0>,#Ref<0.0.0.2243>,
> {[28302,28281,28260,28239,28218,28197,28176,28155,
28132,28109,28086,28063,28040,28017,27994,27971],
[27580,27603,27626,27649,27672,27695,27718,27741,
> 27764,27787,27810,27833,27856,27879,27902,27925,
27948]}, 3,
{[],[]}, {qstate,<0.508.0>,active,{0,nil}},
> 36}}, {{credit_to,
<7350.873.0>},17}, {{ch,<0.588.0>}, {cr,
<0.588.0>,#Ref<0.0.0.7808>,
{[28721,28702,28683,28664,28645],[28626]},
> 1, {[],[]},
{qstate,<0.587.0>,active,{0,nil}}, 12}},
{{ch,<0.664.0>}, {cr,<0.664.0>,#Ref<0.0.0.10772>,
> {[28720,28701,28682,28663,28644],[28625]},
1, {[],[]},
{qstate,<0.663.0>,active,{0,nil}}, 13}},
> {fhc_age_tree,{0,nil}}, {{ch,
<0.708.0>}, {cr,<0.708.0>,#Ref<0.0.0.11695>,
{[28707,28688,28669,28650,28631,28606,28588,28570,
> 28552,28534,28516,28498,28480,28462,28444,28426,
28408,28390,28372,28354,28336,28317,28296,28275,
28254,28233,28212,28191,28170,28149,28126,28103,
> 28080,28057,28034,28011,27988,27965,27942,27919,
27896,27873,27850,27827,27804,27781,27758,27735,
27712,27689,27666,27643,27620,27597],
[27574]},
> 1, {[],[]},
{qstate,<0.707.0>,active,{0,nil}}, 35}},
{{credit_to,<7350.829.0>},13},
> {{ch,<0.757.0>}, {cr,
<0.757.0>,#Ref<0.0.0.13150>,
{[28706,28687,28668,28649,28630,28605,28587,28569,
28551,28533,28515,28497,28479,28461,28443,28425,
> 28407,28389,28371,28353,28335,28316,28295,28274,
28253,28232,28211,28190,28169,28148,28125,28102,
28079,28056,28033,28010,27987,27964,27941,27918,
> 27895,27872,27849,27826,27803,27780,27757,27734,
27711,27688,27665,27642,27619,27596],
[27573]}, 1, {[],[]},
> {qstate,<0.756.0>,active,{0,nil}},
49}}, {{ch,<0.572.0>}, {cr,
<0.572.0>,#Ref<0.0.0.7536>,
{[28709,28690,28671,28652,28633,28608,28590,28572,
> 28554,28536,28518,28500,28482,28464,28446,28428,
28410,28392,28374,28356,28338,28319,28298,28277,
28256,28235,28214,28193,28172,28151,28128,28105,
> 28082,28059,28036,28013,27990,27967,27944,27921,
27898,27875,27852,27829,27806,27783,27760,27737,
27714,27691,27668,27645,27622,27599],
[27576]},
> 1, {[],[]},
{qstate,<0.571.0>,active,{0,nil}}, 35}},
{{credit_to,<7350.825.0>},2}, {{credit_to,
<7350.858.0>},3}]
> trap_exit: true status: running heap_size: 121536 stack_size:
27 reductions: 9363545 neighbours:=SUPERVISOR REPORT==== 18-May-
2013::01:41:51 === Supervisor: {local,rabbit_amqqueue_sup}
> Context: child_terminated Reason: {{case_clause,{empty,
{[],[]}}}, [{rabbit_amqqueue_process,subtract_acks,3,[]},
{rabbit_amqqueue_process,subtract_acks,4,[]},
> {rabbit_amqqueue_process,handle_cast,2,[]},
{gen_server2,handle_msg,2,[]}, {proc_lib,wake_up,3,
[{file,"proc_lib.erl"},{line,249}]}]} Offender: [{pid,<0.301.0>},
> {name,rabbit_amqqueue}, {mfa,
{rabbit_amqqueue_process,start_link,
[{amqqueue, {resource,<<"/">>,queue,
> <<"document_queue">>},
true,false,none,[],<0.7622.0>,[],[],undefined,
[]}]}}, {restart_type,temporary},
> {shutdown,4294967295},
{child_type,worker}]
>
> Notice how the case_clause is empty. The other rabbit log looks similar:
>
> # cat /var/log/rabbitmq/rabbit-
72dRMgatfUn5amjFlPYPZgfGRAsWV3w2 at public.gmane.org
>
>
> =ERROR REPORT==== 18-May-2013::01:41:48 ===
> ** Generic server <0.301.0> terminating
> ** Last message in was {'$gen_cast',
>
> {ack,
> [27585,27584,27607,27630,27653,27676,27699,
> 27722,27745,27768,27791,27814,27837,27860,
> 27883,27906,27929,27952,27975,27998,28021,
> 28044,28067,28090,28113,28136,28159,28180,
> 28201,28222,28243,28264,28285,28306,28325,
> 28343,28361,28379,28397,28415,28433,28451,
> 28469,28487,28505,28523,28541,28559,28577,
> 28595,28613],
> <0.676.0>}}
> ** When Server state == {q,
>
> {amqqueue,
> {resource,<<"/">>,queue,
> <<"document_queue">>},
> true,false,none,[],<0.301.0>,[],[],undefined,
[]},
> none,true,rabbit_variable_queue,
> {vqstate,
>
> {0,{[],[]}},
> {0,{[],[]}},
> {delta,undefined,0,undefined},
> {0,{[],[]}},
>
> {0,{[],[]}},
> ... [ thousands of lines of state ] ...
>
>
> ** Reason for termination ==
> ** {{case_clause,{empty,{[],[]}}},
> [{rabbit_amqqueue_process,subtract_acks,3,[]},
> {rabbit_amqqueue_process,subtract_acks,4,[]},
> {rabbit_amqqueue_process,handle_cast,2,[]},
> {gen_server2,handle_msg,2,[]},
>
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
>
>
>
> =ERROR REPORT==== 18-May-2013::01:41:51 ===
> connection <0.430.0>, channel 22 - soft error:
> {amqp_error,not_found,
>
> "no queue 'document_queue' in vhost '/'",
>
> 'queue.declare'}
>
> =ERROR REPORT==== 18-May-2013::01:41:51 ===
>
> connection <0.384.0>, channel 12 - soft error:
>
> {amqp_error,not_found,
> "no queue 'document_queue' in vhost '/'",
> 'queue.declare'}
>
>
>
> =ERROR REPORT==== 18-May-2013::01:41:51 ===
> connection <0.384.0>, channel 15 - soft error:
> {amqp_error,not_found,
>
> "no queue 'document_queue' in vhost '/'",
>
> 'queue.declare'}
>
> =ERROR REPORT==== 18-May-2013::01:41:51 ===
>
> connection <0.373.0>, channel 23 - soft error:
>
> {amqp_error,not_found,
> "no queue 'document_queue' in vhost '/'",
> 'queue.declare'}
>
> ... [ repeated a bunch of times ] ...
>
> I also see this in my /var/log/syslog with regards to beam.smp:
>
>
> # cat /var/log/syslogMay 17 22:46:23 ip-10-123-123-123 kernel:
[244800.176041] INFO: task beam.smp:18971 blocked for more than 120
seconds.May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176053] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176060] beam.smp
D ffff8801bfd93700 0 18971 18962 0x00000000May 17 22:46:23 ip-10-123-
123-123 kernel: [244800.176065] ffff8801b11cdcb8 0000000000000282
0000000000000000 ffffffffffffffe0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176070]
ffff8801b11cdfd8 ffff8801b11cdfd8 ffff8801b11cdfd8 0000000000013700May 17
22:46:23 ip-10-123-123-123 kernel: [244800.176075] ffff8801b3180000
ffff8801b35244a0 00007f500e2079e0 ffff8801b146bb80
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:May
17 22:46:23 ip-10-123-123-123 kernel: [244800.176089] [<ffffffff8165434f>]
schedule+0x3f/0x60May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176094]
[<ffffffff8106aef5>] exit_mm+0x85/0x130
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176097]
[<ffffffff8106b10e>] do_exit+0x16e/0x450May 17 22:46:23 ip-10-123-123-123
kernel: [244800.176102] [<ffffffff810797aa>] ? __dequeue_signal+0x6a/0xb0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176105]
[<ffffffff8106b594>] do_group_exit+0x44/0xa0May 17 22:46:23 ip-10-123-123-
123 kernel: [244800.176108] [<ffffffff8107c36c>]
get_signal_to_deliver+0x21c/0x420
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176113]
[<ffffffff81014825>] do_signal+0x45/0x130May 17 22:46:23 ip-10-123-123-123
kernel: [244800.176116] [<ffffffff8105539d>] ? set_next_entity+0xad/0xd0
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176120]
[<ffffffff81014ad5>] do_notify_resume+0x65/0x80May 17 22:46:23 ip-10-123-
123-123 kernel: [244800.176124] [<ffffffff8165ea90>]
int_signal+0x12/0x17... [ repeated 9 more times with same timestamp ] ...
>
> There are no strange CPU spikes on my EC2 node based on AWS monitoring.
The only thing that seems to stand out is that my writes to my EBS volume
spike during the crash (~90 write OPs/sec, 4 MiB/sec write bandwidth,
~30ms/op write latency).
>
> My team and I are completely stumped and would appreciate any insight you
might have to the crash reports. Is this a problem with writes hanging when
going against the EBS volume since EBS volumes are essentially network
mounted? Is there a way to configure rabbit to be more tolerant of these
delays?
>
>
> For debugging purposes, below is the status output of one of the nodes
when it is healthy (so you can see the versions of everything i'm running):
>
>
>
> # rabbitmqctl statusStatus of node 'rabbit <at> ip-10-123-123-123' ...
[{pid,28978}, {running_applications, [{rabbitmq_management,"RabbitMQ
Management Console","3.1.0"}, {rabbitmq_web_dispatch,"RabbitMQ Web
Dispatcher","3.1.0"},
> {webmachine,"webmachine","1.9.1-rmq3.1.0-git52e62bc"},
{mochiweb,"MochiMedia Web Server","2.3.1-rmq3.1.0-gitd541e9a"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.0"},
> {rabbit,"RabbitMQ","3.1.0"}, {os_mon,"CPO CXC 138
46","2.2.11"}, {inets,"INETS CXC 138 49","5.9.4"},
{mnesia,"MNESIA CXC 138 12","4.8"},
> {amqp_client,"RabbitMQ AMQP Client","3.1.0"}, {xmerl,"XML
parser","1.3.3"}, {sasl,"SASL CXC 138 11","2.3.1"}, {stdlib,"ERTS
CXC 138 10","1.19.1"},
> {kernel,"ERTS CXC 138 10","2.16.1"}]}, {os,
{unix,linux}}, {erlang_version, "Erlang R16B (erts-5.10.1) [source-
05f1189] [64-bit] [smp:8:8] [async-threads:30] [hipe] [kernel-
poll:true]\n"},
> {memory, [{total,60179048}, {connection_procs,5274808},
{queue_procs,13107264}, {plugins,354272}, {other_proc,9741830},
> {mnesia,150456}, {mgmt_db,12248}, {msg_index,34768},
{other_ets,1148248}, {binary,4905584}, {code,19565863},
{atom,703377}, {other_system,5180330}]},
>
{vm_memory_high_watermark,0.8}, {vm_memory_limit,5828434329}, {disk_free_li
mit,1000000000}, {disk_free,3953344512}, {file_descriptors,
[{total_limit,924},{total_used,6},{sockets_limit,829},{sockets_used,4}]},
> {processes,[{limit,1048576},
{used,348}]}, {run_queue,0}, {uptime,3796}]...done.
>
> Note that I am using Erlang v16 from the Erlang Solutions repo (esl-erlang
package). I originally was using Erlang v14 from Ubuntu repos (erlang-
base), but decided to upgrade hoping it would resolve the issue (which it
did not). I have also tried running RabbitMQ 3.0.4 without success. This
issue seems to affect both versions.
>
> Thanks,
> Karl
> _______________________________________________
> rabbitmq-discuss mailing listrabbitmq-discuss <at>
lists.rabbitmq.comhttps://lists.rabbitmq.com/cgi-
bin/mailman/listinfo/rabbitmq-discuss
>
>
>
>
>
>
> <div>
> <div dir="ltr">
> <div>Karl,<br><br>Wild speculation would be that messages have been
consumed from a queue but that queue (name = document_queue?) is for some
reason missing in action (gone, inaccessible, ...) when the consumer
subsequently tries to ack the message(s). <br><br>
> </div>Brett<br><br>
> </div>
> <div class="gmail_extra">
> <br><br><div class="gmail_quote">On Sat, May 18, 2013 at 3:20 PM, Karl
Rieb <span dir="ltr"><<a href="mailto:karl.rieb at ..."
target="_blank">karl.rieb <at> gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote">Hi,<div><br></div>
> <div>I am running a RabbitMQ cluster of 4 nodes in AWS EC2 on c1.xlarge
instances using Ubuntu 12.04 LTS (kernel 3.2.0-38). Periodically two
of my nodes holding the largest queues will crash and require me to restart
the rabbitmq-server service. Looking at the logs, I'm at a loss as to
why the crash is occurring. I've have tried reproducing the crash
manually by flooding my nodes with messages, but that doesn't seem to
trigger the issue. Below is output from various RabbitMQ logs:</div>
> <div><br></div>
> <blockquote class="gmail_quote"># cat /var/log/rabbitmq/rabbit <at> ip-10-
123-123-123-sasl.log<br>
> =CRASH REPORT==== 18-May-2013::01:41:51 ===<br> crasher:<br>
initial call: gen:init_it/6<br> pid: <0.301.0>
<br> registered_name: []<br> exception exit:
{{case_clause,{empty,{[],[]}}},<br>
[{rabbit_amqqueue_process,subtract_acks,3,
[]},<br>
>
{rabbit_amqqueue_process,subtract_acks,4,[]},<br>
{rabbit_amqqueue_process,handle_cast,2,[]},<br>
{gen_server2,handle_msg,2,
[]},<br>
{proc_lib,wake_up,3,<br>
>
[{file,"proc_lib.erl"},
{line,249}]}]}<br> in function
gen_server2:terminate/3<br> ancestors:
[rabbit_amqqueue_sup,rabbit_sup,<0.207.0>]<br> messages:
[{'$gen_cast',{resume,<0.676.0>}},<br>
>
{'$gen_cast',{ack,[27583,27606,27629,27652,27675,27698,<br>
27721,27744,27767,27790,27813,27836,<br>
27859,27882,27905,27928,27951,27974,<br>
>
27997,28020,28043,28066,28089,28112,<br>
28135,28158,28179,28200,28221,28242,<br>
28263,28284,28305,28324,28342,28360,<br>
>
28378,28396,28414,28432,28450,28468,<br>
28486,28504,28522,28540,28558,28576,<br>
28594,28612],<br>
<0.737.0>}},<br>
>
{'$gen_cast',{resume,<0.737.0>}},<br>
{'$gen_cast',{ack,
[27608,27631,27654,27677,27700,27723,<br>
27746,27769,27792,27815,27838,27861,<br>
>
27884,27907,27930,27953,27976,27999,<br>
28022,28045,28068,28091,28114,28137,<br>
28160,28181,28202,28223,28244,28265,<br>
>
28286,28307,28326,28344,28362,28380,<br>
28398,28416,28434,28452,28470,28488,<br>
28506,28524,28542,28560,28578,28596,<br>
>
28614],
<br>
<0.676.0>}},
<br>
{'$gen_cast',{ack,[27578,27601,27624,27647,27670,27693,<br>
27716,27739,27762,27785,27808,27831,<br>
>
27854,27877,27900,27923,27946,27969,<br>
27992,28015,28038,28061,28084,28107,<br>
28130,28153,28174,28195,28216,28237,<br>
>
28258,28279,28300,28321,28340,28358,<br>
28376,28394,28412,28430,28448,28466,<br>
28484,28502,28520,28538,28556,28574,<br>
>
28592,28610],
<br>
<0.716.0>}},
<br>
{'$gen_cast',{resume,<0.716.0>}},<br>
{'$gen_cast',{ack,
[27582,27605,27628,27651,27674,27697,<br>
>
27720,27743,27766,27789,27812,27835,<br>
27858,27881,27904,27927,27950,27973,<br>
27996,28019,28042,28065,28088,28111,<br>
>
28134,28157,28178,28199,28220,28241,<br>
28262,28283,28304,28323,28341,28359,<br>
28377,28395,28413,28431,28449,28467,<br>
>
28485,28503,28521,28539,28557,28575,<br>
28593,28611],<br>
<0.737.0>}}]<br> links:
[<0.300.0>]<br>
> dictionary: [{{ch,<0.729.0>},<br>
{cr,<0.729.0>,#Ref<0.0.0.12296>,<br>
{[28715,28696,28677,28658,28639,28620,28602,28584,<br>
28566,28548,28530,28512,28494,28476,28458,28440,<br>
>
28422,28404,28386,28368,28350,28332,28313,28292,
<br>
28271,28250,28229,28208,28187,28166,28143,28120,
<br>
28097,28074,28051,28028,28005,27982,27959,27936,<br>
>
27913,27890,27867,27844,27821,27798,27775,27752,
<br>
27729,27706,27683,27660,27637,27614],<br>
[27591]},<br>
1,<br>
{[],[]},<br>
>
{qstate,<0.728.0>,active,{0,nil}},<br>
11}},
<br>
{{credit_to,<7350.877.0>},22},<br>
{{ch,<0.608.0>},<br>
{cr,<0.608.0>,#Ref<0.0.0.9041>,<br>
>
{[28708,28689,28670,28651,28632,28607,28589,28571,<br>
28553,28535,28517,28499,28481,28463,28445,28427,<br>
28409,28391,28373,28355,28337,28318,28297,28276,<br>
>
28255,28234,28213,28192,28171,28150,28127,28104,
<br>
28081,28058,28035,28012,27989,27966,27943,27920,
<br>
27897,27874,27851,27828,27805,27782,27759,27736,<br>
>
27713,27690,27667,27644,27621,27598],<br>
[27575]},<br>
1,<br>
{[],[]},<br>
{qstate,<0.607.0>,active,{0,nil}},<br>
>
38}},<br>
{{ch,<0.624.0>},<br>
{cr,<0.624.0>,#Ref<0.0.0.9436>,<br>
{[28722,28703,28684,28665,28646],[28627]},<br>
1,<br>
>
{[],[]},<br>
{qstate,<0.623.0>,active,{0,nil}},
<br>
36}},<br>
{{ch,<0.696.0>},<br>
{cr,<0.696.0>,#Ref<0.0.0.11414>,<br>
>
{[],[]},<br>
2,<br>
{[],[]},<br>
{qstate,<0.695.0>,active,{0,nil}},<br>
24}},<br>
{{ch,<0.564.0>},<br>
>
{cr,<0.564.0>,#Ref<0.0.0.7285>,<br>
{[28718,28699,28680,28661,28642],[28623]},<br>
1,<br>
{[],
[]},<br>
{qstate,<0.563.0>,active,{0,nil}},<br>
>
12}},<br>
{{ch,<0.644.0>},<br>
{cr,<0.644.0>,#Ref<0.0.0.10071>,<br>
{[28713,28694,28675,28656,28637,28618,28600,28582,<br>
28564,28546,28528,28510,28492,28474,28456,28438,<br>
>
28420,28402,28384,28366,28348,28330,28311,28290,
<br>
28269,28248,28227,28206,28185,28164,28141,28118,
<br>
28095,28072,28049,28026,28003,27980,27957,27934,<br>
>
27911,27888,27865,27842,27819,27796,27773,27750,
<br>
27727,27704,27681,27658,27635,27612],<br>
[27589]},<br>
1,<br>
{[],[]},<br>
>
{qstate,<0.643.0>,active,{0,nil}},<br>
11}},
<br>
{{ch,<0.676.0>},<br>
{cr,<0.676.0>,#Ref<0.0.0.11309>,<br>
{[28614,28613,28596,28595,28578,28577,28560,28559,<br>
>
28542,28541,28524,28523,28506,28505,28488,28487,
<br>
28470,28469,28452,28451,28434,28433,28416,28415,
<br>
28398,28397,28380,28379,28362,28361,28344,28343,<br>
>
28326,28325,28307,28306,28286,28285,28265,28264,
<br>
28244,28243,28223,28222,28202,28201,28181,28180,
<br>
28160,28159,28137,28136,28114,28113,28091,28090,<br>
>
28068,28067,28045,28044,28022,28021,27999,27998,
<br>
27976,27975,27953,27952,27930,27929,27907,27906,
<br>
27884,27883,27861,27860,27838,27837,27815,27814,<br>
>
27792,27791,27769,27768,27746,27745,27723,27722,
<br>
27700,27699,27677,27676,27654,27653,27631,27630,
<br>
27608,27607,27585],<br>
[27584]},<br>
>
2,<br>
{[{<0.676.0>,<br>
{consumer,<<"amq.ctag-0cvDGmyaAbEW_j15AfhGHQ">>,<br>
true}}],<br>
>
[{<0.676.0>,<br>
{consumer,<<"amq.ctag-YqU04SIfQv6P-QVueEjbRg">>,<br>
true}}]},<br>
{qstate,<0.675.0>,suspended,{0,nil}},<br>
>
25}},<br>
{{xtype_to_module,direct},rabbit_exchange_type_direct},
<br>
{{credit_to,<7350.833.0>},31},<br>
{{ch,<0.745.0>},<br>
{cr,<0.745.0>,#Ref<0.0.0.12674>,<br>
>
{[28716,28697,28678,28659,28640,28621,28603,28585,<br>
28567,28549,28531,28513,28495,28477,28459,28441,<br>
28423,28405,28387,28369,28351,28333,28314,28293,<br>
>
28272,28251,28230,28209,28188,28167,28144,28121,
<br>
28098,28075,28052,28029,28006,27983,27960,27937,
<br>
27914,27891,27868,27845,27822,27799,27776,27753,<br>
>
27730,27707,27684,27661,27638,27615],<br>
[27592]},<br>
1,<br>
{[],[]},<br>
{qstate,<0.744.0>,active,{0,nil}},<br>
>
37}},<br>
{{ch,<0.660.0>},<br>
{cr,<0.660.0>,#Ref<0.0.0.10650>,<br>
{[28712,28693,28674,28655,28636,28617,28599,28581,<br>
28563,28545,28527,28509,28491,28473,28455,28437,<br>
>
28419,28401,28383,28365,28347,28329,28310,28289,
<br>
28268,28247,28226,28205,28184,28163,28140,28117,
<br>
28094,28071,28048,28025,28002,27979,27956,27933,<br>
>
27910,27887,27864,27841,27818,27795,27772,27749,
<br>
27726,27703,27680,27657,27634,27611],<br>
[27588]},<br>
1,<br>
{[],[]},<br>
>
{qstate,<0.659.0>,active,{0,nil}},<br>
11}},
<br>
{{credit_to,<7350.837.0>},32},<br>
{{credit_to,<7350.881.0>},17},<br>
{{ch,<0.737.0>},<br>
>
{cr,<0.737.0>,#Ref<0.0.0.12429>,<br>
{[28612,28611,28594,28593,28576,28575,28558,28557,<br>
28540,28539,28522,28521,28504,28503,28486,28485,<br>
>
28468,28467,28450,28449,28432,28431,28414,28413,
<br>
28396,28395,28378,28377,28360,28359,28342,28341,
<br>
28324,28323,28305,28304,28284,28283,28263,28262,<br>
>
28242,28241,28221,28220,28200,28199,28179,28178,
<br>
28158,28157,28135,28134,28112,28111,28089,28088,
<br>
28066,28065,28043,28042,28020,28019,27997,27996,<br>
>
27974,27973,27951,27950,27928,27927,27905,27904,
<br>
27882,27881,27859,27858,27836,27835,27813,27812,
<br>
27790,27789,27767,27766,27744,27743,27721,27720,<br>
>
27698,27697,27675,27674,27652,27651,27629,27628,
<br>
27606,27605,27583],<br>
[27582]},<br>
2,<br>
{[{<0.737.0>,<br>
>
{consumer,<<"amq.ctag-
ITZqhulzHhUzx1uy1Eot3g">>,<br>
true}}],<br>
[{<0.737.0>,<br>
{consumer,<<"amq.ctag-hpo5ejdoJRdLrT6JS42jGw">>,<br>
>
true}}]},<br>
{qstate,<0.736.0>,suspended,{0,nil}},<br>
11}},<br>
{credit_blocked,[]},
<br>
{{credit_to,<7350.854.0>},3},<br>
>
{{ch,<0.716.0>},<br>
{cr,<0.716.0>,#Ref<0.0.0.11820>,<br>
{[28610,28609,28592,28591,28574,28573,28556,28555,<br>
28538,28537,28520,28519,28502,28501,28484,28483,<br>
>
28466,28465,28448,28447,28430,28429,28412,28411,
<br>
28394,28393,28376,28375,28358,28357,28340,28339,
<br>
28321,28320,28300,28299,28279,28278,28258,28257,<br>
>
28237,28236,28216,28215,28195,28194,28174,28173,
<br>
28153,28152,28130,28129,28107,28106,28084,28083,
<br>
28061,28060,28038,28037,28015,28014,27992,27991,<br>
>
27969,27968,27946,27945,27923,27922,27900,27899,
<br>
27877,27876,27854,27853,27831,27830,27808,27807,
<br>
27785,27784,27762,27761,27739,27738,27716,27715,<br>
>
27693,27692,27670,27669,27647,27646,27624,27623,
<br>
27601,27600,27578],<br>
[27577]},<br>
2,<br>
{[{<0.716.0>,<br>
>
{consumer,<<"amq.ctag-BTlS-GmbXNpcU-
wUlMI60Q">>,<br>
true}}],<br>
[{<0.716.0>,<br>
{consumer,<<"amq.ctag-loc5monF8M1pz_Bb8ypumw">>,<br>
>
true}}]},<br>
{qstate,<0.715.0>,suspended,{0,nil}},<br>
44}},<br>
{{ch,<0.782.0>},<br>
{cr,<0.782.0>,#Ref<0.0.0.15649>,<br>
>
{[28714,28695,28676,28657,28638,28619,28601,28583,<br>
28565,28547,28529,28511,28493,28475,28457,28439,<br>
28421,28403,28385,28367,28349,28331,28312,28291,<br>
>
28270,28249,28228,28207,28186,28165,28142,28119,
<br>
28096,28073,28050,28027,28004,27981,27958,27935,
<br>
27912,27889,27866,27843,27820,27797,27774,27751,<br>
>
27728,27705,27682,27659,27636,27613],<br>
[27590]},<br>
1,<br>
{[],[]},<br>
{qstate,<0.781.0>,active,{0,nil}},<br>
>
35}},<br>
{{credit_to,<7350.846.0>},22},<br>
{{ch,<0.684.0>},<br>
{cr,<0.684.0>,#Ref<0.0.0.11198>,<br>
{[28711,28692,28673,28654,28635,28616,28598,28580,<br>
>
28562,28544,28526,28508,28490,28472,28454,28436,
<br>
28418,28400,28382,28364,28346,28328,28309,28288,
<br>
28267,28246,28225,28204,28183,28162,28139,28116,<br>
>
28093,28070,28047,28024,28001,27978,27955,27932,
<br>
27909,27886,27863,27840,27817,27794,27771,27748,
<br>
27725,27702,27679,27656,27633,27610],<br>
[27587]},<br>
>
1,<br>
{[],[]},<br>
{qstate,<0.683.0>,active,{0,nil}},<br>
11}},<br>
{{ch,<0.692.0>},<br>
{cr,<0.692.0>,#Ref<0.0.0.11088>,<br>
>
{[28710,28691,28672,28653,28634,28615,28597,28579,<br>
28561,28543,28525,28507,28489,28471,28453,28435,<br>
28417,28399,28381,28363,28345,28327,28308,28287,<br>
>
28266,28245,28224,28203,28182,28161,28138,28115,
<br>
28092,28069,28046,28023,28000,27977,27954,27931,
<br>
27908,27885,27862,27839,27816,27793,27770,27747,<br>
>
27724,27701,27678,27655,27632,27609],<br>
[27586]},<br>
1,<br>
{[],[]},<br>
{qstate,<0.691.0>,active,{0,nil}},<br>
>
12}},<br>
{{ch,<0.648.0>},<br>
{cr,<0.648.0>,#Ref<0.0.0.10283>,<br>
{[28717,28698,28679,28660,28641],[28622]},<br>
1,<br>
>
{[],[]},<br>
{qstate,<0.647.0>,active,{0,nil}},
<br>
12}},<br>
{guid,{{<a href="tel:2296174761" value="+12296174761"
target="_blank">2296174761</a>,591959305,1562876388,1822892473},1}},<br>
>
{{ch,<0.720.0>},<br>
{cr,<0.720.0>,#Ref<0.0.0.12036>,<br>
{[28723,28704,28685,28666,28647],[28628]},<br>
1,<br>
{[],
[]},<br>
>
{qstate,<0.719.0>,active,{0,nil}},<br>
37}},
<br>
{{credit_to,<7350.869.0>},39},<br>
{{ch,<0.636.0>},<br>
{cr,<0.636.0>,#Ref<0.0.0.9960>,<br>
>
{[28705,28686,28667,28648,28629,28604,28586,28568,<br>
28550,28532,28514,28496,28478,28460,28442,28424,<br>
28406,28388,28370,28352,28334,28315,28294,28273,<br>
>
28252,28231,28210,28189,28168,28147,28124,28101,
<br>
28078,28055,28032,28009,27986,27963,27940,27917,
<br>
27894,27871,27848,27825,27802,27779,27756,27733,<br>
>
27710,27687,27664,27641,27618,27595],<br>
[27572]},<br>
1,<br>
{[],[]},<br>
{qstate,<0.635.0>,active,{0,nil}},<br>
>
15}},<br>
{{credit_to,<7350.850.0>},50},<br>
{{ch,<0.616.0>},<br>
{cr,<0.616.0>,#Ref<0.0.0.9300>,<br>
{[28719,28700,28681,28662,28643],[28624]},<br>
>
1,<br>
{[],[]},<br>
{qstate,<0.615.0>,active,{0,nil}},<br>
12}},<br>
{{ch,<0.509.0>},<br>
{cr,<0.509.0>,#Ref<0.0.0.2243>,<br>
>
{[28302,28281,28260,28239,28218,28197,28176,28155,<br>
28132,28109,28086,28063,28040,28017,27994,27971],<br>
[27580,27603,27626,27649,27672,27695,27718,27741,<br>
>
27764,27787,27810,27833,27856,27879,27902,27925,
<br>
27948]},<br>
3,<br>
{[],[]},<br>
{qstate,<0.508.0>,active,{0,nil}},<br>
>
36}},<br>
{{credit_to,<7350.873.0>},17},<br>
{{ch,<0.588.0>},<br>
{cr,<0.588.0>,#Ref<0.0.0.7808>,<br>
{[28721,28702,28683,28664,28645],[28626]},<br>
>
1,<br>
{[],[]},<br>
{qstate,<0.587.0>,active,{0,nil}},<br>
12}},<br>
{{ch,<0.664.0>},<br>
{cr,<0.664.0>,#Ref<0.0.0.10772>,<br>
>
{[28720,28701,28682,28663,28644],[28625]},<br>
1,
<br>
{[],[]},<br>
{qstate,<0.663.0>,active,{0,nil}},
<br>
13}},<br>
>
{fhc_age_tree,{0,nil}},<br>
{{ch,<0.708.0>},<br>
{cr,<0.708.0>,#Ref<0.0.0.11695>,<br>
{[28707,28688,28669,28650,28631,28606,28588,28570,<br>
>
28552,28534,28516,28498,28480,28462,28444,28426,
<br>
28408,28390,28372,28354,28336,28317,28296,28275,
<br>
28254,28233,28212,28191,28170,28149,28126,28103,<br>
>
28080,28057,28034,28011,27988,27965,27942,27919,
<br>
27896,27873,27850,27827,27804,27781,27758,27735,
<br>
27712,27689,27666,27643,27620,27597],<br>
[27574]},<br>
>
1,<br>
{[],[]},<br>
{qstate,<0.707.0>,active,{0,nil}},<br>
35}},<br>
{{credit_to,<7350.829.0>},13},<br>
>
{{ch,<0.757.0>},<br>
{cr,<0.757.0>,#Ref<0.0.0.13150>,<br>
{[28706,28687,28668,28649,28630,28605,28587,28569,<br>
28551,28533,28515,28497,28479,28461,28443,28425,<br>
>
28407,28389,28371,28353,28335,28316,28295,28274,
<br>
28253,28232,28211,28190,28169,28148,28125,28102,
<br>
28079,28056,28033,28010,27987,27964,27941,27918,<br>
>
27895,27872,27849,27826,27803,27780,27757,27734,
<br>
27711,27688,27665,27642,27619,27596],<br>
[27573]},<br>
1,<br>
{[],[]},<br>
>
{qstate,<0.756.0>,active,{0,nil}},<br>
49}},
<br>
{{ch,<0.572.0>},<br>
{cr,<0.572.0>,#Ref<0.0.0.7536>,<br>
{[28709,28690,28671,28652,28633,28608,28590,28572,<br>
>
28554,28536,28518,28500,28482,28464,28446,28428,
<br>
28410,28392,28374,28356,28338,28319,28298,28277,
<br>
28256,28235,28214,28193,28172,28151,28128,28105,<br>
>
28082,28059,28036,28013,27990,27967,27944,27921,
<br>
27898,27875,27852,27829,27806,27783,27760,27737,
<br>
27714,27691,27668,27645,27622,27599],<br>
[27576]},<br>
>
1,<br>
{[],[]},<br>
{qstate,<0.571.0>,active,{0,nil}},<br>
35}},<br>
{{credit_to,<7350.825.0>},2},<br>
{{credit_to,<7350.858.0>},3}]<br>
> trap_exit: true<br> status: running<br>
heap_size: 121536<br> stack_size: 27<br>
reductions: 9363545<br> neighbours:<br><br>=SUPERVISOR REPORT==== 18-
May-2013::01:41:51 ===<br> Supervisor:
{local,rabbit_amqqueue_sup}<br>
> Context: child_terminated<br>
Reason: {{case_clause,{empty,{[],[]}}},<br>
[{rabbit_amqqueue_process,subtract_acks,3,[]},<br>
{rabbit_amqqueue_process,subtract_acks,4,[]},<br>
>
{rabbit_amqqueue_process,handle_cast,2,[]},<br>
{gen_server2,handle_msg,2,[]},
<br>
{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}<br>
Offender: [{pid,<0.301.0>},<br>
>
{name,rabbit_amqqueue},<br>
{mfa,<br>
{rabbit_amqqueue_process,start_link,<br>
[{amqqueue,<br>
{resource,<<"/">>,queue,<br>
>
<<"document_queue">>},<br>
true,false,none,[],<0.7622.0>,[],[],undefined,<br>
[]}]}},<br>
{restart_type,temporary},<br>
>
{shutdown,4294967295},<br>
{child_type,worker}]</blockquote>
> <div><br></div>
> <div>Notice how the case_clause is empty. The other rabbit log looks
similar:</div>
> <div><br></div>
> <blockquote class="gmail_quote"># cat /var/log/rabbitmq/rabbit at ...
</blockquote>
> <blockquote class="gmail_quote"> </blockquote>
> <blockquote class="gmail_quote">
> =ERROR REPORT==== 18-May-2013::01:41:48 ===</blockquote>
> <blockquote class="gmail_quote">** Generic server <0.301.0>
terminating</blockquote>
> <blockquote class="gmail_quote">** Last message in was {'$gen_cast',
</blockquote>
> <blockquote class="gmail_quote">
>
{ack,</blockquote>
> <blockquote class="gmail_quote">
[27585,27584,27607,27630,27653,27676,27699,</blockquote>
> <blockquote class="gmail_quote">
27722,27745,27768,27791,27814,27837,27860,</blockquote>
> <blockquote class="gmail_quote">
27883,27906,27929,27952,27975,27998,28021,</blockquote>
> <blockquote class="gmail_quote">
28044,28067,28090,28113,28136,28159,28180,</blockquote>
> <blockquote class="gmail_quote">
28201,28222,28243,28264,28285,28306,28325,</blockquote>
> <blockquote class="gmail_quote">
28343,28361,28379,28397,28415,28433,28451,</blockquote>
> <blockquote class="gmail_quote">
28469,28487,28505,28523,28541,28559,28577,</blockquote>
> <blockquote class="gmail_quote">
28595,28613],</blockquote>
> <blockquote class="gmail_quote">
<0.676.0>}}</blockquote>
> <blockquote class="gmail_quote">** When Server state == {q,</blockquote>
> <blockquote class="gmail_quote">
>
{amqqueue,</blockquote>
> <blockquote class="gmail_quote">
{resource,<<"/">>,queue,</blockquote>
> <blockquote class="gmail_quote">
<<"document_queue">>},</blockquote>
> <blockquote class="gmail_quote">
true,false,none,
[],<0.301.0>,[],[],undefined,[]},</blockquote>
> <blockquote class="gmail_quote">
none,true,rabbit_variable_queue,</blockquote>
> <blockquote class="gmail_quote">
{vqstate,</blockquote>
> <blockquote class="gmail_quote">
>
{0,{[],[]}},</blockquote>
> <blockquote class="gmail_quote">
{0,{[],[]}},</blockquote>
> <blockquote class="gmail_quote">
{delta,undefined,0,undefined},</blockquote>
> <blockquote class="gmail_quote">
{0,{[],[]}},</blockquote>
> <blockquote class="gmail_quote">
>
{0,{[],[]}}, </blockquote>
> <blockquote class="gmail_quote">
... [ thousands of lines of
state ] ... </blockquote>
> <blockquote class="gmail_quote"> </blockquote>
> <blockquote class="gmail_quote">
> ** Reason for termination ==</blockquote>
> <blockquote class="gmail_quote">** {{case_clause,{empty,{[],[]}}},
</blockquote>
> <blockquote class="gmail_quote">
[{rabbit_amqqueue_process,subtract_acks,3,[]},</blockquote>
> <blockquote class="gmail_quote">
{rabbit_amqqueue_process,subtract_acks,4,[]},</blockquote>
> <blockquote class="gmail_quote">
{rabbit_amqqueue_process,handle_cast,2,[]},</blockquote>
> <blockquote class="gmail_quote">
{gen_server2,handle_msg,2,[]},</blockquote>
> <blockquote class="gmail_quote">
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},
{line,249}]}]}</blockquote>
> <blockquote class="gmail_quote">
> <br>
> </blockquote>
> <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
===</blockquote>
> <blockquote class="gmail_quote">connection <0.430.0>, channel 22 -
soft error:</blockquote>
> <blockquote class="gmail_quote">{amqp_error,not_found,</blockquote>
> <blockquote class="gmail_quote">
> "no queue 'document_queue' in
vhost '/'",</blockquote>
> <blockquote class="gmail_quote">
> 'queue.declare'}</blockquote>
> <blockquote class="gmail_quote"><br></blockquote>
> <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
===</blockquote>
> <blockquote class="gmail_quote">
> connection <0.384.0>, channel 12 - soft error:</blockquote>
> <blockquote class="gmail_quote">
> {amqp_error,not_found,</blockquote>
> <blockquote class="gmail_quote">
"no queue 'document_queue' in vhost '/'",</blockquote>
> <blockquote class="gmail_quote">
'queue.declare'}</blockquote>
> <blockquote class="gmail_quote">
> <br>
> </blockquote>
> <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
===</blockquote>
> <blockquote class="gmail_quote">connection <0.384.0>, channel 15 -
soft error:</blockquote>
> <blockquote class="gmail_quote">{amqp_error,not_found,</blockquote>
> <blockquote class="gmail_quote">
> "no queue 'document_queue' in
vhost '/'",</blockquote>
> <blockquote class="gmail_quote">
> 'queue.declare'}</blockquote>
> <blockquote class="gmail_quote"><br></blockquote>
> <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
===</blockquote>
> <blockquote class="gmail_quote">
> connection <0.373.0>, channel 23 - soft error:</blockquote>
> <blockquote class="gmail_quote">
> {amqp_error,not_found,</blockquote>
> <blockquote class="gmail_quote">
"no queue 'document_queue' in vhost '/'",</blockquote>
> <blockquote class="gmail_quote">
'queue.declare'} </blockquote>
> <blockquote class="gmail_quote">
> ... [ repeated a bunch of times ] ...</blockquote>
> <div><br></div>
> <div>I also see this in my /var/log/syslog with regards to beam.smp:
</div>
> <div><br></div>
> <blockquote class="gmail_quote">
> # cat /var/log/syslog<br>May 17 22:46:23 ip-10-123-123-123 kernel:
[244800.176041] INFO: task beam.smp:18971 blocked for more than 120 seconds.
<br>May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176053] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176060] beam.smp
D ffff8801bfd93700 0 18971 18962
0x00000000<br>May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176065]
ffff8801b11cdcb8 0000000000000282 0000000000000000
ffffffffffffffe0<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176070]
ffff8801b11cdfd8 ffff8801b11cdfd8 ffff8801b11cdfd8
0000000000013700<br>May 17 22:46:23 ip-10-123-123-123 kernel:
[244800.176075] ffff8801b3180000 ffff8801b35244a0 00007f500e2079e0
ffff8801b146bb80<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:
<br>May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176089]
[<ffffffff8165434f>] schedule+0x3f/0x60<br>May 17 22:46:23 ip-10-123-
123-123 kernel: [244800.176094] [<ffffffff8106aef5>]
exit_mm+0x85/0x130<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176097]
[<ffffffff8106b10e>] do_exit+0x16e/0x450<br>May 17 22:46:23 ip-10-123-
123-123 kernel: [244800.176102] [<ffffffff810797aa>] ?
__dequeue_signal+0x6a/0xb0<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176105]
[<ffffffff8106b594>] do_group_exit+0x44/0xa0<br>May 17 22:46:23 ip-10-
123-123-123 kernel: [244800.176108] [<ffffffff8107c36c>]
get_signal_to_deliver+0x21c/0x420<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176113]
[<ffffffff81014825>] do_signal+0x45/0x130<br>May 17 22:46:23 ip-10-
123-123-123 kernel: [244800.176116] [<ffffffff8105539d>] ?
set_next_entity+0xad/0xd0<br>
> May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176120]
[<ffffffff81014ad5>] do_notify_resume+0x65/0x80<br>May 17 22:46:23 ip-
10-123-123-123 kernel: [244800.176124] [<ffffffff8165ea90>]
int_signal+0x12/0x17<br>... [ repeated 9 more times with same timestamp ]
...</blockquote>
> <div><br></div>
> <div>There are no strange CPU spikes on my EC2 node based on AWS
monitoring. The only thing that seems to stand out is that my writes to my
EBS volume spike during the crash (~90 write OPs/sec, 4 MiB/sec write
bandwidth, ~30ms/op write latency).</div>
> <div><br></div>
> <div>My team and I are completely stumped and would appreciate any insight
you might have to the crash reports. Is this a problem with writes
hanging when going against the EBS volume since EBS volumes are essentially
network mounted? Is there a way to configure rabbit to be more
tolerant of these delays? </div>
> <div><br></div>
> <div><br></div>
> <div>For debugging purposes, below is the status output of one of the
nodes when it is healthy (so you can see the versions of everything i'm
running):<br>
> </div>
> <div><br></div>
> <blockquote class="gmail_quote">
> # rabbitmqctl status<br>Status of node 'rabbit <at> ip-10-123-123-123' ...
<br>[{pid,28978},<br> {running_applications,<br>
[{rabbitmq_management,"RabbitMQ Management Console","3.1.0"},<br>
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.1.0"},<br>
> {webmachine,"webmachine","1.9.1-rmq3.1.0-
git52e62bc"},<br> {mochiweb,"MochiMedia Web
Server","2.3.1-rmq3.1.0-gitd541e9a"},<br>
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.0"},<br>
> {rabbit,"RabbitMQ","3.1.0"},<br>
{os_mon,"CPO CXC 138 46","2.2.11"},<br>
{inets,"INETS CXC 138 49","5.9.4"},<br>
{mnesia,"MNESIA CXC 138 12","4.8"},<br>
> {amqp_client,"RabbitMQ AMQP Client","3.1.0"},
<br> {xmerl,"XML parser","1.3.3"},<br>
{sasl,"SASL CXC 138 11","2.3.1"},<br>
{stdlib,"ERTS CXC 138 10","1.19.1"},<br>
> {kernel,"ERTS CXC 138 10","2.16.1"}]},
<br> {os,{unix,linux}},<br> {erlang_version,<br>
"Erlang R16B (erts-5.10.1) [source-05f1189] [64-bit] [smp:8:8] [async-
threads:30] [hipe] [kernel-poll:true]\n"},<br>
> {memory,<br> [{total,60179048},<br>
{connection_procs,5274808},<br> {queue_procs,<a
href="tel:13107264" value="+13107264" target="_blank">13107264</a>},
<br> {plugins,354272},<br>
{other_proc,9741830},<br>
> {mnesia,150456},<br>
{mgmt_db,12248},<br> {msg_index,34768},<br>
{other_ets,1148248},<br> {binary,4905584},
<br> {code,19565863},<br>
{atom,703377},<br> {other_system,5180330}]},<br>
> {vm_memory_high_watermark,0.8},<br>
{vm_memory_limit,5828434329},<br> {disk_free_limit,1000000000},
<br> {disk_free,3953344512},<br> {file_descriptors,<br>
[{total_limit,924},{total_used,6},{sockets_limit,829},
{sockets_used,4}]},<br>
> {processes,[{limit,1048576},{used,348}]},<br> {run_queue,0},
<br> {uptime,3796}]<br>...done.</blockquote>
> <div><br></div>
> <div>Note that I am using Erlang v16 from the Erlang Solutions repo (esl-
erlang package). I originally was using Erlang v14 from Ubuntu repos
(erlang-base), but decided to upgrade hoping it would resolve the issue
(which it did not). I have also tried running RabbitMQ 3.0.4 without
success. This issue seems to affect both versions.</div>
> <div><br></div>
> <div>Thanks,</div>
> <div>Karl</div>
> <br>_______________________________________________<br>
> rabbitmq-discuss mailing list<br><a href="mailto:rabbitmq-
discuss at ...">rabbitmq-discuss <at> lists.rabbitmq.com</a><br><a
href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss"
target="_blank">https://lists.rabbitmq.com/cgi-
bin/mailman/listinfo/rabbitmq-discuss</a><br><br>
> </blockquote>
> </div>
> <br>
> </div>
> </div>
>
Hi Brett,
That would explain why all hell breaks lose when the acks are being
subtracted, but I still don't understand what is causing the queue to
disappear in the first place. The only administrative actions performed on
the queues in all our code base are declares or passive declares (to check
for queue existence). We are just using Spring AMQP's
SimpleMessageListenerContainer (which, btw, the code for that class is
anything but "Simple") for consumers, and I've scanned through that code and
didn't find any place where it is dropping or destroying a queue or binding.
Is there are known bug where rabbit will cause a queue to disappear? All my
queues are durable and my messages are NOT persistent (we need high
throughput and loosing a few messages is not a big deal). I made the queue
durable after we first started noticing the crashes so the queue would come
back up on node restart. But durability doesn't seem to protect it from
disappearing.
Any thoughts on this would be greatly, greatly appreciated.
-Karl
More information about the rabbitmq-discuss
mailing list