[rabbitmq-discuss] Strange subtract_acks crash in RabbitMQ 3.1.0-1 (case_clause empty) on EC2
Brett Cameron
brett.r.cameron at gmail.com
Sun May 19 22:11:44 BST 2013
Karl,
I am not aware of any known issues in this regard (queues strangely
disapearing). The fact that you get the problem with different versions of
Erlang and dofferent versions of RabbitMQ would make me suspect that it is
more likely to be something external to the broker environment.
The "INFO: task beam.smp:18971 blocked for more than 120 seconds." messages
in syslog are interesting. What I think is happening here is as follows: By
default Linux uses up to 40% of the available memory for file system
caching and if/when this threshold is reached the file system flushes all
outstanding cached data to disk, causing all following I/O's become
synchronous (until the flush completes). There is a default time limit of
120 seconds for the flush to complete. If you're EC2 VM's have a lot of RAM
and are processing heavy I/O-intensive workloads, it is possible that the
EBS volumes might sometimes not be able to keep up (or possibly there's
some spike in network activity and available bandwidth goes through the
floor). You have mentioned that your messages are not persistent, but
RabbitMQ's memory usage might be such that it is occasionally deciding to
flush a whole pile of stuff to disk. I notice that you have
{vm_memory_high_watermark,0.8}, and this could have something to do with
the problem - maybe try reducing this, and/or maybe set vm.dirty_ratio to
something less than 40% in /etc/sysctl.conf (this might increase the
frequency of flushes, but you'll be flushing less data). I'd possibly start
by reducing vm_memory_high_watermark. I can't really correlate this with
your queue disappearing, but if I/O's are getting messed up then all manner
of weird things could happen I guess. Hopefully others might have some more
tangible ideas!
Brett
On Sun, May 19, 2013 at 12:57 AM, Karl Rieb <karl.rieb at gmail.com> wrote:
> Brett Cameron <brett.r.cameron at ...> writes:
>
> >
> >
> >
> > Karl,Wild speculation would be that messages have been consumed from a
> queue but that queue (name = document_queue?) is for some reason missing in
> action (gone, inaccessible, ...) when the consumer subsequently tries to
> ack
> the message(s).
> > Brett
> >
> >
> > On Sat, May 18, 2013 at 3:20 PM, Karl Rieb <karl.rieb <at> gmail.com>
> wrote:Hi,
> > I am running a RabbitMQ cluster of 4 nodes in AWS EC2 on c1.xlarge
> instances using Ubuntu 12.04 LTS (kernel 3.2.0-38). Periodically two of my
> nodes holding the largest queues will crash and require me to restart the
> rabbitmq-server service. Looking at the logs, I'm at a loss as to why the
> crash is occurring. I've have tried reproducing the crash manually by
> flooding my nodes with messages, but that doesn't seem to trigger the
> issue.
> Below is output from various RabbitMQ logs:
> >
> > # cat /var/log/rabbitmq/rabbit <at> ip-10-123-123-123-sasl.log
> > =CRASH REPORT==== 18-May-2013::01:41:51 === crasher: initial call:
> gen:init_it/6 pid: <0.301.0> registered_name: [] exception exit:
> {{case_clause,{empty,{[],[]}}},
> [{rabbit_amqqueue_process,subtract_acks,3,[]},
> > {rabbit_amqqueue_process,subtract_acks,4,[]},
>
> {rabbit_amqqueue_process,handle_cast,2,[]},
>
> {gen_server2,handle_msg,2,[]}, {proc_lib,wake_up,3,
> > [{file,"proc_lib.erl"},{line,249}]}]}
>
> in function gen_server2:terminate/3 ancestors:
> [rabbit_amqqueue_sup,rabbit_sup,<0.207.0>] messages: [{'$gen_cast',
> {resume,<0.676.0>}},
> >
> {'$gen_cast',{ack,[27583,27606,27629,27652,27675,27698,
> 27721,27744,27767,27790,27813,27836,
>
> 27859,27882,27905,27928,27951,27974,
> >
> 27997,28020,28043,28066,28089,28112,
> 28135,28158,28179,28200,28221,28242,
>
> 28263,28284,28305,28324,28342,28360,
> >
> 28378,28396,28414,28432,28450,28468,
> 28486,28504,28522,28540,28558,28576,
>
> 28594,28612],
> <0.737.0>}},
> > {'$gen_cast',{resume,<0.737.0>}},
> {'$gen_cast',{ack,[27608,27631,27654,27677,27700,27723,
>
> 27746,27769,27792,27815,27838,27861,
> >
> 27884,27907,27930,27953,27976,27999,
> 28022,28045,28068,28091,28114,28137,
>
> 28160,28181,28202,28223,28244,28265,
> >
> 28286,28307,28326,28344,28362,28380,
> 28398,28416,28434,28452,28470,28488,
>
> 28506,28524,28542,28560,28578,28596,
> > 28614],
> <0.676.0>}}, {'$gen_cast',{ack,
> [27578,27601,27624,27647,27670,27693,
> 27716,27739,27762,27785,27808,27831,
> >
> 27854,27877,27900,27923,27946,27969,
> 27992,28015,28038,28061,28084,28107,
>
> 28130,28153,28174,28195,28216,28237,
> >
> 28258,28279,28300,28321,28340,28358,
> 28376,28394,28412,28430,28448,28466,
>
> 28484,28502,28520,28538,28556,28574,
> > 28592,28610],
> <0.716.0>}},
> {'$gen_cast',{resume,<0.716.0>}},
> {'$gen_cast',{ack,[27582,27605,27628,27651,27674,27697,
> >
> 27720,27743,27766,27789,27812,27835,
> 27858,27881,27904,27927,27950,27973,
>
> 27996,28019,28042,28065,28088,28111,
> >
> 28134,28157,28178,28199,28220,28241,
> 28262,28283,28304,28323,28341,28359,
>
> 28377,28395,28413,28431,28449,28467,
> >
> 28485,28503,28521,28539,28557,28575,
> 28593,28611],
> <0.737.0>}}] links: [<0.300.0>]
> > dictionary: [{{ch,<0.729.0>}, {cr,
> <0.729.0>,#Ref<0.0.0.12296>,
> {[28715,28696,28677,28658,28639,28620,28602,28584,
> 28566,28548,28530,28512,28494,28476,28458,28440,
> >
> 28422,28404,28386,28368,28350,28332,28313,28292,
> 28271,28250,28229,28208,28187,28166,28143,28120,
>
> 28097,28074,28051,28028,28005,27982,27959,27936,
> >
> 27913,27890,27867,27844,27821,27798,27775,27752,
> 27729,27706,27683,27660,27637,27614],
> [27591]}, 1, {[],[]},
> > {qstate,<0.728.0>,active,{0,nil}},
>
> 11}}, {{credit_to,<7350.877.0>},22},
> {{ch,<0.608.0>}, {cr,<0.608.0>,#Ref<0.0.0.9041>,
> >
> {[28708,28689,28670,28651,28632,28607,28589,28571,
> 28553,28535,28517,28499,28481,28463,28445,28427,
>
> 28409,28391,28373,28355,28337,28318,28297,28276,
> >
> 28255,28234,28213,28192,28171,28150,28127,28104,
> 28081,28058,28035,28012,27989,27966,27943,27920,
>
> 27897,27874,27851,27828,27805,27782,27759,27736,
> > 27713,27690,27667,27644,27621,27598],
> [27575]}, 1, {[],
> []}, {qstate,<0.607.0>,active,{0,nil}},
> > 38}}, {{ch,<0.624.0>},
> {cr,<0.624.0>,#Ref<0.0.0.9436>,
> {[28722,28703,28684,28665,28646],[28627]}, 1,
> > {[],[]}, {qstate,
> <0.623.0>,active,{0,nil}}, 36}},
> {{ch,<0.696.0>}, {cr,<0.696.0>,#Ref<0.0.0.11414>,
> > {[],[]}, 2,
> {[],[]}, {qstate,<0.695.0>,active,{0,nil}},
> 24}}, {{ch,<0.564.0>},
> > {cr,<0.564.0>,#Ref<0.0.0.7285>,
> {[28718,28699,28680,28661,28642],[28623]}, 1,
> {[],[]}, {qstate,<0.563.0>,active,
> {0,nil}},
> > 12}}, {{ch,<0.644.0>},
> {cr,<0.644.0>,#Ref<0.0.0.10071>,
> {[28713,28694,28675,28656,28637,28618,28600,28582,
> 28564,28546,28528,28510,28492,28474,28456,28438,
> >
> 28420,28402,28384,28366,28348,28330,28311,28290,
> 28269,28248,28227,28206,28185,28164,28141,28118,
>
> 28095,28072,28049,28026,28003,27980,27957,27934,
> >
> 27911,27888,27865,27842,27819,27796,27773,27750,
> 27727,27704,27681,27658,27635,27612],
> [27589]}, 1, {[],[]},
> > {qstate,<0.643.0>,active,{0,nil}},
>
> 11}}, {{ch,<0.676.0>}, {cr,
> <0.676.0>,#Ref<0.0.0.11309>,
> {[28614,28613,28596,28595,28578,28577,28560,28559,
> >
> 28542,28541,28524,28523,28506,28505,28488,28487,
> 28470,28469,28452,28451,28434,28433,28416,28415,
>
> 28398,28397,28380,28379,28362,28361,28344,28343,
> >
> 28326,28325,28307,28306,28286,28285,28265,28264,
> 28244,28243,28223,28222,28202,28201,28181,28180,
>
> 28160,28159,28137,28136,28114,28113,28091,28090,
> >
> 28068,28067,28045,28044,28022,28021,27999,27998,
> 27976,27975,27953,27952,27930,27929,27907,27906,
>
> 27884,27883,27861,27860,27838,27837,27815,27814,
> >
> 27792,27791,27769,27768,27746,27745,27723,27722,
> 27700,27699,27677,27676,27654,27653,27631,27630,
>
> 27608,27607,27585], [27584]},
> > 2, {[{<0.676.0>,
>
> {consumer,<<"amq.ctag-0cvDGmyaAbEW_j15AfhGHQ">>,
> true}}],
> > [{<0.676.0>, {consumer,
> <<"amq.ctag-YqU04SIfQv6P-QVueEjbRg">>,
> true}}]}, {qstate,<0.675.0>,suspended,{0,nil}},
> > 25}},
> {{xtype_to_module,direct},rabbit_exchange_type_direct},
> {{credit_to,<7350.833.0>},31}, {{ch,<0.745.0>},
> {cr,<0.745.0>,#Ref<0.0.0.12674>,
> >
> {[28716,28697,28678,28659,28640,28621,28603,28585,
> 28567,28549,28531,28513,28495,28477,28459,28441,
>
> 28423,28405,28387,28369,28351,28333,28314,28293,
> >
> 28272,28251,28230,28209,28188,28167,28144,28121,
> 28098,28075,28052,28029,28006,27983,27960,27937,
>
> 27914,27891,27868,27845,27822,27799,27776,27753,
> > 27730,27707,27684,27661,27638,27615],
> [27592]}, 1, {[],
> []}, {qstate,<0.744.0>,active,{0,nil}},
> > 37}}, {{ch,<0.660.0>},
> {cr,<0.660.0>,#Ref<0.0.0.10650>,
> {[28712,28693,28674,28655,28636,28617,28599,28581,
> 28563,28545,28527,28509,28491,28473,28455,28437,
> >
> 28419,28401,28383,28365,28347,28329,28310,28289,
> 28268,28247,28226,28205,28184,28163,28140,28117,
>
> 28094,28071,28048,28025,28002,27979,27956,27933,
> >
> 27910,27887,27864,27841,27818,27795,27772,27749,
> 27726,27703,27680,27657,27634,27611],
> [27588]}, 1, {[],[]},
> > {qstate,<0.659.0>,active,{0,nil}},
>
> 11}}, {{credit_to,<7350.837.0>},32},
> {{credit_to,<7350.881.0>},17}, {{ch,<0.737.0>},
> > {cr,<0.737.0>,#Ref<0.0.0.12429>,
> {[28612,28611,28594,28593,28576,28575,28558,28557,
> 28540,28539,28522,28521,28504,28503,28486,28485,
> >
> 28468,28467,28450,28449,28432,28431,28414,28413,
> 28396,28395,28378,28377,28360,28359,28342,28341,
>
> 28324,28323,28305,28304,28284,28283,28263,28262,
> >
> 28242,28241,28221,28220,28200,28199,28179,28178,
> 28158,28157,28135,28134,28112,28111,28089,28088,
>
> 28066,28065,28043,28042,28020,28019,27997,27996,
> >
> 27974,27973,27951,27950,27928,27927,27905,27904,
> 27882,27881,27859,27858,27836,27835,27813,27812,
>
> 27790,27789,27767,27766,27744,27743,27721,27720,
> >
> 27698,27697,27675,27674,27652,27651,27629,27628,
> 27606,27605,27583],
> [27582]},
> 2, {[{<0.737.0>,
> > {consumer,<<"amq.ctag-
> ITZqhulzHhUzx1uy1Eot3g">>, true}}],
> [{<0.737.0>, {consumer,
> <<"amq.ctag-hpo5ejdoJRdLrT6JS42jGw">>,
> > true}}]},
> {qstate,<0.736.0>,suspended,{0,nil}}, 11}},
> {credit_blocked,[]}, {{credit_to,<7350.854.0>},3},
> > {{ch,<0.716.0>}, {cr,
> <0.716.0>,#Ref<0.0.0.11820>,
> {[28610,28609,28592,28591,28574,28573,28556,28555,
> 28538,28537,28520,28519,28502,28501,28484,28483,
> >
> 28466,28465,28448,28447,28430,28429,28412,28411,
> 28394,28393,28376,28375,28358,28357,28340,28339,
>
> 28321,28320,28300,28299,28279,28278,28258,28257,
> >
> 28237,28236,28216,28215,28195,28194,28174,28173,
> 28153,28152,28130,28129,28107,28106,28084,28083,
>
> 28061,28060,28038,28037,28015,28014,27992,27991,
> >
> 27969,27968,27946,27945,27923,27922,27900,27899,
> 27877,27876,27854,27853,27831,27830,27808,27807,
>
> 27785,27784,27762,27761,27739,27738,27716,27715,
> >
> 27693,27692,27670,27669,27647,27646,27624,27623,
> 27601,27600,27578],
> [27577]},
> 2, {[{<0.716.0>,
> > {consumer,<<"amq.ctag-BTlS-GmbXNpcU-
> wUlMI60Q">>, true}}],
> [{<0.716.0>, {consumer,<<"amq.ctag-
> loc5monF8M1pz_Bb8ypumw">>,
> > true}}]},
> {qstate,<0.715.0>,suspended,{0,nil}}, 44}},
> {{ch,<0.782.0>}, {cr,<0.782.0>,#Ref<0.0.0.15649>,
> >
> {[28714,28695,28676,28657,28638,28619,28601,28583,
> 28565,28547,28529,28511,28493,28475,28457,28439,
>
> 28421,28403,28385,28367,28349,28331,28312,28291,
> >
> 28270,28249,28228,28207,28186,28165,28142,28119,
> 28096,28073,28050,28027,28004,27981,27958,27935,
>
> 27912,27889,27866,27843,27820,27797,27774,27751,
> > 27728,27705,27682,27659,27636,27613],
> [27590]}, 1, {[],
> []}, {qstate,<0.781.0>,active,{0,nil}},
> > 35}}, {{credit_to,
> <7350.846.0>},22}, {{ch,<0.684.0>}, {cr,
> <0.684.0>,#Ref<0.0.0.11198>,
> {[28711,28692,28673,28654,28635,28616,28598,28580,
> >
> 28562,28544,28526,28508,28490,28472,28454,28436,
> 28418,28400,28382,28364,28346,28328,28309,28288,
>
> 28267,28246,28225,28204,28183,28162,28139,28116,
> >
> 28093,28070,28047,28024,28001,27978,27955,27932,
> 27909,27886,27863,27840,27817,27794,27771,27748,
>
> 27725,27702,27679,27656,27633,27610],
> [27587]},
> > 1, {[],[]},
> {qstate,<0.683.0>,active,{0,nil}}, 11}},
>
> {{ch,<0.692.0>}, {cr,<0.692.0>,#Ref<0.0.0.11088>,
> >
> {[28710,28691,28672,28653,28634,28615,28597,28579,
> 28561,28543,28525,28507,28489,28471,28453,28435,
>
> 28417,28399,28381,28363,28345,28327,28308,28287,
> >
> 28266,28245,28224,28203,28182,28161,28138,28115,
> 28092,28069,28046,28023,28000,27977,27954,27931,
>
> 27908,27885,27862,27839,27816,27793,27770,27747,
> > 27724,27701,27678,27655,27632,27609],
> [27586]}, 1, {[],
> []}, {qstate,<0.691.0>,active,{0,nil}},
> > 12}}, {{ch,<0.648.0>},
> {cr,<0.648.0>,#Ref<0.0.0.10283>,
> {[28717,28698,28679,28660,28641],[28622]}, 1,
> > {[],[]}, {qstate,
> <0.647.0>,active,{0,nil}}, 12}},
> {guid,{{2296174761,591959305,1562876388,1822892473},1}},
> > {{ch,<0.720.0>}, {cr,
> <0.720.0>,#Ref<0.0.0.12036>,
> {[28723,28704,28685,28666,28647],[28628]}, 1,
> {[],[]},
> > {qstate,<0.719.0>,active,{0,nil}},
>
> 37}}, {{credit_to,<7350.869.0>},39},
> {{ch,<0.636.0>}, {cr,<0.636.0>,#Ref<0.0.0.9960>,
> >
> {[28705,28686,28667,28648,28629,28604,28586,28568,
> 28550,28532,28514,28496,28478,28460,28442,28424,
>
> 28406,28388,28370,28352,28334,28315,28294,28273,
> >
> 28252,28231,28210,28189,28168,28147,28124,28101,
> 28078,28055,28032,28009,27986,27963,27940,27917,
>
> 27894,27871,27848,27825,27802,27779,27756,27733,
> > 27710,27687,27664,27641,27618,27595],
> [27572]}, 1, {[],
> []}, {qstate,<0.635.0>,active,{0,nil}},
> > 15}}, {{credit_to,
> <7350.850.0>},50}, {{ch,<0.616.0>}, {cr,
> <0.616.0>,#Ref<0.0.0.9300>,
> {[28719,28700,28681,28662,28643],[28624]},
> > 1, {[],[]},
> {qstate,<0.615.0>,active,{0,nil}}, 12}},
>
> {{ch,<0.509.0>}, {cr,<0.509.0>,#Ref<0.0.0.2243>,
> >
> {[28302,28281,28260,28239,28218,28197,28176,28155,
> 28132,28109,28086,28063,28040,28017,27994,27971],
> [27580,27603,27626,27649,27672,27695,27718,27741,
> >
> 27764,27787,27810,27833,27856,27879,27902,27925,
> 27948]}, 3,
> {[],[]}, {qstate,<0.508.0>,active,{0,nil}},
> > 36}}, {{credit_to,
> <7350.873.0>},17}, {{ch,<0.588.0>}, {cr,
> <0.588.0>,#Ref<0.0.0.7808>,
> {[28721,28702,28683,28664,28645],[28626]},
> > 1, {[],[]},
> {qstate,<0.587.0>,active,{0,nil}}, 12}},
>
> {{ch,<0.664.0>}, {cr,<0.664.0>,#Ref<0.0.0.10772>,
> > {[28720,28701,28682,28663,28644],[28625]},
>
> 1, {[],[]},
> {qstate,<0.663.0>,active,{0,nil}}, 13}},
> > {fhc_age_tree,{0,nil}}, {{ch,
> <0.708.0>}, {cr,<0.708.0>,#Ref<0.0.0.11695>,
> {[28707,28688,28669,28650,28631,28606,28588,28570,
> >
> 28552,28534,28516,28498,28480,28462,28444,28426,
> 28408,28390,28372,28354,28336,28317,28296,28275,
>
> 28254,28233,28212,28191,28170,28149,28126,28103,
> >
> 28080,28057,28034,28011,27988,27965,27942,27919,
> 27896,27873,27850,27827,27804,27781,27758,27735,
>
> 27712,27689,27666,27643,27620,27597],
> [27574]},
> > 1, {[],[]},
> {qstate,<0.707.0>,active,{0,nil}}, 35}},
>
> {{credit_to,<7350.829.0>},13},
> > {{ch,<0.757.0>}, {cr,
> <0.757.0>,#Ref<0.0.0.13150>,
> {[28706,28687,28668,28649,28630,28605,28587,28569,
> 28551,28533,28515,28497,28479,28461,28443,28425,
> >
> 28407,28389,28371,28353,28335,28316,28295,28274,
> 28253,28232,28211,28190,28169,28148,28125,28102,
>
> 28079,28056,28033,28010,27987,27964,27941,27918,
> >
> 27895,27872,27849,27826,27803,27780,27757,27734,
> 27711,27688,27665,27642,27619,27596],
> [27573]}, 1, {[],[]},
> > {qstate,<0.756.0>,active,{0,nil}},
>
> 49}}, {{ch,<0.572.0>}, {cr,
> <0.572.0>,#Ref<0.0.0.7536>,
> {[28709,28690,28671,28652,28633,28608,28590,28572,
> >
> 28554,28536,28518,28500,28482,28464,28446,28428,
> 28410,28392,28374,28356,28338,28319,28298,28277,
>
> 28256,28235,28214,28193,28172,28151,28128,28105,
> >
> 28082,28059,28036,28013,27990,27967,27944,27921,
> 27898,27875,27852,27829,27806,27783,27760,27737,
>
> 27714,27691,27668,27645,27622,27599],
> [27576]},
> > 1, {[],[]},
> {qstate,<0.571.0>,active,{0,nil}}, 35}},
>
> {{credit_to,<7350.825.0>},2}, {{credit_to,
> <7350.858.0>},3}]
> > trap_exit: true status: running heap_size: 121536
> stack_size:
> 27 reductions: 9363545 neighbours:=SUPERVISOR REPORT==== 18-May-
> 2013::01:41:51 === Supervisor: {local,rabbit_amqqueue_sup}
> > Context: child_terminated Reason: {{case_clause,{empty,
> {[],[]}}}, [{rabbit_amqqueue_process,subtract_acks,3,[]},
> {rabbit_amqqueue_process,subtract_acks,4,[]},
> > {rabbit_amqqueue_process,handle_cast,2,[]},
> {gen_server2,handle_msg,2,[]},
> {proc_lib,wake_up,3,
> [{file,"proc_lib.erl"},{line,249}]}]} Offender: [{pid,<0.301.0>},
> > {name,rabbit_amqqueue}, {mfa,
> {rabbit_amqqueue_process,start_link,
> [{amqqueue, {resource,<<"/">>,queue,
> > <<"document_queue">>},
>
> true,false,none,[],<0.7622.0>,[],[],undefined,
>
> []}]}}, {restart_type,temporary},
> > {shutdown,4294967295},
> {child_type,worker}]
> >
> > Notice how the case_clause is empty. The other rabbit log looks similar:
> >
> > # cat /var/log/rabbitmq/rabbit-
> 72dRMgatfUn5amjFlPYPZgfGRAsWV3w2 at public.gmane.org
> >
> >
> > =ERROR REPORT==== 18-May-2013::01:41:48 ===
> > ** Generic server <0.301.0> terminating
> > ** Last message in was {'$gen_cast',
> >
> > {ack,
> >
> [27585,27584,27607,27630,27653,27676,27699,
> >
> 27722,27745,27768,27791,27814,27837,27860,
> >
> 27883,27906,27929,27952,27975,27998,28021,
> >
> 28044,28067,28090,28113,28136,28159,28180,
> >
> 28201,28222,28243,28264,28285,28306,28325,
> >
> 28343,28361,28379,28397,28415,28433,28451,
> >
> 28469,28487,28505,28523,28541,28559,28577,
> > 28595,28613],
> > <0.676.0>}}
> > ** When Server state == {q,
> >
> > {amqqueue,
> > {resource,<<"/">>,queue,
> > <<"document_queue">>},
> > true,false,none,[],<0.301.0>,[],[],undefined,
> []},
> > none,true,rabbit_variable_queue,
> > {vqstate,
> >
> > {0,{[],[]}},
> > {0,{[],[]}},
> > {delta,undefined,0,undefined},
> > {0,{[],[]}},
> >
> > {0,{[],[]}},
> > ... [ thousands of lines of state ] ...
> >
> >
> > ** Reason for termination ==
> > ** {{case_clause,{empty,{[],[]}}},
> > [{rabbit_amqqueue_process,subtract_acks,3,[]},
> > {rabbit_amqqueue_process,subtract_acks,4,[]},
> > {rabbit_amqqueue_process,handle_cast,2,[]},
> > {gen_server2,handle_msg,2,[]},
> >
> > {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}
> >
> >
> >
> > =ERROR REPORT==== 18-May-2013::01:41:51 ===
> > connection <0.430.0>, channel 22 - soft error:
> > {amqp_error,not_found,
> >
> > "no queue 'document_queue' in vhost '/'",
> >
> > 'queue.declare'}
> >
> > =ERROR REPORT==== 18-May-2013::01:41:51 ===
> >
> > connection <0.384.0>, channel 12 - soft error:
> >
> > {amqp_error,not_found,
> > "no queue 'document_queue' in vhost '/'",
> > 'queue.declare'}
> >
> >
> >
> > =ERROR REPORT==== 18-May-2013::01:41:51 ===
> > connection <0.384.0>, channel 15 - soft error:
> > {amqp_error,not_found,
> >
> > "no queue 'document_queue' in vhost '/'",
> >
> > 'queue.declare'}
> >
> > =ERROR REPORT==== 18-May-2013::01:41:51 ===
> >
> > connection <0.373.0>, channel 23 - soft error:
> >
> > {amqp_error,not_found,
> > "no queue 'document_queue' in vhost '/'",
> > 'queue.declare'}
> >
> > ... [ repeated a bunch of times ] ...
> >
> > I also see this in my /var/log/syslog with regards to beam.smp:
> >
> >
> > # cat /var/log/syslogMay 17 22:46:23 ip-10-123-123-123 kernel:
> [244800.176041] INFO: task beam.smp:18971 blocked for more than 120
> seconds.May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176053] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176060] beam.smp
> D ffff8801bfd93700 0 18971 18962 0x00000000May 17 22:46:23 ip-10-123-
> 123-123 kernel: [244800.176065] ffff8801b11cdcb8 0000000000000282
> 0000000000000000 ffffffffffffffe0
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176070]
> ffff8801b11cdfd8 ffff8801b11cdfd8 ffff8801b11cdfd8 0000000000013700May 17
> 22:46:23 ip-10-123-123-123 kernel: [244800.176075] ffff8801b3180000
> ffff8801b35244a0 00007f500e2079e0 ffff8801b146bb80
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:May
> 17 22:46:23 ip-10-123-123-123 kernel: [244800.176089] [<ffffffff8165434f>]
> schedule+0x3f/0x60May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176094]
> [<ffffffff8106aef5>] exit_mm+0x85/0x130
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176097]
> [<ffffffff8106b10e>] do_exit+0x16e/0x450May 17 22:46:23 ip-10-123-123-123
> kernel: [244800.176102] [<ffffffff810797aa>] ? __dequeue_signal+0x6a/0xb0
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176105]
> [<ffffffff8106b594>] do_group_exit+0x44/0xa0May 17 22:46:23 ip-10-123-123-
> 123 kernel: [244800.176108] [<ffffffff8107c36c>]
> get_signal_to_deliver+0x21c/0x420
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176113]
> [<ffffffff81014825>] do_signal+0x45/0x130May 17 22:46:23 ip-10-123-123-123
> kernel: [244800.176116] [<ffffffff8105539d>] ? set_next_entity+0xad/0xd0
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176120]
> [<ffffffff81014ad5>] do_notify_resume+0x65/0x80May 17 22:46:23 ip-10-123-
> 123-123 kernel: [244800.176124] [<ffffffff8165ea90>]
> int_signal+0x12/0x17... [ repeated 9 more times with same timestamp ] ...
> >
> > There are no strange CPU spikes on my EC2 node based on AWS monitoring.
> The only thing that seems to stand out is that my writes to my EBS volume
> spike during the crash (~90 write OPs/sec, 4 MiB/sec write bandwidth,
> ~30ms/op write latency).
> >
> > My team and I are completely stumped and would appreciate any insight you
> might have to the crash reports. Is this a problem with writes hanging
> when
> going against the EBS volume since EBS volumes are essentially network
> mounted? Is there a way to configure rabbit to be more tolerant of these
> delays?
> >
> >
> > For debugging purposes, below is the status output of one of the nodes
> when it is healthy (so you can see the versions of everything i'm running):
> >
> >
> >
> > # rabbitmqctl statusStatus of node 'rabbit <at> ip-10-123-123-123' ...
> [{pid,28978}, {running_applications, [{rabbitmq_management,"RabbitMQ
> Management Console","3.1.0"}, {rabbitmq_web_dispatch,"RabbitMQ Web
> Dispatcher","3.1.0"},
> > {webmachine,"webmachine","1.9.1-rmq3.1.0-git52e62bc"},
> {mochiweb,"MochiMedia Web Server","2.3.1-rmq3.1.0-gitd541e9a"},
> {rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.0"},
> > {rabbit,"RabbitMQ","3.1.0"}, {os_mon,"CPO CXC 138
> 46","2.2.11"}, {inets,"INETS CXC 138 49","5.9.4"},
> {mnesia,"MNESIA CXC 138 12","4.8"},
> > {amqp_client,"RabbitMQ AMQP Client","3.1.0"}, {xmerl,"XML
> parser","1.3.3"}, {sasl,"SASL CXC 138 11","2.3.1"},
> {stdlib,"ERTS
> CXC 138 10","1.19.1"},
> > {kernel,"ERTS CXC 138 10","2.16.1"}]}, {os,
> {unix,linux}}, {erlang_version, "Erlang R16B (erts-5.10.1) [source-
> 05f1189] [64-bit] [smp:8:8] [async-threads:30] [hipe] [kernel-
> poll:true]\n"},
> > {memory, [{total,60179048}, {connection_procs,5274808},
> {queue_procs,13107264}, {plugins,354272}, {other_proc,9741830},
> > {mnesia,150456}, {mgmt_db,12248}, {msg_index,34768},
> {other_ets,1148248}, {binary,4905584}, {code,19565863},
> {atom,703377}, {other_system,5180330}]},
> >
>
> {vm_memory_high_watermark,0.8}, {vm_memory_limit,5828434329}, {disk_free_li
> mit,1000000000}, {disk_free,3953344512}, {file_descriptors,
> [{total_limit,924},{total_used,6},{sockets_limit,829},{sockets_used,4}]},
> > {processes,[{limit,1048576},
> {used,348}]}, {run_queue,0}, {uptime,3796}]...done.
> >
> > Note that I am using Erlang v16 from the Erlang Solutions repo
> (esl-erlang
> package). I originally was using Erlang v14 from Ubuntu repos (erlang-
> base), but decided to upgrade hoping it would resolve the issue (which it
> did not). I have also tried running RabbitMQ 3.0.4 without success. This
> issue seems to affect both versions.
> >
> > Thanks,
> > Karl
> > _______________________________________________
> > rabbitmq-discuss mailing listrabbitmq-discuss <at>
> lists.rabbitmq.comhttps://lists.rabbitmq.com/cgi-
> bin/mailman/listinfo/rabbitmq-discuss
> >
> >
> >
> >
> >
> >
> > <div>
> > <div dir="ltr">
> > <div>Karl,<br><br>Wild speculation would be that messages have been
> consumed from a queue but that queue (name = document_queue?) is for some
> reason missing in action (gone, inaccessible, ...) when the consumer
> subsequently tries to ack the message(s). <br><br>
> > </div>Brett<br><br>
> > </div>
> > <div class="gmail_extra">
> > <br><br><div class="gmail_quote">On Sat, May 18, 2013 at 3:20 PM, Karl
> Rieb <span dir="ltr"><<a href="mailto:karl.rieb at ..."
> target="_blank">karl.rieb <at> gmail.com</a>></span> wrote:<br>
> <blockquote class="gmail_quote">Hi,<div><br></div>
> > <div>I am running a RabbitMQ cluster of 4 nodes in AWS EC2 on c1.xlarge
> instances using Ubuntu 12.04 LTS (kernel 3.2.0-38). Periodically two
> of my nodes holding the largest queues will crash and require me to restart
> the rabbitmq-server service. Looking at the logs, I'm at a loss as to
> why the crash is occurring. I've have tried reproducing the crash
> manually by flooding my nodes with messages, but that doesn't seem to
> trigger the issue. Below is output from various RabbitMQ logs:</div>
> > <div><br></div>
> > <blockquote class="gmail_quote"># cat /var/log/rabbitmq/rabbit <at>
> ip-10-
> 123-123-123-sasl.log<br>
> > =CRASH REPORT==== 18-May-2013::01:41:51 ===<br> crasher:<br>
> initial call: gen:init_it/6<br> pid: <0.301.0>
> <br> registered_name: []<br> exception exit:
> {{case_clause,{empty,{[],[]}}},<br>
>
>
> [{rabbit_amqqueue_process,subtract_acks,3,
> []},<br>
> >
> {rabbit_amqqueue_process,subtract_acks,4,[]},<br>
>
>
> {rabbit_amqqueue_process,handle_cast,2,[]},<br>
> {gen_server2,handle_msg,2,
> []},<br>
> {proc_lib,wake_up,3,<br>
> >
> [{file,"proc_lib.erl"},
> {line,249}]}]}<br> in function
> gen_server2:terminate/3<br> ancestors:
> [rabbit_amqqueue_sup,rabbit_sup,<0.207.0>]<br> messages:
> [{'$gen_cast',{resume,<0.676.0>}},<br>
> >
> {'$gen_cast',{ack,[27583,27606,27629,27652,27675,27698,<br>
>
>
>
> 27721,27744,27767,27790,27813,27836,<br>
>
>
> 27859,27882,27905,27928,27951,27974,<br>
> >
>
> 27997,28020,28043,28066,28089,28112,<br>
>
>
> 28135,28158,28179,28200,28221,28242,<br>
>
>
>
> 28263,28284,28305,28324,28342,28360,<br>
> >
>
> 28378,28396,28414,28432,28450,28468,<br>
>
>
> 28486,28504,28522,28540,28558,28576,<br>
>
>
> 28594,28612],<br>
>
>
> <0.737.0>}},<br>
> >
> {'$gen_cast',{resume,<0.737.0>}},<br>
> {'$gen_cast',{ack,
> [27608,27631,27654,27677,27700,27723,<br>
>
>
> 27746,27769,27792,27815,27838,27861,<br>
> >
>
> 27884,27907,27930,27953,27976,27999,<br>
>
>
> 28022,28045,28068,28091,28114,28137,<br>
>
>
>
> 28160,28181,28202,28223,28244,28265,<br>
> >
>
> 28286,28307,28326,28344,28362,28380,<br>
>
>
> 28398,28416,28434,28452,28470,28488,<br>
>
>
>
> 28506,28524,28542,28560,28578,28596,<br>
> >
> 28614],
> <br>
> <0.676.0>}},
> <br>
> {'$gen_cast',{ack,[27578,27601,27624,27647,27670,27693,<br>
>
>
>
> 27716,27739,27762,27785,27808,27831,<br>
> >
>
> 27854,27877,27900,27923,27946,27969,<br>
>
>
> 27992,28015,28038,28061,28084,28107,<br>
>
>
>
> 28130,28153,28174,28195,28216,28237,<br>
> >
>
> 28258,28279,28300,28321,28340,28358,<br>
>
>
> 28376,28394,28412,28430,28448,28466,<br>
>
>
>
> 28484,28502,28520,28538,28556,28574,<br>
> >
> 28592,28610],
> <br>
> <0.716.0>}},
> <br>
> {'$gen_cast',{resume,<0.716.0>}},<br>
> {'$gen_cast',{ack,
> [27582,27605,27628,27651,27674,27697,<br>
> >
>
> 27720,27743,27766,27789,27812,27835,<br>
>
>
> 27858,27881,27904,27927,27950,27973,<br>
>
>
>
> 27996,28019,28042,28065,28088,28111,<br>
> >
>
> 28134,28157,28178,28199,28220,28241,<br>
>
>
> 28262,28283,28304,28323,28341,28359,<br>
>
>
>
> 28377,28395,28413,28431,28449,28467,<br>
> >
>
> 28485,28503,28521,28539,28557,28575,<br>
>
>
> 28593,28611],<br>
>
>
> <0.737.0>}}]<br> links:
> [<0.300.0>]<br>
> > dictionary: [{{ch,<0.729.0>},<br>
>
> {cr,<0.729.0>,#Ref<0.0.0.12296>,<br>
>
> {[28715,28696,28677,28658,28639,28620,28602,28584,<br>
>
> 28566,28548,28530,28512,28494,28476,28458,28440,<br>
> >
> 28422,28404,28386,28368,28350,28332,28313,28292,
> <br>
> 28271,28250,28229,28208,28187,28166,28143,28120,
> <br>
> 28097,28074,28051,28028,28005,27982,27959,27936,<br>
> >
> 27913,27890,27867,27844,27821,27798,27775,27752,
> <br>
> 27729,27706,27683,27660,27637,27614],<br>
>
> [27591]},<br>
> 1,<br>
> {[],[]},<br>
> >
> {qstate,<0.728.0>,active,{0,nil}},<br>
> 11}},
> <br>
> {{credit_to,<7350.877.0>},22},<br>
> {{ch,<0.608.0>},<br>
>
> {cr,<0.608.0>,#Ref<0.0.0.9041>,<br>
> >
> {[28708,28689,28670,28651,28632,28607,28589,28571,<br>
>
>
> 28553,28535,28517,28499,28481,28463,28445,28427,<br>
>
> 28409,28391,28373,28355,28337,28318,28297,28276,<br>
> >
> 28255,28234,28213,28192,28171,28150,28127,28104,
> <br>
> 28081,28058,28035,28012,27989,27966,27943,27920,
> <br>
> 27897,27874,27851,27828,27805,27782,27759,27736,<br>
> >
> 27713,27690,27667,27644,27621,27598],<br>
>
> [27575]},<br>
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.607.0>,active,{0,nil}},<br>
> >
> 38}},<br>
> {{ch,<0.624.0>},<br>
>
> {cr,<0.624.0>,#Ref<0.0.0.9436>,<br>
>
> {[28722,28703,28684,28665,28646],[28627]},<br>
> 1,<br>
> >
> {[],[]},<br>
> {qstate,<0.623.0>,active,{0,nil}},
> <br>
> 36}},<br>
> {{ch,<0.696.0>},<br>
>
> {cr,<0.696.0>,#Ref<0.0.0.11414>,<br>
> >
> {[],[]},<br>
> 2,<br>
> {[],[]},<br>
>
> {qstate,<0.695.0>,active,{0,nil}},<br>
> 24}},<br>
>
> {{ch,<0.564.0>},<br>
> >
> {cr,<0.564.0>,#Ref<0.0.0.7285>,<br>
>
> {[28718,28699,28680,28661,28642],[28623]},<br>
> 1,<br>
> {[],
> []},<br>
> {qstate,<0.563.0>,active,{0,nil}},<br>
> >
> 12}},<br>
> {{ch,<0.644.0>},<br>
>
> {cr,<0.644.0>,#Ref<0.0.0.10071>,<br>
>
> {[28713,28694,28675,28656,28637,28618,28600,28582,<br>
>
> 28564,28546,28528,28510,28492,28474,28456,28438,<br>
> >
> 28420,28402,28384,28366,28348,28330,28311,28290,
> <br>
> 28269,28248,28227,28206,28185,28164,28141,28118,
> <br>
> 28095,28072,28049,28026,28003,27980,27957,27934,<br>
> >
> 27911,27888,27865,27842,27819,27796,27773,27750,
> <br>
> 27727,27704,27681,27658,27635,27612],<br>
>
> [27589]},<br>
> 1,<br>
> {[],[]},<br>
> >
> {qstate,<0.643.0>,active,{0,nil}},<br>
> 11}},
> <br>
> {{ch,<0.676.0>},<br>
> {cr,<0.676.0>,#Ref<0.0.0.11309>,<br>
>
>
> {[28614,28613,28596,28595,28578,28577,28560,28559,<br>
> >
> 28542,28541,28524,28523,28506,28505,28488,28487,
> <br>
> 28470,28469,28452,28451,28434,28433,28416,28415,
> <br>
> 28398,28397,28380,28379,28362,28361,28344,28343,<br>
> >
> 28326,28325,28307,28306,28286,28285,28265,28264,
> <br>
> 28244,28243,28223,28222,28202,28201,28181,28180,
> <br>
> 28160,28159,28137,28136,28114,28113,28091,28090,<br>
> >
> 28068,28067,28045,28044,28022,28021,27999,27998,
> <br>
> 27976,27975,27953,27952,27930,27929,27907,27906,
> <br>
> 27884,27883,27861,27860,27838,27837,27815,27814,<br>
> >
> 27792,27791,27769,27768,27746,27745,27723,27722,
> <br>
> 27700,27699,27677,27676,27654,27653,27631,27630,
> <br>
> 27608,27607,27585],<br>
> [27584]},<br>
> >
> 2,<br>
> {[{<0.676.0>,<br>
>
> {consumer,<<"amq.ctag-0cvDGmyaAbEW_j15AfhGHQ">>,<br>
>
>
> true}}],<br>
> >
> [{<0.676.0>,<br>
>
> {consumer,<<"amq.ctag-YqU04SIfQv6P-QVueEjbRg">>,<br>
>
>
> true}}]},<br>
>
> {qstate,<0.675.0>,suspended,{0,nil}},<br>
> >
> 25}},<br>
> {{xtype_to_module,direct},rabbit_exchange_type_direct},
> <br>
> {{credit_to,<7350.833.0>},31},<br>
> {{ch,<0.745.0>},<br>
>
> {cr,<0.745.0>,#Ref<0.0.0.12674>,<br>
> >
> {[28716,28697,28678,28659,28640,28621,28603,28585,<br>
>
>
> 28567,28549,28531,28513,28495,28477,28459,28441,<br>
>
> 28423,28405,28387,28369,28351,28333,28314,28293,<br>
> >
> 28272,28251,28230,28209,28188,28167,28144,28121,
> <br>
> 28098,28075,28052,28029,28006,27983,27960,27937,
> <br>
> 27914,27891,27868,27845,27822,27799,27776,27753,<br>
> >
> 27730,27707,27684,27661,27638,27615],<br>
>
> [27592]},<br>
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.744.0>,active,{0,nil}},<br>
> >
> 37}},<br>
> {{ch,<0.660.0>},<br>
>
> {cr,<0.660.0>,#Ref<0.0.0.10650>,<br>
>
> {[28712,28693,28674,28655,28636,28617,28599,28581,<br>
>
> 28563,28545,28527,28509,28491,28473,28455,28437,<br>
> >
> 28419,28401,28383,28365,28347,28329,28310,28289,
> <br>
> 28268,28247,28226,28205,28184,28163,28140,28117,
> <br>
> 28094,28071,28048,28025,28002,27979,27956,27933,<br>
> >
> 27910,27887,27864,27841,27818,27795,27772,27749,
> <br>
> 27726,27703,27680,27657,27634,27611],<br>
>
> [27588]},<br>
> 1,<br>
> {[],[]},<br>
> >
> {qstate,<0.659.0>,active,{0,nil}},<br>
> 11}},
> <br>
> {{credit_to,<7350.837.0>},32},<br>
> {{credit_to,<7350.881.0>},17},<br>
>
> {{ch,<0.737.0>},<br>
> >
> {cr,<0.737.0>,#Ref<0.0.0.12429>,<br>
>
> {[28612,28611,28594,28593,28576,28575,28558,28557,<br>
>
> 28540,28539,28522,28521,28504,28503,28486,28485,<br>
> >
> 28468,28467,28450,28449,28432,28431,28414,28413,
> <br>
> 28396,28395,28378,28377,28360,28359,28342,28341,
> <br>
> 28324,28323,28305,28304,28284,28283,28263,28262,<br>
> >
> 28242,28241,28221,28220,28200,28199,28179,28178,
> <br>
> 28158,28157,28135,28134,28112,28111,28089,28088,
> <br>
> 28066,28065,28043,28042,28020,28019,27997,27996,<br>
> >
> 27974,27973,27951,27950,27928,27927,27905,27904,
> <br>
> 27882,27881,27859,27858,27836,27835,27813,27812,
> <br>
> 27790,27789,27767,27766,27744,27743,27721,27720,<br>
> >
> 27698,27697,27675,27674,27652,27651,27629,27628,
> <br>
> 27606,27605,27583],<br>
> [27582]},<br>
>
> 2,<br>
> {[{<0.737.0>,<br>
> >
> {consumer,<<"amq.ctag-
> ITZqhulzHhUzx1uy1Eot3g">>,<br>
>
>
> true}}],<br>
> [{<0.737.0>,<br>
>
>
> {consumer,<<"amq.ctag-hpo5ejdoJRdLrT6JS42jGw">>,<br>
> >
> true}}]},<br>
>
>
> {qstate,<0.736.0>,suspended,{0,nil}},<br>
> 11}},<br>
>
> {credit_blocked,[]},
> <br>
> {{credit_to,<7350.854.0>},3},<br>
> >
> {{ch,<0.716.0>},<br>
> {cr,<0.716.0>,#Ref<0.0.0.11820>,<br>
>
>
> {[28610,28609,28592,28591,28574,28573,28556,28555,<br>
>
> 28538,28537,28520,28519,28502,28501,28484,28483,<br>
> >
> 28466,28465,28448,28447,28430,28429,28412,28411,
> <br>
> 28394,28393,28376,28375,28358,28357,28340,28339,
> <br>
> 28321,28320,28300,28299,28279,28278,28258,28257,<br>
> >
> 28237,28236,28216,28215,28195,28194,28174,28173,
> <br>
> 28153,28152,28130,28129,28107,28106,28084,28083,
> <br>
> 28061,28060,28038,28037,28015,28014,27992,27991,<br>
> >
> 27969,27968,27946,27945,27923,27922,27900,27899,
> <br>
> 27877,27876,27854,27853,27831,27830,27808,27807,
> <br>
> 27785,27784,27762,27761,27739,27738,27716,27715,<br>
> >
> 27693,27692,27670,27669,27647,27646,27624,27623,
> <br>
> 27601,27600,27578],<br>
> [27577]},<br>
>
> 2,<br>
> {[{<0.716.0>,<br>
> >
> {consumer,<<"amq.ctag-BTlS-GmbXNpcU-
> wUlMI60Q">>,<br>
>
>
> true}}],<br>
> [{<0.716.0>,<br>
>
> {consumer,<<"amq.ctag-loc5monF8M1pz_Bb8ypumw">>,<br>
> >
> true}}]},<br>
>
>
> {qstate,<0.715.0>,suspended,{0,nil}},<br>
> 44}},<br>
>
> {{ch,<0.782.0>},<br>
> {cr,<0.782.0>,#Ref<0.0.0.15649>,<br>
> >
> {[28714,28695,28676,28657,28638,28619,28601,28583,<br>
>
>
> 28565,28547,28529,28511,28493,28475,28457,28439,<br>
>
> 28421,28403,28385,28367,28349,28331,28312,28291,<br>
> >
> 28270,28249,28228,28207,28186,28165,28142,28119,
> <br>
> 28096,28073,28050,28027,28004,27981,27958,27935,
> <br>
> 27912,27889,27866,27843,27820,27797,27774,27751,<br>
> >
> 27728,27705,27682,27659,27636,27613],<br>
>
> [27590]},<br>
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.781.0>,active,{0,nil}},<br>
> >
> 35}},<br>
> {{credit_to,<7350.846.0>},22},<br>
> {{ch,<0.684.0>},<br>
>
> {cr,<0.684.0>,#Ref<0.0.0.11198>,<br>
>
> {[28711,28692,28673,28654,28635,28616,28598,28580,<br>
> >
> 28562,28544,28526,28508,28490,28472,28454,28436,
> <br>
> 28418,28400,28382,28364,28346,28328,28309,28288,
> <br>
> 28267,28246,28225,28204,28183,28162,28139,28116,<br>
> >
> 28093,28070,28047,28024,28001,27978,27955,27932,
> <br>
> 27909,27886,27863,27840,27817,27794,27771,27748,
> <br>
> 27725,27702,27679,27656,27633,27610],<br>
>
> [27587]},<br>
> >
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.683.0>,active,{0,nil}},<br>
> 11}},<br>
>
> {{ch,<0.692.0>},<br>
> {cr,<0.692.0>,#Ref<0.0.0.11088>,<br>
> >
> {[28710,28691,28672,28653,28634,28615,28597,28579,<br>
>
>
> 28561,28543,28525,28507,28489,28471,28453,28435,<br>
>
> 28417,28399,28381,28363,28345,28327,28308,28287,<br>
> >
> 28266,28245,28224,28203,28182,28161,28138,28115,
> <br>
> 28092,28069,28046,28023,28000,27977,27954,27931,
> <br>
> 27908,27885,27862,27839,27816,27793,27770,27747,<br>
> >
> 27724,27701,27678,27655,27632,27609],<br>
>
> [27586]},<br>
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.691.0>,active,{0,nil}},<br>
> >
> 12}},<br>
> {{ch,<0.648.0>},<br>
>
> {cr,<0.648.0>,#Ref<0.0.0.10283>,<br>
>
> {[28717,28698,28679,28660,28641],[28622]},<br>
> 1,<br>
> >
> {[],[]},<br>
> {qstate,<0.647.0>,active,{0,nil}},
> <br>
> 12}},<br>
> {guid,{{<a href="tel:2296174761" value="+12296174761"
> target="_blank">2296174761</a>,591959305,1562876388,1822892473},1}},<br>
> >
> {{ch,<0.720.0>},<br>
> {cr,<0.720.0>,#Ref<0.0.0.12036>,<br>
>
>
> {[28723,28704,28685,28666,28647],[28628]},<br>
> 1,<br>
> {[],
> []},<br>
> >
> {qstate,<0.719.0>,active,{0,nil}},<br>
> 37}},
> <br>
> {{credit_to,<7350.869.0>},39},<br>
> {{ch,<0.636.0>},<br>
>
> {cr,<0.636.0>,#Ref<0.0.0.9960>,<br>
> >
> {[28705,28686,28667,28648,28629,28604,28586,28568,<br>
>
>
> 28550,28532,28514,28496,28478,28460,28442,28424,<br>
>
> 28406,28388,28370,28352,28334,28315,28294,28273,<br>
> >
> 28252,28231,28210,28189,28168,28147,28124,28101,
> <br>
> 28078,28055,28032,28009,27986,27963,27940,27917,
> <br>
> 27894,27871,27848,27825,27802,27779,27756,27733,<br>
> >
> 27710,27687,27664,27641,27618,27595],<br>
>
> [27572]},<br>
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.635.0>,active,{0,nil}},<br>
> >
> 15}},<br>
> {{credit_to,<7350.850.0>},50},<br>
> {{ch,<0.616.0>},<br>
>
> {cr,<0.616.0>,#Ref<0.0.0.9300>,<br>
>
> {[28719,28700,28681,28662,28643],[28624]},<br>
> >
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.615.0>,active,{0,nil}},<br>
> 12}},<br>
>
> {{ch,<0.509.0>},<br>
> {cr,<0.509.0>,#Ref<0.0.0.2243>,<br>
> >
> {[28302,28281,28260,28239,28218,28197,28176,28155,<br>
>
>
> 28132,28109,28086,28063,28040,28017,27994,27971],<br>
>
> [27580,27603,27626,27649,27672,27695,27718,27741,<br>
> >
> 27764,27787,27810,27833,27856,27879,27902,27925,
> <br>
> 27948]},<br>
> 3,<br>
> {[],[]},<br>
>
>
> {qstate,<0.508.0>,active,{0,nil}},<br>
> >
> 36}},<br>
> {{credit_to,<7350.873.0>},17},<br>
> {{ch,<0.588.0>},<br>
>
> {cr,<0.588.0>,#Ref<0.0.0.7808>,<br>
>
> {[28721,28702,28683,28664,28645],[28626]},<br>
> >
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.587.0>,active,{0,nil}},<br>
> 12}},<br>
>
> {{ch,<0.664.0>},<br>
> {cr,<0.664.0>,#Ref<0.0.0.10772>,<br>
> >
> {[28720,28701,28682,28663,28644],[28625]},<br>
> 1,
> <br>
> {[],[]},<br>
> {qstate,<0.663.0>,active,{0,nil}},
> <br>
> 13}},<br>
> >
> {fhc_age_tree,{0,nil}},<br>
> {{ch,<0.708.0>},<br>
>
> {cr,<0.708.0>,#Ref<0.0.0.11695>,<br>
>
> {[28707,28688,28669,28650,28631,28606,28588,28570,<br>
> >
> 28552,28534,28516,28498,28480,28462,28444,28426,
> <br>
> 28408,28390,28372,28354,28336,28317,28296,28275,
> <br>
> 28254,28233,28212,28191,28170,28149,28126,28103,<br>
> >
> 28080,28057,28034,28011,27988,27965,27942,27919,
> <br>
> 27896,27873,27850,27827,27804,27781,27758,27735,
> <br>
> 27712,27689,27666,27643,27620,27597],<br>
>
> [27574]},<br>
> >
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.707.0>,active,{0,nil}},<br>
> 35}},<br>
>
> {{credit_to,<7350.829.0>},13},<br>
> >
> {{ch,<0.757.0>},<br>
> {cr,<0.757.0>,#Ref<0.0.0.13150>,<br>
>
>
> {[28706,28687,28668,28649,28630,28605,28587,28569,<br>
>
> 28551,28533,28515,28497,28479,28461,28443,28425,<br>
> >
> 28407,28389,28371,28353,28335,28316,28295,28274,
> <br>
> 28253,28232,28211,28190,28169,28148,28125,28102,
> <br>
> 28079,28056,28033,28010,27987,27964,27941,27918,<br>
> >
> 27895,27872,27849,27826,27803,27780,27757,27734,
> <br>
> 27711,27688,27665,27642,27619,27596],<br>
>
> [27573]},<br>
> 1,<br>
> {[],[]},<br>
> >
> {qstate,<0.756.0>,active,{0,nil}},<br>
> 49}},
> <br>
> {{ch,<0.572.0>},<br>
> {cr,<0.572.0>,#Ref<0.0.0.7536>,<br>
>
>
> {[28709,28690,28671,28652,28633,28608,28590,28572,<br>
> >
> 28554,28536,28518,28500,28482,28464,28446,28428,
> <br>
> 28410,28392,28374,28356,28338,28319,28298,28277,
> <br>
> 28256,28235,28214,28193,28172,28151,28128,28105,<br>
> >
> 28082,28059,28036,28013,27990,27967,27944,27921,
> <br>
> 27898,27875,27852,27829,27806,27783,27760,27737,
> <br>
> 27714,27691,27668,27645,27622,27599],<br>
>
> [27576]},<br>
> >
> 1,<br>
> {[],[]},<br>
>
> {qstate,<0.571.0>,active,{0,nil}},<br>
> 35}},<br>
>
> {{credit_to,<7350.825.0>},2},<br>
> {{credit_to,<7350.858.0>},3}]<br>
> > trap_exit: true<br> status: running<br>
> heap_size: 121536<br> stack_size: 27<br>
> reductions: 9363545<br> neighbours:<br><br>=SUPERVISOR REPORT==== 18-
> May-2013::01:41:51 ===<br> Supervisor:
> {local,rabbit_amqqueue_sup}<br>
> > Context: child_terminated<br>
> Reason:
> {{case_clause,{empty,{[],[]}}},<br>
>
> [{rabbit_amqqueue_process,subtract_acks,3,[]},<br>
>
> {rabbit_amqqueue_process,subtract_acks,4,[]},<br>
> >
> {rabbit_amqqueue_process,handle_cast,2,[]},<br>
> {gen_server2,handle_msg,2,[]},
> <br>
> {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,249}]}]}<br>
> Offender: [{pid,<0.301.0>},<br>
> >
> {name,rabbit_amqqueue},<br>
> {mfa,<br>
> {rabbit_amqqueue_process,start_link,<br>
>
>
> [{amqqueue,<br>
>
> {resource,<<"/">>,queue,<br>
> >
>
> <<"document_queue">>},<br>
>
>
> true,false,none,[],<0.7622.0>,[],[],undefined,<br>
>
>
> []}]}},<br>
> {restart_type,temporary},<br>
> >
> {shutdown,4294967295},<br>
> {child_type,worker}]</blockquote>
> > <div><br></div>
> > <div>Notice how the case_clause is empty. The other rabbit log
> looks
> similar:</div>
> > <div><br></div>
> > <blockquote class="gmail_quote"># cat /var/log/rabbitmq/rabbit at ...
> </blockquote>
> > <blockquote class="gmail_quote"> </blockquote>
> > <blockquote class="gmail_quote">
> > =ERROR REPORT==== 18-May-2013::01:41:48 ===</blockquote>
> > <blockquote class="gmail_quote">** Generic server <0.301.0>
> terminating</blockquote>
> > <blockquote class="gmail_quote">** Last message in was {'$gen_cast',
> </blockquote>
> > <blockquote class="gmail_quote">
> >
> {ack,</blockquote>
> > <blockquote class="gmail_quote">
>
> [27585,27584,27607,27630,27653,27676,27699,</blockquote>
> > <blockquote class="gmail_quote">
>
> 27722,27745,27768,27791,27814,27837,27860,</blockquote>
> > <blockquote class="gmail_quote">
>
> 27883,27906,27929,27952,27975,27998,28021,</blockquote>
> > <blockquote class="gmail_quote">
>
> 28044,28067,28090,28113,28136,28159,28180,</blockquote>
> > <blockquote class="gmail_quote">
>
> 28201,28222,28243,28264,28285,28306,28325,</blockquote>
> > <blockquote class="gmail_quote">
>
> 28343,28361,28379,28397,28415,28433,28451,</blockquote>
> > <blockquote class="gmail_quote">
>
> 28469,28487,28505,28523,28541,28559,28577,</blockquote>
> > <blockquote class="gmail_quote">
>
> 28595,28613],</blockquote>
> > <blockquote class="gmail_quote">
>
> <0.676.0>}}</blockquote>
> > <blockquote class="gmail_quote">** When Server state == {q,</blockquote>
> > <blockquote class="gmail_quote">
> >
> {amqqueue,</blockquote>
> > <blockquote class="gmail_quote">
>
> {resource,<<"/">>,queue,</blockquote>
> > <blockquote class="gmail_quote">
>
> <<"document_queue">>},</blockquote>
> > <blockquote class="gmail_quote">
> true,false,none,
> [],<0.301.0>,[],[],undefined,[]},</blockquote>
> > <blockquote class="gmail_quote">
>
> none,true,rabbit_variable_queue,</blockquote>
> > <blockquote class="gmail_quote">
> {vqstate,</blockquote>
> > <blockquote class="gmail_quote">
> >
> {0,{[],[]}},</blockquote>
> > <blockquote class="gmail_quote">
> {0,{[],[]}},</blockquote>
> > <blockquote class="gmail_quote">
>
> {delta,undefined,0,undefined},</blockquote>
> > <blockquote class="gmail_quote">
> {0,{[],[]}},</blockquote>
> > <blockquote class="gmail_quote">
> >
> {0,{[],[]}}, </blockquote>
> > <blockquote class="gmail_quote">
> ... [ thousands of lines
> of
> state ] ... </blockquote>
> > <blockquote class="gmail_quote"> </blockquote>
> > <blockquote class="gmail_quote">
> > ** Reason for termination ==</blockquote>
> > <blockquote class="gmail_quote">** {{case_clause,{empty,{[],[]}}},
> </blockquote>
> > <blockquote class="gmail_quote">
> [{rabbit_amqqueue_process,subtract_acks,3,[]},</blockquote>
> > <blockquote class="gmail_quote">
> {rabbit_amqqueue_process,subtract_acks,4,[]},</blockquote>
> > <blockquote class="gmail_quote">
> {rabbit_amqqueue_process,handle_cast,2,[]},</blockquote>
> > <blockquote class="gmail_quote">
> {gen_server2,handle_msg,2,[]},</blockquote>
> > <blockquote class="gmail_quote">
> > {proc_lib,wake_up,3,[{file,"proc_lib.erl"},
> {line,249}]}]}</blockquote>
> > <blockquote class="gmail_quote">
> > <br>
> > </blockquote>
> > <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
> ===</blockquote>
> > <blockquote class="gmail_quote">connection <0.430.0>, channel 22 -
> soft error:</blockquote>
> > <blockquote class="gmail_quote">{amqp_error,not_found,</blockquote>
> > <blockquote class="gmail_quote">
> > "no queue 'document_queue' in
> vhost '/'",</blockquote>
> > <blockquote class="gmail_quote">
> > 'queue.declare'}</blockquote>
> > <blockquote class="gmail_quote"><br></blockquote>
> > <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
> ===</blockquote>
> > <blockquote class="gmail_quote">
> > connection <0.384.0>, channel 12 - soft error:</blockquote>
> > <blockquote class="gmail_quote">
> > {amqp_error,not_found,</blockquote>
> > <blockquote class="gmail_quote">
> "no queue 'document_queue' in vhost '/'",</blockquote>
> > <blockquote class="gmail_quote">
> 'queue.declare'}</blockquote>
> > <blockquote class="gmail_quote">
> > <br>
> > </blockquote>
> > <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
> ===</blockquote>
> > <blockquote class="gmail_quote">connection <0.384.0>, channel 15 -
> soft error:</blockquote>
> > <blockquote class="gmail_quote">{amqp_error,not_found,</blockquote>
> > <blockquote class="gmail_quote">
> > "no queue 'document_queue' in
> vhost '/'",</blockquote>
> > <blockquote class="gmail_quote">
> > 'queue.declare'}</blockquote>
> > <blockquote class="gmail_quote"><br></blockquote>
> > <blockquote class="gmail_quote">=ERROR REPORT==== 18-May-2013::01:41:51
> ===</blockquote>
> > <blockquote class="gmail_quote">
> > connection <0.373.0>, channel 23 - soft error:</blockquote>
> > <blockquote class="gmail_quote">
> > {amqp_error,not_found,</blockquote>
> > <blockquote class="gmail_quote">
> "no queue 'document_queue' in vhost '/'",</blockquote>
> > <blockquote class="gmail_quote">
> 'queue.declare'} </blockquote>
> > <blockquote class="gmail_quote">
> > ... [ repeated a bunch of times ] ...</blockquote>
> > <div><br></div>
> > <div>I also see this in my /var/log/syslog with regards to beam.smp:
> </div>
> > <div><br></div>
> > <blockquote class="gmail_quote">
> > # cat /var/log/syslog<br>May 17 22:46:23 ip-10-123-123-123 kernel:
> [244800.176041] INFO: task beam.smp:18971 blocked for more than 120
> seconds.
> <br>May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176053] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176060] beam.smp
> D ffff8801bfd93700 0 18971 18962
> 0x00000000<br>May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176065]
> ffff8801b11cdcb8 0000000000000282 0000000000000000
> ffffffffffffffe0<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176070]
> ffff8801b11cdfd8 ffff8801b11cdfd8 ffff8801b11cdfd8
> 0000000000013700<br>May 17 22:46:23 ip-10-123-123-123 kernel:
> [244800.176075] ffff8801b3180000 ffff8801b35244a0 00007f500e2079e0
> ffff8801b146bb80<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176079] Call Trace:
> <br>May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176089]
> [<ffffffff8165434f>] schedule+0x3f/0x60<br>May 17 22:46:23 ip-10-123-
> 123-123 kernel: [244800.176094] [<ffffffff8106aef5>]
> exit_mm+0x85/0x130<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176097]
> [<ffffffff8106b10e>] do_exit+0x16e/0x450<br>May 17 22:46:23
> ip-10-123-
> 123-123 kernel: [244800.176102] [<ffffffff810797aa>] ?
> __dequeue_signal+0x6a/0xb0<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176105]
> [<ffffffff8106b594>] do_group_exit+0x44/0xa0<br>May 17 22:46:23
> ip-10-
> 123-123-123 kernel: [244800.176108] [<ffffffff8107c36c>]
> get_signal_to_deliver+0x21c/0x420<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176113]
> [<ffffffff81014825>] do_signal+0x45/0x130<br>May 17 22:46:23 ip-10-
> 123-123-123 kernel: [244800.176116] [<ffffffff8105539d>] ?
> set_next_entity+0xad/0xd0<br>
> > May 17 22:46:23 ip-10-123-123-123 kernel: [244800.176120]
> [<ffffffff81014ad5>] do_notify_resume+0x65/0x80<br>May 17 22:46:23
> ip-
> 10-123-123-123 kernel: [244800.176124] [<ffffffff8165ea90>]
> int_signal+0x12/0x17<br>... [ repeated 9 more times with same timestamp ]
> ...</blockquote>
> > <div><br></div>
> > <div>There are no strange CPU spikes on my EC2 node based on AWS
> monitoring. The only thing that seems to stand out is that my writes to my
> EBS volume spike during the crash (~90 write OPs/sec, 4 MiB/sec write
> bandwidth, ~30ms/op write latency).</div>
> > <div><br></div>
> > <div>My team and I are completely stumped and would appreciate any
> insight
> you might have to the crash reports. Is this a problem with writes
> hanging when going against the EBS volume since EBS volumes are essentially
> network mounted? Is there a way to configure rabbit to be more
> tolerant of these delays? </div>
> > <div><br></div>
> > <div><br></div>
> > <div>For debugging purposes, below is the status output of one of the
> nodes when it is healthy (so you can see the versions of everything i'm
> running):<br>
> > </div>
> > <div><br></div>
> > <blockquote class="gmail_quote">
> > # rabbitmqctl status<br>Status of node 'rabbit <at> ip-10-123-123-123'
> ...
> <br>[{pid,28978},<br> {running_applications,<br>
> [{rabbitmq_management,"RabbitMQ Management Console","3.1.0"},<br>
> {rabbitmq_web_dispatch,"RabbitMQ Web
> Dispatcher","3.1.0"},<br>
> > {webmachine,"webmachine","1.9.1-rmq3.1.0-
> git52e62bc"},<br> {mochiweb,"MochiMedia Web
> Server","2.3.1-rmq3.1.0-gitd541e9a"},<br>
> {rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.0"},<br>
> > {rabbit,"RabbitMQ","3.1.0"},<br>
> {os_mon,"CPO CXC 138 46","2.2.11"},<br>
> {inets,"INETS CXC 138 49","5.9.4"},<br>
> {mnesia,"MNESIA CXC 138 12","4.8"},<br>
> > {amqp_client,"RabbitMQ AMQP Client","3.1.0"},
> <br> {xmerl,"XML parser","1.3.3"},<br>
> {sasl,"SASL CXC 138 11","2.3.1"},<br>
> {stdlib,"ERTS CXC 138 10","1.19.1"},<br>
> > {kernel,"ERTS CXC 138 10","2.16.1"}]},
> <br> {os,{unix,linux}},<br> {erlang_version,<br>
> "Erlang R16B (erts-5.10.1) [source-05f1189] [64-bit] [smp:8:8]
> [async-
> threads:30] [hipe] [kernel-poll:true]\n"},<br>
> > {memory,<br> [{total,60179048},<br>
> {connection_procs,5274808},<br> {queue_procs,<a
> href="tel:13107264" value="+13107264" target="_blank">13107264</a>},
> <br> {plugins,354272},<br>
> {other_proc,9741830},<br>
> > {mnesia,150456},<br>
> {mgmt_db,12248},<br> {msg_index,34768},<br>
>
> {other_ets,1148248},<br> {binary,4905584},
> <br> {code,19565863},<br>
> {atom,703377},<br> {other_system,5180330}]},<br>
> > {vm_memory_high_watermark,0.8},<br>
> {vm_memory_limit,5828434329},<br> {disk_free_limit,1000000000},
> <br> {disk_free,3953344512},<br> {file_descriptors,<br>
> [{total_limit,924},{total_used,6},{sockets_limit,829},
> {sockets_used,4}]},<br>
> > {processes,[{limit,1048576},{used,348}]},<br> {run_queue,0},
> <br> {uptime,3796}]<br>...done.</blockquote>
> > <div><br></div>
> > <div>Note that I am using Erlang v16 from the Erlang Solutions repo (esl-
> erlang package). I originally was using Erlang v14 from Ubuntu repos
> (erlang-base), but decided to upgrade hoping it would resolve the issue
> (which it did not). I have also tried running RabbitMQ 3.0.4 without
> success. This issue seems to affect both versions.</div>
> > <div><br></div>
> > <div>Thanks,</div>
> > <div>Karl</div>
> > <br>_______________________________________________<br>
> > rabbitmq-discuss mailing list<br><a href="mailto:rabbitmq-
> discuss at ...">rabbitmq-discuss <at> lists.rabbitmq.com</a><br><a
> href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> "
> target="_blank">https://lists.rabbitmq.com/cgi-
> bin/mailman/listinfo/rabbitmq-discuss</a><br><br>
> > </blockquote>
> > </div>
> > <br>
> > </div>
> > </div>
> >
>
> Hi Brett,
>
> That would explain why all hell breaks lose when the acks are being
> subtracted, but I still don't understand what is causing the queue to
> disappear in the first place. The only administrative actions performed on
> the queues in all our code base are declares or passive declares (to check
> for queue existence). We are just using Spring AMQP's
> SimpleMessageListenerContainer (which, btw, the code for that class is
> anything but "Simple") for consumers, and I've scanned through that code
> and
> didn't find any place where it is dropping or destroying a queue or
> binding.
>
> Is there are known bug where rabbit will cause a queue to disappear? All
> my
> queues are durable and my messages are NOT persistent (we need high
> throughput and loosing a few messages is not a big deal). I made the queue
> durable after we first started noticing the crashes so the queue would come
> back up on node restart. But durability doesn't seem to protect it from
> disappearing.
>
> Any thoughts on this would be greatly, greatly appreciated.
>
> -Karl
>
>
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20130520/71043d8f/attachment.htm>
More information about the rabbitmq-discuss
mailing list