[rabbitmq-discuss] queue_disappeared?

Tue Jul 7 20:38:51 BST 2009

> So I was running some tests of my system with Rabbit 1.5.3, and all of
> a sudden the system died because a central queue "disappeared".  The
> error message in the rabbit.log is this:
>
> =ERROR REPORT==== 7-Jul-2009::14:03:53 ===
> connection <0.15624.1> (running), channel 1 - error:
> {amqp,internal_error,
>       "commit failed:
> [{exit,{queue_disappeared,<0.15429.1>}}]",'tx.commit'}

OK, so that was the first thing I saw when reading the log from the
bottom up.  Going from the top down is probably more useful.  The
full(-ish) log of what happened is:

=ERROR REPORT==== 7-Jul-2009::14:03:12 ===
** Generic server <0.15604.1> terminating
** Last message in was {commit,{{1,<0.15636.1>},14828}}
** When Server state == {q,{amqqueue,
                               {resource,<<"/">>,queue,<<"DocSearchEmails">>},
                               true,false,[],none},
                           none,none,true,3036,
                           {[],[]},
                           {[{<0.15729.1>,
                              {consumer,
                                  <<"amq.ctag-ONnWMrwJlReM+3ObdgstOA==">>,
                                  true}}],
                            []}}
** Reason for termination ==
** {timeout,
       {gen_server,call,
           [rabbit_persister,
            {commit_transaction,
                {{{1,<0.15636.1>},14828},
                 {resource,<<"/">>,queue,<<"DocSearchEmails">>}}}]}}

=ERROR REPORT==== 7-Jul-2009::14:03:14 ===
connection <0.6457.2> (running), channel 1 - error:
{{timeout,{gen_server,call,[<0.15508.1>,{basic_get,<0.6461.2>,false}]}},
 [{gen_server,call,2},
  {rabbit_misc,with_exit_handler,2},
  {rabbit_channel,handle_method,3},
  {rabbit_channel,handle_message,2},
  {buffering_proxy,'-mainloop/4-fun-0-',3},
  {lists,foldl,3},
  {buffering_proxy,mainloop,4}]}

=WARNING REPORT==== 7-Jul-2009::14:03:14 ===
Non-AMQP exit reason '{{timeout,
                           {gen_server,call,
                               [<0.15508.1>,{basic_get,<0.6461.2>,false}]}},
                       [{gen_server,call,2},
                        {rabbit_misc,with_exit_handler,2},
                        {rabbit_channel,handle_method,3},
                        {rabbit_channel,handle_message,2},
                        {buffering_proxy,'-mainloop/4-fun-0-',3},
                        {lists,foldl,3},
                        {buffering_proxy,mainloop,4}]}'

=INFO REPORT==== 7-Jul-2009::14:03:14 ===
closing TCP connection <0.6457.2> from 127.0.0.1:55212

=ERROR REPORT==== 7-Jul-2009::14:03:14 ===
** Generic server <0.15508.1> terminating
** Last message in was {commit,{{1,<0.15636.1>},14828}}
** When Server state == {q,{amqqueue,{resource,<<"/">>,queue,<<"EmailCount">>},
                                     true,false,[],none},
                           none,none,false,4306,
                           {[],[]},
                           {[],[]}}
** Reason for termination ==
** {timeout,
       {gen_server,call,
           [rabbit_persister,
            {commit_transaction,
                {{{1,<0.15636.1>},14828},
                 {resource,<<"/">>,queue,<<"EmailCount">>}}}]}}

And then a few more dumps of queues that were _really_ huge.  They
appear to contain every single message that was in each of the queues,
which I guess is what an Erlang server state dump usually shows.  Not
all of my queues died, and neither of my two largest queues died.  The
server was only using about 1GB of RAM when the queues started dying,
so it shouldn't have been an out-of-memory condition.  It might have
been swapping, though.  Could that cause the timeouts that then caused
the cascading process deaths?
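
In case it helps anyone reading along, here is a rough Python sketch
(nothing from Rabbit itself) of how the reports in rabbit.log could be
listed top-down, so the first timeout is easy to spot.  The function
name summarize and both regexes are my own assumptions, based only on
the report format shown above; other Rabbit versions may log
differently.

```python
import re

# Assumed header format, taken from the reports quoted above, e.g.
# "=ERROR REPORT==== 7-Jul-2009::14:03:12 ==="
REPORT_HEADER = re.compile(r"^=(\w+) REPORT==== (\S+) ===$")

# Assumed queue resource term, e.g.
# {resource,<<"/">>,queue,<<"DocSearchEmails">>}
QUEUE_NAME = re.compile(r'\{resource,<<"[^"]*">>,queue,<<"([^"]+)">>\}')


def summarize(log_text):
    """Return (timestamp, level, kind, queue) tuples in log order.

    kind is "timeout" if the report body mentions a timeout, else
    "other"; queue is the first queue name found in the body, or None.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        match = REPORT_HEADER.match(line.strip())
        if match:
            # A new report starts; collect its body lines separately.
            current = {"level": match.group(1),
                       "time": match.group(2),
                       "body": []}
            reports.append(current)
        elif current is not None:
            current["body"].append(line)

    summary = []
    for report in reports:
        body = "\n".join(report["body"])
        queue = QUEUE_NAME.search(body)
        kind = "timeout" if "timeout" in body else "other"
        summary.append((report["time"], report["level"], kind,
                        queue.group(1) if queue else None))
    return summary
```

Something like summarize(open("rabbit.log").read()) (the path is just
a placeholder) would give one tuple per report, oldest first, which
skips the huge state dumps and makes the order of the queue deaths
easier to follow.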