[rabbitmq-discuss] RabbitMQ crashed in ets:insert_new - looks like a genuine bug...

Matthew Sackman matthew at rabbitmq.com
Fri Aug 12 11:26:51 BST 2011


Hi,

On Fri, Aug 12, 2011 at 09:06:48AM +0100, Matthew Sackman wrote:
> Which version of Rabbit is this and which version of Erlang, and on what
> platform?

Well I've figured out it's some form of Windows given the c:\ paths...

> >     exception exit: {badarg,
> >                         [{ets,insert_new,
> >                              [303172,
> 
> What that means is that there's been an attempt to insert into a
> non-existent table. This normally suggests the msg_store has exited for
> other reasons and there are still queues trying to write to the
> msg_store's tables. This could indicate other bugs, but this itself is
> unlikely to be the root cause.
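
To illustrate the failure mode described above, here is a minimal sketch, not taken from the Rabbit source. The module, table and key names are hypothetical. It shows a public ETS table vanishing when its owning process exits, after which ets:insert_new from another process fails with badarg:

```erlang
-module(ets_owner_demo).
-export([run/0]).

%% Hypothetical demo; none of these names come from rabbit_msg_store.
run() ->
    Parent = self(),
    %% The owner creates a public table, reports its id, then waits to be stopped.
    Owner = spawn(fun() ->
        OwnedTab = ets:new(demo_table, [set, public]),
        Parent ! {table, OwnedTab},
        receive stop -> ok end
    end),
    Tab = receive {table, T} -> T end,
    %% While the owner is alive, any process may insert into the public table.
    true = ets:insert_new(Tab, {key1, value1}),
    %% Stop the owner; the table is destroyed along with it.
    Owner ! stop,
    wait_until_gone(Tab),
    %% The same call now raises badarg because the table no longer exists.
    try ets:insert_new(Tab, {key2, value2}) of
        Result -> {unexpected, Result}
    catch
        error:badarg -> table_gone
    end.

%% Poll until the table has been removed (ets:info/1 returns undefined).
wait_until_gone(Tab) ->
    case ets:info(Tab) of
        undefined -> ok;
        _ -> timer:sleep(10), wait_until_gone(Tab)
    end.
```

Calling ets_owner_demo:run() should return table_gone. The point is only that the badarg is a symptom: the interesting question is why the owner went away.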

Indeed, and having tracked through your logs, I've found what caused the
msg_store to exit (when the msg_store exits, its tables, which are
publicly used, die with it, which then causes all the queues to die
when they next try to use said tables):

=ERROR REPORT==== 11-Aug-2011::19:57:26 ===
** Generic server msg_store_persistent terminating
** Last message in was {'$gen_cast',
                           {sync,
                               [<<....>>],
                               #Fun<rabbit_variable_queue.11.110161290>}}
** When Server state == {msstate,
                            "c:/...../RabbitMQ/db/rabbit@RACK2UNIT004-mnesia/msg_store_persistent",
                            rabbit_msg_store_ets_index,
                            {state,348229,
                                "c:/...../RabbitMQ/db/rabbit@RACK2UNIT004-mnesia/msg_store_persistent"},
                            503,#Ref<0.0.2.67161>,
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            [...SNIPS...]
** Reason for termination ==
** {{badmatch,not_found},
    [{rabbit_msg_store,'-handle_cast/2-fun-2-',4},
     {lists,any,2},
     {rabbit_msg_store,handle_cast,2},
     {gen_server2,handle_msg,2},
     {proc_lib,wake_up,3}]}

There is only one use of lists:any in the msg_store, and thus it's
fairly easy to work out what's happened: somehow, there's been an
attempt to sync to disk a msg that turns out not to exist. This is,
unsurprisingly, an unexpected situation, hence the dramatic crash.
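
As a rough sketch of how a {badmatch,not_found} can arise inside a fun passed to lists:any, consider the following. It is not the msg_store code: the lookup function, the index contents and the {location, File} shape are all assumptions made up for illustration. The fun pattern-matches on the lookup result, so an id that is missing from the index crashes the match instead of returning false:

```erlang
-module(badmatch_demo).
-export([run/0]).

%% Hypothetical lookup: {location, File} for known ids, not_found otherwise.
lookup(Id, Index) ->
    case lists:keyfind(Id, 1, Index) of
        {Id, File} -> {location, File};
        false      -> not_found
    end.

run() ->
    Index = [{msg_a, 1}, {msg_b, 2}],
    CurFile = 2,
    %% The match below assumes every id is present in the index.
    InCurrentFile = fun(Id) ->
        {location, File} = lookup(Id, Index),
        File =:= CurFile
    end,
    %% All ids known: lists:any simply returns a boolean.
    true = lists:any(InCurrentFile, [msg_a, msg_b]),
    %% One unknown id: the pattern match fails with {badmatch, not_found}.
    try lists:any(InCurrentFile, [msg_a, msg_missing]) of
        Result -> {unexpected, Result}
    catch
        error:{badmatch, not_found} -> crashed_on_missing_id
    end.
```

In a gen_server nothing catches that error, so the whole server terminates, which matches the shape of the report above.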

Could you provide details of:

1. Which versions of Windows, Erlang and Rabbit you are using.
2. What your clients are doing when this occurs. It sounds like you're
doing stress tests: what exactly are these clients doing? If we can
reproduce this ourselves, it makes it _vastly_ easier to track it down
and fix it.
3. The above error entry was in your sasl log. If you could find the
corresponding entry in there in the non-anonymized log and send it to us
offlist, it may well have additional information that we would find
useful - if that's possible.

Best wishes,

Matthew
