[rabbitmq-discuss] RabbitMQ crashed in ets:insert_new - looks like a genuine bug...

Eugene Kirpichov ekirpichov at gmail.com
Fri Aug 12 04:45:55 BST 2011


I ran another test like this and looked a little more closely at the
logs. This time only 1 of 4 nodes crashed, and some new errors
appeared.
I'm attaching a slightly trimmed version of the logs (all binaries and
some overly repetitive content removed).

So:
* Besides this failure in ets:insert_new, there is also one in ets:lookup.
* There are supervisor reports in sasl.log about
reached_max_restart_intensity; they come after a few similar
child_terminated reports about rabbit_channels and amqp_queues.
* After these, msg_store_persistent apparently crashes, and
everything else goes down with it.

Again, the failed rabbitmq node started successfully after a manual restart.

(Folks, is this the right place to report this kind of thing? Is it
OK to attach files of several hundred KB?)

On Thu, Aug 11, 2011 at 6:23 PM, Eugene Kirpichov <ekirpichov at gmail.com> wrote:
> A lot of clients (a thousand or more) were rapidly publishing 1 KB
> messages to a queue, and then RabbitMQ crashed.
>
> In fact I had a cluster of 4 rabbits, and 2 of them crashed as a
> result. The remaining 2 continued working ok.
>
> Here's a crash report from rabbit-sasl.log. I'm not posting the full log
> because it's large, contains message data (which my employer might not
> like), and I'm too lazy to write something to trim it automatically.
> But the log really is full of entries exactly like the one below. This
> exact message is repeated many times within the same second, and then the
> node finally crashed.
>
> What other information can I provide to help resolve this? Could this be an
> error on my part rather than Rabbit's? A sudden RabbitMQ crash is not
> really something I'd like to have in production :-|
>
> =CRASH REPORT==== 11-Aug-2011::17:56:19 ===
>  crasher:
>    initial call: gen:init_it/6
>    pid: <0.16624.0>
>    registered_name: []
>    exception exit: {badarg,
>                        [{ets,insert_new,
>                             [303172,
>                              {<<223,221,16,201,23,190,196,251,169,11,157,145,
>                                 94,36,1,105>>,
>                               {basic_message,
>                                   {resource,<<"/">>,exchange,<<>>},
>                                   [<<"results-8808E5FBBC714C9E880F9FD30F443151.TestApp.rmq002">>],
>                                   {content,60,none,
>                                       <<....>>, % (snipped)
>                                       rabbit_framing_amqp_0_9_1,
>                                       [<<....>>]}, % (snipped too)
>                                   <<223,221,16,201,23,190,196,251,169,11,157,
>                                     145,94,36,1,105>>,
>                                   true},
>                               1}]},
>                         {rabbit_msg_store,update_msg_cache,3},
>                         {rabbit_msg_store,write,3},
>                         {rabbit_variable_queue,
>                             '-with_immutable_msg_store_state/3-fun-0-',2},
>                         {rabbit_variable_queue,with_msg_store_state,3},
>                         {rabbit_variable_queue,
>                             with_immutable_msg_store_state,3},
>                         {rabbit_variable_queue,maybe_write_msg_to_disk,3},
>                         {rabbit_variable_queue,maybe_write_to_disk,4}]}
>      in function  gen_server2:terminate/3
>    ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.137.0>]
>    messages: [{'$gen_cast',{ack,none,[46689],<0.16623.0>}},
>                  {'$gen_cast',{ack,none,[46690],<0.16623.0>}}]
>    links: [<0.263.0>]
>    dictionary: [{fhc_age_tree,{0,nil}},
>                  {{ch,<0.16623.0>},
>                   {cr,1,<0.16623.0>,<0.16628.0>,#Ref<0.0.0.16807>,
>                       {set,2,16,16,8,80,48,
>                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>                            {{[],[],[],[],[],[],[],[],
>                              [46690],
>                              [],[],[],[],
>                              [46689],
>                              [],[]}}},
>                       false,none,0}},
>                  {guid,{{7,<0.16624.0>},1}}]
>    trap_exit: true
>    status: running
>    heap_size: 1682835
>    stack_size: 24
>    reductions: 1260360700
>  neighbours:
>
>
> --
> Eugene Kirpichov
> Principal Engineer, Mirantis Inc. http://www.mirantis.com/
> Editor, http://fprog.ru/
>



-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit at RACK2UNIT004-sasl.anonymized.log
Type: application/octet-stream
Size: 185260 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20110811/14c53dc4/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rabbit at RACK2UNIT004.anonymized.log
Type: application/octet-stream
Size: 457193 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20110811/14c53dc4/attachment-0003.obj>

