[rabbitmq-discuss] RabbitMQ crashed in ets:insert_new - looks like a genuine bug...

Fri Aug 12 04:49:20 BST 2011

...And I also found this error in the log of one of the nodes that didn't crash.

What does this error mean?

=ERROR REPORT==== 11-Aug-2011::19:57:27 ===
connection <0.745.0>, channel 2 - error:
{amqp_error,internal_error,
            "commit failed:
[{<7566.257.0>,{exit,{{noproc,{gen_server2,call,[msg_store_persistent,{client_terminate,<<64,136,153,249,138,79,147,63,147,97,52,189,50,189,255,66>>},infinity]}},{gen_server2,call,[<7566.257.0>,{commit,<<131,20,0,33,243,147,31,205,97,98,116,217,18,139,207,116>>,<0.19127.8>},infinity]}},[{gen_server2,call,3},{rabbit_misc,with_exit_handler,2},{delegate,safe_invoke,2},{delegate,'-safe_invoke/2-lc$^0/1-0-',2},{delegate,handle_call,3},{gen_server2,handle_msg,2},{proc_lib,wake_up,3}]}}]",
            'tx.commit'}

On Thu, Aug 11, 2011 at 8:45 PM, Eugene Kirpichov <ekirpichov at gmail.com> wrote:
> I ran another test like this and looked at the logs a little more
> closely. This time only 1 of the 4 nodes crashed, and some new errors
> appeared.
> I'm attaching a slightly snipped version of the logs (all binaries and
> some overly repetitive entries removed).
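Snipping binaries out of a large log by hand is tedious. Below is a minimal sketch of doing it automatically. It assumes the log prints binaries as Erlang `<<...>>` literals, which may wrap across several lines and never contain a literal `>>` inside a quoted string. Any binary longer than a threshold becomes `<<....>>`, the placeholder used in the excerpts here. Short binaries such as `<<"/">>`, routing keys, or 16-byte message ids are left alone. The script name `snip_binaries.py` and the 200-character threshold are arbitrary choices for illustration, not anything RabbitMQ provides.

```python
import re
import sys

# Erlang binaries are printed as <<...>> literals and may wrap across lines.
# Non-greedy matching with DOTALL finds each one, assuming no literal ">>"
# occurs inside a quoted string binary such as <<"a>>b">>.
BINARY = re.compile(r"<<.*?>>", re.DOTALL)


def snip_binaries(text: str, max_len: int = 200) -> str:
    """Replace each binary literal longer than max_len characters with <<....>>."""

    def repl(match: "re.Match[str]") -> str:
        literal = match.group(0)
        return literal if len(literal) <= max_len else "<<....>>"

    return BINARY.sub(repl, text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python snip_binaries.py rabbit.log > rabbit-snipped.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        sys.stdout.write(snip_binaries(f.read()))
```

Keeping short binaries rather than removing all of them preserves the message ids and queue names, which are usually what you need to correlate entries across nodes.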
>
> So:
> * Besides this failure in ets:insert_new, there is also one in ets:lookup
> * There are supervisor reports in sasl.log about
> reached_max_restart_intensity; they happen after a few similar
> child_terminated reports about rabbit_channels and amqp_queues
> * After that, msg_store_persistent apparently crashes, and everything
> else crashes with it.
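For context on the `reached_max_restart_intensity` reports: an OTP supervisor allows at most MaxR restarts within any MaxT-second window. If more occur, it terminates all its children and then itself, and its own supervisor sees that as one more terminated child. This is one way a burst of crashes low in the tree can propagate upward. Below is a minimal Python model of that sliding-window rule. The class name `RestartIntensity` and the limits (3 restarts in 10 seconds) are hypothetical, not the values any RabbitMQ supervisor actually uses.

```python
from collections import deque


class RestartIntensity:
    """Sliding-window model of an OTP supervisor's restart intensity check:
    more than max_r restarts within max_t seconds makes it give up."""

    def __init__(self, max_r: int, max_t: float) -> None:
        self.max_r = max_r
        self.max_t = max_t
        self.restarts = deque()  # timestamps (seconds) of recent restarts

    def child_terminated(self, now: float) -> bool:
        """Record a child restart at time `now`.

        Returns True if the supervisor may restart the child, or False if the
        limit is exceeded (reached_max_restart_intensity): the supervisor then
        stops its children and itself, and its parent sees a child exit.
        """
        self.restarts.append(now)
        # Drop restarts that fall outside the max_t-second window.
        while now - self.restarts[0] > self.max_t:
            self.restarts.popleft()
        return len(self.restarts) <= self.max_r


if __name__ == "__main__":
    # Hypothetical limits: at most 3 restarts in any 10-second window.
    sup = RestartIntensity(max_r=3, max_t=10)
    print([sup.child_terminated(t) for t in (0.0, 0.1, 0.2, 0.3)])
    # prints [True, True, True, False]
```

Because the window slides, four child crashes within a fraction of a second exhaust the budget, while the same four crashes spread over a minute would not.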
>
> Again, the failed RabbitMQ node started successfully after a manual restart.
>
> (Folks, is this the right place to report this kind of thing? Is it
> OK to attach files of several hundred KB?)
>
> On Thu, Aug 11, 2011 at 6:23 PM, Eugene Kirpichov <ekirpichov at gmail.com> wrote:
>> A lot of clients (a thousand or more) were rapidly publishing 1 KB
>> messages to a queue, and then RabbitMQ crashed.
>>
>> In fact, I had a cluster of 4 rabbits, and 2 of them crashed as a
>> result. The remaining 2 kept working fine.
>>
>> Here's a crash report from rabbit-sasl.log. I'm not posting the full log
>> because it's large, it contains message data (which my employer might not
>> like), and I'm too lazy to snip it automatically.
>> But the log really is full of entries exactly like this one. This
>> exact message is repeated many times within the same second, and then
>> the node finally crashed.
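To put a number on "many times in the same second" without posting the log itself, something like the following could tally report headers by type and timestamp. It is a sketch that assumes headers look like the `=CRASH REPORT==== 11-Aug-2011::17:56:19 ===` lines quoted in this thread, starting at the beginning of a line. The script name `count_reports.py` and the output layout are arbitrary.

```python
import re
import sys
from collections import Counter

# Matches headers such as "=CRASH REPORT==== 11-Aug-2011::17:56:19 ===",
# capturing the report type and the timestamp (to the second).
HEADER = re.compile(r"^=([A-Z ]+?REPORT)==== (\S+) ===", re.MULTILINE)


def count_reports(text: str) -> Counter:
    """Count report headers, keyed by (report type, timestamp)."""
    return Counter(HEADER.findall(text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_reports.py rabbit-sasl.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = count_reports(f.read())
    # Show the ten busiest (type, second) pairs.
    for (kind, stamp), n in counts.most_common(10):
        print(f"{n:6d}  {stamp}  {kind}")
```

Grouping by second rather than by message body keeps the output free of message data.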
>>
>> What other information can I provide to resolve this? Could this be an
>> error on my part rather than Rabbit's? A sudden RabbitMQ crash is not
>> really what I'd like to have in production :-|
>>
>> =CRASH REPORT==== 11-Aug-2011::17:56:19 ===
>>  crasher:
>>    initial call: gen:init_it/6
>>    pid: <0.16624.0>
>>    registered_name: []
>>    exception exit: {badarg,
>>                        [{ets,insert_new,
>>                             [303172,
>>                              {<<223,221,16,201,23,190,196,251,169,11,157,145,
>>                                 94,36,1,105>>,
>>                               {basic_message,
>>                                   {resource,<<"/">>,exchange,<<>>},
>>                                   [<<"results-8808E5FBBC714C9E880F9FD30F443151.TestApp.rmq002">>],
>>                                   {content,60,none,
>>                                       <<....>>, % (snipped)
>>                                       rabbit_framing_amqp_0_9_1,
>>                                       [<<....>>]}, % (snipped too)
>>                                   <<223,221,16,201,23,190,196,251,169,11,157,
>>                                     145,94,36,1,105>>,
>>                                   true},
>>                               1}]},
>>                         {rabbit_msg_store,update_msg_cache,3},
>>                         {rabbit_msg_store,write,3},
>>                         {rabbit_variable_queue,
>>                             '-with_immutable_msg_store_state/3-fun-0-',2},
>>                         {rabbit_variable_queue,with_msg_store_state,3},
>>                         {rabbit_variable_queue,
>>                             with_immutable_msg_store_state,3},
>>                         {rabbit_variable_queue,maybe_write_msg_to_disk,3},
>>                         {rabbit_variable_queue,maybe_write_to_disk,4}]}
>>      in function  gen_server2:terminate/3
>>    ancestors: [rabbit_amqqueue_sup,rabbit_sup,<0.137.0>]
>>    messages: [{'$gen_cast',{ack,none,[46689],<0.16623.0>}},
>>                  {'$gen_cast',{ack,none,[46690],<0.16623.0>}}]
>>    links: [<0.263.0>]
>>    dictionary: [{fhc_age_tree,{0,nil}},
>>                  {{ch,<0.16623.0>},
>>                   {cr,1,<0.16623.0>,<0.16628.0>,#Ref<0.0.0.16807>,
>>                       {set,2,16,16,8,80,48,
>>                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
>>                            {{[],[],[],[],[],[],[],[],
>>                              [46690],
>>                              [],[],[],[],
>>                              [46689],
>>                              [],[]}}},
>>                       false,none,0}},
>>                  {guid,{{7,<0.16624.0>},1}}]
>>    trap_exit: true
>>    status: running
>>    heap_size: 1682835
>>    stack_size: 24
>>    reductions: 1260360700
>>  neighbours:
>>
>>
>> --
>> Eugene Kirpichov
>> Principal Engineer, Mirantis Inc. http://www.mirantis.com/
>> Editor, http://fprog.ru/
>>
>