[rabbitmq-discuss] RabbitMQ crashed in ets:insert_new - looks like a genuine bug...

Eugene Kirpichov ekirpichov at gmail.com
Fri Aug 12 16:49:34 BST 2011


Hi,

On Fri, Aug 12, 2011 at 8:47 AM, Matthew Sackman <matthew at rabbitmq.com> wrote:
> Hi Eugene,
>
> On Fri, Aug 12, 2011 at 08:38:18AM -0700, Eugene Kirpichov wrote:
>> Thanks a lot for taking the time for investigation.
>
> No problem. I rather enjoy trying to track down such bugs.
>
>> Do you refer to reproducing the INTERNAL_ERROR bug in tx.commit (which
>> only happened once and didn't cause a node crash), or to the bug that
>> was causing the node crash (and happened on a different node of the
>> cluster)?
>
> Well, a msg_store on a node crashed when one of the queues that was
> using it was deleted. The crash of said msg_store subsequently took out
> all other queues that were using that msg_store. The loss of those
> queues would have caused all in-flight tx.commits to abort with an
> INTERNAL_ERROR.
>
> So I strongly suspect they're all one and the same thing.
The issue is that these errors happened on *different* nodes. Do you
think they should still be part of the same story?

>
>> By the way, I ran the stress test with even more stress on the cluster
>> several times afterwards and wasn't able to cause it to crash again,
>> though before that 2 of 2 tests crashed. So I was "lucky" in a sense.
>
> All the best bugs have this property :D
>
> Matthew



-- 
Eugene Kirpichov
Principal Engineer, Mirantis Inc. http://www.mirantis.com/
Editor, http://fprog.ru/

