[rabbitmq-discuss] New persister (bug21673) crashed

Matthew Sackman matthew at lshift.net
Sat May 15 21:22:54 BST 2010


On Sun, Apr 18, 2010 at 05:28:51PM +0100, Matthew Sackman wrote:
> On Sun, Apr 18, 2010 at 08:50:14AM -0700, Scott White wrote:
> > I am running the new persister and after running fine for several
> > weeks, it crashed last night. Please see crash log below. I'm not sure
> > how to interpret this, any ideas what the problem might be?
> 
> Yes, it recovered more off disk than it was expecting to. I have no idea
> how that could happen...[snipped]

I do now. During QA of the new persister this afternoon, Matthias
spotted a simple accounting error in the queue index module. See the
commit comment for the gory details (3e0e01f3591e), but suffice it to
say, this could cause the crash that you reported.

For this to happen, Rabbit would have had to die suddenly (either by
crashing or by being killed) while flushing the queue index journal (an
internal operation that happens from time to time), on a durable queue
with persistent messages.
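
To make that failure window concrete, here is a toy Python model of the
general class of problem. It is emphatically not the Rabbit code (which
is Erlang), and not necessarily the exact bug fixed in the commit above:
the class, its fields, and the two-step flush are all assumptions made
up for illustration. It only shows how dying partway through a journal
flush can leave recovery counting more entries than were ever published.

```python
# Toy model only: every name and the "on-disk" layout here are hypothetical
# and are NOT taken from RabbitMQ's queue index module.

class ToyQueueIndex:
    def __init__(self):
        self.journal = []        # stands in for the on-disk journal
        self.segment = []        # stands in for an on-disk segment file
        self.expected_count = 0  # how many entries the queue believes exist

    def publish(self, msg_id):
        """Record a persistent message in the journal."""
        self.journal.append(msg_id)
        self.expected_count += 1

    def flush(self, die_midway=False):
        """Move journal entries into the segment in two separate steps."""
        # Step 1: copy the journal entries into the segment.
        self.segment.extend(self.journal)
        if die_midway:
            # Simulated sudden death: the journal is never truncated.
            return
        # Step 2: truncate the journal.
        self.journal.clear()

    def recover_naive(self):
        """Count segment and journal entries without checking for overlap."""
        return len(self.segment) + len(self.journal)

    def recover_dedup(self):
        """Count each message id once, even if it appears in both places."""
        return len(set(self.segment) | set(self.journal))


idx = ToyQueueIndex()
for msg_id in (1, 2, 3):
    idx.publish(msg_id)

idx.flush(die_midway=True)       # broker dies between step 1 and step 2

naive = idx.recover_naive()      # counts the same three entries twice
deduped = idx.recover_dedup()    # counts each entry once
print(idx.expected_count, naive, deduped)
```

In this toy, the naive recovery finds more on disk than the queue
expected, while the deduplicating recovery agrees with the expected
count.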

I have to say, I think this is an excellent justification for the
extensive QA process that the new persister has been going through. Of the
~7000 new lines of code that make up the new persister, we're down to
just two modules left to QA, with about 2500 lines in total to go. There
is most certainly light at the end of the tunnel. I know many of you
have been working off the new persister branch for a long time now, and
yet more of you are just waiting for us to release it. Much progress has
been made in the last month, and we really are getting there now.
Hopefully not too much longer!

Matthew


