[rabbitmq-discuss] Silent crash causes persistent durable message loss

Will Koffel will at thumb.it
Mon May 21 20:22:34 BST 2012


How's that for a useless mysterious title?  A bit more on what we're seeing:

I'm running RabbitMQ 2.8.1, and I have one queue in our setup with a long per-queue message TTL ("expiring-queue", we'll call it), which uses a dead-letter exchange to reroute expired messages to another queue ("action-queue").  The TTL for these expiring messages is 7 days.  So it looks like this:

[ec2-user@web03 current]$ sudo rabbitmqctl list_queues name messages arguments durable
expiring-queue	311443	[{"x-dead-letter-exchange","my-exchange"},{"x-message-ttl",604800000},{"x-dead-letter-routing-key","action-queue"}]	true
action-queue	0	[]	true
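For anyone wanting to reproduce the setup, here is a minimal sketch of how the two queues in the listing above could be declared. It assumes a pika-style channel object exposing `queue_declare(queue=..., durable=..., arguments=...)`; the helper names `expiring_queue_arguments` and `declare_queues` are made up for illustration, and declaring "my-exchange" and binding "action-queue" to it is left out.

```python
# 7 days in milliseconds, matching the x-message-ttl value in the listing.
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def expiring_queue_arguments(ttl_ms=SEVEN_DAYS_MS,
                             dead_letter_exchange="my-exchange",
                             dead_letter_routing_key="action-queue"):
    """Build the queue arguments shown by `rabbitmqctl list_queues`."""
    return {
        "x-message-ttl": ttl_ms,
        "x-dead-letter-exchange": dead_letter_exchange,
        "x-dead-letter-routing-key": dead_letter_routing_key,
    }


def declare_queues(channel):
    """Declare both queues as durable.

    `channel` is assumed to behave like a pika channel; any object with a
    compatible queue_declare() method will do.
    """
    channel.queue_declare(queue="action-queue", durable=True)
    channel.queue_declare(queue="expiring-queue", durable=True,
                          arguments=expiring_queue_arguments())
```

With a real broker, `declare_queues` would be called once on an open channel before any publishing starts.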

(BTW, what I'm doing is using the TTL as a way to keep track of an event that expires after one week.  Namely, we keep a count of a particular event for the last 7 days.  So each time the event happens we write to the action-queue, which increments the value, and to the expiring-queue.  7 days later, that message gets expired into the action-queue again, and we decrement the counter.  So we have a real-time, running 7-day counter.)
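The counter scheme can be sketched without a broker at all. The following is an in-memory stand-in, not the real consumer code: `RollingCounter` and its method names are hypothetical, a deque plays the role of the expiring-queue, and timestamps are plain seconds passed in by the caller.

```python
from collections import deque

WINDOW_SECONDS = 7 * 24 * 60 * 60  # the 7-day TTL, in seconds


class RollingCounter:
    """Counts events seen within the last `window` seconds."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.count = 0
        # Stand-in for expiring-queue: expiry times in FIFO order.
        self._expiring = deque()

    def record_event(self, now):
        """An event happened: increment, and remember when it expires."""
        self.count += 1
        self._expiring.append(now + self.window)

    def expire_due(self, now):
        """Decrement once for every event whose TTL has elapsed by `now`."""
        while self._expiring and self._expiring[0] <= now:
            self._expiring.popleft()
            self.count -= 1
        return self.count
```

If the contents of `_expiring` are lost, `count` can never be decremented for those events, which is the failure described here: the expiring messages are the only record of when each event should drop out of the window.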

For the most part this is stable.  Except when it's not.  We've seen 3 or 4 crashes of this system since we set it up 3 weeks ago.  I can't find any information in the logs to tell me why Rabbit crashed; it just dies silently.  But more disturbing is that when I bring it back up, all the messages (typically millions) in the expiring-queue are gone.  That's death for me, because that's the only record of when those things are supposed to expire.

Any leads on where to look for more crash reports or evidence of what's happening here?  And importantly, in what case would these messages be lost?  (Seems like that should never happen!)  The queue is durable, and I'm using deliveryMode=2 for the messages.  I'm pretty sure that persistence works in general, because I can stop Rabbit and restart it and all the messages are still there; they are only lost in the case of this odd silent crash.  I've also tried doing evil things like kill -9 on various server processes, and haven't been able to reproduce the message loss in any controlled environment.

I'm LOVING that this could work in Rabbit with the 2.8 updates, and I'm hoping not to have to move to another queueing system where I'd have to build all the dead-letter-routing stuff myself, when this is so close to a clean solution.

Thanks in advance for any thoughts.

-Will




________________
Will Koffel
CTO, Thumb™
51 E 12th St., 4th Floor
New York, NY 10003
Office: (212) 673-8650
Mobile: (617) 575-WILL
@thumb
www.thumb.it





