[rabbitmq-discuss] Application architecture question: queue failure

Wed Jun 6 14:05:33 BST 2012

We try to design our architecture (for a large web application) so that we
expect any part to fail: "design for failure". There are parts of our
system that are prime candidates for RabbitMQ, but the concern is that a
message must never be lost -- even when RabbitMQ is set up for HA.

So, designing for the rare situation where a message might get lost, my
approach has been to maintain state in the application.  When I send a
message to get some work done I flag it as "in process" or "pending" with a
start time and a retry counter.  I can then (say with cron) find the
uncompleted tasks that have been waiting for some value of too long.
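
To make that bookkeeping concrete, here is a minimal sketch in Python. It
assumes an in-memory dict stands in for the application's database, and the
names (Task, TaskTracker, mark_sent, mark_done, find_stale) are hypothetical,
not part of RabbitMQ or any client library. The find_stale call is roughly
what the cron job would run.

```python
import time
from dataclasses import dataclass


@dataclass
class Task:
    """Application-side record for one message sent to the queue."""
    task_id: str
    status: str = "pending"   # "pending" until the worker reports back
    started_at: float = 0.0   # when the message was (last) sent
    retries: int = 0          # how many times it has been re-sent


class TaskTracker:
    """Hypothetical tracker; a real one would persist to a database."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.tasks = {}

    def mark_sent(self, task_id, now=None):
        """Record that a message was published (or re-published)."""
        now = time.time() if now is None else now
        task = self.tasks.get(task_id)
        if task is None:
            task = Task(task_id=task_id, started_at=now)
            self.tasks[task_id] = task
        else:
            # Re-send: reset the clock and bump the retry counter.
            task.status = "pending"
            task.started_at = now
            task.retries += 1
        return task

    def mark_done(self, task_id):
        """Record that the worker finished the task."""
        self.tasks[task_id].status = "done"

    def find_stale(self, now=None):
        """Return pending tasks that have waited longer than the timeout."""
        now = time.time() if now is None else now
        return [
            task for task in self.tasks.values()
            if task.status == "pending"
            and now - task.started_at > self.timeout_seconds
        ]
```

Note that a task returned by find_stale is only "late", not necessarily
"lost": nothing in this record says whether the message is still sitting in
the queue.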

But then the problem is what to do with that information.  How do I know
that the message is really lost and not just backed up in the queue?  I
don't want to queue it again in that case, as it just compounds the problem
(and then, if the first job finally completes, the state of the wrong
message is updated).

That, and I frankly think the overhead of my state tracking is possibly
more problematic than the potential for a loss of a message.

Anyway, sorry if this is a mundane (and perhaps a bit off-topic) question --
and I know it's application-specific.  But it's a question that comes up
often in our design discussions.

Do you have these concerns, and if so, how do you handle the possibility of
message or queue loss?

Thanks,

-- 
Bill Moseley
moseley at hank.org