<div>We try to design our architecture (for a large web application) on the assumption that any part can fail: "design for failure". There are parts of our system that are prime candidates for RabbitMQ, but the concern is that a message must never be lost -- even when RabbitMQ is set up for HA.</div>
<div><br></div><div>So, designing for that rare situation where a message might get lost, my approach has been to maintain state in the application. When I send a message to get some work done, I flag the task as "in process" or "pending" with a start time and a retry counter. I can then (say, with cron) find the uncompleted tasks that have been waiting longer than some threshold.</div>
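<div><br></div><div>To make that concrete, here is a minimal sketch of the bookkeeping I mean. It is not our production code: the table layout, the function names (make_db, mark_pending, mark_done, find_stale), and the use of SQLite are all assumptions for illustration, and the actual publish to RabbitMQ is left out. Times are passed in explicitly so the stale check is easy to reason about.</div>

```python
import sqlite3


def make_db():
    """Create an in-memory task table (hypothetical schema for illustration)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " started_at REAL NOT NULL,"
        " retries INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def mark_pending(db, task_id, now):
    """Record that a message for task_id was sent at time `now`."""
    db.execute(
        "INSERT INTO tasks (id, status, started_at) VALUES (?, 'pending', ?)",
        (task_id, now),
    )


def mark_done(db, task_id):
    """Called when the worker reports that the task has completed."""
    db.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))


def find_stale(db, now, timeout, max_retries):
    """Return ids of pending tasks older than `timeout` seconds
    that have not yet used up their retries (what the cron job would run)."""
    rows = db.execute(
        "SELECT id FROM tasks"
        " WHERE status = 'pending' AND started_at < ? AND retries < ?"
        " ORDER BY id",
        (now - timeout, max_retries),
    )
    return [row[0] for row in rows]
```

<div>Note that find_stale only tells me a task is overdue; it cannot tell me whether its message was lost or is still sitting in the queue.</div>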
<div><br></div><div>But then the problem is what to do with that information. How do I know that the message is really lost and not just backed up in the queue? I don't want to queue it again in that case, as it just compounds the problem (and then if the first job finally completes, the state of the wrong message is updated).</div>
<div><br></div><div>That, and I frankly think the overhead of my state tracking is possibly more problematic than the potential loss of a message.</div><div><br></div><div>Anyway, sorry if this is a mundane (if not a bit off-topic) question -- and I know it's application-specific. But it's a question that comes up often in our design discussions.</div>
<div><br></div><div>Do you have these concerns, and how do you handle the possibility of message or queue loss?</div><div><br></div><div>Thanks,</div><div><br></div>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>