<div>We try to design our architecture (for a large web application) on the assumption that any part can fail: "design for failure". There are parts of our system that are prime candidates for RabbitMQ, but the concern is that a message must never be lost, even when RabbitMQ is set up for HA.</div>
<div><br></div><div>So, designing for the rare situation where a message might get lost, my approach has been to maintain state in the application. When I send a message to get some work done, I flag it as "in process" or "pending" with a start time and a retry counter. I can then (say, with cron) find the uncompleted tasks that have been waiting for some value of "too long".</div>
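<div><br></div><div>A minimal sketch of that kind of state tracking, assuming an in-memory dict stands in for the application's database. The <code>Task</code> fields, <code>send_work</code>, <code>mark_done</code>, <code>find_stale</code>, and the <code>publish</code> placeholder are hypothetical names for illustration, not part of RabbitMQ or any client library:</div>

```python
import uuid
from dataclasses import dataclass


# Hypothetical task record kept in the application's own store (here a dict),
# not in RabbitMQ. Field names are illustrative assumptions.
@dataclass
class Task:
    task_id: str
    status: str = "pending"   # stays "pending" until a worker reports completion
    started_at: float = 0.0   # when the message was published
    retries: int = 0          # re-publish count; a retry policy would increment this


TASKS: dict[str, Task] = {}


def publish(task: Task) -> None:
    """Placeholder for the real RabbitMQ publish call; a no-op in this sketch."""
    pass


def send_work(now: float) -> Task:
    """Record the task as pending with a start time, then publish it."""
    task = Task(task_id=str(uuid.uuid4()), started_at=now)
    TASKS[task.task_id] = task
    publish(task)
    return task


def mark_done(task_id: str) -> None:
    """Called when a worker reports that the job finished."""
    TASKS[task_id].status = "done"


def find_stale(now: float, timeout: float) -> list[Task]:
    """What the cron job would run: pending tasks waiting at least `timeout` seconds."""
    return [
        t for t in TASKS.values()
        if t.status == "pending" and now - t.started_at >= timeout
    ]
```

<div>Passing <code>now</code> in explicitly, rather than reading the clock inside each function, keeps the cron check deterministic and easy to test.</div>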
<div><br></div><div>But then the problem is what to do with that information. How do I know that the message is really lost and not just backed up in the queue? I don't want to queue it again in that case, as it just compounds the problem (and then, if the first job finally completes, the state of the wrong message gets updated).</div>
<div><br></div><div>That, and frankly I think the overhead of my state tracking may be more problematic than the risk of losing a message.</div><div><br></div><div>Anyway, sorry if this is a mundane (if not slightly off-topic) question, and I know it's application-specific. But it's a question that comes up often in our design discussions.</div>
<div><br></div><div>Do you have these concerns and how do you handle the possibility of message or queue loss?</div><div><br></div><div>Thanks,</div><div><br></div>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>