<div>We try to design our architecture (for a large web application) on the assumption that any part can fail: &quot;design for failure&quot;. There are parts of our system that are prime candidates for RabbitMQ, but the concern is that a message must never be lost -- even when RabbitMQ is set up for HA.</div>


<div><br></div><div>So, designing for the rare situation where a message might get lost, my approach has been to maintain state in the application. When I send a message to get some work done, I flag the task as &quot;in process&quot; or &quot;pending&quot;, with a start time and a retry counter. I can then (say, with cron) find the uncompleted tasks that have been waiting for some value of too long.</div>
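<div><br></div><div>To make the bookkeeping above concrete, here is a minimal sketch of it. Everything in it is hypothetical: the names <code>Task</code> and <code>TaskTracker</code>, the in-memory dict standing in for a real database table, and the assumption that a periodic job (cron, for instance) calls <code>find_stale</code>. It does not talk to RabbitMQ at all.</div>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Task:
    task_id: str
    started_at: datetime
    status: str = "pending"  # "pending" until the worker reports back, then "done"
    retries: int = 0


class TaskTracker:
    """Hypothetical in-memory stand-in for the application's state table."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def mark_sent(self, task_id: str, now: datetime) -> None:
        # Record the task as pending at the moment the message is published.
        self.tasks[task_id] = Task(task_id=task_id, started_at=now)

    def mark_done(self, task_id: str) -> None:
        # Called when the worker reports that the job finished.
        self.tasks[task_id].status = "done"

    def record_retry(self, task_id: str, now: datetime) -> None:
        # Bump the retry counter and restart the clock for this task.
        task = self.tasks[task_id]
        task.retries += 1
        task.started_at = now

    def find_stale(self, now: datetime, max_age: timedelta) -> List[Task]:
        # Pending tasks that have been waiting longer than max_age.
        return [
            t for t in self.tasks.values()
            if t.status == "pending" and now - t.started_at > max_age
        ]
```

<div>The periodic job would call something like <code>tracker.find_stale(datetime.now(), timedelta(minutes=10))</code>; the ten-minute threshold is only a placeholder for &quot;some value of too long&quot;.</div>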


<div><br></div><div>But then the problem is what to do with that information. How do I know that the message is really lost and not just backed up in the queue? If it is only backed up, I don&#39;t want to queue it again, as that just compounds the problem (and then, if the first job finally completes, the state of the wrong message is updated).</div>
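<div><br></div><div>The sketch below does not answer whether a message is really lost. It only illustrates one way the &quot;wrong message updates the state&quot; problem might be contained, assuming the work itself is safe to run twice: each publish carries its own attempt id, and only the first completion for a task is accepted. <code>AttemptLedger</code> and its methods are hypothetical names, and the dicts stand in for whatever store the application uses.</div>

```python
import uuid
from typing import Dict, Set


class AttemptLedger:
    """Hypothetical record of which publish attempt completed each task."""

    def __init__(self) -> None:
        self.attempts: Dict[str, Set[str]] = {}
        self.completed_by: Dict[str, str] = {}

    def new_attempt(self, task_id: str) -> str:
        # Generate an id to embed in the message for this (re)publish.
        attempt_id = uuid.uuid4().hex
        self.attempts.setdefault(task_id, set()).add(attempt_id)
        return attempt_id

    def complete(self, task_id: str, attempt_id: str) -> bool:
        # Reject completions for attempts that were never issued.
        if attempt_id not in self.attempts.get(task_id, set()):
            return False
        # First completion wins; a late duplicate is ignored.
        if task_id in self.completed_by:
            return False
        self.completed_by[task_id] = attempt_id
        return True
```

<div>With this, a re-queued copy and the delayed original can both eventually run, but only one of them changes the task&#39;s state, and the ledger shows which attempt it was.</div>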


<div><br></div><div>That, and I frankly think the overhead of my state tracking may be more of a problem than the potential loss of a message.</div><div><br></div><div>Anyway, sorry if this is a mundane (if not a bit off-topic) question -- and I know it&#39;s application-specific. But it&#39;s a question that comes up often in our design discussions.</div>


<div><br></div><div>Do you have these concerns, and if so, how do you handle the possibility of message or queue loss?</div><div><br></div><div>Thanks,</div><div><br></div>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>