On Wed, Jun 13, 2012 at 3:53 AM, Tim Watson <span dir="ltr">&lt;<a href="mailto:tim@rabbitmq.com" target="_blank">tim@rabbitmq.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>Anyway, sorry if this is a mundane (if not a bit off-topic) question --,<div class="im"><br>
and I know it's application-specific. But, it's a question that comes up<br>
often in our design discussions.<br>
<br>
Do you have these concerns and how do you handle the possibility of<br>
message or queue loss?<br>
<br>
</div></blockquote>
<br>
Well, rabbit fights like a cornered... rabbit... to make sure this doesn't happen! ;)<br>
<br>
Please feel free to elaborate on your concerns and questions, as that's what the list is for! I'd certainly like to understand a bit more about how your application works, what constitutes a job and how these are identified throughout the system. I often find the issue of identity can be particularly vexing in any non-trivial architecture.<br>
</blockquote></div><div><br></div><div>Let me give you an example -- which is an actual workflow we have.</div><div><br></div><div>In our web app a user can select to receive a report. In the web app we want the user to see that the report is indeed queued, so in the database we set a flag saying that the job was sent, and when. This allows us to display "pending" so the user doesn't submit the request multiple times.</div>
<div><br></div><div>The web app queues the message for the background report generation. Anything is possible -- so imagine first that the message is somehow lost. The web app is still showing "pending" to the user.</div>
<div><br></div><div>But we do want the task to complete -- it's a revenue generator, for example. So, one option is to use cron to look for stale "pending" requests on the web side, assume the message was lost, and re-queue it. But after X attempts, maybe the cron job decides to give up.</div>
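<div><br></div><div>A minimal sketch of that cron sweep, assuming the state lives in a SQLite table. The table name "reports", its columns, the two constants, and the "requeue" callback (standing in for whatever publishes to RabbitMQ) are all hypothetical, not our actual schema:</div>

```python
import sqlite3

# Hypothetical schema: one row per report request shown in the web app.
SCHEMA = """
CREATE TABLE reports (
    id INTEGER PRIMARY KEY,
    state TEXT NOT NULL,
    queued_at INTEGER NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 0
)
"""

MAX_ATTEMPTS = 3      # assumed value for "X attempts"
STALE_SECONDS = 3600  # assumed age at which a pending request counts as stale


def sweep_stale(conn, now, requeue):
    """Re-queue stale pending reports; give up once attempts are used up."""
    cutoff = now - STALE_SECONDS
    rows = conn.execute(
        "SELECT id, attempts FROM reports "
        "WHERE state = 'pending' AND ? >= queued_at ORDER BY id",
        (cutoff,),
    ).fetchall()
    requeued, abandoned = [], []
    for report_id, attempts in rows:
        if attempts >= MAX_ATTEMPTS:
            # Too many tries: stop re-queueing and flag the request.
            conn.execute(
                "UPDATE reports SET state = 'failed' WHERE id = ?",
                (report_id,),
            )
            abandoned.append(report_id)
        else:
            # Assume the message was lost: publish again and restart the clock.
            requeue(report_id)
            conn.execute(
                "UPDATE reports SET attempts = attempts + 1, queued_at = ? "
                "WHERE id = ?",
                (now, report_id),
            )
            requeued.append(report_id)
    conn.commit()
    return requeued, abandoned
```

<div>Cron would call sweep_stale(conn, int(time.time()), publish_report_job), where publish_report_job is whatever function sends the message. Note that this can re-queue a job whose original message is still sitting in the queue, so the workers have to cope with duplicates.</div>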
<div><br></div><div>Now, this report generation actually uses a third-party web service, and this web service has gone down for extended periods for maintenance. So, in this case the report request jobs stack up in the queue.</div>
<div><br></div><div>So, if it's down long enough then cron might run again and re-queue the same job that is already in the queue. What I have done for this is atomically change the state from "pending" to "in process" so that only one message gets processed. But, using some kind of UUID and a store is another option, of course.</div>
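<div><br></div><div>Roughly, the atomic state change looks like the sketch below. It assumes a SQLite table named "reports" with "id" and "state" columns; those names and the "claim" function are made up for illustration. The point is that the state check and the state change happen in a single UPDATE, so only one worker can win:</div>

```python
import sqlite3


def claim(conn, report_id):
    """Atomically move a report from 'pending' to 'in process'.

    Returns True only for the caller whose UPDATE changed the row, so a
    duplicate message for the same report can be detected and dropped.
    """
    cur = conn.execute(
        "UPDATE reports SET state = 'in process' "
        "WHERE id = ? AND state = 'pending'",
        (report_id,),
    )
    conn.commit()
    # rowcount is 1 if this call made the transition, 0 if someone else did.
    return cur.rowcount == 1
```

<div>A worker would call claim() as soon as it receives a message; if it returns False, the worker just acks the duplicate and moves on without generating the report again.</div>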
<div><br></div><div>Maybe you are right that durable queues are the correct solution for this -- I still need to track state on the web app side to show "pending" or "in process". And maybe just use cron to report/clean up any stale pending job on the web app side.</div>
<div><br></div><div>I'm just curious whether the above is a common design pattern when using RabbitMQ in this way. Obviously, it depends on the specifics of the task, but we seem to have quite a few situations like this.</div>
<div><br></div><div>Oh, and with this third-party web service example, another problem is knowing whether a failure of the service is permanent or temporary. I have not done this, but I'm tempted to have my workers pull jobs off the queue and, if a job fails for an unclear reason, ack the original job, send it to a "try again later" queue, and have separate workers handle those.</div>
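<div><br></div><div>An untested sketch of that "try again later" idea, kept independent of any client library: "publish" and "ack" are placeholders for the real broker calls, and "PermanentFailure", "RETRY_QUEUE" and "handle" are hypothetical names. One deliberate difference from what I described: the sketch publishes to the retry queue before acking, so a crash between the two steps would duplicate the job rather than lose it.</div>

```python
class PermanentFailure(Exception):
    """Hypothetical marker for errors that retrying will not fix."""


RETRY_QUEUE = "reports.retry"  # hypothetical name for the "try again later" queue


def handle(job, run_report, publish, ack):
    """Process one job and ack it exactly once, whatever the outcome."""
    try:
        run_report(job)
        outcome = "done"
    except PermanentFailure:
        # Known-bad job: do not retry, just record the failure.
        outcome = "failed"
    except Exception:
        # Unclear failure: hand the job to the retry queue before acking,
        # so it is never missing from both queues at the same time.
        publish(RETRY_QUEUE, job)
        outcome = "retry"
    ack(job)
    return outcome
```

<div>The separate workers on the retry queue could run the same handler on a slower schedule, which keeps the main queue from backing up while the third-party service is down.</div>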
<div><br></div><div><br></div><div>Thanks,</div><br clear="all"><div><br></div>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>