[rabbitmq-discuss] Application architecture question: queue failure
Tim Watson
tim at rabbitmq.com
Mon Jun 18 21:29:35 BST 2012
On 18/06/2012 19:02, Bill Moseley wrote:
>
> The idea is the worker picks up the job and atomically sets it from
> "pending" to "in process" -- which means even if the job was queued
> multiple times only one process would pick up the actual work.
>
> Then when the job is completed again the state is changed from "in
> process" to "completed".
>
Ah ok, I get it now. Thanks for clarifying.
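For reference, that atomic claim usually boils down to a single
conditional UPDATE. A minimal sketch, assuming a hypothetical 'jobs'
table and an sqlite-style DB-API connection (the schema and paramstyle
here are illustrative, not anything from your setup):

    # Atomic "pending" -> "in process" transition (illustrative schema)
    def claim_job(conn, job_id):
        cur = conn.cursor()
        cur.execute(
            "UPDATE jobs SET state = 'in process' "
            "WHERE id = ? AND state = 'pending'",
            (job_id,),
        )
        conn.commit()
        # Only the worker that wins the claim sees rowcount == 1;
        # a duplicate delivery sees 0 and can simply ack and move on.
        return cur.rowcount == 1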
>
>
> Well, that's essentially my question. Obviously, I want the web app
> to know that a report request was made so it can display to the user
> that the report is in the process of being generated. And I also want
> to prevent multiple submissions by a user for the same thing. So, the
> database serves this function.
>
> The difficulty is when it gets stuck in pending. At what point do we
> give up or try again?
>
> Thanks for your comments below. I think the solution with the dead
> letter is the way to go as it avoids using something like cron to handle
> extra processing. This way the task is always "in the system" in a
> controlled way.
>
> Then I won't over-engineer for the very rare chance of a failure. Maybe I
> don't even really need the durable queues if I can run a utility to
> resubmit stuck "pending" jobs in those rare cases.
>
Yes, I do agree that keeping it simple is always best. A couple of points
to bear in mind though. Durable queues are queue 'definitions' that will
survive a restart or crash; they have nothing to do with storing messages
on disk. Persistent messages (sent with delivery_mode=2) are what cause
messages to hit the disk before they're confirmed. Since these two factors
are controlled merely by configuration (of the queue during the 'declare'
method) and a header setting (on the client), I'd say they add enough
value for so little design input that they shouldn't be considered
overhead, although there is obviously a cost in performance and disk use
when fsync'ing everything to disk.
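To make that concrete, the two settings look roughly like this with the
Python pika client (a sketch only; the library choice and the queue name
are mine, not anything from your setup):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    # durable=True: the queue *definition* survives a broker restart
    channel.queue_declare(queue='report_jobs', durable=True)

    # delivery_mode=2: this particular *message* is written to disk
    channel.basic_publish(
        exchange='',
        routing_key='report_jobs',
        body='{"job_id": 42}',
        properties=pika.BasicProperties(delivery_mode=2),
    )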
Without durable queues, your application startup code needs to handle
queue declaration and setup. Doing without persistence is fine, but bear
in mind that the broker could ack a message and then crash moments later,
causing the job to be lost completely if the messages aren't persistent
and the queues aren't HA/mirrored. Admittedly your database record of the
job submission alleviates this risk to some extent, but your housekeeping
utility will then need to handle messages that have gone missing entirely,
as well as re-submitting 'stuck' jobs.
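The housekeeping utility itself can stay very small. A rough sketch, again
with made-up table and queue names, an arbitrary 15 minute threshold, and
SQLite-style date functions assumed:

    import json
    import pika

    def resubmit_stuck_jobs(db_conn, channel):
        # Find jobs that have sat in 'pending' for too long...
        cur = db_conn.cursor()
        cur.execute(
            "SELECT id FROM jobs WHERE state = 'pending' "
            "AND submitted_at < datetime('now', '-15 minutes')"
        )
        # ...and put them back on the queue as persistent messages.
        for (job_id,) in cur.fetchall():
            channel.basic_publish(
                exchange='',
                routing_key='report_jobs',
                body=json.dumps({'job_id': job_id}),
                properties=pika.BasicProperties(delivery_mode=2),
            )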
Producer confirms and persistence remove this risk: restarting the broker
(after a crash or otherwise) will bring the queues back up in their proper
state, and the producer will always know for sure whether or not the
broker actually received a message, because a confirm is sent back for it.
This pattern should simplify the rest of the application code somewhat.
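With a recent version of the pika client, for example, enabling confirms
is one extra call on the channel (again, only a sketch):

    import pika
    from pika import exceptions

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.confirm_delivery()  # broker will confirm each publish

    try:
        channel.basic_publish(
            exchange='',
            routing_key='report_jobs',
            body='{"job_id": 42}',
            properties=pika.BasicProperties(delivery_mode=2),
            mandatory=True,
        )
        # Reaching here means the broker has taken responsibility for the
        # message; with persistence it has also been written to disk.
    except exceptions.UnroutableError:
        # The broker couldn't route the message, so retry or alert here.
        pass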
>
> Thanks very much for your input.
>
My pleasure! :)