[rabbitmq-discuss] Consumer crash, redelivery and prefetch

Laing, Michael michael.laing at nytimes.com
Fri Mar 14 12:28:22 GMT 2014


It's a good topic.

In our standard framework, based on Python pika, a service may fail while
processing a message because an exception is raised - something
unanticipated. When it was initialized, the service will have chosen a
default action to take in that case, typically 'reject'. It will usually
log a warning as well.
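
As a rough illustration of the "default action chosen at initialization"
idea, here is a minimal sketch. It is not our framework code. The names
make_callback, process and on_error are made up for this example. The
callback signature (channel, method, properties, body) and the
basic_ack / basic_reject calls are the usual pika ones.

```python
import logging

log = logging.getLogger("service")

VALID_ACTIONS = ("reject", "requeue", "ack")


def make_callback(process, on_error="reject"):
    """Wrap `process` (hypothetical handler) with a default failure action.

    The action is fixed once, when the service is set up, and applied to
    any exception the handler did not anticipate.
    """
    if on_error not in VALID_ACTIONS:
        raise ValueError("unknown default action: %r" % (on_error,))

    def callback(channel, method, properties, body):
        try:
            process(properties, body)
        except Exception:
            # Unanticipated failure: warn, then apply the default action.
            log.warning("unhandled error on delivery %s, action=%s",
                        method.delivery_tag, on_error, exc_info=True)
            if on_error == "ack":
                channel.basic_ack(delivery_tag=method.delivery_tag)
            else:
                channel.basic_reject(delivery_tag=method.delivery_tag,
                                     requeue=(on_error == "requeue"))
        else:
            channel.basic_ack(delivery_tag=method.delivery_tag)

    return callback
```

With 'reject' the message is rejected without requeue, so it is not
redelivered to the same consumer in a loop.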

We gather rejected messages in a 'reject' exchange and process them enough
(via their headers) to route them back to their originators as well as to
our own 'triage' queue.
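
One way to get rejected messages into a 'reject' exchange is RabbitMQ's
dead-letter exchange feature. That is an assumption here, since the exact
mechanism is not spelled out above. The sketch below shows the queue
argument plus a small routing helper. The 'originator' header name and
both function names are hypothetical.

```python
REJECT_EXCHANGE = "reject"  # exchange name as described above
TRIAGE_QUEUE = "triage"     # queue name as described above


def work_queue_arguments():
    """Queue arguments that dead-letter rejected messages (assumed setup)."""
    return {"x-dead-letter-exchange": REJECT_EXCHANGE}


def reject_destinations(headers):
    """Decide where a rejected message goes, based only on its headers.

    Always includes the triage queue; adds the originator when the
    (hypothetical) 'originator' header is present.
    """
    destinations = []
    originator = (headers or {}).get("originator")
    if originator:
        destinations.append(originator)
    destinations.append(TRIAGE_QUEUE)
    return destinations
```

The arguments dict would be passed when declaring the work queue, e.g.
channel.queue_declare(queue="work", arguments=work_queue_arguments()),
where "work" is just a placeholder queue name.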

Our messages all carry their processing history in their headers: region,
zone, instance, pid, service, timestamp, etc. - again part of the framework.
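
To make the header history concrete, a sketch of appending one "hop" per
service might look like this. The 'history' header name, the add_hop
function and the record layout are assumptions for illustration, not the
actual framework format.

```python
import os
import socket
import time


def add_hop(headers, service, region, zone, instance=None):
    """Return a copy of `headers` with one more processing-history record."""
    new_headers = dict(headers or {})
    # Copy the list so the caller's headers are never mutated.
    history = list(new_headers.get("history", []))
    history.append({
        "region": region,
        "zone": zone,
        "instance": instance or socket.gethostname(),
        "pid": os.getpid(),
        "service": service,
        "timestamp": time.time(),
    })
    new_headers["history"] = history
    return new_headers
```

Each service would call this just before republishing, so a rejected
message already says where it has been.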

We also gather and coordinate the logs of all services on all instances.

Additionally we replicate messages and process them in parallel through our
Core clusters in multiple regions.

A truly poison message will fail spectacularly everywhere. We have not
actually encountered one yet in production. We do get them in staging, and
bells go off everywhere.

A failure of infrastructure will be localized to a region, zone, instance,
or supporting service like Cassandra or the AWS control plane. Anticipated
failures are retried. Unanticipated failures result in rejection of that
message replica, but other replicas should succeed. We do get these in
production and can immediately tell where failures occurred and take
appropriate action, e.g. shifting load away from the failure if that has
not already happened automatically.
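
The split between anticipated and unanticipated failures can be sketched
as follows. TransientError and run_with_retry are hypothetical names, and
the attempt count and backoff are arbitrary. Anticipated failures are
retried; anything else propagates at once so the caller can reject the
message.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for anticipated failures (timeouts, throttling)."""


def run_with_retry(process, body, attempts=3, delay=0.5, sleep=time.sleep):
    """Call process(body), retrying only on TransientError.

    Other exceptions are not caught here. Once the attempts are used up,
    the last TransientError is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return process(body)
        except TransientError:
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # simple linear backoff
```

The sleep parameter is only there so the backoff can be stubbed out in
tests.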

Of course it would be nice to get more info upon rejection. We compensate
by creating context around rejection and coordinating the context in near
real time across the nyt⨍aбrik.

ml


On Fri, Mar 14, 2014 at 6:06 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 14/03/2014 9:42AM, Karl Nilsson wrote:
>
>> It is a great shame that a mature message broker such as RabbitMQ is so
>> lacking in sensible poison message handling (or any strategies regarding
>> redelivery).
>>
>
> Agreed.
>
> But there are a great many things we want to do, and only limited time to
> do them in.
>
> I suspect it will happen one day. Sorry I can't be more specific than
> that, but we tend not to plan out a long way in advance.
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, Pivotal
>