<p>Hi, I would like to share some thoughts about error handling when interacting with the broker, specifically through the .NET client API.</p>
<p>From reading the client's source code and documentation, and from some experience using it, I have the impression that it is hard to figure out whether to handle the exceptions thrown by client methods, which ones to handle, and how to react to them.</p>
<p>I think most of the exceptions that can arise during normal use of the client are in this list:</p>
<p>- BrokerUnreachableException<br>
- AlreadyClosedException<br>
- OperationInterruptedException<br>
- IOException<br>
- EndOfStreamException</p>
<p>I am building highly reliable applications on top of the API. They must not crash, and they must not lose, duplicate, or reorder messages. While doing this I realized that the behavior of the client is somewhat underspecified. The obvious approach would be to catch every exception thrown by the client, but it is not clear whether all of them should really be handled the same way.<br>
Failure conditions are also reported through events on the IConnection and IModel interfaces. This is useful but also confusing, because those events partially overlap with the exceptions.</p>
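<p>One way to cope with that overlap is to route both the shutdown event and any caught exception into a single idempotent handler. That way, a failure reported twice (once as an event, once as an exception) schedules only one reconnection. This is a minimal sketch in Python, written for brevity, and it does not use the real client API. ConnectionSupervisor, on_shutdown_event, on_client_exception and schedule_reconnect are hypothetical names. A "generation" counter identifies each connection, so late reports about an old connection are ignored.</p>

```python
import threading


class ConnectionSupervisor:
    """Funnels shutdown events and client exceptions into one recovery path.

    Hypothetical sketch: with the .NET client the event side would be a
    ConnectionShutdown handler; here both paths are plain method calls.
    """

    def __init__(self, schedule_reconnect):
        self._lock = threading.Lock()
        self._generation = 0          # incremented on every successful (re)connect
        self._lost_generation = -1    # last generation already reported as lost
        self._schedule_reconnect = schedule_reconnect

    def connected(self):
        """Call after a connection (and its topology) has been established."""
        with self._lock:
            self._generation += 1
            return self._generation

    def _report_lost(self, generation, reason):
        with self._lock:
            # Ignore stale reports (older connection) and duplicate reports.
            if generation != self._generation or generation == self._lost_generation:
                return False
            self._lost_generation = generation
        self._schedule_reconnect(reason)
        return True

    def on_shutdown_event(self, generation, reason):
        return self._report_lost(generation, "event: " + reason)

    def on_client_exception(self, generation, exc):
        return self._report_lost(generation, "exception: " + type(exc).__name__)
```

<p>Whichever signal arrives first wins; the second is a no-op. Under this design, a catch block never has to decide on its own whether "the connection went down".</p>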
<p>Although all of the above mostly concerns the .NET client, I'm very interested in how others solve the same problem.</p>
<p>What follows is a rough overview of the behavior of the API I expose to these kinds of applications.</p>
<p>On the publishing side:</p>
<p>- Publishes are not sent to the broker immediately. They are collected in an in-memory queue gated by a semaphore, which tracks the connection status by subscribing to the ConnectionShutdown event.<br>
- Connection shutdowns are monitored and a reconnection is scheduled.<br>
- Publisher confirms are used. Unconfirmed or nacked messages are republished at intervals. They are also re-enqueued as soon as the connection shuts down, so that they are the first to be published once the connection is restored.<br>
- BasicPublish calls are wrapped in try-catch blocks that catch several kinds of exceptions but only log them. Since the message then remains unconfirmed, it will eventually be republished.</p>
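<p>The publishing scheme above can be sketched as follows. This is a minimal Python model, not the .NET client. It assumes the following:</p>
<p>- publish_fn stands in for BasicPublish on a confirm-enabled channel;<br>
- on_ack and on_nack are driven by the broker's confirms, where a "multiple" flag covers all tags up to the given one;<br>
- confirm sequence numbers restart at 1 on a new channel.</p>
<p>ConfirmingOutbox and its methods are hypothetical names. Like the original design, a failed publish is only logged, and the message stays unconfirmed until it is requeued.</p>

```python
import logging
from collections import OrderedDict, deque

log = logging.getLogger("outbox")


class ConfirmingOutbox:
    """Hypothetical sketch: in-memory publish queue gated by connection state,
    with publisher-confirm bookkeeping keyed by delivery tag."""

    def __init__(self, publish_fn):
        self._publish_fn = publish_fn       # stands in for BasicPublish
        self._pending = deque()             # messages waiting to be published
        self._unconfirmed = OrderedDict()   # delivery tag -> message
        self._next_tag = 1
        self._connected = False             # the "gate"

    def enqueue(self, message):
        self._pending.append(message)

    def on_connection_restored(self):
        self._next_tag = 1                  # confirm tags restart on a new channel
        self._connected = True

    def on_connection_shutdown(self):
        self._connected = False
        self.requeue_unconfirmed()

    def drain(self):
        """Publish pending messages while the connection is believed to be up."""
        while self._connected and self._pending:
            message = self._pending.popleft()
            tag = self._next_tag
            self._next_tag += 1
            # Track first: if the call fails, the message just stays unconfirmed.
            self._unconfirmed[tag] = message
            try:
                self._publish_fn(message)
            except Exception as exc:
                log.warning("publish failed, will be republished: %r", exc)

    def on_ack(self, tag, multiple=False):
        for t in self._tags_up_to(tag, multiple):
            del self._unconfirmed[t]

    def on_nack(self, tag, multiple=False):
        nacked = [self._unconfirmed.pop(t) for t in self._tags_up_to(tag, multiple)]
        self._pending.extendleft(reversed(nacked))   # front of queue, same order

    def requeue_unconfirmed(self):
        """Called on shutdown, or periodically by a timer, to republish."""
        self._pending.extendleft(reversed(list(self._unconfirmed.values())))
        self._unconfirmed.clear()

    def _tags_up_to(self, tag, multiple):
        if multiple:
            return [t for t in self._unconfirmed if t <= tag]
        return [tag] if tag in self._unconfirmed else []
```

<p>Calling requeue_unconfirmed from a timer while the connection is still up may republish a message whose confirm is merely late. That is one concrete source of the duplicates the consuming side has to filter out.</p>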
<p>On the consuming side:</p>
<p>- A QueueingConsumer is used, and its queue is drained by a loop on a separate thread. Deliveries are marshalled to the application and acked afterwards. An open issue: what should happen if the application throws an exception while processing a message?</p>
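<p>For the open question about application exceptions, one possible policy (by no means the only one) can be sketched as follows. Ack only after the handler succeeds. On failure, nack with requeue the first time. If the delivery is already marked as redelivered, nack without requeue, so that a poison message cannot circulate forever. This is a minimal Python sketch. Delivery, consume_loop and the ack/nack callables are hypothetical stand-ins for the consumer's queue and the channel's BasicAck/BasicNack.</p>

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Delivery:
    delivery_tag: int
    body: bytes
    redelivered: bool = False


def consume_loop(deliveries, handle, ack, nack, stop):
    """Drain deliveries until stop is set (hypothetical sketch).

    handle(body) is the application callback; ack(tag) and nack(tag, requeue)
    stand in for the channel's acknowledgement calls.
    """
    while not stop.is_set():
        try:
            d = deliveries.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            handle(d.body)
        except Exception:
            # Requeue once; give up on a delivery that already came back once.
            # A dead-letter exchange configured on the queue could catch those.
            nack(d.delivery_tag, requeue=not d.redelivered)
        else:
            ack(d.delivery_tag)
```

<p>If the ack or nack itself fails because the channel has closed, the broker requeues the unacknowledged message and redelivers it, with the redelivered flag set, after the consumer reconnects. The application may therefore see a message it has already processed, which again argues for deduplication.</p>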
<p>- In front of the consuming loop there are deduplication and resequencing filters, in that order. Deduplication is required mainly because republishing unconfirmed messages under publisher confirms can produce duplicates. Resequencing is needed because RabbitMQ offers only limited ordering guarantees, and because connection and broker shutdowns may cause messages to arrive out of order.</p>
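<p>The two filters can be sketched as below. This assumes a single producer that stamps each message with a contiguous sequence number starting at 1, which also serves as the deduplication key here. That stamping scheme is an assumption, not something the client provides. Deduplicator keeps a bounded window of recently seen ids. Resequencer buffers out-of-order messages and releases them only once the gap before them is filled. Both class names are hypothetical.</p>

```python
from collections import OrderedDict


class Deduplicator:
    """Drops messages whose id was seen recently (bounded memory)."""

    def __init__(self, capacity=10000):
        self._seen = OrderedDict()
        self._capacity = capacity

    def accept(self, message_id):
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)   # forget the oldest id
        return True


class Resequencer:
    """Releases messages in sequence-number order, buffering gaps."""

    def __init__(self, first_seq=1):
        self._next = first_seq
        self._buffer = {}

    def push(self, seq, message):
        if seq < self._next:
            return []                        # already released: late duplicate
        self._buffer[seq] = message
        released = []
        while self._next in self._buffer:
            released.append(self._buffer.pop(self._next))
            self._next += 1
        return released
```

<p>Chained in the order described, with dedup.accept(seq) checked before reseq.push(seq, body), the input (1, a), (3, c), (2, b), (3, c), (1, a), (4, d) comes out as a, b, c, d. A real resequencer also needs a policy for a gap that never fills, such as a timeout. Without one, a single lost message stalls delivery indefinitely.</p>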
<p>- Connection status is monitored, and a reconnection is scheduled when the connection drops. It is not clear whether exceptions thrown by QueueDeclare, QueueBind or BasicConsume should be interpreted as the connection going down. The alternative is that in such cases the connection should be closed explicitly and a reconnection scheduled. In practice, if any of those calls throws, you want to start over from the beginning, even if the failure was not caused by the connection dropping, assuming that is possible at all.</p>
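<p>The "start over from the beginning" approach can be sketched as a loop that treats any failure during setup the same way. It closes whatever connection exists, ignoring errors from that close, waits with capped exponential backoff, and retries from scratch. This is a minimal Python sketch. establish is a hypothetical helper, and connect, setup and close are hypothetical callables standing in for opening the connection and model and calling QueueDeclare, QueueBind and BasicConsume. The backoff parameters are arbitrary placeholders.</p>

```python
import time


def establish(connect, setup, close, base_delay=0.5, max_delay=30.0,
              max_attempts=None, sleep=time.sleep):
    """Open a connection and run the full topology setup, starting over on
    any failure, whether or not it was caused by the network (sketch)."""
    attempt = 0
    while True:
        attempt += 1
        conn = None
        try:
            conn = connect()
            setup(conn)          # declare, bind, consume: all or nothing
            return conn
        except Exception:
            if conn is not None:
                try:
                    close(conn)  # explicit close; may fail if already dead
                except Exception:
                    pass
            if max_attempts is not None and attempt >= max_attempts:
                raise            # re-raises the setup/connect failure
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

<p>Because every failure takes the same path, the caller no longer has to classify exceptions from QueueDeclare and friends. The cost is that a persistent, non-transient error (for example, a queue declared with conflicting arguments) is retried until max_attempts is reached, so it may be worth logging the exception on each attempt.</p>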
<p>There are many other pernicious details, but I believe these are the main ones.<br>
Sorry for the long mail; I'd appreciate any feedback.</p>