[rabbitmq-discuss] .NET API - Using PublishConfirms to get Reliability

Wed Feb 26 08:02:13 GMT 2014

On 26 Feb 2014, at 11:48, Marty Wasznicky <marty.wasznicky at neudesic.com> wrote:

> As far as I can tell...it seems like RabbitMQ loses messages when the servers are taken down.

So far we’ve seen no evidence of this.

> For instance...you send a message...and maybe you hit a race condition, i.e. for whatever reason the .NET client API isn't designed to throw an error immediately because under the hood it sounds like you're using peer networking for communication.

The RabbitMQ .NET client uses TCP sockets to communicate with RabbitMQ. When one peer goes down
or becomes unreachable, it takes time for the OS to detect this. The client relies on the OS
to detect it and on the .NET APIs to throw exceptions accordingly:

http://hg.rabbitmq.com/rabbitmq-dotnet-client/file/348a50e651cd/projects/client/RabbitMQ.Client/src/client/impl/ConnectionBase.cs#l618
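
To illustrate why a silent peer can go unnoticed for a long time when only the OS is
watching the socket, here is a minimal sketch of an application-level liveness check.
It is written in Python rather than C# so it runs without any client library, and it is
not code from the .NET client: the HeartbeatMonitor class, its parameters and the
"two missed intervals" rule are all assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: treats a peer as dead after a period of silence."""

    def __init__(self, interval_s, max_missed=2, clock=time.monotonic):
        self.interval_s = interval_s      # expected gap between frames from the peer
        self.max_missed = max_missed      # silent intervals tolerated before giving up
        self._clock = clock               # injectable so the logic can be tested
        self._last_seen = clock()

    def on_traffic(self):
        # Call whenever any frame (data or heartbeat) arrives from the peer.
        self._last_seen = self._clock()

    def peer_alive(self):
        # The peer counts as alive only while the silence is shorter than the limit.
        silent_for = self._clock() - self._last_seen
        return silent_for < self.interval_s * self.max_missed
```

A consumer loop could call peer_alive() on each iteration and close the connection
itself when it returns False, instead of waiting for the OS to report the broken socket.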

> 
> I can understand that it takes the peer network perhaps a few seconds to detect that the servers are no longer available.  However, after I took down both servers, I let that consumer loop that reads off the queue run.  It continued to read, without error (of course no messages), for 10 minutes before I finally just killed the process in Task Manager.  Before I killed it...the channel's and connection's IsOpen properties were still true.

OK, this sounds like a .NET client bug. Can you put together a small program that reproduces it? What .NET version do you use,
what version of Windows?

>  That's a serious problem.  I can understand an argument that it takes time for the OS to get a network signal of some sort..but it doesn't take 10 minutes.
> 
> Acks/Nacks get lost.  Sure...we send 50,000 messages through.  Once they have all gone through, a few minutes later we'll check on our internal queue only to find that there are sometimes several thousand orphaned records that the publisher never received an ack/nack for.

Again, can you provide a small self-contained program that reproduces this? I expect this to be a .NET client bug
of some kind.
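
When putting such a program together, it may help to keep the bookkeeping of outstanding
confirms small and explicit. The sketch below is one possible shape for it, in Python so
that it runs without a broker; ConfirmTracker and its method names are hypothetical and
not part of any RabbitMQ client. It assumes what the confirms protocol specifies:
sequence numbers start at 1 on a channel in confirm mode, and an ack or nack with the
"multiple" flag set settles every outstanding message up to and including its delivery tag.

```python
class ConfirmTracker:
    """Hypothetical helper: tracks published messages until they are acked or nacked."""

    def __init__(self):
        self._next_seq = 1      # confirm sequence numbers start at 1
        self._pending = {}      # sequence number -> message body

    def on_publish(self, body):
        # Record the message under the sequence number the broker will confirm.
        seq = self._next_seq
        self._pending[seq] = body
        self._next_seq += 1
        return seq

    def on_ack(self, delivery_tag, multiple):
        # Broker took responsibility for the message(s); forget them.
        self._settle(delivery_tag, multiple)

    def on_nack(self, delivery_tag, multiple):
        # Broker refused the message(s); return the bodies so the caller can republish.
        return self._settle(delivery_tag, multiple)

    def _settle(self, delivery_tag, multiple):
        if multiple:
            tags = sorted(t for t in self._pending if t <= delivery_tag)
        else:
            tags = [delivery_tag] if delivery_tag in self._pending else []
        return [self._pending.pop(t) for t in tags]

    def unconfirmed(self):
        # Everything still waiting for an ack or nack, in publish order.
        return [self._pending[t] for t in sorted(self._pending)]
```

If the publisher's own bookkeeping ignores the "multiple" flag, a single ack covering
many messages would leave the rest looking orphaned, so that is one thing worth ruling
out before suspecting lost acks.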

> 
> I've so far only tried to resubmit the messages I have in the internal queue I maintain when there's a server failure....and the server comes back...then I resubmit everything in the queue.  But from your email...it's making me think that RabbitMQ is more unreliable than I thought....if I have to monitor the condition in healthy conditions as well and prepare to resubmit batches…..

It has nothing to do with the server; it comes from certain (fairly objective) limitations of the client library.

Note that even if there were no such limitation, to guarantee that a message delivery is never lost you still need
to implement a WAL (write-ahead log), because in the time window between BasicPublish and the subsequent socket
write your process might die (the OS kills it, someone pulls the plug, etc.).
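
As a rough illustration of the WAL idea, here is a minimal sketch in Python (chosen so
it runs with the standard library only). PublishLog, the JSON-lines file format and the
message ids are all assumptions made for the example; a production log would also need
rotation and compaction. The point it shows is the ordering: the intent is forced to
disk before the publish is attempted, and the entry is only marked done once the broker
has confirmed it.

```python
import json
import os


class PublishLog:
    """Hypothetical append-only log: record intent first, mark confirmed later."""

    def __init__(self, path):
        self._path = path

    def _append(self, record):
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())    # force the record to disk before returning

    def record_intent(self, msg_id, body):
        # Call this BEFORE handing the message to the client library.
        self._append({"op": "publish", "id": msg_id, "body": body})

    def record_confirmed(self, msg_id):
        # Call this when the broker's ack for the message arrives.
        self._append({"op": "confirmed", "id": msg_id})

    def pending(self):
        """Messages logged but never confirmed; replay these after a restart."""
        pending = {}
        if not os.path.exists(self._path):
            return pending
        with open(self._path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue        # torn write from a crash: intent was never durable
                if rec["op"] == "publish":
                    pending[rec["id"]] = rec["body"]
                else:
                    pending.pop(rec["id"], None)
        return pending
```

On startup the publisher would republish everything returned by pending(). A message
whose ack was lost will then be sent twice, so consumers need to tolerate duplicates.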
--
MK

Software Engineer, Pivotal/RabbitMQ