[rabbitmq-discuss] .NET API - Using PublishConfirms to get Reliability
marty.wasznicky at neudesic.com
Wed Feb 26 08:28:39 GMT 2014
Using Windows 7, 64-bit OS, dual CPU/quad core, 8 GB RAM
Client is 188.8.131.52
I'll see what we can do to put together a sample that replicates the lost acks/nacks
From: Michael Klishin [mailto:mklishin at gopivotal.com]
Sent: Wednesday, February 26, 2014 12:02 AM
To: Marty Wasznicky
Cc: Discussions about RabbitMQ
Subject: Re: [rabbitmq-discuss] .NET API - Using PublishConfirms to get Reliability
On 26 Feb 2014, at 11:48, Marty Wasznicky <marty.wasznicky at neudesic.com> wrote:
> As far as I can tell...it seems like RabbitMQ loses messages when the servers are taken down.
So far we've seen no evidence of this.
> For instance...you send a message...and maybe you hit a race condition, i.e. for whatever reason the .NET client API isn't designed to throw an error immediately, because under the hood it sounds like you're using peer networking for communication.
The RabbitMQ .NET client uses TCP sockets to communicate with RabbitMQ. When one peer goes down or becomes unreachable, it takes time for the OS to detect this. The RabbitMQ .NET client relies on the OS to do this and on the .NET APIs to throw exceptions accordingly.
> I can understand that it takes the peer network perhaps a few seconds to detect that the servers are no longer available. However, after I took down both servers, I let the consumer loop that reads off the queue run. It continued to read without error (of course with no messages) for 10 minutes before I finally just killed the process in Task Manager. Before I killed it...the IsOpen property of both the channel and the connection was still true.
OK, this sounds like a .NET client bug. Can you put together a small program that reproduces it? What .NET version do you use, what version of Windows?
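One common way to bound detection time, independent of what the socket reports, is an application-level heartbeat check: if nothing has arrived from the peer within a chosen timeout, treat the connection as dead. The following is a rough, language-neutral sketch in Python, not the .NET client's API; the class name, method names, and timeout are hypothetical.

```python
import time


class HeartbeatWatchdog:
    """Declares a peer dead when no traffic arrives within `timeout` seconds.

    Hypothetical helper for illustration; not part of any RabbitMQ client.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()

    def on_traffic(self):
        # Call whenever any frame (data or heartbeat) arrives from the peer.
        self.last_seen = self.clock()

    def is_alive(self):
        # True while the peer has been heard from within the timeout window.
        return (self.clock() - self.last_seen) < self.timeout
```

A consumer loop could check `is_alive()` on each iteration and tear down the connection itself rather than waiting for the OS to report the failure.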
> That's a serious problem. I can understand an argument that it takes time for the OS to get a network signal of some sort..but it doesn't take 10 minutes.
> Acks/Nacks get lost. Say we send 50,000 messages through. Once they have all gone through, we check our internal queue a few minutes later, only to find that there are sometimes several thousand orphaned records that the publisher never received an ack/nack for.
Again, can you provide a small self-contained program that reproduces this? I expect this to be a .NET client bug of some kind.
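For reference while building a reproduction: with publisher confirms, each publish on a channel gets a sequence number starting at 1, and an ack or nack carries a delivery tag plus a `multiple` flag meaning "everything up to and including this tag". A publisher-side tracker that ignores `multiple` will appear to have orphaned records. A minimal language-neutral sketch in Python, with hypothetical class and method names (this is not the .NET client's code):

```python
class ConfirmTracker:
    """Tracks published messages until the broker acks or nacks them."""

    def __init__(self):
        self.next_seq = 1      # confirm sequence numbers start at 1
        self.pending = {}      # seq -> message awaiting ack/nack

    def on_publish(self, message):
        # Record the message under the sequence number it will be confirmed by.
        seq = self.next_seq
        self.pending[seq] = message
        self.next_seq += 1
        return seq

    def settle(self, delivery_tag, multiple):
        """Handle an ack or nack; return the messages it covers, in order."""
        if multiple:
            # Covers every outstanding message up to and including the tag.
            tags = [s for s in self.pending if s <= delivery_tag]
        else:
            tags = [delivery_tag] if delivery_tag in self.pending else []
        return [self.pending.pop(s) for s in sorted(tags)]

    def outstanding(self):
        # Messages for which no ack/nack has been seen yet.
        return [self.pending[s] for s in sorted(self.pending)]
```

On an ack the returned messages can be dropped from the internal queue; on a nack they are candidates for resubmission. Anything still in `outstanding()` long after publishing is what a reproduction would need to show.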
> So far I've only tried to resubmit the messages in the internal queue I maintain when there's a server failure: when the server comes back, I resubmit everything in the queue. But from your email...it's making me think that RabbitMQ is more unreliable than I thought....if I have to monitor for this under healthy conditions as well and be prepared to resubmit batches.....
It has nothing to do with the server; it comes from certain (fairly objective) limitations of the client library.
Note that even if there were no such limitation, to ensure a "never lose a message" guarantee you would still need to implement a WAL (write-ahead log), because in the time window between BasicPublish and the subsequent socket write your process might die (the OS kills it, someone pulls the plug, etc).
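As a rough illustration of the write-ahead log idea, the sketch below records the intent to publish on disk before the publish call, records the confirmation afterwards, and on restart replays whatever was never confirmed. It is a simplified, language-neutral sketch in Python; the file layout, class name, and method names are assumptions, not part of any RabbitMQ client.

```python
import json
import os


class PublishLog:
    """Append-only log of publish intents and confirmations (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())   # force the record to disk before returning

    def record_intent(self, msg_id, body):
        # Call BEFORE handing the message to the client library.
        self._append({"op": "publish", "id": msg_id, "body": body})

    def record_confirmed(self, msg_id):
        # Call once the broker has acked the message.
        self._append({"op": "confirmed", "id": msg_id})

    def unconfirmed(self):
        """Replay the log; return {msg_id: body} for messages never confirmed."""
        pending = {}
        if not os.path.exists(self.path):
            return pending
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue       # skip a torn line left by a crash mid-write
                if rec["op"] == "publish":
                    pending[rec["id"]] = rec["body"]
                else:
                    pending.pop(rec["id"], None)
        return pending
```

Replaying `unconfirmed()` after a crash can resend a message the broker already received, so this gives at-least-once behaviour and consumers need to tolerate duplicates.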
Software Engineer, Pivotal/RabbitMQ