[rabbitmq-discuss] .NET API - Using PublishConfirms to get Reliability
marty.wasznicky at neudesic.com
Wed Feb 26 07:48:55 GMT 2014
As far as I can tell, RabbitMQ loses messages when the servers are taken down. I've been testing this for weeks now.
For instance, you send a message and maybe hit a race condition: for whatever reason the .NET client API isn't designed to throw an error immediately, because under the hood it sounds like you're using peer networking for communication. With the publisher confirms model, I'm OK with that being async. If the message never actually gets to RabbitMQ, it will still make it into my internal queue for resubmission. But that also means I should never get an ack or nack back for it, since it never reached the broker.
Yet I'm still sometimes (actually, always) losing messages. I'm still trying to track it down.
I can understand that it may take the peer network a few seconds to detect that the servers are no longer available. However, after I took down both servers, I let the consumer loop that reads off the queue keep running. It continued to read without error (and, of course, without messages) for 10 minutes before I finally killed the process in Task Manager. Just before I killed it, the IsOpen property on both the channel and the connection was still true. That's a serious problem. I can accept the argument that it takes time for the OS to get a network signal of some sort, but it doesn't take 10 minutes.
Acks/nacks get lost. Sure: we send 50,000 messages through. Once they have all gone through, we check our internal queue a few minutes later, only to find that there are sometimes several thousand orphaned records that the publisher never received an ack/nack for.
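One thing worth ruling out when records look orphaned: in AMQP 0-9-1 publisher confirms, a single ack or nack can carry the `multiple` flag, in which case it settles every outstanding message up to and including that delivery tag. The following is a minimal, language-neutral sketch in Python of an internal queue that honours that flag. The class and method names (`ConfirmTracker`, `on_publish`, `on_ack`, `on_nack`) are hypothetical and not part of any RabbitMQ client; whether this explains the numbers above is an assumption, not something established in this thread.

```python
class ConfirmTracker:
    """Hypothetical internal queue keyed by publish sequence number."""

    def __init__(self):
        # sequence number -> message body, for messages not yet confirmed
        self._pending = {}

    def on_publish(self, seq_no, body):
        # Record the message before (or as) it is handed to the client library.
        self._pending[seq_no] = body

    def _settle(self, delivery_tag, multiple):
        # With multiple=True the confirm covers every tag <= delivery_tag.
        if multiple:
            tags = [t for t in self._pending if t <= delivery_tag]
        else:
            tags = [delivery_tag] if delivery_tag in self._pending else []
        return [(t, self._pending.pop(t)) for t in sorted(tags)]

    def on_ack(self, delivery_tag, multiple):
        # Returns the messages that are now safely confirmed.
        return self._settle(delivery_tag, multiple)

    def on_nack(self, delivery_tag, multiple):
        # Returns the messages the caller should resubmit.
        return self._settle(delivery_tag, multiple)

    def unconfirmed(self):
        # Snapshot of everything still waiting for an ack or nack.
        return dict(self._pending)
```

Note that delivery tags are scoped to a channel, so a tracker like this would need to be reset (and its contents resubmitted) whenever a new channel is opened. An ack for a tag that is no longer in the map simply returns an empty list here.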
I've thought about using the WaitForConfirms function, but considering that a good chunk of them just never seem to get sent, I'm hesitant to use it, since I can't afford to wait minutes or hours for control to come back to the app.
So far I've only tried resubmitting the messages in the internal queue I maintain after a server failure: once the server comes back, I resubmit everything in the queue. But your email makes me think that RabbitMQ is more unreliable than I thought, if I also have to monitor for this under healthy conditions and be prepared to resubmit batches.
All this also forced me to build duplicate detection on the consumer side to ensure once-only delivery.
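For readers following along, consumer-side duplicate detection of the kind described can be sketched as below. This is an illustration only, not the implementation used here: it assumes each message carries a unique ID set by the publisher, and the names `Deduplicator`, `first_time`, and `handle` are hypothetical. The set of seen IDs is bounded, so a duplicate arriving after its ID has been evicted would slip through.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers the most recently seen message IDs, up to a fixed capacity."""

    def __init__(self, capacity=10000):
        self._seen = OrderedDict()
        self._capacity = capacity

    def first_time(self, message_id):
        # Return True only the first time a given ID is observed.
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # keep recently seen IDs longest
            return False
        self._seen[message_id] = True
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen ID
        return True


def handle(deliveries, dedup, process):
    # deliveries: iterable of (message_id, body) pairs; duplicates are skipped.
    for message_id, body in deliveries:
        if dedup.first_time(message_id):
            process(body)
```

In a real deployment the seen-ID store would usually need to survive consumer restarts, which an in-memory map like this one does not.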
From: Michael Klishin [mailto:mklishin at gopivotal.com]
Sent: Tuesday, February 25, 2014 11:08 PM
To: Discussions about RabbitMQ
Cc: Marty Wasznicky
Subject: Re: [rabbitmq-discuss] .NET API - Using PublishConfirms to get Reliability
On 26 Feb 2014, at 09:07, martywaz <marty.wasznicky at neudesic.com> wrote:
> First one is that Rabbit MQ is losing a few messages if both servers in the cluster are shut down.
See my earlier reply about peer unavailability not being detected immediately.
> Second one, Acks/Nacks seem to just get lost by Rabbit MQ.
Can you be more specific?
> Third one, which only happens now and again: the producer actually receives acks for delivery tags/messages that don't exist in its internal queue.
Can you isolate this problem?
> Fourth one: when both servers go down, sometimes, but not always, the consumer will not throw an exception when it tries to read the message in the while loop, i.e. if (!consumer.Queue.Dequeue(3000, out item)). The item comes back null, but if I look in the debugger, the consumer's and connection's IsOpen property is true and the CloseReason is null.
Again, it takes time for the OS to report a network failure.
QueueingConsumer will enqueue deliveries in a local queue (collection). When you shut down your entire cluster, deliveries stop flowing, so Dequeue times out and the item comes back null, but the connection hasn't detected the network failure yet.
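Since an empty dequeue on its own cannot distinguish a quiet queue from a dead peer, an application can keep its own idle timer as a hint. The sketch below is a generic illustration in Python with hypothetical names (`IdleWatchdog`, `record_delivery`, `is_suspect`); it is not part of the client library and is no substitute for AMQP heartbeats, which are the protocol-level mechanism for detecting an unreachable peer.

```python
import time


class IdleWatchdog:
    """Flags a consumer as suspect after a period with no deliveries."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self._max_idle = max_idle_seconds
        self._clock = clock  # injectable so the timer can be tested
        self._last_activity = clock()

    def record_delivery(self):
        # Call this whenever a dequeue actually returns a message.
        self._last_activity = self._clock()

    def is_suspect(self):
        # True once the idle period exceeds the configured threshold.
        return self._clock() - self._last_activity > self._max_idle
```

A consumer loop would call `record_delivery()` after each successful dequeue and check `is_suspect()` after each timeout. A suspect result only means "no traffic for a while", so the sensible reaction is to probe or re-establish the connection rather than to assume messages were lost.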
Software Engineer, Pivotal/RabbitMQ