Problem found. Thanks for your help. It's a strange one: I wasn't shutting my amqp_client listener down properly when my server dies. Here is how it manifests:<div><br></div><div>1.) The server starts up and opens an amqp client connection and channel</div>
<div>2.) The server binds to that channel and starts the subscription, using a registered name as the process to which messages will be sent</div><div>3.) Messages start coming in and are acked fine</div><div>4.) Poor error handling in farming out processes brings the server down</div>
<div>5.) The server does not close the amqp_client connection</div><div>6.) The server supervisor restarts the server, which creates a new listener, but the old listener is still hanging around trying to send to the registered name</div>
<div>7.) The old listener sends a message to the server</div><div>8.) The server tries to ack to the new listener, which did not send the message</div><div>9.) The new server pukes because the new listener never sent a message with that tag</div>
<div><br></div><div>So my question then is: how should I kill the amqp_client? If I send it an exit, its supervisor will restart it. This is what I was getting at with my tangential questions in the last email. How should I shut down the amqp_client without shutting down all the other servers' amqp client listeners?</div>
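<div><br></div><div>For illustration, here is a minimal sketch of one way the shutdown in steps 4.) to 6.) could be handled. It assumes the server is a gen_server that owns its connection and channel pids; the module name subscriber_sketch, the start_link/2 arguments and the state record are hypothetical. It relies on amqp_channel:close/1 and amqp_connection:close/1 from the client library. Whether this is the approach the library authors would recommend is not confirmed anywhere in this thread.</div>

```erlang
-module(subscriber_sketch).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical state: the pids of the connection and channel this server owns.
-record(state, {connection, channel}).

start_link(Connection, Channel) ->
    gen_server:start_link(?MODULE, {Connection, Channel}, []).

init({Connection, Channel}) ->
    %% Trap exits so terminate/2 also runs when the supervisor shuts us down.
    process_flag(trap_exit, true),
    {ok, #state{connection = Connection, channel = Channel}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Deliveries and 'EXIT' messages from linked processes would arrive here;
%% this sketch simply ignores them.
handle_info(_Info, State) ->
    {noreply, State}.

%% Runs when a callback crashes or the supervisor asks for shutdown.
%% It does not run if the process is killed with exit(Pid, kill).
terminate(_Reason, #state{connection = Connection, channel = Channel}) ->
    %% catch: either process may already be gone by the time we get here.
    catch amqp_channel:close(Channel),
    catch amqp_connection:close(Connection),
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

<div>Since each server closes only the connection it opened itself, other servers' listeners should be left alone, provided the connections are not shared between servers.</div>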
<div><br></div><div>Thanks for all the help,</div><div>-Max<br><br><div class="gmail_quote">On Thu, Jun 30, 2011 at 9:23 AM, Max Warnock <span dir="ltr"><<a href="mailto:maxjwarnock@gmail.com">maxjwarnock@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Thanks, that's very helpful, both as a list of possible issues to chase and as a sanity check.<div><br></div><div>
I'm using erlang R13B04 with a rabbitmq server installed via gentoo's portage at version 2.4.1. I pulled the client library from github (tag 2.3.0, commit: 844738f9b56d34104c1ea2ac5700d0898126c5b4).</div>
<div><br></div><div>I'm going to write some debug code to store all the tags I try to ack on and see if I can get this error to where it's easily reproducible. Thanks for narrowing my search, it's very helpful. I'll keep you updated. I must be doing something wrong somewhere. I myself have a hard time believing such a widely used library could fail so hard.</div>
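<div><br></div><div>A minimal sketch of what such debug code might look like, using only the standard sets module. The module name ack_tracker and its functions are hypothetical. It separates an ack for a tag that was already acked from an ack for a tag that was never delivered on this channel.</div>

```erlang
-module(ack_tracker).
-export([new/0, delivered/2, ack/2]).

%% Tracker state: {Pending, Acked}, two sets of delivery tags.
%% Pending holds tags delivered but not yet acked; Acked holds acked tags.
new() ->
    {sets:new(), sets:new()}.

%% Record a tag when a basic.deliver arrives.
delivered(Tag, {Pending, Acked}) ->
    {sets:add_element(Tag, Pending), Acked}.

%% Check a tag just before acking it.
%% Returns {ok, NewTracker} or {{error, Why}, Tracker}.
ack(Tag, {Pending, Acked} = Tracker) ->
    case {sets:is_element(Tag, Pending), sets:is_element(Tag, Acked)} of
        {true, _} ->
            {ok, {sets:del_element(Tag, Pending),
                  sets:add_element(Tag, Acked)}};
        {false, true} ->
            %% This tag was already acked once.
            {{error, double_ack}, Tracker};
        {false, false} ->
            %% This tag was never delivered on the tracked channel.
            {{error, unknown_tag}, Tracker}
    end.
```

<div>Delivery tags are scoped to a channel, so this would need one tracker per channel. The Acked set grows without bound, which is acceptable for a debugging session but not for production.</div>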
<div><br></div><div>One thing that would be extremely helpful is if you could point me to some documentation which I haven't been able to find: I'm looking for a listing of all the events/messages that the amqp client sends to a subscriber. What does it send when it goes down, what other soft errors will it send out, etc.? Additionally, is there a doc somewhere on best practices for connecting a listener to another server/long-running process? Without either of those, I've struggled to work out how to restart the subscription/listening process if my server dies. The amqp_client tutorial has been a great help, but when it comes to error handling from the listening module's perspective it doesn't tell me what the library expects me to do. I don't want to do a bunch of engineering because I'm forcing a square peg into a round hole with the library. My primary concern is this: when my server dies hard and is destined to be restarted by its supervisor, what should I send to the amqp client process? Should I send it close messages and then start a new one? Or should I reconnect to the client library? This wouldn't be as big of an issue, but I need to use durable/persistent queues, and if I still have a listener hanging around with the same bindings on the same queue it will eat all my messages and send them nowhere.</div>
<div><br></div><div>Thanks,</div><div>-Max</div><div><div></div><div class="h5"><div><br><div class="gmail_quote">On Thu, Jun 30, 2011 at 7:48 AM, Matthew Sackman <span dir="ltr"><<a href="mailto:matthew@rabbitmq.com" target="_blank">matthew@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Max,<br>
<div><br>
On Wed, Jun 29, 2011 at 06:28:59PM -0400, Max Warnock wrote:<br>
> I've built a behavior in erlang to subscribe to a given topic exchange and<br>
> farm out message handling. I'm using the rabbitmq amqp_client library for<br>
> erlang and when I put the system under heavy load I get, on occasion, the<br>
> following error:<br>
<br>
</div>Could you let us know which version of Rabbit, Erlang and the Erlang<br>
client you're using?<br>
<div><br>
> =ERROR REPORT==== 29-Jun-2011::18:02:18 ===<br>
> ** Generic server <0.1117.0> terminating<br>
> ** Last message in was {'$gen_cast',<br>
> {method,<br>
> {'channel.close',406,<br>
> <<"PRECONDITION_FAILED - unknown delivery<br>
> tag 856">>,<br>
> 60,80},<br>
<br>
</div>That's a double-ack (probably). Sadly, the AMQP 0-9-1 spec says that<br>
acking is not idempotent, thus it's a fault to ack the same message<br>
multiple times...<br>
<div><br>
> The server receive loop where the ack happens looks like this:<br>
> receive<br>
> ...<br>
> {#'basic.deliver'{delivery_tag = Tag, routing_key = RoutingKey},<br>
> #amqp_msg{payload = Payload}} -><br>
> amqp_channel:cast(get(amqp_channel_pid), #'basic.ack'{delivery_tag =<br>
> Tag}),<br>
> spawn_and_queue(spawn_handle_message, Module, RoutingKey, Payload),<br>
> loop(Module);<br>
> ...<br>
> end<br>
<br>
</div>...hmmm, which is so simple that I can't see how it could go wrong: if<br>
you're not double acking then something else must be going on to make<br>
the broker think that it's not expecting an ack for that message, hence<br>
the error. If you're doing some sort of reject operation - either<br>
basic.nack or basic.reject on messages and you then subsequently ack one<br>
of those messages then that would also cause this error. There may be<br>
other cases as well.<br>
<div><br>
> The amqp_client_sup can't seem to bring back the client either and dies<br>
> from the retry intensity being reached. I've done a hefty amount of<br>
> googling and can't seem to find where things could be going wrong. Before<br>
> jumping into the amqp_client code I thought I'd ask the mailing list if they<br>
> have any ideas. The only thing I can think is that there is a race<br>
> condition within the client library. I will be double checking my code to<br>
> be sure it isn't sending the ack twice, but given the simplicity of the ack<br>
> the only way it could is if it receives the same message (with identical<br>
> delivery tag) from the amqp_client library twice.<br>
<br>
</div>It could be a bug in the client library, but I'd be a little surprised<br>
if we're managing to duplicate messages somehow - that would be a new<br>
level of fail for us. ;) However, the fact that the entire connection<br>
dies is alarming and almost certainly a bug: PRECONDITION_FAILED is a<br>
soft error and should only tear down the channel, not the whole<br>
connection. After that, all you should have to do is create a new<br>
channel and everything else should be ok. If that's not the case please<br>
let us know.<br>
<br>
Best wishes,<br>
<br>
Matthew<br>
_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>