Sorry about the ambiguity.<div><br></div><div>For the sake of clarity, here is a glossary of the terms I used in my last email (which probably clash with the Erlang/amqp_client context you're coming from):</div><div><br>
</div><div>Server - my long-running Erlang process, which I have given a registered name and passed as the third argument to amqp_channel:subscribe/3.<br><div>Listener - the process created by the amqp_client library when a connection and channel are opened</div>
<div>Subscribe - a call to amqp_channel:subscribe/3</div><div>My - I'm using this pronoun to distinguish code written by me from code written by you cats at RabbitMQ (client library, RabbitMQ server, etc.)</div><div><br>
</div><div>I've attached a diagram (an approximation/abstraction) of how I'm interacting with the amqp_client library. (Sorry to the mailing list if attaching a 40K diagram breaks etiquette.)</div><div><br></div><div>
I'm using the amqp_client library in network mode, i.e., amqp_client:start(network, #amqp_params{host = Host, heartbeat = 60000}).</div><div><br></div><div>Yes, a list of the messages that the amqp_client process sends to a subscriber, particularly those pertaining to errors in amqp_client land, would be very helpful. I'd like to handle all {'DOWN',Etc} messages in my long-running process (the server). I'm hoping to handle all hard errors so that a restart by either supervisor (my long-running process's or the amqp_client's) won't break the communication between the two.</div>
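<div><br></div><div>To make the {'DOWN',Etc} part concrete, here is a minimal sketch of the kind of handling I have in mind. It uses only plain Erlang monitors and a stand-in process in place of a real amqp_client connection or channel pid. The names down_sketch, wait_down/2 and demo/0 are hypothetical, and nothing here is amqp_client API:</div>
<pre>
```erlang
-module(down_sketch).
-export([wait_down/2, demo/0]).

%% Block until the monitored process identified by Ref/Pid goes down,
%% then return the reason it exited with.
wait_down(Ref, Pid) ->
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {down, Reason}
    end.

%% Assumption: the spawned process is only a stand-in for a connection
%% or channel pid. It exits with reason shutdown when told to stop.
demo() ->
    Pid = spawn(fun() -> receive stop -> exit(shutdown) end end),
    %% Set up the monitor before asking the process to stop, so the
    %% 'DOWN' message carries the real exit reason.
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    wait_down(Ref, Pid).
```
</pre>
<div>In the real server, the {down, Reason} branch is where I would decide whether to open a fresh connection and channel and subscribe again.</div>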
<div><br></div><div>But I get the impression that I'm missing something about how I'm supposed to treat the amqp_client library with regard to amqp_client:start/2. Should I treat the amqp_client connection like mnesia (an application entirely independent of mine), add it to my existing supervision tree and share one connection throughout my application, or (as I'm currently doing) let each part of my application that needs to talk to AMQP spin up and close its own connection/channel?<br>
<br></div><div>Thanks,</div><div>-Max</div><div><br><div class="gmail_quote">On Mon, Jul 4, 2011 at 12:16 PM, Alexandru Scvorţov <span dir="ltr"><<a href="mailto:alexandru@rabbitmq.com">alexandru@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Max,<br>
<br>
I'm trying to run through the steps you provided, but I'm having a bit<br>
of trouble following.<br>
<br>
Are you using a network or a direct connection? (I assume network, but<br>
it probably doesn't matter)<br>
<br>
By server, do you mean the actual RabbitMQ server, or your application?<br>
(I'm guessing your long-running application)<br>
<br>
By subscribe, do you mean calling amqp_channel:subscribe/3? If so, do<br>
you still need a list of the messages the channel may send its<br>
subscriber?<br>
<br>
Or do you mean that your application is sending messages to its<br>
subscribers?<br>
<div class="im"><br>
> 6.) The server supervisor restarts the server which creates a new listener,<br>
> but the old listener is still hanging around trying to send to the<br>
> registered name<br>
<br>
</div>What's a listener? Is it a process that receives messages from the<br>
erlang client because it's the endpoint of a subscription to a queue?<br>
<br>
Can't you link listeners to the server so that when the server goes<br>
down, it takes the listeners with it?<br>
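<br>
As a rough sketch of that idea, using only plain Erlang processes as stand-ins (link_sketch and demo/0 are made-up names, and neither process is a real amqp_client process), a "server" spawns a linked "listener" and then crashes, and the listener goes down with it:<br>
<pre>
```erlang
-module(link_sketch).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% Stand-in "server": spawns a linked "listener", reports the
    %% listener's pid to the caller, then waits to be told to crash.
    Server = spawn(fun() ->
        Listener = spawn_link(fun() -> receive after infinity -> ok end end),
        Parent ! {listener, Listener},
        receive crash -> exit(crashed) end
    end),
    Listener = receive {listener, L} -> L end,
    %% Monitor the listener so its termination can be observed here.
    Ref = erlang:monitor(process, Listener),
    Server ! crash,
    %% The exit signal travels over the link and terminates the listener.
    receive
        {'DOWN', Ref, process, Listener, Reason} ->
            {listener_down, Reason}
    end.
```
</pre>
Note that a link only takes the listener down when the server exits with a reason other than normal, and only if the listener is not trapping exits.<br>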
<div class="im"><br>
> So my question then is how should I kill the amqp_client?<br>
<br>
</div>What do you mean by amqp_client? If it's an amqp_connection process,<br>
you can just send it a shutdown exit.<br>
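<br>
In plain Erlang terms, sending such an exit looks roughly like the sketch below. The target here is only a stand-in process, not a real amqp_connection pid, and shutdown_sketch and demo/0 are made-up names:<br>
<pre>
```erlang
-module(shutdown_sketch).
-export([demo/0]).

%% Send a shutdown exit signal to a process and return the reason it
%% terminated with. The stand-in process does not trap exits, so the
%% signal terminates it directly.
demo() ->
    Pid = spawn(fun() -> receive after infinity -> ok end end),
    Ref = erlang:monitor(process, Pid),
    exit(Pid, shutdown),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    end.
```
</pre>
A process that traps exits receives the signal as an {'EXIT', From, shutdown} message instead and decides for itself how to clean up.<br>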
<br>
Thanks for the information.<br>
<br>
Cheers,<br>
Alex<br>
<div><div></div><div class="h5"><br>
On Fri, Jul 01, 2011 at 01:51:10PM -0400, Max Warnock wrote:<br>
> Problem found. Thanks for your help. The problem is a strange one and has<br>
> to do with me not shutting my amqp_client listener down properly if my<br>
> server dies. Here is how it manifests:<br>
><br>
> 1.) The server starts up and opens an amqp_client connection and channel<br>
> 2.) The server binds to that channel and starts the subscription using a<br>
> registered name as the process to which messages will be sent<br>
> 3.) Messages start coming in and are ack-ing fine<br>
> 4.) Poor error handling in farming out processes brings the server down<br>
> 5.) The server does not close the amqp_client connection<br>
> 6.) The server supervisor restarts the server which creates a new listener,<br>
> but the old listener is still hanging around trying to send to the<br>
> registered name<br>
> 7.) The older listener sends a message to the server<br>
> 8.) The server tries to ack to the new listener which did not send the<br>
> message<br>
> 9.) The new server pukes because it never sent a message with that tag<br>
><br>
> So my question then is how should I kill the amqp_client? If I send it an<br>
> exit its supervisor will restart it. This is what I was getting at with my<br>
> tangential questions in the last email. How should I shut down the<br>
> amqp_client without shutting down all the other servers' amqp client<br>
> listeners?<br>
><br>
> Thanks for all the help,<br>
> -Max<br>
><br>
> On Thu, Jun 30, 2011 at 9:23 AM, Max Warnock <<a href="mailto:maxjwarnock@gmail.com">maxjwarnock@gmail.com</a>> wrote:<br>
><br>
> > Thanks, that's very helpful from both the possible issues to chase and<br>
> > sanity check perspectives.<br>
> ><br>
> > I'm using erlang R13B04 with a rabbitmq server installed via gentoo's<br>
> > portage at version 2.4.1. I pulled the client library from github (tag<br>
> > 2.3.0, commit: 844738f9b56d34104c1ea2ac5700d0898126c5b4).<br>
> ><br>
> > I'm going to write some debug code to store all the tags I try to ack on<br>
> > and see if I can get this error to where it's easily reproducible. Thanks<br>
> > for narrowing my search, it's very helpful. I'll keep you updated. I must<br>
> > be doing something wrong somewhere. I have a hard time believing such a<br>
> > widely used library could fail so hard myself.<br>
> ><br>
> > One thing that would be extremely helpful is if you could point me to some<br>
> > documentation which I haven't been able to find: I'm looking for a listing<br>
> > of all the events/messages that are sent out by the amqp client to a<br>
> > subscriber. What does it send when it goes down, what other soft errors<br>
> > will it send out, etc. Additionally, is there a doc somewhere for best<br>
> > practices in connecting a listener to another server/long-running process?<br>
> > Without either of those, it has been a struggle to know how to<br>
> > restart the subscription/listening process if my server dies. The<br>
> > amqp_client tutorial has been a great help, but when it comes to error<br>
> > handling from the listening module perspective it doesn't tell me what the<br>
> > library is expecting me to do. I don't want to have to do a bunch of<br>
> > engineering because I'm square peg, round hole-ing the library. The primary<br>
> > issues I'm concerned with are when my server dies hard and is destined to be<br>
> > restarted by its supervisor what should I send to the amqp client process?<br>
> > Should I send it close messages and then start a new one? Or should I<br>
> > reconnect to the client library? This wouldn't be as big an issue, but I<br>
> > need to use durable/persistent queues and if I still have a listener hanging<br>
> > around with the same bindings on the same queue it will eat all my messages<br>
> > and send them nowhere.<br>
> ><br>
> > Thanks,<br>
> > -Max<br>
> ><br>
> > On Thu, Jun 30, 2011 at 7:48 AM, Matthew Sackman <<a href="mailto:matthew@rabbitmq.com">matthew@rabbitmq.com</a>> wrote:<br>
> ><br>
> >> Hi Max,<br>
> >><br>
> >> On Wed, Jun 29, 2011 at 06:28:59PM -0400, Max Warnock wrote:<br>
> >> > I've built a behavior in erlang to subscribe to a given topic exchange<br>
> >> and<br>
> >> > farm out message handling. I'm using the rabbitmq amqp_client library<br>
> >> for<br>
> >> > erlang and when I put the system under heavy load I get, on occasion,<br>
> >> the<br>
> >> > following error:<br>
> >><br>
> >> Could you let us know which version of Rabbit, Erlang and the Erlang<br>
> >> client you're using?<br>
> >><br>
> >> > =ERROR REPORT==== 29-Jun-2011::18:02:18 ===<br>
> >> > ** Generic server <0.1117.0> terminating<br>
> >> > ** Last message in was {'$gen_cast',<br>
> >> > {method,<br>
> >> > {'channel.close',406,<br>
> >> > <<"PRECONDITION_FAILED - unknown<br>
> >> delivery<br>
> >> > tag 856">>,<br>
> >> > 60,80},<br>
> >><br>
> >> That's a double-ack (probably). Sadly, the AMQP 0-9-1 spec says that<br>
> >> acking is not idempotent, thus it's a fault to ack the same message<br>
> >> multiple times...<br>
> >><br>
> >> > The server receive loop where the ack happens looks like this:<br>
> >> > receive<br>
> >> > ...<br>
> >> > {#'basic.deliver'{delivery_tag = Tag, routing_key = RoutingKey},<br>
> >> > #amqp_msg{payload = Payload}} -><br>
> >> > amqp_channel:cast(get(amqp_channel_pid), #'basic.ack'{delivery_tag =<br>
> >> > Tag}),<br>
> >> > spawn_and_queue(spawn_handle_message, Module, RoutingKey, Payload),<br>
> >> > loop(Module);<br>
> >> > ...<br>
> >> > end<br>
> >><br>
> >> ...hmmm, which is so simple that I can't see how it could go wrong: if<br>
> >> you're not double acking then something else must be going on to make<br>
> >> the broker think that it's not expecting an ack for that message, hence<br>
> >> the error. If you're doing some sort of reject operation - either<br>
> >> basic.nack or basic.reject on messages and you then subsequently ack one<br>
> >> of those messages then that would also cause this error. There may be<br>
> >> other cases as well.<br>
> >><br>
> >> > The amqp_client_sup can't seem to bring back the client either and<br>
> >> dies<br>
> >> > from the retry intensity being reached. I've done a hefty amount of<br>
> >> > googling and can't seem to find where things could be going wrong.<br>
> >> Before<br>
> >> > jumping into the amqp_client code I thought I'd ask the mailing list if<br>
> >> they<br>
> >> > have any ideas. The only thing I can think is that there is a race<br>
> >> > condition within the client library. I will be double checking my code<br>
> >> to<br>
> >> > be sure it isn't sending the ack twice, but given the simplicity of the<br>
> >> ack<br>
> >> > the only way it could is if it receives the same message (with identical<br>
> >> > delivery tag) from the amqp_client library twice.<br>
> >><br>
> >> It could be a bug in the client library, but I'd be a little surprised<br>
> >> if we're managing to duplicate messages somehow - that would be a new<br>
> >> level of fail for us. ;) However, the fact that the entire connection<br>
> >> dies is alarming and almost certainly a bug: PRECONDITION_FAILED is a<br>
> >> soft error and should only tear down the channel, not the whole<br>
> >> connection. After that, all you should have to do is create a new<br>
> >> channel and everything else should be ok. If that's not the case please<br>
> >> let us know.<br>
> >><br>
> >> Best wishes,<br>
> >><br>
> >> Matthew<br>
> >> _______________________________________________<br>
> >> rabbitmq-discuss mailing list<br>
> >> <a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a><br>
> >> <a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>
> >><br>
> ><br>
> ><br>
<br>
<br>
</div></div></blockquote></div><br></div></div>