[rabbitmq-discuss] precondition_failed error with amqp_client for erlang

Max Warnock maxjwarnock at gmail.com
Tue Jul 5 16:28:46 BST 2011


Sorry about the ambiguity,

For the sake of clarity here is the glossary of terms I used in my last
email (which probably clashes with the erlang/amqp_client context you're
coming from):

Server - my long-running Erlang process, which I have given a registered
name and pass as the 3rd argument to amqp_channel:subscribe/3.
Listener - the process created by the amqp_client library when a connection
and channel are opened
Subscribe - calling amqp_channel:subscribe/3
My - I'm using this pronoun to distinguish code written by me from code
written by you cats at rabbitmq (client library, rabbitmq server, etc)

I've attached a diagram (an approximation/abstraction) of how I'm
interacting with the amqp_client library. (Sorry to the mailing list if
attaching a 40K diagram breaks etiquette.)

I'm using the amqp_client library in network mode, i.e.,
amqp_client:start(network, #amqp_params{host = Host, heartbeat=60000})

Yes, a list of the messages that the amqp_client process sends to a
subscriber, particularly those pertaining to errors in amqp_client land,
would be very helpful.  I'd like to be able to handle all {'DOWN',Etc}
messages with my long running process (server).  I'm hoping to handle all
hard errors so that a restart from either supervisor (my long running
process's or the amqp_client's) won't break the communication between the
two.
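
To make concrete what I mean by handling {'DOWN',Etc} messages, here is a
minimal sketch in plain Erlang.  It does not touch amqp_client at all:
fake_client/0 is a hypothetical stand-in for the library's process, and the
module name, the reasons I match on, and the 1000 ms timeout are all my
assumptions.  Which reasons the real library actually exits with is exactly
what I'm asking about.

```erlang
-module(down_sketch).
-export([run/0]).

%% Hypothetical stand-in for the amqp_client process: it waits until it is
%% told to stop and then exits with the given reason.
fake_client() ->
    receive
        {stop, Reason} -> exit(Reason)
    end.

%% Wait for the 'DOWN' message that belongs to this monitor and classify
%% the exit reason.  Gives up after 1000 ms (an arbitrary choice).
await_down(Ref, Pid) ->
    receive
        {'DOWN', Ref, process, Pid, normal}   -> {stopped, normal};
        {'DOWN', Ref, process, Pid, shutdown} -> {stopped, shutdown};
        {'DOWN', Ref, process, Pid, Reason}   -> {crashed, Reason}
    after 1000 ->
        timeout
    end.

%% Monitor the stand-in, make it die with a non-normal reason, and report
%% how the death was classified.
run() ->
    Pid = spawn(fun fake_client/0),
    Ref = erlang:monitor(process, Pid),
    Pid ! {stop, connection_lost},
    await_down(Ref, Pid).
```

The idea is that my server would monitor the listener rather than link to
it, so that it hears about the listener's death without being taken down,
and can then decide whether to resubscribe.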

But I get the impression that I'm missing something about how I'm supposed
to treat the amqp_client library with regard to amqp_client:start/2.  Should
I (a) treat the amqp_client connection like mnesia (an application entirely
independent of mine), (b) add it to my existing supervision tree and share
one connection throughout my application, or (c) keep doing what I do now
and let each part of my application that needs to talk to amqp spin up and
close its own connection/channel?

Thanks,
-Max

On Mon, Jul 4, 2011 at 12:16 PM, Alexandru Scvorţov
<alexandru at rabbitmq.com> wrote:

> Hi Max,
>
> I'm trying to run through the steps you provided, but I'm having a bit
> of trouble following.
>
> Are you using a network or a direct connection? (I assume network, but
> it probably doesn't matter)
>
> By server, do you mean the actual RabbitMQ server, or your application?
> (I'm guessing your long-running application)
>
> By subscribe, do you mean calling amqp_channel:subscribe/3?  If so, do
> you still need a list of the messages the channel may send its
> subscriber?
>
> Or do you mean that your application is sending messages to its
> subscribers?
>
> > 6.) The server supervisor restarts the server which creates a new
> > listener, but the old listener is still hanging around trying to send
> > to the registered name
>
> What's a listener?  Is it a process that receives messages from the
> erlang client because it's the endpoint of a subscription to a queue?
>
> Can't you link listeners to the server so that when the server goes
> down, it takes the listeners with it?
>
> > So my question then is how should I kill the amqp_client?
>
> What do you mean by amqp_client?  If it's an amqp_connection process,
> you can just send it a shutdown exit.
>
> Thanks for the information.
>
> Cheers,
> Alex
>
> On Fri, Jul 01, 2011 at 01:51:10PM -0400, Max Warnock wrote:
> > Problem found.  Thanks for your help.  The problem is a strange one and
> > has to do with me not shutting my amqp_client listener down properly if
> > my server dies.  Here is how it manifests:
> >
> > 1.) Server starts up and starts up an amqp client connection and channel
> > 2.) The server binds to that channel and starts the subscription using a
> > registered name as the process to which messages will be sent
> > 3.) Messages start coming in and are ack-ing fine
> > 4.) Poor error handling in farming out processes brings the server down
> > 5.) The server does not close the amqp_client connection
> > 6.) The server supervisor restarts the server which creates a new
> > listener, but the old listener is still hanging around trying to send
> > to the registered name
> > 7.) The older listener sends a message to the server
> > 8.) The server tries to ack to the new listener which did not send the
> > message
> > 9.) The new server pukes because it never sent a message with that tag
> > So my question then is how should I kill the amqp_client? If I send it
> > an exit its supervisor will restart it.  This is what I was getting at
> > with my tangential questions in the last email.  How should I shut down
> > the amqp_client without shutting down all the other servers' amqp client
> > listeners?
> >
> > Thanks for all the help,
> > -Max
> >
> > On Thu, Jun 30, 2011 at 9:23 AM, Max Warnock <maxjwarnock at gmail.com> wrote:
> >
> > > Thanks, that's very helpful from both the possible issues to chase and
> > > sanity check perspectives.
> > >
> > > I'm using erlang R13B04 with a rabbitmq server installed via gentoo's
> > > portage at version 2.4.1. I pulled the client library from github (tag
> > > 2.3.0, commit: 844738f9b56d34104c1ea2ac5700d0898126c5b4).
> > >
> > > I'm going to write some debug code to store all the tags I try to ack
> > > on and see if I can get this error to where it's easily reproducible.
> > > Thanks for narrowing my search, it's very helpful.  I'll keep you
> > > updated. I must be doing something wrong somewhere.  I have a hard
> > > time believing such a widely used library could fail so hard myself.
> > >
> > > One thing that would be extremely helpful is if you could point me to
> > > some documentation which I haven't been able to find:  I'm looking for
> > > a listing of all the events/messages that are sent out by the amqp
> > > client to a subscriber.  What does it send when it goes down, what
> > > other soft errors will it send out, etc.  Additionally, is there a doc
> > > somewhere for best practices in connecting a listener to another
> > > server/long-running process?  Not having either of those, it has been
> > > a struggle to know how to restart the subscription/listening process
> > > if my server dies.  The amqp_client tutorial has been a great help,
> > > but when it comes to error handling from the listening module
> > > perspective it doesn't tell me what the library is expecting me to do.
> > > I don't want to have to do a bunch of engineering because I'm square
> > > peg, round hole-ing the library.  The primary issue I'm concerned
> > > with is this: when my server dies hard and is destined to be restarted
> > > by its supervisor, what should I send to the amqp client process?
> > > Should I send it close messages and then start a new one? Or should I
> > > reconnect to the client library?  This wouldn't be as big of an issue
> > > but I need to use durable/persistent queues and if I still have a
> > > listener hanging around with the same bindings on the same queue it
> > > will eat all my messages and send them nowhere.
> > >
> > > Thanks,
> > > -Max
> > >
> > > On Thu, Jun 30, 2011 at 7:48 AM, Matthew Sackman <matthew at rabbitmq.com> wrote:
> > >
> > >> Hi Max,
> > >>
> > >> On Wed, Jun 29, 2011 at 06:28:59PM -0400, Max Warnock wrote:
> > >> > I've built a behavior in erlang to subscribe to a given topic
> > >> > exchange and farm out message handling.  I'm using the rabbitmq
> > >> > amqp_client library for erlang and when I put the system under heavy
> > >> > load I get, on occasion, the following error:
> > >>
> > >> Could you let us know which version of Rabbit, Erlang and the Erlang
> > >> client you're using?
> > >>
> > >> > =ERROR REPORT==== 29-Jun-2011::18:02:18 ===
> > >> > ** Generic server <0.1117.0> terminating
> > >> > ** Last message in was {'$gen_cast',
> > >> >                            {method,
> > >> >                                {'channel.close',406,
> > >> >                                    <<"PRECONDITION_FAILED - unknown delivery tag 856">>,
> > >> >                                    60,80},
> > >>
> > >> That's a double-ack (probably). Sadly, the AMQP 0-9-1 spec says that
> > >> acking is not idempotent, thus it's a fault to ack the same message
> > >> multiple times...
> > >>
> > >> > The server receive loop where the ack happens looks like this:
> > >> > receive
> > >> > ...
> > >> > {#'basic.deliver'{delivery_tag = Tag, routing_key = RoutingKey},
> > >> >  #amqp_msg{payload = Payload}} ->
> > >> >     amqp_channel:cast(get(amqp_channel_pid),
> > >> >                       #'basic.ack'{delivery_tag = Tag}),
> > >> >     spawn_and_queue(spawn_handle_message, Module, RoutingKey, Payload),
> > >> >     loop(Module);
> > >> > ...
> > >> > end
> > >>
> > >> ...hmmm, which is so simple that I can't see how it could go wrong: if
> > >> you're not double acking then something else must be going on to make
> > >> the broker think that it's not expecting an ack for that message,
> > >> hence the error. If you're doing some sort of reject operation - either
> > >> basic.nack or basic.reject on messages and you then subsequently ack
> > >> one of those messages then that would also cause this error. There may
> > >> be other cases as well.
> > >>
> > >> > The amqp_client_sup can't seem to bring back the client either and
> > >> > dies from the retry intensity being reached.  I've done a hefty
> > >> > amount of googling and can't seem to find where things could be going
> > >> > wrong.  Before jumping into the amqp_client code I thought I'd ask
> > >> > the mailing list if they have any ideas.  The only thing I can think
> > >> > is that there is a race condition within the client library.  I will
> > >> > be double checking my code to be sure it isn't sending the ack twice,
> > >> > but given the simplicity of the ack the only way it could is if it
> > >> > receives the same message (with identical delivery tag) from the
> > >> > amqp_client library twice.
> > >>
> > >> It could be a bug in the client library, but I'd be a little surprised
> > >> if we're managing to duplicate messages somehow - that would be a new
> > >> level of fail for us. ;) However, the fact that the entire connection
> > >> dies is alarming and almost certainly a bug: PRECONDITION_FAILED is a
> > >> soft error and should only tear down the channel, not the whole
> > >> connection. After that, all you should have to do is create a new
> > >> channel and everything else should be ok. If that's not the case
> > >> please let us know.
> > >>
> > >> Best wishes,
> > >>
> > >> Matthew
> > >> _______________________________________________
> > >> rabbitmq-discuss mailing list
> > >> rabbitmq-discuss at lists.rabbitmq.com
> > >> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> > >>
> > >
> > >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gen_amqp_subscriberinteractionwithamqp_client.png
Type: image/png
Size: 40314 bytes
Desc: not available
URL: <http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/attachments/20110705/2da3e2ee/attachment.png>

