[rabbitmq-discuss] precondition_failed error with amqp_client for erlang

Max Warnock maxjwarnock at gmail.com
Fri Jul 1 18:51:10 BST 2011


Problem found.  Thanks for your help.  The problem is a strange one and has
to do with me not shutting my amqp_client listener down properly if my
server dies.  Here is how it manifests:

1.) Server starts up and opens an amqp_client connection and channel
2.) The server binds to that channel and starts the subscription using a
registered name as the process to which messages will be sent
3.) Messages start coming in and are ack-ing fine
4.) Poor error handling in farming out processes brings the server down
5.) The server does not close the amqp_client connection
6.) The server supervisor restarts the server, which creates a new listener,
but the old listener is still hanging around trying to send to the
registered name
7.) The older listener sends a message to the server
8.) The server tries to ack to the new listener, which did not send the
message
9.) The new listener pukes because it never sent a message with that tag
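
A minimal sketch of a guard against step 8, assuming the server keeps the
consumer tag it got back in #'basic.consume_ok'{} for its current
subscription and that the broker generated that tag (so tags differ between
the old and new listener). The module name, loop/2 and handle_payload/1 are
hypothetical; only the records and amqp_channel:cast/2 come from the client
library.

```erlang
-module(stale_delivery_guard).
-include_lib("amqp_client/include/amqp_client.hrl").
-export([loop/2]).

%% Channel: the channel opened by this incarnation of the server.
%% CTag:    the consumer tag from #'basic.consume_ok'{} on that channel.
loop(Channel, CTag) ->
    receive
        %% CTag is already bound, so this clause only matches deliveries
        %% that belong to our own subscription.
        {#'basic.deliver'{consumer_tag = CTag, delivery_tag = Tag},
         #amqp_msg{payload = Payload}} ->
            amqp_channel:cast(Channel, #'basic.ack'{delivery_tag = Tag}),
            handle_payload(Payload),
            loop(Channel, CTag);
        %% Any other consumer tag means a leftover listener sent this to
        %% the registered name. Acking it on our channel would trigger
        %% PRECONDITION_FAILED, so it is logged and dropped.
        {#'basic.deliver'{consumer_tag = Other}, _Msg} ->
            error_logger:warning_msg(
              "ignoring delivery from stale consumer ~p~n", [Other]),
            loop(Channel, CTag);
        stop ->
            ok
    end.

%% Hypothetical placeholder for the real message handling.
handle_payload(_Payload) ->
    ok.
```

Dropped messages stay unacked on the old channel, so this only hides the
symptom until the old channel is actually closed.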

So my question then is how should I kill the amqp_client? If I send it an
exit its supervisor will restart it.  This is what I was getting at with my
tangential questions in the last email.  How should I shut down the
amqp_client without shutting down all the other servers' amqp client
listeners?
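
One possible shape for this, sketched under assumptions rather than taken
from the library docs for this exact release: if each server owns its own
channel on a shared connection, closing only that channel should leave the
other servers' listeners alone. stop_listener/2 and run/3 are hypothetical
names; amqp_channel:call/2 with #'basic.cancel'{} and amqp_channel:close/1
are the client calls assumed to be available in the version in use.

```erlang
-module(listener_cleanup_sketch).
-include_lib("amqp_client/include/amqp_client.hrl").
-export([run/3, stop_listener/2]).

%% Run the server's receive loop (passed in as a fun) and make sure the
%% subscription and channel are torn down when the loop returns or throws.
run(Channel, CTag, LoopFun) ->
    try
        LoopFun()
    after
        stop_listener(Channel, CTag)
    end.

%% Cancel the consumer, then close only this server's channel. The shared
%% connection is left open on purpose. 'catch' is used because the channel
%% may already be dead by the time cleanup runs.
stop_listener(Channel, CTag) ->
    catch amqp_channel:call(Channel,
                            #'basic.cancel'{consumer_tag = CTag}),
    catch amqp_channel:close(Channel),
    ok.
```

The try ... after only covers exceptions raised inside the loop. If the
server is taken down by an exit signal it does not trap, the cleanup never
runs, and something else (for example a process monitoring the server) would
have to close the channel.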

Thanks for all the help,
-Max

On Thu, Jun 30, 2011 at 9:23 AM, Max Warnock <maxjwarnock at gmail.com> wrote:

> Thanks, that's very helpful from both the possible issues to chase and
> sanity check perspectives.
>
> I'm using erlang R13B04 with a rabbitmq server installed via gentoo's
> portage at version 2.4.1. I pulled the client library from github (tag
> 2.3.0, commit: 844738f9b56d34104c1ea2ac5700d0898126c5b4).
>
> I'm going to write some debug code to store all the tags I try to ack on
> and see if I can get this error to where it's easily reproducible. Thanks
> for narrowing my search, it's very helpful.  I'll keep you updated. I must
> be doing something wrong somewhere.  I have a hard time believing such a
> widely used library could fail so hard myself.
>
> One thing that would be extremely helpful is if you could point me to some
> documentation which I haven't been able to find:  I'm looking for a listing
> of all the events/messages that are sent out by the amqp client to a
> subscriber.  What does it send when it goes down, what other soft errors
> will it send out, etc.  Additionally, is there a doc somewhere for best
> practices in connecting a listener to another server/long-running process?
>  Without either of those, it has been a struggle to know how to
> restart the subscription/listening process if my server dies.  The
> amqp_client tutorial has been a great help, but when it comes to error
> handling from the listening module perspective it doesn't tell me what the
> library is expecting me to do.  I don't want to have to do a bunch of
> engineering because I'm square peg, round hole-ing the library.  The primary
> issue I'm concerned with is this: when my server dies hard and is destined
> to be restarted by its supervisor, what should I send to the amqp client
> process?  Should I send it close messages and then start a new one? Or should
> I reconnect to the client library?  This wouldn't be as big of an issue but I
> need to use durable/persistent queues and if I still have a listener hanging
> around with the same bindings on the same queue it will eat all my messages
> and send them nowhere.
>
> Thanks,
> -Max
>
> On Thu, Jun 30, 2011 at 7:48 AM, Matthew Sackman <matthew at rabbitmq.com> wrote:
>
>> Hi Max,
>>
>> On Wed, Jun 29, 2011 at 06:28:59PM -0400, Max Warnock wrote:
>> > I've built a behavior in erlang to subscribe to a given topic exchange and
>> > farm out message handling.  I'm using the rabbitmq amqp_client library for
>> > erlang and when I put the system under heavy load I get, on occasion, the
>> > following error:
>>
>> Could you let us know which version of Rabbit, Erlang and the Erlang
>> client you're using?
>>
>> > =ERROR REPORT==== 29-Jun-2011::18:02:18 ===
>> > ** Generic server <0.1117.0> terminating
>> > ** Last message in was {'$gen_cast',
>> >                            {method,
>> >                                {'channel.close',406,
>> >                                    <<"PRECONDITION_FAILED - unknown delivery tag 856">>,
>> >                                    60,80},
>>
>> That's a double-ack (probably). Sadly, the AMQP 0-9-1 spec says that
>> acking is not idempotent, thus it's a fault to ack the same message
>> multiple times...
>>
>> > The server receive loop where the ack happens looks like this:
>> > receive
>> > ...
>> > {#'basic.deliver'{delivery_tag = Tag, routing_key = RoutingKey},
>> > #amqp_msg{payload = Payload}} ->
>> >     amqp_channel:cast(get(amqp_channel_pid), #'basic.ack'{delivery_tag = Tag}),
>> >     spawn_and_queue(spawn_handle_message, Module, RoutingKey, Payload),
>> >     loop(Module);
>> > ...
>> > end
>>
>> ...hmmm, which is so simple that I can't see how it could go wrong: if
>> you're not double acking then something else must be going on to make
>> the broker think that it's not expecting an ack for that message, hence
>> the error. If you're doing some sort of reject operation - either
>> basic.nack or basic.reject on messages and you then subsequently ack one
>> of those messages then that would also cause this error. There may be
>> other cases as well.
>>
>> > The amqp_client_sup can't seem to bring back the client either and dies
>> > from the retry intensity being reached.  I've done a hefty amount of
>> > googling and can't seem to find where things could be going wrong.  Before
>> > jumping into the amqp_client code I thought I'd ask the mailing list if they
>> > have any ideas.  The only thing I can think is that there is a race
>> > condition within the client library.  I will be double checking my code to
>> > be sure it isn't sending the ack twice, but given the simplicity of the ack
>> > the only way it could is if it receives the same message (with identical
>> > delivery tag) from the amqp_client library twice.
>>
>> It could be a bug in the client library, but I'd be a little surprised
>> if we're managing to duplicate messages somehow - that would be a new
>> level of fail for us. ;) However, the fact that the entire connection
>> dies is alarming and almost certainly a bug: PRECONDITION_FAILED is a
>> soft error and should only tear down the channel, not the whole
>> connection. After that, all you should have to do is create a new
>> channel and everything else should be ok. If that's not the case please
>> let us know.
>>
>> Best wishes,
>>
>> Matthew