[rabbitmq-discuss] Problem with publish confirms

Ryan Brown ryankbrown at gmail.com
Mon Apr 7 18:07:17 BST 2014


You rock, Simon! So much great information. I found my problem, and it was
indeed a gen_server timeout: a return was missing in my confirmation chain
on the way back up to the REST interface, where I respond to the client
that posted the message for publishing.
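
For anyone who hits the same symptom, here is a minimal, self-contained
sketch of that failure mode. It is not my actual code: the module name and
the request atoms (replies, forgets_to_reply) are made up for illustration.
A handle_call clause that returns noreply and never calls gen_server:reply/2
leaves the caller waiting until the call times out. The sketch uses call/3
with a 100 ms timeout to keep it quick; call/2 would wait 5 seconds.

```erlang
-module(missed_reply_demo).
-behaviour(gen_server).

-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical demo entry point: one call that is answered, one that is not.
run() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    %% This clause replies, so the call returns normally.
    ok = gen_server:call(Pid, replies),
    %% This clause never replies, so the caller exits with a timeout reason.
    %% 100 ms is used here only to keep the demo short.
    Result = try gen_server:call(Pid, forgets_to_reply, 100)
             catch exit:{timeout, _} -> timed_out
             end,
    gen_server:stop(Pid),
    Result.

init([]) ->
    {ok, undefined}.

handle_call(replies, _From, State) ->
    {reply, ok, State};
handle_call(forgets_to_reply, _From, State) ->
    %% Bug: returns noreply, and nothing ever calls gen_server:reply/2.
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling missed_reply_demo:run() should return timed_out. With 'infinity' as
the timeout, the second call would hang instead, which is the situation
observer helps to diagnose.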

However, that said, I am curious about the option of just using
amqp_channel:wait_for_confirms/1. I did not use this option for a couple of
reasons.

First, using the registered handler is mentioned quite often as the
preferred option. My understanding was that calling
amqp_channel:wait_for_confirms would make the channel wait for a response
before another message could be published on that channel. If this is
true, it seems like it could slow down publishing more than registering a
handler and letting publishing continue. Is this correct? Or am I
misunderstanding the whole workflow?

Speaking of misunderstandings, the second reason I did not use
wait_for_confirms was that I didn't completely understand the definition:

"Wait until all messages published since the last call have been either
ack'd or nack'd by the broker; or until timeout elapses."

What exactly does it mean by the "last call"? This sounded to me as if,
when calls come in rapidly enough that new messages are published before
the previous message has been ack'd, it would not respond until all such
queued messages were ack'd/nack'd. I realized this was unlikely, but I
could not find documentation clear enough to convince me otherwise, and
this would not work for me as I need to respond to each publish
individually.
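
To make the question concrete, this is roughly the usage I have in mind. It
is only a sketch under assumptions: the module and function names are
hypothetical, Channel is assumed to be an already-open amqp_channel pid used
by a single publishing process, and a running broker is required. It does
not settle how "last call" behaves when several processes share one channel.

```erlang
-module(confirm_sketch).
-include_lib("amqp_client/include/amqp_client.hrl").

-export([enable_confirms/1, publish_and_wait/4]).

%% Hypothetical helper: put the channel into confirm mode once, up front.
enable_confirms(Channel) ->
    #'confirm.select_ok'{} = amqp_channel:call(Channel, #'confirm.select'{}),
    ok.

%% Hypothetical helper: publish one message, then block the calling process
%% until the broker has ack'd or nack'd what was published on Channel.
publish_and_wait(Channel, Exchange, RoutingKey, Payload) ->
    Publish = #'basic.publish'{exchange = Exchange,
                               routing_key = RoutingKey},
    ok = amqp_channel:cast(Channel, Publish, #amqp_msg{payload = Payload}),
    case amqp_channel:wait_for_confirms(Channel) of
        true    -> ack;      %% everything outstanding was ack'd
        false   -> nack;     %% at least one message was nack'd
        timeout -> timeout   %% may be returned by some client versions
    end.
```

Exchange, RoutingKey and Payload are binaries here, e.g.
publish_and_wait(Channel, <<"my_exchange">>, <<"my_key">>, <<"hello">>),
where the exchange and key names are placeholders.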

Thanks again for the characteristically excellent and comprehensive
response.

Best,

Ryan


On Mon, Apr 7, 2014 at 4:20 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> It's going to be hard to debug this without seeing your code, especially
> since it sounds like it is quite likely to be a general Erlang problem.
>
> Having said that, here are some general pointers:
>
> On 04/04/2014 20:44, Ryan Brown wrote:
> <snip>
>
>> Then on publish, I select which one to use based
>> on the values passed-down the stack from message intake. Set self() as
>> the confirm handler and respond with a wait so that up the stack I can
>> initiate a receive loop in the message intake which is not a gen_server.
>>
>
> Have you seen amqp_channel:wait_for_confirms/1? That might be easier than
> handling the confirms yourself if all you are going to do is wait for them
> anyway.
>
>  I have debug messages throughout and it appears that everything is
>> happening as expected. I trace the message through the publish path. To
>> being published and passing the wait back up to the receiver. I also see
>> where the basic.ack is received and the publish_confirm acknowledgement
>> passed all the way up to the receive loop. However, then things go awry.
>> I get the attached dump.
>>
>
> It's very hard to see what is going wrong here in detail, but the surface
> issue is that your gen_server took more than 5 seconds to respond to a
> request - note that gen_server:call/2 will time out after 5 seconds.
>
> For this reason we never use gen_server:call/2 (we have a CI task that
> explodes if we accidentally commit code using it!); we always use call/3
> with a timeout of 'infinity'. IMO this is a misfeature in Erlang.
>
>  My take-away, although likely I am wrong, is that somewhere, apparently
>> in amqp_balanced_publisher, a gen_server call is timing-out while
>> waiting for a response. But, I seem to be able to trace the responses
>> through the whole stack so I am thoroughly confused.
>>
>
> That's correct. Of course, it should be quite rare / require heavy load
> for the 5s timeout to fail because the operation simply took too long. It's
> quite possible that the gen_server you are calling into is deadlocked or
> stuck in some other way.
>
> I would debug this as follows:
>
> 1) Change the call/2 to call/3 with infinity.
> 2) If the program now hangs rather than times out, launch 'observer' (
> http://www.erlang.org/doc/apps/observer/observer_ug.html), go and find
> the server you are calling into, and see what its current stacktrace is.
> You may wish to give this process a registered name temporarily so you can
> find it more easily.
>
> Cheers, Simon
>
>


-- 
-rb

