[rabbitmq-discuss] Problem with publish confirms
Simon MacMullen
simon at rabbitmq.com
Mon Apr 7 11:20:19 BST 2014
It's going to be hard to debug this without seeing your code, especially
since it sounds like it is quite likely to be a general Erlang problem.
Having said that, here are some general pointers:
On 04/04/2014 20:44, Ryan Brown wrote:
<snip>
> Then on publish, I select which one to use based
> on the values passed down the stack from message intake. I set self() as
> the confirm handler and respond with a wait so that, up the stack, I can
> initiate a receive loop in the message intake, which is not a gen_server.
Have you seen amqp_channel:wait_for_confirms/1? That might be easier
than handling the confirms yourself if all you are going to do is wait
for them anyway.
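A minimal sketch of that approach follows. It assumes the Erlang AMQP client (`amqp_client`) is on the code path and that `Channel` is an already-open channel pid. The module and function names are hypothetical; only the `amqp_channel` calls and the records from `amqp_client.hrl` come from the client library. Confirm mode is enabled once per channel, and the publisher then blocks in `wait_for_confirms/1` instead of running its own receive loop.

```erlang
-module(confirm_publish).  %% hypothetical module name

-include_lib("amqp_client/include/amqp_client.hrl").

-export([enable_confirms/1, publish_and_wait/4]).

%% Put the channel into confirm mode. Do this once per channel,
%% before the first publish.
enable_confirms(Channel) ->
    #'confirm.select_ok'{} = amqp_channel:call(Channel, #'confirm.select'{}),
    ok.

%% Publish one message, then block until the broker has acked or
%% nacked every message published on this channel since the last wait.
publish_and_wait(Channel, Exchange, RoutingKey, Payload) ->
    Publish = #'basic.publish'{exchange = Exchange, routing_key = RoutingKey},
    ok = amqp_channel:cast(Channel, Publish, #amqp_msg{payload = Payload}),
    case amqp_channel:wait_for_confirms(Channel) of
        true  -> ok;                 % everything was acked
        false -> {error, nacked};    % at least one message was nacked
        Other -> {error, Other}      % e.g. a timeout, depending on client version
    end.
```

Because `wait_for_confirms/1` covers all outstanding messages on the channel, you could also publish a batch and wait once at the end rather than after every message; check the documentation for your client version regarding its timeout behaviour.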
> I have debug messages throughout and it appears that everything is
> happening as expected. I can trace the message through the publish path,
> to being published, and the wait being passed back up to the receiver. I also see
> where the basic.ack is received and the publish_confirm acknowledgement
> passed all the way up to the receive loop. However, then things go awry.
> I get the attached dump.
It's very hard to see what is going wrong here in detail, but the
surface issue is that your gen_server took more than 5 seconds to
respond to a request; note that gen_server:call/2 uses a fixed 5 second
timeout.
For this reason we never use gen_server:call/2 (we have a CI task that
explodes if we accidentally commit code using it!); we always use call/3
with a timeout of 'infinity'. IMO that 5 second default is a misfeature in Erlang.
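To make the difference concrete, here is a small sketch. `call_timeout_demo` is a hypothetical gen_server that sleeps for the requested time before replying. A short explicit timeout (50 ms) stands in for call/2's fixed 5000 ms so the example runs quickly; `gen_server:call(Pid, Req)` behaves like `gen_server:call(Pid, Req, 5000)`.

```erlang
-module(call_timeout_demo).  %% hypothetical module for illustration
-behaviour(gen_server).

-export([start_link/0, request/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Ask the server to sleep SleepMs before replying, waiting at most
%% Timeout (milliseconds or 'infinity') for the reply.
request(Pid, SleepMs, Timeout) ->
    try
        gen_server:call(Pid, {sleep, SleepMs}, Timeout)
    catch
        %% A timed-out call exits with {timeout, {gen_server, call, Args}}.
        exit:{timeout, _} -> timeout
    end.

init([]) ->
    {ok, no_state}.

handle_call({sleep, Ms}, _From, State) ->
    timer:sleep(Ms),
    {reply, done, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

With `infinity` the second request simply waits for its reply. Note that a timeout does not interrupt the server: it carries on with the request regardless, so a timeout on its own tells you little about what the server is actually doing.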
> My take-away, although likely I am wrong, is that somewhere, apparently
> in amqp_balanced_publisher, a gen_server call is timing out while
> waiting for a response. But, I seem to be able to trace the responses
> through the whole stack so I am thoroughly confused.
That's correct. Of course, it should be quite rare (or require heavy
load) for a call to hit the 5s timeout simply because the operation took
too long. It's quite possible that the gen_server you are calling into
is deadlocked or stuck in some other way.
I would debug this as follows:
1) Change the call/2 to call/3 with infinity.
2) If the program now hangs rather than times out, launch 'observer'
(http://www.erlang.org/doc/apps/observer/observer_ug.html), go and find
the server you are calling into, and see what its current stacktrace
is. You may wish to give this process a registered name temporarily so
you can find it more easily.
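The sketch below shows what step 2 can reveal in a deadlock of the kind suspected above. `deadlock_demo` is a hypothetical gen_server: when server A handles `{relay_via, B}` it calls B, and B's handler calls straight back into A, which is still busy inside its own handle_call. Both calls use `infinity`, so instead of a timeout you get two hung processes. Since they are registered (as `demo_a` and `demo_b`), their stacks can be fetched by name with `erlang:process_info/2` and the `current_stacktrace` item, the same information observer shows for a process.

```erlang
-module(deadlock_demo).  %% hypothetical module for illustration
-behaviour(gen_server).

-export([start/1, provoke/2, stack_of/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Register the server under Name so it is easy to find later.
start(Name) ->
    gen_server:start({local, Name}, ?MODULE, [], []).

%% Ask First to relay via Second; this creates a call cycle.
provoke(First, Second) ->
    spawn(fun() -> gen_server:call(First, {relay_via, Second}, infinity) end),
    timer:sleep(100),  % give both servers time to block
    ok.

%% Programmatic equivalent of inspecting the process in observer.
stack_of(Name) ->
    {current_stacktrace, Stack} =
        erlang:process_info(whereis(Name), current_stacktrace),
    Stack.

init([]) ->
    {ok, no_state}.

handle_call({relay_via, Peer}, _From, State) ->
    %% Still inside handle_call, so we cannot serve Peer's callback.
    Reply = gen_server:call(Peer, {call_back, self()}, infinity),
    {reply, Reply, State};
handle_call({call_back, Caller}, _From, State) ->
    %% Caller is blocked waiting for us, so this never returns.
    Reply = gen_server:call(Caller, ping, infinity),
    {reply, Reply, State};
handle_call(ping, _From, State) ->
    {reply, pong, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

After `provoke(demo_a, demo_b)`, the stack of each server should include `deadlock_demo:handle_call/3` waiting inside a gen_server call (the exact internal frames vary between OTP releases). Two servers each blocked calling the other is the typical signature of a call cycle.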
Cheers, Simon