[rabbitmq-discuss] Problem with publish confirms

Simon MacMullen simon at rabbitmq.com
Mon Apr 7 11:20:19 BST 2014


It's going to be hard to debug this without seeing your code, especially 
since it sounds like it is quite likely to be a general Erlang problem.

Having said that, here are some general pointers:

On 04/04/2014 20:44, Ryan Brown wrote:
<snip>
> Then on publish, I select which one to use based
> on the values passed-down the stack from message intake. Set self() as
> the confirm handler and respond with a wait so that up the stack I can
> initiate a receive loop in the message intake which is not a gen_server.

Have you seen amqp_channel:wait_for_confirms/1? That might be easier 
than handling the confirms yourself if all you are going to do is wait 
for them anyway.
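
For what it's worth, a minimal and untested sketch of that approach might look like the following. It assumes Channel is an already-open channel from the Erlang client; the module name, the function name and its arguments are made up for illustration and are not taken from your code.

```erlang
-module(confirm_sketch).
-export([publish_and_wait/4]).

-include_lib("amqp_client/include/amqp_client.hrl").

%% Hypothetical helper: publish one message and block until the broker
%% has confirmed it. Channel must be an open amqp_channel pid;
%% Exchange, RoutingKey and Payload are binaries.
publish_and_wait(Channel, Exchange, RoutingKey, Payload) ->
    %% Put the channel into confirm mode (needed once per channel).
    #'confirm.select_ok'{} = amqp_channel:call(Channel, #'confirm.select'{}),
    Publish = #'basic.publish'{exchange = Exchange, routing_key = RoutingKey},
    ok = amqp_channel:cast(Channel, Publish, #amqp_msg{payload = Payload}),
    %% Blocks until every message published so far on this channel has
    %% been acked (true) or at least one has been nacked (false).
    amqp_channel:wait_for_confirms(Channel).
```

With this there is no need to register self() as the confirm handler or to run your own receive loop for basic.ack.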

> I have debug messages throughout and it appears that everything is
> happening as expected. I trace the message through the publish path. To
> being published and passing the wait back up to the receiver. I also see
> where the basic.ack is received and the publish_confirm acknowledgement
> passed all the way up to the receive loop. However, then things go awry.
> I get the attached dump.

It's very hard to see what is going wrong here in detail, but the 
surface issue is that your gen_server took more than 5 seconds to 
respond to a request - note that gen_server:call/2 will time out after 5 
seconds.

For this reason we never use gen_server:call/2 (we have a CI task that 
explodes if we accidentally commit code using it!); we always use call/3 
with a timeout of 'infinity'. IMO the default 5 second timeout is a 
misfeature in Erlang.
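
As a rough illustration of the difference, here is a toy gen_server. The module and its {work, Millis} request are invented for this example and have nothing to do with amqp_balanced_publisher. The only point is the third argument to gen_server:call/3.

```erlang
-module(slow_server).
-behaviour(gen_server).

-export([start_link/0, work/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Ask the server to spend Millis milliseconds on a request.
%% gen_server:call(Pid, Request) would make the caller exit with a
%% timeout after 5000 ms; passing 'infinity' disables that timeout.
work(Pid, Millis) ->
    gen_server:call(Pid, {work, Millis}, infinity).

init([]) ->
    {ok, no_state}.

handle_call({work, Millis}, _From, State) ->
    timer:sleep(Millis),   % stand-in for a slow operation
    {reply, done, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Calling slow_server:work(Pid, 6000) returns done after about six seconds, whereas the same request made through call/2 would crash the caller at the five second mark.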

> My take-away, although likely I am wrong, is that somewhere, apparently
> in amqp_balanced_publisher, a gen_server call is timing-out while
> waiting for a response. But, I seem to be able to trace the responses
> through the whole stack so I am thoroughly confused.

That's correct. Of course, it should be quite rare (it would need heavy 
load) for the 5s timeout to fire just because the operation took too 
long. It's quite possible that the gen_server you are calling into is 
deadlocked or stuck in some other way.

I would debug this as follows:

1) Change the call/2 to call/3 with infinity.
2) If the program now hangs rather than times out, launch 'observer' 
(http://www.erlang.org/doc/apps/observer/observer_ug.html), go and find 
the server you are calling into, and see what its current stacktrace 
is. You may wish to give this process a registered name temporarily so 
you can find it more easily.
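
If you would rather do step 2 from the shell, something along these lines should work. This is only a sketch: the module and function names are invented, and it assumes you already have the pid of the server you are calling into.

```erlang
-module(stuck_sketch).
-export([name_and_inspect/2]).

%% Hypothetical helper: register Pid under Name so that it is easy to
%% spot in observer's process list, then return its current stacktrace.
%% Crashes with badarg if Name is taken or Pid is already registered,
%% and with badmatch if the process is no longer alive.
name_and_inspect(Name, Pid) when is_atom(Name), is_pid(Pid) ->
    true = erlang:register(Name, Pid),
    {current_stacktrace, Stack} =
        erlang:process_info(Pid, current_stacktrace),
    Stack.
```

After that, observer:start() will show the process under the name you chose, and the returned list of {Module, Function, Arity, Location} tuples tells you where it is currently sitting.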

Cheers, Simon


