[rabbitmq-discuss] The rabbitmq-server stop command hangs

Tim Watson tim at rabbitmq.com
Fri Nov 30 16:46:25 GMT 2012


Actually the problem is in the AMQP client that shovel uses - in fact the scenario which triggers this behaviour is rather obscure. From your rabbit-sasl.log we can see that shovelB and its _realtime friend are not fully operational at the time you're shutting down. This is evident because in the trace, we can see that the shovel workers are still in the connection establishment phase:

stacktrace:  [{gen,do_call,4,[{file,"gen.erl"},{line,217}]},
            {gen_server,call,3,[{file,"gen_server.erl"},{line,184}]},
            {application,load1,2,[{file,"application.erl"},{line,95}]},
            {application,start,2,[{file,"application.erl"},{line,129}]},
            {amqp_connection,start,1,[]},
            {rabbit_shovel_worker,make_conn_and_chan,1,[]},
            {rabbit_shovel_worker,handle_cast,2,[]},
            {gen_server2,handle_msg,2,[]}]

In the sasl logs we see repeated entries indicating that there is a problem identifying the host, which is preventing the workers from establishing the shovel connection properly:

=CRASH REPORT==== 28-Nov-2012::14:07:52 ===
  crasher:
    initial call: amqp_gen_connection:init/1
    pid: <0.579.0>
    registered_name: []
    exception exit: {unexpected_msg,
                        {'EXIT',<0.573.0>,
                            {{badmatch,{error,unknown_host}},
                             [{rabbit_shovel_worker,make_conn_and_chan,1,[]},
                              {rabbit_shovel_worker,handle_cast,2,[]},
                              {gen_server2,handle_msg,2,[]},
                              {proc_lib,init_p_do_apply,3,
                                  [{file,"proc_lib.erl"},{line,227}]}]}}}
      in function  gen_server:terminate/6 (gen_server.erl, line 737)
    ancestors: [<0.577.0>,amqp_sup,<0.52.0>]
    messages: []
    links: [<0.577.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 2584
    stack_size: 24
    reductions: 974
  neighbours:


Now what is actually happening is this: the application controller for the node (which is part of the Erlang/OTP runtime system) is busy shutting down all the rabbit plugins and the rabbit application itself. Whilst this is happening however, the rabbit_shovel_worker is attempting to connect to 'server.local' and failing. Each time it fails, the shovel supervisor restarts it and tries again to establish an AMQP connection. This connection startup routine, specifically the amqp_connection:start function, contains code that calls the application controller to check if the required client infrastructure (i.e., the amqp_client application) is already running, and that call into the OTP application management APIs actually deadlocks when it is run inside an application shutdown sequence.

So to see this happen, an AMQP connection attempt has to fail and be restarted by its supervisor *just before* the application controller starts shutting down, *and* the restarted worker then has to attempt its connection *just after* the application controller has entered the shutdown phase. The shovel configuration, its reconnect_delay and non-availability of the source host (or other network oddities) all play a part here.
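
To make the circular wait concrete, here is a toy model in Python rather than Erlang. It is only a sketch under assumptions of mine: the controller is modelled as a single thread that serves one request at a time and, while handling "stop", waits for the worker to exit before serving anything else. None of the names (controller_loop, call, simulate, PATIENCE) come from RabbitMQ or OTP, and the short timeouts stand in for waits that are effectively unbounded in the real system.

```python
import queue
import threading

PATIENCE = 0.2  # seconds; hypothetical stand-in for "forever" in the real system


def controller_loop(inbox, worker_done, shutdown_entered):
    """Toy application controller: one thread, one request at a time."""
    while True:
        request, reply = inbox.get()
        if request == "stop":
            shutdown_entered.set()
            # While waiting for the worker to exit, nothing else is served.
            finished = worker_done.wait(timeout=PATIENCE)
            reply.put("stopped" if finished else "stuck")
            return
        reply.put("ok")  # e.g. "is the client application running?"


def call(inbox, request, timeout):
    """Synchronous call: enqueue a request and block until the reply arrives."""
    reply = queue.Queue()
    inbox.put((request, reply))
    try:
        return reply.get(timeout=timeout)
    except queue.Empty:
        return "no reply"


def simulate(connect_during_shutdown):
    """Run one scenario and report what the worker and the shutdown saw."""
    inbox = queue.Queue()
    worker_done = threading.Event()
    shutdown_entered = threading.Event()
    outcome = {}

    def worker():
        if connect_during_shutdown:
            # The restarted worker connects just after shutdown has begun.
            shutdown_entered.wait()
        outcome["worker"] = call(inbox, "is_running", timeout=2 * PATIENCE)
        worker_done.set()

    threading.Thread(target=controller_loop,
                     args=(inbox, worker_done, shutdown_entered),
                     daemon=True).start()
    w = threading.Thread(target=worker)
    w.start()
    if not connect_during_shutdown:
        w.join()  # the connection attempt completes before shutdown starts
    outcome["shutdown"] = call(inbox, "stop", timeout=4 * PATIENCE)
    w.join()
    return outcome


if __name__ == "__main__":
    print(simulate(connect_during_shutdown=False))
    print(simulate(connect_during_shutdown=True))
```

In the first run the worker's call is answered before the stop request arrives, so the shutdown completes. In the second run the worker's call sits unanswered behind the stop request while the controller waits for that same worker, so each side is waiting on the other; only the demo timeouts let the program finish.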

Cheers,
Tim

On 30 Nov 2012, at 15:57, Elizabeth Liao wrote:

> Thanks for all your help as well.  Just one question, was the problem just in the shovel plugin?
> ________________________________________
> From: rabbitmq-discuss-bounces at lists.rabbitmq.com [rabbitmq-discuss-bounces at lists.rabbitmq.com] on behalf of Tim Watson [tim at rabbitmq.com]
> Sent: Friday, November 30, 2012 10:35 AM
> To: Discussions about RabbitMQ
> Subject: Re: [rabbitmq-discuss] The rabbitmq-server stop command hangs
> 
> Thank you Liz,
> 
> We've now found the problem, filed a bug and fixed it. Hopefully the fix will be released in 3.0.1 in the very near future. :)
> 
> Thanks again for reporting this, and providing us with all the information that has helped us track it down!
> 
> Tim
> 
> On 30 Nov 2012, at 15:18, Elizabeth Liao wrote:
> 
>>> One more question please. Is the node that you're trying to shut down hanging *indefinitely* or for a very long time, or for a minute or so?
>> 
>> In all cases for which I've sent logs/trace outputs, the shutdown hangs indefinitely.   I've also seen instances where it hangs for a very long time (~5 minutes) but those were not reproducible.
>> 
>>> When the rabbit is stuck shutting down, is the source host for shovelB and shovelB_realtime (server.local) accessible - i.e., you can ping it, telnet to the amqp port, etc? It seems these two shovels have never got properly started, and they're hung (still trying to establish connections/channels) when you're trying to shut down. We are filing a bug and looking at how to fix this, but it would be helpful for me to understand the topology so I can simulate this bug when producing a fix.
>> 
>> I checked the connections to and from server.local using ping and by telnetting to the amqp port, and both look okay at the time the shutdown is hanging.
>> 
>> Other information that may be of use:
>> * server.local does not have any special configuration (no rabbitmq.config file)
>> * I can reproduce this more reliably with 2.8.7 than 3.0.0
>> * We're initiating the shutdown shortly after bootup
>> 
>> Liz
>> 
>> 
>> Email Confidentiality Notice
>> 
>> The information contained in this transmission is confidential, proprietary or privileged and may be subject to protection under the law. This message is intended for the sole use of the individual or entity to whom it's addressed. If you are not the intended recipient, you are notified that any use, distribution or copying of the message is strictly prohibited and may subject you to criminal or civil penalties. If you received this transmission in error, please contact the sender immediately by replying to this email and delete the material from any computer.
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> 


