[rabbitmq-discuss] rabbitmqctl stop hangs
Tim Watson
tim at rabbitmq.com
Wed Dec 11 10:52:06 GMT 2013
Hi Ed,
Would you please send as much of the logging information as possible? You can contact me offline about emailing/uploading it privately if you prefer, or simply scrub any private data from it beforehand. I'm going to need to see more of the stack traces in the logs to diagnose this issue properly.
Thanks,
Tim
On 11 Dec 2013, at 09:18, Tim Watson wrote:
> Thanks for the analysis Ed, this sounds like a bug - I'll take a walk through the code this morning and confirm that. If the shovel is "stuck" trying to establish a connection, that might well be an explanation. It's surprising that the supervision tree (for shovel workers) doesn't handle this scenario, but it may need to be reconfigured to handle delayed startup, or the workers may need to be re-worked to handle startup differently.
>
> I'll post back when I've validated those issues and let you know what my findings are.
>
> Cheers,
> Tim
>
> On 10 Dec 2013, at 23:40, "Tyrrill, Ed" <ed.tyrrill at emc.com> wrote:
>
>> Hi Tim,
>>
>> I realized I did not address all of your questions.
>>
>> > How long do these hangs take? The shovel workers will wait 10 seconds for both their inbound and outbound connections to close cleanly.
>>
>> I have one server I left untouched that hung four days ago, and it is still hung.
>>
>> > If you examine the log files for both the source and destination (i.e., remote) brokers during the shutdown, there may be some useful indication of whether this is the cause of the problem or not.
>>
>> The destination of the shovel was down at the time, so only the source side is relevant. I noticed that the shutdown_log file has the time 13:46, with the message "Stopping and halting node 'rabbit at vm-ave29' …", and shutdown_err is empty. The "Halting Erlang VM" message in rabbit at vm-ave29.log then comes five minutes later, at 13:51. The messages in rabbit at vm-ave29-sasl.log all seem related to shovel connect attempts. Also note that the RabbitMQ Erlang process is still running.
>>
>> > We have fixed bugs with shutdown delays and deadlocks in the past, but they're mostly dusted and released now. We do have an open issue that can cause long delays during broker shutdown, which is mediated by having a lot of durable queues (regardless of whether they contain messages or not). Could that be what you're seeing? How many durable queues do these brokers have running on them?
>>
>> Five durable queues.
>>
>> Thanks,
>> Ed
>>
>> From: Edward Tyrrill <ed.tyrrill at emc.com>
>> Reply-To: Discussions about RabbitMQ <rabbitmq-discuss at lists.rabbitmq.com>
>> Date: Tuesday, December 10, 2013 10:48 AM
>> To: Discussions about RabbitMQ <rabbitmq-discuss at lists.rabbitmq.com>
>> Subject: Re: [rabbitmq-discuss] rabbitmqctl stop hangs
>>
>> Hi Tim,
>>
>> Thanks for your quick response. From the end of the log, it looks like the shovel plugin was trying to establish the connection to the remote broker, which was down at the time, just as stop was run:
>>
>> =ERROR REPORT==== 6-Dec-2013::13:51:36 ===
>> ** Generic server <0.8762.19> terminating
>> ** Last message in was {'EXIT',<0.8759.19>,
>> {{badmatch,{error,etimedout}},
>> [{rabbit_shovel_worker,make_conn_and_chan,1,[]},
>> {rabbit_shovel_worker,handle_cast,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,init_p_do_apply,3,
>> [{file,"proc_lib.erl"},{line,239}]}]}}
>> ** When Server state == {state,amqp_direct_connection,
>> {state,'rabbit at vm-ave29',
>> {user,<<"guest">>,
>> [administrator],
>> rabbit_auth_backend_internal,
>> {internal_user,<<"guest">>,
>> <<193,148,73,243,245,222,154,143,19,215,47,234,93,
>> 175,56,125,17,151,61,97>>,
>> [administrator]}},
>> <<"/">>,
>> {amqp_params_direct,<<"guest">>,none,<<"/">>,
>> 'rabbit at vm-ave29',none,[]},
>> {amqp_adapter_info,unknown,unknown,unknown,unknown,
>> <<"<'rabbit at vm-ave29'.3.8762.19>">>,
>> {'Direct',{0,9,1}},
>> []},
>> <0.8765.19>,undefined},
>> <0.8764.19>,
>> {amqp_params_direct,<<"guest">>,none,<<"/">>,
>> 'rabbit at vm-ave29',none,[]},
>> 0,
>> [{<<"capabilities">>,table,
>> [{<<"publisher_confirms">>,bool,true},
>> {<<"exchange_exchange_bindings">>,bool,true},
>> {<<"basic.nack">>,bool,true},
>> {<<"consumer_cancel_notify">>,bool,true},
>> {<<"connection.blocked">>,bool,true},
>> {<<"consumer_priorities">>,bool,true},
>> {<<"authentication_failure_close">>,bool,true}]},
>> {<<"copyright">>,longstr,
>> <<"Copyright (C) 2007-2013 GoPivotal, Inc.">>},
>> {<<"information">>,longstr,
>> <<"Licensed under the MPL. See http://www.rabbitmq.com/">>},
>> {<<"platform">>,longstr,<<"Erlang/OTP">>},
>> {<<"product">>,longstr,<<"RabbitMQ">>},
>> {<<"version">>,longstr,<<"3.2.0">>}],
>> none,false}
>> ** Reason for termination ==
>> ** {unexpected_msg,
>> {'EXIT',<0.8759.19>,
>> {{badmatch,{error,etimedout}},
>> [{rabbit_shovel_worker,make_conn_and_chan,1,[]},
>> {rabbit_shovel_worker,handle_cast,2,[]},
>> {gen_server2,handle_msg,2,[]},
>> {proc_lib,init_p_do_apply,3,
>> [{file,"proc_lib.erl"},{line,239}]}]}}}
>>
>> =INFO REPORT==== 6-Dec-2013::13:51:36 ===
>> stopped TCP Listener on 127.0.0.1:5672
>>
>> =INFO REPORT==== 6-Dec-2013::13:51:36 ===
>> Halting Erlang VM
>>
>>
>>
>>
>> From: Tim Watson <tim at rabbitmq.com>
>> Reply-To: Discussions about RabbitMQ <rabbitmq-discuss at lists.rabbitmq.com>
>> Date: Tuesday, December 10, 2013 1:49 AM
>> To: Discussions about RabbitMQ <rabbitmq-discuss at lists.rabbitmq.com>
>> Subject: Re: [rabbitmq-discuss] rabbitmqctl stop hangs
>>
>> There ought to be further information in the log files from the brokers in question during the stop operation. Can you post that, or put it somewhere accessible please? Why do you have both `rabbitmq-server stop' and `rabbitmqctl stop' running at the same time? Are those pointing to different rabbits?
>>
>> On 10 Dec 2013, at 01:32, Tyrrill, Ed wrote:
>>
>>> Hi All,
>>>
>>> We are using the rabbitmq-server RPMs on Linux. Recently we upgraded from 3.1.1-1 to 3.2.0-1, and we are seeing intermittent hangs when stopping RabbitMQ. Here is the ps output:
>>>
>>> root 31052 31051 0 Dec06 ? 00:00:00 /bin/sh /sbin/service rabbitmq-server stop
>>> root 31055 31052 0 Dec06 ? 00:00:00 /bin/sh /etc/init.d/rabbitmq-server stop
>>> root 31100 31055 0 Dec06 ? 00:00:00 /bin/sh /usr/sbin/rabbitmqctl stop /var/run/rabbitmq/pid
>>> root 31111 31100 0 Dec06 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl "stop" "/var/run/rabbitmq/pid"
>>> rabbitmq 31112 31111 0 Dec06 ? 00:24:10 /usr/lib64/erlang/erts-5.10.3/bin/beam.smp -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.2.0/sbin/../ebin -noshell -noinput -hidden -sname rabbitmqctl31112 -boot start_clean -s rabbit_control_main -nodename rabbit at vm-ave29 -extra stop /var/run/rabbitmq/pid
>>>
>>> The CPU time column on the Erlang process does slowly go up. I don't know if it is a factor, but this broker has shovels defined to a remote broker, and the remote broker was down at the time of this stop.
>>>
>>
>> How long do these hangs take? The shovel workers will wait 10 seconds for both their inbound and outbound connections to close cleanly. If you examine the log files for both the source and destination (i.e., remote) brokers during the shutdown, there may be some useful indication of whether this is the cause of the problem or not.
>>
>>> Is this a known issue? We've been seeing this a couple of times a week (across more than 100 brokers), and I need to get a fix for this.
>>>
>>
>> We have fixed bugs with shutdown delays and deadlocks in the past, but they're mostly dusted and released now. We do have an open issue that can cause long delays during broker shutdown, which is mediated by having a lot of durable queues (regardless of whether they contain messages or not). Could that be what you're seeing? How many durable queues do these brokers have running on them?
>>
>> Tim
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
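
The gap Ed describes between the stop request (13:46 in shutdown_log) and the "Halting Erlang VM" report (13:51:36 in the broker log) can be measured mechanically when comparing many brokers. The following is only a rough Python sketch, not part of RabbitMQ or its tooling. It assumes the report header format shown in the log excerpt above (`=INFO REPORT==== 6-Dec-2013::13:51:36 ===`) and an English locale for month names. The function names `parse_reports` and `seconds_until_halt` are made up for illustration.

```python
import re
from datetime import datetime

# Header format as it appears in the excerpt, e.g.
# "=INFO REPORT==== 6-Dec-2013::13:51:36 ==="
REPORT_HEADER = re.compile(
    r"=(?P<level>[A-Z]+) REPORT==== "
    r"(?P<stamp>\d{1,2}-[A-Za-z]{3}-\d{4}::\d{2}:\d{2}:\d{2}) ==="
)


def parse_reports(log_text):
    """Return a list of (level, timestamp, first body line) per report."""
    lines = log_text.splitlines()
    reports = []
    for i, line in enumerate(lines):
        # search() rather than match() so mail quoting (">> ") is tolerated.
        found = REPORT_HEADER.search(line)
        if found is None:
            continue
        # %b relies on English month abbreviations (assumption).
        stamp = datetime.strptime(found.group("stamp"), "%d-%b-%Y::%H:%M:%S")
        body = lines[i + 1].lstrip("> ").strip() if i + 1 < len(lines) else ""
        reports.append((found.group("level"), stamp, body))
    return reports


def seconds_until_halt(log_text, stop_requested):
    """Seconds from the stop request to the first "Halting Erlang VM" report.

    Returns None if the log never reports the halt.
    """
    for _level, stamp, body in parse_reports(log_text):
        if body == "Halting Erlang VM":
            return (stamp - stop_requested).total_seconds()
    return None


# The last two reports from the excerpt in this thread.
SAMPLE = """=INFO REPORT==== 6-Dec-2013::13:51:36 ===
stopped TCP Listener on 127.0.0.1:5672

=INFO REPORT==== 6-Dec-2013::13:51:36 ===
Halting Erlang VM
"""

if __name__ == "__main__":
    # shutdown_log only gives 13:46; the seconds are unknown, so 0 is assumed.
    stop_requested = datetime(2013, 12, 6, 13, 46)
    print(seconds_until_halt(SAMPLE, stop_requested))
```

A delay far above the 10 seconds the shovel workers are said to wait for their connections to close, or a missing halt report altogether, would be worth including alongside the full logs.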