<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Ed,<div><br></div><div>Would you please send as much of the logging information as you can? You can contact me offline about emailing or uploading it privately if you prefer, or simply scrub any private data from it beforehand. I'll need to see more of the stack traces in the logs to diagnose this issue properly.</div><div><br></div><div>Thanks,</div><div>Tim</div><div><br><div><div>On 11 Dec 2013, at 09:18, Tim Watson wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div bgcolor="#FFFFFF"><div>Thanks for the analysis, Ed. This sounds like a bug; I'll walk through the code this morning to confirm that. If the shovel is "stuck" trying to establish a connection, that might well explain the hang. It's surprising that the supervision tree for shovel workers doesn't handle this scenario, but either the tree may need to be reconfigured to handle delayed startup, or the workers may need to be reworked to handle startup differently.</div><div><br></div><div>I'll post back with my findings once I've validated those issues.</div><div><br></div><div>Cheers,</div><div>Tim</div><div><br></div><div>On 10 Dec 2013, at 23:40, "Tyrrill, Ed" <<a href="mailto:ed.tyrrill@emc.com">ed.tyrrill@emc.com</a>> wrote:<br><br></div><div></div><blockquote type="cite"><div>
<div>Hi Tim,</div><div><br></div><div>I realized I did not address all of your questions.</div><div><br></div><div>> How long do these hangs take? The shovel workers will wait 10 seconds for both their inbound and outbound connections to close cleanly.</div><div><br></div><div>I have one server that I left untouched; it hung four days ago and is still hung.</div><div><br></div><div>> If you examine the log files for both the source and destination (i.e., remote) brokers during the shutdown, there may be some useful indication of whether this is the cause of the problem or not.</div><div><br></div><div><span class="Apple-style-span">The destination of the shovel was down at the time, so only the source side is relevant. I noticed that the shutdown_log file shows the time 13:46, with the message "Stopping and halting node 'rabbit@vm-ave29' …", and that shutdown_err is empty. The "Halting Erlang VM" message in rabbit@</span>vm-ave29<span class="Apple-style-span">.log then appears five minutes later, at 13:51. The messages in rabbit@</span>vm-ave29<span class="Apple-style-span">-sasl.log all seem to be related to shovel connection attempts. </span>Also note that the RabbitMQ Erlang process is still running.</div><div><br></div><div>> We have fixed bugs with shutdown delays and deadlocks in the past, but they're mostly dusted and released now. We do have an open issue that can cause long delays during broker shutdown, which is mediated by having a lot of durable queues (regardless of whether they contain messages or not). Could that be what you're seeing? 
How many durable queues do these brokers have running on them?</div><div><br></div><div>Five durable queues.</div><div><br></div><div>Thanks,</div><div>Ed</div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span> Tyrrill, Edward <<a href="mailto:ed.tyrrill@emc.com">ed.tyrrill@emc.com</a>><br><span style="font-weight:bold">Reply-To: </span> Discussions about RabbitMQ <<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a>><br><span style="font-weight:bold">Date: </span> Tuesday, December 10, 2013 10:48 AM<br><span style="font-weight:bold">To: </span> Discussions about RabbitMQ <<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a>><br><span style="font-weight:bold">Subject: </span> Re: [rabbitmq-discuss] rabbitmqctl stop hangs<br></div><div><br></div><div><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; "><div>Hi Tim,</div><div><br></div><div>Thanks for your quick response. 
From the end of the log, it looks like the shovel plugin was trying to establish its connection to the remote broker (which was down at the time) at the same time that stop was run:</div><div><br></div><div><div>=ERROR REPORT==== 6-Dec-2013::13:51:36 ===</div><div>** Generic server <0.8762.19> terminating </div><div>** Last message in was {'EXIT',<0.8759.19>,</div><div> {{badmatch,{error,etimedout}},</div><div> [{rabbit_shovel_worker,make_conn_and_chan,1,[]},</div><div> {rabbit_shovel_worker,handle_cast,2,[]},</div><div> {gen_server2,handle_msg,2,[]},</div><div> {proc_lib,init_p_do_apply,3,</div><div> [{file,"proc_lib.erl"},{line,239}]}]}}</div><div>** When Server state == {state,amqp_direct_connection,</div><div> {state,'rabbit@vm-ave29',</div><div> {user,<<"guest">>,</div><div> [administrator],</div><div> rabbit_auth_backend_internal,</div><div> {internal_user,<<"guest">>,</div><div> <<193,148,73,243,245,222,154,143,19,215,47,234,93,</div><div> 175,56,125,17,151,61,97>>,</div><div> [administrator]}},</div><div> <<"/">>,</div><div> {amqp_params_direct,<<"guest">>,none,<<"/">>,</div><div> 'rabbit@vm-ave29',none,[]},</div><div> {amqp_adapter_info,unknown,unknown,unknown,unknown,</div><div> <<"<'rabbit@vm-ave29'.3.8762.19>">>,</div><div> {'Direct',{0,9,1}},</div><div> []},</div><div> <0.8765.19>,undefined},</div><div> <0.8764.19>,</div><div> {amqp_params_direct,<<"guest">>,none,<<"/">>,</div><div> 'rabbit@vm-ave29',none,[]},</div><div> 0,</div><div> [{<<"capabilities">>,table,</div><div> [{<<"publisher_confirms">>,bool,true},</div><div> {<<"exchange_exchange_bindings">>,bool,true},</div><div> {<<"basic.nack">>,bool,true},</div><div> {<<"consumer_cancel_notify">>,bool,true},</div><div> {<<"connection.blocked">>,bool,true},</div><div> {<<"consumer_priorities">>,bool,true},</div><div> {<<"authentication_failure_close">>,bool,true}]},</div><div> {<<"copyright">>,longstr,</div><div> <<"Copyright (C) 2007-2013 GoPivotal, 
Inc.">>},</div><div> {<<"information">>,longstr,</div><div> <<"Licensed under the MPL. See http://www.rabbitmq.com/">>},</div><div> {<<"platform">>,longstr,<<"Erlang/OTP">>},</div><div> {<<"product">>,longstr,<<"RabbitMQ">>},</div><div> {<<"version">>,longstr,<<"3.2.0">>}],</div><div> none,false}</div><div>** Reason for termination == </div><div>** {unexpected_msg,</div><div> {'EXIT',<0.8759.19>,</div><div> {{badmatch,{error,etimedout}},</div><div> [{rabbit_shovel_worker,make_conn_and_chan,1,[]},</div><div> {rabbit_shovel_worker,handle_cast,2,[]},</div><div> {gen_server2,handle_msg,2,[]},</div><div> {proc_lib,init_p_do_apply,3,</div><div> [{file,"proc_lib.erl"},{line,239}]}]}}}</div><div><br></div><div>=INFO REPORT==== 6-Dec-2013::13:51:36 ===</div><div>stopped TCP Listener on 127.0.0.1:5672</div><div><br></div><div>=INFO REPORT==== 6-Dec-2013::13:51:36 ===</div><div>Halting Erlang VM</div></div><div><br></div><div><br></div><div><br></div><div><br></div><span id="OLK_SRC_BODY_SECTION"><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt"><span style="font-weight:bold">From: </span>Tim Watson <<a href="mailto:tim@rabbitmq.com">tim@rabbitmq.com</a>><br><span style="font-weight:bold">Reply-To: </span>Discussions about RabbitMQ <<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a>><br><span style="font-weight:bold">Date: </span>Tuesday, December 10, 2013 1:49 AM<br><span style="font-weight:bold">To: </span>Discussions about RabbitMQ <<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a>><br><span style="font-weight:bold">Subject: </span>Re: [rabbitmq-discuss] rabbitmqctl stop hangs<br></div><div><br></div><div><div style="word-wrap: 
break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">
The log files from the brokers in question should contain further information about the stop operation. Can you post them, or put them somewhere accessible, please? Why do you have both "rabbitmq-server stop" and "rabbitmqctl stop" running at the same time?
Are those pointing to different rabbits?
<div><div><br><div><div>On 10 Dec 2013, at 01:32, Tyrrill, Ed wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; "><div style="font-family: Calibri, sans-serif; ">Hi All,</div><div style="font-family: Calibri, sans-serif; "><br></div><div style="font-family: Calibri, sans-serif; ">We are using the rabbitmq-server RPMs on Linux. We recently upgraded from 3.1.1-1 to 3.2.0-1, and we are now seeing intermittent hangs when stopping RabbitMQ. Here is the ps output:</div><div style="font-family: Calibri, sans-serif; "><br></div><div><div><div><font class="Apple-style-span" face="Courier">root 31052 31051 0 Dec06 ? 00:00:00 /bin/sh /sbin/service rabbitmq-server stop</font></div><div><font class="Apple-style-span" face="Courier">root 31055 31052 0 Dec06 ? 00:00:00 /bin/sh /etc/init.d/rabbitmq-server stop</font></div><div><font class="Apple-style-span" face="Courier">root 31100 31055 0 Dec06 ? 00:00:00 /bin/sh /usr/sbin/rabbitmqctl stop /var/run/rabbitmq/pid</font></div><div><font class="Apple-style-span" face="Courier">root 31111 31100 0 Dec06 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl "stop" "/var/run/rabbitmq/pid"</font></div><div><font class="Apple-style-span" face="Courier">rabbitmq 31112 31111 0 Dec06 ? 00:24:10 /usr/lib64/erlang/erts-5.10.3/bin/beam.smp -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr</font><span class="Apple-style-span" style="font-family: Courier; ">/lib/rabbitmq/lib/rabbitmq_server-3.2.0/sbin/../ebin
-noshell -noinput -hidden -sname rabbitmqctl31112 -boot start_clean -s rabbit_control_main -nodename rabbit@vm-ave29 </span><span class="Apple-style-span" style="font-family: Courier; ">-extra stop /var/run/rabbitmq/pid</span></div></div></div><div style="font-family: Calibri, sans-serif; "><br></div><div style="font-family: Calibri, sans-serif; ">The CPU time of the Erlang process slowly keeps going up. I don't know whether it is a factor, but this broker has shovels defined to a remote broker, and the remote broker was down at the time of this stop.</div><div style="font-family: Calibri, sans-serif; "><br></div></div></blockquote><div><br></div><div>How long do these hangs take? The shovel workers will wait 10 seconds for both their inbound and outbound connections to close cleanly. If you examine the log files for both the source and destination (i.e., remote) brokers during the shutdown, there may
be some useful indication of whether this is the cause of the problem or not.</div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; "><div style="font-family: Calibri, sans-serif; ">Is this a known issue? We've been seeing this a couple of times a week (across more than 100 brokers), and I need to get a fix for it.</div><div style="font-family: Calibri, sans-serif; "><br></div></div></blockquote><div><br></div><div>We have fixed bugs with shutdown delays and deadlocks in the past, but they're mostly dusted and released now. We do have an open issue that can cause long delays during broker shutdown, which is mediated by having a lot of durable queues (regardless of
whether they contain messages or not). Could that be what you're seeing? How many durable queues do these brokers have running on them?</div><div><br></div><div>Tim</div></div></div></div></div></div></span></div></div></span>
</div></blockquote><blockquote type="cite"><div><span>_______________________________________________</span><br><span>rabbitmq-discuss mailing list</span><br><span><a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a></span><br><span><a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a></span><br></div></blockquote></div></blockquote></div><br></div></body></html>