[rabbitmq-discuss] Long timeout if server host becomes unreachable
Oleg Lyalikov
oleg.lyalikov at gmail.com
Tue Oct 8 10:49:11 BST 2013
Thanks for the answer, Michael.
Configuring SO_LINGER didn't help with the socketWrite timeout. It seems to
affect only the "close" operation, and there is already code in
SocketFrameHandler that sets SO_LINGER on the socket when closing (1 second
by default):
    public void close() {
        try { _socket.setSoLinger(true, SOCKET_CLOSING_TIMEOUT); } catch (Exception _) {}
        try { flush(); } catch (Exception _) {}
        try { _socket.close(); } catch (Exception _) {}
    }
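For reference, a minimal sketch of the two socket options discussed so far, using only plain java.net.Socket calls. The class and method names and the concrete values are illustrative assumptions; the same calls could be placed in an overridden ConnectionFactory#configureSocket. Neither option bounds a blocking write.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical helper, not part of the RabbitMQ client.
final class SocketOptionsSketch {
    static void configure(Socket socket) throws IOException {
        // SO_LINGER: bounds how long close() may block, in seconds.
        socket.setSoLinger(true, 1);
        // SO_TIMEOUT: bounds blocking reads only, in milliseconds.
        socket.setSoTimeout(5_000);
        // java.net.Socket offers no comparable option for blocking writes.
    }
}
```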
I cannot even close the connection myself from a separate thread: the call
is blocked in the SocketFrameHandler#writeFrame method:
Thread [Thread-0] (Suspended)
    owns: CommandAssembler (id=42)
    owns: Object (id=41)
    waiting for: DataOutputStream (id=34)
    SocketFrameHandler.writeFrame(Frame) line: 137
    AMQConnection.writeFrame(Frame) line: 480
    AMQCommand.transmit(AMQChannel) line: 102
    AMQConnection$1(AMQChannel).quiescingTransmit(AMQCommand) line: 316
    AMQConnection$1(AMQChannel).quiescingTransmit(Method) line: 298
    AMQConnection$1(AMQChannel).quiescingRpc(Method, AMQChannel$RpcContinuation) line: 233
    AMQConnection.close(int, String, boolean, Throwable, int, boolean) line: 800
    AMQConnection.close(int, String, int) line: 724
    AMQConnection.close(int) line: 710
    Send$2.run() line: 118
    Thread.run() line: 724
I googled a bit more. Here someone asked Sun to provide a socket write
timeout (back in 1997):
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4031100
but the answer was to use NIO SocketChannels, or to work around it with
separate threads for reading and writing.
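A minimal sketch of what the NIO route could look like, assuming the application owns the SocketChannel directly (which is not the case with the RabbitMQ client's SocketFrameHandler). The class and method names are hypothetical. The channel is switched to non-blocking mode and a Selector is used to wait for writability until a deadline.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical helper illustrating a write with a deadline over NIO.
final class NioTimedWriter {
    /** Writes all of data, or throws SocketTimeoutException once timeoutMillis has elapsed. */
    static void writeFully(SocketChannel channel, ByteBuffer data, long timeoutMillis)
            throws IOException {
        channel.configureBlocking(false); // the channel stays non-blocking afterwards
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_WRITE);
            while (data.hasRemaining()) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new SocketTimeoutException("write timed out after " + timeoutMillis + " ms");
                }
                if (channel.write(data) == 0) {
                    // Send buffer is full: wait until writable or until the deadline.
                    selector.select(remainingMillis);
                    selector.selectedKeys().clear();
                }
            }
        }
    }
}
```

On timeout the caller would typically close the channel, since part of the data may already have been sent.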
Here someone asked ActiveMQ to provide such a write timeout:
https://issues.apache.org/jira/browse/AMQ-1993
and the answer was to use the "transport.soWriteTimeout" parameter, which is
implemented by ActiveMQ itself:
http://activemq.apache.org/maven/apidocs/org/apache/activemq/transport/WriteTimeoutFilter.html
As for the maximum packet age: the client connection stays in the
"ESTABLISHED" state for the whole 15 minutes, not in TIME_WAIT.
There is a Linux setting that controls the TCP retransmission timeout:
/proc/sys/net/ipv4/tcp_retries2. The default value is 15, which corresponds
to about 900 seconds:
http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux
It's not recommended to lower it so far that the resulting timeout drops
below 100 seconds, and we cannot actually change it anyway because there are
lots of other applications on the machine and we do not know the
consequences of such a change.
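To see where "about 900 seconds" comes from, here is a small worked calculation. It assumes the usual Linux constants (a minimum retransmission timeout of 200 ms that doubles on every retry and is capped at 120 s); the real figure depends on the measured round-trip time, so treat the result as a hypothetical lower bound. The class name is illustrative.

```java
// Hypothetical helper: estimates the give-up time implied by tcp_retries2.
final class TcpRetries2 {
    // Assumed Linux defaults: TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s.
    static final long RTO_MIN_MS = 200;
    static final long RTO_MAX_MS = 120_000;

    /** Sum of the backoff intervals for the given number of retransmissions, in ms. */
    static long hypotheticalTimeoutMillis(int retries) {
        long total = 0;
        long rto = RTO_MIN_MS;
        for (int i = 0; i <= retries; i++) {
            total += rto;
            rto = Math.min(rto * 2, RTO_MAX_MS); // exponential backoff, capped
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(hypotheticalTimeoutMillis(15) / 1000.0 + " s"); // 924.6 s
        System.out.println(hypotheticalTimeoutMillis(8) / 1000.0 + " s");  // 102.2 s
    }
}
```

Under these assumptions the default of 15 gives roughly 15 minutes, and 8 is the smallest value that keeps the timeout above 100 seconds.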
So to me it looks like the RabbitMQ client library should provide a way to
set such a write timeout, maybe using the existing heartbeat thread (for now
this thread is blocked in the "writeFrame" method like all the other
threads, waiting for the lock on the outputStream object to be released).
By the way, I still cannot think of any workaround for this issue, and it's
really critical for us. Do you think there are any?
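For illustration, the kind of write timeout meant here could be sketched with a watchdog that closes the raw socket when a write overruns its deadline. Closing a java.net.Socket makes any thread blocked in I/O on it throw a SocketException. The class below is hypothetical and works on a plain Socket; whether the socket can be reached safely inside the RabbitMQ client (for example by capturing it in ConnectionFactory#configureSocket) is an untested assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical watchdog: bounds a blocking write by closing the socket on timeout.
final class WriteWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "write-watchdog");
                t.setDaemon(true);
                return t;
            });

    void write(Socket socket, byte[] data, long timeoutMillis) throws IOException {
        AtomicBoolean expired = new AtomicBoolean(false);
        // Scheduled on a separate thread, so it runs even while the writer is blocked.
        ScheduledFuture<?> killer = timer.schedule(() -> {
            expired.set(true);
            try { socket.close(); } catch (IOException ignored) { }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(data);
            out.flush();
        } catch (IOException e) {
            if (expired.get()) {
                SocketTimeoutException timeout =
                        new SocketTimeoutException("write exceeded " + timeoutMillis + " ms");
                timeout.initCause(e);
                throw timeout;
            }
            throw e;
        } finally {
            killer.cancel(false); // no-op if the watchdog already fired
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

The socket is unusable after a timeout, so the connection has to be re-established. There is also a small race in which the watchdog fires just after a write has completed.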
Regards,
Oleg
2013/10/7 Michael Klishin [via RabbitMQ] <
ml-node+s1065348n30283h10 at n5.nabble.com>
> On Oct 7, 2013, at 4:26 p.m., Oleg Lyalikov <[hidden email]> wrote:
>
> > "main" prio=10 tid=0xf7505800 nid=0x2c3e runnable [0xf7728000]
> >    java.lang.Thread.State: RUNNABLE
> >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> >     at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> >     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >     - locked <0xb4532608> (a java.io.BufferedOutputStream)
> >     at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> >     at com.rabbitmq.client.impl.SocketFrameHandler.flush(SocketFrameHandler.java:142)
> >     at com.rabbitmq.client.impl.AMQConnection.flush(AMQConnection.java:488)
> >     at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:125)
> >     at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:316)
> >     - locked <0xb4532ee8> (a java.lang.Object)
> >     at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:292)
> >     - locked <0xb4532ee8> (a java.lang.Object)
> >     at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:634)
> >     at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:617)
> >     at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:608)
>
> If SocketOutputStream#socketWrite0 takes minutes to timeout, it may be
> worth trying setting SO_LINGER on the socket (SO_TIMEOUT sounds like
> what you want but I believe it only covers read operations):
>
> http://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setSoLinger(boolean,%20int)
>
> To do so, for example, you can subclass default ConnectionFactory and
> override #configureSocket:
>
> http://hg.rabbitmq.com/rabbitmq-java-client/file/46578678645e/src/com/rabbitmq/client/ConnectionFactory.java#l476
>
> Note that SO_LINGER is not free of downsides:
>
> http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
>
> You can try reducing maximum packet age for the server, this should reduce
> the amount of time spent in TIME_WAIT.
>
> MK