[rabbitmq-discuss] Long timeout if server host becomes unreachable

Oleg Lyalikov oleg.lyalikov at gmail.com
Tue Oct 8 10:49:11 BST 2013


Thanks for the answer, Michael,

Configuring SO_LINGER didn't help with the socketWrite timeout. It seems to
affect only the "close" operation, and SocketFrameHandler already sets
SO_LINGER on the socket when closing (1 second by default):

    public void close() {
        try { _socket.setSoLinger(true, SOCKET_CLOSING_TIMEOUT); } catch (Exception _) {}
        try { flush();                                           } catch (Exception _) {}
        try { _socket.close();                                   } catch (Exception _) {}
    }

But I cannot even close the connection myself from a separate thread: it is
blocked in the SocketFrameHandler#writeFrame method:

Thread [Thread-0] (Suspended)
    owns: CommandAssembler  (id=42)
    owns: Object  (id=41)
    waiting for: DataOutputStream  (id=34)
    SocketFrameHandler.writeFrame(Frame) line: 137
    AMQConnection.writeFrame(Frame) line: 480
    AMQCommand.transmit(AMQChannel) line: 102
    AMQConnection$1(AMQChannel).quiescingTransmit(AMQCommand) line: 316
    AMQConnection$1(AMQChannel).quiescingTransmit(Method) line: 298
    AMQConnection$1(AMQChannel).quiescingRpc(Method, AMQChannel$RpcContinuation) line: 233
    AMQConnection.close(int, String, boolean, Throwable, int, boolean) line: 800
    AMQConnection.close(int, String, int) line: 724
    AMQConnection.close(int) line: 710
    Send$2.run() line: 118
    Thread.run() line: 724

I googled a bit more. Here someone asks Sun to provide a socket write
timeout (back in 1997):
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4031100
but the answer is to use NIO SocketChannels, or to work around it with
separate threads for reading and writing.
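
A minimal sketch of the NIO route mentioned in that answer, assuming a
connected SocketChannel in non-blocking mode. NioTimedWrite and writeFully
are hypothetical names, not part of the JDK or of the RabbitMQ client; the
method gives up once the deadline has passed while the socket is still not
writable:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical helper: a write with a deadline on a non-blocking channel. */
final class NioTimedWrite {

    /**
     * Writes all remaining bytes of buf, or throws SocketTimeoutException if
     * the channel is still not writable when timeoutMillis has elapsed.
     * The channel must already be connected and in non-blocking mode.
     */
    static void writeFully(SocketChannel channel, ByteBuffer buf, long timeoutMillis)
            throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                if (channel.write(buf) > 0) {
                    continue; // made progress, try to write the rest
                }
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new SocketTimeoutException(
                        "write timed out after " + timeoutMillis + " ms");
                }
                // Wait until the socket is writable again or the time is up.
                selector.select(remainingMillis);
                selector.selectedKeys().clear();
            }
        }
    }
}
```

The caller decides what to do on timeout, typically closing the channel,
since a half-written frame leaves the stream in an undefined state.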

Here someone asks ActiveMQ to provide such a write timeout:
https://issues.apache.org/jira/browse/AMQ-1993
and the answer is to use the "transport.soWriteTimeout" parameter, which is
implemented by ActiveMQ itself:
http://activemq.apache.org/maven/apidocs/org/apache/activemq/transport/WriteTimeoutFilter.html

As for the maximum packet age: the client connection stays in the
"ESTABLISHED" state for the whole 15 minutes, not TIME_WAIT.

There is a setting in Linux that controls the TCP retransmission timeout:
/proc/sys/net/ipv4/tcp_retries2. The default value is 15, which corresponds
to about 900 seconds:
http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux
Lowering it so that the timeout drops below 100 seconds is not recommended,
and we cannot actually change it anyway: there are lots of other
applications on the machine and we do not know the consequences of such a
change.
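
A rough sketch of where the "about 900 seconds" figure comes from. It
assumes the usual Linux constants of 200 ms for the minimum and 120 s for the
maximum retransmission timeout, and a timeout that simply doubles after each
retry. The real RTO depends on the measured round-trip time, so this is only
a lower-bound estimate; TcpRetries2Estimate is a hypothetical name:

```java
/** Hypothetical helper estimating the total timeout implied by tcp_retries2. */
final class TcpRetries2Estimate {
    static final double RTO_MIN_SECONDS = 0.2;   // assumed Linux TCP_RTO_MIN
    static final double RTO_MAX_SECONDS = 120.0; // assumed Linux TCP_RTO_MAX

    /** Sums the initial timeout plus one doubled (and capped) timeout per retry. */
    static double timeoutSeconds(int retries2) {
        double total = 0.0;
        double rto = RTO_MIN_SECONDS;
        for (int i = 0; i <= retries2; i++) {
            total += Math.min(rto, RTO_MAX_SECONDS);
            rto *= 2; // exponential backoff
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 8, 15}) {
            System.out.println("tcp_retries2=" + n + " -> " + timeoutSeconds(n) + " s");
        }
    }
}
```

Under these assumptions the default of 15 gives roughly 924.6 seconds, and
8 is the smallest value that stays above the 100 second mark (about 102.2
seconds).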

So to me it looks like the RabbitMQ client library should make it possible
to set such a write timeout, maybe using the same heartbeat thread (for now
this thread is blocked in the "writeFrame" method like all other threads,
waiting for the lock on the outputStream object to be released).
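
To illustrate the idea, here is a minimal sketch of a write guarded by a
separate watchdog thread that does not need the output stream lock. This is
not how the RabbitMQ client works; WatchdogWriter and writeAndFlush are
hypothetical names. It relies on the documented behaviour that closing a
java.net.Socket from another thread makes a thread blocked in I/O on that
socket fail with a SocketException:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper: closes the socket if a write exceeds the timeout. */
final class WatchdogWriter {
    private final Socket socket;
    private final long timeoutMillis;
    // Daemon thread that never touches the stream, only the socket itself.
    private final ScheduledExecutorService watchdog =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "write-watchdog");
            t.setDaemon(true);
            return t;
        });

    WatchdogWriter(Socket socket, long timeoutMillis) {
        this.socket = socket;
        this.timeoutMillis = timeoutMillis;
    }

    /** Writes and flushes one frame; fails with IOException if the watchdog fires. */
    synchronized void writeAndFlush(byte[] frame) throws IOException {
        // Arm the watchdog before entering the potentially blocking write.
        ScheduledFuture<?> killer =
            watchdog.schedule(this::closeQuietly, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(frame);
            out.flush();
        } finally {
            killer.cancel(false); // write finished or failed: disarm
        }
    }

    private void closeQuietly() {
        try { socket.close(); } catch (IOException ignored) { }
    }
}
```

A write that completes at the very moment the watchdog fires can still end
up with a closed socket, so the timeout should be generous compared to a
normal write.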

By the way, I still cannot think of any workaround for this issue, and it's
really critical for us. Do you think there are any?

Regards,
Oleg


2013/10/7 Michael Klishin [via RabbitMQ] <
ml-node+s1065348n30283h10 at n5.nabble.com>

> On Oct 7, 2013, at 4:26 p.m., Oleg Lyalikov <[hidden email]> wrote:
>
> > "main" prio=10 tid=0xf7505800 nid=0x2c3e runnable [0xf7728000]
> >   java.lang.Thread.State: RUNNABLE
> >        at java.net.SocketOutputStream.socketWrite0(Native Method)
> >        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> >        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> >        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >        - locked <0xb4532608> (a java.io.BufferedOutputStream)
> >        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> >        at com.rabbitmq.client.impl.SocketFrameHandler.flush(SocketFrameHandler.java:142)
> >        at com.rabbitmq.client.impl.AMQConnection.flush(AMQConnection.java:488)
> >        at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:125)
> >        at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:316)
> >        - locked <0xb4532ee8> (a java.lang.Object)
> >        at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:292)
> >        - locked <0xb4532ee8> (a java.lang.Object)
> >        at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:634)
> >        at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:617)
> >        at com.rabbitmq.client.impl.ChannelN.basicPublish(ChannelN.java:608)
>
> If SocketOutputStream#socketWrite0 takes minutes to time out, it may be
> worth trying to set SO_LINGER on the socket (SO_TIMEOUT sounds like
> what you want, but I believe it only covers read operations):
>
>
> http://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setSoLinger(boolean,%20int)
>
>
> To do so, for example, you can subclass the default ConnectionFactory and
> override #configureSocket:
>
>
> http://hg.rabbitmq.com/rabbitmq-java-client/file/46578678645e/src/com/rabbitmq/client/ConnectionFactory.java#l476
>
> Note that SO_LINGER is not free of downsides:
>
>
> http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
>
> http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
>
> You can try reducing the maximum packet age for the server; this should
> reduce the amount of time spent in TIME_WAIT.
>
> MK
