[rabbitmq-discuss] Bug: Java ConnectionFactory can hang forever in network packet loss cases

Ильдар Нурисламов absorbb at gmail.com
Tue Feb 7 06:02:13 GMT 2012


Hello.

We have RabbitMQ 2.7.1 Java clients connected remotely to the server.
We started to experience short periods of bad network conditions, and a
serious problem occurred. Our configuration:
1. factory.setRequestedHeartbeat is set to 30s
2. factory.setConnectionTimeout is set to 30000ms
The client properly closes the connection after missing 30 seconds of
heartbeats, but sometimes it hangs completely when it tries to open a new
connection.
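
As a stopgap on the caller side, one option is to run the blocking connect
call on a separate thread and stop waiting after an overall deadline. The
following is only a sketch under assumptions: BoundedConnect and
callWithTimeout are hypothetical names, not part of the RabbitMQ client, and
the sleeping callable stands in for factory.newConnection().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedConnect {

    /** Runs a blocking call on a daemon worker thread and waits at most timeoutMs for its result. */
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeoutMs) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-connect");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> result = worker.submit(blockingCall);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a thread blocked in a socket read may ignore this.
            result.cancel(true);
            throw e;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for factory.newConnection(): a call that does not return on its own.
        Callable<String> hangingCall = () -> {
            Thread.sleep(60_000);
            return "connected";
        };
        try {
            callWithTimeout(hangingCall, 200);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting for the connection");
        }
    }
}
```

This only stops the caller from waiting. It does not fix the client: a worker
thread stuck in the handshake may stay blocked until the socket is closed.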

I analyzed the Java client code, and this is what I found:

AMQConnection.java:286:
          _frameHandler.setTimeout(HANDSHAKE_TIMEOUT);
This sets socket.soTimeout to 10s.
It then starts the MainLoop at line 294
and blocks until it gets a reply to the handshake at line 300:
     connStart =
                (AMQP.Connection.Start) connStartBlocker.getReply().getMethod();

The problem is that it may never get a reply, because the MainLoop relies on
the heartbeat mechanism to handle this situation, and heartbeats are not
enabled yet. They are only enabled at line 368:
            setHeartbeat(heartbeat);
Meanwhile the MainLoop runs endlessly at line 492:
    Frame frame = _frameHandler.readFrame();
which returns null every 10s (this is how SocketTimeoutException is handled in
Frame.readFrom..),
and handleSocketTimeout() does nothing because _heartbeat is not set yet.
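
The loop described above can be imitated with plain sockets. This is a rough
sketch, not the client's code: HandshakeLoopSketch, readOrTimeout,
countTimeouts and TIMED_OUT are hypothetical names, and the silent local
server stands in for a broker whose reply is lost. Each read times out and
the loop simply reads again. The overall deadline is what lets this sketch
terminate; per the analysis above, the real handshake wait has no such bound.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class HandshakeLoopSketch {

    // Hypothetical marker standing in for the null frame returned on a read timeout.
    static final int TIMED_OUT = -2;

    /** Reads one byte, mapping a socket read timeout to TIMED_OUT instead of an exception. */
    static int readOrTimeout(InputStream in) throws IOException {
        try {
            return in.read();
        } catch (SocketTimeoutException e) {
            return TIMED_OUT;
        }
    }

    /** Connects to a server that never sends anything and counts read timeouts until the deadline. */
    static int countTimeouts(int soTimeoutMs, long deadlineMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server socket listens but never accepts or writes, so no reply ever arrives.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMs);
            InputStream in = client.getInputStream();
            long giveUpAt = System.nanoTime() + deadlineMs * 1_000_000L;
            int timeouts = 0;
            // Without the deadline check this loop would spin for as long as the peer stays silent.
            while (System.nanoTime() < giveUpAt) {
                if (readOrTimeout(in) != TIMED_OUT) {
                    break; // data or end of stream
                }
                timeouts++; // nothing reacts to the timeout, so just read again
            }
            return timeouts;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timeouts before giving up: " + countTimeouts(50, 300));
    }
}
```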

Thanks.