[rabbitmq-discuss] Bug: Java ConnectionFactory can hang forever in network packet loss cases

Steve Powell steve at rabbitmq.com
Thu Feb 16 13:27:15 GMT 2012


Dear Ildar,

First, may I apologise for not getting back to you sooner. It seems that you
have clearly identified a bug, and have helped to narrow it down for us.

Thank you very much. I have raised a bug (24747) so that we can track and fix
this.

I have a few comments regarding your settings: it seems to me that a heartbeat
of 30s is not unreasonable, but you should be aware that up to a minute may
pass before a missed heartbeat is noticed, so you must not rely on a failure
being detected within the interval itself.

The ConnectionTimeout only affects waiting for the socket connection, so it is
not involved in this. I think your interval here is again quite large, but not
unreasonable in unreliable networks. I would expect the heartbeat to be about
half of this (see note above).

We'll get on to this bug asap.
Steve Powell
steve at rabbitmq.com
[wrk: +44-2380-111-528] [mob: +44-7815-838-558]

On 13 Feb 2012, at 09:28, Ильдар Нурисламов wrote:

> Can anybody help with this problem or prove that I'm wrong?
> 
> 2012/2/7 Ильдар Нурисламов <absorbb at gmail.com>
> Hello.
> 
> We have RabbitMQ 2.7.1 Java clients connected remotely to the server.
> We started to experience short periods of bad network conditions, and a serious problem occurred:
> 1. factory.setRequestedHeartbeat is set to 30s
> 2. factory.setConnectionTimeout is set to 30000ms
> The client properly closes the connection after missing 30 seconds of heartbeats.
> But sometimes it hangs completely when it tries to open a new connection.
> 
> I tried to analyze the Java client code, and this is the result:
> 
> AMQConnection.java:286:
>           _frameHandler.setTimeout(HANDSHAKE_TIMEOUT); - socket.soTimeout is set to 10s here
> It then starts the MainLoop at line 294
> and blocks until it gets a reply to the handshake at line 300:
>      connStart =
>                 (AMQP.Connection.Start) connStartBlocker.getReply().getMethod();
> 
> The problem is that it may never get a reply, because MainLoop relies on heartbeats to handle this situation, and heartbeats are not enabled yet. They are enabled only at line 368:
>             setHeartbeat(heartbeat);
> MainLoop runs endlessly at line 492:
>     Frame frame = _frameHandler.readFrame();
> which returns null every 10s (this is how SocketTimeoutException is handled in Frame.readFrom..)
> and handleSocketTimeout() does nothing because _heartbeat is not set yet.
> 
> Thanks.
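
The failure mode described above can be reproduced in miniature with plain
java.net sockets, without the RabbitMQ client. The sketch below is only an
illustration under stated assumptions: the class HandshakeHangSketch, the
method waitForReply and the overall deadline are hypothetical names and are
not part of the client library or of any fix for this bug. A peer that accepts
the TCP connection but never sends anything makes every read time out; if each
timeout is treated as "no frame yet" and nothing else bounds the wait, the loop
never ends. Here an explicit deadline ends it, and the timeouts are scaled down
from 10s to milliseconds so the demo finishes quickly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class HandshakeHangSketch {

    /**
     * Waits for the first byte from the peer, treating each read timeout as
     * "no frame yet" (analogous to readFrame() returning null in the report).
     * Returns the number of read timeouts seen before the overall deadline
     * expired, or -1 if the peer sent data or closed the connection first.
     * The deadline is a hypothetical guard; remove it and a silent peer
     * keeps this loop spinning forever.
     */
    static int waitForReply(Socket socket, int soTimeoutMs, long deadlineMs) throws IOException {
        socket.setSoTimeout(soTimeoutMs);
        InputStream in = socket.getInputStream();
        long giveUpAt = System.nanoTime() + deadlineMs * 1_000_000L;
        int timeouts = 0;
        while (System.nanoTime() < giveUpAt) {
            try {
                in.read();      // blocks for at most soTimeoutMs
                return -1;      // data arrived or the stream reached EOF
            } catch (SocketTimeoutException e) {
                timeouts++;     // nothing arrived; without a deadline we would just retry
            }
        }
        return timeouts;
    }

    public static void main(String[] args) throws IOException {
        // The listening socket completes the TCP connect via its backlog,
        // but nobody ever accepts or writes, so the client sees a silent peer.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort())) {
            int timeouts = waitForReply(client, 100, 500);
            System.out.println("read timeouts before giving up: " + timeouts);
        }
    }
}
```

Running main prints how many read timeouts occurred before the deadline
expired. Each of those timeouts corresponds to one pass through the loop in
which nothing can trigger a close, which is the situation reported above while
_heartbeat is still unset.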
> 