[rabbitmq-discuss] RabbitMQ server closing connection every few minutes
David Tinker
david.tinker at gmail.com
Tue May 22 11:28:14 BST 2012
Thanks a stack for the quick responses. I did find mentions of
heartbeat when Googling this problem but was under the impression that
we were not using it. It turns out we were and disabling it sorted out
our issues.
Thanks again
David
On Tue, May 22, 2012 at 10:25 AM, Tim Watson <tim at rabbitmq.com> wrote:
> On 22/05/2012 09:00, Tim Watson wrote:
>>
>> On 22/05/2012 08:48, Tim Watson wrote:
>>>
>>> On 22/05/2012 07:34, David Tinker wrote:
>>>>
>>>> Hi
>>>>
>>>> We have a Java application using RabbitMQ that keeps getting its
>>>> connection closed by the server every few minutes. On the client side
>>>> we see this:
>>>>
>>>> 2012-05-22 05:30:05,337 [AMQP Connection 127.0.0.1:5671] ERROR
>>>> pork.RabbitService - Connection to Rabbit Exchange shutting down:
>>>> com.rabbitmq.client.ShutdownSignalException: connection error; reason:
>>>> java.net.SocketException: Connection reset
>>>> Reason: java.net.SocketException: Connection reset
>>>>
>>>> On the server side we get this:
>>>>
>>>> =INFO REPORT==== 22-May-2012::05:25:20 ===
>>>> accepting AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>>>> 127.0.0.1:5671)
>>>>
>>>> =ERROR REPORT==== 22-May-2012::05:30:05 ===
>>>> closing AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>>>> 127.0.0.1:5671):
>>>> {timeout,running}
>>>>
>>>
>>> Hi David. The second error message will appear in the logs when there
>>> has been an un-handled exception in rabbit's 'reader process'. The
>>> '{timeout, running}' state occurs when receiving data from the network
>>> takes too long (which is where the 'timeout' tag originates) but the
>>> AMQP connection state is 'running' - the second tag. When this condition
>>> arises, the broker's reader process shuts down the connection.
>>>
>>> So at first glance, it does look like a networking problem to me,
>>> although TBH I'm not sure about the nature of it just yet. I suspect it
>>> could have something to do with "... It can sometimes be several minutes
>>> before a message is acked or nacked ...".
>>>
>>>> It happens more frequently when the system is busy (500-1000
>>>> messages/s) than when it is cruising (50 messages/s). The application
>>>> and RabbitMQ are on the same box so I don't think its a
>>>> networking/firewall problem.
>>>>
>>>> The application only uses one connection and one channel. Messages are
>>>> received using Connection.basicConsume(..) and acked or nacked
>>>> sometime later on different threads. It can sometimes be several
>>>> minutes before a message is acked or nacked. We are using
>>>> Connection.basicQos(10000) to limit the number of outstanding messages
>>>> but its not often that reaches 10000 and the error happens anytime.
>>>> There is defensive code to make sure messages are not acked/nacked
>>>> more than once.
>>>>
>>>> Anyone have any ideas? Is there any way to get more information on the
>>>> server or client side regarding the error?
>>>>
>>>
>>> Well the server error seems pretty clear and just to be 100% sure, I can
>>> verify that the error message you're seeing is only generated in one
>>> place in the whole code base:
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L241.
>>>
>>>
>>>
>>
>> Except that this isn't quite as clear as I suggested initially. The
>> networking layer isn't going to send a timeout on a blocking receive
>> like this - the 'timeout' message is being sent to the reader from some
>> other process. I will investigate further! :)
>>
>
> Ok so I think I've identified the source of this behaviour. The 'timeout'
> message is sent to the reader process by the AMQP heartbeat - a separate out
> of band process that is monitoring for 'stale' connections using heartbeat
> frames (see http://www.rabbitmq.com/amqp-0-9-1-reference.html for details).
>
> What I believe is happening is that as the server comes under load, the
> heartbeat for this connection is de-prioritised at the network layer and
> times out, causing the connection to be closed.
>
> Set your heartbeat to 0 (turning it off) for now and see if it fixes the
> problem? That's a short term investigative step though, obviously.
>
> I'll file a bug for improving the log messages when timeout occurs.
>
>
>>> The code handling the timeout condition is
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L321,
>>>
>>> the call site for which is
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L284.
>>>
>>> The origin of this is
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_net.erl#L108
>>>
>>> in rabbit_net - what the code does is make the calling process wait for
>>> a message from the Erlang networking layer. So it is the underlying
>>> TCP/IP layer in Erlang which is noticing a timeout on the connection.
>>>
>>> How this is mediated through the client's behaviour is another matter.
>>>
>>> Hope this helps us get started figuring out what's wrong!
>>>
>>> Cheers,
>>> Tim
>>>
>>>> RabbitMQ 2.8.1 on Gentoo Linux 64bit
>>>> Erlang R15B (erts-5.9) [source] [64-bit]
>>>> amqp-client-2.8.2 (also tried amqp-client-2.7.1)
>>>> java version "1.6.0_31"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
>>>>
>>>> Thanks
>>>> David
>>>> _______________________________________________
>>>> rabbitmq-discuss mailing list
>>>> rabbitmq-discuss at lists.rabbitmq.com
>>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>>
>>>
>>> _______________________________________________
>>> rabbitmq-discuss mailing list
>>> rabbitmq-discuss at lists.rabbitmq.com
>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>
--
http://twoogleplus.com/ Online service to cross-post tweets to Google+
http://engage.calibreapps.com/ Engage - Web based goal setting and
performance management
More information about the rabbitmq-discuss
mailing list