[rabbitmq-discuss] Recurring partitioning problem on local network
Simon MacMullen
simon at rabbitmq.com
Wed Dec 11 10:19:08 GMT 2013
On 11/12/13 03:19, Bill Chmura wrote:
> One of our sets went down today
>
> Both nodes basically have this, just naming the other node:
>
> =INFO REPORT==== 10-Dec-2013::18:29:24 ===
> rabbit on node 'rabbit@NURWEB-QAWEB01' down
>
> =ERROR REPORT==== 10-Dec-2013::18:29:35 ===
> Mnesia('rabbit@NURWEB-QAWEB02'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@NURWEB-QAWEB01'}
>
> =INFO REPORT==== 10-Dec-2013::18:29:47 ===
> node 'rabbit@NURWEB-QAWEB01' down: connection_closed
>
> Not much more info with the patched base file... does this help at all?
Somewhat, yes. The interesting bit is the "connection_closed" part. This
means that the net_ticktime-based timeout is not happening - something
is closing the TCP connection between the two hosts. That would explain
why it comes back again immediately.
Do you have some sort of firewall or other network infrastructure that
could be forcibly closing this connection?
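
If it helps when checking the other sets, here is a rough Python sketch
that pulls the reason out of each "node ... down: <reason>" line in a
broker log and attaches a hint. It is only an illustration: the function
name and the hint texts are made up for this example, and
"net_tick_timeout" is assumed to be the reason you would see if the
net_ticktime-based timeout had fired instead.

```python
import re

# Matches lines such as: node 'rabbit@HOST' down: connection_closed
# (the list archive renders '@' as ' at '; the pattern accepts either form).
# The "rabbit on node '...' down" lines carry no reason and are skipped.
NODE_DOWN = re.compile(r"node '(?P<node>[^']+)' down: (?P<reason>\w+)")

# Hypothetical hint texts, written for this sketch only.
HINTS = {
    "connection_closed": "TCP connection was closed; check firewalls or other network gear",
    "net_tick_timeout": "no traffic within net_ticktime; check latency or a stalled peer",
}


def classify_node_down(log_text):
    """Return a (node, reason, hint) tuple for every node-down line with a reason."""
    results = []
    for match in NODE_DOWN.finditer(log_text):
        reason = match.group("reason")
        hint = HINTS.get(reason, "unclassified reason")
        results.append((match.group("node"), reason, hint))
    return results


sample = """=INFO REPORT==== 10-Dec-2013::18:29:47 ===
node 'rabbit@NURWEB-QAWEB01' down: connection_closed
"""

for node, reason, hint in classify_node_down(sample):
    print(node, reason, hint)
```

Running it over the logs from both nodes should show whether every
drop is a closed connection or whether some of them are tick timeouts.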
> I tried searching and got a lot on connection closed abruptly... but it did not sound right.
No, that's a different thing: we log "connection closed abruptly" when
AMQP connections go away without going through the AMQP close handshake.
Cheers, Simon
--
Simon MacMullen
RabbitMQ, Pivotal