[rabbitmq-discuss] Nodes losing contact with the cluster, using 2.5.1

Otto Bergström otto.bergstrom at byburt.com
Wed Aug 31 09:20:14 BST 2011


Below are the logs from one of the incidents in which we lost a node
in the cluster. The cluster consists of four nodes (richmqcoll01-04),
and in this case we lost richmqcoll01. For some reason richmqcoll03
did not log anything in the relevant timeframe, but it was working
fine at the time:

richmqcoll01:
accepted TCP connection on [::]:5672 from 10.226.186.111:53352

=INFO REPORT==== 28-Aug-2011::13:49:54 ===
starting TCP connection <0.31813.0> from 10.226.186.111:53352

=INFO REPORT==== 28-Aug-2011::14:56:23 ===
closing TCP connection <0.31813.0> from 10.226.186.111:53352

=INFO REPORT==== 28-Aug-2011::15:01:38 ===
accepted TCP connection on [::]:5672 from 10.226.186.111:44208

=INFO REPORT==== 28-Aug-2011::15:01:38 ===
starting TCP connection <0.25013.2> from 10.226.186.111:44208

=INFO REPORT==== 28-Aug-2011::15:07:24 ===
closing TCP connection <0.31378.0> from 10.50.45.170:44118

=INFO REPORT==== 28-Aug-2011::15:54:56 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 28-Aug-2011::15:54:56 ===
Memory limit set to 6144MB.

=INFO REPORT==== 28-Aug-2011::15:54:56 ===
Management plugin upgraded statistics to fine.

=INFO REPORT==== 28-Aug-2011::15:54:56 ===
Statistics database started.

richmqcoll02:

=INFO REPORT==== 28-Aug-2011::15:07:24 ===
closing TCP connection <0.4958.0> from 10.50.45.170:38267

=ERROR REPORT==== 28-Aug-2011::15:22:48 ===
** Node rabbit at richmqcoll01 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 28-Aug-2011::15:22:48 ===
node rabbit at richmqcoll01 lost 'rabbit'

=INFO REPORT==== 28-Aug-2011::15:22:56 ===
node rabbit at richmqcoll01 down

=INFO REPORT==== 28-Aug-2011::15:24:41 ===
closing TCP connection <0.24556.0> from 10.228.198.47:48365

=INFO REPORT==== 28-Aug-2011::15:26:17 ===
accepted TCP connection on [::]:5672 from 10.228.198.47:52734

=INFO REPORT==== 28-Aug-2011::15:26:17 ===
starting TCP connection <0.31303.0> from 10.228.198.47:52734

=INFO REPORT==== 28-Aug-2011::15:37:53 ===
closing TCP connection <0.31303.0> from 10.228.198.47:52734

=INFO REPORT==== 28-Aug-2011::15:40:27 ===
accepted TCP connection on [::]:5672 from 10.228.198.47:50746

=INFO REPORT==== 28-Aug-2011::15:40:27 ===
starting TCP connection <0.1085.1> from 10.228.198.47:50746

=INFO REPORT==== 28-Aug-2011::15:45:09 ===
closing TCP connection <0.1085.1> from 10.228.198.47:50746




richmqcoll04:

=INFO REPORT==== 28-Aug-2011::15:01:48 ===
starting TCP connection <0.26077.0> from 10.229.19.111:46298

=INFO REPORT==== 28-Aug-2011::15:07:24 ===
closing TCP connection <0.3174.0> from 10.50.45.170:53300

=ERROR REPORT==== 28-Aug-2011::15:22:53 ===
** Node rabbit at richmqcoll01 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 28-Aug-2011::15:22:53 ===
node rabbit at richmqcoll01 lost 'rabbit'

=INFO REPORT==== 28-Aug-2011::15:23:00 ===
node rabbit at richmqcoll01 down

=INFO REPORT==== 28-Aug-2011::15:25:56 ===
closing TCP connection <0.26077.0> from 10.229.19.111:46298

=INFO REPORT==== 28-Aug-2011::15:26:24 ===
accepted TCP connection on [::]:5672 from 10.229.19.111:36640

=INFO REPORT==== 28-Aug-2011::15:26:24 ===
starting TCP connection <0.32399.0> from 10.229.19.111:36640

=INFO REPORT==== 28-Aug-2011::15:39:07 ===
closing TCP connection <0.32399.0> from 10.229.19.111:36640

=INFO REPORT==== 28-Aug-2011::15:40:35 ===
accepted TCP connection on [::]:5672 from 10.229.19.111:36442

=INFO REPORT==== 28-Aug-2011::15:40:35 ===
starting TCP connection <0.1863.1> from 10.229.19.111:36442

=INFO REPORT==== 28-Aug-2011::15:45:24 ===
closing TCP connection <0.1863.1> from 10.229.19.111:36442

=INFO REPORT==== 28-Aug-2011::15:55:12 ===
node rabbit at richmqcoll01 up

=INFO REPORT==== 28-Aug-2011::15:56:11 ===
accepted TCP connection on [::]:5672 from 10.229.19.111:59147
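
For anyone lining up these reports across nodes, here is a rough sketch
of a parser. It is not part of RabbitMQ: the names parse_reports,
cluster_events, outage_seconds and SAMPLE are made up for illustration,
and it assumes the report header format seen in the logs above. It
accepts both "rabbit@host" and the "rabbit at host" form that the list
archive produces.

```python
import re
from datetime import datetime

# Assumed header format, taken from the reports quoted above, e.g.
# =ERROR REPORT==== 28-Aug-2011::15:22:53 ===
HEADER = re.compile(
    r"^=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ===$"
)
# Matches "node rabbit@host" as well as the archive's "node rabbit at host".
NODE = re.compile(r"node rabbit(?:@| at )(\S+)", re.IGNORECASE)

# Hypothetical excerpt, copied from the richmqcoll04 log above.
SAMPLE = """\
=ERROR REPORT==== 28-Aug-2011::15:22:53 ===
** Node rabbit at richmqcoll01 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 28-Aug-2011::15:23:00 ===
node rabbit at richmqcoll01 down

=INFO REPORT==== 28-Aug-2011::15:45:24 ===
closing TCP connection <0.1863.1> from 10.229.19.111:36442

=INFO REPORT==== 28-Aug-2011::15:55:12 ===
node rabbit at richmqcoll01 up
"""


def parse_reports(text):
    """Return (timestamp, level, message) tuples for one node's log."""
    reports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # %b assumes English month abbreviations (the default C locale).
            stamp = datetime.strptime(match.group(2), "%d-%b-%Y::%H:%M:%S")
            current = (stamp, match.group(1), [])
            reports.append(current)
        elif current is not None and line:
            # Body lines belong to the most recent header.
            current[2].append(line)
    return [(ts, level, " ".join(body)) for ts, level, body in reports]


def cluster_events(reports):
    """Keep only reports that mention another node: (timestamp, node, message)."""
    events = []
    for ts, _level, message in reports:
        match = NODE.search(message)
        if match:
            events.append((ts, match.group(1), message))
    return events


def outage_seconds(events, node):
    """Seconds from the first report about `node` to its next 'up' report.

    Returns None if there is no report about the node or no later 'up'.
    """
    mine = [(ts, msg) for ts, name, msg in events if name == node]
    if not mine:
        return None
    start = mine[0][0]
    for ts, msg in mine[1:]:
        if msg.endswith(" up"):
            return (ts - start).total_seconds()
    return None


if __name__ == "__main__":
    for ts, node, message in cluster_events(parse_reports(SAMPLE)):
        print(ts.isoformat(), node, message)
    print("outage seconds:", outage_seconds(
        cluster_events(parse_reports(SAMPLE)), "richmqcoll01"))
```

Running the same functions over each node's full log and sorting the
combined events by timestamp should make it easier to see which node
noticed the loss first. Clock skew between the hosts is not accounted
for.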

Otto

On Aug 30, 1:37 pm, Matthew Sackman <matt... at rabbitmq.com> wrote:
> On Mon, Aug 29, 2011 at 09:58:06PM -0700, Theo wrote:
> > We downgraded to 2.4.1 and haven't had any issues. With 2.5.1 the
> > cluster split would happen several times per day, but since
> > downgrading it hasn't happened once.
>
> This is odd. Do you happen to have the broker logs from when you were
> running on 2.5.1 and these splits happened (from all members of the
> cluster)? Maybe send them to us off list if they're big. I'm curious to
> see what the entries are. We can think of some changes we made, but we
> certainly weren't expecting them to make clusters more fragile.
>
> Matthew
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-disc... at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

