[rabbitmq-discuss] Active/Active: shutdown of one service brings down the cluster
simon at rabbitmq.com
Mon Feb 13 18:09:44 GMT 2012
On 10/02/12 00:20, Vadim Chekan wrote:
> I think we have nailed down the problem. We had a channel leak in our
> application. With ~50 connections we had >90 channels per connection and
> growing. This definitely correlates with the high CPU usage.
> What I still do not understand is whether it pushed rabbit into an unstable
> state or whether it was something else. Maybe increasing latencies in message
> handling caused cluster members to keep flipping their view of whether a
> neighbor is alive back and forth? Just speculating here: could timeouts
> caused by high load lead to network fragmentation, where every node temporarily
> does not see its neighbors, becomes a master, then sees a neighbor again,
> freaks out, etc.?
That's plausible, but I don't think that's what's happening (there's
nothing about network partitioning in the logs).
> I've attached logs from all 3 cluster members. They are polluted with
> load balancer "ping".
Thanks. I've had a poke at this but nothing is leaping out at me yet.
I'll keep at it though.
One thing that's a bit odd: you seem to be creating HA / transient /
autodelete / exclusive queues. So although they're "HA", they will
vanish if any of the following happens:
* The entire cluster goes down (transient) or
* All consumers for a queue cancel (autodelete) or
* The connection that created them closes (exclusive)
Is this intentional? It seems like an odd use of HA.
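To make the point concrete, here is a small Python model of the three lifetime rules listed above. It is a sketch, not RabbitMQ code: the names `QueueFlags`, `Event`, `survives` and `vanishing_events` are hypothetical, "transient" is modelled as `durable=False`, and `mirrored` stands for "HA". The `mirrored` field is deliberately never consulted, because mirroring protects a queue against losing a single node, not against any of these events.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Event(Enum):
    """Hypothetical events from the list above."""
    CLUSTER_RESTART = auto()          # the entire cluster goes down
    LAST_CONSUMER_CANCELS = auto()    # all consumers for the queue cancel
    OWNER_CONNECTION_CLOSES = auto()  # the declaring connection closes


@dataclass(frozen=True)
class QueueFlags:
    durable: bool      # False means "transient"
    auto_delete: bool
    exclusive: bool
    mirrored: bool     # "HA"; intentionally unused by the rules below


def survives(flags: QueueFlags, event: Event) -> bool:
    """Return True if a queue with these flags still exists after `event`."""
    if event is Event.CLUSTER_RESTART:
        # Transient queues are lost; a full shutdown also closes every
        # connection, so exclusive queues are lost too.
        return flags.durable and not flags.exclusive
    if event is Event.LAST_CONSUMER_CANCELS:
        return not flags.auto_delete
    if event is Event.OWNER_CONNECTION_CLOSES:
        return not flags.exclusive
    raise ValueError(f"unknown event: {event!r}")


def vanishing_events(flags: QueueFlags) -> list:
    """List every modelled event that would delete the queue."""
    return [e for e in Event if not survives(flags, e)]


if __name__ == "__main__":
    odd_ha = QueueFlags(durable=False, auto_delete=True, exclusive=True, mirrored=True)
    print([e.name for e in vanishing_events(odd_ha)])
```

With all three flags set, as in the logs, every event deletes the queue despite `mirrored=True`. A queue that is meant to outlive its creator would normally be declared durable, non-exclusive and non-autodelete.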