[rabbitmq-discuss] Active/Active: shutdown of one service brings down the cluster
Vadim Chekan
kot.begemot at gmail.com
Fri Feb 10 00:20:53 GMT 2012
Hi Simon,
I think we nailed down the problem. We had a channel leak in our application.
With ~50 connections we had >90 channels per connection, and the count kept
growing. This definitely correlates with the high CPU usage.
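
For anyone wanting to check for the same kind of leak, here is a minimal sketch of the counting involved. It assumes you have already obtained a list of open channels from somewhere, with each entry carrying the name of its owning connection. The `connection_name` key, the helper names, and the sample data are all hypothetical and are not tied to any particular RabbitMQ API.

```python
from collections import Counter


def channels_per_connection(channels):
    """Count open channels per connection.

    `channels` is an iterable of dicts; the "connection_name" key is an
    assumed field name used only for this illustration.
    """
    return Counter(ch["connection_name"] for ch in channels)


def suspected_leaks(channels, threshold):
    """Return the connections holding more than `threshold` channels."""
    counts = channels_per_connection(channels)
    return {name: n for name, n in counts.items() if n > threshold}


# Hypothetical sample: one connection holding 95 channels, another holding 3.
sample = (
    [{"connection_name": "app-1"} for _ in range(95)]
    + [{"connection_name": "app-2"} for _ in range(3)]
)

print(suspected_leaks(sample, threshold=90))  # prints {'app-1': 95}
```

Running this periodically and comparing the counts over time would show whether the number of channels per connection keeps growing, which is the symptom described above.
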
What I still do not understand is whether the leak pushed rabbit into an
unstable state or whether it was something else. Maybe increasing latencies in
message handling caused cluster members to flip their neighbors' aliveness
status back and forth? Just speculating here: could timeouts caused by high load
lead to network fragmentation, where every node temporarily does not see its
neighbors, becomes a master, then sees a neighbor again, freaks out, etc.?
I've attached logs from all 3 cluster members. They are polluted with load
balancer "ping" entries.
Vadim.
On Thu, Feb 9, 2012 at 4:21 AM, Simon MacMullen <simon at rabbitmq.com> wrote:
> Hi. Thanks for the error report. There seem to be a lot of strange things
> happening here - can you provide more complete logs from such an incident?
>
> Cheers, Simon
>
>
> On 09/02/12 02:42, Vadim Chekan wrote:
>
>> Hi all,
>>
>> Given: 3 servers in active/active configuration, rabbit: 2.7.1, erlang
>> R14B03, CentOS, 64-bit.
>> On at least 2 occasions we have experienced the following situation: we
>> observe abnormally high CPU utilization on one of the rabbit servers (40%
>> when <10% is the norm) without any obvious reason. We did a graceful
>> restart of the rabbit service and the whole cluster went down (restarted).
>> Another effect is that the cluster seems to enter some (inconsistent?)
>> state in which queues cannot be registered/deleted, the list of queues
>> cannot be viewed through the management UI, etc.
>>
>> Here is a log containing the errors from when the cluster was in the broken state:
>> http://pastebin.com/6rweU3MD
>>
>> Questions:
>> Are those errors critical?
>> If we experience a high CPU situation again, what can we do? Is there any
>> additional logging, profiling, process snapshots, etc. that we should collect?
>>
>> Vadim.
>>
>> --
>> From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT
>> is explicitly specified
>>
>>
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> rabbitmq-discuss at lists.rabbitmq.com
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>
>
> --
> Simon MacMullen
> RabbitMQ, VMware