[rabbitmq-discuss] Autoheal torture test - Initial success, then a terminal state

Matt Pietrek mpietrek at skytap.com
Thu Mar 27 22:15:11 GMT 2014


Because we're sometimes just mean to our software, I wrote a torture test
to see how RabbitMQ's Autoheal deals with repeated partitions.

In a nutshell, we start with two brokers (3.2.4) in a cluster. I run my
test, which uses "iptables" to knock out the link between the two brokers
and then restore it.

It does this break/fix continuously in a loop. The time between partitions
and the time spent inside each partition are both configurable.
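
For anyone wanting to reproduce this, the loop could be sketched roughly as
below. This is not my actual script, just a minimal Python illustration under
some assumptions: it runs as root on one of the brokers, the peer is blocked
by IP with plain DROP rules on INPUT and OUTPUT, and the names
partition_commands, torture, healthy_secs and partitioned_secs are made up
for the example.

```python
import subprocess
import time


def partition_commands(peer_ip):
    """Build the iptables commands that cut off and restore one peer.

    Assumption: dropping all traffic to and from the peer's IP is enough
    to simulate the partition; the real test may use different rules.
    """
    rules = [("INPUT", "-s"), ("OUTPUT", "-d")]
    block = [["iptables", "-A", chain, flag, peer_ip, "-j", "DROP"]
             for chain, flag in rules]
    unblock = [["iptables", "-D", chain, flag, peer_ip, "-j", "DROP"]
               for chain, flag in rules]
    return block, unblock


def torture(peer_ip, cycles, healthy_secs=60, partitioned_secs=60,
            run=subprocess.check_call, sleep=time.sleep):
    """Repeatedly break and restore the link to peer_ip.

    run and sleep are injectable so the loop can be dry-run without
    touching iptables or actually waiting.
    """
    block, unblock = partition_commands(peer_ip)
    for _ in range(cycles):
        sleep(healthy_secs)          # time between partitions
        for cmd in block:
            run(cmd)                 # induce the partition
        try:
            sleep(partitioned_secs)  # time inside the partition
        finally:
            for cmd in unblock:
                run(cmd)             # always restore the link


# Example (hypothetical peer address, must be run as root):
# torture("10.0.0.2", cycles=10)
```

The try/finally is there so that interrupting the script mid-partition still
removes the DROP rules and leaves the link restored.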

Using 60 seconds between partitions and 60 seconds in a
partitioned state, I expect that this might be messy: the brokers try to
autoheal, and then everything falls apart again. However, I'd expect that
once I stop my torture and return things to "normal", an autoheal will
eventually succeed and the brokers will be happily clustered again.

This isn't what happens. Instead, the two brokers essentially ignore each
other, even after waiting for 10+ minutes. I can see each broker, but they
each think the other is missing.

Here's a filtered view of the logs, grepping for
"Autoheal|Starting|Stopping|Partitions|Winner|Loser":

rabbit at mq2.log: Autoheal request sent to rabbit at mq1

rabbit at mq2.log: Autoheal: I am the winner, waiting for [rabbit at mq1] to stop

rabbit at mq2.log: Autoheal: I am the winner, waiting additionally for [rabbit at mq1] to stop

rabbit at mq1.log: Autoheal request sent to rabbit at mq1

rabbit at mq1.log: Autoheal request received from rabbit at mq1

rabbit at mq1.log: Autoheal decision

rabbit at mq1.log:  * Partitions: [[rabbit at mq1],[rabbit at mq2]]

rabbit at mq1.log:  * Winner:     rabbit at mq2

rabbit at mq1.log:  * Losers:     [rabbit at mq1]

rabbit at mq1.log: Autoheal request received from rabbit at mq2

rabbit at mq1.log: Autoheal decision

rabbit at mq1.log:  * Partitions: [[rabbit at mq1],[rabbit at mq2]]

rabbit at mq1.log:  * Winner:     rabbit at mq2

rabbit at mq1.log:  * Losers:     [rabbit at mq1]

rabbit at mq1.log: Autoheal: we were selected to restart; winner is rabbit at mq2

rabbit at mq1.log: Stopping RabbitMQ

rabbit at mq2.log: Autoheal: aborting - rabbit at mq1 went down

rabbit at mq2.log: Autoheal request sent to rabbit at mq1

rabbit at mq2.log: Autoheal: we were selected to restart; winner is rabbit at mq1

rabbit at mq2.log: Stopping RabbitMQ

rabbit at mq1.log: Autoheal: aborting - rabbit at mq2 went down

rabbit at mq1.log: Autoheal request sent to rabbit at mq1

rabbit at mq1.log: Autoheal request received from rabbit at mq2

rabbit at mq1.log: Autoheal decision

rabbit at mq1.log:  * Partitions: [[rabbit at mq1],[rabbit at mq2]]

rabbit at mq1.log:  * Winner:     rabbit at mq1

rabbit at mq1.log:  * Losers:     [rabbit at mq2]

rabbit at mq1.log: Autoheal request received from rabbit at mq1

rabbit at mq1.log: Autoheal decision

rabbit at mq1.log:  * Partitions: [[rabbit at mq1],[rabbit at mq2]]

rabbit at mq1.log:  * Winner:     rabbit at mq1

rabbit at mq1.log:  * Losers:     [rabbit at mq2]

rabbit at mq1.log: Autoheal: I am the winner, waiting for [rabbit at mq2] to stop

rabbit at mq1.log: Autoheal: I am the winner, waiting additionally for [rabbit at mq2] to stop

rabbit at mq2.log: Autoheal: aborting - rabbit at mq1 went down

rabbit at mq2.log: Autoheal request sent to rabbit at mq1

rabbit at mq2.log: Autoheal: we were selected to restart; winner is rabbit at mq1

rabbit at mq1.log: Autoheal: aborting - rabbit at mq2 went down

rabbit at mq1.log: Autoheal request sent to rabbit at mq1

rabbit at mq1.log: Autoheal request received from rabbit at mq2

rabbit at mq1.log: Autoheal decision

rabbit at mq1.log:  * Partitions: [[rabbit at mq1],[rabbit at mq2]]

rabbit at mq1.log:  * Winner:     rabbit at mq1

rabbit at mq1.log:  * Losers:     [rabbit at mq2]

rabbit at mq1.log: Autoheal request received from rabbit at mq1

rabbit at mq1.log: Autoheal decision

rabbit at mq1.log:  * Partitions: [[rabbit at mq1],[rabbit at mq2]]

rabbit at mq1.log:  * Winner:     rabbit at mq1

rabbit at mq1.log:  * Losers:     [rabbit at mq2]

rabbit at mq1.log: Autoheal: I am the winner, waiting for [rabbit at mq2] to stop

rabbit at mq1.log: Autoheal: I am the winner, waiting additionally for [rabbit at mq2] to stop

# And nothing else beyond this, even after waiting for 10+ minutes.

I don't ever see the "Stopping RabbitMQ" that I've seen in other Autoheal
circumstances.
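
In case it helps anyone check their own logs for the same end state, here is
a small Python sketch of the filtering above plus a check for a log that
ends with the winner still waiting. The regular expression is the same one I
grepped for; the function names and the "last filtered line" heuristic are
just assumptions for illustration, not anything RabbitMQ provides.

```python
import re

# Same alternation as the grep above (case-sensitive, like plain grep).
PATTERN = re.compile(r"Autoheal|Starting|Stopping|Partitions|Winner|Loser")


def filter_lines(lines):
    """Keep only the log lines the grep would have matched."""
    return [ln.rstrip("\n") for ln in lines if PATTERN.search(ln)]


def ends_waiting_as_winner(filtered):
    """Heuristic: True if the last matched line shows this node as the
    autoheal winner, still waiting for the other node to stop."""
    return bool(filtered) and "I am the winner, waiting" in filtered[-1]


# Example (hypothetical log path):
# with open("/var/log/rabbitmq/rabbit@mq1.log") as f:
#     print(ends_waiting_as_winner(filter_lines(f)))
```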

I can send more complete logs, but wanted to see if this is a known issue
or expected behavior first.


Matt