[rabbitmq-discuss] rabbitmq 3.1.4 upgrade lost cluster config
Morty
morty+rabbitmq at frakir.org
Wed Aug 14 20:17:17 BST 2013
On Wed, Aug 14, 2013 at 03:14:20PM +0100, Emile Joubert wrote:
> If you ran the same sequence of steps on different clusters that had
> identical configuration then you should get the same result. Either
> the clusters did not have the same configuration or the sequence of
> steps was different.
We use puppet. The configurations are templated and enforced to be
identical except for cluster node names, which differ between
clusters. If a config file that is only read at server start changes,
such as /etc/rabbitmq/rabbitmq.config, then puppet automatically
restarts rabbitmq.
> Compare the logfiles from the nodes in the first cluster with the
> logfiles from the second cluster. The differences should indicate the
> cause. Pay close attention to the order of messages of the form
>
> rabbit on node 'name@host' up/down
The logs aren't helping me. In particular, the order of "up" and
"down" events is equivalent between the clusters up until the time of
the failure.
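
For anyone who wants to repeat the log comparison mechanically, here is a
rough Python sketch. It is not what was actually run for this thread. It
assumes the log lines follow the "rabbit on node ... up/down" form quoted
above; the function names and the rename mapping are hypothetical.

```python
import re

# Matches lines such as: rabbit on node 'rabbit@hostname91' up
# (the quotes around the node name are treated as optional)
EVENT_RE = re.compile(r"rabbit on node '?([^'\s]+)'? (up|down)")

def node_events(log_text, rename=None):
    """Return the ordered list of (node, state) events found in log_text.

    rename maps substrings of one cluster's node names onto the other's,
    so that the two event sequences become directly comparable.
    """
    rename = rename or {}
    events = []
    for match in EVENT_RE.finditer(log_text):
        node, state = match.groups()
        for old, new in rename.items():
            node = node.replace(old, new)
        events.append((node, state))
    return events

def first_divergence(events_a, events_b):
    """Index of the first differing event, or None if the sequences match."""
    for i, (a, b) in enumerate(zip(events_a, events_b)):
        if a != b:
            return i
    if len(events_a) != len(events_b):
        # One log simply has more events than the other.
        return min(len(events_a), len(events_b))
    return None
```

A result of None from first_divergence means the up/down ordering is the
same in both logs once node names are mapped onto each other.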
> Also compare the configurations using "rabbitmqctl environment" on both
> clusters and make sure they are the same.
They are indeed the same, except for file names (which incorporate the
nodenames) and cluster members (which again incorporate the
nodenames). If I compensate for these with suitable
perl -pi -e 's/hostname91/hostname11/' substitutions, the configs are
identical.
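
The same normalize-then-compare check could be sketched in Python roughly
as follows. This is an illustration only: it assumes the output of
"rabbitmqctl environment" from each cluster has already been captured as
text, and the function name and rename mapping are hypothetical.

```python
import difflib

def normalized_diff(env_a, env_b, rename):
    """Diff two environment dumps after renaming hosts in env_a.

    rename maps hostname substrings of cluster A onto cluster B,
    e.g. {"hostname91": "hostname11"}.
    Returns unified-diff lines; an empty list means the dumps match.
    """
    for old, new in rename.items():
        env_a = env_a.replace(old, new)
    return list(difflib.unified_diff(
        env_a.splitlines(), env_b.splitlines(),
        fromfile="cluster_a", tofile="cluster_b", lineterm=""))
```

An empty result corresponds to the "configs are identical" outcome
described above; any remaining lines are real differences.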
This looks to me like a code bug. A race condition or any number of
other classes of bug could explain why two identically-configured
clusters would exhibit different behavior when run through the same
sequence of operations.
- Morty