[rabbitmq-discuss] rabbitmqctl start_app hangs when replacing mirrored cluster instances in EC2
Mike Zraly
mzraly at gmail.com
Mon Jul 7 12:30:40 BST 2014
[I tried posting this to the new group, rabbitmq-users, but got no
response. Google groups tells me rabbitmq-users only has 101 members now,
compared to 1800 or so for rabbitmq-discuss, so I hope re-posting to the
larger group will at least elicit some (non-meta) feedback.]
Hi all,
I am setting up a RabbitMQ cluster in an Amazon EC2 region. Each host is
in the same geographical region, so I do not expect network partitions in
the sense that two members of the cluster are both running but cannot
communicate with each other. However, it is reasonable to expect individual
cluster hosts to be terminated and replaced with new hosts having the same
hostname but a new IP address and a fresh install of RabbitMQ. A typical
use case for this is a rolling upgrade where we keep 2 of the 3 cluster
nodes up at all times to continue providing service during the upgrade
period.
What I hope is that the same post-install provisioning script that joins a
newly created instance into the cluster will work for the new instance that
is taking over for an older one. What I am seeing instead is that
rabbitmqctl start_app hangs.
The installation sequence is basically this:
    install rabbitmq-server_3.3.1-1
    enable management plugin
    add health check user account with monitoring tag
    add application user account
    add HA policy '{"ha-mode": "all", "ha-sync-mode": "automatic"}' for all
        application queues
    service rabbitmq-server stop
    set /var/lib/rabbitmq/.erlang.cookie
    reboot system (restarting rabbitmq server)
    for each hostname 'target' that this host should join into a cluster with:
        if target is listening on port 5672
            rabbitmqctl stop_app
            if rabbitmqctl join_cluster target has non-zero exit status
            rabbitmqctl start_app
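
For concreteness, the join loop at the end of that sequence could be sketched as a POSIX shell function along these lines. This is only a sketch under stated assumptions, not the actual provisioning script: the helper names target_listening and join_target are made up for illustration, the port check assumes an nc that supports -z and -w, the node name assumes the default rabbit@<hostname> form, and it assumes start_app is run whether or not join_cluster succeeded (one reading of the steps above).

```shell
#!/bin/sh
# Sketch of the per-target join step. Only the rabbitmqctl subcommands
# (stop_app, join_cluster, start_app) come from the sequence above;
# the function names here are hypothetical.

# Return 0 if something accepts TCP connections on host $1, port 5672.
# Assumption: the installed nc supports -z (probe only) and -w (timeout).
target_listening() {
    nc -z -w 5 "$1" 5672 >/dev/null 2>&1
}

# Try to join this node to the cluster that host $1 belongs to.
# Returns 1 if the target is skipped or stop_app fails; otherwise
# returns the exit status of start_app.
join_target() {
    target=$1
    if ! target_listening "$target"; then
        echo "skipping $target: not listening on port 5672"
        return 1
    fi
    rabbitmqctl stop_app || return 1
    # Assumption: the remote node uses the default name rabbit@<hostname>.
    if rabbitmqctl join_cluster "rabbit@$target"; then
        echo "join_cluster via $target succeeded"
    else
        echo "join_cluster via $target failed"
    fi
    # Assumption: the app is restarted regardless of the join result.
    rabbitmqctl start_app
}
```

A provisioning script would then call it once per configured peer, for example: for t in hostA hostB; do join_target "$t"; done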
What I see if I start a cluster with hosts A, B, and C, then terminate
instance C and replace it with a new instance that executes these same
steps, is that rabbitmqctl join_cluster succeeds saying C is already part
of the cluster, then rabbitmqctl start_app hangs.
What am I doing wrong?