[rabbitmq-discuss] rabbitmqctl start_app hangs when replacing mirrored cluster instances in EC2

Jason McIntosh mcintoshj at gmail.com
Thu Jul 10 15:39:55 BST 2014

If this is what I've seen before, the cluster still thinks C is the old
node XYZ, but you're trying to tell it that C is really your new host YZX.
You need to remove the old node from the cluster before you can add your new
node as a replacement for C.  Your new node starts up and thinks it should be
part of the cluster because you just tried to join it, but the cluster
refuses to accept the new node, so start_app seems to hang.  I could be
completely wrong on this though.

As I recall, there is a rabbitmqctl command to completely remove a node
from the cluster, though I don't recall what it is offhand.  You could try
doing that first and then adding your node?
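
A rough sketch of that sequence, assuming the command in question is
rabbitmqctl forget_cluster_node (it exists in the 3.x series).  The node
names rabbit@A and rabbit@C, the DRY_RUN switch and the function names are
all hypothetical, chosen only for illustration.  The first function would
be run on a surviving member while the replacement's app is not yet running,
the second on the replacement host.  Untested against a real cluster.

```shell
#!/bin/sh
# Sketch only: replace a dead cluster member with a fresh node.
# DRY_RUN and the function names below are illustrative assumptions.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On a surviving cluster member (e.g. A): drop the dead node's membership.
forget_stale_node() {
    stale_node=$1                       # e.g. rabbit@C (assumed node name)
    run rabbitmqctl forget_cluster_node "$stale_node"
}

# On the replacement host: join via a surviving member, then start the app.
# Each step runs only if the previous one succeeded.
join_as_replacement() {
    survivor=$1                         # e.g. rabbit@A (assumed node name)
    run rabbitmqctl stop_app &&
    run rabbitmqctl join_cluster "$survivor" &&
    run rabbitmqctl start_app
}
```

With DRY_RUN=1 set, calling forget_stale_node rabbit@C or
join_as_replacement rabbit@A only prints the commands, which is a cheap way
to check the order before letting a provisioning script run them for real.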


On Mon, Jul 7, 2014 at 6:30 AM, Mike Zraly <mzraly at gmail.com> wrote:

> [I tried posting this to the new group, rabbitmq-users, but got no
> response.  Google groups tells me rabbitmq-users only has 101 members now,
> compared to 1800 or so for rabbitmq-discuss, so I hope re-posting to the
> larger group will at least elicit some (non-meta) feedback.]
>
> Hi all,
>
> I am setting up a RabbitMQ cluster in an Amazon EC2 region.  Each host is
> in the same geographical region, so I do not expect network partitions in
> the sense that two members of the cluster are both running but cannot
> communicate with each other.  However it is reasonable to expect individual
> cluster hosts to be terminated and replaced with new hosts having the same
> hostname but a new IP address and a fresh install of RabbitMQ.  A typical
> use case for this is a rolling upgrade where we keep 2 of the 3 cluster
> nodes up at all times to continue providing service during the upgrade
> period.
>
> What I hope is that the same post-install provisioning script that joins a
> newly created instance into the cluster will work for the new instance that
> is taking over for an older one.  What I am seeing is that rabbitmqctl
> start_app hangs.
>
> The installation sequence is basically this:
> install rabbitmq-server_3.3.1-1
> enable management plugin
> add health check user account with monitoring tag
> add application user account
> add HA policy '{"ha-mode": "all", "ha-sync-mode": "automatic"}' for all
> application queues
> service rabbitmq-server stop
> set /var/lib/rabbitmq/.erlang.cookie
> reboot system (restarting rabbitmq server)
> for each hostname 'target' that this host should join into a cluster with:
>     if target is listening on port 5672
>         rabbitmqctl stop_app
>         if rabbitmqctl join_cluster target has non-zero exit status
>             rabbitmqctl start_app
>
> What I see if I start a cluster with hosts A, B, and C, then terminate
> instance C and replace it with a new instance that executes these same
> steps, is that rabbitmqctl join_cluster succeeds saying C is already part
> of the cluster, then rabbitmqctl start_app hangs.
>
> What am I doing wrong?

Jason McIntosh