[rabbitmq-discuss] Queues disappeared after a cluster upgrade to 3.1.5, Erlang R16B01

Jason McIntosh mcintoshj at gmail.com
Tue Aug 27 15:11:49 BST 2013


I upgraded from 3.0.4 to 3.1.5 and from esl-erlang R15B03 to esl-erlang
R16B01.  I've been doing some digging and can't find any reason why this
would happen, no additional log files, nothing.  Note there are three nodes
involved in a cluster.  I upgraded my "dvlp" sample box which wasn't
clustered with no problem using the exact same script I used to upgrade my
alpha cluster.  Below is the script I'm using (might be useful for others
upgrading a cluster).  I'm going to try recreating my
"Alpha" environment and redoing the upgrade.  The ONLY thing that I can
think of: I was doing stops/starts of the cluster as part of training about
20 minutes before the upgrade, to test various concepts, e.g. message
loss, master node recovery, etc.  At this point, if people haven't
seen or heard of this, I'll chalk it up to something funky, disk or
otherwise, until I can try to replicate it.  My biggest concern at this
point is that because I no longer have the backups of the mnesia database,
recreating the environment won't reproduce exactly what was in the
database, and so I won't be able to reproduce the problem.  When I start
deploying to our production environment, I won't make that mistake: I'll
shut down rabbit and back up my whole rabbit_data_directory first :)
Thanks!
Jason
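
For reference, the backup step described above (stop the broker, then
archive the whole data directory) could look roughly like the sketch below.
It is not part of the upgrade script that follows. The function name
backup_rabbit_data and the backup location are hypothetical, and the sketch
assumes the broker on the host has already been stopped.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): archive a RabbitMQ
# data directory before an upgrade.
# ASSUMPTION: the broker on this host has ALREADY been stopped, e.g. with
# "service rabbitmq-server stop". This function does not stop it.
backup_rabbit_data() {
    data_dir=$1      # e.g. /data/rabbitmq, the path the upgrade script checks
    backup_dir=$2    # any directory with enough free space (assumption)

    if [ ! -d "$data_dir" ]; then
        echo "No data directory at $data_dir" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1

    archive="$backup_dir/rabbit_data_$(date +%Y%m%d%H%M%S).tar.gz"
    # -C makes the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1

    # Print the archive path so the caller can log it.
    echo "$archive"
}
```

It would be called once per node, after the stop and before the packages are
erased, for example: backup_rabbit_data /data/rabbitmq /var/backups/rabbitmq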

# ########################################
# 04/22/2013 - JasonMcIntosh - core upgrade script to go through each reported node in a cluster and upgrade it
# ########################################
if [ ! -e /data/rabbitmq ]; then
echo "No rabbit found on $1" > /var/log/rabbit_upgrade.log
exit 0
fi

export LOG_FILE=/var/log/rabbitmq/upgrade.log
rm -f $LOG_FILE

export CLUSTER_STATUS="`rabbitmqctl cluster_status`"
export CLUSTER_STATUS="`echo $CLUSTER_STATUS|tr -d ' \n'`"
echo "Started at `date`" >> $LOG_FILE
echo "$CLUSTER_STATUS" >> $LOG_FILE
echo "" >> $LOG_FILE

getServerFQDN() {
SERVER_FQDN=`echo $1 | awk -F@ '{print $2};'`
FQDN=`nslookup $SERVER_FQDN|grep Name|awk -F\: '{print $2}'|sed 's/ //g'`
echo $FQDN
}

upgradeRabbitNode() {
SERVER_FQDN=`getServerFQDN $1`
echo "Doing upgrade of $SERVER_FQDN" >> $LOG_FILE
# The upgrade deploy job actually stops the rabbit server, shouldn't need this, but we'll do it anyway
# Not exact commands below as I'm using bladelogic internal commands to do these, but the idea should be the same.
remote_exec ${SERVER_FQDN} service rabbitmq-server stop  >> $LOG_FILE
remote_exec $SERVER_FQDN yum -y erase rabbit* erlang* >> $LOG_FILE
remote_exec $SERVER_FQDN yum -y install rabbitmq-server-3.1.5... >> $LOG_FILE
remote_exec $SERVER_FQDN rabbitmq-plugins enable rabbitmq_management rabbitmq_management_agent rabbitmq_management_visualiser rabbitmq_shovel rabbitmq_shovel_management >> $LOG_FILE
remote_exec $SERVER_FQDN chkconfig rabbitmq-server on
}


#Get the cluster nodes and pick the first disk node as the "upgrader" node
export UPGRADER_NODE=`echo "$CLUSTER_STATUS"|awk -F\[ '{print $4}'|awk -F\] '{print $1}'|awk -F, '{print $1}'`
export UPGRADER_FQDN=`getServerFQDN $UPGRADER_NODE`
export NODE_LIST="`echo $CLUSTER_STATUS|awk -F\[ '{sub(/.*running_nodes/,\"\")};1'|awk -F\[ '{print $2}'|awk -F\] '{print $1}'|sed -e 's/,/ /g'`"

echo "Node list: $NODE_LIST " >> $LOG_FILE
echo "Disk node for last upgrade $UPGRADER_NODE">> $LOG_FILE

if [ "$UPGRADER_NODE" = "" ]; then
echo " ** No upgrader node found! EXITING" >> $LOG_FILE
exit 1
fi


#Shutdown and upgrade all other nodes than the upgrader node
echo "Doing upgrade of all non upgrade nodes..." >> $LOG_FILE
for clusterNode in ${NODE_LIST}; do
if [ "$clusterNode" != "$UPGRADER_NODE" ]; then
upgradeRabbitNode $clusterNode
fi
done

#Upgrade the upgrader node now.
echo "Upgrade the core upgrade node ..." >> $LOG_FILE
upgradeRabbitNode $UPGRADER_NODE

#NOW start all nodes, starting with the upgrader node.
echo "Starting rabbit on upgrader node..." >> $LOG_FILE
remote_exec $UPGRADER_FQDN service rabbitmq-server start >> $LOG_FILE
for clusterNode in ${NODE_LIST}; do
if [ "$clusterNode" != "$UPGRADER_NODE" ]; then
SERVER_FQDN=`getServerFQDN $clusterNode`
echo "Starting rabbit on NON upgrade nodes..." >> $LOG_FILE
remote_exec $SERVER_FQDN service rabbitmq-server start >> $LOG_FILE
fi
done

#Finally, make sure our HA Policy is applied to all our virtual hosts

echo "Finished with upgrade..." >> $LOG_FILE
#Report how we worked out...
echo "
RESULTS
"
cat $LOG_FILE
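
The awk pipelines in the script above pick the nodes out of the
rabbitmqctl cluster_status output by field position, which is easy to break
if the output changes shape. A minimal alternative sketch is shown below. It
assumes the Erlang-term output format of the 3.x series, with {disc,[...]}
and {running_nodes,[...]} tuples. The function names are hypothetical and
are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original script). Each reads
# "rabbitmqctl cluster_status" text on stdin.
# ASSUMPTION: the output contains Erlang terms such as
#   {nodes,[{disc,[rabbit@a,rabbit@b]}]} and {running_nodes,[rabbit@b,rabbit@a]}

# Print the running nodes, one per line.
running_nodes() {
    tr -d ' \n' |
        sed -n 's/.*{running_nodes,\[\([^]]*\)]}.*/\1/p' |
        tr ',' '\n' |
        tr -d "'"
}

# Print the first disc node listed, to act as the "upgrader" node.
first_disc_node() {
    tr -d ' \n' |
        sed -n 's/.*{disc,\[\([^],]*\).*/\1/p' |
        tr -d "'"
}

# Strip the "rabbit@" prefix from a node name, leaving the host part.
node_host() {
    printf '%s\n' "${1#*@}"
}
```

With these, the node list could be built with something like
NODE_LIST=`rabbitmqctl cluster_status | running_nodes`, and an empty result
can still be treated as the "no upgrader node found" case.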



On Tue, Aug 27, 2013 at 4:58 AM, Emile Joubert <emile at rabbitmq.com> wrote:

> On 23/08/13 23:01, Jason McIntosh wrote:
> > A ps auf shows /usr/lib/erlang/erts-5.9.3.1/bin/epmd -daemon as still
> > running.  SO I'm wondering if that might have an impact.
>
> Depending on how you upgraded Erlang you may need to stop this process
> manually. I'd be surprised if this was the cause of the error though.
>
> > stop rabbit on server X (upgrader is Z, other node is Y)
> > remove all rabbit/erlang RPM's
> > Reinstall rabbit software
> > Update rabbitmqadmin
> > Enable management plugins (just in case)
> > Enable auto start.
> >
> > Rinse and repeat on servers Y, then Z and then start bringing them up
> > starting with upgrader node. First start Z, then start Y, then start X.
>
> From which versions did you upgrade?
>
> > On Fri, Aug 23, 2013 at 4:51 PM, Jason McIntosh wrote:
>
> >     =INFO REPORT==== 23-Aug-2013::15:37:45 ===
> >     Disk free limit set to 1000MB
>
> Were there any other log messages in either logfile or console messages
> on any nodes in the interval between or near 15:37:45 - 15:37:50?
>
> >         =ERROR REPORT==== 23-Aug-2013::15:37:50 ===
> >         ** Generic server <0.303.0> terminating
> >         ** Last message in was {'EXIT',<0.350.0>,normal}
> Did you perform the same upgrade in other environments, and the failure
> only occurred in one of the environments?
>
>
>
> -Emile


-- 
Jason McIntosh
http://mcintosh.poetshome.com/blog/
573-424-7612