<div dir="ltr">I upgraded from 3.0.4 to 3.1.5 and from esl-erlang R15B03 to esl-erlang R16B01. I've been doing some digging and can't find any reason why this would happen: no additional log files, nothing. Note that three nodes are involved in the cluster. I upgraded my "dvlp" sample box, which wasn't clustered, with no problem using the exact same script I used to upgrade my alpha cluster. Below is the script I'm using (it might be useful for others upgrading a cluster). I'm going to try recreating my "Alpha" environment and redoing the upgrade. The ONLY thing I can think of is that I was doing stops/starts while training on the cluster about 20 minutes before the upgrade, to test various concepts, e.g. message loss, master node recovery, etc. At this point, if people haven't seen or heard of this, I'll chalk it up to something funky, disk or otherwise, until I can try to replicate it. My biggest concern is that because I no longer have the backups of the mnesia database, recreating the environment won't reproduce exactly what was in the database, so I won't be able to replicate the failure. When I start deploying to our production environment, I won't make that mistake: I'll shut down rabbit and back up my whole rabbit_data_directory first :)<div style>
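<div>For anyone wanting to do the backup step before upgrading, here is a minimal sketch of what I have in mind. The function name <code>backup_rabbit_data</code> and the example paths are made up for illustration and are not part of the script below; adjust them to wherever your data directory really lives.</div>

```shell
#!/bin/sh
# Sketch: archive a RabbitMQ data directory before an upgrade.
# backup_rabbit_data and the example paths are illustrative assumptions.

backup_rabbit_data() {
    data_dir=$1      # directory holding the mnesia database
    backup_dir=$2    # where the archive is written
    stamp=$(date +%Y%m%d%H%M%S)
    archive="$backup_dir/rabbit_data_$stamp.tar.gz"

    [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1

    # -C keeps the paths inside the archive relative to the parent of data_dir
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1

    # Print the archive path so the caller can log or verify it
    echo "$archive"
}

# Example (assumed paths; stop the node first so the files are not changing):
#   service rabbitmq-server stop
#   backup_rabbit_data /data/rabbitmq /var/backups/rabbitmq
```

<div>Doing this on every node before touching any packages means a failed upgrade can be reproduced later from the same on-disk state.</div>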
Thanks!<br>Jason</div><div><br></div><div><div># ########################################<br></div><div># 04/22/2013 - JasonMcIntosh - core upgrade script to go through each reported node in a cluster and upgrade it</div>
<div># ########################################</div><div>if [ ! -e /data/rabbitmq ]; then</div><div><span class="" style="white-space:pre">        </span>echo "No rabbit found on $1" > /var/log/rabbit_upgrade.log</div>
<div><span class="" style="white-space:pre">        </span>exit 0</div><div>fi</div><div><br></div><div>export LOG_FILE=/var/log/rabbitmq/upgrade.log</div><div>rm -f $LOG_FILE</div><div><br></div><div>export CLUSTER_STATUS="`rabbitmqctl cluster_status`"</div>
<div>export CLUSTER_STATUS="`echo $CLUSTER_STATUS|tr -d ' \n'`"</div><div>echo "Started at `date`" >> $LOG_FILE</div><div>echo "$CLUSTER_STATUS" >> $LOG_FILE</div><div>echo "" >> $LOG_FILE</div>
<div><br></div><div>getServerFQDN() {</div><div><span class="" style="white-space:pre">        </span>SERVER_FQDN=`echo $1 | awk -F@ '{print $2};'`</div><div><span class="" style="white-space:pre">        </span>FQDN=`nslookup $SERVER_FQDN|grep Name|awk -F\: '{print $2}'|sed 's/ //g'`</div>
<div><span class="" style="white-space:pre">        </span>echo $FQDN</div><div>}</div><div><br></div><div>upgradeRabbitNode() {</div><div><span class="" style="white-space:pre">        </span>SERVER_FQDN=`getServerFQDN $1`</div><div><span class="" style="white-space:pre">        </span>echo "Doing upgrade of $SERVER_FQDN" >> $LOG_FILE</div>
<div><span class="" style="white-space:pre">        </span># The upgrade deploy job actually stops the rabbit server, so we shouldn't need this, but we'll do it anyway</div><div style><span class="" style="white-space:pre">        </span># These are not the exact commands, since I'm using BladeLogic internal commands for these steps, but the idea should be the same.</div>
<div><span class="" style="white-space:pre">        </span>remote_exec ${SERVER_FQDN} service rabbitmq-server stop >> $LOG_FILE</div><div><span class="" style="white-space:pre">        </span>remote_exec $SERVER_FQDN yum -y erase rabbit* erlang* >> $LOG_FILE</div>
<div><span class="" style="white-space:pre">        </span>remote_exec $SERVER_FQDN yum -y install rabbitmq-server-3.1.5... >> $LOG_FILE<br></div><div><span class="" style="white-space:pre">        </span>remote_exec $SERVER_FQDN rabbitmq-plugins enable rabbitmq_management rabbitmq_management_agent rabbitmq_management_visualiser rabbitmq_shovel rabbitmq_shovel_management >> $LOG_FILE<br>
</div><div style><span class="" style="white-space:pre">        </span>remote_exec $SERVER_FQDN chkconfig rabbitmq-server on</div><div>}</div><div><br></div><div><br></div><div>#Get the cluster nodes and pick the first disk node as the "upgrader" node</div>
<div>export UPGRADER_NODE=`echo "$CLUSTER_STATUS"|awk -F\[ '{print $4}'|awk -F\] '{print $1}'|awk -F, '{print $1}'`</div><div>export UPGRADER_FQDN=`getServerFQDN $UPGRADER_NODE`</div><div>
export NODE_LIST="`echo $CLUSTER_STATUS|awk -F\[ '{sub(/.*running_nodes/,\"\")};1'|awk -F\[ '{print $2}'|awk -F\] '{print $1}'|sed -e 's/,/ /g'`"</div><div><br></div><div>
echo "Node list: $NODE_LIST " >> $LOG_FILE</div><div>echo "Disk node for last upgrade $UPGRADER_NODE">> $LOG_FILE</div><div><br></div><div>if [ "$UPGRADER_NODE" = "" ]; then</div>
<div><span class="" style="white-space:pre">        </span>echo " ** No upgrader node found! EXITING" >> $LOG_FILE</div><div><span class="" style="white-space:pre">        </span>exit 1</div><div>fi</div><div><br></div>
<div><br></div><div>#Shutdown and upgrade all other nodes than the upgrader node</div><div>echo "Doing upgrade of all non upgrade nodes..." >> $LOG_FILE</div><div>for clusterNode in ${NODE_LIST}; do</div><div>
<span class="" style="white-space:pre">        </span>if [ "$clusterNode" != "$UPGRADER_NODE" ]; then</div><div><span class="" style="white-space:pre">                </span>upgradeRabbitNode $clusterNode</div><div><span class="" style="white-space:pre">        </span>fi</div>
<div>done</div><div><br></div><div>#Upgrade the upgrader node now.</div><div>echo "Upgrade the core upgrade node ..." >> $LOG_FILE</div><div>upgradeRabbitNode $UPGRADER_NODE</div><div><br></div><div>#NOW start all nodes, starting with the upgrader node.</div>
<div>echo "Starting rabbit on upgrader node..." >> $LOG_FILE</div><div>remote_exec $UPGRADER_FQDN service rabbitmq-server start >> $LOG_FILE</div><div>for clusterNode in ${NODE_LIST}; do</div><div><span class="" style="white-space:pre">        </span>if [ "$clusterNode" != "$UPGRADER_NODE" ]; then</div>
<div><span class="" style="white-space:pre">                </span>SERVER_FQDN=`getServerFQDN $clusterNode`</div><div><span class="" style="white-space:pre">                </span>echo "Starting rabbit on NON upgrade nodes..." >> $LOG_FILE</div>
<div><span class="" style="white-space:pre">                </span>remote_exec $SERVER_FQDN service rabbitmq-server start >> $LOG_FILE</div><div><span class="" style="white-space:pre">        </span>fi</div><div>done</div><div><br></div>
<div>#Finally, make sure our HA Policy is applied to all our virtual hosts</div><div><br></div><div>echo "Finished with upgrade..." >> $LOG_FILE</div><div>#Report how we worked out...</div><div>echo "</div>
<div>RESULTS</div><div>"</div><div>cat $LOG_FILE</div><div><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Aug 27, 2013 at 4:58 AM, Emile Joubert <span dir="ltr"><<a href="mailto:emile@rabbitmq.com" target="_blank">emile@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">On 23/08/13 23:01, Jason McIntosh wrote:<br>
> A ps auf shows /usr/lib/erlang/erts-5.9.3.1/bin/epmd -daemon as still<br>
> running. So I'm wondering if that might have an impact.<br>
<br>
</div>Depending on how you upgraded Erlang you may need to stop this process<br>
manually. I'd be surprised if this was the cause of the error though.<br>
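<br>
A quick way to check whether a leftover epmd belongs to the old Erlang install is to look at the erts version in its path. This is only a sketch: <code>epmd_erts_versions</code> is a made-up helper name, and it assumes the epmd path contains an <code>erts-&lt;version&gt;</code> component as in the ps output quoted above.<br>

```shell
#!/bin/sh
# Sketch: report which Erlang runtime (erts) version any running epmd belongs to.
# epmd_erts_versions is an illustrative helper name, not an Erlang or RabbitMQ tool.

epmd_erts_versions() {
    # Reads `ps` output on stdin and prints the erts version found in each epmd path.
    sed -n 's#.*/erts-\([0-9.][0-9.]*\)/bin/epmd.*#\1#p'
}

# Example, run on a node after rabbit has been stopped:
#   ps -eo args | epmd_erts_versions
# If this still prints the old erts version after the upgrade, the old daemon
# survived the package removal. `epmd -names` lists the nodes registered with it,
# and `epmd -kill` asks it to shut down.
```

If nothing is printed, no epmd with an erts-style path is running and there is nothing to clean up.<br>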
<div class="im"><br>
> stop rabbit on server X (upgrader is Z, other node is Y)<br>
> remove all rabbit/erlang RPM's<br>
> Reinstall rabbit software<br>
> Update rabbitmqadmin<br>
> Enable management plugins (just in case)<br>
> Enable auto start.<br>
><br>
> Rinse and repeat on servers Y, then Z and then start bringing them up<br>
> starting with upgrader node. First start Z, then start Y, then start X.<br>
<br>
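The ordering described in the steps above (stop X, then Y, then Z; start Z, then Y, then X) amounts to starting the nodes in the reverse of the order they were stopped, so the upgrader node is stopped last and started first. A minimal sketch, where <code>reverse_words</code> is a hypothetical helper and X, Y, Z stand in for real node names:<br>

```shell
#!/bin/sh
# Sketch: derive the start order from the stop order by reversing it.
# reverse_words is an illustrative helper; X, Y and Z are placeholder node names.

reverse_words() {
    reversed=""
    for word in "$@"; do
        # Prepend each word so the last argument ends up first
        reversed="$word${reversed:+ $reversed}"
    done
    echo "$reversed"
}

STOP_ORDER="X Y Z"                        # Z is the upgrader node, stopped last
START_ORDER=$(reverse_words $STOP_ORDER)  # upgrader node comes out first
echo "$START_ORDER"
```

Keeping a single stop-order list and deriving the start order from it avoids the two lists drifting apart when nodes are added to the cluster.<br>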
</div>From which versions did you upgrade?<br>
<div class="im"><br>
> On Fri, Aug 23, 2013 at 4:51 PM, Jason McIntosh wrote:<br>
<br>
> =INFO REPORT==== 23-Aug-2013::15:37:45 ===<br>
> Disk free limit set to 1000MB<br>
<br>
</div>Were there any other log messages in either logfile or console messages<br>
on any nodes in the interval between or near 15:37:45 - 15:37:50?<br>
<div class="im"><br>
> =ERROR REPORT==== 23-Aug-2013::15:37:50 ===<br>
> ** Generic server <0.303.0> terminating<br>
> ** Last message in was {'EXIT',<0.350.0>,normal}<br>
</div>Did you perform the same upgrade in other environments, and the failure<br>
only occurred in one of the environments?<br>
<span class=""><font color="#888888"><br>
<br>
<br>
-Emile<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>Jason McIntosh<br><a href="http://mcintosh.poetshome.com/blog/">http://mcintosh.poetshome.com/blog/</a><br>573-424-7612
</div></div></div>