| I *think* the stats database is a red herring.<br><br>Perhaps. But it's the only correlation that I've seen. That is, I've never seen it happen on a node that didn't have the stats database before it shut down.<br>
<br>A little more background context: I'm writing "rolling restart" logic. For each node in the cluster, in sequence, I stop the node, perform update logic (currently nothing), then restart the node. <br><br>
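For illustration, a minimal sketch of that loop. The helper functions and node names here are hypothetical placeholders, not the actual script; the real versions would wrap the remote Capistrano invocations.<br>

```shell
#!/bin/sh
# Rolling-restart sketch. stop_node, update_node and start_node are
# hypothetical stand-ins; replace their bodies with the real remote calls.

stop_node()   { echo "stop $1"; }     # e.g. run "rabbitmqctl stop" on the node
update_node() { echo "update $1"; }   # update logic (currently nothing)
start_node()  { echo "start $1"; }    # e.g. start the server, then wait for it

rolling_restart() {
    for node in "$@"; do
        # Stop at the first failure so that at most one node is down at a time.
        stop_node "$node"   || return 1
        update_node "$node" || return 1
        start_node "$node"  || return 1
    done
}
```

Invoked as, say, <code>rolling_restart play1 play2</code>, this handles the nodes strictly one after another and aborts as soon as any step fails.<br><br>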
| You say this happens when restarting?<br><br>Yes. Occasionally the node will restart OK, but more often than not, it hangs on the "rabbitmqctl wait".<br><br>I modified my script to run rabbitmq-server as a background task. It's also worth noting that these scripts are invoked remotely via Capistrano, so until I prefaced them with nohup, the server would start and then immediately exit. The invocation line now looks like this:<br>
<br>nohup rabbitmq-server &<br><br>The nohup.out on the failing node ends with:<br><br>+---+ +---+<br>| | | |<br>| | | |<br>| | | |<br>| +---+ +-------+<br>| |<br>| RabbitMQ +---+ |<br>
| | | |<br>| v2.7.1 +---+ |<br>| |<br>+-------------------+<br>AMQP 0-9-1 / 0-9 / 0-8<br>Copyright (C) 2007-2011 VMware, Inc.<br>Licensed under the MPL. See <a href="http://www.rabbitmq.com/">http://www.rabbitmq.com/</a><br>
<br>node : rabbit@play2<br>app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/../ebin/rabbit.app<br>home dir : /home/mpietrek<br>config file(s) : /home/mpietrek/work/var/run/rabbitmq.config<br>
cookie hash : pS5H9kY3Wra/XdLEKT5hgQ==<br>log : /home/mpietrek/work/logs/<a href="http://play2.mpietrek.internal.illumita.com/rabbit@play2.log">play2.mpietrek.internal.illumita.com/rabbit@play2.log</a><br>sasl log : /home/mpietrek/work/logs/<a href="http://play2.mpietrek.internal.illumita.com/rabbit@play2-sasl.log">play2.mpietrek.internal.illumita.com/rabbit@play2-sasl.log</a><br>
database dir : /home/mpietrek/work/var/lib/rabbit@play2<br>erlang version : 5.7.4<br><br>-- rabbit boot start<br>starting file handle cache server ...done<br>starting worker pool ...done<br>
starting database ...<br><br><br><div class="gmail_quote">On Thu, Feb 23, 2012 at 3:52 AM, Simon MacMullen <span dir="ltr"><<a href="mailto:simon@rabbitmq.com">simon@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I *think* the stats database is a red herring. You say this happens when restarting?<div class="im"><br>
<br>
On 23/02/12 00:30, Matt Pietrek wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Let me add some additional information, and re-summarize what I'm seeing.<br>
<br>
In our startup script for RabbitMQ we do the following:<br>
<br>
rabbitmq-server -detached<br>
rabbitmqctl status<br>
<Extract the PID from rabbitmqctl status, write to our PIDFILE><br>
</blockquote>
<br></div>
There's a potential race here if an old server is running (maybe about to shut down?). rabbitmqctl status could pick up the old pid.<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
rabbitmqctl wait PIDFILE<br>
</blockquote>
<br>
However, rabbitmqctl wait should then detect that the pid has died and fail, unless the pid gets reused by the OS, but that is presumably unlikely.<br>
<br>
But rabbitmqctl wait will wait indefinitely as long as the pid is alive and not a fully functional rabbit node. So I'd check two things:<br>
<br>
1) You should fix that race; it can be done safely:<br>
<br>
Do not use rabbitmq-server -detached and rabbitmqctl status to get the pid. Instead set RABBITMQ_PID_FILE and background the rabbitmq-server script. You will then *definitely* get the right pid since the script writes its own pid then execs - no race possible.<br>
<br>
2) Capture the stdout of rabbitmq-server when you start it - if rabbitmqctl wait still hangs, see how far it's got / what it's doing.<div class="HOEnZb"><div class="h5"><br>
<br>
Cheers, Simon<br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, VMware<br>
</div></div></blockquote></div><br>
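For reference, Simon's two suggestions (set RABBITMQ_PID_FILE and background the server; capture its stdout) can be combined roughly as in the sketch below. The function name, the default paths, and the SERVER and CTL override variables are assumptions made for illustration; only RABBITMQ_PID_FILE, rabbitmq-server and "rabbitmqctl wait" come from the thread.<br>

```shell
#!/bin/sh
# Sketch only: the paths and the SERVER/CTL overrides are hypothetical defaults.
PID_FILE=${PID_FILE:-/tmp/rabbitmq.pid}
BOOT_LOG=${BOOT_LOG:-/tmp/rabbitmq-boot.log}
SERVER=${SERVER:-rabbitmq-server}
CTL=${CTL:-rabbitmqctl}

start_and_wait() {
    # Remove any stale pid file so the wait cannot pick up an old server's pid.
    rm -f "$PID_FILE"

    # Background the server with RABBITMQ_PID_FILE set so that the script
    # writes its own pid, and keep stdout/stderr to see how far boot got.
    RABBITMQ_PID_FILE="$PID_FILE" nohup "$SERVER" >"$BOOT_LOG" 2>&1 &

    # Block until the node behind that pid is running, or the pid has died.
    if "$CTL" wait "$PID_FILE"; then
        echo "node is up"
    else
        echo "node failed to start; see $BOOT_LOG" >&2
        return 1
    fi
}
```

If the wait still hangs, the tail of the file named by BOOT_LOG should show the last boot step that was reached.<br>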