[rabbitmq-discuss] rabbitmqctl stall/hang when leaving a cluster

Thu Feb 23 11:52:17 GMT 2012

I *think* the stats database is a red herring. You say this happens when 
restarting?

On 23/02/12 00:30, Matt Pietrek wrote:
> Let me add some additional information, and re-summarize what I'm seeing.
>
> In our startup script for RabbitMQ we do the following:
>
> rabbitmq-server -detached
> rabbitmqctl status
> <Extract the PID from rabbitmqctl status, write to our PIDFILE>

There's a potential race here if an old server is running (maybe about 
to shut down?). rabbitmqctl status could pick up the old pid.

> rabbitmqctl wait PIDFILE

However, rabbitmqctl wait should then detect that the pid has died and 
fail, unless the pid gets reused by the OS, but that is presumably unlikely.

But rabbitmqctl wait will wait indefinitely as long as the process with 
that pid is alive but is not yet a fully functional rabbit node. So I'd 
check two things:

1) You should fix that race; it can be done safely:

Do not use rabbitmq-server -detached and rabbitmqctl status to get the 
pid. Instead set RABBITMQ_PID_FILE and background the rabbitmq-server 
script. You will then *definitely* get the right pid since the script 
writes its own pid then execs - no race possible.

2) Capture the stdout of rabbitmq-server when you start it - if 
rabbitmqctl wait still hangs, see how far it's got / what it's doing.
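
Roughly, the two steps above could look like this in a startup script. 
This is only a sketch: the function name start_rabbit, the default paths, 
and the RABBITMQ_SERVER / RABBITMQCTL override variables are my own 
placeholders, not anything RabbitMQ defines. Removing a stale pid file 
first and capturing stderr along with stdout are extra precautions I've 
assumed you'd want.

```shell
#!/bin/sh
# Sketch only: function name, default paths and the RABBITMQ_SERVER /
# RABBITMQCTL override variables are illustrative assumptions.

start_rabbit() {
    pidfile=${1:-/var/run/rabbitmq/pid}            # hypothetical path
    logfile=${2:-/var/log/rabbitmq/startup.log}    # hypothetical path

    # Assumed precaution: drop any stale pid file left by an old server
    # so that wait cannot pick up an old pid.
    rm -f "$pidfile"

    # Background the server script instead of using -detached. The script
    # writes its own pid to RABBITMQ_PID_FILE and then execs, so the pid
    # in the file belongs to the node we just started.
    # stdout (and, as an assumption, stderr) go to the log file.
    RABBITMQ_PID_FILE="$pidfile" "${RABBITMQ_SERVER:-rabbitmq-server}" \
        > "$logfile" 2>&1 &

    # Wait for the node to come up; this fails if the pid dies first.
    if ! "${RABBITMQCTL:-rabbitmqctl}" wait "$pidfile"; then
        echo "rabbit did not start; see $logfile" >&2
        return 1
    fi
}
```

You would call it as, say, start_rabbit /var/run/rabbitmq/pid 
/tmp/rabbit-startup.log. If the wait step hangs, look at the log file to 
see how far startup got.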

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, VMware