[rabbitmq-discuss] RabbitMQ waits forever for PID file during startup
Simon MacMullen
simon at rabbitmq.com
Fri Jul 11 11:37:22 BST 2014
On 10/07/2014 3:02PM, Cesar Munoz wrote:
> Hi Simon,
>
> so we have tried to find the problem with the initial installation, but
> no luck yet. It is very difficult to track down, as it is totally
> non-deterministic!
> In the meantime, we installed the latest version of RabbitMQ, which
> includes the set -e fix, but the same issue still happened. Given the
> output of ps auxf
>
> https://gist.github.com/anonymous/62239513b154179a8a4e
>
> it looks like
>
> /bin/sh /etc/init.d/rabbitmq-server start
>
> and
>
> /bin/sh /usr/sbin/rabbitmqctl wait /var/run/rabbitmq/pid
>
> were running concurrently. Is there any chance that this fact created
> some sort of race condition between these 2 processes that would make
> the set -e fix not work?
The "set -e" only causes a failure if the script was not able to write
the pid file for whatever reason; that's all it does. Looking at the ps
output posted for the latest case, startup has got past that point, since
it has already started the beam (Erlang VM) process for the server.
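A minimal sketch of what the "set -e" fix buys, assuming the pid file is written with a plain shell redirection. The function name write_pid_file and its arguments are hypothetical and not taken from the real init script; the point is only that a failed write stops the script there instead of letting it carry on.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real rabbitmq-server init script: shows how
# "set -e" turns a failed pid-file write into an immediate non-zero exit.

# The body runs in a subshell "( ... )" so that "set -e" only affects this
# function, not the shell that calls it.
write_pid_file() (
    set -e
    pid_file="$1"
    server_pid="$2"
    # If this redirection fails (missing directory, no permission, or an
    # allocation error), "set -e" aborts the subshell here with a
    # non-zero status.
    echo "$server_pid" > "$pid_file"
    # Only reached if the write above succeeded.
    echo "wrote $pid_file"
)
```

Call it as a plain command and check $? afterwards. In bash, for instance, "set -e" has no effect inside a function or subshell that is itself run as an "if" condition or on the left of && or ||, so wrapping the call that way would quietly defeat the check.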
"rabbitmqctl wait" should wait indefinitely for the server to start up,
as long as the server has not actually died.
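To illustrate why a pid file that never appears means an indefinite wait, here is a hypothetical shell version of the polling logic described above. It is not how rabbitmqctl implements "wait" (the real command delegates to Erlang code); the function name wait_for_server and the readiness probe passed as its second argument are placeholders.

```shell
#!/bin/sh
# Hypothetical sketch of "wait until the server is up, unless it has died".
# $1: pid file path; $2: name of a command that succeeds once the server
# is ready (a placeholder for whatever "booted" check is really used).

wait_for_server() {
    pid_file="$1"
    ready_check="$2"
    # Phase 1: until the pid file exists and is non-empty there is no
    # process to check on, so this loop has no exit condition and hangs
    # forever if the file is never written.
    while [ ! -s "$pid_file" ]; do
        sleep 1
    done
    pid=$(cat "$pid_file")
    # Phase 2: poll until the readiness probe succeeds, giving up only
    # if the process has died.
    while :; do
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid is not running" >&2
            return 1
        fi
        if "$ready_check"; then
            return 0
        fi
        sleep 1
    done
}
```

Note that kill -0 also fails (with EPERM) if the process belongs to another user, so a check like this has to run as root or as the same user as the server.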
But it looks like something is getting stuck. Is there anything in the
server logs at this point? Bear in mind that the machine in question
has previously claimed to run out of memory while writing a 5-byte file,
so I don't necessarily trust it.
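For the question about the server logs, a small helper that prints the tail of every file in the log directory. It assumes the packaged default directory /var/log/rabbitmq (pass another path if RABBITMQ_LOG_BASE has been changed); show_recent_logs is a hypothetical name, not a RabbitMQ tool.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of each file in a log directory.
# ASSUMPTION: /var/log/rabbitmq is the default log directory for the package.
# $1: log directory (optional); $2: number of lines per file (default 20).

show_recent_logs() {
    log_dir="${1:-/var/log/rabbitmq}"
    lines="${2:-20}"
    found=0
    for f in "$log_dir"/*; do
        # Skip the unexpanded glob (empty directory) and non-regular files.
        [ -f "$f" ] || continue
        found=1
        echo "== $f =="
        tail -n "$lines" "$f"
    done
    # Non-zero status if there was nothing to show.
    [ "$found" -eq 1 ]
}
```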
Cheers, Simon
> Cheers,
> Cesar.
>
>
>
> On 6 June 2014 11:55, Cesar Munoz <cesar.munoz at ammeon.com> wrote:
>
> Hi Simon,
>
> the ulimits for rabbitmq user are pretty much the same, the only
> difference is that max user processes is set to 1024 instead of 2066207.
>
> About the system itself, it is true that there has to be something
> strange going on if a shell redirection can fail, but I'm checking
> the configuration and I don't see anything especially unusual.
>
> We are using Red Hat 6.4, and these are the parameters that we set
> in the sysctl.conf:
> http://pastebin.com/SfJBwrna
>
> The rest of the parameters in the kickstart file are pretty much the
> standard ones.
> This is an intermittent issue (we are testing how often it happens;
> so far we have had 3 failures in 13 installations), so it is harder to
> track down!
> Either way, restarting the service works, so it looks like whatever
> causes the problem disappears after a while. I've been trying to
> find what could make this non-deterministic, but so far I haven't
> noticed anything unusual.
>
> Thanks again!
> Cesar.
>
>
> On 6 June 2014 11:27, Simon MacMullen <simon at rabbitmq.com> wrote:
>
> On 06/06/2014 10:49AM, Cesar Munoz wrote:
>
> Hi Simon,
>
> the set -e looks like a very good idea, at least the process
> will return
> the failure straight away!
>
>
> Sure!
>
>
> These are the ulimits:
>
> [root at ms1 ~]# ulimit -a
>
>
> <snip>
>
> Those are the ulimits which apply to root - maybe they are
> different for the "rabbitmq" user?
>
> But more to the point: we're failing to do something very very
> simple here, there has to be something weird about this system
> if echo or shell redirection can fail with an error message
> about memory allocation.
>
> So have you configured anything unusual about this system?
>
>
> Cheers, Simon
>
> --
> Simon MacMullen
> RabbitMQ, Pivotal
>
>
>
>
>
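On the earlier question of whether the limits differ for the "rabbitmq" user: a sketch that reads the limits the running broker process actually has, instead of those of the interactive root shell. It assumes Linux (/proc/<pid>/limits) and the pid file path seen in the ps output above; show_process_limits is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show the resource limits of a running process.
# Linux-only: reads /proc/<pid>/limits.

show_process_limits() {
    pid="$1"
    if [ ! -r "/proc/$pid/limits" ]; then
        echo "no limits file for pid $pid" >&2
        return 1
    fi
    # "Max processes" is what "ulimit -u" (max user processes) reports,
    # "Max open files" matches "ulimit -n" and "Max address space"
    # matches "ulimit -v". The "Limit" line is the column header.
    grep -E '^(Limit|Max processes|Max open files|Max address space)' \
        "/proc/$pid/limits"
}

# Example (run as root or as the rabbitmq user):
# show_process_limits "$(cat /var/run/rabbitmq/pid)"
```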
--
Simon MacMullen
RabbitMQ, Pivotal