<div dir="ltr"><div><div><div>Hi Simon,<br><br></div>the thing is, /var/run/rabbitmq/pid still contains the "Cannot allocate memory" error, which is probably why the wait on the pid file is still blocked. The system logs show nothing new, but we ran sos after reproducing the issue and we're looking through the report for anything interesting. I'll let you know!<br>
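<br>
For anyone following along, a quick way to tell whether the pid file holds a real pid or leftover error text is a check along these lines. This is only a sketch: the helper name check_pid_file is made up for illustration and is not part of RabbitMQ; only the path /var/run/rabbitmq/pid comes from this thread.<br>

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): reports whether a pid file
# holds a plain numeric pid or something else, such as an error message.
check_pid_file() {
    pid_file=$1
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$pid_file" ]; then
        echo "missing or empty"
        return 1
    fi
    content=$(cat "$pid_file")
    case $content in
        ''|*[!0-9]*)
            # Anything other than digits (including multi-line text) is not a pid.
            echo "not a pid: $content"
            return 2
            ;;
    esac
    echo "pid $content"
}
```

Calling check_pid_file /var/run/rabbitmq/pid on the affected machine would be expected to report the stray error text rather than a pid.<br>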
<br></div>Thanks,<br></div>Cesar.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 11 July 2014 11:37, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 10/07/2014 3:02PM, Cesar Munoz wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Simon,<br>
<br><div class="">
so we have tried to find the problem with the initial installation, but<br>
no luck yet. It is very difficult to track it, as it is totally<br>
non-deterministic!<br>
In the meantime, we installed the latest version of RabbitMQ, which<br>
includes the set -e fix, but the same issue still happened. Given the<br>
output of ps auxf<br>
<br>
<a href="https://gist.github.com/anonymous/62239513b154179a8a4e" target="_blank">https://gist.github.com/anonymous/62239513b154179a8a4e</a><br>
<br>
it looks like<br>
<br>
/bin/sh /etc/init.d/rabbitmq-server start<br>
<br>
and<br>
<br></div>
/bin/sh /usr/sbin/rabbitmqctl wait /var/run/rabbitmq/pid<div class=""><br>
<br>
were running concurrently. Could running them at the same time have<br>
caused a race condition between these two processes that would stop<br>
the set -e fix from working?<br>
</div></blockquote>
<br>
The "set -e" should cause a failure in the case where the script was not able to write the pid file for whatever reason. That's all. Looking at the ps output posted in the latest case, the startup has got past that point as it's started the beam process for the server.<br>
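<br>
As a rough illustration of that behaviour (this is not the actual init script), the sketch below writes a pid file under set -e in a child shell. The function name write_pid_file and the temporary paths are hypothetical.<br>

```shell
#!/bin/sh
# Hypothetical stand-in for the pid-writing step; this is not the real
# rabbitmq-server init script. The body runs in a child shell so that
# set -e only affects the sketch itself.
write_pid_file() {
    sh -c '
        set -e
        echo "$1" > "$2"         # a failed write makes the child shell exit here
        echo "pid file written"  # reached only when the write succeeded
    ' sh "$1" "$2"
}

dir=$(mktemp -d)
# The directory exists, so this write is expected to succeed.
write_pid_file 12345 "$dir/pid"
# The "missing" directory does not exist, so the redirection fails.
write_pid_file 12345 "$dir/missing/pid" 2>/dev/null || echo "start-up step failed"
```

Without set -e the child shell would carry on past the failed write and report success, which is the situation the fix is meant to rule out.<br>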
<br>
"rabbitmqctl wait" should wait indefinitely for the server to start up, as long as the server has not actually died.<br>
<br>
But it looks like something is getting stuck? Is there anything in the server logs at this point? Bear in mind that the machine in question has claimed to run out of memory writing a 5-byte file, so I don't necessarily trust it.<br>
<br>
Cheers, Simon<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">
Cheers,<br>
Cesar.<br>
<br>
<br>
<br>
On 6 June 2014 11:55, Cesar Munoz &lt;<a href="mailto:cesar.munoz@ammeon.com" target="_blank">cesar.munoz@ammeon.com</a>&gt; wrote:<br></div><div class="">
<br>
Hi Simon,<br>
<br>
the ulimits for rabbitmq user are pretty much the same, the only<br>
difference is that max user processes is set to 1024 instead of 2066207.<br>
<br>
About the system itself, it is true that there has to be something<br>
strange going on if a shell redirection can fail, but I've checked<br>
the configuration and I don't see anything especially unusual.<br>
<br>
We are using Red Hat 6.4, and these are the parameters that we set<br>
in the sysctl.conf:<br>
<a href="http://pastebin.com/SfJBwrna" target="_blank">http://pastebin.com/SfJBwrna</a><br>
<br>
The rest of the parameters in the kickstart file are pretty much the<br>
standard ones.<br>
This is an intermittent issue (we are testing how often it happens,<br>
so far we got 3 failures in 13 installations), so it is harder to<br>
track it!<br>
Either way, restarting the service works, so it looks like whatever<br>
causes the problem disappears after a while. I've been trying to<br>
find what could make this non-deterministic, but so far I haven't<br>
noticed anything unusual.<br>
<br>
Thanks again!<br>
Cesar.<br>
<br>
<br>
On 6 June 2014 11:27, Simon MacMullen &lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt; wrote:<br></div><div class="">
<br>
On 06/06/2014 10:49AM, Cesar Munoz wrote:<br>
<br>
Hi Simon,<br>
<br>
the set -e looks like a very good idea, at least the process will return<br>
the failure straight away!<br>
<br>
<br>
Sure!<br>
<br>
<br>
These are the ulimits:<br>
<br>
[root@ms1 ~]# ulimit -a<br>
<br>
<br>
&lt;snip&gt;<br>
<br>
Those are the ulimits which apply to root - maybe they are<br>
different for the "rabbitmq" user?<br>
<br>
But more to the point: we're failing to do something very very<br>
simple here, there has to be something weird about this system<br>
if echo or shell redirection can fail with an error message<br>
about memory allocation.<br>
<br>
So have you configured anything unusual about this system?<br>
<br>
<br>
Cheers, Simon<br>
<br>
--<br>
Simon MacMullen<br>
RabbitMQ, Pivotal<br>
<br>
<br>
<br>
<br></div>
This email and any files transmitted with it are confidential and<br>
intended solely for the use of the individual or entity to whom they are<br>
addressed. If you have received this email in error please notify the<br>
system manager. This message contains confidential information and is<br>
intended only for the individual named. If you are not the named<br>
addressee you should not disseminate, distribute or copy this e-mail.<br>
<br>
</blockquote><div class="HOEnZb"><div class="h5">
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, Pivotal<br>
</div></div></blockquote></div><br></div>
<br>