<div dir="ltr"><div><div><div><div>Thanks Tim, I will send you a link to the log files privately. We do have mirrored queues: we set up an HA policy to mirror all queues to exactly 2 of the 4 nodes. As of yet we have not made use of any synchronization policy.<br>
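<div>For reference, a minimal sketch of how a policy like the one described above could be expressed. The policy name `ha-two` and the catch-all pattern `^` are illustrative assumptions, not the values from our cluster, and the optional `ha-sync-mode` key is shown only to mark where a synchronization setting would go. The helper only builds and prints the `rabbitmqctl set_policy` command line; it does not run anything.</div>
<pre>
```python
import json
import shlex


def ha_policy_command(name, pattern, replicas, sync_mode=None, vhost="/"):
    """Build the rabbitmqctl argv for an 'exactly N' mirrored-queue policy.

    name and pattern are hypothetical placeholders chosen by the caller.
    """
    definition = {"ha-mode": "exactly", "ha-params": replicas}
    if sync_mode is not None:
        # Assumption: the broker version in use understands ha-sync-mode.
        definition["ha-sync-mode"] = sync_mode
    return [
        "rabbitmqctl", "set_policy", "-p", vhost,
        name, pattern, json.dumps(definition, sort_keys=True),
    ]


if __name__ == "__main__":
    # Mirror every queue to exactly 2 nodes, with no sync mode set.
    cmd = ha_policy_command("ha-two", "^", 2)
    print(" ".join(shlex.quote(part) for part in cmd))
```
</pre>
<div>Passing sync_mode="automatic" would add the synchronization key to the same definition; whether that is desirable depends on queue sizes and how long a sync may block.</div>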
<br></div>We start all rabbit nodes via:<br><br></div></div></div>sudo /etc/init.d/rabbit-server start<br><br><div><div><div>We do have Chef managing this server, and it has since caused a restart on 2 of our 4 nodes; it is now temporarily disabled. I will send you the log files for all 4 nodes dating back several days. One thing I did notice in the log file for 3 of the 4 nodes:<br>
<br>=ERROR REPORT==== 16-May-2013::23:27:20 ===<br>connection <0.25853.253>, channel 1 - soft error:<br>{amqp_error,not_found,<br> "home node 'rabbit@rabbit-box' of durable queue '<a href="http://my.queue.name" target="_blank">my.queue.name</a>' in vhost '/' is down or inaccessible",<br>
'queue.declare'}<br><br><br></div><div>When looking at the log files you will notice many entries like:<br><br>=INFO REPORT==== 17-May-2013::09:15:42 ===<br>accepting AMQP connection <0.5117.0> (IP:55913 -> IP:5672)<br>
<br>=WARNING REPORT==== 17-May-2013::09:15:42 ===<br>closing AMQP connection <0.5117.0> (IP:55913 -> IP:5672):<br>connection_closed_abruptly<br><br></div><div>Those are our load balancers checking node health; sorry for the log spam.<br>
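<div>If the noise gets in the way, here is a rough sketch of a filter that drops those accept/close pairs before reading the logs. It assumes every report starts with a `=INFO/WARNING/ERROR REPORT====` header line as in the excerpts above, and it treats any connection that ended with connection_closed_abruptly as a health check, which would also hide a genuine client that dropped its connection. The function names are made up for this example.</div>
<pre>
```python
import re

# A report begins with a header such as:
# =INFO REPORT==== 17-May-2013::09:15:42 ===
HEADER = re.compile(r"^=(?:INFO|WARNING|ERROR) REPORT==== .* ===$", re.M)
# Captures the Erlang pid printed after "AMQP connection".
CONN = re.compile(r"AMQP connection (\S+)")


def split_reports(text):
    """Split raw log text into one string per report."""
    starts = [m.start() for m in HEADER.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def drop_health_checks(text):
    """Return the reports that do not belong to abruptly closed connections."""
    reports = split_reports(text)
    noisy = set()
    for report in reports:
        if "connection_closed_abruptly" in report:
            match = CONN.search(report)
            if match:
                noisy.add(match.group(1))
    kept = []
    for report in reports:
        match = CONN.search(report)
        if match and match.group(1) in noisy:
            continue  # accept or close line of a health-check connection
        kept.append(report)
    return kept


sample = """
=INFO REPORT==== 17-May-2013::09:15:42 ===
accepting AMQP connection <0.5117.0> (IP:55913 -> IP:5672)

=WARNING REPORT==== 17-May-2013::09:15:42 ===
closing AMQP connection <0.5117.0> (IP:55913 -> IP:5672):
connection_closed_abruptly

=ERROR REPORT==== 16-May-2013::23:27:20 ===
connection <0.25853.253>, channel 1 - soft error:
"""

if __name__ == "__main__":
    for report in drop_health_checks(sample):
        print(report)
        print()
```
</pre>
<div>Matching on the pid alone is a simplification: pids can repeat after a node restart, so the filter is best applied to one node's log for one uptime period at a time.</div>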
</div></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, May 17, 2013 at 9:32 AM, Tim Watson <span dir="ltr"><<a href="mailto:tim@rabbitmq.com" target="_blank">tim@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hmn<div><br><div><div class="im"><div>On 17 May 2013, at 13:45, Eric Berg wrote:</div>
<br><blockquote type="cite"><div dir="ltr"><div>Thanks for your response Tim. If you would like SSH access to these boxes let me know, we can work something out privately. Thanks!<br><br></div></div></blockquote><div><br>
</div></div><div>Ok, though first of all I'd like to know whether you can supply logs for the nodes in question. A private drop box would be fine.</div><div class="im"><br><blockquote type="cite"><div dir="ltr"><div>Update from yesterday:<br>
</div><div>It looks like 2 of the 4 nodes in our cluster have finally shut down, all channels are now gone. Another node in the cluster hangs on<br>
</div><div>> sudo rabbitmqctl status<br><br></div><div>and the final node in the cluster appears to be running just fine. It however sees the unresponsive node in the cluster status as a running node, as does the web UI.<br>
</div><div><br></div></div></blockquote><div><br></div></div><div>Right, so we've still got an unresponsive node. Do you have any mirrored queues, and if so, what synchronisation and/or recovery policies are you using?</div>
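<div>As a rough way to tell a hung node from a stopped one without blocking a terminal, something along these lines could wrap the status call in a timeout. This is only a sketch: the `probe` helper and the 10 second default are assumptions made for illustration, and the command would need to run with whatever privileges `rabbitmqctl` normally requires on the box.</div>
<pre>
```python
import subprocess
import sys


def probe(cmd, timeout_s=10.0):
    """Run cmd and classify the result as 'ok', 'failed' or 'hung'."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return "hung"
    except OSError:
        # The command could not be started at all.
        return "failed"
    return "ok" if done.returncode == 0 else "failed"


if __name__ == "__main__":
    # Usage (node names are placeholders): python probe.py rabbit@host1 rabbit@host2
    for node in sys.argv[1:]:
        print(node, probe(["rabbitmqctl", "-n", node, "status"]))
```
</pre>
<div>A result of 'hung' only says the command did not return in time; the node's own log is still needed to see why.</div>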
<div class="im"><br><blockquote type="cite"><div dir="ltr"><div><br><b>When you upgraded your cluster, what RabbitMQ version did you upgrade
from and to, and did you upgrade Erlang as well and if so, which
versions were involved?<br></b></div>- We upgraded from 3.0.4 to 3.1.0. We did not upgrade Erlang; it was and still is at version R15B03. We did, however, install RabbitMQ via RPM with the --nodeps flag because the package did not detect the Erlang dependency correctly. We had previously installed Erlang:<br>
<br>esl-erlang.x86_64 R15B03-2 @erlang-solutions <br><br></div></blockquote><div><br></div></div><div>Hmn, I suppose it's possible that this re-install went wrong somehow and is causing some of the things below.</div>
<div class="im"><br><blockquote type="cite"><div dir="ltr"><b><br>What happens if you start up Erlang by itself, using `erl -sname test` - do you still see all those screwy warnings? </b><br><div>All 4 of the nodes can run this without issue as my user, when I sudo su to rabbitmq user I get errors on 2 of the 4 nodes as such:<br>
<br></div></div></blockquote><div><br></div></div><div>Well, the nodes should always be running as the rabbitmq user, so how are you starting them as your user? That might be at the root of some of these problems: the rabbitmq-server service should always run as the rabbitmq user, and when issuing rabbitmqctl commands and the like, you would normally do `$ sudo rabbitmqctl status` and so on. Log files would definitely help though.</div>
</div></div></div><br>
<br></blockquote></div><br></div>