<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Eric,<div><br><div><div>On 17 May 2013, at 16:05, Eric Berg wrote:</div><blockquote type="cite"><div dir="ltr"><div><div><div><div>We start all rabbit nodes via:</div><br></div></div></div>sudo /etc/init.d/rabbit-server start<br><br><div><div></div></div></div></blockquote><div><br></div><div>Ah. I've had problems on CentOS when doing that: it screwed up my permissions completely and I couldn't subsequently run the server. Re-installing and using `/sbin/service rabbitmq-server start` instead did the trick. You're not on a Red Hat variant, are you?</div><br><blockquote type="cite"><div dir="ltr"><div><div><div>We do have Chef managing this server, and it has since caused a restart on 2 of our 4 nodes; it is now temporarily disabled.</div></div></div></div></blockquote><div><br></div><div>That could cause a number of problems, especially if the nodes are clustered, there is a netsplit, and Chef restarts them in the wrong order. They should just fail to start, though, rather than melt down your data centre. :)</div><br><blockquote type="cite"><div dir="ltr"><div><div><div> I will send you the log files for all 4 nodes dating back several days.</div></div></div></div></blockquote><div><br></div><div>Cool, thanks. I'll spend some time looking through those.</div><br><blockquote type="cite"><div dir="ltr"><div><div><div> One thing I did notice in the log file for 3 of the 4 nodes:<br>
<br>=ERROR REPORT==== 16-May-2013::23:27:20 ===<br>connection <0.25853.253>, channel 1 - soft error:<br>{amqp_error,not_found,<br> "home node 'rabbit@rabbit-box' of durable queue 'my.queue.name' in vhost '/' is down or inaccessible",<br>
'queue.declare'}<br><br></div></div></div></div></blockquote><div><br></div><div>Ah, OK. That's probably nothing to worry about, though it may help with diagnosis.</div><br><blockquote type="cite"><div dir="ltr"><div><div><div><br></div><div>When looking at the log files you will notice many entries like:<br><br>=INFO REPORT==== 17-May-2013::09:15:42 ===<br>accepting AMQP connection <0.5117.0> (IP:55913 -> IP:5672)<br>
<br>=WARNING REPORT==== 17-May-2013::09:15:42 ===<br>closing AMQP connection <0.5117.0> (IP:55913 -> IP:5672):<br>connection_closed_abruptly<br><br></div><div>Those are our load balancers checking node health; sorry for the log spam.<br>
</div><div><br></div></div></div></div></blockquote><div><br></div><div>OK, sure. I'll filter those out. :)</div><div><br></div><div>Cheers,</div><div>Tim</div></div></div></body></html>