[rabbitmq-discuss] Someone else with a nodedown error

Fri May 17 20:38:11 BST 2013

Eric,

On 17 May 2013, at 16:05, Eric Berg wrote:
> We start all rabbit nodes via:
> 
> sudo /etc/init.d/rabbit-server start
> 

Ah. I've had problems on CentOS when doing that - it screwed up my permissions completely and I couldn't run the server afterwards. Re-installing and using `/sbin/service rabbitmq-server start` instead did the trick. You're not on a Red Hat variant, are you?

> We do have Chef managing this server, and it has since caused a restart on 2 of our 4 nodes; it is now temporarily disabled.

That could cause a number of problems, especially if the nodes are clustered, there is a netsplit, and Chef restarts them in the wrong order. Even then, though, they should just fail to start rather than melt down your data centre. :)

> I will send you the log files for all 4 nodes dating back several days.

Cool thanks. I'll spend some time looking through those.

> One thing I did notice in the log file for 3 of the 4 nodes:
> 
> =ERROR REPORT==== 16-May-2013::23:27:20 ===
> connection <0.25853.253>, channel 1 - soft error:
> {amqp_error,not_found,
>             "home node 'rabbit@rabbit-box' of durable queue 'my.queue.name' in vhost '/' is down or inaccessible",
>             'queue.declare'}
> 

Ah ok, that's probably nothing to worry about, though it may help with diagnosis.

> 
> When looking at the log files you will notice many entries like:
> 
> =INFO REPORT==== 17-May-2013::09:15:42 ===
> accepting AMQP connection <0.5117.0> (IP:55913 -> IP:5672)
> 
> =WARNING REPORT==== 17-May-2013::09:15:42 ===
> closing AMQP connection <0.5117.0> (IP:55913 -> IP:5672):
> connection_closed_abruptly
> 
> Those are our load balancers checking the node health, sorry for the log spam.
> 

Ok sure - I'll filter those out. :)

Cheers,
Tim