[rabbitmq-discuss] Automating a RabbitMQ 3.0 cluster on EC2

Sat Dec 29 14:12:06 GMT 2012

On Saturday, 29. December 2012 at 14:51, Francesco Mazzoli wrote:
> I’m not sure what you mean here. What does “RabbitMQ package” refer to? Can
> you be more clear on why “they converge into the same issue”?
>  
By package I mean the one available on the RabbitMQ website. In this case I'm installing an Ubuntu package, which generates all its configuration files on installation and also starts the service.

The problems I outlined converge into the same issue because of that last bit: the package starts the service as soon as it is installed. That might seem like a useful feature, but a node that is supposed to enter a cluster needs to be fresh and untouched, and that is practically impossible to achieve with the cluster_nodes section in the configuration, at least not without hacks that generate the configuration before the package is installed, when the users and directories don't exist yet.

After the package is installed, the freshly installed node has to be reset before any of this fully works.
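
A minimal sketch of that reset-then-join sequence, assuming the standard rabbitmqctl subcommands (stop_app, reset, join_cluster, start_app). The function names and the seed node name are made up for illustration, and the explicit reset simply follows the description above:

```python
import subprocess


def join_commands(seed_node, ctl="rabbitmqctl"):
    """Return the rabbitmqctl invocations that reset this node and join seed_node's cluster."""
    return [
        [ctl, "stop_app"],                 # stop the RabbitMQ application, keep the Erlang VM up
        [ctl, "reset"],                    # wipe the state created when the package started the service
        [ctl, "join_cluster", seed_node],  # the seed node has to be online for this step
        [ctl, "start_app"],
    ]


def reset_and_join(seed_node, run=subprocess.check_call):
    """Run each step in order; check_call raises CalledProcessError on the first failure."""
    for cmd in join_commands(seed_node):
        run(cmd)
```

Calling reset_and_join("rabbit@seed") on the new node would run the four commands in order. The `run` parameter is only there so the sequence can be exercised without a broker.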
>  
> Well, here to change clustering behavior you would probably have to force
> things a bit, either with “dangerous” commands or by `eval'ing Erlang
> expressions - so you’d be delving into clustering internals that, as
> implementation details, are not documented.
>  
> Note that you can also deploy some gentler measures like “check that the node
> has clustered, try again in 5 minutes if it didn’t”, but those would be ad-hoc
> measures that we can’t generalize as default behaviour.
>  
Sure, but I'm still interested in these, as odd as they may seem. My intent is to have an automated cluster setup, and for that to fully work, I need to know the oddities and ways to work around them. A lot of the documentation is aimed at setting up a cluster manually, which to me is a practice to be avoided in production.
>  
> Netsplits are tolerated to a certain extent, but currently we don’t precisely
> specify the semantics of the recovery mechanisms. What’s surely the case is
> that to use `join_cluster' both nodes have to be online - this is not that much
> about netsplits but about `join_cluster' being an operation that requires
> agreement from 2 parties.
>  
That's understandable behaviour, though I'd suggest considering some sort of retry in either scenario, which might significantly reduce the issues with the overall setup I've described so far.
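
The kind of retry meant here could look roughly like the sketch below. It is a generic helper, not anything RabbitMQ provides; the name `retry` and its parameters are assumptions, and the action passed in would be whatever performs the join:

```python
import time


def retry(action, attempts=5, delay=10.0, backoff=2.0, sleep=time.sleep):
    """Call action() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise          # out of attempts: surface the failure to the caller
            sleep(delay)       # wait before trying again, e.g. while the peer boots
            delay *= backoff   # wait longer after each failed attempt
```

Growing the delay keeps a node from hammering a peer that is still coming up, while the bounded number of attempts means a genuinely broken setup still fails visibly.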
>  
> I’m still not sure what your problem is exactly. If the above makes sense,
> `join_cluster' should serve your needs. For what concerns the hostname
> problems, I’m waiting for more details.
>  
join_cluster does fit my needs, but in an automated environment, e.g. with Chef, that command would be triggered every 30 minutes. Would you consider that good practice?
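
One way to keep a periodic run from re-triggering the join is to guard it with a membership check. The sketch below assumes that `rabbitmqctl cluster_status` prints the names of the cluster members and simply looks for the seed node's name in that output; the exact output format is not relied upon, and the function names are hypothetical:

```python
import re
import subprocess


def is_clustered_with(status_output, node):
    """True if `node` appears as a whole node name in cluster_status output."""
    # The lookarounds stop rabbit@node1 from matching inside rabbit@node10.
    pattern = r"(?<![\w@.-])" + re.escape(node) + r"(?![\w@.-])"
    return re.search(pattern, status_output) is not None


def join_if_needed(seed_node, status=None, run=subprocess.check_call):
    """Join seed_node's cluster unless this node already lists it; return True if it joined."""
    if status is None:
        status = subprocess.check_output(["rabbitmqctl", "cluster_status"]).decode()
    if is_clustered_with(status, seed_node):
        return False  # already a member, nothing to do on this run
    for args in (["stop_app"], ["join_cluster", seed_node], ["start_app"]):
        run(["rabbitmqctl"] + args)
    return True
```

With a guard like this, a run every 30 minutes costs one cluster_status call once the node has joined, instead of stopping and restarting the application each time.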

Hence my question about people's experiences of automating a cluster setup in production: how they make it work, how they make sure a node joins the cluster, and how they avoid wasting resources, e.g. by triggering commands that don't need to be triggered.

Cheers, Mathias