[rabbitmq-discuss] Rabbitmq boot failure with "tables_not_present"

Thu Jan 17 17:35:03 GMT 2013

Hi Jerry,
sudo("bash -c 'echo XXXXXXXXXXXXXXXX > /var/lib/rabbitmq/.erlang.cookie'")
sudo("chown rabbitmq /var/lib/rabbitmq/.erlang.cookie")
sudo("chmod 600 /var/lib/rabbitmq/.erlang.cookie")
sudo("rm -fr /var/lib/rabbitmq/mnesia")

That is indeed a fine way to get rid of your Mnesia contents including clustering info and any metadata that needs to be shared amongst the nodes (queue, exchange, binding, user, vhost, etc. definitions).

On the other hand, after you've done it, you've got no really good reason to expect your nodes to act as clustered.
What I want to achieve after redeployment is to erase the previous state completely and let the cluster start from a clean state, which is why I erased the /var/lib/rabbitmq/mnesia folder (is there a better way to do that?). The problem is that sometimes the error messages show up for a few minutes and then everything works fine, but other times I have seen the error message being logged for 80 minutes before the cluster works correctly. Do you have any suggestions?

Are you establishing your clusters using the rabbitmq command line tools or by statically encoding their properties in your rabbitmq.config files?  You're going to have to repeat whichever you did when you bring a newly redeployed cluster, having gone through the cleansing you outline above, back online.

You might consider setting up scripts to execute the appropriate commands, as per our clustering guide, on the appropriate nodes after you've done the scripted clean-up you describe.
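
A minimal sketch of such a script, in Python to match the deployment snippet quoted above. It assumes RabbitMQ 3.0 or later, where the clustering command is `rabbitmqctl join_cluster`. The helper names (`reset_commands`, `join_commands`, `run_all`) are hypothetical, and the seed node name is taken from the configuration quoted later in this thread. Whether an explicit join is needed on top of a `cluster_nodes` entry in the config file depends on the setup, so treat this as an illustration of the command order rather than a drop-in fix.

```python
import subprocess


def reset_commands():
    """rabbitmqctl calls that return a node to a blank state.

    'reset' only works while the RabbitMQ application is stopped,
    hence the stop_app / start_app pair around it.
    """
    return [
        ["rabbitmqctl", "stop_app"],
        ["rabbitmqctl", "reset"],
        ["rabbitmqctl", "start_app"],
    ]


def join_commands(seed_node):
    """rabbitmqctl calls that reset this node and join it to seed_node's cluster.

    seed_node is an Erlang node name such as 'rabbit@ip-10-0-2-97'.
    """
    return [
        ["rabbitmqctl", "stop_app"],
        ["rabbitmqctl", "reset"],
        ["rabbitmqctl", "join_cluster", seed_node],
        ["rabbitmqctl", "start_app"],
    ]


def run_all(commands, runner=subprocess.check_call):
    """Run each command in order; the default runner raises on the first failure."""
    for cmd in commands:
        runner(cmd)


if __name__ == "__main__":
    # On the first node: run_all(reset_commands())
    # On every other node, once the first node is up:
    for cmd in join_commands("rabbit@ip-10-0-2-97"):
        print(" ".join(cmd))
```

The first node would run the reset sequence and each remaining node the join sequence, so the nodes are brought up one after another rather than all at once. With Fabric, each command could be passed to `sudo(...)` instead of `subprocess`.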

Oh, I used rabbitmq.config to configure clustering, as Simon pointed out in another email. Here is what it looks like:
[
{rabbit,
  [
    {tcp_listeners, [5672]},
    {cluster_nodes, {['rabbit@ip-10-0-2-97', 'rabbit@ip-10-0-2-106'], disc}}
  ]
}
].
I have the same config file shown above on the two rabbitmq servers 10.0.2.97 and 10.0.2.106.

Do you have any suggestions as to what might have gone wrong? This configuration works fine about 80% of the time, when the "tables_not_present" error only shows up for a few minutes. About 20% of the time, the error appears in the log file for as long as several hours, but in the end the cluster is successfully established. Is this normal behavior?

Thanks,
Shanyu
