[rabbitmq-discuss] rabbitmq 2.6.1 cluster failure recovery
Alain Dazzi
alain at kno.com
Sat Oct 1 01:25:08 BST 2011
Hi,
I can't get my rabbitmq cluster to recover from a dead node, so
perhaps someone can help.
node-1 (cumulonimbus)
Linux cumulonimbus 2.6.38-11-server #50-Ubuntu SMP Mon Sep 12 21:34:27
UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
ii rabbitmq-server 2.6.1-1
root@cumulonimbus:~# ls -1 /usr/lib/rabbitmq/lib/rabbitmq_server-2.6.1/plugins/
amqp_client-2.6.1.ez
mochiweb-1.3-rmq2.6.1-git9a53dbd.ez
rabbitmq_management-2.6.1.ez
rabbitmq_management_agent-2.6.1.ez
rabbitmq_management_visualiser-2.6.1.ez
rabbitmq_mochiweb-2.6.1.ez
README
webmachine-1.7.0-rmq2.6.1-hg0c4b60a.ez
node-2 (nuage-informatique)
Linux nuage-informatique 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12
21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
ii rabbitmq-server 2.6.1-1
1/ stop both servers and set up the same .erlang.cookie value on each; restart the nodes
2/ on node1 I create a cluster
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl cluster rabbit@nuage-informatique rabbit@cumulonimbus
Clustering node rabbit@cumulonimbus with ['rabbit@nuage-informatique',
rabbit@cumulonimbus] ...
...done.
3/ This creates 2 disc nodes !!!
4/ run a test and pass data successfully
5/ restart node-2 (service rabbitmq-server stop, then
service rabbitmq-server start); the start fails with:
root@nuage-informatique:~/Desktop# service rabbitmq-server start
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.
Erlang has closed
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller)
({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
Activating RabbitMQ plugins ...
1 plugins activated:
* rabbitmq_management_agent-2.6.1
+---+   +---+
|   |   |   |
|   |   |   |
|   |   |   |
|   +---+   +-------+
|                   |
| RabbitMQ  +---+   |
|           |   |   |
|   v2.6.1  +---+   |
|                   |
+-------------------+
AMQP 0-9-1 / 0-9 / 0-8
Copyright (C) 2007-2011 VMware, Inc.
Licensed under the MPL. See http://www.rabbitmq.com/
node           : rabbit@nuage-informatique
app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.6.1/sbin/../ebin/rabbit.app
home dir       : /var/lib/rabbitmq
config file(s) : (none)
cookie hash    : qHpvLciGsi5o4f8ScVzyWg==
log            : /var/log/rabbitmq/rabbit@nuage-informatique.log
sasl log       : /var/log/rabbitmq/rabbit@nuage-informatique-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit@nuage-informatique
erlang version : 5.7.4
-- rabbit boot start
starting file handle cache server ...done
starting worker pool ...done
starting database ...BOOT ERROR: FAILED
Reason: {error,
{timeout_waiting_for_tables,
[rabbit_user,rabbit_user_permission,rabbit_vhost,
rabbit_durable_route,rabbit_durable_exchange,
rabbit_durable_queue]}}
Stacktrace: [{rabbit_mnesia,wait_for_tables,1},
{rabbit_mnesia,check_schema_integrity,0},
{rabbit_mnesia,ensure_schema_integrity,0},
{rabbit_mnesia,init,0},
{rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
{rabbit,run_boot_step,1},
{rabbit,'-start/2-lc$^0/1-0-',1},
{rabbit,start,2}]
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}
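The tables named in the timeout_waiting_for_tables error above can be pulled out of the startup log with a few lines of shell. This is only a rough sketch: timed_out_tables is a made-up helper name, and it assumes the error term has the same layout as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): print the Mnesia tables listed
# in a timeout_waiting_for_tables error, one per line.
# Reads log text on stdin and assumes the term layout shown in this post.
timed_out_tables() {
    # Drop carriage returns and flatten to one line, because the table list
    # may be wrapped across several lines in the log.
    tr -d '\r' | tr '\n' ' ' |
        # Keep only what sits between the [ and ] that follow the error atom.
        sed -n 's/.*timeout_waiting_for_tables,[[:space:]]*\[\([^]]*\)].*/\1/p' |
        # One table per line, with surrounding whitespace trimmed.
        tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

It would be run as something like `timed_out_tables < /var/log/rabbitmq/startup_log`, using the log path mentioned in the init script message.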
At this point I have to re-install node-2 to recover.
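For anyone who wants to replay this, steps 2/ and 5/ can be written down roughly as below. It is a sketch, not a tested procedure: DRY_RUN, run, cluster_node1 and restart_node2 are names invented here, and the start_app call is an assumption, since the steps above do not show it (the app has to be running again for step 4/ to pass data). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of steps 2/ and 5/ from this post. DRY_RUN and the function names
# are hypothetical helpers, not part of RabbitMQ.
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2/, to be run on node-1 (cumulonimbus).
cluster_node1() {
    run rabbitmqctl stop_app
    run rabbitmqctl reset
    run rabbitmqctl cluster rabbit@nuage-informatique rabbit@cumulonimbus
    # Assumption: not shown in the post, but the app must be started again
    # before the node can handle traffic.
    run rabbitmqctl start_app
}

# Step 5/, to be run on node-2 (nuage-informatique).
restart_node2() {
    run service rabbitmq-server stop
    run service rabbitmq-server start
}
```

Calling cluster_node1 or restart_node2 with DRY_RUN left at 1 just lists the commands; setting DRY_RUN=0 on the matching host would actually run them.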
Any idea why?
Thank you,
Next I would like to test mirrored queues, but obviously this has to work first.
-Alain