[rabbitmq-discuss] rabbitmq 2.6.1 cluster failure recovery

Alain Dazzi alain at kno.com
Sat Oct 1 01:25:08 BST 2011


Hi,

I can't get my rabbitmq cluster to recover from a dead node. So
perhaps someone can help ...

node-1 (cumulonimbus)
Linux cumulonimbus 2.6.38-11-server #50-Ubuntu SMP Mon Sep 12 21:34:27
UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
ii  rabbitmq-server                       2.6.1-1
root at cumulonimbus:~# ls -1 /usr/lib/rabbitmq/lib/rabbitmq_server-2.6.1/plugins/
amqp_client-2.6.1.ez
mochiweb-1.3-rmq2.6.1-git9a53dbd.ez
rabbitmq_management-2.6.1.ez
rabbitmq_management_agent-2.6.1.ez
rabbitmq_management_visualiser-2.6.1.ez
rabbitmq_mochiweb-2.6.1.ez
README
webmachine-1.7.0-rmq2.6.1-hg0c4b60a.ez


node-2 (nuage-informatique)
Linux nuage-informatique 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12
21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
ii  rabbitmq-server                       2.6.1-1

1/ stop both servers and set-up same .erlang_cookie value; restart nodes

2/ on node1 I create a cluster
    rabbitmqctl stop_app
    rabbitmqctl reset
    rabbitmqctl cluster rabbit at nuage-informatique rabbit at cumulonimbus
Clustering node rabbit at cumulonimbus with ['rabbit at nuage-informatique',
                                          rabbit at cumulonimbus] ...
...done.

3/ This creates 2 disc nodes !!!

4/ run a test and pass data successfully

5/ restart node-2 (service rabbitmq-server stop)
service rabbitmq-server start ... fails with ...
root at nuage-informatique:~/Desktop# service rabbitmq-server start
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.
Erlang has closed
^M
Crash dump was written to: erl_crash.dump^M
Kernel pid terminated (application_controller)
({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})^M

Activating RabbitMQ plugins ...
1 plugins activated:
* rabbitmq_management_agent-2.6.1


+---+   +---+
|   |   |   |
|   |   |   |
|   |   |   |
|   +---+   +-------+
|                   |
| RabbitMQ  +---+   |
|           |   |   |
|   v2.6.1  +---+   |
|                   |
+-------------------+
AMQP 0-9-1 / 0-9 / 0-8
Copyright (C) 2007-2011 VMware, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

node           : rabbit at nuage-informatique
app descriptor :
/usr/lib/rabbitmq/lib/rabbitmq_server-2.6.1/sbin/../ebin/rabbit.app
home dir       : /var/lib/rabbitmq
config file(s) : (none)
cookie hash    : qHpvLciGsi5o4f8ScVzyWg==
log            : /var/log/rabbitmq/rabbit at nuage-informatique.log
sasl log       : /var/log/rabbitmq/rabbit at nuage-informatique-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit at nuage-informatique
erlang version : 5.7.4

-- rabbit boot start
starting file handle cache server                                     ...done
starting worker pool                                                  ...done
starting database
...BOOT ERROR: FAILED
Reason: {error,
            {timeout_waiting_for_tables,
                [rabbit_user,rabbit_user_permission,rabbit_vhost,
                 rabbit_durable_route,rabbit_durable_exchange,
                 rabbit_durable_queue]}}
Stacktrace: [{rabbit_mnesia,wait_for_tables,1},
             {rabbit_mnesia,check_schema_integrity,0},
             {rabbit_mnesia,ensure_schema_integrity,0},
             {rabbit_mnesia,init,0},
             {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
             {rabbit,run_boot_step,1},
             {rabbit,'-start/2-lc$^0/1-0-',1},
             {rabbit,start,2}]
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}^M

At this point I have to re-install node-2 to recover.

Any idea why?

Thank you,

next I would like to test mirrored q but obviously this has to work first...

-Alain


More information about the rabbitmq-discuss mailing list