[rabbitmq-discuss] Disc node clustering
Artsiom Gulin
u2.storm at gmail.com
Fri Jan 27 12:03:43 GMT 2012
Hello.
I have run into a problem when clustering disc nodes.
Two brokers are running on two separate hosts.
Steps to reproduce:
1. Cluster the second machine with the first as a disc node
(RABBITMQ_NODENAME=wosnfs).
[root@epbyminw2482t3 ~]# rabbitmqctl stop_app && rabbitmqctl reset &&
rabbitmqctl cluster wosnfs@`hostname -s` wosnfs@epbyminw2482t2 &&
rabbitmqctl start_app
2. Remove the first node from the cluster.
[root@epbyminw2482t2]# rabbitmqctl stop_app &&
rabbitmqctl cluster wosnfs@`hostname -s` &&
rabbitmqctl reset && rabbitmqctl start_app
3. Restart the rabbitmq-server service on the second node.
[root@epbyminw2482t3 ~]# service rabbitmq-server restart
Restarting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.
[root@epbyminw2482t3 ~]# cat /var/log/rabbitmq/startup_err
Erlang has closed
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller)
({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
[root@epbyminw2482t3 ~]# cat /var/log/rabbitmq/startup_log
Activating RabbitMQ plugins ...
0 plugins activated:
/<-CUT->/
erlang version : 5.8.4
-- rabbit boot start
starting file handle cache server
...done
starting worker pool
...done
starting database
...BOOT ERROR: FAILED
Reason: {error,
{unable_to_join_cluster,
[wosnfs@epbyminw2482t3,wosnfs@epbyminw2482t2],
{merge_schema_failed,
"Bad cookie in table definition
mirrored_sup_childspec: wosnfs@epbyminw2482t3 =
{cstruct,mirrored_sup_childspec,ordered_set,[wosnfs@epbyminw2482t3],[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],[],[],{{1327,663993,999525},wosnfs@epbyminw2482t2},{{3,1},{wosnfs@epbyminw2482t3,{1327,664433,471064}}}},
wosnfs@epbyminw2482t2 =
{cstruct,mirrored_sup_childspec,ordered_set,[wosnfs@epbyminw2482t2],[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],[],[],{{1327,664434,761441},wosnfs@epbyminw2482t2},{{2,0},[]}}\n"}}}
Stacktrace: [{rabbit_mnesia,init_db,3},
{rabbit_mnesia,init,0},
{rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
{rabbit,run_boot_step,1},
{rabbit,'-start/2-lc$^0/1-0-',1},
{rabbit,start,2},
{application_master,start_it_old,4}]
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}
All commands completed successfully except the last one.
Could you help me understand what this error means and why it occurs?
The first node (epbyminw2482t2) should already have been removed from
the cluster, so why is information about it still present in Mnesia on
the other node, and why does it appear in the error log? I would expect
that correctly removing a node from a cluster does not affect the others.
The problem is reproducible with an arbitrary number of disc nodes.
Interestingly, if the joining order is reversed (the first node joins
the second), no error appears.
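One way to inspect the mismatch reported in the "Bad cookie" message is to decode the table cookies in the two cstruct terms. The sketch below assumes that the first element of each cookie is an Erlang now()-style {MegaSecs, Secs, MicroSecs} tuple recording when the table definition was created. The helper name erlang_now_to_utc and the dictionary labels are illustrative only and are not part of RabbitMQ or Mnesia.

```python
from datetime import datetime, timezone


def erlang_now_to_utc(mega, secs, micro):
    """Convert an Erlang now()-style {MegaSecs, Secs, MicroSecs} tuple
    to a timezone-aware UTC datetime."""
    whole_seconds = mega * 1_000_000 + secs
    return datetime.fromtimestamp(whole_seconds, tz=timezone.utc).replace(
        microsecond=micro
    )


# Cookie timestamps copied from the two cstruct terms in the boot error.
# Assumption: each tuple is the creation time of that table definition.
cookies = {
    "definition held by wosnfs@epbyminw2482t3": (1327, 663993, 999525),
    "definition held by wosnfs@epbyminw2482t2": (1327, 664434, 761441),
}

for label, stamp in cookies.items():
    print(label, "created at", erlang_now_to_utc(*stamp).isoformat())
```

Under that assumption, the two definitions of mirrored_sup_childspec were created about seven minutes apart on 27 January 2012 (11:33:13 and 11:40:34 UTC). That would be consistent with epbyminw2482t2 recreating its tables after the reset in step 2 while epbyminw2482t3 still holds the definition from the original cluster, although the log alone does not confirm this.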
Environment:
CentOS 6.0
Erlang R14B03
RabbitMQ 2.7.1
--
Best regards,
Artsiom