[rabbitmq-discuss] Disc node clustering

Mon Jan 30 13:03:37 GMT 2012

Hi, Simon.
> So it looks like you are leaving the cluster and then resetting, 
> rather than the other way around (which is more normal). It looks like 
> the request to leave the cluster gets ignored in this case. I'll file 
> a bug.

I tried a scenario in which a node is removed from the cluster like this:

1) join machine B (epbyminw2482t3) to machine A (epbyminw2482t2).

[root@epbyminw2482t3 ~]# rabbitmqctl stop_app && rabbitmqctl reset && 
rabbitmqctl cluster wosnfs@`hostname -s` wosnfs@epbyminw2482t2 && 
rabbitmqctl start_app
<-CUT->
Clustering node wosnfs@epbyminw2482t3 with [wosnfs@epbyminw2482t3,
                                            wosnfs@epbyminw2482t2] ...
...done.
Starting node wosnfs@epbyminw2482t3 ...
...done.

Everything is ok.

2) remove node A from cluster

[root@epbyminw2482t2 ~]# rabbitmqctl stop_app && rabbitmqctl reset && 
rabbitmqctl start_app
Stopping node wosnfs@epbyminw2482t2 ...
...done.
Resetting node wosnfs@epbyminw2482t2 ...
...done.
Starting node wosnfs@epbyminw2482t2 ...
...done.

From now on, nodes A and B are standalone and operate independently.

3) restart broker on node B.
Restarting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.

[root@epbyminw2482t3 ~]# cat /var/log/rabbitmq/startup_log
Activating RabbitMQ plugins ...
0 plugins activated:
<-CUT->
node           : wosnfs@epbyminw2482t3
app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.7.1/sbin/../ebin/rabbit.app
home dir       : /var/lib/rabbitmq
config file(s) : (none)
<-CUT->
erlang version : 5.8.4

-- rabbit boot start
starting file handle cache server                                     ...done
starting worker pool                                                  ...done
starting database                                                     ...BOOT ERROR: FAILED
Reason: {error,
             {unable_to_join_cluster,
                 [wosnfs@epbyminw2482t3,wosnfs@epbyminw2482t2],
                 {merge_schema_failed,
                     "Bad cookie in table definition 
mirrored_sup_childspec: wosnfs@epbyminw2482t3 = 
{cstruct,mirrored_sup_childspec,ordered_set,[wosnfs@epbyminw2482t3],[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],[],[],{{1327,927347,889571},wosnfs@epbyminw2482t2},{{3,1},{wosnfs@epbyminw2482t3,{1327,927549,609065}}}}, 
wosnfs@epbyminw2482t2 = 
{cstruct,mirrored_sup_childspec,ordered_set,[wosnfs@epbyminw2482t2],[],[],0,read_write,false,[],[],false,mirrored_sup_childspec,[key,mirroring_pid,childspec],[],[],{{1327,927550,935402},wosnfs@epbyminw2482t2},{{2,0},[]}}\n"}}}
Stacktrace: [{rabbit_mnesia,init_db,3},
              {rabbit_mnesia,init,0},
              {rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
              {rabbit,run_boot_step,1},
              {rabbit,'-start/2-lc$^0/1-0-',1},
              {rabbit,start,2},
              {application_master,start_it_old,4}]
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}}"}

We got this error.
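
The two cstruct terms in the error differ in their cookie field, which is hard to spot in the raw output. Below is a minimal sketch in Python that decodes the two cookie timestamps, assuming each cookie is an Erlang {MegaSecs, Secs, MicroSecs} tuple recorded when the table was created, paired with the creating node. The helper name now_tuple_to_utc is made up for illustration; the tuples are copied from the error text.

```python
from datetime import datetime, timezone


def now_tuple_to_utc(mega, secs, micro):
    """Convert an Erlang-style {MegaSecs, Secs, MicroSecs} tuple to a UTC datetime."""
    whole_seconds = mega * 1_000_000 + secs
    return datetime.fromtimestamp(whole_seconds, tz=timezone.utc).replace(microsecond=micro)


# Cookie timestamps copied from the two cstruct terms in the boot error.
# Assumption: the first element of the cookie is the table creation time.
cookie_on_b = (1327, 927347, 889571)  # definition held by node B (epbyminw2482t3)
cookie_on_a = (1327, 927550, 935402)  # definition held by node A (epbyminw2482t2)

for label, cookie in (("node B", cookie_on_b), ("node A", cookie_on_a)):
    print(label, now_tuple_to_utc(*cookie).isoformat())

print("cookies match:", cookie_on_a == cookie_on_b)
```

Under that assumption, node B's copy of mirrored_sup_childspec dates from 12:42:27 UTC and node A's copy from 12:45:50 UTC on 2012-01-30. That would be consistent with node A recreating the table when it was reset in step 2, while node B still holds the definition from the original cluster.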

3.1) But if we stop the application on node A:
[root@epbyminw2482t2 ~]# rabbitmqctl stop_app

3.2) and then restart the broker on node B:
[root@epbyminw2482t3 ~]# service rabbitmq-server restart
Restarting rabbitmq-server: RabbitMQ is not running
SUCCESS
rabbitmq-server.

The RabbitMQ broker on node B (!) restarts successfully.
As long as the broker (or application) on node A is down, node B works fine.
Once the broker on node A is up again, node B hits the error shown above on 
restart (or on `rabbitmqctl reset`).
A workaround is to remove the Mnesia database on node B, but that is a bad 
idea while node B is still joined to the cluster.
It seems there is some rule about the order in which nodes must be joined 
to and removed from a cluster...

In our application, all RabbitMQ brokers have to be joined in a cluster as 
disk nodes, and it has to be possible to add or remove any node at any time. 
Could you suggest how to achieve that and avoid the "bad cookie" problem?
