<div dir="ltr">Hello,<div><br></div><div style>I am using RabbitMQ 3.1.1 on Red Hat 6.2. While trying to restore a broken cluster, I noticed some odd behavior that I think may be a bug. In short, when I "forget" a node in the cluster and later run "rabbitmqctl reset" on it, the node re-adds itself to the cluster.</div>
<div style><br></div><div style>It's actually a bit more complicated than that, but it is completely reproducible. Here are the steps:</div><div style><br></div><div style><i>Assume a two-node cluster, rabbit-a and rabbit-b, and stop both nodes (rabbit-a first, then rabbit-b):</i></div>
<div style><i><br></i></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><b>[root@rabbit-a ~]# rabbitmqctl stop</b></div></div><div style><div style>Stopping and halting node 'rabbit@rabbit-a' ...</div>
</div><div style><div style>...done.</div></div><div style><div style><br></div></div><div style><div style><div style><b>[root@rabbit-b ~]# rabbitmqctl stop</b></div></div></div><div style><div style>Stopping and halting node 'rabbit@rabbit-b' ...</div>
</div><div style><div style>...done.</div></div></blockquote><div style><i><br></i></div><div style><i>Now assume we need to start rabbit-a without rabbit-b, which is all sorts of fun since rabbit-b was the last node to go down. Based on what I've read, we need to start rabbit-a in node-only mode and then forget rabbit-b:</i></div>
<div style><i><br></i></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><b>[root@rabbit-a ~]# export RABBITMQ_NODE_ONLY=true</b></div></div><div style><div style><b>[root@rabbit-a ~]# rabbitmq-server &</b></div>
</div><div style><div style>[1] 19386</div></div><div style><div style><b>[root@rabbit-a ~]# rabbitmqctl forget_cluster_node --offline rabbit@rabbit-b</b></div></div><div style><div style><div style>Removing node 'rabbit@rabbit-b' from cluster ...</div>
</div></div><div style><div style><br></div></div><div style><div style>=INFO REPORT==== 8-Jul-2013::09:45:34 ===</div></div><div style><div style>Removing node 'rabbit@rabbit-b' from cluster</div></div><div style>
<div style><br></div></div><div style><div style>=INFO REPORT==== 8-Jul-2013::09:45:34 ===</div></div><div style><div style> application: mnesia</div></div><div style><div style> exited: stopped</div></div><div style>
<div style> type: temporary</div></div><div style><div style>...done.</div></div><div style><div style><b>[root@rabbit-a ~]# rabbitmqctl stop</b></div></div><div style><div style>Stopping and halting node 'rabbit@rabbit-a' ...</div>
</div><div style><div style><br></div></div><div style><div style>=INFO REPORT==== 8-Jul-2013::09:45:48 ===</div></div><div style><div style>Halting Erlang VM</div></div><div style><div style>Error: {{badmatch,undefined},</div>
</div><div style><div style> [{rabbit_plugins,active,0,[{file,"src/rabbit_plugins.erl"},{line,48}]},</div></div><div style><div style> {rabbit,app_shutdown_order,0,[{file,"src/rabbit.erl"},{line,476}]},</div>
</div><div style><div style> {rabbit,stop,0,[{file,"src/rabbit.erl"},{line,380}]},</div></div><div style><div style> {rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,384}]},</div>
</div><div style><div style> {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,205}]}]}</div></div></blockquote><div style><br></div><div style><i>Note the error above when the node was stopped; I'm not sure whether that is expected. Anyway, let's now turn off node-only mode and start the server again. It starts successfully, and the cluster status lists only its own node:</i></div>
<div style><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><b>[root@rabbit-a ~]# unset RABBITMQ_NODE_ONLY</b></div><div style><div style><b>[root@rabbit-a ~]# rabbitmq-server &</b></div>
</div><div style><div style>[1] 21349</div></div><div style><div style><div style><b>[root@rabbit-a ~]# rabbitmqctl cluster_status</b></div></div></div><div style><div style><div style>Cluster status of node 'rabbit@rabbit-a' ...</div>
</div></div><div style><div style><div style>[{nodes,[{disc,['rabbit@rabbit-a']}]},</div></div></div><div style><div style><div style> {running_nodes,['rabbit@rabbit-a']},</div></div></div><div style><div style>
<div style> {partitions,[]}]</div></div></div><div style><div style><div style>...done.</div></div></div></blockquote><div style><div><br></div><div style><i>So far, so good. Now let's say we're ready to bring rabbit-b back online. If we start it without making any changes, it fails with this error (which I guess is expected):</i></div>
<div style><br></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div style><div style>{"init terminating in do_boot",{rabbit,failure_during_boot,{error,{inconsistent_cluster,"Node 'rabbit@rabbit-b' thinks it's clustered with node 'rabbit@rabbit-a', but 'rabbit@rabbit-a' disagrees"}}}}</div>
</div></blockquote><div style><div style><br></div><div style><i>OK. So I guess we need to reset rabbit-b before we can start it again. I know we could delete the mnesia directory, but let's not be so brute-force about it. Instead, let's start it in node-only mode and use rabbitmqctl reset:</i></div>
<div style><i><br></i></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><div style><b>[root@rabbit-b ~]# export RABBITMQ_NODE_ONLY=true</b></div></div></div><div style>
<div style><div style><b>[root@rabbit-b ~]# rabbitmq-server &</b></div></div></div><div style><div style><div style>[1] 13647</div></div></div><div style><div style><div style><div style><b>[root@rabbit-b ~]# rabbitmqctl reset</b></div>
</div></div></div><div style><div style><div style><div style>Resetting node 'rabbit@rabbit-b' ...</div></div></div></div><div style><div style><div style><div style><br></div></div></div></div><div style><div style>
<div style><div style>=INFO REPORT==== 8-Jul-2013::09:49:29 ===</div></div></div></div><div style><div style><div style><div style>Resetting Rabbit</div></div></div></div><div style><div style><div style><div style><br></div>
</div></div></div><div style><div style><div style><div style>=INFO REPORT==== 8-Jul-2013::09:49:29 ===</div></div></div></div><div style><div style><div style><div style> application: mnesia</div></div></div></div><div style>
<div style><div style><div style> exited: stopped</div></div></div></div><div style><div style><div style><div style> type: temporary</div></div></div></div><div style><div style><div style><div style>Error: {version_mismatch,[],</div>
</div></div></div><div style><div style><div style><div style> [add_ip_to_listener,exchange_decorators,</div></div></div></div><div style><div style><div style><div style> exchange_event_serial,gm,gm_pids,</div>
</div></div></div><div style><div style><div style><div style> mirrored_supervisor,remove_user_scope,</div></div></div></div><div style><div style><div style><div style> runtime_parameters,semi_durable_route,topic_trie,</div>
</div></div></div><div style><div style><div style><div style> topic_trie_node,user_admin_to_tags,add_queue_ttl,</div></div></div></div><div style><div style><div style><div style> multiple_routing_keys]}</div>
</div></div></div><div style><div style><div style><div style><b>[root@rabbit-b ~]# rabbitmqctl stop</b></div></div></div></div><div style><div style><div style><div style>Stopping and halting node 'rabbit@rabbit-b' ...</div>
</div></div></div></blockquote><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><div style><div style><br></div></div></div></div><div style><div style><div style><div style>=INFO REPORT==== 8-Jul-2013::09:50:23 ===</div>
</div></div></div><div style><div style><div style><div style>Halting Erlang VM</div></div></div></div><div style><div style><div style><div style>Error: {{badmatch,undefined},</div></div></div></div><div style><div style>
<div style><div style> [{rabbit_plugins,active,0,[{file,"src/rabbit_plugins.erl"},{line,48}]},</div></div></div></div><div style><div style><div style><div style> {rabbit,app_shutdown_order,0,[{file,"src/rabbit.erl"},{line,476}]},</div>
</div></div></div><div style><div style><div style><div style> {rabbit,stop,0,[{file,"src/rabbit.erl"},{line,380}]},</div></div></div></div><div style><div style><div style><div style> {rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,384}]},</div>
</div></div></div><div style><div style><div style><div style> {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,205}]}]}</div></div></div></div></blockquote><div style><div style><div>
<br></div><div style><i>Note the error on stop again, and also the error on reset. But here is the WEIRD thing: go back to rabbit-a and check cluster_status. Even though we forgot it earlier, rabbit-b is back in the list of disc nodes, as if it had magically rejoined the cluster!</i></div>
<div style><i><br></i></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><div style><div style><b>[root@rabbit-a ~]# rabbitmqctl cluster_status</b></div></div></div>
</div><div style><div style><div style><div style>Cluster status of node 'rabbit@rabbit-a' ...</div></div></div></div><div style><div style><div style><div style>[{nodes,[{disc,['<font color="#ff0000">rabbit@rabbit-b</font>','rabbit@rabbit-a']}]},</div>
</div></div></div><div style><div style><div style><div style> {running_nodes,['rabbit@rabbitmq-a']},</div></div></div></div><div style><div style><div style><div style> {partitions,[]}]</div></div></div></div><div style>
<div style><div style><div style>...done.</div></div></div></div></blockquote><div style><div style><div style><div><br></div><div style>Sure enough, if we restart rabbit-b normally, it runs in a cluster with rabbit-a again:</div>
<div style><br></div></div></div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div style><div style><div style><div style><div style><b>[root@rabbit-b ~]# unset RABBITMQ_NODE_ONLY</b></div></div>
</div></div></div><div style><div style><div style><div style><div style><div style><b>[root@rabbit-b ~]# rabbitmq-server &</b></div></div></div></div></div></div><div style><div style><div style><div style><div style>
[1] 15775</div></div></div></div></div><div style><div style><div style><div style><div style><b>[root@rabbit-b ~]# rabbitmqctl cluster_status</b></div></div></div></div></div><div style><div style><div style><div style>
<div style>Cluster status of node 'rabbit@rabbit-b' ...</div></div></div></div></div><div style><div style><div style><div style><div style>[{nodes,[{disc,['rabbit@rabbit-b','rabbit@rabbit-a']}]},</div>
</div></div></div></div><div style><div style><div style><div style><div style> {running_nodes,['rabbit@rabbit-a','rabbit@rabbit-b']},</div></div></div></div></div><div style><div style><div style><div style>
<div style> {partitions,[]}]</div></div></div></div></div><div style><div style><div style><div style><div style>...done.</div></div></div></div></div></blockquote><div style><div style><div style><div style><br></div><div style>
So this is not at all what I expected. It seems like a bug, right? For now, I guess I will just delete the mnesia directory instead of trying to do a reset.</div><div style><br></div><div style>-Chris</div></div></div>
</div></div>