[rabbitmq-discuss] Odd Behavior w/ Restoring Broken Cluster
Matthias Radestock
matthias at rabbitmq.com
Wed Jul 10 07:56:19 BST 2013
Chris,
apologies for the late reply...
On 08/07/13 16:11, Chris wrote:
> I noticed some odd behavior when trying to restore a broken cluster
> that I think may be a bug.
> [...]
> *[root@rabbit-a ~]# rabbitmqctl stop*
> Stopping and halting node 'rabbit@rabbit-a' ...
>
> =INFO REPORT==== 8-Jul-2013::09:45:48 ===
> Halting Erlang VM
> Error: {{badmatch,undefined},
>         [{rabbit_plugins,active,0,[{file,"src/rabbit_plugins.erl"},{line,48}]},
>          {rabbit,app_shutdown_order,0,[{file,"src/rabbit.erl"},{line,476}]},
>          {rabbit,stop,0,[{file,"src/rabbit.erl"},{line,380}]},
>          {rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,384}]},
>          {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,205}]}]}
Yep, that's a bug.
> *[root@rabbitmq-b ~]# rabbitmqctl reset*
> Resetting node 'rabbit@rabbitmq-b' ...
>
> =INFO REPORT==== 8-Jul-2013::09:49:29 ===
> Resetting Rabbit
>
> =INFO REPORT==== 8-Jul-2013::09:49:29 ===
> application: mnesia
> exited: stopped
> type: temporary
> Error: {version_mismatch,[],
>         [add_ip_to_listener,exchange_decorators,
>          exchange_event_serial,gm,gm_pids,
>          mirrored_supervisor,remove_user_scope,
>          runtime_parameters,semi_durable_route,topic_trie,
>          topic_trie_node,user_admin_to_tags,add_queue_ttl,
>          multiple_routing_keys]}
And that is a bug too. Running force_reset instead would probably avoid
this error.
> But here is the WEIRD thing. Now go back to rabbit-a and get the
> cluster_status. It seems that rabbit-b has magically rejoined the cluster!
>
> *[root@rabbitmq-a ~]# rabbitmqctl cluster_status*
> Cluster status of node 'rabbit@rabbitmq-a' ...
> [{nodes,[{disc,['rabbit@rabbitmq-b','rabbit@rabbitmq-a']}]},
> {running_nodes,['rabbit@rabbitmq-a']},
> {partitions,[]}]
> ...done.
>
>
> Sure enough, if we restart rabbit-b, it will be operating in a cluster
> with rabbit-a again:
>
> *[root@rabbitmq-b ~]# unset RABBITMQ_NODE_ONLY*
> *[root@rabbitmq-b ~]# rabbitmq-server &*
> [1] 15775
> *[root@rabbitmq-b ~]# rabbitmqctl cluster_status*
> Cluster status of node 'rabbit@rabbitmq-b' ...
> [{nodes,[{disc,['rabbit@rabbitmq-b','rabbit@rabbitmq-a']}]},
> {running_nodes,['rabbit@vm-rh62-cmoesel','rabbitmq-b']},
> {partitions,[]}]
> ...done.
And that is another bug.
> I guess in this case I will just delete the mnesia directory instead
> of trying to do a reset.
force_reset should do the trick.
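For reference, a minimal sketch of the force_reset sequence, to be run on the node being reset (rabbit-b in your case). The function name force_reset_node and the RABBITMQCTL override variable are hypothetical, introduced here only so the sequence can be dry-run; the three rabbitmqctl subcommands are the standard ones. It assumes the Erlang node is up and reachable by rabbitmqctl.

```shell
#!/bin/sh
# Sketch only: forcibly reset a node whose cluster state is broken.
# RABBITMQCTL is a hypothetical override (not a RabbitMQ setting) so the
# sequence can be dry-run, e.g. with RABBITMQCTL=echo.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

force_reset_node() {
    # The RabbitMQ application must be stopped (Erlang VM still running)
    # before reset or force_reset can succeed.
    $RABBITMQCTL stop_app || return 1
    # force_reset wipes the node's state unconditionally, without
    # consulting the rest of the cluster.
    $RABBITMQCTL force_reset || return 1
    # Start the application again; the node comes back blank.
    $RABBITMQCTL start_app
}
```

Calling force_reset_node on the affected node runs the three steps in order and stops at the first failure. Since force_reset does not talk to the other nodes, rabbit-a may still list the reset node as a cluster member; if so, running rabbitmqctl forget_cluster_node on rabbit-a with the reset node's name is probably needed as well.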
Thanks for reporting this.
Regards,
Matthias.