[rabbitmq-discuss] Odd Behavior w/ Restoring Broken Cluster

Matthias Radestock matthias at rabbitmq.com
Wed Jul 10 07:56:19 BST 2013


Chris,

apologies for the late reply...

On 08/07/13 16:11, Chris wrote:
> I noticed some odd behavior when trying to restore a broken cluster
> that I think may be a bug.
> [...]
>     *[root at rabbit-a ~]# rabbitmqctl stop*
>     Stopping and halting node 'rabbit at rabbit-a' ...
>
>     =INFO REPORT==== 8-Jul-2013::09:45:48 ===
>     Halting Erlang VM
>     Error: {{badmatch,undefined},
>             [{rabbit_plugins,active,0,[{file,"src/rabbit_plugins.erl"},{line,48}]},
>              {rabbit,app_shutdown_order,0,[{file,"src/rabbit.erl"},{line,476}]},
>              {rabbit,stop,0,[{file,"src/rabbit.erl"},{line,380}]},
>              {rabbit,stop_and_halt,0,[{file,"src/rabbit.erl"},{line,384}]},
>              {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,205}]}]}

Yep, that's a bug.

>     *[root at rabbitmq-b ~]# rabbitmqctl reset*
>     Resetting node 'rabbit at rabbitmq-b' ...
>
>     =INFO REPORT==== 8-Jul-2013::09:49:29 ===
>     Resetting Rabbit
>
>     =INFO REPORT==== 8-Jul-2013::09:49:29 ===
>          application: mnesia
>          exited: stopped
>          type: temporary
>     Error: {version_mismatch,[],
>                               [add_ip_to_listener,exchange_decorators,
>                                exchange_event_serial,gm,gm_pids,
>                                mirrored_supervisor,remove_user_scope,
>                                runtime_parameters,semi_durable_route,topic_trie,
>                                topic_trie_node,user_admin_to_tags,add_queue_ttl,
>                                multiple_routing_keys]}

And that is a bug too. Running force_reset instead would probably avoid 
this error.

>   But here is the WEIRD thing.  Now go back to rabbit-a and get the
> cluster_status.  It seems that rabbit-b has magically rejoined the cluster!
>
>     *[root at rabbitmq-a ~]# rabbitmqctl cluster_status*
>     Cluster status of node 'rabbit at rabbitmq-a' ...
>     [{nodes,[{disc,['rabbit at rabbitmq-b','rabbit at rabbitmq-a']}]},
>       {running_nodes,['rabbit at rabbitmq-a']},
>       {partitions,[]}]
>     ...done.
>
>
> Sure enough, if we restart rabbit-b, it will be operating in a cluster
> with rabbit-a again:
>
>     *[root at rabbitmq-b ~]# unset RABBITMQ_NODE_ONLY*
>     *[root at rabbitmq-b ~]# rabbitmq-server &*
>     [1] 15775
>     *[root at rabbitmq-b ~]# rabbitmqctl cluster_status*
>     Cluster status of node 'rabbit at rabbitmq-b' ...
>     [{nodes,[{disc,['rabbit at rabbitmq-b','rabbit at rabbitmq-a']}]},
>       {running_nodes,['rabbit at vm-rh62-cmoesel','rabbitmq-b']},
>       {partitions,[]}]
>     ...done.

And that is another bug.

> I guess in this case I will just delete the mnesia directory instead
> of trying to do a reset.

force_reset should do the trick.
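
For reference, a minimal sketch of that sequence, to be run on the node being reset (rabbit-b in your case). It assumes the Erlang VM for the node is up; force_reset needs the rabbit application itself to be stopped, hence the stop_app first. The function name force_reset_node and the RABBITMQCTL variable are made up for illustration (the variable only exists so the sequence can be dry-run with a stand-in command); stop_app, force_reset and start_app are the actual rabbitmqctl commands.

```shell
#!/bin/sh
# Sketch only: return this node to a blank state without consulting
# the rest of the cluster.
# RABBITMQCTL is a hypothetical override, defaulting to the real tool.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# force_reset_node is an illustrative helper name, not part of RabbitMQ.
force_reset_node() {
    # Stop the rabbit application but leave the Erlang VM running.
    "$RABBITMQCTL" stop_app || return 1
    # Wipe the node's database and cluster configuration unconditionally.
    "$RABBITMQCTL" force_reset || return 1
    # Bring the application back up as a fresh, unclustered node.
    "$RABBITMQCTL" start_app
}
```

Calling force_reset_node on the broken node should leave it standalone. Note that this only changes the node being reset; whether the remaining node still lists it in cluster_status is a separate question.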


Thanks for reporting this.


Regards,

Matthias.

