[rabbitmq-discuss] Cluster unresponsive after some stop/start of a node.

Markiyan Kushnir markiyan.kushnir at gmail.com
Sun May 20 08:15:11 BST 2012


On 20.05.2012 10:08, Markiyan Kushnir wrote:
> Here is my setup:
>
> rabbit@qwe is at 10.0.0.1 (initially the master)
> rabbit@asd is at 10.0.0.2 (initially a slave)
>
> asd has joined the cluster with qwe -- OK.
>
> In my tests I need to stop/start a cluster node -- qwe, which is a
> master for my test queues. I use /usr/sbin/rabbitmqctl {stop|start}_app
> for it -- everything is OK.
>
> In order to test slave promotion, I first stop the master (qwe), then
> after some time I start it, so that it now becomes a slave.
>
> At the end of the test I stop asd, then start it, so that qwe takes
> mastership of the queues back.
>
> During my test, the cluster serves two clients: a message producer and
> a message consumer, exchanging messages at a low rate through the
> slave node (asd).
>


Forgot to clarify: I use mirrored queues, and my "consuming" client 
supports the Consumer Cancellation Notifications extension.
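
For concreteness, the stop/start sequence described in the quoted text
could be scripted roughly as below. This is only a sketch: the original
test runs rabbitmqctl locally on each host, whereas here the -n option
is used to address each node from one place, which assumes a shared
Erlang cookie and resolvable host names. The helper names (ctl,
failover_sequence, run) are made up for illustration.

```python
import subprocess
import time

RABBITMQCTL = "/usr/sbin/rabbitmqctl"  # path as given in the message


def ctl(node, command):
    """Build the argv for one rabbitmqctl call against the given node."""
    return [RABBITMQCTL, "-n", node, command]


def failover_sequence(master="rabbit@qwe", slave="rabbit@asd"):
    """Return the test steps in order: bounce the master, then the slave."""
    return [
        ctl(master, "stop_app"),   # asd should be promoted to master
        ctl(master, "start_app"),  # qwe comes back, now as a slave
        ctl(slave, "stop_app"),    # qwe should take mastership back
        ctl(slave, "start_app"),   # the step that eventually fails
    ]


def run(steps, pause=0, dry_run=True):
    """Print each step; execute it too unless dry_run is set."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises if rabbitmqctl fails
            time.sleep(pause)                 # "after some time" between steps


if __name__ == "__main__":
    run(failover_sequence())  # dry run: only prints the four commands
```

With dry_run left at its default, nothing is executed and the four
commands are only printed, which is enough to check the ordering.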

--
Markiyan


>
> Now, after a couple of test runs, start_app on asd fails (after some
> pause) with:
>
> Starting node 'rabbit@asd' ...
> Error: {cannot_start_application,rabbit,
>            {bad_return,
>                {{rabbit,start,[normal,[]]},
>                 {'EXIT',{rabbit,failure_during_boot}}}}}
>
>
>
> cluster_status on qwe says:
>
> Cluster status of node 'rabbit@qwe' ...
> [{nodes,[{disc,['rabbit@qwe']},{ram,['rabbit@asd']}]},
>  {running_nodes,['rabbit@qwe']}]
> ...done.
>
>
> And cluster_status on asd says:
>
> Cluster status of node 'rabbit@asd' ...
> [{nodes,[{unknown,['rabbit@asd']}]},{running_nodes,[]}]
> ...done.
>
> Now I want to remove asd from the cluster... An attempt to run
> stop_app/reset on asd gives (after some pause as well):
>
> Resetting node 'rabbit@asd' ...
> Error: {timeout_waiting_for_tables,[gm_group]}
>
>
> In this situation I can only throw the entire cluster away and create
> a new one...
>
> How can I recover from this situation?
>
>
> Thanks,
> Markiyan.
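
To make the disagreement between the two cluster_status outputs easier
to see, they can be compared mechanically. The sketch below is not a
real Erlang term parser: it assumes node names always appear in single
quotes (as they do in the output quoted above) and simply collects the
quoted names before and after the running_nodes tuple. The function
names are hypothetical, and the node names are written with "@".

```python
import re

# The two outputs from the message, as seen from each node.
QWE_VIEW = """Cluster status of node 'rabbit@qwe' ...
[{nodes,[{disc,['rabbit@qwe']},{ram,['rabbit@asd']}]},
 {running_nodes,['rabbit@qwe']}]
...done.
"""

ASD_VIEW = """Cluster status of node 'rabbit@asd' ...
[{nodes,[{unknown,['rabbit@asd']}]},{running_nodes,[]}]
...done.
"""

QUOTED_ATOM = re.compile(r"'([^']+)'")


def parse_cluster_status(text):
    """Return configured and running node names from cluster_status output."""
    start = text.find("{nodes,")  # skip the "Cluster status of node" header
    if start < 0:
        raise ValueError("no {nodes,...} tuple found")
    nodes_part, _, running_part = text[start:].partition("{running_nodes,")
    return {
        "nodes": QUOTED_ATOM.findall(nodes_part),
        "running": QUOTED_ATOM.findall(running_part),
    }


def stopped_members(status):
    """Nodes that are listed as cluster members but are not running."""
    return [n for n in status["nodes"] if n not in status["running"]]


if __name__ == "__main__":
    for label, view in (("qwe", QWE_VIEW), ("asd", ASD_VIEW)):
        status = parse_cluster_status(view)
        print(label, status, "stopped:", stopped_members(status))
```

Both views agree that asd is not running, but qwe still lists it as a
ram node while asd itself reports its own type as unknown.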


