[rabbitmq-discuss] HA active/active cluster in a bad state

Tue Oct 4 22:59:17 BST 2011

Hi Bryan,

On Tue, Oct 04, 2011 at 03:57:09PM -0500, Bryan Murphy wrote:
> This brought the server back up.  However, it's not functioning correctly.
>  For example, sudo rabbitmqctl cluster_status works fine:
> 
> Cluster status of node 'rabbit@domU-12-31-38-07-18-A6' ...
> [{nodes,[{disc,['rabbit@domU-12-31-38-07-18-A6','rabbit@ip-10-202-209-83',
>                 'rabbit@domU-12-31-39-06-72-50']}]},
>  {running_nodes,['rabbit@domU-12-31-39-06-72-50','rabbit@ip-10-202-209-83',
>                  'rabbit@domU-12-31-38-07-18-A6']}]
> ...done.
> 
> however, sudo rabbitmqctl list_queues blocks and never returns.
> 
> I'm not touching anything else while the cluster is in this state.  What
> diagnostics can I provide to help track down this problem?

OK, you can Ctrl-C the list_queues. What does rabbitmqctl cluster_status
return when you run it on one of the other nodes?
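
If it helps, here is a rough sketch of a wrapper that stops a blocked
rabbitmqctl call from tying up your terminal. It assumes GNU coreutils
`timeout` is installed; the `run_limited` helper name and the 30-second
limit are made up for illustration and are not part of RabbitMQ.

```shell
#!/bin/sh
# Run a command with a time limit so that a hung call returns control.
# GNU coreutils `timeout` exits with status 124 when the limit is reached.
# run_limited is a hypothetical helper name, not a RabbitMQ tool.
run_limited() {
    limit="$1"; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example usage, run on each node in turn:
#   run_limited 30 rabbitmqctl cluster_status
#   run_limited 30 rabbitmqctl list_queues name messages
```

A status of 124 from list_queues on one node, while it returns normally on
the others, would at least narrow down which node is blocking.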

How big were the queues? We recently fixed some bugs that had been
making queue recovery take a _very_ long time, so one of those might be
what's afflicting you. What are the CPU and disk doing on the "stuck"
node? If they're busy, then the node is probably just taking a very long
time to recover.
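
One way to get a quick reading of CPU and disk activity is sketched below.
It assumes a Linux node with /proc/stat and a POSIX awk; the `cpu_delta`
helper name and the one-second sampling interval are illustrative choices
only. Tools such as top or vmstat would tell you much the same thing.

```shell
#!/bin/sh
# Print CPU busy % and iowait % between two aggregate "cpu ..." lines
# taken from /proc/stat. cpu_delta is a hypothetical helper name.
cpu_delta() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9 are user, nice, system, idle, iowait, irq,
            # softirq, steal. Guest time is already counted in user.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle[NR] = $5; iow[NR] = $6; tot[NR] = total
        }
        END {
            dt = tot[2] - tot[1]
            if (dt <= 0) { print "busy=0 iowait=0"; exit }
            di = idle[2] - idle[1]
            dw = iow[2] - iow[1]
            printf "busy=%d iowait=%d\n", 100 * (dt - di - dw) / dt, 100 * dw / dt
        }'
}

# Take two samples one second apart, on Linux only.
if [ -r /proc/stat ]; then
    before=$(grep '^cpu ' /proc/stat)
    sleep 1
    after=$(grep '^cpu ' /proc/stat)
    cpu_delta "$before" "$after"
fi
```

A high busy or iowait figure that persists would be consistent with a slow
recovery still in progress. Both staying near zero would suggest the node
is blocked on something else.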

Matthew