[rabbitmq-discuss] RabbitMQ hanging during 'rabbitmqctl stop' - please help

Thu Jul 25 22:58:16 BST 2013

Is there more information I can give you about my problem? Some way to
prevent the rabbit hang on service restart, or get it unstuck? So far,
the only way to get past it is to kill the beam process. I can't imagine
that is good for the integrity of the service.

I'm relatively new to working with RabbitMQ. I've been developing with
it for a couple months now, but I've been diligently researching this
problem and making trial-and-error attempts at black-box solutions for
the last two weeks to no avail. I'm really stuck and getting desperate
to find a solution.

This problem is persistent enough that I don't think we'll be able to
deploy into production without some understanding of the root cause and
a workaround.

I know this is an open-source project, and I don't feel entitled to any
sort of timely response, but if this isn't the place to get help, where
else should I look?

-Casey

On 07/23/2013 09:39 AM, Casey Marshall wrote:
> I'm using RabbitMQ federated exchanges among a number of servers. Each
> server has upstream exchanges on each of the others. I don't want to
> call it a cluster, since that has another meaning in RabbitMQ altogether
> ... let's call it a "federated group".
> 
> This worked fine over unencrypted AMQP, both in my own local
> KVM-virtualized testing environment and, as far as I could tell, in
> Amazon EC2. However, I need to secure the connections in EC2 with AMQPS
> before we go into production.
> 
> I was able to automate creation and distribution of certificates among
> the servers, and the setup works fine in a local KVM-virtualized
> environment. However, when I started testing it in EC2 across regions, I
> started having problems. RabbitMQ hangs on some of the servers in EC2
> during 'service rabbitmq-server restart'.
> 
> I have scripts to automate updating the federated exchanges, to effect
> topology changes on the "federated group" -- adding or removing a
> server, or changing its role (different roles have different exchange
> upstreams). After this script updates the federation config, the
> subsequent restart hangs on some of the servers.
> 
> What should I do to make my RabbitMQ "federated group" more robust?
> 
> I've attached:
> 
> 1. rabbitmq.log and rabbitmq-sasl.log excerpt from one of the hosts that
> persistently gets stuck during a RabbitMQ 'stop', near the time of the hang.
> 2. 'rabbitmqctl status' from that host.
> 
> Installed on the system:
> 
> ii  esl-erlang                       1:16.b.1-1~ubuntu~precise       Erlang
> ii  rabbitmq-server                  3.1.3-1                         AMQP server written in Erlang
> 
> Much thanks,
> Casey
>