[rabbitmq-discuss] Hang on "starting database ...." remains in 2.8.2 cluster

Francesco Mazzoli francesco at rabbitmq.com
Wed May 9 14:27:06 BST 2012


Hi Matt,

I've tried to reproduce your setup locally and I never get hanging
nodes. I've attached the bash script and the config I'm using; can you
confirm that they correspond to what you're doing?

You can use the script like this:
     ./test 10
where "10" is the number of times the nodes will be restarted.

Francesco.

On 09/05/12 00:38, Matt Pietrek wrote:
> Hi Francesco,
>
> I run RabbitMQ on 3 separate Ubuntu 10.04 64-bit VMs. Clustering is
> enabled via the RabbitMQ config file, which lists all three hosts (call
> them A, B, and C).
>
> I start up all the VMs concurrently (via Capistrano) and verify that
> the cluster is running as expected. I then go through this sequence:
>
> --------
> # On host A:
> rabbitmqctl -n rabbit@A stop
> nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server&
> rabbitmqctl wait $PIDFILE
>
> # On host B:
> rabbitmqctl -n rabbit@B stop
> nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server&
> rabbitmqctl wait $PIDFILE
>
> # On host C:
> rabbitmqctl -n rabbit@C stop
> nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server&
> rabbitmqctl wait $PIDFILE
> --------
>
> The idea is to bring down one server at a time while the other two
> remain in the cluster.
>
> During one of the start operations (which one varies from run to
> run), rabbitmq-server does not finish starting up. The last line in
> that node's nohup.dat file is:
>
> "starting database   ....."
>
> FWIW, it might be helpful to put the shutdown/startup commands in a
> script that you can loop over repeatedly so as to try the whole
> sequence numerous times. We use Capistrano here to execute actions on
> remote machines, but you can probably use SSH to get the same effect
> from a script file.
>
> Let me know if you have other questions about our setup,
>
> Matt
>
>
> On Tue, May 8, 2012 at 3:52 AM, Francesco Mazzoli
> <francesco at rabbitmq.com>  wrote:
>> Hi Matt,
>>
>> Predictably, I can't reproduce this. Since you say that it will happen
>> "inevitably" (whereas, if I understand your previous messages correctly,
>> it was tricky to reproduce), can you send us more information about your
>> setup and the steps to trigger the problem?
>>
>> Francesco.
>>
>>
>> On 04/05/12 23:55, Matt Pietrek wrote:
>>>
>>> I've written to this list about this topic before, and the problem
>>> remains in 2.8.2. See:
>>>
>>> http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-February/018414.html
>>>
>>> I have a three-node cluster running RabbitMQ 2.8.2/Erlang R13B03 on Ubuntu
>>> 10.04.
>>>
>>> Once the cluster is up and running properly (as observed by the Web
>>> UI), I then start/stop individual nodes in the cluster:
>>>      rabbitmqctl stop
>>>      rabbitmq-server
>>>
>>> Inevitably, one of the nodes won't come back up and waits forever while
>>> "starting" the database (not a 30-second timeout... forever).
>>>
>>> The only way to get all three nodes functioning again together is to
>>> forcibly stop the other two nodes, then restart them all again.
>>>
>>>
>>> The first item below is the console output as captured via nohup,
>>> showing "starting database" as the last item.
>>> The second item below is the last few lines of the rabbit@<node>.log
>>> file, showing the node shutting down, then beginning to start up
>>> again.
>>>
>>> Is it likely that a newer Erlang version would help out?
>>> What else can I provide to help diagnose this?
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> --------
>>> node           : rabbit@util
>>> app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.8.2/sbin/../ebin/rabbit.app
>>> home dir       : /home/mpietrek
>>> config file(s) : /home/mpietrek/work/var/run/rabbitmq.config
>>> cookie hash    : pR5H9kY3Wra/XdLELT5hgQ==
>>> log            : /home/mpietrek/work/logs/util.mpietrek.internal.illumita.com/rabbit@util.log
>>> sasl log       : /home/mpietrek/work/logs/util.mpietrek.internal.illumita.com/rabbit@util-sasl.log
>>> database dir   : /home/mpietrek/work/var/lib/rabbit@util
>>> erlang version : 5.7.4
>>>
>>> -- rabbit boot start
>>> starting file handle cache server
>>> ...done
>>> starting worker pool
>>>   ...done
>>> starting database                                                     ...
>>>
>>> --------
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>      application: rabbitmq_management_agent
>>>      exited: stopped
>>>      type: permanent
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>> stopped TCP Listener on 0.0.0.0:5672
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>      application: rabbit
>>>      exited: stopped
>>>      type: permanent
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>      application: os_mon
>>>      exited: stopped
>>>      type: permanent
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>      application: mnesia
>>>      exited: stopped
>>>      type: permanent
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>> Halting Erlang VM
>>>
>>> =INFO REPORT==== 4-May-2012::15:02:52 ===
>>> Limiting to approx 924 file handles (829 sockets)
>>> _______________________________________________
>>> rabbitmq-discuss mailing list
>>> rabbitmq-discuss at lists.rabbitmq.com
>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>

-------------- next part --------------
[
 {rabbit, [{cluster_nodes, [rabbit@mcnulty, hare@mcnulty, franc@mcnulty]}]}
].
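To use the attached script, the config above has to sit in the working directory as rabbitmq.config: the script sets RABBITMQ_CONFIG_FILE=`pwd`/rabbitmq, and RabbitMQ appends the .config suffix itself. A minimal sketch that writes it is below. It assumes all three nodes run on the local machine and, as an adaptation not in the original attachment, replaces Francesco's host name "mcnulty" with the local short host name, approximated here by ${HOSTNAME%%.*}. The node names are single-quoted so that host names containing hyphens still form valid Erlang atoms.

```shell
#!/bin/bash
# Sketch: write the cluster config where the test script expects it.
# The script passes RABBITMQ_CONFIG_FILE=`pwd`/rabbitmq; RabbitMQ adds ".config".
# Assumption: ${HOSTNAME%%.*} matches the short name Erlang's -sname uses.
host=${HOSTNAME%%.*}

# Unquoted EOF so $host expands; the single quotes end up in the file,
# making each node name a quoted Erlang atom.
cat > rabbitmq.config <<EOF
[
 {rabbit, [{cluster_nodes, ['rabbit@$host', 'hare@$host', 'franc@$host']}]}
].
EOF
```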
-------------- next part --------------
#!/bin/bash

rabbitmq_server=~/src/rabbitmq-umbrella/rabbitmq-server/scripts/rabbitmq-server
rabbitmqctl=~/src/rabbitmq-umbrella/rabbitmq-server/scripts/rabbitmqctl

function start {
    RABBITMQ_NODE_PORT=$2 \
    RABBITMQ_SERVER_START_ARGS="-rabbitmq_mochiweb listeners [{mgmt,[{port,5"$2"}]}]" \
    RABBITMQ_NODENAME=$1 \
    RABBITMQ_MNESIA_DIR=/tmp/rabbitmq-$1-mnesia \
    RABBITMQ_PLUGINS_EXPAND_DIR=/tmp/rabbitmq-$1-plugins-scratch \
    RABBITMQ_LOG_BASE=/tmp \
    RABBITMQ_CONFIG_FILE=`pwd`/rabbitmq \
    $rabbitmq_server </dev/null &>/dev/null &

    $rabbitmqctl -n $1 wait /tmp/rabbitmq-$1-mnesia.pid
}

function stop {
    $rabbitmqctl -n $1 stop /tmp/rabbitmq-$1-mnesia.pid
}

function restart {
    stop $1
    start $1 $2
}

# To use the functions on their own
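# Quote "$1" so that running the script with no arguments prints the usage
# instead of failing with "unary operator expected"
if [ "$1" = "start" ]; then
    start "$2" "$3"
elif [ "$1" = "stop" ]; then
    stop "$2"
elif [ "$1" = "restart" ]; then
    restart "$2" "$3"
elif [ "$1" = "clean" ]; then
    # Wipe the per-node state (mnesia dirs, pid files, plugin scratch) and logs under /tmp
    rm -rf /tmp/rabbit*
    rm -rf /tmp/hare*
    rm -rf /tmp/franc*
elif ! [[ $1 =~ ^[0-9]+$ ]] ; then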
    name=`basename $0`
    echo "Usage:"
    echo "./$name start nodename port"
    echo "./$name stop nodename"
    echo "./$name restart nodename port"
    echo "./$name n -- n is the number of times the nodes will be restarted"
else
    echo "----------------------------------------------------------"
    echo "---- Starting nodes --------------------------------------"
    echo "----------------------------------------------------------"

    start rabbit 5672
    start hare 5673
    start franc 5674

    for (( i = 1; i <= $1; i++ )); do
        echo "----------------------------------------------------------"
        echo "---- Restarting nodes ------------------------------------"
        echo "----------------------------------------------------------"
        
        restart rabbit 5672
        restart hare 5673
        restart franc 5674
    done
    
    echo "----------------------------------------------------------"
    echo "---- Stopping nodes --------------------------------------"
    echo "----------------------------------------------------------"
    
    stop rabbit 
    stop hare 
    stop franc 
fi
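The attached script runs all three nodes on one machine, while Matt's cluster spans three VMs and he suggested scripting his per-host sequence over SSH. Below is a minimal sketch of that loop under assumptions that are not from the thread: each node is named rabbit@<host>, RABBITMQ_SCRIPT_DIR and PIDFILE are defined in the remote shell, and the names RUNNER, BOOT_TIMEOUT, restart_node and rolling_restart are hypothetical. Each wait is bounded with GNU coreutils timeout, so a node stuck at "starting database" is reported instead of blocking the loop forever.

```shell
#!/bin/bash
# Sketch of Matt's stop/start/wait sequence, looped over several hosts.
# Hypothetical assumptions: nodes are named rabbit@<host>, and
# RABBITMQ_SCRIPT_DIR and PIDFILE are set in each host's remote shell.

RUNNER=${RUNNER:-ssh}               # how to reach a host; swap for a dry run
BOOT_TIMEOUT=${BOOT_TIMEOUT:-120}   # seconds; an arbitrary limit

# Restart one node; the exit status is non-zero if any step fails.
restart_node() {
    local host=$1
    $RUNNER "$host" "rabbitmqctl -n rabbit@$host stop" || return 1
    # Detach all three streams so ssh does not wait on the background server.
    $RUNNER "$host" 'nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server >nohup.out 2>&1 </dev/null &' || return 1
    # GNU coreutils timeout exits with status 124 if the wait is still blocked.
    $RUNNER "$host" "timeout $BOOT_TIMEOUT rabbitmqctl -n rabbit@$host wait \$PIDFILE"
}

# Usage: rolling_restart ROUNDS HOST...
# Stops at the first node that fails to come back and reports it on stderr.
rolling_restart() {
    local rounds=$1; shift
    local i host status
    for (( i = 1; i <= rounds; i++ )); do
        for host in "$@"; do
            restart_node "$host" || {
                status=$?
                echo "round $i: $host did not come back (exit $status)" >&2
                return "$status"
            }
        done
    done
}
```

For example, rolling_restart 10 A B C would run ten rounds over Matt's three hosts, and setting RUNNER to a shell function that only records its arguments gives a dry run. Note that timeout only stops the local wait: the hung node keeps running and still has to be inspected or killed by hand.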


