[rabbitmq-discuss] Hang on "starting database ...." remains in 2.8.2 cluster

Wed May 9 14:29:00 BST 2012

Oh and of course you should change the "rabbitmq_server" and 
"rabbitmqctl" variables to reflect the location of those scripts on your 
machine.
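
In case the attachment does not come through, a restart loop of this kind could look roughly like the sketch below. It is only a sketch under stated assumptions, not the attached script: the default script paths are guessed from the app descriptor path in Matt's boot banner, PIDFILE is assumed to be a file the server writes its PID to, and "server_log", "wait_secs" and the function names are made up for illustration. It also assumes GNU coreutils "timeout" is available, so that a node stuck on "starting database" is reported rather than waited on forever.

```shell
#!/usr/bin/env bash
# Sketch of a single-node restart loop; NOT the script attached to this mail.

# Script locations (assumed defaults, change them for your machine).
rabbitmq_server="${rabbitmq_server:-/usr/lib/rabbitmq/lib/rabbitmq_server-2.8.2/sbin/rabbitmq-server}"
rabbitmqctl="${rabbitmqctl:-/usr/lib/rabbitmq/lib/rabbitmq_server-2.8.2/sbin/rabbitmqctl}"

# Hypothetical defaults: pid file location, console log, and wait limit.
PIDFILE="${PIDFILE:-/tmp/rabbitmq.pid}"
server_log="${server_log:-nohup.out}"
wait_secs="${wait_secs:-60}"

# Stop the node, start it again in the background, then wait for it to boot.
restart_once() {
    "$rabbitmqctl" stop || return 1
    nohup "$rabbitmq_server" > "$server_log" 2>&1 &
    # Give up after $wait_secs seconds instead of blocking on a hung boot.
    timeout "$wait_secs" "$rabbitmqctl" wait "$PIDFILE"
}

# Restart the node $1 times, stopping at the first failure.
restart_loop() {
    rl_count="$1"
    rl_i=1
    while [ "$rl_i" -le "$rl_count" ]; do
        if restart_once; then
            echo "restart $rl_i/$rl_count: ok"
        else
            echo "restart $rl_i/$rl_count: FAILED (stop failed or node not up within ${wait_secs}s)"
            return 1
        fi
        rl_i=$((rl_i + 1))
    done
}

# Run the loop only when a count is given on the command line.
if [ "$#" -ge 1 ]; then
    restart_loop "$1"
fi
```

Saved as "test", it would be run the same way, e.g. "./test 10". When a run fails, the console log of the hung node should end with the "starting database" line.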

Francesco.

On 09/05/12 14:27, Francesco Mazzoli wrote:
> Hi Matt,
>
> I've tried to reproduce your setup locally and I never get hanging
> nodes. I've attached the bash script and the config I'm using, can you
> confirm that that corresponds to what you're doing?
>
> You can use the script like this:
> ./test 10
> Where "10" is the number of times the node will be restarted.
>
> Francesco.
>
> On 09/05/12 00:38, Matt Pietrek wrote:
>> Hi Francesco,
>>
>> I run rabbitmq on 3 separate Ubuntu 10.04 64 bit VMs. Clustering is
>> enabled via the rabbitmq config file that lists all three hosts (call
>> them A, B, and C).
>>
>> I start up all the VMs concurrently (via Capistrano) and verify that
>> the cluster is running as expected. I then go through this sequence:
>>
>> --------
>> # On host A:
>> rabbitmqctl -n rabbit@A stop
>> nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server&
>> rabbitmqctl wait $PIDFILE
>>
>> # On host B:
>> rabbitmqctl -n rabbit@B stop
>> nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server&
>> rabbitmqctl wait $PIDFILE
>>
>> # On host C:
>> rabbitmqctl -n rabbit@C stop
>> nohup $RABBITMQ_SCRIPT_DIR/rabbitmq-server&
>> rabbitmqctl wait $PIDFILE
>> --------
>>
>> The idea being to bring down one server while still retaining two in
>> the cluster.
>>
>> During one of the start operations (it's not consistent from run to
>> run), rabbitmq-server will not finish starting up. The last line in
>> that node's nohup.dat file is:
>>
>> "starting database ....."
>>
>> FWIW, it might be helpful to put the shutdown/startup commands in a
>> script that you can loop over repeatedly so as to try the whole
>> sequence numerous times. We use Capistrano here to execute actions on
>> remote machines, but you can probably use SSH to get the same effect
>> from a script file.
>>
>> Let me know if you have other questions about our setup,
>>
>> Matt
>>
>>
>> On Tue, May 8, 2012 at 3:52 AM, Francesco Mazzoli
>> <francesco at rabbitmq.com> wrote:
>>> Hi Matt,
>>>
>>> Predictably, I can't reproduce this. Since you say that it'll happen
>>> "inevitably" (while, if I understand correctly, in your previous
>>> messages it was tricky to reproduce), can you send us more information
>>> about your setup and the steps needed to trigger the problem?
>>>
>>> Francesco.
>>>
>>>
>>> On 04/05/12 23:55, Matt Pietrek wrote:
>>>>
>>>> I've written to this alias before about this topic, and the problem
>>>> remains in 2.8.2. See:
>>>>
>>>> http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-February/018414.html
>>>>
>>>>
>>>> I have a three-node cluster running RabbitMQ 2.8.2/Erlang R13B03 on
>>>> Ubuntu 10.04.
>>>>
>>>> Once the cluster is up and running properly (as observed by the Web
>>>> UI), I then start/stop individual nodes in the cluster:
>>>> rabbitmqctl stop
>>>> rabbitmq-server
>>>>
>>>> Inevitably one of the nodes won't come back up, waiting forever on
>>>> "starting" the database (no 30 second timeout... Forever.)
>>>>
>>>> The only way to get all three nodes functioning again together is to
>>>> forcibly stop the other two nodes, then restart them all again.
>>>>
>>>>
>>>> The first item below is the console output as captured via nohup,
>>>> showing "starting database" as the last item.
>>>> The second item below is the last few lines of the rabbit@<node>.log
>>>> file, showing the node shutting down, then beginning to start up
>>>> again.
>>>>
>>>> Is it likely that a newer Erlang version would help out?
>>>> What else can I provide to help diagnose this?
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>> --------
>>>> node           : rabbit@util
>>>> app descriptor : /usr/lib/rabbitmq/lib/rabbitmq_server-2.8.2/sbin/../ebin/rabbit.app
>>>> home dir       : /home/mpietrek
>>>> config file(s) : /home/mpietrek/work/var/run/rabbitmq.config
>>>> cookie hash    : pR5H9kY3Wra/XdLELT5hgQ==
>>>> log            : /home/mpietrek/work/logs/util.mpietrek.internal.illumita.com/rabbit@util.log
>>>> sasl log       : /home/mpietrek/work/logs/util.mpietrek.internal.illumita.com/rabbit@util-sasl.log
>>>> database dir   : /home/mpietrek/work/var/lib/rabbit@util
>>>> erlang version : 5.7.4
>>>>
>>>> -- rabbit boot start
>>>> starting file handle cache server
>>>> ...done
>>>> starting worker pool
>>>> ...done
>>>> starting database ...
>>>>
>>>> --------
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>> application: rabbitmq_management_agent
>>>> exited: stopped
>>>> type: permanent
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>> stopped TCP Listener on 0.0.0.0:5672
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>> application: rabbit
>>>> exited: stopped
>>>> type: permanent
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>> application: os_mon
>>>> exited: stopped
>>>> type: permanent
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>> application: mnesia
>>>> exited: stopped
>>>> type: permanent
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:14 ===
>>>> Halting Erlang VM
>>>>
>>>> =INFO REPORT==== 4-May-2012::15:02:52 ===
>>>> Limiting to approx 924 file handles (829 sockets)