[rabbitmq-discuss] Problems With OCF Script - Race between start and wait

Chris Chew chrisch at ecollege.com
Tue May 10 00:23:28 BST 2011


Thanks, Matthew.

It seems like an edge case that would be easy to miss...  I should reiterate that my shared volume is slow.  slooowww, as in 4 MB/sec according to dd.  This is also just a test instance and not a production installation, so the priority is pretty low (except for the case when there is a large amount of data in the queue?).

> ________________________________________
> From: rabbitmq-discuss-bounces at lists.rabbitmq.com [rabbitmq-discuss-bounces at lists.rabbitmq.com] on behalf of Matthew Sackman [matthew at rabbitmq.com]
> Sent: Monday, May 09, 2011 9:54 AM
> 
> Hi Chris,
> 
> On Fri, May 06, 2011 at 04:27:40PM +0000, Chris Chew wrote:
> > Doing some testing on 2.4.1 ahead of upgrading from 2.2, I've noticed an odd behavior when using the OCF script to start Rabbit per the Pacemaker guide.
> >
> > There seems to be a race condition between the time I call `rabbitmq-server...` and the time I call `rabbitmqctl -n <host> wait`: if the erlang node hasn't started up yet, then `rabbitmqctl -n <host> wait` fails, saying the node is down. This failure causes the OCF start action to fail and falsely report that Rabbit is not running (rabbit still starts up just fine).
> >
> > I fixed this temporarily on my installation by adding `sleep 2` in the rabbit_start() function of the rabbitmq-server OCF script, between the line where the server is started and the line where rabbit_wait is invoked.
> >
> > Interestingly (and this threw me off the trail for a while), the race is only lost when I both install the management plugin *and* set mnesia_base to a shared (network-attached) volume. One or the other alone is not enough to lose the race, but both together are. This may also help explain why it hasn't been seen before.
> 
> Having thought about this some more, I'm now puzzled. ctl -n <host> wait
> should wait for a minimum of 5 seconds before giving up on the host.
> This should be enough time for the rabbit erlang node to start (if not
> fully boot). As long as the erlang node is contactable in some form
> within 5 seconds then the wait will then wait forever for the erlang
> node to fully boot and then report if rabbit's managed to start up ok or
> not.
> 
> What you're reporting seems to be the inverse: i.e. if the server starts
> too quickly then you encounter a problem.
> 
> Could you clarify?

Yes, I am seeing that 5-second wait behavior from rabbitmqctl; it's just that 5 seconds is apparently not long enough for the rabbit node to boot when this shared volume is used as the mnesia_base.  I'm wondering whether the rabbit_start() function should have a "spinner" built in, just like the rabbit_stop() function has.

Here's a more detailed description of the behavior I'm seeing:

With a stock 2.4.1 setup, the problem is that crm_mon thinks the rabbit process did not start up, even though it actually has.  The Pacemaker logs look like this:

========================================================================
May  5 16:04:27 sm6nic02 lrmd: [1339]: info: rsc:rabbitmq-service:116: start
May  5 16:04:27 sm6nic02 crmd: [1342]: info: process_lrm_event: LRM operation rabbitmq-ip_monitor_10000 (call=115, rc=0, cib-update=214, confirmed=false) ok
May  5 16:04:32 sm6nic02 rabbitmq-server[30971]: INFO: rabbitmq-server start failed: 7
May  5 16:04:32 sm6nic02 crmd: [1342]: info: process_lrm_event: LRM operation rabbitmq-service_start_0 (call=116, rc=1, cib-update=215, confirmed=true) unknown error
========================================================================

...where the "7" return code on the third line appears to be $OCF_NOT_RUNNING, as returned by the rabbitmqctl_action() function.  However, rabbit does come up fully functional even though Pacemaker thinks startup failed.

When I modify the rabbitmqctl_action() function to log its output and timestamps to /tmp/log, I can confirm the 5-second delay is occurring inside the `rabbitmqctl -n ... wait` command.  Specifically, I added two `date` calls and appended them, along with the output of the rabbitmqctl call, to /tmp/log:

========================================================================
rabbitmqctl_action() {
    ...
    ocf_log info "executing $RABBITMQ_CTL $NODENAME_ARG $action > /dev/null 2> /dev/null"
    echo `date` >> /tmp/log
    $RABBITMQ_CTL $NODENAME_ARG $action >> /tmp/log 2>> /tmp/log
    rc=$?
    echo `date` >> /tmp/log
    ...
========================================================================

...which produces this output in /tmp/log:

========================================================================
Mon May 9 12:49:24 MDT 2011
Status of node rabbit at localhost ...
Error: unable to connect to node rabbit at localhost: nodedown
diagnostics:
- nodes and their ports on localhost: [{rabbitmqctl13264,38158}]
- current node: rabbitmqctl13264 at sm6nic02
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: M+L/i5fJ87oOQ373piDAXQ==
Mon May 9 12:49:25 MDT 2011
Mon May 9 12:49:25 MDT 2011
Waiting for rabbit at localhost ...
Error: unable to connect to node rabbit at localhost: nodedown
diagnostics:
- nodes and their ports on localhost: [{rabbitmqprelaunch13333,36527},
                                       {rabbitmqctl13346,49342}]
- current node: rabbitmqctl13346 at sm6nic02
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: M+L/i5fJ87oOQ373piDAXQ==
Mon May 9 12:49:30 MDT 2011
Mon May 9 12:49:30 MDT 2011
Status of node rabbit at localhost ...
Error: unable to connect to node rabbit at localhost: nodedown
diagnostics:
- nodes and their ports on localhost: [{rabbitmqprelaunch13333,36527},
                                       {rabbitmqctl13447,43772}]
- current node: rabbitmqctl13447 at sm6nic02
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: M+L/i5fJ87oOQ373piDAXQ==
Mon May 9 12:49:30 MDT 2011
========================================================================

...note the 5-second jump between the pair of timestamps at 12:49:25 and the pair at 12:49:30, which shows the 5-second wait happening inside `rabbitmqctl wait`.


I can "fix" the problem by simply adding a `sleep 2` inside the rabbit_start() function like this:

========================================================================
rabbit_start() {
    ...
    setsid sh -c "$RABBITMQ_SERVER > ${RABBITMQ_LOG_BASE}/startup_log 2> ${RABBITMQ_LOG_BASE}/startup_err" &

    # Wait for the server to come up.
    # Let the CRM/LRM time us out if required
    
    sleep 2  #Chris: Just like Pete Rose, we want rabbitmqctl to lose the race between it and the rabbit-server vm startup...

    rabbit_wait
    ...
========================================================================

...of course, `sleep <arbitrary amount of time>` is not a proper fix.  I noticed the rabbit_stop() function has a "spinner" that waits for rabbit to shut down, deferring to the cluster stack to abort the process on timeout.  Does it make sense to add the same type of wait loop to rabbit_start() and push the timeout concern into the cluster stack?
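For what it's worth, here's a rough sketch of what such a start-side spinner could look like.  This is only an illustration, not the shipped script: `try_wait` is a hypothetical stand-in for the real `rabbitmqctl -n $NODENAME wait` call, rigged here to fail twice before succeeding so the loop's behavior is visible.

```shell
# Hypothetical sketch of a spinner for rabbit_start(), modeled on the
# wait loop already in rabbit_stop().  try_wait stands in for the real
# `rabbitmqctl -n $NODENAME wait`; here it fails twice before
# succeeding, simulating a node that is slow to boot.
attempts=0
try_wait() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]   # succeeds on the third call
}

# Spin until the node answers.  No timeout here: as with rabbit_stop(),
# the CRM/LRM operation timeout bounds the overall wait.
until try_wait ; do
    sleep 1
done
echo "node up after $attempts attempts"
```

The loop deliberately never gives up on its own, matching the rabbit_stop() approach of leaving the timeout to the cluster stack.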

Hopefully this helps clarify...and thanks for your time!

-Chris

