[rabbitmq-discuss] [PATCH 00 of 10] Several improvements to the OCF resource agent

Matthew Sackman matthew at lshift.net
Tue May 11 20:04:45 BST 2010


On Tue, May 11, 2010 at 08:50:19PM +0200, Florian Haas wrote:
> On 05/11/2010 08:01 PM, Matthew Sackman wrote:
> > These are excellent, and I have no doubt they will likely all be
> > accepted. As I'm sure you've been able to gather, some of the
> > documentation and example scripts that I've read in order to be able to
> > write the OCF script are out of date themselves, hence some of the
> > issues you've spotted and corrected.
> 
> Would you mind sharing exactly what documentation and example scripts
> you were following? We should get those fixed.

Sure, a lot of it was just done from reading other OCF scripts, such as
DRBD, IPaddr and the like. But I think I got the most out of the
docs at
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-ocf.html

For example, that only talks of the monitor action, not status, and
nowhere that I found is there any real documentation of what the
library of functions supplied with Pacemaker does, nor how or when
those functions should be used (e.g. the ocf_is_probe function you've
used).

> One other thing that came to mind while looking at the RA: the
> recommended minimum start timeout of 600s seems a bit excessive.
> Starting with Pacemaker 1.0.8 the crm shell will warn if the
> configuration provides for shorter timeouts than the RA recommends. Sure
> you need a 10-minute start timeout?

Currently, startup time can be very long if you have an awful lot of
data to recover from disk. We think this might be partially fixed very
soon, as some work that's recently been done will allow Rabbit to come up
even before all recovery is complete (only the queues still being
recovered will continue to be unavailable). However, even in this case,
there are still some internal resources that must be fully recovered
before Rabbit can be in any way considered to be up, and that can be
proportional to the amount of data it has previously stored on disk.

Thus, in conclusion, yes, 10 mins may be far too long. But in some cases
it may also be too short. Any advice you have as to what we should be
doing wrt the OCF script would be gratefully received.
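
To make the timeout question concrete, here is a rough sketch of the
usual pattern: the start action polls until the service reports up and
gives up after a bounded number of attempts. It is illustrative only and
is not taken from the actual resource agent. broker_is_up,
wait_for_broker and BROKER_READY_FILE are hypothetical names, and the
marker-file check merely stands in for whatever real "is Rabbit up?"
test the script performs.

```sh
#!/bin/sh
# Illustrative sketch only; every name below is hypothetical.

OCF_SUCCESS=0       # standard OCF return code for success
OCF_ERR_GENERIC=1   # standard OCF return code for a generic failure

# Stand-in readiness check: succeeds once a marker file exists.
# A real agent would query the broker itself instead.
broker_is_up() {
    [ -e "${BROKER_READY_FILE:-/nonexistent}" ]
}

# Poll up to $1 times, sleeping $2 seconds after each failed check.
# Returns OCF_SUCCESS as soon as the check passes, OCF_ERR_GENERIC
# if every attempt fails.
wait_for_broker() {
    attempts=$1
    interval=$2
    while [ "$attempts" -gt 0 ]; do
        if broker_is_up; then
            return $OCF_SUCCESS
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return $OCF_ERR_GENERIC
}
```

With, say, "wait_for_broker 120 5" the loop waits roughly ten minutes at
most, so the bound in the script and the start timeout configured in the
cluster would need to be chosen together.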

Matthew


