On Sun, Jul 25, 2010 at 1:33 AM, Dave Greggory <davegreggory@yahoo.com> wrote:
> 2. HA/Failover: I've seen the Pacemaker guide but I'm a little hesitant to set
> that up, as we have little experience in house with Pacemaker/Corosync/DRBD. How
> many people use it for HA/failover in production systems, and how happy are you
> with it? Does it support failing over if the hard drive on one of the nodes dies,
> rather than something a little simpler like a node running out of memory or
> hanging?

We are contemplating this and have done some trialling/testing. For us the question is whether to provide HA at the XenServer level or at the host/app level using Pacemaker etc.
It took a little bit of fiddling to get it running with Pacemaker (this was before the HA document was available), but once we had the system working, it has worked well. Our solution uses shared iSCSI storage rather than DRBD and so relies on the reliability of the SAN. If a drive on one of the hosts fails (such as the root or another partition) and this causes difficulties for the status check script, it will fail over to the other node. We assume that the drives containing the RabbitMQ storage are "safe" through redundancy (RAID 1, redundant storage controllers, etc.).
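
To give a feel for it, the status check boils down to something like the following. This is a rough sketch rather than our actual script; it only assumes rabbitmqctl is on the PATH and uses the standard OCF exit codes Pacemaker expects from a monitor action:

#!/usr/bin/env python
# Rough sketch of a RabbitMQ status check of the kind Pacemaker can call as a
# monitor action (not our production script). Exit codes follow the usual OCF
# conventions: 0 = running, 7 = not running.
import subprocess
import sys

OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def rabbitmq_running():
    """Return True if 'rabbitmqctl status' reports a running node."""
    try:
        proc = subprocess.Popen(["rabbitmqctl", "status"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        proc.communicate()
        return proc.returncode == 0
    except OSError:
        # rabbitmqctl itself is unavailable, e.g. the partition holding it
        # has gone away; treat that the same as a dead broker.
        return False


if __name__ == "__main__":
    sys.exit(OCF_SUCCESS if rabbitmq_running() else OCF_NOT_RUNNING)
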
At this stage we are leaning towards the XenServer level, since it is less complex and still satisfies our requirements. We've also had some hardware changes (the production system will now be on an FC SAN rather than iSCSI) and have not yet done the testing work on the new configuration.
In terms of monitoring, we generally run rabbitmqctl list_queues as part of a Munin plugin. We plan to hook it up to Nagios, but haven't done so yet.
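
The plugin is nothing fancy; conceptually it is along these lines (a sketch, not the real thing, and the graph labels and field-name munging are only illustrative):

#!/usr/bin/env python
# Sketch of a Munin plugin that graphs queue depths using
# 'rabbitmqctl list_queues'. Not our actual plugin.
import re
import subprocess
import sys


def list_queues():
    """Return (queue_name, message_count) pairs from rabbitmqctl."""
    proc = subprocess.Popen(["rabbitmqctl", "list_queues", "name", "messages"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    queues = []
    for line in out.decode("utf-8", "replace").splitlines():
        parts = line.split("\t")
        # Skip the "Listing queues ..." header/footer lines.
        if len(parts) == 2 and parts[1].isdigit():
            queues.append((parts[0], int(parts[1])))
    return queues


def field(name):
    """Munin field names are restricted, so replace anything awkward."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


if __name__ == "__main__":
    queues = list_queues()
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print("graph_title RabbitMQ queue lengths")
        print("graph_vlabel messages")
        for name, _ in queues:
            print("%s.label %s" % (field(name), name))
    else:
        for name, count in queues:
            print("%s.value %d" % (field(name), count))
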
Hope this is useful information.

Joe