[rabbitmq-discuss] Instable HA cluster

Ben West ben.west at mobankgroup.com
Mon Jan 28 11:17:16 GMT 2013


Hi Simon,

Thanks for taking the time to look into this.

I had a feeling this was an environmental issue because it only occurred on
our QA and PreLive environments; we do not see the same issue on our LIVE
environment (hosted with another provider, not Azure).

I will get in touch with Microsoft to find out what our options are with
Azure VMs and whether there is anything we can do to eliminate network
partitions. We do not currently use their Virtual Networks, so we may try
placing the two VMs in a Virtual Network to improve the stability of the
connection between the two nodes. Failing that, we may look at the
Federation / Shovel solutions mentioned in the link.

Cheers,

Ben


Ben West
Product Owner

Mobile: +44(0)7824 617 813
ben.west at mobankgroup.com

www.mopowered.co.uk



On 28 January 2013 10:55, Simon MacMullen <simon at rabbitmq.com> wrote:

> Hi, thanks. So it looks like the cluster is experiencing network
> partitions:
>
> http://www.rabbitmq.com/partitions.html
>
> since we can see (for example) that around 18-Jan-2013 03:30:21 both nodes
> saw the other one go down. RabbitMQ clusters do not tolerate partitions
> well, so this needs to be fixed I'm afraid.
>
> You should not need to rebuild the cluster when this happens, however;
> just stopping and starting the nodes should be enough to recover.
>
> Cheers, Simon
>
>
> On 25/01/13 17:04, Ben West wrote:
>
>> Hi Simon,
>>
>> Please find attached.
>>
>> Thanks again,
>>
>> Ben
>>
>>
>> On 25 January 2013 16:41, Simon MacMullen <simon at rabbitmq.com> wrote:
>>
>>     Could you send the log for MOPAYPL2 as well? It would be useful to
>>     correlate what's happening with it too.
>>
>>     Cheers, Simon
>>
>>
>>     On 25/01/13 16:28, Ben West wrote:
>>
>>
>>         Hi Simon,
>>
>>         Thanks for coming back to me.
>>
>>         Attached is a RabbitMQ log from one of the nodes.
>>
>>         As I mentioned before, this seems to happen once every week or
>>         two. I can't be too specific about when it happened because I
>>         only noticed when I checked the console; however, I believe the
>>         last occurrence would have been yesterday (24th Jan).
>>
>>         If you need further logs / info, let me know and I'll dig them
>>         out for you.
>>
>>         Kind regards,
>>
>>         Ben
>>
>>
>>         On 25 January 2013 15:32, Simon MacMullen <simon at rabbitmq.com> wrote:
>>
>>              On 25/01/13 09:52, ben.west at mobankgroup.com wrote:
>>
>>                  I'm more than happy to provide logs if these may be
>>                  useful?
>>
>>
>>              That would be useful. Preferably if you can also tell us
>>              about the time at which the cluster broke (even if it's
>>              just "some time on Wednesday" or similar).
>>
>>              Cheers, Simon
>>              --
>>              Simon MacMullen
>>              RabbitMQ, VMware
>>
>>
>>
>>
>>
>>
>>     --
>>     Simon MacMullen
>>     RabbitMQ, VMware
>>
>>
>>
>>
>
>
> --
> Simon MacMullen
> RabbitMQ, VMware
>
