[rabbitmq-discuss] RabbitMQ Cluster, split network & VMWare snapshot

Michael Oullion michael.oullion at norbert-dentressangle.com
Fri Feb 21 05:59:21 GMT 2014


Thanks again for your answer.
Honestly, my production team manages the snapshots as they see fit.
But with your answer I can explain why we need to stop snapshotting these VMs.
Thanks a lot,
Regards
On 21 Feb 2014 02:06, "Jerry Kuch" <jkuch at gopivotal.com> wrote:

> On Thu, Feb 20, 2014 at 12:11 PM, Michael Oullion <
> michael.oullion at norbert-dentressangle.com> wrote:
>
>> Thanks, Jerry, for your quick answer.
>> What can we do in this situation?
>> Maybe we can raise the net tick time, or use a specific behaviour to handle
>> network partitions.
>> Or simply stop taking snapshots of the VM, since they aren't necessary?
>>
> You may want to think about why you take the snapshots in your particular
> workflow.  As long as you have an occasional snapshot capturing the
> configuration of your Rabbit nodes, as you use them in your
> dev/test/production environments, you should be fine for restoration
> purposes.  You probably don't need to take one every night unless a lot is
> changing on them from day to day.
>
> Besides, if that VM were restored from a snapshot, it would wake up into a
> world where any connected clients and whatnot are likely gone and forgotten,
> and it would have to slough such things off anyway.  And there may be
> messages sitting in queues that were delivered to consumers and acted upon
> long ago, and that will now come back from the un-snapshotted grave.  If
> your apps are designed sensibly, favoring idempotency and suitable
> de-duplication of actions at the consumer end, this won't be a big deal, of
> course.
>
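The idempotency and consumer-side de-duplication Jerry mentions can be illustrated with a minimal sketch. It assumes each message carries a unique, producer-assigned ID (for example in the AMQP `message_id` property) and keeps the processed IDs in an in-memory set; a real deployment would persist that set on the consumer side, outside the broker VM, so a broker snapshot restore cannot roll it back. `DedupingConsumer` and `on_message` are hypothetical names, not part of any RabbitMQ client library.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class DedupingConsumer:
    """Hypothetical consumer wrapper that skips messages it has already handled.

    Assumes every message carries a unique, producer-assigned ID. The set of
    seen IDs lives in memory here; a real system would persist it on the
    consumer side so it is not lost or rolled back with the broker VM.
    """

    handler: Callable[[str], None]
    seen_ids: Set[str] = field(default_factory=set)

    def on_message(self, message_id: str, body: str) -> bool:
        """Run the handler unless this ID was already processed; True if it ran."""
        if message_id in self.seen_ids:
            # Redelivered duplicate, e.g. resurrected by a snapshot restore: skip it.
            return False
        self.handler(body)
        # Record the ID only after the handler succeeds, so a failure mid-handling
        # leads to a retry on redelivery rather than a silently dropped message.
        self.seen_ids.add(message_id)
        return True


processed = []
consumer = DedupingConsumer(handler=processed.append)
consumer.on_message("order-42", "ship order 42")
consumer.on_message("order-42", "ship order 42")  # duplicate is skipped
print(processed)  # ['ship order 42']
```

Recording the ID after the handler rather than before trades a possible re-run of a partially completed action for never losing a message, which is why the handler itself should still be idempotent where possible.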
> You may also want to keep an eye on your vSphere monitoring and management
> stuff to see if anything else is going on around the times these partitions
> occur.  Partitions are in the eye of each participating beholder, and we
> detect (really *define*) them via timeout, so anything that renders a node
> temporarily unable to participate in heartbeating will manifest this way.
>
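Because partitions are defined by a timeout, it helps to know how long a stall must last before peers give up on a node. The timeout is governed by the Erlang kernel parameter `net_ticktime` (default 60 seconds): ticks are sent every `net_ticktime/4` seconds, and per the Erlang documentation an unresponsive node is detected after somewhere between `net_ticktime - net_ticktime/4` and `net_ticktime + net_ticktime/4` seconds. The sketch below computes that window and classifies a VM pause against it; the helper names and the 50-second pause are hypothetical illustrations, not measurements from this cluster.

```python
from typing import Tuple

DEFAULT_NET_TICKTIME_S = 60  # Erlang kernel default for net_ticktime


def detection_window(net_ticktime_s: float = DEFAULT_NET_TICKTIME_S) -> Tuple[float, float]:
    """Return (min, max) seconds of unresponsiveness before a peer is declared down.

    Ticks go out every net_ticktime/4 seconds, so detection happens somewhere
    between net_ticktime - net_ticktime/4 and net_ticktime + net_ticktime/4.
    """
    quarter = net_ticktime_s / 4
    return (net_ticktime_s - quarter, net_ticktime_s + quarter)


def pause_risk(pause_s: float, net_ticktime_s: float = DEFAULT_NET_TICKTIME_S) -> str:
    """Classify a hypothetical VM stall (snapshot, vMotion stun, host paging)."""
    low, high = detection_window(net_ticktime_s)
    if pause_s <= low:
        return "safe"      # at or below the earliest possible detection time
    if pause_s < high:
        return "possible"  # may or may not be seen as a partition
    return "expected"      # long enough that peers will declare the node down


for ticktime in (60, 120):
    low, high = detection_window(ticktime)
    print(ticktime, low, high, pause_risk(50, ticktime))
# 60 45.0 75.0 possible
# 120 90.0 150.0 safe
```

Raising `net_ticktime` makes the cluster tolerate longer stalls, but it equally delays detection of genuine node failures, so it masks the symptom rather than removing the cause of the pauses.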
> Beyond snapshotting, which paralyzes the VM for part of the time the
> snapshot is being made, I'd also watch out for two other things.  One is
> vMotion, which briefly stuns the VM being migrated into a quiescent state
> just before vSphere switches over to it at its new location.  The other is
> the hypervisor possibly paging memory out beneath the guest OS that Rabbit
> is running on, which could make things lag enough that a heartbeat exchange
> is missed.  The latter case can be especially sneaky, since an ESX host
> under memory pressure may be paging out guest OSes without them, as far as
> they know, swapping at all...
>
> Best regards,
> Jerry