[rabbitmq-discuss] RabbitMq on ESXi

Thu Dec 5 18:28:00 GMT 2013

Simon,

Thanks for the suggestions.

I did read all the relevant documentation and I agree with you that
changing net_ticktime would just mask the problem.

The nodes are not being suspended, but something still causes Mnesia
partitioning.
The only thing I can think of at the application level is that there are so
many messages between nodes that the net ticks don't get through. That's not
likely, though.
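
For reference, here is a rough sketch of the timing involved. It assumes
the behaviour described in the Erlang kernel documentation: ticks go out
every net_ticktime/4 seconds, and a silent peer is declared down somewhere
between 0.75x and 1.25x net_ticktime (45 to 75 seconds with the default of
60). The function names are made up for illustration and are not part of
RabbitMQ or Erlang.

```python
def tick_window(net_ticktime=60.0):
    """Return (tick_interval, min_detect, max_detect) in seconds.

    Assumption: ticks are sent every net_ticktime/4 seconds and a peer
    that stays silent is declared down between 0.75x and 1.25x
    net_ticktime, as described in the Erlang kernel documentation.
    """
    quarter = net_ticktime / 4.0
    return quarter, net_ticktime - quarter, net_ticktime + quarter


def partition_risk(silence_seconds, net_ticktime=60.0):
    """Roughly classify how risky a period of silence between nodes is."""
    _, min_detect, max_detect = tick_window(net_ticktime)
    if silence_seconds < min_detect:
        return "safe"
    if silence_seconds < max_detect:
        return "possible"
    return "likely"


if __name__ == "__main__":
    print(tick_window(60))
    for silence in (30, 50, 90):
        print(silence, partition_risk(silence))
```

So with the default setting, a node has to be silent for at least 45
seconds before the others can consider it down.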

I will upgrade to the latest version and try to diagnose problems at the
VMware level.
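
One thing I may try on the guests is a small stall detector along these
lines. This is only a sketch under my own assumptions: it samples the clock
at a fixed interval and reports gaps much longer than expected. Whether a
hypervisor stall actually shows up this way depends on how the guest clock
is corrected afterwards, so treat it as a rough indicator. The names
find_stalls and watch are hypothetical.

```python
import time


def find_stalls(timestamps, interval, tolerance):
    """Return (timestamp, extra_delay) pairs for every pair of consecutive
    samples that are more than interval + tolerance seconds apart."""
    stalls = []
    for previous, current in zip(timestamps, timestamps[1:]):
        extra = (current - previous) - interval
        if extra > tolerance:
            stalls.append((previous, extra))
    return stalls


def watch(interval=1.0, tolerance=5.0, samples=60,
          clock=time.time, sleep=time.sleep):
    """Sample the clock `samples` times, sleeping `interval` seconds after
    each sample, then return any stalls found.

    clock and sleep are injectable so the logic can be tested without
    actually waiting.
    """
    timestamps = []
    for _ in range(samples):
        timestamps.append(clock())
        sleep(interval)
    return find_stalls(timestamps, interval, tolerance)


if __name__ == "__main__":
    # Watch for one minute and print any stall longer than 5 seconds.
    for started, extra in watch():
        print("stall of about %.1fs after %s" % (extra, time.ctime(started)))
```

If this ever reports a gap approaching 45 seconds or more, that would
match the partitions I am seeing.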

Thanks,

Zsolt

On Fri, Nov 29, 2013 at 5:41 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> Well, first of all lots of people are running clusters on virtual machines
> perfectly happily, so it should be possible!
>
> If you are seeing running_partitioned_network events on your cluster,
> that's quite alarming. Assuming your network is reliable, it would imply
> that nodes are being suspended by the hypervisor for at least a minute or
> so, which sounds excessive (unless you are suspending the nodes yourself,
> in which case I suggest you don't do that). I've not seen ESX do that
> in my limited experience with it.
>
> You could increase net_ticktime (http://www.rabbitmq.com/nettick.html) to
> cover this up, but it feels like a band-aid at this point.
>
> You should probably read http://www.rabbitmq.com/partitions.html if you
> haven't already done so.
>
> Finally, you mention mirrored queues. Note that we have fixed a large
> number of bugs in the mirrored queue implementation since 2.8.2 (and quite
> a few since 3.0.1) so upgrading is likely to be a good idea.
>
> Cheers, Simon
>
> On 27/11/2013 20:23, zsolt.erl at gmail.com wrote:
>
>> Hi,
>>
>> I'm trying to find out if there are any recommendations for running
>> RabbitMQ on VMware ESXi (e.g. clustering, queue mirroring).
>>
>> I have several 4-node clusters running on ESXi 4/5 guests. The guests are
>> Ubuntu 10.04 VMs.
>> Erlang version: R15B. RabbitMQ versions: 2.8.2 and 3.0.1.
>>
>> The clusters seem to crash randomly every once in a while (about once
>> every 2 months).
>> Sometimes the whole cluster crashes, sometimes only a couple of nodes, and
>> the others either keep working or become unreachable.
>> The logs only show that the nodes lost connection.
>> I'm running 4-node clusters with 1 disk node and about 100 queues
>> mirrored across 2-3 nodes.
>> The same thing was happening when I was running a cluster with 4 disk
>> nodes.
>>
>> Are there any recommended best practices with regard to virtual machine
>> settings, VMware network settings or OS settings that could
>> prevent these random crashes?
>> Would federation be a better solution than clustering in a virtual
>> environment? Or should I just run them on physical hardware?
>>
>> I realize there's not enough data here to find out exactly what is
>> happening, but I'm just trying to see if anybody has come across similar
>> problems and was able to handle them.
>>
>> Thanks,
>>
>> Zsolt