Thanks Francesco.<br><br>Your script looks reasonably close to what I'm doing, except for a few key differences:<br><br>All of your nodes run on the same host. In my setup, each node is in its own VM, which I imagine adds real network traffic to the communication between nodes.<br>
<br>You start the Rabbit instances sequentially. In my setup, I start them in parallel on all three VMs via Capistrano. If you can't go the multi-VM route, you may be able to roughly simulate this by starting each instance in the background with a trailing '&'.<br>
<br>I use "killall -9 beam.smp" rather than plain "killall beam.smp", so the Erlang VMs receive SIGKILL instead of the default SIGTERM and get no chance to shut down cleanly.<br><br><br>In short, the goal of my test is to simulate a worst-case power outage in the data center. All of our Rabbit instances run on separate VMs for fault tolerance. Hopefully we won't lose them all at once, but if we do, we need to be able to restart them reliably.<br>
<br><br><br><div class="gmail_quote">On Thu, Jun 21, 2012 at 6:36 AM, Francesco Mazzoli <span dir="ltr">&lt;<a href="mailto:francesco@rabbitmq.com" target="_blank">francesco@rabbitmq.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Matt,<br>
At Tue, 19 Jun 2012 16:33:05 -0700,<br>
Matt Pietrek wrote:<br>
> Francesco,<br>
><br>
<div class="im">> Thanks again for the valuable insight from your reply. I'm down to<br>
> one issue at this point.<br>
><br>
> Given what you said earlier about it being OK to start the brokers<br>
> in any order, I wrote a simple "catastrophic stress" test. The good<br>
> news is that RabbitMQ does what's expected. The bad news: it only<br>
> does so most of the time, i.e. in about 90% of runs.<br>
<br>
</div>First of all, this is more of an Erlang question than a RabbitMQ one.<br>
Not that this changes anything, but you could also ask about it on<br>
the erlang-questions mailing list. RabbitMQ clusters are Mnesia<br>
clusters, so they offer the same guarantees.<br>
<br>
I can't think of a reason why this would happen, but I'm no expert<br>
in Mnesia. I've attached a script that reproduces your test; can you<br>
verify that it's more or less what you're doing? I'm using the puka<br>
Python client to publish the messages. I've run it 50 times and<br>
wasn't able to reproduce your problem.<br>
<br>
If my test is indeed accurate, I think the best thing is to ask<br>
someone with more Mnesia knowledge; I have CCed some possible<br>
candidates :).<br>
<span class="HOEnZb"><font color="#888888"><br>
Francesco.<br>
<br>
</font></span></blockquote></div><br>