<div dir="ltr"><div>The queues are durable and mirrored; there are ~150 of them involved per message.</div><div><br></div>The "140K msgs/hr" value is the rate at which a producer places messages on the inbound queue. The application (the consumers of this queue) processes the messages using an asynchronous state machine that uses AMQP to push data to the next state of the processor (publishes/consumes). For each state there are 2-3 queues such as process, retry, and error, and there are many states per message type. There are 6 application instances connecting to the Rabbit cluster to process these messages. As a message flows through the various states, its metadata flows through these queues until the processing activity is completed. We use MassTransit and the .NET C# Rabbit driver on the application side, which implements this behavior. I describe this to be as complete as possible, as we're not testing a single produce/consume type of scenario - it's more complex, as there are several processing states each publishing/consuming to many different exchanges/queues.<div>
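<div>To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes every message touches all ~150 queues exactly once, which is probably an upper bound, since the retry and error queues should only see traffic on failures. The variable names are just for illustration.</div>

```python
# Back-of-the-envelope load estimate from the figures quoted in this thread.
MSGS_PER_HOUR = 140_000     # inbound rate on the first queue
QUEUES_PER_MESSAGE = 150    # approximate number of queues involved per message

inbound_per_sec = MSGS_PER_HOUR / 3600

# Assumption: each queue a message touches costs one publish (and one delivery).
# Retry/error queues are likely idle on the happy path, so this is an upper bound.
queue_publishes_per_sec = inbound_per_sec * QUEUES_PER_MESSAGE

print(f"inbound rate:        {inbound_per_sec:.1f} msg/s")
print(f"queue publishes:     {queue_publishes_per_sec:.0f} per second (upper bound)")
```

<div>So the inbound rate is just under 40 msg/s, but under that assumption the cluster could be handling several thousand publishes per second across all queues.</div>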
<br></div><div>CPU and disk both looked fine, with no latency observed; we're on a SAN capable of 20K IOPS peak and we're nowhere near that.</div><div><br></div><div>Publishing is done in the MassTransit implementation using the .NET C# driver: it publishes, then spawns another thread to wait for the ACK. So the main processing keeps going, and an exceptional return is handled elsewhere, asynchronously.</div>
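<div>For context on why the publish/confirm pattern matters, here is a minimal Python sketch of the throughput ceiling it implies. It is not MassTransit or driver code; the function name and the latency values are hypothetical. The idea is simply that if at most N publishes can be awaiting an ACK at once, and each ACK takes a given round-trip time, the publish rate cannot exceed N divided by that round-trip time.</div>

```python
def max_publish_rate(round_trip_ms, in_flight=1):
    """Upper bound on publishes per second when at most `in_flight`
    messages may be waiting for their ACK at any one time."""
    return in_flight * 1000.0 / round_trip_ms


# Hypothetical confirm latencies in milliseconds; real values would have to be
# measured against the mirrored, durable queues in the actual cluster.
for rtt_ms in (1, 5, 20):
    one_at_a_time = max_publish_rate(rtt_ms)               # publish, wait, repeat
    streaming = max_publish_rate(rtt_ms, in_flight=100)    # many ACKs outstanding
    print(f"rtt={rtt_ms} ms  one-at-a-time={one_at_a_time:.0f}/s  "
          f"100 in flight={streaming:.0f}/s")
```

<div>With one publish in flight per channel the ceiling falls quickly as confirm latency grows, whereas waiting for the ACK on a separate thread, as described above, should avoid that per-publish stall.</div>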
<div><br></div><div>I did confirm that the entire Erlang process was stopped and restarted on the Rabbit nodes; we will recreate the issue and try just stopping the app and restarting. The VMs were neither restarted nor migrated between test runs.</div>
<div><br></div><div>Cheers,</div><div><br></div><div>Ron</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Mar 19, 2014 at 10:24 AM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 19/03/14 16:27, Ron Cordell wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Any suggestions on places to look to see what the underlying issue might be?<br>
</blockquote>
<br></div>
The good news is that performance bottlenecks will get easier to diagnose in 3.3.0. The bad news is it's not out yet.<br>
<br>
140kmsg/h is only 40msg/s - so you should not have any difficulty hitting that even on modest hardware. I assume the messages were persistent as well as mirrored, but unless the messages were both very large and persistent you should not have a problem there.<br>
<br>
Was anything on the broker looking busy (CPU, disk?) Did any of the connections show a status of "flow"?<br>
<br>
If the answer to those questions is "no" then could you be publishing (effectively) synchronously? Do you use mandatory publishing, publish inside transactions, or use confirms in a non-streaming way (i.e. publish, wait for confirm, repeat)?<br>
<br>
Cheers, Simon<span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, Pivotal<br>
</font></span></blockquote></div><br></div>