Hi Simon,<div><br></div><div>Thanks for that - we&#39;ll upgrade this cluster to 2.8.6 as you suggest and let you know how we get on.</div><div><br></div><div>Ian.<br><br><div class="gmail_quote">On Wed, Sep 19, 2012 at 1:08 PM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ian.<br>
<br>
We&#39;ve fixed quite a lot of bugs in mirrored queues since 2.7.1. So I would have to suggest an upgrade to 2.8.6 first of all.<br>
<br>
Cheers, Simon<br>
<br>
On 19/09/12 12:45, Ian wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi all,<br>
<br>
I wonder if anyone can help diagnose problems we&#39;ve been having with our<br>
2-node RabbitMQ cluster, which sporadically seizes up completely. None of<br>
the applications can get through to RabbitMQ even though it is still up and<br>
running. CPU and RAM spike to 100%. The Management UI cannot be<br>
reached and we end up having to restart the nodes to restore service.<br>
Sometimes it does not come back gracefully and requires a reboot.<br>
<br>
Some stats:<br>
<br>
    * Both nodes are 4-core, 8 GB RAM CentOS 6.2 virtual machines running<br>
      on a VMware ESXi 4.1 host. We are running RabbitMQ version 2.7.1 on<br>
      Erlang R14B04.<br>
    * Looking at our metrics right now I see around:<br>
          o 1000 queues<br>
          o 4000 channels<br>
          o 8000 bindings<br>
          o 16 exchanges<br>
    * Memory usage, erlang processes, file descriptors, socket<br>
      descriptors are generally low and healthy<br>
<br>
Analysing errors in the rabbit logs from a recent failure reveals:<br>
<br>
    * Before the failure we have a bunch of background errors, which may<br>
      be the fault of our applications, such as &quot;no binding X between<br>
      exchange Y in vhost &#39;/&#39; and queue Z in vhost &#39;/&#39;&quot;<br>
    * As we ramp up to the failure we see<br>
          o Two errors like this:<br>
                + “** Generic server &lt;0.16813.1677&gt; terminating ** Last<br>
                  message in was {&#39;$gen_cast&#39;,<br>
                  {run_backing_queue,rabbit_mirror_queue_master,<br>
                  #Fun&lt;rabbit_mirror_queue_master.4.85178772&gt;}} ** When<br>
                  Server state == {lim,0,undefined,false,[],0} ** Reason<br>
                  for termination == ** {function_clause,<br>
                  [{rabbit_limiter,handle_cast,<br>
                  [{run_backing_queue,rabbit_mirror_queue_master,<br>
                  #Fun&lt;rabbit_mirror_queue_master.4.85178772&gt;},<br>
                  {lim,0,undefined,false,[],0}]},<br>
                  {gen_server2,handle_msg,2},<br>
                  {proc_lib,init_p_do_apply,3}]}”<br>
          o A handful like this:<br>
                + “connection &lt;0.14270.7735&gt;, channel 38 - error:<br>
                  {amqp_error,command_invalid,&quot;second &#39;channel.open&#39;<br>
                  seen&quot;,&#39;channel.open&#39;}”<br>
          o A couple of these:<br>
                + “connection &lt;0.158.6322&gt;, channel 135 - error:<br>
                  {amqp_error,not_found, &quot;no queue<br>
                  &#39;InRunning.WebClient.SessionId[l0mpn3egx5n0yj0lbs1hcehj]&#39;<br>
                  in vhost &#39;/&#39;&quot;, &#39;basic.get&#39;}”<br>
          o And then all these:<br>
                + exception on TCP connection &lt;0.14270.7735&gt; from<br>
                  WWW.XXX.YYY.ZZZ:59106 {inet_error,enotconn}<br>
                + exception on TCP connection &lt;0.14577.1677&gt; from<br>
                  WWW.XXX.YYY.ZZZ:53163 {inet_error,enotconn}<br>
                + exception on TCP connection &lt;0.1520.5487&gt; from<br>
                  WWW.XXX.YYY.ZZZ:53435 {timeout,running}<br>
                + exception on TCP connection &lt;0.158.6322&gt; from<br>
                  WWW.XXX.YYY.ZZZ:63187<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.17097.1918&gt; from<br>
                  WWW.XXX.YYY.ZZZ:55161<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.18340.7733&gt; from<br>
                  WWW.XXX.YYY.ZZZ:52868 {inet_error,enotconn}<br>
                + exception on TCP connection &lt;0.24514.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:64362 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.24518.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:61252 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.24524.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:55845 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.24528.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:53434 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.24532.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:54398 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.24536.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:58878 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.24552.6782&gt; from<br>
                  WWW.XXX.YYY.ZZZ:63155 {timeout,blocking}<br>
                + exception on TCP connection &lt;0.2577.2793&gt; from<br>
                  WWW.XXX.YYY.ZZZ:52752<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.26105.2580&gt; from<br>
                  WWW.XXX.YYY.ZZZ:50364<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.27505.6740&gt; from<br>
                  WWW.XXX.YYY.ZZZ:56170<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.27741.2921&gt; from<br>
                  WWW.XXX.YYY.ZZZ:54600<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.28602.6323&gt; from<br>
                  WWW.XXX.YYY.ZZZ:56863<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.30059.3135&gt; from<br>
                  WWW.XXX.YYY.ZZZ:57078 {writer,send_failed,{error,closed}}<br>
                + exception on TCP connection &lt;0.5634.2393&gt; from<br>
                  WWW.XXX.YYY.ZZZ:53807<br>
                  {writer,send_failed,{error,enotconn}}<br>
                + exception on TCP connection &lt;0.6691.6783&gt; from<br>
                  WWW.XXX.YYY.ZZZ:64363 {timeout,blocking}<br>
<br>
Can anyone help?<br>
<br>
Thanks,<br>
<br>
Ian<br>
<br>
<br>
<br>
_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><span class="HOEnZb"><font color="#888888"><br>

</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, VMware<br>
</font></span></blockquote></div><br></div>