> Pause all publishing before (re)starting any cluster nodes.<br>Just want to report back that the workaround did the trick :-) Of course the situation is not ideal, but we have a working cluster again.<br><br>
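In case it helps others doing the same, the sequencing can be sketched as a simple gate in front of all publishers (a minimal Python sketch; the names are illustrative, and the actual publish/restart calls would be whatever your client and deployment tooling use, e.g. pika's basic_publish):

```python
import threading

# Gate all publishers behind a single Event so publishing can be
# paused before a cluster node is (re)started and resumed afterwards.
publish_gate = threading.Event()
publish_gate.set()  # publishing allowed by default


def publish(message, sent_log):
    publish_gate.wait()       # blocks while publishing is paused
    sent_log.append(message)  # stand-in for channel.basic_publish(...)


def restart_node(restart_fn):
    publish_gate.clear()      # pause all publishing first
    try:
        restart_fn()          # (re)start the cluster node here
    finally:
        publish_gate.set()    # resume publishing once the node is back
```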
Thx Matthias! <br><br>Cheers<br>Matthias<br><br><div class="gmail_quote">On Mon, Aug 27, 2012 at 10:44 PM, Matthias Reik <span dir="ltr"><<a href="mailto:matthias.reik@gmail.com" target="_blank">matthias.reik@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">See comments inline<br><br>Thanks<br>Matthias<br><br><div class="gmail_quote"><div class="im">On Mon, Aug 27, 2012 at 5:22 PM, Matthias Radestock <span dir="ltr"><<a href="mailto:matthias@rabbitmq.com" target="_blank">matthias@rabbitmq.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Matthias,<div><br>
<br>
On 27/08/12 16:02, Matthias Reik wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
even though the setup looks slightly different (since we are not<br>
using the shovel plugin), the reason could be the same. We are<br>
explicitly ACKing the messages (i.e. no auto-ack), even though the<br>
consumers are in the same data center (so we should have a reliable<br>
network). But if the acks are lost and that causes memory growth on<br>
the server, then it could be the same bug.<br>
</blockquote>
<br></div>
As noted in my analysis, the bug has nothing to do with the shovel or with consuming/acking: simply publishing to HA queues when (re)starting slaves is sufficient to trigger it. </blockquote></div><div>Wasn't sure I understood it 100% correctly (sorry, not too experienced with RabbitMQ yet). Thx for the confirmation.<br>
</div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Is there anything I could do to validate this assumption?<br>
</blockquote>
<br></div>
I don't think it's worth the hassle. I am quite certain that you are suffering from the same bug.</blockquote></div><div> OK, if you expect a fix for the issue to appear soon, then I could hold off on "fixing" the cluster and try out any updated version. If it will take more time, then I will probably go for your suggested fix/workaround below.<br>
<br></div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Is there anything I can do in the meantime to get into a state where I<br>
have a working cluster again<br>
</blockquote>
<br></div>
Pause all publishing before (re)starting any cluster nodes.<br></blockquote></div><div>Yes, that makes sense.<br><br>Thank you for your quick response. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Regards,<br>
<br>
Matthias.<br>
</blockquote></div><br>
</blockquote></div><br>