<div dir="ltr"><div>Hello, Emile</div><div><br></div><div>Thank you very much for you help. I will try to provide anything what can help to solve this issue. </div><div><br></div><div class="gmail_extra">
<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><br>
> The difference between 1Mb and 180Mb is relatively large, even after
> taking expected differences due to garbage collection into account. We
> can't rule out a memory leak, but need some assistance from you to confirm.
>
> Do you see the same asymmetry if the master node for the queues switches
> from one node to the other? So if you shut down the cdaemon2 node, let
> cdaemon4 become the master for all the queues, and turn cdaemon2 back on
> (it will now be a slave node), does the memory on cdaemon2 now grow?

Yes, after the current master stops and restarts it becomes a slave and its memory starts to grow. Meanwhile the memory of the newly elected master is not freed: it stops growing, but it does not fall back to normal. I have attached memory graphs of our nodes to this letter.

> Have you been able to add a third node to the cluster for testing
> purposes to see if memory grows on more than one slave node?

We have not tried this yet, but if it can help we can allocate one more node. Is it ok to create the test node on the same physical host as one of the existing nodes?
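
If a shared host is acceptable, I would start the test node alongside an existing one in the usual multiple-nodes-per-host way, e.g. (the node name and port below are just examples):

    RABBITMQ_NODENAME=rabbit_test RABBITMQ_NODE_PORT=5674 rabbitmq-server -detached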

> How long does it take for the memory use to reach the VM memory high
> watermark?

The critical point for our cluster comes much earlier than the VM memory high watermark. As memory grows, the slave node also uses the CPU more and more actively. In our case, when memory consumption reaches ~1Gb the broker stops responding.

After a slave restart, memory grows linearly for some time. After that the growth changes its pattern: at some moments memory increases by a constant step (~20Mb). I have marked these steps on the attached graphs.

> Can you describe your messaging pattern in a bit more detail for us to
> reproduce the problem - how often are new channels created when publishing?

Queues and exchanges:

1 queue and 1 exchange (durable, ha-mode: all, no auto-delete).
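
For reproduction purposes, the topology can be declared roughly as in this sketch (Python/pika is used only for illustration; 'rabbit-host' and 'events' are placeholder names, and the mirroring itself comes from the ha-mode policy, not from the declarations):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters(host='rabbit-host'))
    ch = conn.channel()
    # 'exchange_type' is the keyword in recent pika versions
    ch.exchange_declare(exchange='events', exchange_type='direct', durable=True)
    ch.queue_declare(queue='events', durable=True)
    ch.queue_bind(queue='events', exchange='events', routing_key='events')
    conn.close()

The ha-mode: all part corresponds to a policy along the lines of:

    rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'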

Publishers:

We have about 100 web servers which publish messages using the following steps:
1) Open connection
2) Create channel
3) Publish message to the direct exchange
4) Close connection
Publishing rate is between 50 and 600 messages per second at peak hours. I have attached a graph. A minimal sketch of this pattern follows.
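
As a sketch of the publishing pattern (pika is used here only for illustration; host and exchange names are placeholders):

    import pika

    # Steps 1-4 above: one short-lived connection and channel per message.
    def publish(body):
        conn = pika.BlockingConnection(pika.ConnectionParameters(host='rabbit-host'))
        ch = conn.channel()                  # a new channel for every publish
        ch.basic_publish(exchange='events',  # our single direct exchange
                         routing_key='events',
                         body=body)
        conn.close()                         # closing the connection closes the channel

So every published message costs a fresh connection and a fresh channel.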

Consumers:

We have 3 servers consuming messages in 30 threads each. Each consuming thread:
1) Thread starts
2) Open broker connection
3) Create channel
4) Get a message from the queue by basic-get with prefetch=1
5) Process the message and acknowledge it
6) If there are no new messages, sleep for some time
7) Repeat steps 4-6 until the limit count is reached
8) Close connection
9) Thread stops
A sketch of this loop follows as well.
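
One thread's loop, in the same illustrative style (process() stands for our application-specific handling):

    import time
    import pika

    # Steps 1-9 above: one connection/channel per thread, polled with basic.get.
    def consume(limit):
        conn = pika.BlockingConnection(pika.ConnectionParameters(host='rabbit-host'))
        ch = conn.channel()
        ch.basic_qos(prefetch_count=1)          # prefetch=1 as described
        processed = 0
        while processed < limit:
            method, properties, body = ch.basic_get(queue='events')
            if method is None:                  # queue is empty
                time.sleep(1)                   # "sleep some time"
                continue
            process(body)                       # application-specific work
            ch.basic_ack(delivery_tag=method.delivery_tag)
            processed += 1
        conn.close()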

> In order to investigate further it might be helpful to execute some
> diagnostic commands on the broker. Are you able to replicate the problem
> in a staging or QA environment where it is safe to do this?

I will execute diagnostic commands on the broker. If something goes wrong, our messaging falls back to a version without RabbitMQ involved ).
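
For example, once it reproduces I can safely run standard rabbitmqctl diagnostics such as:

    rabbitmqctl status
    rabbitmqctl list_connections
    rabbitmqctl list_channels
    rabbitmqctl eval 'erlang:memory().'

Just let me know which output you need.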

> -Emile

Kind regards,
Dmitry Saprykin