Hi Simon,<div><br></div><div>liubida is my colleague, so we are facing the same problem. We only found out about each other&#39;s mails the next morning :)</div><div><br></div><div>Thank you very much for your kind answer.</div><div>
<br></div><div>Samuel</div><div><br><br><div class="gmail_quote">On Fri, Sep 21, 2012 at 1:05 AM, Simon MacMullen <span dir="ltr">&lt;<a href="mailto:simon@rabbitmq.com" target="_blank">simon@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi.<br>
<br>
I think you must be from the same organisation as &quot;liubida&quot;; the log file is identical, so see my answer there:<br>
<br>
<a href="http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-September/022560.html" target="_blank">http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-September/022560.html</a><br>
<br>
Cheers, Simon<div><div class="h5"><br>
<br>
On 19/09/12 17:10, Samuel Chen wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
Hi all<br>
<br>
I&#39;m new to RabbitMQ. We are facing a strange issue with<br>
rabbit_mgmt_db. I could not find any similar issues by searching<br>
Google or Stack Overflow, so I wonder if anyone could help diagnose<br>
this problem. Thanks in advance.<br>
<br>
We have a 2-node RabbitMQ cluster integrated with Celery. It worked<br>
well for several months.<br>
The issue first occurred on Jul 4th. After a restart, it worked<br>
for about 2 months. Yesterday the issue occurred twice (once after a<br>
restart).<br>
The symptom was that a child (rabbit_mgmt_db?) was killed automatically.<br>
After several failed automatic restarts, it reached the max<br>
restart intensity. Eventually it was shut down.<br>
(Another point is that we moved from a 1-node server to a 2-node<br>
cluster at the end of June. Not sure if that caused this issue.)<br>
<br>
The hosts are virtual servers with 8/12G RAM and 30G disk.<br>
One node is a disc node and the other is a RAM node.<br>
The load was very low (around 100M RAM, few tasks). The disk has<br>
2.5G of free space.<br>
The log is below.<br>
<br>
Thanks for any help.<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:13 ===<br>
<br>
              Supervisor: {&lt;0.28802.1777&gt;,mirrored_supervisor}<br>
<br>
              Context:    child_terminated<br>
<br>
              Reason:     killed<br>
<br>
              Offender:   [{pid,&lt;0.28804.1777&gt;},<br>
<br>
                           {name,rabbit_mgmt_db},<br>
<br>
                           {mfa,{rabbit_mgmt_db,start_link,[]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===<br>
<br>
              Supervisor: {&lt;0.28802.1777&gt;,mirrored_supervisor}<br>
<br>
              Context:    start_error<br>
<br>
              Reason:     {already_started,&lt;19704.13906.2335&gt;}<br>
<br>
              Offender:   [{pid,&lt;0.28804.1777&gt;},<br>
<br>
                           {name,rabbit_mgmt_db},<br>
<br>
                           {mfa,{rabbit_mgmt_db,start_link,[]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===<br>
<br>
              Supervisor: {&lt;0.28802.1777&gt;,mirrored_supervisor}<br>
<br>
              Context:    start_error<br>
<br>
              Reason:     {already_started,&lt;19704.13906.2335&gt;}<br>
<br>
              Offender:   [{pid,&lt;0.28804.1777&gt;},<br>
<br>
                           {name,rabbit_mgmt_db},<br>
<br>
                           {mfa,{rabbit_mgmt_db,start_link,[]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===<br>
<br>
              Supervisor: {&lt;0.28802.1777&gt;,mirrored_supervisor}<br>
<br>
              Context:    start_error<br>
<br>
              Reason:     {already_started,&lt;19704.13906.2335&gt;}<br>
<br>
              Offender:   [{pid,&lt;0.28804.1777&gt;},<br>
<br>
                           {name,rabbit_mgmt_db},<br>
<br>
                           {mfa,{rabbit_mgmt_db,start_link,[]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===<br>
<br>
              Supervisor: {&lt;0.28802.1777&gt;,mirrored_supervisor}<br>
<br>
              Context:    start_error<br>
<br>
              Reason:     {already_started,&lt;19704.13906.2335&gt;}<br>
<br>
              Offender:   [{pid,&lt;0.28804.1777&gt;},<br>
<br>
                           {name,rabbit_mgmt_db},<br>
<br>
                           {mfa,{rabbit_mgmt_db,start_link,[]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===<br>
<br>
              Supervisor: {&lt;0.28802.1777&gt;,mirrored_supervisor}<br>
<br>
              Context:    start_error<br>
<br>
              Reason:     {already_started,&lt;19704.13906.2335&gt;}<br>
<br>
              Offender:   [{pid,&lt;0.28804.1777&gt;},<br>
<br>
                           {name,rabbit_mgmt_db},<br>
<br>
                           {mfa,{rabbit_mgmt_db,start_link,[]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
        =SUPERVISOR REPORT==== 18-Sep-2012::14:53:14 ===<br>
<br>
              Supervisor: {local,rabbit_mgmt_sup}<br>
<br>
              Context:    shutdown<br>
<br>
              Reason:     reached_max_restart_intensity<br>
<br>
              Offender:   [{pid,&lt;0.28803.1777&gt;},<br>
<br>
                           {name,mirroring},<br>
<br>
                           {mfa,<br>
<br>
                               {mirrored_supervisor,start_internal,<br>
<br>
                                   [rabbit_mgmt_sup,<br>
<br>
                                    [{rabbit_mgmt_db,<br>
<br>
                                         {rabbit_mgmt_db,start_link,[]},<br>
<br>
                                         permanent,4294967295,worker,<br>
<br>
                                         [rabbit_mgmt_db]}]]}},<br>
<br>
                           {restart_type,permanent},<br>
<br>
                           {shutdown,4294967295},<br>
<br>
                           {child_type,worker}]<br>
<br>
<br>
<br>
- Sam<br>
<br>
<br>
<br></div></div>
_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com" target="_blank">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><span class="HOEnZb"><font color="#888888"><br>

</font></span></blockquote><span class="HOEnZb"><font color="#888888">
<br>
<br>
-- <br>
Simon MacMullen<br>
RabbitMQ, VMware<br>
</font></span></blockquote></div><br></div>