[rabbitmq-discuss] RabbitMQ couldn't recover after the free disk space went below the default threshold limit

Simon MacMullen simon at rabbitmq.com
Wed Jan 22 15:26:11 GMT 2014


It looks like roughly three hours after you went under the disk space
limit (alarm set at 11:29:52, first enospc error logged at 14:38:54), you
actually ran out of space. RabbitMQ will in general not handle
out-of-space very well (which is why the limit is there).

While the disk space alarm will try to stop RabbitMQ from using any more
disk by blocking publishing, there is nothing to stop other applications
continuing to eat disk space until you run out. So the alarm cannot
guarantee that you will not run out of disk space.

Furthermore, before version 3.2.0 RabbitMQ could decide to page
transient messages out to disk even after the disk alarm went off. You
are on 3.1.3, so that applies to you. In 3.2.0 and later RabbitMQ will
ignore memory pressure when the disk alarm is active.

You probably want to raise the default disk limit (1000000000 bytes in
your log) on production boxes, or monitor disk space externally to
RabbitMQ.
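
As a rough illustration of the external monitoring option, here is a
minimal sketch in Python. The 5 GB warning threshold, the function names
and the default path are assumptions chosen for the example (the path is
taken from the log excerpts quoted below); none of this is part of
RabbitMQ itself.

```python
import shutil
import sys

# Hypothetical warning threshold, set well above the 1000000000-byte
# limit reported in the log so there is time to react.
WARN_BYTES = 5 * 1000**3


def disk_status(path, warn_bytes=WARN_BYTES):
    """Return (free_bytes, ok) for the filesystem that holds `path`."""
    free = shutil.disk_usage(path).free
    return free, free >= warn_bytes


def format_report(path, free, ok):
    """Build a one-line, human-readable status message."""
    state = "OK" if ok else "LOW"
    return f"{state}: {free} bytes free on {path}"


if __name__ == "__main__":
    # Assumed default: the data directory seen in the quoted logs.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/rabbitmq"
    free, ok = disk_status(path)
    print(format_report(path, free, ok))
    # A non-zero exit status lets cron or a monitoring agent raise an alert.
    sys.exit(0 if ok else 1)
```

Run from cron or a monitoring agent, a check like this warns you while
there is still headroom, regardless of which application is eating the
disk.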

Cheers, Simon

On 22/01/14 15:13, Jain, Punit wrote:
> Hi All,
>
> We are using RabbitMQ 3.1.3. It was running fine, but recently it
> couldn’t recover after the free disk space went below the threshold
> limit. We didn’t modify the default threshold. I have appended a few
> excerpts from the logs that I thought were relevant; let me know if
> you need any other information or logs.
>
> Thanks in Advance!
>
> Punit
>
> =INFO REPORT==== 18-Jan-2014::11:29:38 ===
>
> accepting AMQP connection <0.6291.2> (127.0.0.1:8198 -> 127.0.0.1:5672)
>
> =INFO REPORT==== 18-Jan-2014::11:29:52 ===
>
> Disk free space insufficient. Free bytes:994299904 Limit:1000000000
>
> =WARNING REPORT==== 18-Jan-2014::11:29:52 ===
>
> disk resource limit alarm set on node rabbit@localhost.
>
> **********************************************************
>
> *** Publishers will be blocked until this alarm clears ***
>
> **********************************************************
>
> =WARNING REPORT==== 18-Jan-2014::11:30:10 ===
>
> closing AMQP connection <0.6280.2> (127.0.0.1:8181 -> 127.0.0.1:5672):
>
> connection_closed_abruptly
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> ** Generic server mnesia_sync terminating
>
> ** Last message in was timeout
>
> ** When Server state == {state,[{<0.1100.2>,#Ref<0.0.16.193586>}],true}
>
> ** Reason for termination ==
>
> **
> {{badmatch,{error,{file_error,"/var/lib/rabbitmq/mnesia/rabbit@localhost/LATEST.LOG",
>
>                                   enospc}}},
>
>      [{mnesia_sync,handle_info,2,[]},
>
>       {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
>
>       {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> webmachine error:
> path="/api/parameters/federation-upstream/%2f/LocalDir_Upstream"
>
> {error,
>
>      {exit,
>
>          {{{badmatch,
>
>                {error,
>
>                    {file_error,
>
>
> "/var/lib/rabbitmq/mnesia/rabbit@localhost/LATEST.LOG",
>
>                        enospc}}},
>
>            [{mnesia_sync,handle_info,2,[]},
>
>             {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,607}]},
>
>
> {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]},
>
>           {gen_server,call,[mnesia_sync,sync,infinity]}},
>
>          [{gen_server,call,3,[{file,"gen_server.erl"},{line,188}]},
>
>           {rabbit_misc,execute_mnesia_transaction,1,[]},
>
>           {rabbit_runtime_parameters,set_any0,4,[]},
>
>           {rabbit_runtime_parameters,set_any,4,[]},
>
>           {rabbit_mgmt_wm_parameter,'-accept_content/2-fun-0-',5,[]},
>
>           {rabbit_mgmt_util,with_decode,5,[]},
>
>           {webmachine_resource,resource_call,3,[]},
>
>           {webmachine_resource,do,3,[]}]}}
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Connection (<0.7261.2>) closing: internal error in channel (<0.7268.2>):
> shutdown
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Connection (<0.7318.2>) closing: internal error in channel (<0.7344.2>):
> shutdown
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Connection (<0.7310.2>) closing: internal error in channel (<0.7335.2>):
> shutdown
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Connection (<0.7328.2>) closing: internal error in channel (<0.7353.2>):
> shutdown
>
> =INFO REPORT==== 18-Jan-2014::14:38:54 ===
>
> stopped TCP Listener on [::]:5672
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Connection (<0.7305.2>) closing: internal error in channel (<0.7323.2>):
> shutdown
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> AMQP connection <0.20548.1> (blocking), channel 1 - error:
>
> shutdown
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Non-AMQP exit reason 'shutdown'
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> AMQP connection <0.11369.2> (blocked), channel 3 - error:
>
> shutdown
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Non-AMQP exit reason 'shutdown'
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> AMQP connection <0.6437.2> (blocked), channel 3 - error:
>
> shutdown
>
> =WARNING REPORT==== 18-Jan-2014::14:38:54 ===
>
> Non-AMQP exit reason 'shutdown'
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> closing AMQP connection <0.6437.2> (127.0.0.1:8950 -> 127.0.0.1:5672):
>
> {inet_error,enotconn}
>
> =ERROR REPORT==== 18-Jan-2014::14:38:54 ===
>
> ** Generic server msg_store_persistent terminating
>
> ** Last message in was {'EXIT',<0.166.0>,shutdown}
>
> ** When Server state == {msstate,
>
>
> "/var/lib/rabbitmq/mnesia/rabbit@localhost/msg_store_persistent",
>
>                              rabbit_msg_store_ets_index,
>
>                              {state,241739,
>
>
> "/var/lib/rabbitmq/mnesia/rabbit@localhost/msg_store_persistent"},
>
>                              0,#Ref<0.0.0.1083>,
>
>                              {dict,0,16,16,8,80,48,
>
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                   []},
>
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                    [],[]}}},
>
>                              undefined,657,7536,[],<0.229.0>,245836,237642,
>
>                              249933,254030,
>
>                              {set,0,16,16,8,80,48,
>
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                   []},
>
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                    [],[]}}},
>
>                              {dict,4,16,16,8,80,48,
>
>
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                   []},
>
>                                  {{[],[],[],[],[],[],[],
>
>
> [[<<47,239,116,176,238,253,171,103,27,228,
>
>                                        101,97,202,108,131,196>>|
>
>                                      {<0.233.0>,
>
>
> #Fun<rabbit_variable_queue.1.98519756>,
>
>
> #Fun<rabbit_variable_queue.13.110761503>}]],
>
>                                    [],
>
>
> [[<<245,14,237,152,23,39,217,67,22,170,155,
>
>                                        57,40,30,29,108>>|
>
>                                      {<0.234.0>,
>
>
> #Fun<rabbit_variable_queue.1.98519756>,
>
>
> #Fun<rabbit_variable_queue.13.110761503>}]],
>
>
> [[<<3,56,140,195,127,215,175,32,168,21,13,
>
>                                        192,178,56,76,211>>|
>
>                                      {<0.4047.0>,
>
>
> #Fun<rabbit_variable_queue.1.98519756>,
>
>
> #Fun<rabbit_variable_queue.13.110761503>}]],
>
>                                   [[<<71,36,220,167,246,169,123,248,249,183,
>
>                                        209,118,140,198,38,8>>|
>
>                                      {<0.4010.0>,
>
>
> #Fun<rabbit_variable_queue.1.98519756>,
>
>
> #Fun<rabbit_variable_queue.13.110761503>}]],
>
>                                    [],[],[],[]}}},
>
>                              false,16777216,
>
>                              {dict,0,16,16,8,80,48,
>
>
>             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                   []},
>
>
> {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
>
>                                    [],[]}}}}
>
> ** Reason for termination ==
>
> **
> {{badmatch,{error,{file_error,"/var/lib/rabbitmq/mnesia/rabbit@localhost/msg_store_persistent/file_summary.ets",
>
>                                   enospc}}},
>
>      [{rabbit_msg_store,store_file_summary,2,[]},
>
>       {rabbit_msg_store,terminate,2,[]},
>
>       {gen_server2,terminate,3,[]},
>
>       {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,237}]}]}
>
> ** In 'terminate' callback with reason ==
>
> ** shutdown
>
> =ERROR REPORT==== 18-Jan-2014::14:38:58 ===
>
> ** Generic server rabbit_mgmt_external_stats terminating
>
> ** Last message in was emit_update
>
> ** When Server state == {state,1024}
>
> ** Reason for termination ==
>
> ** {noproc,{gen_server,call,[rabbit_node_monitor,partitions,infinity]}}
>
> =ERROR REPORT==== 18-Jan-2014::14:38:59 ===
>
> webmachine error:
> path="/api/parameters/federation-upstream/%2f/LocalDir_Upstream"
>
> {error,{error,badarg,
>
>                [{ets,lookup,
>
>                      [rabbit_registry,
>
>                       {runtime_parameter,'federation-upstream'}],
>
>                      []},
>
>                 {rabbit_registry,lookup_module,2,[]},
>
>                 {rabbit_runtime_parameters,lookup_component,1,[]},
>
>                 {rabbit_runtime_parameters,set_any0,4,[]},
>
>                 {rabbit_runtime_parameters,set_any,4,[]},
>
>                 {rabbit_mgmt_wm_parameter,'-accept_content/2-fun-0-',5,[]},
>
>                 {rabbit_mgmt_util,with_decode,5,[]},
>
>                 {webmachine_resource,resource_call,3,[]}]}}
>
> =INFO REPORT==== 18-Jan-2014::15:20:52 ===
>
> Error description:
>
>
> {error,{could_not_write_file,"/var/lib/rabbitmq/mnesia/rabbit@localhost/cluster_nodes.config",
>
>                                  enospc}}
>
> Log files (may contain more information):
>
>     /var/log/rabbitmq/rabbit@localhost.log
>
>     /var/log/rabbitmq/rabbit@localhost-sasl.log
>
> Stack trace:
>
>     [{rabbit_node_monitor,write_cluster_status,1,[]},
>
>      {rabbit_node_monitor,prepare_cluster_status_files,0,[]},
>
>      {rabbit,'-boot/0-fun-1-',0,[]},
>
>      {rabbit,start_it,1,[]},
>
>      {init,start_it,1,[]},
>
>      {init,start_em,1,[]}]
>
> =INFO REPORT==== 18-Jan-2014::15:32:08 ===
>
> Error description:
>
>     {error,corrupt_cluster_status_files,[]}
>
> Log files (may contain more information):
>
>     /var/log/rabbitmq/rabbit@localhost.log
>
>     /var/log/rabbitmq/rabbit@localhost-sasl.log
>
> Stack trace:
>
>     [{rabbit_node_monitor,'-prepare_cluster_status_files/0-fun-0-',1,[]},
>
>      {rabbit_node_monitor,prepare_cluster_status_files,0,[]},
>
>      {rabbit,'-boot/0-fun-1-',0,[]},
>
>      {rabbit,start_it,1,[]},
>
>      {init,start_it,1,[]},
>
>      {init,start_em,1,[]}]
>
>
>


-- 
Simon MacMullen
RabbitMQ, Pivotal

