[rabbitmq-discuss] Confusing disk free space limit warning

Mon Sep 17 07:43:17 BST 2012

Thanks for the reply Matthias. The log change sounds like a good idea.

As for our situation here, the two log messages that I posted were the 
only messages that existed in the log file. That log file covered the 
24-hour period previous to when I noticed the issue and at least 24 
hours before that, most likely back to when rabbitmq was started. We 
didn't see a log saying that the alarm was 'set' previous to the clear 
message. We also didn't see that alarm set afterwards, though we did 
observe that for the next ~24 hours after the 'clear' log, 
rabbitmq/celery were acting very much like the way others have described 
in discussions such as:

https://groups.google.com/forum/#!msg/celery-users/G-tq8SsHkuA/01Ndr2VBZhwJ

The most important behaviour we observed was that our celery worker was 
not receiving rabbit messages, although they seemed to be being sent by 
the producers. So that make me think it's quite likely that rabbitmq did 
think that it was above the disk space threshold and was rate limiting 
producers.

Is it possible we need to change our logging settings in order to see 
these logs?

Some more information about the issue that we saw:

As commented above, this was the only contents of the log when we 
noticed the issue:

=INFO REPORT==== 16-Sep-2012::06:49:17 ===
Disk free space limit now below limit. Free bytes:2000396288 
Limit:1000000000

=INFO REPORT==== 16-Sep-2012::06:49:17 ===
     alarm_handler: {clear,{resource_limit,disk,rabbit at mq1}}

Then we restarted our single celery worker process and we saw these logs:

=INFO REPORT==== 17-Sep-2012::01:16:58 ===
closing AMQP connection <0.1053.112> (10.255.115.80:59994 -> 
10.255.115.80:5672)

=INFO REPORT==== 17-Sep-2012::01:16:58 ===
closing AMQP connection <0.1045.112> (10.255.115.80:59993 -> 
10.255.115.80:5672)

=INFO REPORT==== 17-Sep-2012::01:16:59 ===
accepting AMQP connection <0.11515.221> (10.255.115.80:58122 -> 
10.255.115.80:5672)

=ERROR REPORT==== 17-Sep-2012::01:17:02 ===
closing AMQP connection <0.11515.221> (10.255.115.80:58122 -> 
10.255.115.80:5672):
{channel0_error,opening,
                 {error,{badarg,{error,bad_module}},
                        'connection.open',
                        [{rabbit_reader,control_throttle,1},
                         {rabbit_reader,handle_method0,2},
                         {rabbit_reader,handle_method0,3},
                         {rabbit_reader,handle_input,3},
                         {rabbit_reader,recvloop,2},
                         {rabbit_reader,start_connection,7},
                         {proc_lib,init_p_do_apply,3}]}}

and the two "accept" / "closing (badarg, bad_module)" logs kept 
repeating as our celery client tried to reconnect. However it was never 
able to reconnect. This message had us a bit baffled.

Then when we restarted rabbitmq, we saw the normal startup logs for 
rabbit, including:

...

=INFO REPORT==== 17-Sep-2012::01:25:27 ===
Memory limit set to 677MB of 1692MB total.

=INFO REPORT==== 17-Sep-2012::01:25:27 ===
Disk free limit set to 953MB

...

followed by:

=INFO REPORT==== 17-Sep-2012::01:25:27 ===
accepting AMQP connection <0.287.0> (10.255.115.80:59031 -> 
10.255.115.80:5672)

=INFO REPORT==== 17-Sep-2012::01:25:27 ===
accepting AMQP connection <0.295.0> (10.255.115.80:59032 -> 
10.255.115.80:5672)

=INFO REPORT==== 17-Sep-2012::01:25:33 ===
accepting STOMP connection <0.350.0> (10.255.115.80:42428 -> 
10.255.115.80:61613)

=INFO REPORT==== 17-Sep-2012::01:29:24 ===
accepting AMQP connection <0.2291.0> (10.255.115.80:59376 -> 
10.255.115.80:5672)

=INFO REPORT==== 17-Sep-2012::05:17:54 ===
accepting AMQP connection <0.18051.3> (10.255.115.80:47455 -> 
10.255.115.80:5672)

And this seemed to be the point at which our celery worker was happy 
again and could receive messages.

However, at this point the messages that had not been delivered to the 
celery worker process were not all of a sudden delivered - they appeared 
to have vanished. This happened despite the fact that I'm confident that 
our celery queue and messages on that queue were both marked as 
persistent. I'm not sure I understand rate limiting well enough to know 
whether or not I should have expected to see these messages be sent to 
our consumer when we restarted rabbitmq.

Also BTW, maybe I have this wrong, but it seems strange that the rabbit 
documentation (http://www.rabbitmq.com/configure.html) refers to the 
default Disk free limit as 1GB but that our default install has it set 
to 1000000000 , which rabbit reports on startup as "Disk free limit set 
to 953MB". Prob should be 1073741824? (Sorry for the massive nitpick) :)

Thanks,
Mark