[rabbitmq-discuss] Confusing disk free space limit warning
Mark Hingston
mark.hingston at vardr.com
Mon Sep 17 07:43:17 BST 2012
Thanks for the reply Matthias. The log change sounds like a good idea.
As for our situation here, the two log messages that I posted were the
only messages that existed in the log file. That log file covered the
24-hour period previous to when I noticed the issue and at least 24
hours before that, most likely back to when rabbitmq was started. We
didn't see a log saying that the alarm was 'set' previous to the clear
message. We also didn't see that alarm set afterwards, though we did
observe that for the next ~24 hours after the 'clear' log,
rabbitmq/celery were acting very much like the way others have described
in discussions such as:
https://groups.google.com/forum/#!msg/celery-users/G-tq8SsHkuA/01Ndr2VBZhwJ
The most important behaviour we observed was that our celery worker was
not receiving rabbit messages, although they seemed to be being sent by
the producers. So that make me think it's quite likely that rabbitmq did
think that it was above the disk space threshold and was rate limiting
producers.
Is it possible we need to change our logging settings in order to see
these logs?
Some more information about the issue that we saw:
As commented above, this was the only contents of the log when we
noticed the issue:
=INFO REPORT==== 16-Sep-2012::06:49:17 ===
Disk free space limit now below limit. Free bytes:2000396288
Limit:1000000000
=INFO REPORT==== 16-Sep-2012::06:49:17 ===
alarm_handler: {clear,{resource_limit,disk,rabbit at mq1}}
Then we restarted our single celery worker process and we saw these logs:
=INFO REPORT==== 17-Sep-2012::01:16:58 ===
closing AMQP connection <0.1053.112> (10.255.115.80:59994 ->
10.255.115.80:5672)
=INFO REPORT==== 17-Sep-2012::01:16:58 ===
closing AMQP connection <0.1045.112> (10.255.115.80:59993 ->
10.255.115.80:5672)
=INFO REPORT==== 17-Sep-2012::01:16:59 ===
accepting AMQP connection <0.11515.221> (10.255.115.80:58122 ->
10.255.115.80:5672)
=ERROR REPORT==== 17-Sep-2012::01:17:02 ===
closing AMQP connection <0.11515.221> (10.255.115.80:58122 ->
10.255.115.80:5672):
{channel0_error,opening,
{error,{badarg,{error,bad_module}},
'connection.open',
[{rabbit_reader,control_throttle,1},
{rabbit_reader,handle_method0,2},
{rabbit_reader,handle_method0,3},
{rabbit_reader,handle_input,3},
{rabbit_reader,recvloop,2},
{rabbit_reader,start_connection,7},
{proc_lib,init_p_do_apply,3}]}}
and the two "accept" / "closing (badarg, bad_module)" logs kept
repeating as our celery client tried to reconnect. However it was never
able to reconnect. This message had us a bit baffled.
Then when we restarted rabbitmq, we saw the normal startup logs for
rabbit, including:
...
=INFO REPORT==== 17-Sep-2012::01:25:27 ===
Memory limit set to 677MB of 1692MB total.
=INFO REPORT==== 17-Sep-2012::01:25:27 ===
Disk free limit set to 953MB
...
followed by:
=INFO REPORT==== 17-Sep-2012::01:25:27 ===
accepting AMQP connection <0.287.0> (10.255.115.80:59031 ->
10.255.115.80:5672)
=INFO REPORT==== 17-Sep-2012::01:25:27 ===
accepting AMQP connection <0.295.0> (10.255.115.80:59032 ->
10.255.115.80:5672)
=INFO REPORT==== 17-Sep-2012::01:25:33 ===
accepting STOMP connection <0.350.0> (10.255.115.80:42428 ->
10.255.115.80:61613)
=INFO REPORT==== 17-Sep-2012::01:29:24 ===
accepting AMQP connection <0.2291.0> (10.255.115.80:59376 ->
10.255.115.80:5672)
=INFO REPORT==== 17-Sep-2012::05:17:54 ===
accepting AMQP connection <0.18051.3> (10.255.115.80:47455 ->
10.255.115.80:5672)
And this seemed to be the point at which our celery worker was happy
again and could receive messages.
However, at this point the messages that had not been delivered to the
celery worker process were not all of a sudden delivered - they appeared
to have vanished. This happened despite the fact that I'm confident that
our celery queue and messages on that queue were both marked as
persistent. I'm not sure I understand rate limiting well enough to know
whether or not I should have expected to see these messages be sent to
our consumer when we restarted rabbitmq.
Also BTW, maybe I have this wrong, but it seems strange that the rabbit
documentation (http://www.rabbitmq.com/configure.html) refers to the
default Disk free limit as 1GB but that our default install has it set
to 1000000000 , which rabbit reports on startup as "Disk free limit set
to 953MB". Prob should be 1073741824? (Sorry for the massive nitpick) :)
Thanks,
Mark
More information about the rabbitmq-discuss
mailing list