[rabbitmq-discuss] Confusing disk free space limit warning

Mon Sep 17 08:21:24 BST 2012

Mark,

On 17/09/12 07:43, Mark Hingston wrote:
> As for our situation here, the two log messages that I posted were
> the only messages that existed in the log file. That log file covered
> the 24-hour period previous to when I noticed the issue and at least
> 24 hours before that, most likely back to when rabbitmq was started.
> We didn't see a log saying that the alarm was 'set' previous to the
> clear message.

The alarm must have been set previously, and there must have been a
corresponding log message.

The first message a starting rabbit writes to the logs is usually
something like

=INFO REPORT==== 17-Sep-2012::07:54:14 ===
Limiting to approx 16284 file handles (14653 sockets)

If you don't see that in your logs then they don't go back far enough.
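
A quick way to check this is to scan the log for that startup line. The
following is a minimal Python sketch, assuming the log is plain text and
that the startup line contains "Limiting to approx ... file handles" as
in the example above. The function name and sample text are illustrative
only.

```python
import re

# Matches the startup line quoted above, e.g.
# "Limiting to approx 16284 file handles (14653 sockets)"
STARTUP_RE = re.compile(r"Limiting to approx \d+ file handles")


def log_covers_startup(log_text):
    """Return True if the log text contains a broker startup line."""
    return any(STARTUP_RE.search(line) for line in log_text.splitlines())


# Sample taken from the log excerpt quoted in this message.
sample = """=INFO REPORT==== 17-Sep-2012::07:54:14 ===
Limiting to approx 16284 file handles (14653 sockets)
"""
```

If `log_covers_startup` returns False for the whole file, the log has
probably been rotated or truncated since the broker started.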

>  The most important behaviour we observed was that our celery worker
> was not receiving rabbit messages, although they seemed to be being
> sent by the producers. So that makes me think it's quite likely that
> rabbitmq did think that it was above the disk space threshold and was
> rate limiting producers.

I suggest that in the future you look at the management UI. That makes 
it very clear (things go 'red') when there is an alarm condition and 
when connections are getting throttled or blocked.
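
The alarm state can also be polled from a script through the management
plugin's HTTP API. This is a rough sketch, assuming the plugin is
enabled and that `/api/nodes` reports boolean `mem_alarm` and
`disk_free_alarm` fields per node (field names may differ between
versions). The base URL, user and password are placeholders, and the
helper names are made up for this example.

```python
import base64
import json
import urllib.request


def alarmed_nodes(nodes):
    """Given the decoded /api/nodes list, map node name -> alarm fields set."""
    result = {}
    for node in nodes:
        # Assumed field names; check them against your broker version.
        alarms = [k for k in ("mem_alarm", "disk_free_alarm") if node.get(k)]
        if alarms:
            result[node.get("name", "?")] = alarms
    return result


def fetch_nodes(base_url, user, password):
    """Fetch /api/nodes with HTTP basic auth (all arguments are placeholders)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

An empty result from `alarmed_nodes(fetch_nodes(...))` would mean that
no node currently reports a memory or disk alarm.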

> =ERROR REPORT==== 17-Sep-2012::01:17:02 === closing AMQP connection
> <0.11515.221> (10.255.115.80:58122 -> 10.255.115.80:5672):
> {channel0_error,opening, {error,{badarg,{error,bad_module}},
> 'connection.open', [{rabbit_reader,control_throttle,1},
> {rabbit_reader,handle_method0,2}, {rabbit_reader,handle_method0,3},
> {rabbit_reader,handle_input,3}, {rabbit_reader,recvloop,2},
> {rabbit_reader,start_connection,7}, {proc_lib,init_p_do_apply,3}]}}
>
> and the two "accept" / "closing (badarg, bad_module)" logs kept
> repeating as our celery client tried to reconnect. However it was
> never able to reconnect. This message had us a bit baffled.

Something must have been seriously broken. Can you post the rabbit log 
*and sasl log* from around the time of the above message?

> However, at this point the messages that had not been delivered to
> the celery worker process were not all of a sudden delivered - they
> appeared to have vanished. This happened despite the fact that I'm
> confident that our celery queue and messages on that queue were both
> marked as persistent. I'm not sure I understand rate limiting well
> enough to know whether or not I should have expected to see these
> messages be sent to our consumer when we restarted rabbitmq.

Throttling/blocking affects producers only. Messages published by a
blocked producer sit in various buffers at the client, in the network
and at the server. They have not reached a queue yet, so they are lost
when the server is restarted, regardless of whether they were marked as
persistent. That's just normal TCP/IP behaviour.

> Also BTW, maybe I have this wrong, but it seems strange that the
> rabbit documentation (http://www.rabbitmq.com/configure.html) refers
> to the default Disk free limit as 1GB but that our default install
> has it set to 1000000000 , which rabbit reports on startup as "Disk
> free limit set to 953MB". Prob should be 1073741824? (Sorry for the
> massive nitpick) :)

It's the other way round - it should be reported as 1000MB since for 
disk space (unlike memory) the units are decimal. Will be fixed in the 
next release.
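
For reference, here is the arithmetic behind the two figures as a short
Python sketch. The variable names are illustrative.

```python
limit = 1000000000  # default disk free limit in bytes, as quoted above

binary_mb = limit // (1024 * 1024)   # 953: bytes divided by 2**20
decimal_mb = limit // (1000 * 1000)  # 1000: bytes divided by 10**6
one_gib = 1024 ** 3                  # 1073741824: the binary "1GB"
```

So 953MB is the same limit expressed in binary megabytes, and 1000MB is
the decimal figure that matches the documented 1GB.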

Regards,

Matthias.