[rabbitmq-discuss] Strange subtract_acks crash in RabbitMQ 3.1.0-1 (case_clause empty) on EC2

Karl Rieb karl.rieb at gmail.com
Mon May 20 15:12:28 BST 2013

Brett Cameron <brett.r.cameron at ...> writes:

> Karl, I am not aware of any known issues in this regard (queues strangely
disappearing). The fact that you get the problem with different versions of
Erlang and different versions of RabbitMQ would make me suspect that it is
more likely to be something external to the broker environment. The "INFO:
task beam.smp:18971 blocked for more than 120 seconds." messages in syslog
are interesting. What I think is happening here is as follows: By default
Linux uses up to 40% of the available memory for file system caching and
if/when this threshold is reached the file system flushes all outstanding
cached data to disk, causing all following I/Os to become synchronous (until
the flush completes). There is a default time limit of 120 seconds for the
flush to complete. If your EC2 VMs have a lot of RAM and are processing
heavy I/O-intensive workloads, it is possible that the EBS volumes might
sometimes not be able to keep up (or possibly there's some spike in network
activity and available bandwidth goes through the floor). You have mentioned
that your messages are not persistent, but RabbitMQ's memory usage might be
such that it is occasionally deciding to flush a whole pile of stuff to
disk. I notice that you have {vm_memory_high_watermark,0.8}, and this could
have something to do with the problem - maybe try reducing this, and/or
maybe set vm.dirty_ratio to something less than 40% in /etc/sysctl.conf
(this might increase the frequency of flushes, but you'll be flushing less
data). I'd possibly start by reducing vm_memory_high_watermark. I can't
really correlate this with your queue disappearing, but if I/Os are getting
messed up then all manner of weird things could happen I guess. Hopefully
others might have some more tangible ideas!
> Brett

Hi Brett,

Thanks for the reply!  I will go ahead and reduce the watermark limit and 
configure the kernel to flush earlier to disk.  Hopefully this issue will go 
away after the changes.
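
For reference, a minimal sketch of the broker-side change, assuming the
classic Erlang-terms config file (commonly /etc/rabbitmq/rabbitmq.config).
The value 0.4 is only an illustrative choice, not a figure anyone in this
thread has recommended. The kernel-side change would be a line such as
"vm.dirty_ratio = 10" in /etc/sysctl.conf (again an assumed value), loaded
with "sysctl -p".

```erlang
%% rabbitmq.config (sketch; the path and the 0.4 value are assumptions)
[
  {rabbit, [
    %% Fraction of system RAM at which the broker raises its memory alarm.
    %% Lowered here from the 0.8 mentioned above.
    {vm_memory_high_watermark, 0.4}
  ]}
].
```

The broker needs a restart to pick up a change to this file.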


