[rabbitmq-discuss] Strange subtract_acks crash in RabbitMQ 3.1.0-1 (case_clause empty) on EC2
Karl Rieb
karl.rieb at gmail.com
Mon May 20 15:12:28 BST 2013
Brett Cameron <brett.r.cameron at ...> writes:
>
> Karl,
>
> I am not aware of any known issues in this regard (queues strangely
> disappearing). The fact that you get the problem with different versions
> of Erlang and different versions of RabbitMQ would make me suspect that
> it is more likely to be something external to the broker environment.
>
> The "INFO: task beam.smp:18971 blocked for more than 120 seconds."
> messages in syslog are interesting. What I think is happening here is as
> follows: by default, Linux uses up to 40% of the available memory for
> file system caching, and when this threshold is reached the file system
> flushes all outstanding cached data to disk, causing all subsequent
> I/Os to become synchronous until the flush completes. There is a default
> time limit of 120 seconds for the flush to complete. If your EC2 VMs
> have a lot of RAM and are processing heavy I/O-intensive workloads, it
> is possible that the EBS volumes might sometimes not be able to keep up
> (or possibly there is a spike in network activity and the available
> bandwidth goes through the floor).
>
> You have mentioned that your messages are not persistent, but RabbitMQ's
> memory usage might be such that it is occasionally deciding to flush a
> whole pile of data to disk. I notice that you have
> {vm_memory_high_watermark,0.8}, and this could have something to do with
> the problem - maybe try reducing it, and/or set vm.dirty_ratio to
> something less than 40% in /etc/sysctl.conf (this might increase the
> frequency of flushes, but you'll be flushing less data each time). I'd
> start by reducing vm_memory_high_watermark. I can't really correlate
> this with your queue disappearing, but if I/Os are getting messed up
> then all manner of weird things could happen, I guess. Hopefully others
> will have some more tangible ideas!
>
> Brett
>
Hi Brett,
Thanks for the reply! I will go ahead and reduce the watermark limit and
configure the kernel to flush earlier to disk. Hopefully this issue will go
away after the changes.
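Concretely, I'm thinking of changes along these lines (the exact values below are illustrative starting points, not recommendations for our workload):

```
# /etc/sysctl.conf -- flush dirty pages to disk earlier and in smaller
# batches, so a single flush is less likely to stall I/O for 120s
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
```

(applied with `sysctl -p`), and:

```
%% /etc/rabbitmq/rabbitmq.config -- drop the memory high watermark back
%% toward RabbitMQ's 0.4 default so the broker hits its memory alarm
%% (and starts pushing messages to disk) well before the OS is starved
[
  {rabbit, [
    {vm_memory_high_watermark, 0.4}
  ]}
].
```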
-Karl