[rabbitmq-discuss] Queue Paging - Disk Overflow

Edwin Fine rabbitmq-discuss_efine at usa.net
Mon Nov 17 00:15:29 GMT 2008


Count me in too. Heck, (he said generously), make it $10!

Seriously, though, I don't want to be an ass about this, but I would think
that there are now enough people (namely, non-paying customers ;) with
enough valid use cases making enough noise about this particular issue to
warrant bumping it to the top of the Rabbit architectural issues list. When
someone cannot even restart a node to drain the persister log after the node
crashes with OOM, and has to delete the persister log, you know that there's
a significant design deficiency to be remedied post-haste.

Please believe me when I tell you that I am not trying to beat you up about
it or be nasty or unfair - I have great regard for the
Cohesive/LShift/Rabbit team - but I think it's reasonable to say that it's
time to spend the estimated 3 work-weeks (I assume 120 work-hours) to remedy
this. The cost of this should not be extreme. Let me put my project-manager
hat on here... unless I am badly out of touch, one of your team shouldn't
cost much more than, say, $80/hour at contract rates, so you are talking
$10K to fix this (if I am wrong, and the rates are higher than that, PLEASE
can I work for you? ;). Is there a way you could beg, borrow or
venture-capital this to fast-track it? Or, being more constructive, how
about 50 Rabbit users contributing $200 each? 25 @ $400? I would gladly
contribute US$200-$400 to get this done (really). I wish I had $10K to just
pay for this but sadly I don't.
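For anyone who wants to check the numbers, here is the back-of-the-envelope arithmetic as a quick Python sketch. It assumes a 40-hour work week (which is how 3 weeks becomes 120 hours) and my guessed $80/hour contract rate; neither figure comes from the Rabbit team.

```python
# Back-of-the-envelope funding arithmetic for the estimate above.
# Assumptions: 40-hour work weeks and a guessed $80/hour contract rate.
WEEKS = 3
HOURS_PER_WEEK = 40
RATE_USD = 80

hours = WEEKS * HOURS_PER_WEEK
cost = hours * RATE_USD
print(f"{hours} hours x ${RATE_USD}/h = ${cost:,}")

# The two pledge schemes floated above: (contributors, dollars each).
for contributors, each in [(50, 200), (25, 400)]:
    total = contributors * each
    print(f"{contributors} x ${each} = ${total:,} (covers cost: {total >= cost})")
```

Either pledge scheme comes to $10,000, which covers the roughly $9,600 estimate with a little slack.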

What do you say, Rabbit list? Would 49 of you PayPal $200 to get this done?

I agree there is *maybe* a workaround by bumping up the swap space to a big
number, but I see that as an emergency measure. I tried an experiment last
night where I started an Erlang node with SMP disabled (erl -smp disable, so
that I had some free CPUs to kill it if things got hairy), and ran a tight loop that
grew a list very quickly. I have a 20GB swap partition on Ubuntu x86_64,
4-core, 8GB RAM. The Erlang node got to 8GB resident, and just before (and
after) that, Linux started trimming the working sets of the other processes
to feed the hungry node. If I read the "top" display right, it managed to
trim the working sets of most other processes down to around 5 - 10MB each
(amazing!). The swap space used went up to 10GB, the Erlang node up to
14.5GB of virtual memory, 6.5GB of resident memory. At that point the system
ran like an anaesthetized snail, even though there were 3 almost idle CPUs.
Command-line and GUI response times went sky-high, probably because
everything was swapped half to death. The Erlang shell of the memory hog
didn't even respond to Ctrl-C any more. It almost seemed as if the node had hung,
but I can't swear to that. The good news is that I was able to kill the
gluttonous node, and the system didn't crash (although it likely would have
if it ran out of swap space) and it recovered perfectly. Did I mention I
love Linux?

I'd like to repeat this test using Rabbit, by feeding it a metric grillion
persistent messages without draining, stopping it before it crashes, and
seeing if I can get it to recover once the swap file is almost full (by
starting consumers). Unless someone has done this already?

I suppose if one did run out of swap and got a panic crash, one could add a
big fat terabyte disc (or logical volume) to the machine and put a humongous
swap partition there, then restart and let Rabbit try to recover the
persister log. Theoretically, it should be able to do so - eventually -
because the swap space should now be large enough to take the entire
persister log's memory-resident bits. One might need to set up the Linux
kernel settings suitably to prevent the out-of-memory (OOM) killer from killing
the Rabbit process before it has a chance to drain the swamp, I mean queue.

Just to reiterate: Rabbit is a great product and you are a great team. This
is not a "moan".

Regards,
Edwin


On Sun, Nov 16, 2008 at 3:41 PM, Ben Hood <0x6e6562 at gmail.com> wrote:

> Ez,
>
> On Sun, Nov 16, 2008 at 7:36 PM, Ezra Zygmuntowicz <ez at engineyard.com>
> wrote:
> >        I got 5 on it ;)
>
> Maybe we need to get one of those Paypal Donate buttons........ ;-)
>
> Ben
>