Count me in too. Heck, (he said generously), make it $10!<br><br>Seriously, though, I don't want to be an ass about this, but I would think that there are now enough people (namely, non-paying customers ;) with enough valid use cases making enough noise about this particular issue, to warrant bumping it to the top of the Rabbit architectural issues list. When someone cannot even restart a node to drain the persister log after the node crashes with OOM, and has to delete the persister log, you know that there's a significant design deficiency to be remedied post-haste.<br>
<br>Please believe me when I tell you that I am not trying to beat you up about it or be nasty or unfair - I have great regard for the Cohesive/LShift/Rabbit team - but I think it's reasonable to say that it's time to spend the estimated 3 work-weeks (I assume 120 work-hours) to remedy this. The cost of this should not be extreme. Let me put my project-manager hat on here... unless I am badly out of touch, one of your team shouldn't cost much more than, say, $80/hour at contract rates, so you are talking $10K to fix this (if I am wrong, and the rates are higher than that, PLEASE can I work for you? ;). Is there a way you could beg, borrow or venture-capital this to fast-track it? Or, being more constructive, how about 50 Rabbit users contributing $200 each? 25 @ $400? I would gladly contribute US$200-$400 to get this done (really). I wish I had $10K to just pay for this but sadly I don't.<br>
<br>What do you say, Rabbit list? Would 49 of you PayPal $200 to get this done?<br><br>I agree there is <i>maybe</i> a workaround by bumping up the swap space to a big number, but I see that as an emergency measure. I tried an experiment last night where I started an Erlang node in -smp disable mode (so that I had some free CPUs to kill it if things got hairy), and ran a tight loop that grew a list very quickly. I have a 20GB swap partition on Ubuntu x86_64, 4-core, 8GB RAM. The Erlang node got to 8GB resident, and just before (and after) that, Linux started trimming the working sets of the other processes to feed the hungry node. If I read the "top" display right, it managed to trim the working sets of most other processes down to around 5 - 10MB each (amazing!). The swap space used went up to 10GB, the Erlang node up to 14.5GB of virtual memory, 6.5GB of resident memory. At that point the system ran like an anaesthetized snail, even though there were 3 almost idle CPUs. Command-line and GUI response times went sky-high, probably because everything was swapped half to death. The Erlang shell of the memory hog didn't even respond to Ctrl-C any more. It almost seemed as if the node had hung, but I can't swear to that. The good news is that I was able to kill the gluttonous node, and the system didn't crash (although it likely would have if it ran out of swap space) and it recovered perfectly. Did I mention I love Linux?<br>
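<br>A bounded analogue of the list-growing experiment above can be sketched as follows. This is an illustration in Python, not the Erlang loop that was actually run; the function name grow_list, the chunk size, and the 64 MiB cap are assumptions chosen so that it stops long before touching swap.<br>

```python
import sys


def grow_list(limit_bytes, chunk=1024):
    """Append chunk-sized byte strings to a list until about limit_bytes are held.

    Hypothetical helper: the cap exists so the loop cannot exhaust RAM or swap.
    """
    hog = []
    held = 0
    while held < limit_bytes:
        block = bytes(chunk)             # a fresh zero-filled block each time
        hog.append(block)                # keep a reference so it cannot be freed
        held += sys.getsizeof(block)     # payload plus per-object overhead
    return hog, held


if __name__ == "__main__":
    # Assumed cap of 64 MiB; raise it only on a machine you can afford to stall.
    hog, held = grow_list(64 * 1024 * 1024)
    print(f"holding {len(hog)} blocks, about {held / 2**20:.1f} MiB")
```

Raising the cap toward the size of physical RAM would approximate the swap pressure described above, so it is best tried only on a machine that can be left to crawl for a while.<br>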
<br>I'd like to repeat this test using Rabbit, by feeding it a metric grillion persistent messages without draining, stopping it before it crashes, and seeing if I can get it to recover once the swap file is almost full (by starting consumers). Unless someone has done this already?<br>
<br>I suppose if one did run out of swap and got a panic crash, one could add a big fat terabyte disc (or logical volume) to the machine and put a humongous swap partition there, then restart and let Rabbit try to recover the persister log. Theoretically, it should be able to do so - eventually - because the swap space should now be large enough to take the entire persister log's memory-resident bits. One might need to set up the Linux kernel flags suitably to prevent the process killer from killing the Rabbit process before it has a chance to drain the swamp, I mean queue.<br>
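<br>One way to keep the kernel's out-of-memory killer away from a single process on current Linux kernels is the per-process /proc/&lt;pid&gt;/oom_score_adj file, where -1000 exempts the process. A minimal sketch, assuming a kernel new enough to have that file (2.6.36 or later; older kernels exposed /proc/&lt;pid&gt;/oom_adj instead) and enough privilege to lower the value. The helper names are hypothetical, and nothing here is specific to Rabbit:<br>

```python
import os


def oom_score_adj_path(pid):
    """Return the procfs path that holds the OOM score adjustment for pid."""
    return f"/proc/{pid}/oom_score_adj"


def protect_from_oom_killer(pid, adj=-1000):
    """Write adj to the process's oom_score_adj file.

    -1000 exempts the process from the OOM killer; the valid range is
    -1000..1000. Lowering the value normally requires root privileges.
    """
    if not -1000 <= adj <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    with open(oom_score_adj_path(pid), "w") as f:
        f.write(str(adj))


if __name__ == "__main__":
    # Example only: adjust the current process; expect a permission error
    # when not running with sufficient privilege.
    try:
        protect_from_oom_killer(os.getpid())
        print("OOM killer exemption set for", os.getpid())
    except OSError as err:
        print("could not adjust oom_score_adj:", err)
```

Exempting one large process means the kernel will pick other victims if memory truly runs out, so this only makes sense alongside the enlarged swap space described above.<br>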
<br>Just to reiterate: Rabbit is a great product and you are a great team. This is not a "moan".<br><br>Regards,<br>Edwin<br><br><br><div class="gmail_quote">On Sun, Nov 16, 2008 at 3:41 PM, Ben Hood <span dir="ltr"><<a href="mailto:0x6e6562@gmail.com">0x6e6562@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Ez,<br>
<br>
On Sun, Nov 16, 2008 at 7:36 PM, Ezra Zygmuntowicz <<a href="mailto:ez@engineyard.com">ez@engineyard.com</a>> wrote:<br>
> I got 5 on it ;)<br>
<br>
Maybe we need to get one of those Paypal Donate buttons........ ;-)<br>
<div><div></div><div class="Wj3C7c"><br>
Ben<br>
<br>
_______________________________________________<br>
rabbitmq-discuss mailing list<br>
<a href="mailto:rabbitmq-discuss@lists.rabbitmq.com">rabbitmq-discuss@lists.rabbitmq.com</a><br>
<a href="http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss" target="_blank">http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss</a><br>
<br>
</div></div></blockquote></div><br>