[rabbitmq-discuss] Newbie Q: Bugs in 2.8.2-1?

Tue Jun 19 22:16:55 BST 2012

Simon,

On Tue, Jun 19, 2012 at 10:26 AM, Simon MacMullen <simon at rabbitmq.com> wrote:

> On 19/06/12 15:16, Eric Bravick wrote:
>
>> For people starting with 2.8.2+, you might consider re-visiting the hard
>> failure and making this a warning state instead...  seems to me there's
>> a valid argument to throw a warning here rather than blocking (however,
>> I could be entirely incorrect about that.)
>>
>
> The trouble is that AMQP doesn't give us the ability to warn-on-publish
> (and even if it did it's easy to imagine that the typical default client
> callback would be a no-op). So it's not clear where a warning would go - we
> already write to logs and show a big red thing in mgmt, but from the number
> of questions we're getting asked it's clear people are not seeing these.
>
>
Ah, I see.  Even if AMQP had a warn-on-publish strategy, that would require
code changes for everyone, so that's a non-starter.  Here, we watch logs
and the management interface - it seems to me (granted I'm from the ops
world more than the dev world) that users of Rabbit should be watching
those...  in this case for a warning that said something about insufficient
disk space to page out the full memory footprint.  If that was OK and
understood, maybe a flag in the config file could be set to silence that
warning.

For example, one of the applications we have under development will be very
high throughput and time sensitive (meaning that the value of the messages
decays very quickly.)  In the event we had a failure that ended up hitting
disk, we'd end up with zero value in the messages anyway, so in that
particular case we'd be running with big memory, almost no disk, and all
warnings suppressed.

> However, there's no denying that this has hurt a lot more people than we
> expected, and we really don't like that.
>
> It's often very difficult to predict this sort of issue in the field.

> So for the next release we're intending to go with a lower disk space
> limit - probably a hard 1GB. This is not ideal - ideally we would always
> have the ability to page the whole of memory out to disk - but it's at
> least somewhat likely to stop you before you *actually* run out of disk
> space and crash, while hopefully not causing the wave of false-positives
> that have inconvenienced people running 2.8.2.
>

Hmmm.  Well, it's just my opinion; keep in mind that I'm sure you know far
more than I do about what all the Rabbit users need - but if it were my
choice I'd probably consider that a compromise where no one wins overall -
it lowers the bar.  I certainly understand what you were trying to achieve
with the current limits - more stable and consistent outcomes for the vast
majority of users.  I'd either:

1) Stick to your original guns and let us all adjust to best practices.
2) Move this from a block to a warn (in log + management) (my favorite, it
would not shut people off, but it would firmly tell us all that we should
be aware of *why* we are breaking what is a good practice.)

This option seems less attractive:

3) Assign a lower floor.  You'll move the problem quantitatively, but not
qualitatively...  those that hit the limit will still be confused (you'll
discover how many people have a /var under 1GB).  Those who don't have
enough disk to page out memory might be in for a very nasty surprise:
since they won't get warned of the potential issue in advance, they might
end up pretty unhappy in a failure mode.
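
To make the difference between the three options concrete, here is a rough
Python sketch.  It is only an illustration of the policies as described in
this thread, not RabbitMQ's actual implementation or configuration: the
`Host` type, the function names, the `ratio` parameter and the host figures
are all made up for the example.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Host:
    """Hypothetical broker host; the figures used below are invented."""
    name: str
    ram_bytes: int
    free_disk_bytes: int


def memory_relative_limit(host: Host, ratio: float = 1.0) -> int:
    """Free disk needed to page out `ratio` times the host's RAM."""
    return int(host.ram_bytes * ratio)


def evaluate(host: Host, limit_bytes: int, action: str) -> str:
    """Return "ok" when free disk meets the limit, else `action`
    ("warn" or "block")."""
    if host.free_disk_bytes >= limit_bytes:
        return "ok"
    return action


def compare(host: Host) -> dict:
    """Outcome for one host under each of the three options."""
    relative = memory_relative_limit(host)
    return {
        "1: memory-relative, block": evaluate(host, relative, "block"),
        "2: memory-relative, warn": evaluate(host, relative, "warn"),
        "3: fixed 1GB floor, block": evaluate(host, 1 * GIB, "block"),
    }


# Assumed example hosts: lots of RAM with little disk, and a tiny /var.
big_memory = Host("big-memory", ram_bytes=64 * GIB, free_disk_bytes=20 * GIB)
small_var = Host("small-var", ram_bytes=2 * GIB, free_disk_bytes=GIB // 2)

for host in (big_memory, small_var):
    print(host.name, compare(host))
```

With these invented figures, the big-memory host passes option 3 without
any notice even though it cannot page 64GB of memory out to 20GB of disk,
while the small-/var host is still blocked under both options 1 and 3.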

Again, I might be off base here...

-- 
Regards,
--
-- Eric Bravick, CTO
-- Spring Semantics, Inc.