Matthew, thanks very much for your comments. I'll just add a brief top-post note:<div><br></div><div>After posting the message below and reading your remarks, I now think a better design is simply not to depend on messages never being "lost". (By "never lost" I was probably thinking that, once published, messages would be consumed within a few minutes regardless of a server failure.)</div>
<div><br></div><div>If we design with the knowledge that a message could be lost, and are therefore prepared to publish it again when necessary, then our messaging system is much simpler -- and probably less fragile and more reliable as a result. That might not work for all businesses, but it likely will for ours.</div>
<div><br></div><div>We have a database for recording state, and in many cases the messaging system is about changing that state. So, for important tasks, we should be able to look at the state and say "this was supposed to be done an hour ago, so queue it again." A trickier part is determining whether a job was actually lost or is just still in the queue.</div>
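<div><br></div><div>A minimal sketch of that "look at the state and queue it again" check, in Python. Everything here is an assumption for illustration: the <code>Job</code> fields, the <code>grace</code> interval, and the <code>publish</code> callback (which stands in for whatever actually sends the message to the broker) are hypothetical names, not part of RabbitMQ or Celery.</div>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional


@dataclass
class Job:
    # Hypothetical row from the application's own state table.
    job_id: int
    due_at: datetime                          # when the work should have been done
    completed_at: Optional[datetime] = None   # set by the worker on success
    last_queued_at: Optional[datetime] = None # set whenever we republish


def find_overdue(jobs: Iterable[Job], now: datetime, grace: timedelta) -> List[Job]:
    """Return unfinished jobs that are late by more than `grace` and were
    not republished within the last `grace` interval."""
    overdue = []
    for job in jobs:
        if job.completed_at is not None:
            continue  # already done, nothing to recover
        if job.due_at + grace >= now:
            continue  # not late enough yet; it may just still be in the queue
        if job.last_queued_at is not None and job.last_queued_at + grace >= now:
            continue  # republished recently; give it time before trying again
        overdue.append(job)
    return overdue


def requeue_overdue(
    jobs: Iterable[Job],
    now: datetime,
    grace: timedelta,
    publish: Callable[[int], None],
) -> int:
    """Republish every overdue job and return how many were sent."""
    count = 0
    for job in find_overdue(jobs, now, grace):
        publish(job.job_id)       # assumed to hand the job id to the broker
        job.last_queued_at = now  # remember the retry so we do not flood the queue
        count += 1
    return count
```

<div>The sketch cannot tell a lost job from one that is merely slow; it only waits out the grace interval before publishing again. That means a job may occasionally be delivered twice, so the consumers would need to tolerate duplicates.</div>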
<div><br></div><div>A much more important task is just making sure that clients can publish messages when needed.</div><div><br></div><div>Oh, and yes, Celery has a lot of nice features that applications probably commonly need. I've been testing it for the last week, and even though we are not a Python shop, I suspect we could still use it to fork off other jobs very easily.</div>
<div><br></div><div>Thanks,</div><div><br></div><div><br><br><div class="gmail_quote">On Fri, Jan 14, 2011 at 8:20 AM, Matthew Sackman <span dir="ltr">&lt;<a href="mailto:matthew@rabbitmq.com">matthew@rabbitmq.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hi Bill,<br>
<div class="im"><br>
On Tue, Jan 11, 2011 at 07:57:19PM -0800, Bill Moseley wrote:<br>
> The High Availability docs at <a href="http://www.rabbitmq.com/pacemaker.html" target="_blank">http://www.rabbitmq.com/pacemaker.html</a> seem<br>
> pretty thorough. Are there any other approaches commonly used for HA? What<br>
> about for syncing between data centers? Can anyone discuss their HA approach<br>
> if it differs from the link above?<br>
<br>
</div>There are a couple of other things to mention. One is that work is<br>
currently being done on active/active HA, but that will provide<br>
"mirrored" queues (think RAID-1) within a cluster only. The other thing<br>
to mention is the shovel plugin, which can be used for a very<br>
rudimentary form of federation between different brokers. The main<br>
limitation with the shovel though is that its configuration can't be<br>
dynamically changed, so it's only really a sensible solution if your<br>
topology is mainly static.<br>
<div class="im"><br>
> We are evaluating messaging systems and the question of HA has come up<br>
> frequently. The concern is that some items, once entered onto the queue,<br>
> should never be lost -- even if the entire data center goes down.<br>
<br>
</div>Right, but is that "lost" as in "provided it's stored on disk, that's<br>
ok, even if it takes us a month to get the data back off disk", or is it<br>
"lost" as in "must always be (near) instantly available"?<br>
<div class="im"><br>
> We are comparing RabbitMQ with writing a system in-house. The in-house<br>
> queue system would use a Postgresql table for the queue with replication<br>
> (currently via Slony) for hot-backup (it's not really HA). We also<br>
> replicate to a secondary data center with the eventual goal of reasonably<br>
> fast tip-over between data centers. We are not in the financial or medical<br>
> industry so nobody's life is at risk if we drop a few jobs. I suspect we<br>
> only need to handle three to five million messages a day -- nothing too big.<br>
> (Oddly, one argument against using RabbitMQ was it was overkill for our<br>
> needs.)<br>
<br>
</div>Yeah, rather obviously, we're somewhat biased against building message<br>
brokers on top of databases ;) I guess the thing I'd suggest you look<br>
at is: whilst you may be fine with postgres at the moment, what happens<br>
in a couple of years time? What will your messaging requirements be<br>
then, and will you have sufficient flexibility in your home-grown system<br>
to be able to cope with those needs?<br>
<div class="im"><br>
> Postgresql and replication is what we use for application data currently, so<br>
> it is a familiar technology for us. Another reason we are considering<br>
> building a custom message queue system is to put more functionality into the<br>
> broker -- such as scheduling and job routing that would be specific to our<br>
> business. And there's fear that nobody knows Erlang if something broke and<br>
> we needed to try and resolve it.<br>
<br>
</div>Sure, those are valid concerns. There are ways of extending the<br>
functionality of Rabbit, for example through exchange types, or even<br>
custom plugins, but these do normally require writing in Erlang.<br>
<div class="im"><br>
> My opinion is AMQP is very flexible and we should be able to make it meet<br>
> our needs. We are not doing anything that unusual. And I suspect building<br>
> something as reliable as RabbitMQ is no easy task -- especially if the point<br>
> is to make a system more complex than what RabbitMQ provides. Scheduling,<br>
> for example, seems like something a simple database table and cron could<br>
> solve easily with RabbitMQ.<br>
<br>
</div>Indeed - use the right tool for the job etc. Job scheduling and such are<br>
probably on the boundary of what we consider pure messaging, and so it<br>
does normally require additional client-side applications to add to<br>
Rabbit to provide such functionality. You might like to look at celery<br>
in this space which does job scheduling on top of Rabbit.<br>
<div class="im"><br>
> Another argument for a custom broker was to make better use of workers --<br>
> i.e. the broker would look at load and other factors when determining where<br>
> to send jobs. My feeling here is resources are limited so it's a matter of<br>
> balancing the number and type of consumers with queue load -- and an<br>
> external process can manage starting and stopping consumers easily as demand<br>
> profile changes (by looking at queue sizes and rates) without having to be<br>
> part of the broker. Are there common approaches for dynamically adjusting<br>
> workers?<br>
<br>
</div>I suspect that's something that falls squarely in the remit of tools<br>
like celery. It's definitely outside the scope of Rabbit itself.<br>
<br>
Matthew<br>
</blockquote></div><br><br clear="all"><br>-- <br>Bill Moseley<br><a href="mailto:moseley@hank.org" target="_blank">moseley@hank.org</a><br>
</div>