[rabbitmq-discuss] AWS clustering
Glade
gladed at gmail.com
Wed Sep 5 04:01:38 BST 2012
Hi all,
In a project I'm working on we set up a cluster of application nodes, each
with RabbitMQ installed. Everybody can talk to everybody, and we can scale
the number of nodes pretty easily. But on more than one occasion, we have
seen mnesia become partitioned. You've seen this before:
Mnesia('rabbit@app-6'): ** ERROR ** mnesia_event got
{inconsistent_database, running_partitioned_network, 'rabbit@app-5'}
As best we can tell, this is caused by temporary network outages, or
possibly high-load conditions, or possibly a combination of both. However it
happens, you end up with one or more nodes down for the count with
non-deterministic behavior (messages sent to that node may or may not reach
other nodes). It doesn't recover until you *manually* stop_app/start_app.
And if it happened to be a disc node, you also have to run
rm -rf /var/lib/rabbitmq/mnesia/rabbitmq/* in between.
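
To make the manual procedure concrete, here is a rough Python sketch of what we end up doing by hand: spot the Mnesia event in the log, then work out the recovery steps. The function names (find_partition, recovery_commands) are made up for illustration, and I'm assuming node names show up in the log in the usual name@host form. The sketch only builds the command list; it does not run anything, and the rm step is destructive, so treat it as an outline rather than a tested tool.

```python
import re

# Matches the Mnesia event quoted above. Assumption: node names appear
# as name@host, optionally wrapped in single quotes.
PARTITION_RE = re.compile(
    r"Mnesia\('?(?P<local>[^')]+)'?\):.*?"
    r"\{inconsistent_database,\s*running_partitioned_network,\s*"
    r"'?(?P<peer>[^'}]+)'?\}",
    re.DOTALL,
)


def find_partition(log_text):
    """Return (local_node, peer_node) if the log reports a partition, else None."""
    match = PARTITION_RE.search(log_text)
    if match is None:
        return None
    return match.group("local"), match.group("peer")


def recovery_commands(is_disc_node, mnesia_dir="/var/lib/rabbitmq/mnesia/rabbitmq"):
    """Build (but do not execute) the manual recovery steps for one node."""
    steps = [["rabbitmqctl", "stop_app"]]
    if is_disc_node:
        # Destructive: wipes this node's local Mnesia state.
        steps.append(["sh", "-c", f"rm -rf {mnesia_dir}/*"])
    steps.append(["rabbitmqctl", "start_app"])
    return steps
```

Something would still have to decide when it is safe to run those steps, which is exactly the part I don't trust an automated script to get right.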
For a supposedly "just works" kind of service, that is just not good
enough. I can't have my ops people rolling out of bed to take action every
time there's a minor network glitch. So, I either need to provide a network
that never becomes partitioned (does such a network exist? Certainly not at
AWS!), or I need to drop clustering and have a single RabbitMQ server which
won't scale, or I need to cobble together some kind of automated supervisor
which is certain not to handle all cases, or I need to use a different
messaging tool.
Please, somebody dispute my conclusion because I would love to continue
using RabbitMQ.
Best regards,
Glade