[rabbitmq-discuss] Cluster size limit

Advait Alai advaitalai at gmail.com
Mon Mar 28 08:14:08 BST 2011


Alexis,

Thanks for the inputs.

On Mon, Mar 28, 2011 at 2:55 AM, Alexis Richardson <alexis at rabbitmq.com> wrote:

> Advait
>
> There is no benefit to clustering terminal nodes which (as I
> understand it) are not sharing internal state with each other.
> Instead look to share queues and exchanges by running them on brokers,
> and connecting multiple terminal nodes to those brokers.
>

I am running the entire setup (the VMs and the network) on a single system.
Does that count as sharing the same internal state? The VMs mount the same
bare file system, so, for example, I could skip replicating the Erlang cookie
before clustering (/var/lib/rabbitmq is common). Of course, they have
individual virtualized network stacks and interfaces.


> You should run RabbitMQ clients at terminal nodes.  Connect them to
> brokers running at non-terminal nodes.  For broker-broker links, use
> Shovel.  We are working on a federation model that is more flexible
> than shovel - if you are interested, talk to Simon (cc'd).
>

Ah, this might be a problem, as I cannot run any third-party applications on
non-terminal nodes. These nodes are primitives of NS-3 -- just objects
responsible for routing etc. Since I did not have any control over the
non-terminal nodes, I had to cluster the terminal nodes and assumed that the
middleware would take care of setting up overlay paths from publishers to
subscribers. My bad, I should have understood how clustering worked before
starting :-/
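
If brokers can only live on terminal nodes, one layout that follows the advice
above is to dedicate a few terminal VMs to running brokers and have the
remaining VMs connect to them as plain clients. A minimal sketch of that
assignment, assuming hypothetical node names and a simple round-robin split
(neither comes from this thread, and the brokers would still need Shovel links
between them):

```python
def assign_clients(terminal_nodes, broker_count):
    """Pick the first `broker_count` nodes as brokers and spread the
    remaining nodes over them round-robin.

    Returns a dict mapping each broker name to its list of client nodes.
    The node names and the round-robin policy are illustrative assumptions.
    """
    if not 0 < broker_count <= len(terminal_nodes):
        raise ValueError("broker_count must be between 1 and the number of nodes")

    brokers = terminal_nodes[:broker_count]
    clients = terminal_nodes[broker_count:]

    layout = {broker: [] for broker in brokers}
    for index, client in enumerate(clients):
        layout[brokers[index % broker_count]].append(client)
    return layout


if __name__ == "__main__":
    # Hypothetical names node1..node150, with 3 of them acting as brokers.
    nodes = [f"node{i}" for i in range(1, 151)]
    for broker, attached in assign_clients(nodes, 3).items():
        print(broker, "serves", len(attached), "clients")
```

This only computes who connects where; it does not start or configure anything.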

But coming back to the errors cropping up: is there a way to raise the
cluster size limit, even if there is no added benefit to clustering this
way?
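
For reference, the stop-reset-cluster-start iteration from my first mail
(quoted below) can be sketched as follows. This is only a sketch: it builds
the rabbitmqctl command lines (stop_app, reset, cluster, start_app) without
running them, and the seed node and node names in the usage part are
placeholders rather than my actual setup.

```python
def cluster_join_commands(node, seed, ctl="rabbitmqctl"):
    """Return the four rabbitmqctl invocations that join `node` to the
    cluster that `seed` belongs to, as argument lists."""
    target = ["-n", f"rabbit@{node}"]
    return [
        [ctl, *target, "stop_app"],
        [ctl, *target, "reset"],
        [ctl, *target, "cluster", f"rabbit@{seed}"],
        [ctl, *target, "start_app"],
    ]


def build_plan(nodes, seed):
    """Commands for joining every node except the seed, one node at a time."""
    return [
        command
        for node in nodes
        if node != seed
        for command in cluster_join_commands(node, seed)
    ]


if __name__ == "__main__":
    # Placeholder names; prints the commands instead of executing them.
    for command in build_plan(["node1", "node2", "node3"], seed="node1"):
        print(" ".join(command))
```

Each argument list could be passed to subprocess.run to actually execute it;
the failure I am seeing happens at the start_app step.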

@Simon -- could you please direct me to any documentation for the federation
model I could use here? Thanks


> On Sun, Mar 27, 2011 at 9:33 PM, Advait Alai <advaitalai at gmail.com> wrote:
> >> However - RabbitMQ clustering is not designed as a way to create
> >> pubsub overlays for the wide area.  Its primary goal is scalability of
> >> one broker by adding nodes on the same LAN.  This is for, eg, cases
> >> where the number of subscriptions (or "bindings") on the broker grows
> >> beyond what one machine can physically cope with.
> >>
> >> Now, this does not stop you wiring up 1,000s of RabbitMQ brokers.  But
> >> using RabbitMQ clustering is the wrong way to do that.  Can you tell
> >> us a bit more about the pubsub topology you want to study, please?
> >> That may help us suggest the optimal way to solve your problem.
> >
> > The topology looks like this: http://imgur.com/8iep0 (taken from
> > http://ctieware.eng.monash.edu.au/twiki/bin/view/Simulation/LargePacket-switchingNetworkTopologies)
> > The terminal nodes shown in the topology are either publishers or
> > subscribers, and I want RabbitMQ to run as a middleware over the entire
> > network. The network (routers and links) is simulated by NS3, so it is
> > essentially a set of point to point links, where each link is a separate
> > network. The terminal nodes are VMs (that would run the amqp/rabbitmq
> > scripts) which I am attempting to cluster so they can use a common set of
> > queues and exchanges. So all terminal nodes are in different LANs.
> >>
> >> Also, can you explain why QoS studies require the study of 1,000s of
> >> nodes?  Perhaps you mean 1,000s of clients?
> >
> > I'm not sure I understood -- I'm trying to provide high fidelity results, so
> > I thought the testbed should be at least of the same order of size as the
> > network on which it needs to be deployed (>50000 nodes). There are a few
> > strict IEEE standards regarding this network that specify the minimum QoS
> > guarantees required.
> > Is there an easier way to do this?
> > Thanks
> >>
> >> On Sun, Mar 27, 2011 at 7:11 PM, Advait Alai <advaitalai at gmail.com> wrote:
> >> > Thanks for the reply --
> >> > On Sun, Mar 27, 2011 at 9:38 PM, Jerry Kuch <jerryk at vmware.com> wrote:
> >> >>
> >> >> 150 is a pretty big sounding cluster...  Out of curiosity, what's
> >> >> motivating you to go so big (if you don't mind saying)?
> >> >
> >> > I am doing a QoS analysis of publish-subscribe overlays using RabbitMQ in
> >> > wide area networks (these are country-wide networks, so even 1000 nodes
> >> > might be insufficient :-)). So stuff like packet delay, loss,
> >> > out-of-order delivery etc.
> >> >
> >> >>
> >> >> On that note, because RabbitMQ clustering is based on Erlang distribution,
> >> >> the current practical limit you'll probably run up against is somewhat
> >> >> lower than the 150 you have in mind.  Something more like 32 to 64.
> >> >
> >> > Is there a configuration that would let me scale to >64 nodes, even if it
> >> > would not be practical? And in case Erlang does not scale well, I'll
> >> > probably have to resort to another middleware entirely -- any suggestions
> >> > that would work on a larger number of nodes?
> >> >
> >> >>
> >> >> If you can say more about your goals it's likely that someone on the
> >> >> Rabbit team can suggest something helpful.
> >> >
> >> > The analysis I'm carrying out on >100 nodes is actually on a single system.
> >> > These 'nodes' are actually many lightweight Linux containers (more or less
> >> > virtual machines) connected by a simulated NS3 network topology. But I doubt
> >> > this would be the cause of the clustering problem, as <50 nodes were
> >> > clustering without any difficulty.
> >> >
> >> >>
> >> >> On Mar 26, 2011, at 10:32 PM, "Advait Alai" <advaitalai at gmail.com> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I'm trying to add 150 nodes to a RabbitMQ cluster. After around 50
> >> >> > nodes, the stop-reset-cluster-start iteration starts giving the
> >> >> > error:
> >> >> >
> >> >> > Stopping node rabbit at node130 ...
> >> >> > ...done.
> >> >> > Resetting node rabbit at node130 ...
> >> >> > ...done.
> >> >> > Clustering node rabbit at node130 with [rabbit at node117] ...
> >> >> > ...done.
> >> >> > Starting node rabbit at node130 ...
> >> >> > Error: {cannot_start_application,rabbit,
> >> >> >            {bad_return,
> >> >> >                {{rabbit,start,[normal,[]]},
> >> >> >                 {'EXIT',{rabbit,failure_during_boot}}}}}
> >> >> >
> >> >> > Note that I am sequentially adding nodes to build a cluster (as an
> >> >> > initialization step) before creating any queues/exchanges or running
> >> >> > any amqp script.
> >> >> >
> >> >> > How do I solve this problem? Is it because RabbitMQ imposes a hard
> >> >> > cluster size limit?
> >> >> >
> >> >> > Also, does RabbitMQ scale well to around 1000 nodes?
> >> >> >
> >> >> > Thanks
> >> >> > _______________________________________________
> >> >> > rabbitmq-discuss mailing list
> >> >> > rabbitmq-discuss at lists.rabbitmq.com
> >> >> >
> >> >> > https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> >> >
> >> >
> >> >
> >> >
> >
> >
>