[rabbitmq-discuss] Queue inspector.

Fri Nov 9 11:46:39 GMT 2007

Brian

Very interesting stuff :-)

I'm addressing a subset of your questions since Ben has sensibly
forked off the conversation about conversational messaging.

On Nov 7, 2007 8:42 PM, Brian Granger <ellisonbg.net at gmail.com> wrote:
>
> We are using this architecture for parallel computing in Python.  Our
> workers are basically Python interpreters exposed to the network.  Our
> "clients" are also Python scripts that submit Python code/objects to
> the workers to be executed.  As you can imagine, I am looking at the
> qpid Python AMQP client.  We have already run with up to 256 workers
> and are already seeing scaling problems.

Is this with RabbitMQ as-is?  Can you shed a little more light on this, please?

> Our stuff works just fine on
> supercomputers, so we would like to scale up to 2**10, 2**11,
> 2**12, etc.  We hit scaling problems in a number of ways:
>
> 1) # of file descriptors per process.
> 2) broadcasting/multicasting to all (or large subsets) of workers.
> 3) Latency - the latency determines the grain size of work, which we
> would like to be as small as possible.
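Point 1 is easy to check on any given machine. Assuming (this is an assumption about the current design) that the client side holds one TCP connection per worker, it needs at least one file descriptor per worker, and the default soft limit on many Linux systems is 1024, which is already below 2**10 once the process's other descriptors are counted. A minimal sketch using Python's standard `resource` module (Unix only); `fd_headroom` and the `reserved` allowance are hypothetical names and a guessed figure, not anything from RabbitMQ:

```python
import resource

def fd_headroom(n_workers, reserved=64):
    """Spare file descriptors after opening one socket per worker.

    Hypothetical helper: assumes one TCP connection (one fd) per worker,
    plus `reserved` descriptors for stdio, log files, etc. (a guess).
    Returns None if the soft limit is unlimited; a negative value means
    the soft limit is too low for that many workers.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - (n_workers + reserved)

for exp in (8, 10, 11, 12):
    n = 2 ** exp
    spare = fd_headroom(n)
    print(n, "workers:", "unlimited" if spare is None else spare)
```

A process can raise its own soft limit up to the hard limit with `resource.setrlimit`; going beyond the hard limit needs administrator changes.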

If you could share some of the numbers with the list, that would be
very interesting.

> 4) Large messages.  In some cases, the messages can be very large
> (files or large data sets).  It is not uncommon for people to want to
> distribute 100 MB messages to subsets of workers.  Our current
> solution generates a short-lived temporary copy of the large objects
> for each worker they are sent to.  This means if we have 128 workers,
> we use 128*msg_size memory for a short period of time.

Eek.  And does each worker need to have a local copy of the whole
message, every time, to work from?
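On the sender side, at least, the 128*msg_size peak need not happen. A minimal sketch, assuming the transport can accept any bytes-like object (true of plain Python sockets; whether a given AMQP client avoids copying it again is an assumption to check): wrap the payload once in a `memoryview` and hand out slices of it, so every worker's send reads from the same buffer. `fan_out_views` is a hypothetical name, and the actual send is left out. Each worker still ends up with its own copy on the receiving side.

```python
def fan_out_views(payload, n_workers, chunk_size):
    """Yield (worker_id, chunk) pairs for sending `payload` to every worker.

    Each chunk is a memoryview slice of the single payload buffer, so no
    per-worker copy is made on the sender: extra memory does not grow as
    n_workers * len(payload). The socket/AMQP send is hypothetical and omitted.
    """
    view = memoryview(payload)  # zero-copy wrapper around the one buffer
    for worker in range(n_workers):
        for start in range(0, len(view), chunk_size):
            # Slicing a memoryview references the same underlying buffer.
            yield worker, view[start:start + chunk_size]

payload = bytes(10 * 1024 * 1024)  # 10 MB stand-in for a large data set
sent = 0
for worker, chunk in fan_out_views(payload, n_workers=4, chunk_size=1 << 20):
    sent += len(chunk)  # a real client would write `chunk` to a socket here
print(sent)  # 41943040 bytes sent from a single 10 MB buffer
```

Chunking also keeps each individual send small, which matters if messages are ever streamed rather than sent whole.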

(As an aside, one fun idea we have looked at is BitTorrent-style
sharing of data between broker peers, essentially implementing a DHT.)

> We know how to
> fix this in Python, but it still doesn't solve the other scalability
> problems.  All that to say, any solution that we come up with must
> also be able to handle large messages well.  Does rabbitmq have any
> streaming support?

I shall leave this one for Tony.

> I would love to avoid the lower level stuff, but I also realize we
> have a somewhat unusual set of design requirements (scalability,
> low-latency && ability to handle large messages).  In various
> prototypes we even implemented custom protocols on top of TCP.  Once
> you start doing that Erlang becomes a "high-level" solution :)

Indeed :-)

I still think you should be OK staying above the waterline and coding
to RabbitMQ, at least for now.

> > Cool.  Please let us know if you need help, e.g. with the deregistration methods.
>
> Ahh yes, the deregistration methods could be subtle.  The main concern
> there is if a worker dies without notifying the server that it is
> going to die.  What facilities does RabbitMQ/AMQP have to handle this
> type of stuff?

Do you mean in the sense of leasing?  I'll see what we can say about that.
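One way to read "leasing" here: each worker holds a lease that it must renew periodically (a heartbeat), and a worker that dies without deregistering is dropped once its lease lapses. A minimal, broker-independent Python sketch of that bookkeeping; `LeaseRegistry`, the TTL value, and how renewals would arrive (for example as small messages on a queue) are all hypothetical and not part of RabbitMQ or AMQP:

```python
import time

class LeaseRegistry:
    """Track live workers by lease; silent workers expire and are dropped.

    Hypothetical sketch: a worker that dies without deregistering simply
    stops renewing, and sweep() removes it once its lease runs out.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._expiry = {}   # worker_id -> time at which its lease lapses

    def renew(self, worker_id):
        """Register or heartbeat: (re)start the worker's lease."""
        self._expiry[worker_id] = self.clock() + self.ttl

    def deregister(self, worker_id):
        """Clean shutdown path: remove the worker immediately."""
        self._expiry.pop(worker_id, None)

    def sweep(self):
        """Drop every worker whose lease has lapsed; return their ids."""
        now = self.clock()
        expired = [w for w, t in self._expiry.items() if t <= now]
        for w in expired:
            del self._expiry[w]
        return expired

    def live(self):
        return sorted(self._expiry)

# Usage with a fake clock so the timeline is explicit.
now = [0.0]
reg = LeaseRegistry(ttl_seconds=5, clock=lambda: now[0])
reg.renew("w1")
reg.renew("w2")
now[0] = 3.0
reg.renew("w1")      # w1 heartbeats; w2 has died silently
now[0] = 6.0
print(reg.sweep())   # ['w2']
print(reg.live())    # ['w1']
```

The TTL trades detection speed against heartbeat traffic: with thousands of workers, a short TTL means many small renewal messages.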

One point to note in passing is that Erlang/OTP provides the notion of
a 'supervisor'; see, e.g., here:
http://www.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2007/12&file=w6tow.xml&xsl=article.xsl&;jsessionid=Hp7BjTtbB6JYdhP2H6p6v61x4GFCfp0CF6lXF59Y7dTvjCdGFLPG!1017941068

alexis

-- 
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)