[rabbitmq-discuss] Node scheduling with RabbitMQ

Sat Aug 7 10:04:35 BST 2010

On Aug 7, 2010, at 10:43 AM, Ask Solem wrote:
>> 
> 
> cache-aware? You mean data locality?
> 
> I've been contemplating this as well.
> I think this could be solved by having a proxy in front of RabbitMQ, that routes
> the messages to the correct hosts (using routing keys).
> 
> E.g. if each worker is also consuming from a private queue, this proxy could
> just reroute the message so the task is received by a single worker.

On second thought, it doesn't need to be a proxy at all,
just some service that knows which worker(s) are near the data.

node1.example.com consumes from
    queue: tasks exchange: tasks type: direct rkey=tasks
    queue: node1.example.com exchange: directly_to type: direct rkey=node1.example.com

node2.example.com consumes from
    queue: tasks exchange: tasks type: direct rkey=tasks
    queue: node2.example.com exchange: directly_to type: direct rkey=node2.example.com

The workers then receive tasks from the shared "tasks" queue in round-robin fashion,
and it's also possible to send a task directly to one or more specific workers.
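
To make the behaviour of this layout concrete, here is a small in-memory
sketch in plain Python. It does not talk to RabbitMQ; DirectExchange, Queue
and build_topology are hypothetical names made up for illustration, and real
round-robin delivery also depends on prefetch and acknowledgement settings.

```python
class DirectExchange:
    """Toy model of a direct exchange: exact routing-key match only."""

    def __init__(self):
        self.bindings = {}  # routing key -> list of bound queues

    def bind(self, queue, routing_key):
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, message, routing_key):
        # Messages with no matching binding are silently dropped here.
        for queue in self.bindings.get(routing_key, []):
            queue.put(message)


class Queue:
    """Toy queue that hands messages to its consumers in round-robin order."""

    def __init__(self):
        self.consumers = []  # each consumer is just a list acting as an inbox
        self._next = 0

    def add_consumer(self, inbox):
        self.consumers.append(inbox)

    def put(self, message):
        inbox = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        inbox.append(message)


def build_topology(nodes):
    """Every node consumes from the shared queue and from its own private queue."""
    tasks_exchange = DirectExchange()
    directly_to = DirectExchange()
    inboxes = {node: [] for node in nodes}

    shared = Queue()
    tasks_exchange.bind(shared, "tasks")
    for node in nodes:
        shared.add_consumer(inboxes[node])
        private = Queue()
        private.add_consumer(inboxes[node])
        directly_to.bind(private, node)  # routing key is the node name
    return tasks_exchange, directly_to, inboxes
```

Publishing to the tasks exchange with rkey=tasks spreads messages over all
nodes, while publishing to directly_to with a node name as the routing key
reaches only that node.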

When you publish a task that could benefit from data locality, you consult
the central authority to find out which node is closest to the file(s):

routing_key = node_close_to(file)
basic_publish(message, exchange="directly_to", routing_key=routing_key) 
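
A slightly fuller sketch of the same idea, again in plain Python. The
FILE_LOCATIONS table stands in for the central authority, and route_for and
publish_task are hypothetical helpers; basic_publish is passed in as a
callable rather than taken from any particular client library. Falling back
to the shared "tasks" queue when no node is known is my assumption, not
something the layout requires.

```python
# Hypothetical location table; in practice this would be the central authority.
FILE_LOCATIONS = {
    "/data/a.bin": ["node1.example.com"],
    "/data/b.bin": ["node2.example.com", "node1.example.com"],
}


def node_close_to(file, locations=FILE_LOCATIONS):
    """Return the preferred node for `file`, or None if nothing is known."""
    nodes = locations.get(file)
    return nodes[0] if nodes else None


def route_for(file, locations=FILE_LOCATIONS):
    """Pick (exchange, routing_key) for a task that reads `file`."""
    node = node_close_to(file, locations)
    if node is None:
        # Assumption: with no locality information, use the shared queue.
        return "tasks", "tasks"
    return "directly_to", node


def publish_task(basic_publish, message, file):
    """Publish `message` to the node nearest `file` via the given callable."""
    exchange, routing_key = route_for(file)
    basic_publish(message, exchange=exchange, routing_key=routing_key)
```

Keeping the routing decision in route_for, separate from the actual publish
call, makes it easy to swap the lookup for a real service later.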

This doesn't take failing nodes into account, but maybe you could use
the immediate flag for this, so that a message is returned to the publisher
if no consumer can take it right away.

As the maintainer of Celery (http://celeryproject.org), I'm very interested
in what you end up with here.

-- 
{Ask Solem,
 +47 98435213 | twitter.com/asksol }.