[rabbitmq-discuss] Node scheduling with RabbitMQ
Ask Solem
askh at opera.com
Sat Aug 7 10:04:35 BST 2010
On Aug 7, 2010, at 10:43 AM, Ask Solem wrote:
>>
>
> cache-aware? You mean data locality?
>
> I've been contemplating this as well.
> I think this could be solved by having a proxy in front of RabbitMQ, that routes
> the messages to the correct hosts (using routing keys).
>
> E.g. if each worker is also consuming from a private queue, this proxy could
> just reroute the message so the task is received by a single worker.
On second thought, it doesn't need to be a proxy at all,
just some service that knows which worker(s) are near the data.
node1.example.com consumes from
    queue: tasks              exchange: tasks        type: direct  rkey=tasks
    queue: node1.example.com  exchange: directly_to  type: direct  rkey=node1.example.com
node2.example.com consumes from
    queue: tasks              exchange: tasks        type: direct  rkey=tasks
    queue: node2.example.com  exchange: directly_to  type: direct  rkey=node2.example.com
These workers will then receive tasks from the shared "tasks" queue in round-robin fashion,
and it's also possible to send a task directly to one or more specific workers.
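
As a rough sketch, the setup above could be declared like this in Python. It
assumes a pika-style channel object with exchange_declare, queue_declare and
queue_bind methods; the function name declare_worker_topology is made up for
illustration, and the exact keyword names (e.g. exchange_type) may differ
between client library versions.

```python
def declare_worker_topology(channel, hostname):
    """Declare the shared and the private queue for one worker node.

    `channel` is assumed to be a pika-style channel (hypothetical here);
    any object exposing the same three methods will do.
    """
    # Shared work queue: every worker consumes from the same "tasks" queue,
    # so the broker spreads its messages across the consumers.
    channel.exchange_declare(exchange="tasks", exchange_type="direct")
    channel.queue_declare(queue="tasks")
    channel.queue_bind(queue="tasks", exchange="tasks", routing_key="tasks")

    # Private queue: named after the host and bound to "directly_to" with
    # the hostname as routing key, so only this worker receives from it.
    channel.exchange_declare(exchange="directly_to", exchange_type="direct")
    channel.queue_declare(queue=hostname)
    channel.queue_bind(queue=hostname, exchange="directly_to",
                       routing_key=hostname)

    # The queues this worker should consume from.
    return ["tasks", hostname]
```

Each worker would call this once with its own hostname and then start
consuming from both returned queues.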
When you publish a task that could benefit from optimal data locality, you consult
the central authority to find out which node is closest to the file(s):
    routing_key = node_close_to(file)
    basic_publish(message, exchange="directly_to", routing_key=routing_key)
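
Spelled out a bit more, the publishing side might look like the sketch below.
The channel is again assumed to be pika-style (basic_publish with exchange,
routing_key and body keywords), publish_task is a made-up name, and
node_close_to stands in for whatever lookup the central authority offers.
Falling back to the shared "tasks" queue when no node is known is my own
assumption, not something the scheme requires.

```python
def publish_task(channel, message, file, node_close_to):
    """Publish a task near its data if a suitable node is known.

    `node_close_to` is a hypothetical callable mapping a file to a worker
    hostname, or to None when the central authority has no answer.
    """
    node = node_close_to(file)
    if node is not None:
        # Send straight to that worker's private queue.
        exchange, routing_key = "directly_to", node
    else:
        # Assumption: with no locality hint, use the shared queue.
        exchange, routing_key = "tasks", "tasks"

    channel.basic_publish(exchange=exchange, routing_key=routing_key,
                          body=message)
    return exchange, routing_key
```

With a plain dict as the "central authority", e.g.
locations = {"/data/a.bin": "node2.example.com"}, you could pass locations.get
as node_close_to.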
This doesn't take failing nodes into account, but maybe you could use
the immediate flag for that.
As the maintainer of Celery (http://celeryproject.org), I'm very interested
in what you end up with here.
--
{Ask Solem,
+47 98435213 | twitter.com/asksol }.