[rabbitmq-discuss] performance with thousands of auto_delete queues
Ask Solem
ask at rabbitmq.com
Thu Nov 17 16:20:18 GMT 2011
On 17 Nov 2011, at 14:42, Muharem Hrnjadovic wrote:
>>
> We do need the results.
>
>> You can do so by upgrading to the latest version (where results are
>> disabled by default), or setting @task(ignore_result=True) for
>> individual tasks, or CELERY_IGNORE_RESULT=True globally.
>
>> An auto_delete queue is only deleted when it's empty,
>> so you have to collect the results.
> How does one collect the results? We do
>
> result = TaskSet(tasks=subtasks).apply_async()
>
> # Wait for all subtasks to complete.
> while not result.ready():
> time.sleep(0.25)
>
> the_results = result.join()
>
> Is there something we need to do beyond that?
>
This would collect the result, but maybe there are cases where it's not collected,
or you have so many tasks that using one queue per task is not feasible.
If the process to publish the task, and to collect the result is always the same,
you can use reply-to style replies (one queue per publisher, instead of
one queue per task). There's no built-in support for this in Celery, but adding
the capability to your task should be fairly easy.
Of course if there are as many publishers as there are tasks, then this doesn't help much.
The best thing you can do right now is to set an expiry
time for the results, that would probably help at least in the short term.
Also you could consider using a database, Redis or memcached to store the
results in. The downside then is that you have to use polling to retrieve
the results in the way you are doing (join).
Note also that you should never wait for the results of a subtask within a task:
@task
def X():
r = TaskSet(Y.subtask((i, ))
for i in iterable).apply_async()
r.join() # <-- VERY BAD
This is bad because it can result in a deadlock;
Imagine that there are only 5 worker process available,
and that there are 5 X tasks currently running. In this scenario
there are no more worker processes to finish the subtasks X
is waiting for: it deadlocks and waits forever.
See http://docs.celeryproject.org/en/latest/userguide/tasks.html#avoid-launching-synchronous-subtasks
Instead you should use callbacks for single task invocations and chords
for groups of tasks that needs to be synchronized (TaskSet callbacks).
http://docs.celeryproject.org/en/latest/userguide/tasksets.html
The Celery master branch in git contains support for chords when using
the AMQP result backend too (will be part of 2.5.0).
>
More information about the rabbitmq-discuss
mailing list