[rabbitmq-discuss] last sticky wicket on map/reduce

Alexis Richardson alexis at rabbitmq.com
Fri Oct 1 14:51:42 BST 2010


Why can't you use a checksum instead? Each time you create a set of n
subtasks from some task T, attach a fraction m/n to each subtask, where m is
the fraction attached to T. Start with m equal to 1. The fractions of all
outstanding subtasks will always sum to 1, so the job is finished once the
fractions on the results you have collected add up to 1. No need for shared
counters...
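
A minimal sketch of the fraction idea, in Python, with nothing RabbitMQ-specific
in it. The names split and Collector are made up for illustration; in practice
the fraction would travel as a message header or field, and a reducer that
merges several messages would forward the sum of their fractions. Exact
rational arithmetic (fractions.Fraction) is assumed so that rounding never
keeps the total from reaching 1.

```python
from fractions import Fraction


def split(weight, n):
    """Divide a task's weight evenly among n subtasks (hypothetical helper)."""
    return [weight / n for _ in range(n)]


class Collector:
    """Adds up the weights of finished work; done when the total reaches 1."""

    def __init__(self):
        self.total = Fraction(0)

    def receive(self, weight):
        self.total += weight
        return self.total == 1


# The root task starts with weight 1 and fans out into 3 subtasks.
root = Fraction(1)
map_tasks = split(root, 3)

# The second subtask fans out again into 4; the other two finish as they are.
leaves = [map_tasks[0]] + split(map_tasks[1], 4) + [map_tasks[2]]

collector = Collector()
done_flags = [collector.receive(w) for w in leaves]
print(done_flags)  # only the last entry is True
```

Only the collector keeps any state, and it only ever adds, so the workers
never touch a shared counter.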

On Oct 1, 2010 3:35 PM, "Jon Brisbin" <jon.brisbin at npcinternational.com>
wrote:

I'm also wondering if anyone uses counts to determine when a job is finished
or not. By that I mean, increment a counter for every outgoing message and
decrement the counter when a response is received. In the case of a
map/reduce job, I'd need to do something like:

SQL -> Map phase = +1 (per row)
Map phase -> Reduce phase = -1 (that we got the original msg), +1 * (num of emits)
Reduce phase -> Response|ReReduce = -1 (per emit), +1 (for response/rereduce)
[ReReduce -> Response] = -1, +1 (for sending response)
Response = -1

Essentially, each step would decrement a counter for the incoming message
and increment the counter for the outgoing message. A reduce phase might
decrement the counter 1000 times and increment it once. But since the map
phase incremented it 1000 times prior, the count after map/reduce would be
"1". The response listener would then decrement the counter when it
processed the response, see that it's now zero, and know to continue.
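
As a sanity check on that bookkeeping, here is a minimal single-process sketch
in Python. PendingCounter and adjust are hypothetical names, and the
threading.Lock only stands in for whatever a distributed store would provide;
this is not ZooKeeper code. It assumes 1000 rows and one emit per map call.
One assumption worth calling out: each step applies its decrement and
increment as a single net adjustment, so the count cannot dip to zero between
the two halves of a step.

```python
import threading


class PendingCounter:
    """In-process stand-in for a shared counter of in-flight messages."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def adjust(self, consumed, produced):
        """Apply -consumed and +produced as one atomic step; return the new count."""
        with self._lock:
            self._value += produced - consumed
            return self._value


rows = 1000  # assumed number of rows returned by the SQL query
counter = PendingCounter()

after_sql = counter.adjust(0, rows)        # SQL -> Map: +1 per row

after_map = after_sql
for _ in range(rows):
    after_map = counter.adjust(1, 1)       # each map consumes 1 msg, emits 1

after_reduce = counter.adjust(rows, 1)     # reduce consumes 1000, sends 1 response
after_response = counter.adjust(1, 0)      # response listener consumes the last msg

print(after_sql, after_map, after_reduce, after_response)
```

The count reads "1" after the reduce step and hits zero only when the response
listener has processed the final message.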

If my goal is to beat processing times on the AS/400 when doing large
financial calculations (daily acct'g reports take several hours to
generate), I can't really depend on timeouts to make sure I've gathered all
my results. I want the job to return as soon as results are ready. I'd like
to go to management and show them a 2 hr -> 15 min improvement by using
parallel processing.

I'm just wondering if using ZooKeeper or similar to do distributed,
synchronized counters will have enough atomicity to not miss a count
incr/decr. If I miss even one, I'm screwed because it'll never get back to
zero (or get there prematurely).

I need a sentence with a question mark or this will definitely go
unanswered: are message counters like this a good way to monitor
asynchronous, distributed processing state?

Thanks! :)

Jon Brisbin Portal Webmaster NPC International, Inc.

On Oct 1, 2010, at 8:11 AM, Jon Brisbin wrote:
> I had not really looked at the spring integration ...
