[rabbitmq-discuss] last sticky wicket on map/reduce
jon.brisbin at npcinternational.com
Thu Sep 30 21:41:51 BST 2010
I've got a pest of a sticky wicket in my map/reduce implementation that's using Groovy for the logic and RabbitMQ for the plumbing. It's frustrating because I'm so close.
The problem I'm having is knowing when I'm finished. Using data like this:
The "END" goes through a separate consumer thread because it goes out on a fanout exchange (it has to go to all workers), so it comes in out-of-order from the other data:
1 2 3
4 5 6
7 8 9
I can sort of work around this by keeping track of id changes in my consumers using the classic "if this.id != last.id" approach. But the last record is a tricky one because there's no key change event to trigger sending the response back. Unless I simply wait until a timeout has occurred, I'm not sure how I can tell when I've collected all the responses I'm going to get.
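To make the last-record problem concrete, here is a minimal sketch of the key-change approach, written in Python rather than Groovy for brevity. The class and method names (KeyChangeGrouper, accept, flush) are hypothetical and not taken from my actual code. It shows that a group is only released when the next id arrives, so the final group sits in the buffer until something calls flush() explicitly.

```python
class KeyChangeGrouper:
    """Buffers values per id and releases a group when the id changes."""

    def __init__(self, on_group):
        self.on_group = on_group  # callback(id, values), hypothetical hook
        self.last_id = None
        self.values = []

    def accept(self, record_id, value):
        # Classic "if this.id != last.id" check: a new id closes the old group.
        if self.last_id is not None and record_id != self.last_id:
            self.flush()
        self.last_id = record_id
        self.values.append(value)

    def flush(self):
        # Must be triggered from outside (END marker, timeout, ...) for the
        # final group, because no later id will ever arrive to close it.
        if self.values:
            self.on_group(self.last_id, self.values)
        self.values = []
        self.last_id = None


results = []
grouper = KeyChangeGrouper(lambda key, vals: results.append((key, list(vals))))
for record_id, value in [("a", 1), ("a", 2), ("b", 3)]:
    grouper.accept(record_id, value)

# Only "a" has been released so far; "b" is still buffered.
released_before_flush = list(results)
grouper.flush()
```

Whatever triggers that final flush() is exactly the piece I don't have a reliable signal for.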
The problem is that I only know how many messages I've sent, not how many to expect in return. emit() can be called multiple times from a map phase, and the reduce phase can take (records per key) * (emitted) and either rereduce the result or reply back to the requestor. The requestor shouldn't know whether the result has been rereduced or not. It should simply process the return values.
What am I missing to handle situations like this? Should I introduce another component to this that keeps track of how many messages are sent and received? Maybe put a ZooKeeper install in somewhere and coordinate all this? I've already got Riak integrated, though I'd think ZooKeeper would be better at managing concurrent updates.
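For illustration, the "another component" idea might look roughly like the sketch below: a counter of messages still in flight, where every worker reports one consumed message together with the number of messages it emitted in response. This is an in-process Python sketch only. CompletionTracker and its methods are hypothetical names, and it assumes each report is recorded before any downstream worker reports on the emitted messages; with reports arriving out of order, the counter could touch zero too early. Where the counter would live (ZooKeeper, Riak, or elsewhere) is left open.

```python
class CompletionTracker:
    """Hypothetical coordinator that counts messages still in flight."""

    def __init__(self):
        self.in_flight = 0
        self.input_closed = False

    def sent(self, count=1):
        # The requestor published `count` input messages.
        self.in_flight += count

    def processed(self, emitted=0):
        # A worker consumed one message and emitted `emitted` new ones.
        self.in_flight += emitted - 1

    def close_input(self):
        # The requestor has nothing more to send (the "END" case).
        self.input_closed = True

    def is_done(self):
        return self.input_closed and self.in_flight == 0


tracker = CompletionTracker()
tracker.sent(3)            # three input records
tracker.close_input()

for _ in range(3):         # each map call emits two messages
    tracker.processed(emitted=2)
done_after_map = tracker.is_done()

for _ in range(6):         # each reduce call consumes one and emits nothing
    tracker.processed(emitted=0)
done_after_reduce = tracker.is_done()
```

With something like this the requestor would not need to know how many replies to expect, or whether a rereduce happened; it would only ask whether anything is still in flight.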
Any help or suggestions here would be greatly appreciated! :)
NPC International, Inc.