[rabbitmq-discuss] last sticky wicket on map/reduce

Jon Brisbin jon.brisbin at npcinternational.com
Fri Oct 1 14:11:44 BST 2010


I had not really looked at the Spring Integration stuff for a solution. It looks interesting, though.

Thanks for the link...

Jon Brisbin
Portal Webmaster
NPC International, Inc.



On Sep 30, 2010, at 4:19 PM, Shane Witbeck wrote:

> Have you thought about using an Aggregator? Spring Integration offers this: 
> 
> http://static.springsource.org/spring-integration/reference/htmlsingle/spring-integration-reference.html#aggregator
> 
> I think Apache Camel offers this too. Both might be overkill in your case but maybe a look at how they're doing it will help.
> 
> HTH,
> Shane
> 
> 
> On Thu, Sep 30, 2010 at 4:41 PM, Jon Brisbin <jon.brisbin at npcinternational.com> wrote:
> I've got a pest of a sticky wicket in my map/reduce implementation that's using Groovy for the logic and RabbitMQ for the plumbing. It's frustrating because I'm so close.
> 
> The problem I'm having is knowing when I'm finished. Using data like this:
> 
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> END
> 
> The "END" goes through a separate consumer thread because it goes out on a fanout exchange (it has to go to all workers), so it comes in out-of-order from the other data:
> 
> 1    2    3
>           END
> 4    5    6
> END  END  
> 7    8    9
> ...etc...
> 
> I can sort of work around this by keeping track of id changes in my consumers using the classic "if this.id != last.id" approach. But the last record is a tricky one because there's no key change event to trigger sending the response back. Unless I simply wait until a timeout has occurred, I'm not sure how I can tell when I've collected all the responses I'm going to get.
> 
> The problem is that I only know how many messages I've sent, not how many to expect in return. emit() can be called multiple times from a map phase, so the reduce phase can take (records per key) * (emitted) values and then either rereduce the result or reply back to the requestor. The requestor shouldn't know whether the result has been rereduced or not; it should simply process the return values.
> 
> What am I missing to handle situations like this? Should I introduce another component to this that keeps track of how many messages are sent and received? Maybe put a ZooKeeper install in somewhere and coordinate all this? I've already got Riak integrated, though I'd think ZooKeeper would be better at managing concurrent updates.
> 
> Any help or suggestions here would be greatly appreciated! :)
> 
> Jon Brisbin
> Portal Webmaster
> NPC International, Inc.
> 
> 
> 
> 
> _______________________________________________
> rabbitmq-discuss mailing list
> rabbitmq-discuss at lists.rabbitmq.com
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
> 
> 
> 
> 
> -- 
> -Shane
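
The "if this.id != last.id" workaround described above is the classic control-break pattern. The sketch below is in Python for illustration only (the thread's actual logic is in Groovy and is not shown). It assumes each worker sees its records sorted by key and receives the END marker on the same stream, after its own data. Under those assumptions it shows why the last group needs an explicit flush: no later key change will ever arrive to trigger it. The function `reduce_groups` and the in-band `END` sentinel are hypothetical names.

```python
from typing import Hashable, Iterable, Iterator, List, Tuple

# Hypothetical in-band end-of-stream marker. Assumption: it is consumed from
# the same stream as the data, after all of this worker's records.
END = "END"


def reduce_groups(stream: Iterable[object]) -> Iterator[Tuple[Hashable, List[object]]]:
    """Control-break grouping over (key, value) pairs already sorted by key.

    A group is emitted when the key changes. The final group is emitted when
    END (or the end of the iterable) is seen, because no later key change
    will ever arrive to trigger it.
    """
    current_key: Hashable = None
    values: List[object] = []
    have_group = False
    for item in stream:
        if item == END:
            break  # anything after END is ignored
        k, v = item
        if have_group and k != current_key:
            yield current_key, values  # key change: previous group is complete
            values = []
        current_key = k
        values.append(v)
        have_group = True
    if have_group:
        yield current_key, values  # flush the last group explicitly


print(list(reduce_groups([("a", 1), ("a", 2), ("b", 3), END])))
# [('a', [1, 2]), ('b', [3])]
```

The flush only fires at the right moment if END is ordered behind the data in the same consumer. With END arriving on a separate fanout consumer thread, as in the interleaving diagram above, that ordering is exactly what is lost.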

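One way to answer "how many responses should I expect?" is to have workers report counts, which is also the idea behind an aggregator releasing a message group once it knows the group's size. The following is a hand-rolled sketch, not Spring Integration's or Camel's API. It assumes (hypothetically) that each input message goes to exactly one worker, and that the worker replying to the requestor also publishes a receipt per input stating how many results it emitted for that input, including zero. The requestor already knows how many inputs it sent, so it is done once it holds a receipt for every input and the declared number of results for each. `CompletionTracker`, `on_result` and `on_receipt` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CompletionTracker:
    """Hypothetical requestor-side bookkeeping for one map/reduce job."""

    inputs_sent: int  # the one number the requestor knows up front
    expected: Dict[int, int] = field(default_factory=dict)  # input_id -> declared result count
    received: Dict[int, List[object]] = field(default_factory=dict)  # input_id -> results so far

    def on_result(self, input_id: int, value: object) -> None:
        # A result may arrive before or after its receipt; just record it.
        self.received.setdefault(input_id, []).append(value)

    def on_receipt(self, input_id: int, emitted: int) -> None:
        # The worker declares how many results it emitted for this input.
        self.expected[input_id] = emitted

    def is_complete(self) -> bool:
        if len(self.expected) < self.inputs_sent:
            return False  # some input has not been reported on yet
        return all(len(self.received.get(i, [])) == n for i, n in self.expected.items())

    def results(self) -> List[object]:
        return [v for vs in self.received.values() for v in vs]


t = CompletionTracker(inputs_sent=2)
t.on_result(1, "x")          # a result can beat its receipt
t.on_receipt(1, emitted=2)
t.on_receipt(2, emitted=0)   # input 2 produced nothing but still reports
print(t.is_complete())       # False: input 1 declared 2 results, only 1 seen
t.on_result(1, "y")
print(t.is_complete())       # True
```

Results and receipts may arrive in either order, so the tracker simply records both and re-checks. A timeout is still useful as a fallback in case a worker dies before sending its receipt.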

