[rabbitmq-discuss] last sticky wicket on map/reduce
Jon Brisbin
jon.brisbin at npcinternational.com
Fri Oct 1 14:11:44 BST 2010
I had not really looked at Spring Integration for a solution. It looks interesting, though.
Thanks for the link...
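
For reference, the aggregator idea comes down to a correlation key plus a completion condition. Below is a minimal sketch in Python of one possible completion condition for the "when am I finished" problem quoted below. It is not Spring Integration's or Camel's API. It assumes every message carries a unique id and that each worker reports, to a single coordinator, the id it handled and the ids of anything it emitted. The class name `CompletionTracker` and its methods are hypothetical.

```python
class CompletionTracker:
    """Hypothetical coordinator-side helper: decides when a job is finished.

    Assumption: each message has a unique id, and each worker sends a report
    containing the id it handled plus the ids of any messages it emitted.
    """

    def __init__(self, root_ids):
        self.known = set(root_ids)   # ids we know exist (sent or emitted)
        self.handled = set()         # ids some worker has finished with
        self.results = []            # final values collected for the requestor

    def report(self, msg_id, emitted_ids=(), result=None):
        # Reports may arrive in any order, so record the id itself as known
        # even if its parent's report has not been seen yet.
        self.known.add(msg_id)
        self.known.update(emitted_ids)
        self.handled.add(msg_id)
        if result is not None:
            self.results.append(result)

    @property
    def done(self):
        # handled is always a subset of known, so equality means that
        # nothing known is still outstanding.
        return self.known == self.handled


# Usage: one map message "a" emits two reduce messages; reports arrive out of order.
tracker = CompletionTracker(["a"])
tracker.report("a.1", result=3)                  # child report arrives first
tracker.report("a", emitted_ids=["a.1", "a.2"])  # parent report arrives later
tracker.report("a.2", result=4)
print(tracker.done, sorted(tracker.results))
```

Tracking ids rather than a bare sent/received counter keeps the check safe when a child's report overtakes its parent's: the parent is still known but unhandled, so the job is not declared finished early. Where the coordinator state lives (ZooKeeper, Riak, or a dedicated queue consumer) is left open here.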
Jon Brisbin
Portal Webmaster
NPC International, Inc.
On Sep 30, 2010, at 4:19 PM, Shane Witbeck wrote:
> Have you thought about using an Aggregator? Spring Integration offers this:
>
> http://static.springsource.org/spring-integration/reference/htmlsingle/spring-integration-reference.html#aggregator
>
> I think Apache Camel offers this too. Both might be overkill in your case but maybe a look at how they're doing it will help.
>
> HTH,
> Shane
>
>
> On Thu, Sep 30, 2010 at 4:41 PM, Jon Brisbin <jon.brisbin at npcinternational.com> wrote:
> I've got a pest of a sticky wicket in my map/reduce implementation that's using Groovy for the logic and RabbitMQ for the plumbing. It's frustrating because I'm so close.
>
> The problem I'm having is knowing when I'm finished. Using data like this:
>
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> END
>
> The "END" goes through a separate consumer thread because it goes out on a fanout exchange (it has to go to all workers), so it comes in out-of-order from the other data:
>
> 1 2 3
> END
> 4 5 6
> END END
> 7 8 9
> ...etc...
>
> I can sort of work around this by keeping track of id changes in my consumers using the classic "if this.id != last.id" approach. But the last record is a tricky one because there's no key change event to trigger sending the response back. Unless I simply wait until a timeout has occurred, I'm not sure how I can tell when I've collected all the responses I'm going to get.
>
> The problem is that I only know how many messages I've sent and not how many to expect in return. emit() can be called multiple times from a map phase and the reduce phase can take (records per key) * (emitted) and either rereduce the result or reply back to the requestor. The requestor shouldn't know whether the result has been rereduced or not. It should simply process the return values.
>
> What am I missing to handle situations like this? Should I introduce another component to this that keeps track of how many messages are sent and received? Maybe put a ZooKeeper install in somewhere and coordinate all this? I've already got Riak integrated, though I'd think ZooKeeper would be better at managing concurrent updates.
>
> Any help or suggestions here would be greatly appreciated! :)
>
> Jon Brisbin
> Portal Webmaster
> NPC International, Inc.
>
> --
> -Shane