[datatable-help] Rolling Joins Replicated in Java MapReduce

Michael Smith my.r.help at gmail.com
Wed Dec 3 07:44:11 CET 2014


Maybe it is easier to build what you're looking for by contributing to 
plyrmr:

https://github.com/RevolutionAnalytics/plyrmr

It already implements "plyr for Hadoop" on top of the rmr2 package. I am
not sure whether merging is already implemented, but with rmr2 it should
not be prohibitively difficult (hopefully).
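
If you do end up writing the Java yourself, the core of a rolling join
(roll=TRUE, i.e. last observation carried forward) is a binary search for
the last key that is <= the lookup value. Below is a minimal sketch of that
idea only. It is not a port of bmerge.c, the class and method names are
made up for illustration, and it assumes the keys are already sorted in
ascending order and fit in a long[]:

```java
// Hypothetical illustration of a "roll backward" lookup; not data.table's code.
final class RollingJoinSketch {

    // Returns the index of the last element of sortedKeys that is <= target,
    // or -1 if every key is greater than target (nothing to roll forward).
    static int rollIndex(long[] sortedKeys, long target) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;       // overflow-safe midpoint
            if (sortedKeys[mid] <= target) {
                ans = mid;                   // candidate match, look further right
                lo = mid + 1;
            } else {
                hi = mid - 1;                // key too large, look left
            }
        }
        return ans;
    }

    // For each lookup key, returns the value of the matching row,
    // or null when no row has a key <= the lookup key.
    static Double[] rollJoin(long[] sortedKeys, double[] values, long[] lookups) {
        Double[] out = new Double[lookups.length];
        for (int i = 0; i < lookups.length; i++) {
            int idx = rollIndex(sortedKeys, lookups[i]);
            out[i] = (idx >= 0) ? Double.valueOf(values[idx]) : null;
        }
        return out;
    }
}
```

With keys {10, 20, 30}, a lookup of 25 matches index 1 (key 20) and a
lookup of 5 matches nothing. In a MapReduce job I would guess this search
runs inside the reducer, once per join-key group, after the rows of that
group have been sorted, but that part is an assumption on my side.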

Best,
M


On 12/03/2014 11:47 AM, Mike.Gahan wrote:
> Hello all,
>
> I absolutely love the rolling join capabilities of data.table. It is
> extremely useful for the work I do. However, sometimes I work with data that
> is too large to fit into RAM (even when using a large server). I want to
> implement this rolling join code in a Java Map Reduce setting to be able to
> leverage some of the other resources available at the company I work for.
> Unfortunately I am not an experienced Java programmer. I figured that a
> project like this would provide an excellent incentive to learn this skill.
>
> My question is this: which parts of the current data.table code for rolling
> joins would be most useful to reference when starting this project? I am
> guessing that bmerge.c
> <https://github.com/Rdatatable/data.table/blob/master/src/bmerge.c>
> has much of what I want. Is there any other code in the data.table package I
> should be aware of? Any other advice that might make this process go more
> smoothly? I know the function is based on a modified binary search
> algorithm. Is anyone aware of any libraries that might help with this?
>
> I really appreciate all help.
> Mike

