[datatable-help] Rolling Joins Replicated in Java MapReduce
Michael Smith
my.r.help at gmail.com
Wed Dec 3 07:44:11 CET 2014
Maybe it is easier to build what you're looking for by contributing to
plyrmr:
https://github.com/RevolutionAnalytics/plyrmr
It already implements "plyr for Hadoop" on top of the rmr2 package. I'm not
sure whether merging is already implemented, but using rmr2 it should
not be prohibitively difficult (hopefully).
Best,
M
On 12/03/2014 11:47 AM, Mike.Gahan wrote:
> Hello all,
>
> I absolutely love the rolling join capabilities of data.table. It is
> extremely useful for the work I do. However, sometimes I work with data that
> is too large to fit into RAM (even when using a large server). I want to
> implement this rolling join code in a Java Map Reduce setting to be able to
> leverage some of the other resources available at the company I work for.
> Unfortunately I am not an experienced Java programmer. I figured that a
> project like this would provide an excellent incentive to learn this skill.
>
> My question is this: what current data.table code for rolling joins would be
> most useful to reference in starting this project? I am guessing the
> bmerge.c code
> <https://github.com/Rdatatable/data.table/blob/master/src/bmerge.c> has
> much of what I want. Any other code in the data.table package I should be
> aware of? Any other advice that might make this process go more smoothly? I
> know the function is based on a Modified Binary Search algorithm. Are there
> any libraries anyone is aware of that might help this along?
>
> I really appreciate all help.
> Mike
>
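For anyone reading along, the core lookup behind a rolling join (binary search, then fall back to the previous key when there is no exact match) can be sketched in Java roughly as below. This is only an illustration built on the standard library's Arrays.binarySearch, not a port of bmerge.c. The names RollingJoinSketch and rollLookup are made up for this example, and it assumes a single sorted long key with "last observation carried forward" behaviour.

```java
import java.util.Arrays;

public class RollingJoinSketch {

    // Hypothetical helper: returns the index of the last element in
    // sortedKeys that is <= target, or -1 if every key is greater.
    // Assumes sortedKeys is sorted in ascending order.
    public static int rollLookup(long[] sortedKeys, long target) {
        int pos = Arrays.binarySearch(sortedKeys, target);
        if (pos >= 0) {
            // With duplicate keys, binarySearch may return any matching
            // index, so advance to the last occurrence.
            while (pos + 1 < sortedKeys.length && sortedKeys[pos + 1] == target) {
                pos++;
            }
            return pos;
        }
        // Not found: binarySearch returns -(insertionPoint) - 1.
        int insertionPoint = -pos - 1;
        // The key just before the insertion point is the one to roll forward.
        return insertionPoint - 1;
    }

    public static void main(String[] args) {
        long[] times = {10, 20, 30};          // sorted join keys
        String[] values = {"a", "b", "c"};    // payload for each key
        long[] queries = {5, 10, 25, 40};
        for (long q : queries) {
            int i = rollLookup(times, q);
            System.out.println(q + " -> " + (i < 0 ? "NA" : values[i]));
        }
    }
}
```

In a MapReduce job, a lookup like this would presumably run inside each reducer, over the sorted keys of one join group, but that wiring is an assumption and is not shown here.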