[datatable-help] Rolling Joins Replicated in Java MapReduce

Mike.Gahan michael.gahan at gmail.com
Wed Dec 3 04:47:38 CET 2014


Hello all,

I absolutely love the rolling join capabilities of data.table. It is
extremely useful for the work I do. However, sometimes I work with data that
is too large to fit into RAM (even when using a large server). I would like
to implement the rolling join in Java MapReduce so that I can leverage some
of the other resources available at the company I work for.
Unfortunately I am not an experienced Java programmer. I figured that a
project like this would provide an excellent incentive to learn this skill.

My question is this: which parts of the current data.table code for rolling
joins would be most useful to reference when starting this project? I am
guessing that bmerge.c
<https://github.com/Rdatatable/data.table/blob/master/src/bmerge.c> has much
of what I want. Is there any other code in the data.table package I should
be aware of? Any other advice that might make this process go more smoothly?
I know the rolling join is based on a modified binary search algorithm. Are
there any libraries anyone is aware of that might help this along?
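
To make the goal concrete, here is a minimal Java sketch of the lookup step
as I understand it: for each lookup value, a binary search over sorted keys
that returns the last key less than or equal to that value, which is roughly
what roll=TRUE does (last observation carried forward). This is only an
illustration and not how bmerge.c is actually written. The class name
RollingJoinSketch, the method rollIndex, and the sample data are all
hypothetical, and it assumes a single numeric key already sorted ascending.

```java
public class RollingJoinSketch {

    /**
     * Returns the index of the last element of sortedKeys that is <= target,
     * or -1 if every key is greater than target (no earlier row to roll from).
     * Assumes sortedKeys is sorted in ascending order.
     */
    static int rollIndex(long[] sortedKeys, long target) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;      // unsigned shift avoids int overflow
            if (sortedKeys[mid] <= target) {
                found = mid;                // candidate match; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: sorted keys and the values attached to them.
        long[] keys = {100L, 200L, 300L};
        String[] values = {"a", "b", "c"};
        long[] lookups = {50L, 200L, 250L, 999L};

        for (long t : lookups) {
            int i = rollIndex(keys, t);
            System.out.println(t + " -> " + (i < 0 ? "NA" : values[i]));
        }
    }
}
```

In a MapReduce setting I imagine something like this running inside a
reducer, once both tables have been grouped by the non-rolling join columns
and sorted on the rolling column, but that part is a guess on my side.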

I would really appreciate any help.
Mike
