[datatable-help] understanding joins in data.table

Alexander Chernyakov alexander.chernyakov at gmail.com
Mon Aug 20 17:04:57 CEST 2012


>
> I have posted this on SO as well:
>
> I am trying to understand a little more about the way rolling joins work
> and am having some confusion, I was hoping somebody could clarify this for
> me.  To take a concrete example:
>
>     dt1 <- data.table(id=rep(1:5, 10), t=1:50, val1=1:50, key=list(id, t))
>     dt2 <- data.table(id=rep(1:5, 2), t=1:10, val2=1:10, key=list(id, t))
>
> I expected this to produce a long `data.table` where the values in `dt2`
> are rolled:
>
>     dt1[dt2,roll=TRUE]
>
> Instead, the correct way to do this seems to be:
>
>     dt2[dt1,roll=TRUE]
>
> Could someone explain to me more about how joining in `data.table` works
> as I am clearly not understanding it correctly.  I thought that
> `dt1[dt2,roll=TRUE]` corresponded to the sql equivalent of `select * from
> dt1 right join dt2 on (dt1.id = dt2.id and dt1.t = dt2.t)`, except with
> the added functionality locf.
>
> Additionally the documentation says:
>
>     X[Y] is a join, looking up X's rows using Y (or Y's key if it has one)
>     as an index.
>
> This makes it seem that only rows in X should be returned and the join
> being done is an inner join, not outer.  What about in the case when
> `roll=T` but that particular `id` does not exist in `dt1`? Playing around a
> bit more I can't understand what value is being placed into the column when
> there is only 1 key in dt2.. this might be inappropriate usage.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/datatable-help/attachments/20120820/fc778e8b/attachment.html>


More information about the datatable-help mailing list