[datatable-help] understanding joins in data.table

Matthew Dowle mdowle at mdowle.plus.com
Mon Aug 20 17:39:02 CEST 2012


Sure, have answered on SO :

http://stackoverflow.com/questions/12030932/rolling-joins-data-table-in-r

>>
>> I have posted this on SO as well:
>>
>> I am trying to understand a little more about the way rolling joins work
>> and am having some confusion, I was hoping somebody could clarify this
>> for
>> me.  To take a concrete example:
>>
>>     dt1 <- data.table(id=rep(1:5, 10), t=1:50, val1=1:50, key=list(id,
>> t))
>>     dt2 <- data.table(id=rep(1:5, 2), t=1:10, val2=1:10, key=list(id,
>> t))
>>
>> I expected this to produce a long `data.table` where the values in `dt2`
>> are rolled:
>>
>>     dt1[dt2,roll=TRUE]
>>
>> Instead, the correct way to do this seems to be:
>>
>>     dt2[dt1,roll=TRUE]
>>
>> Could someone explain to me more about how joining in `data.table` works
>> as I am clearly not understanding it correctly.  I thought that
>> `dt1[dt2,roll=TRUE]` corresponded to the sql equivalent of `select *
>> from
>> dt1 right join dt2 on (dt1.id = dt2.id and dt1.t = dt2.t)`, except with
>> the added functionality locf.
>>
>> Additionally the documentation says:
>>
>>     X[Y] is a join, looking up X's rows using Y (or Y's key if it has
>> one)
>>     as an index.
>>
>> This makes it seem that only rows in X should be returned and the join
>> being done is an inner join, not outer.  What about in the case when
>> `roll=T` but that particular `id` does not exist in `dt1`? Playing
>> around a
>> bit more I can't understand what value is being placed into the column
>> when
>> there is only 1 key in dt2.. this might be inappropriate usage.
>>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help




More information about the datatable-help mailing list