[datatable-help] join example from faq

Matthew Dowle mdowle at mdowle.plus.com
Thu Jun 7 00:28:08 CEST 2012


Hi,

Hopefully this paragraph from ?data.table sheds some light :

  "When i is a data.table, x must have a key. i is joined to x using the
key and the rows in x that match are returned. An equi-join is performed
between each column in i to each column in x's key. The match is a
binary search in compiled C in O(log n) time. If i has less columns than
x's key then many rows of x may match to each row of i. If i has more
columns than x's key, the columns of i not involved in the join are
included in the result. If i also has a key, it is i's key columns that
are used to match to x's key columns and a binary merge of the two
tables is carried out."

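The critical sentence is "If i also has a key ..."; i.e., i doesn't have
to be keyed. Only x must have a key. When i has no key, its columns are
matched to x's key columns in order, which is why the first column of Y
is used in your example. The join is often faster if i is keyed too,
though, and there have been some further speed improvements in 1.8.1.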
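
A minimal sketch of the point above, reusing the X and Y from the quoted
question. The names res, Yk and res_keyed are only illustrative, and the
sketch assumes a reasonably recent data.table is installed:

```r
library(data.table)

X <- data.table(grp = c("a","a","b","b","b","c","c"), foo = 1:7)
setkey(X, grp)                              # only X needs a key

# Y is left unkeyed; its first column is unnamed on purpose
Y <- data.table(c("b","c"), bar = c(4, 2))
is.null(key(Y))                             # Y really has no key

# Y's first column is matched to X's key column (grp)
res <- X[Y]
print(res)

# Keying a copy of Y on its first column gives the same matches
Yk <- copy(Y)
setkeyv(Yk, names(Yk)[1])
res_keyed <- X[Yk]
```

Both joins should return the five rows of X whose grp is "b" or "c",
with the matching bar value from Y attached.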

In the example you highlighted I think it goes on to show 'join
inherited scope', which is this paragraph :

  "Advanced: In the X[Y,j] form of grouping, the j expression sees
variables in X first, then Y. We call this join inherited scope. If the
variable is not in X or Y then the calling frame is searched, its
calling frame, and so on in the usual way up to and including the global
environment."
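
A small sketch of join inherited scope, again using X and Y from the
quoted question. The variable k is hypothetical and exists only in the
calling frame. One assumption to flag: in the 1.8.x versions discussed
here, X[Y, j] grouped by each row of Y implicitly, whereas later
data.table releases ask for by = .EACHI to get that grouping, so the
sketch is written for the later form:

```r
library(data.table)

X <- data.table(grp = c("a","a","b","b","b","c","c"), foo = 1:7)
setkey(X, grp)
Y <- data.table(c("b","c"), bar = c(4, 2))

k <- 10   # hypothetical: not a column of X or Y, lives in the calling frame

# Inside j: foo is found in X, bar is found in Y, k is found in the
# calling frame. by = .EACHI evaluates j once per row of Y.
res <- X[Y, sum(foo) * bar * k, by = .EACHI]
print(res)
```

For each row of Y, j sums foo over the matching rows of X and scales
the sum by that row's bar and by k.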

Encouraging to hear you've reduced hours to minutes. At least someone
knows not to use benchmark(...,replications=100) for tasks that take
under 0.01 seconds and then conclude data.table is slow!

Matthew

On Wed, 2012-06-06 at 13:47 -0400, Juliet Hannah wrote:
> All,
> 
> I am not understanding a few basic things. I am looking at pg 5 of the faq.
> 
> X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
> setkey(X,grp)
>  Y = data.table(c("b","c"), bar=c(4,2))
>  X[Y]
> 
> The faq says X[Y] is a join looking up X's rows using Y.
> 
> Does this mean data.table looks up X's key using Y?
> 
> Y has two columns. How does it know to use the first column in this
> example? Y's key has not been set.
> 
> Hope my question is not too obvious. :)
> 
> Thanks,
> 
> Juliet
> 
> P.S. Thanks for the wonderful package.  I had to do some aggregations
> the other day and my other solutions were
> running for hours, but data.table finished in a couple of minutes!



