[datatable-help] datatable-help Digest, Vol 28, Issue 2

Yike Lu yikelu.home at gmail.com
Thu Jun 7 16:44:45 CEST 2012


Matt,

Two things
1) I noticed that in Juliet's example, Y's first column is not named at 
all, so defaults to a name of "V1", yet the join carries through. How 
does the underlying implementation work here? I believe this was the 
original question, and I'm curious as well now.
2) Having re-read the paragraph you sent, I now realize that X[Y] is 
equivalent to X right join Y. It may be helpful to explicitly write this 
in. I certainly had this flipped (I thought it was X left join Y), and 
the column ordering is flipped from standard SQL.

On 6/7/2012 6:00 AM, datatable-help-request at lists.r-forge.r-project.org 
wrote:
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 6 Jun 2012 13:47:00 -0400
> From: Juliet Hannah<juliet.hannah at gmail.com>
> To: datatable-help at lists.r-forge.r-project.org
> Subject: [datatable-help] join example from faq
> Message-ID:
> 	<CALzuZRQw2QaR=uAfPbmt_R8qnZEhLF7-=_zsoXAhXsUhcdCzBw at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> All,
>
> I am not understanding a few basic things. I am looking at pg 5 of the faq.
>
> X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
> setkey(X,grp)
>   Y = data.table(c("b","c"), bar=c(4,2))
>   X[Y]
>
> The faq says X[Y] is a join looking up X's rows using Y.
>
> Does this mean data.table looks up X's key using Y?
>
> Y has two columns. How does it know to use the first column in this
> example? Y's key has not been set.
>
> Hope my question is not too obvious. :)
>
> Thanks,
>
> Juliet
>
> P.S. Thanks for the wonderful package.  I had to do some aggregations
> the other day and my other solutions were
> running for hours, but data.table finished in a couple of minutes!
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 06 Jun 2012 23:28:08 +0100
> From: Matthew Dowle<mdowle at mdowle.plus.com>
> To: Juliet Hannah<juliet.hannah at gmail.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] join example from faq
> Message-ID:<1339021688.2573.16.camel at netbook>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi,
>
> Hopefully this paragraph from ?data.table sheds some light :
>
>    "When i is a data.table, x must have a key. i is joined to x using the
> key and the rows in x that match are returned. An equi-join is performed
> between each column in i to each column in x's key. The match is a
> binary search in compiled C in O(log n) time. If i has less columns than
> x's key then many rows of x may match to each row of i. If i has more
> columns than x's key, the columns of i not involved in the join are
> included in the result. If i also has a key, it is i's key columns that
> are used to match to x's key columns and a binary merge of the two
> tables is carried out."
>
> The critical sentence is "If i also has a key ..."; i.e., i doesn't have
> to be keyed. Only x must have a key. It's often faster if i is keyed
> too, though. There have been some speed improvements in 1.8.1, too.
>
> In the example you highlighted I think it goes on to show 'join
> inherited scope', which is this paragraph :
>
>    "Advanced: In the X[Y,j] form of grouping, the j expression sees
> variables in X first, then Y. We call this join inherited scope. If the
> variable is not in X or Y then the calling frame is searched, its
> calling frame, and so on in the usual way up to and including the global
> environment."
>
> Encouraging to hear you've reduced hours to minutes. At least someone
> knows not to use benchmark(...,replications=100) for tasks that take
> under 0.01 seconds and then conclude data.table is slow!
>
> Matthew
>
> On Wed, 2012-06-06 at 13:47 -0400, Juliet Hannah wrote:
>> All,
>>
>> I am not understanding a few basic things. I am looking at pg 5 of the faq.
>>
>> X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
>> setkey(X,grp)
>>   Y = data.table(c("b","c"), bar=c(4,2))
>>   X[Y]
>>
>> The faq says X[Y] is a join looking up X's rows using Y.
>>
>> Does this mean data.table looks up X's key using Y?
>>
>> Y has two columns. How does it know to use the first column in this
>> example? Y's key has not been set.
>>
>> Hope my question is not too obvious. :)
>>
>> Thanks,
>>
>> Juliet
>>
>> P.S. Thanks for the wonderful package.  I had to do some aggregations
>> the other day and my other solutions were
>> running for hours, but data.table finished in a couple of minutes!
>
>
>
> ------------------------------
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
> End of datatable-help Digest, Vol 28, Issue 2
> *********************************************


More information about the datatable-help mailing list