[datatable-help] Revisiting scoping rules in "j" (reviving Gabor's post)

Arunkumar Srinivasan aragorn168b at gmail.com
Mon Nov 11 14:55:27 CET 2013


Eddi,  

Thank you. However, I've realised something and made a slight change to the concept (at least I think that's the way to go).

Basically, if you have:

require(data.table)
d1 <- data.table(id1=c(1L, 2L, 2L, 3L), val=1:4, key="id1")

and you do:

d1[, print(id1), by=id1]
[1] 1
[1] 2
[1] 3
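
A quick way to check the same thing without print() is to look at the length of id1 inside each group next to the group's row count. This is only a minimal sketch; the column names "len" and "n" are made up for illustration:

```r
library(data.table)

d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")

# Within each group, id1 is the (length-1) group value, while .N counts
# the rows that fall in that group.
res <- d1[, list(len = length(id1), n = .N), by = id1]
print(res)
```

Here "len" should be 1 for every group, while "n" is 2 for id1 = 2.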


That is, while grouping with "by", the grouping variable has length 1 within every group. For id1 = 2, we don't get "2" two times. Going by the same logic, if we were to do:

d1[J(2), id1]
   id1 id1
1:   2   2


Here, the first "id1" is the grouping "id1" (from J(2)). Following FR #2693 from mnel, I've changed the names of J(.), when it has no names, to match the key columns of "d1". The second "id1" is d1's value of "id1" for the rows matching "id1 = 2". Even though that value is present 2 times in d1, we print it only once. That is, the result will be identical to d1[, id1, by=id1], even though d1's "id1" is *not really* the grouping variable.

If we have to refer to i's columns, then we have to explicitly state "i.id1". That is, here, it would be:

d1[J(2), i.id1] # identical results, but i.id1 corresponds to data.table from J(2) with column name = id1

To illustrate the difference nicely, let's consider these data.tables:
d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")  
d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14),key = "id2")  
d3 <- copy(d2)
setnames(d3, names(d1))

d1[d2, list(id1)] # what Gabor's post highlighted should work (but it doesn't give 1,2,2,NA as pointed out in the earlier post)
   id1 id1
1:   1   1
2:   2   2
3:   4  NA


d1[d3, list(id1, i.id1)] # id1 refers to values from d1 and i.id1 to d3.
   id1 id1 i.id1
1:   1   1     1
2:   2   2     2
3:   4  NA     4


Note that for every (implicit) grouping value from d3, the only possible values for d1's grouping would be 1) identical to that of d3 or 2) NA.
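
The note above can be checked outside of data.table with plain vectors. This is only a sketch of the idea, not how the join is implemented; "x_id", "i_id" and "x_match" are hypothetical stand-ins for the key columns:

```r
# Hypothetical stand-ins for the key columns of d1 and d3.
x_id <- c(1L, 2L, 2L, 3L)  # plays the role of d1$id1
i_id <- c(1L, 2L, 4L)      # plays the role of d3$id1

# For each value in i, take the first matching value in x (NA if none).
x_match <- x_id[match(i_id, x_id)]
print(x_match)

# Every non-NA match equals the i value it was looked up for.
stopifnot(all(is.na(x_match) | x_match == i_id))
```

For these inputs the looked-up values are 1, 2 and NA, which is the second "id1" column shown above.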

Let me know what you guys think.  

Arun


On Monday, November 11, 2013 at 2:45 PM, Eduard Antonyan wrote:

> I haven't checked yet what it does currently but what you wrote makes perfect sense.  
> On Nov 10, 2013 5:44 AM, "Arunkumar Srinivasan" <aragorn168b at gmail.com> wrote:
> > Hi everyone,  
> >  
> > To revive the discussion Gabor started here: http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html and the (related, but subtly different) FR mnel filed here: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978  
> >  
> > Suppose you have:
> >  
> > require(data.table)  
> > d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")  
> > d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14),key = "id2")
> >  
> > Then as Gabor points out: `d1[d2, id1]`  should *not* result in an error, because FAQ 2.8 states (copied from Gabor's post linked above):
> >  
> > 1. The scope of X's subset; i.e., X's column names.  
> > 2. The scope of each row of Y; i.e., Y's column names (join inherited scope)  
> > …
> >  
> > In this case, the desired output for `d1[d2, id1]` should then be:
> >    id1 id1
> > 1:   1   1
> > 2:   2   2
> > 3:   2   2
> > 4:   4  NA
> >  
> >  
> > That, at least, is what I understand the documentation to intend.  
> >  
> > However, this calls for a subtle change to the current method of referring to columns, if we were to keep this idea. That is, consider the data.table "d3" as follows:  
> >  
> > d3 <- copy(d2)
> > setnames(d3, names(d1))
> >  
> > Now, what should `d1[d3, id1]` give? The answer, I believe, is the same as `d1[d2, id1]`. Why? Because X's (here d1's) column names should be looked up first (as per FAQ 2.8). Therefore, corresponding to the join values c(1, 2, 4), the values for "id1" are c(1, (2,2), NA). Now, if the old behaviour is intended (here comes the "subtle change"), then one should do:  
> >  
> > d1[d3, i.id1] # referring to i's variables with the "i." notation.
> >  
> > I've managed to implement the first part where X's columns are looked up so that `d1[d2, id1]` doesn't result in error. However, I'd like to ensure that my understanding of the FAQ is right (and that the FAQ makes sense - it does to me).  
> >  
> > Please let me know what you all think so that I can implement the second part and commit. This, I believe, will get us a step closer to consistency in DT syntax.
> >  
> > Arun  
> >  
> >  
