[datatable-help] Revisiting scoping rules in "j" (reviving Gabor's post)

Eduard Antonyan eduard.antonyan at gmail.com
Mon Nov 11 16:53:05 CET 2013


Everything looks good to me. Note that there is also .BY[[1]], which one
might also want to use in those examples (it is basically the same as
i.id1).
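
For reference, a minimal sketch of what .BY holds in an ordinary by= grouping (not the join case discussed below). The column names by_val and n are made up for illustration:

```r
library(data.table)

d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")

# .BY is a list with one length-1 element per grouping variable;
# .BY[[1]] is the current group's value of id1, .N is the group's row count.
res <- d1[, list(by_val = .BY[[1]], n = .N), by = id1]
print(res)
```

by_val has a single value per group even for id1 = 2, which covers two rows of d1.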


On Mon, Nov 11, 2013 at 7:55 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

>  Eddi,
>
> Thank you. However, I've realised something and made a slight change to
> the concept (at least I think that's the way to go).
>
> Basically, if you have:
>
> require(data.table)
> d1 <- data.table(id1=c(1L, 2L, 2L, 3L), val=1:4, key="id1")
>
> and you do:
>
> d1[, print(id1), by=id1]
> [1] 1
> [1] 2
> [1] 3
>
> That is, while grouping, the length of each grouping variable remains 1
> for every group (when grouping using "by"). For id1=2, we don't get "2"
> two times. Going by the same logic, if we were to do:
>
> d1[J(2), id1]
>    id1 id1
> 1:   2   2
>
> Here, the first "id1" is the grouping "id1" (from J(2)). Following FR
> #2693 from mnel, I've changed the names of J(.), when it has no names, to
> match those of the key columns of "d1". The second "id1" is the value of
> d1's "id1" for "id1=2". And even though it's present 2
> times, we print it only once. That is, it'll be identical to d1[, id1,
> by=id1], even though d1's "id1" is *not really* the grouping variable.
>
> If we have to refer to i's columns, then we have to explicitly state
> "i.id1". That is, here, it would be:
>
> d1[J(2), i.id1] # identical results, but i.id1 corresponds to the data.table
>                 # from J(2) with column name = id1
>
> To illustrate the difference nicely, let's consider these data.tables:
> d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
> d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14),key = "id2")
> d3 <- copy(d2)
> setnames(d3, names(d1))
>
> d1[d2, list(id1)] # what Gabor's post highlighted should work (but it
>                   # doesn't give 1,2,2,NA as pointed out in the earlier post)
>    id1 id1
> 1:   1   1
> 2:   2   2
> 3:   4  NA
>
> d1[d3, list(id1, i.id1)] # id1 refers to values from d1 and i.id1 to d3.
>    id1 id1 i.id1
> 1:   1   1     1
> 2:   2   2     2
> 3:   4  NA     4
>
> Note that for every (implicit) grouping value from d3, the only possible
> values for d1's grouping would be 1) identical to that of d3 or 2) NA.
>
> Let me know what you guys think.
>
> Arun
>
> On Monday, November 11, 2013 at 2:45 PM, Eduard Antonyan wrote:
>
> I haven't checked yet what it does currently but what you wrote makes
> perfect sense.
>  On Nov 10, 2013 5:44 AM, "Arunkumar Srinivasan" <aragorn168b at gmail.com>
> wrote:
>
>  Hi everyone,
>
> To revive the discussion Gabor started here:
> http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html and the
> (related, but subtly different) FR mnel filed here:
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978
>
> Suppose you have:
>
> require(data.table)
> d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
> d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14),key = "id2")
>
> Then as Gabor points out: `d1[d2, id1]`  should *not* result in an error,
> because FAQ 2.8 states (copied from Gabor's post linked above):
>
> 1. The scope of X's subset; i.e., X's column names.
> 2. The scope of each row of Y; i.e., Y's column names (join inherited
> scope)
>
> In this case, the desired output for `d1[d2, id1]` should then be:
>    id1 id1
> 1:   1   1
> 2:   2   2
> 3:   2   2
> 4:   4  NA
>
> That, at least, is what I understand the documentation to intend.
>
> However, this requires a subtle change to the current method of
> referring to columns, if we were to keep this idea. That is, consider the
> data.table "d3" as follows:
>
> d3 <- copy(d2)
> setnames(d3, names(d1))
>
> Now, what should `d1[d3, id1]` give? The answer, I believe, is the same as
> `d1[d2, id1]`. Why? Because X's (here d1's) column names should be looked
> up first (as per FAQ 2.8). Therefore, corresponding to d3's id1 = c(1,2,4),
> the values for "id1" are c(1, (2,2), NA). Now, if the old behaviour is
> intended (here comes the "subtle change"), then one should do:
>
> d1[d3, i.id1] # referring to i's variables with the "i." notation.
>
> I've managed to implement the first part where X's columns are looked up
> so that `d1[d2, id1]` doesn't result in error. However, I'd like to ensure
> that my understanding of the FAQ is right (and that the FAQ makes sense -
> it does to me).
>
> Please let me know what you all think so that I can implement the second
> part and commit. This, I believe, will get us a step closer to
> consistency in DT syntax.
>
> Arun
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>