[datatable-help] Cannot access cols of y when doing x[y, ...]

Tue Feb 1 23:12:47 CET 2011

Hi Prasad and all,

Join inherited scope is now back on, in v1.5.3 on R-Forge.
FAQ 1.11 has been simplified; see latest pdf on homepage.

Please check the NEWS link for v1.5.3, as there are
several other changes too, e.g. X[Y] now includes Y's
non-join columns for consistency with join inherited
scope (JIS). FAQ 1.12 now strongly encourages X[Y,j]
rather than X[Y].
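
For illustration, here is a minimal sketch of the X[Y,j] form that FAQ 1.11
describes, reusing the toy tables from Prasad's question quoted further down.
It assumes a data.table version where join inherited scope is on, so that j
can see boo from y. The names res and ref are only for this example, and the
shape of the result (plain vector or a table grouped by each row of y) may
differ between data.table versions.

```r
library(data.table)

# Toy tables taken from the original question in this thread.
x <- data.table(foo = c(1, 1, 1, 2, 2, 3), a = 1:6, key = "foo")
y <- data.table(foo = c(1, 2), boo = 10:11, key = "foo")

# Join x to y on the key and use one column from each table in j:
# foo comes from x, boo comes from y (join inherited scope).
res <- x[y, foo * boo]

# Reference computation: explicit merge followed by a separate query.
ref <- with(merge(x, y), foo * boo)
```

The first form avoids building the full merged table when j only needs a
couple of columns.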

v1.5.3 contains several changes that may require changes
to existing code. Please check and let us know if these
will cause any issues.

http://datatable.r-forge.r-project.org/

Matthew

On Sat, 2011-01-22 at 19:57 -0500, Prasad Chalasani wrote:
> Thank you for clarifying this! I was getting stuck on this. Yes the
> syntax and speedup are very nice indeed. I am dealing with 10 million
> row tables so it is very useful for me.
> 
> 
> Incidentally, I also asked this question on StackOverflow, and updated
> it based on your reply
> http://stackoverflow.com/questions/4764434/r-when-using-data-table-how-do-i-get-columns-of-y-when-i-do-xy
> 
> 
> 
> 
> On Sat, Jan 22, 2011 at 4:48 PM, Matthew Dowle
> <mdowle at mdowle.plus.com> wrote:
>         Welcome to the list.
>         You're right, the FAQ is wrong.
>         FR#1095 is "Turn back on 'join inherited scope'".
>         This was a known problem in NEWS at v1.4 and still is.
>         When the grouping code was moved from R into C in v1.4, that
>         feature wasn't something that made it into the port.
>         
>         Glad you appreciate the neater syntax and yes it should be
>         faster (the more columns in x and y the faster the speed up
>         could be, over a merge followed by a query).
>         
>         I'll try and take a look soon.
>         
>         Matthew
>         
>         
>         
>         On Sat, 2011-01-22 at 10:41 -0500, Prasad Chalasani wrote:
>         > The Data-table FAQ 1.11 states:
>         >
>         >
>         > "When you write x[y,foo*boo], data.table automatically inspects the j
>         > expression to see which columns it uses.
>         > It will only subset, or group, those columns only. Memory is only
>         > created for the columns the j uses.
>         >
>         > Let’s say foo is in x, and boo is in y (along with 20 other columns in
>         > y).
>         >
>         > Isn’t x[y,foo*boo] quicker to program and quicker to run than a merge
>         > step followed by another subset step ?"
>         >
>         >
>         > Contrary to what it says above, I get an error when I try to access a
>         > y-column in the "j" argument of x[y,j].
>         >
>         > See the sequence of code below.
>         >
>         >
>         > > x <- data.table( foo = c(1,1,1,2,2,3), a = 1:6, key = 'foo')
>         >
>         >
>         > > y <- data.table( foo = c(1,2), boo = 10:11, key = 'foo')
>         >
>         >
>         >
>         > # the below works as expected
>         >
>         > > x[y]
>         >
>         >      foo a
>         >
>         > [1,]   1 1
>         >
>         > [2,]   2 4
>         >
>         >
>         > > with( merge(x,y), foo*boo)
>         >
>         > [1] 10 10 10 22 22
>         >
>         >
>         > # I want to achieve the same result as the above using the
>         >
>         > # syntactically more compact (and faster?) code below:
>         >
>         >
>         > > x[y, foo * boo ]
>         >
>         > Error in eval(expr, envir, enclos) : object 'boo' not found
>         >
>         >
>         > So is the FAQ just wrong, or am I misunderstanding something?
>         >
>         >
>         
>         
>         
> 
>