[datatable-help] Unable to have expression for "by" criteria

Matthew Dowle mdowle at mdowle.plus.com
Wed Jun 23 23:53:07 CEST 2010


Those two bugs are fixed now; I've just committed the fixes.

DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D") )
f <- quote( list(d) )
DT[ , mean(b), by=eval(f) ]  # works
foo <- function( grp ) {
   DT[ , mean(b), by=eval( grp ) ]
}
foo( quote( list(d) ) )             # works, colname in result is d
foo( quote( list(d,a) ) )           # works
foo( quote( list(d,even=a%%2L) ) )  # works
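The result column names (d, and presumably d, a and d, even in the last two calls) can be read off the quoted by expression itself. The base R sketch below shows one way to do that. `by_names()` is a hypothetical helper, not data.table's internal code. It assumes the rule "use the argument name if one is given, otherwise deparse the expression", which is consistent with the d column name reported above but is not confirmed for data.table.

```r
# Hypothetical helper (not part of data.table): derive group column names
# from a quoted by expression of the form list(...).
# Assumed rule: argument name if supplied, else the deparsed expression.
by_names <- function(by_expr) {
  stopifnot(is.call(by_expr), identical(by_expr[[1]], as.name("list")))
  args <- as.list(by_expr)[-1]            # drop the `list` symbol itself
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  unnamed <- nms == ""
  # For unnamed arguments, fall back to the deparsed expression, e.g. "d"
  nms[unnamed] <- vapply(args[unnamed],
                         function(e) paste(deparse(e), collapse = ""),
                         character(1))
  nms
}

by_names(quote(list(d)))                    # "d"
by_names(quote(list(d, a)))                 # "d" "a"
by_names(quote(list(d, even = a %% 2L)))    # "d" "even"
```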

Also fixed: the case where a column is called grp, the same name as the
variable holding the expression. Before, eval would see the grp column
first and complain that it didn't evaluate to a list. Now, when eval is
passed to by, that eval is done in the calling frame first, and the
result is then used within the frame of the data.table. So this now works:

DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D"), f=1:5, grp=1:5 )
DT[,mean(b),by=eval(f)]  # works: uses quote(list(d)), not the f column
f = quote(list(grp))
foo(f)                   # works, groups by the grp column
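The two-step evaluation described above (resolve the variable in the calling frame, then evaluate the resulting expression within the table) can be illustrated with plain eval() on a data.frame. This is a minimal sketch of the idea only, not data.table's actual implementation; the data.frame df stands in for DT.

```r
# Illustration only (base R, not data.table internals).
df  <- data.frame(d = c("A", "B", "A"), grp = c(1L, 2L, 3L))
grp <- quote(list(d))    # variable holding the by expression; same name as a column
calling_env <- environment()

# One step: look up the symbol grp within df. The column shadows the
# variable, so we get the integer column, not a list of grouping vectors.
one_step <- eval(quote(grp), envir = df, enclos = calling_env)
is.list(one_step)        # FALSE

# Two steps: resolve grp in the calling frame first (giving the call list(d)),
# then evaluate that call within df.
by_expr <- eval(quote(grp), envir = calling_env)
groups  <- eval(by_expr, envir = df, enclos = calling_env)
is.list(groups)          # TRUE
```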

Matthew

On Mon, 2010-06-21 at 08:10 +0100, Matthew Dowle wrote:
> Thanks for raising this one. I've just committed a fix for that; it's in
> the latest version on r-forge.
> 
> DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D") )
> f <- quote( list(d) )
> DT[ , mean(b), by=eval(f) ]  # worked before
> foo <- function( grp ) {
>    DT[ , mean(b), by=eval( grp ) ]
> }
> foo( quote( list(d) ) )   # works now
> 
> The column names of the result are 'f' and 'grp' respectively, though,
> rather than d. I've raised bug #974 for that.
> 
> Multiple expressions in the quoted by don't work yet:
> > foo( quote( list(d,a) ) )
> Error in bysubl[[jj + 1]] : subscript out of bounds
> > 
> I've raised bug #975 for that.
> 
> Matthew
> 
> 
> On Fri, 2010-06-18 at 23:15 -0700, Harish wrote:
> > Thanks.  The eval() did the trick in the simplified example.
> > 
> > Now I run into another hurdle when I make the code a little more complex.
> > 
> > # ==================
> > 
> > DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D") )
> > f <- quote( list(d) )
> > DT[ , mean(b), by=eval(f) ]   # Now this works; thanks
> > foo <- function( grp ) {
> >    DT[ , mean(b), by=eval( grp ) ]
> > }
> > foo( list(d) ) # Gives an error
> > foo( quote( list(d) ) )  # Also gives the same error
> > 
> > # ==================
> > 
> > The error I get is:
> >    Error in eval(grp) : object 'grp' not found
> > 
> > 
> > Conceptually, it looks like it should work.
> > 
> > 
> > Regards,
> > Harish
> > 
> > 
> > --- On Fri, 6/18/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> > 
> > > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > > Subject: Re: [datatable-help] Unable to have expression for "by" criteria
> > > To: "Harish" <harishv_99 at yahoo.com>
> > > Cc: datatable-help at lists.r-forge.r-project.org
> > > Date: Friday, June 18, 2010, 4:50 AM
> > > Try this (works for me) :
> > > 
> > > f <- quote( list(d) )
> > > DT[ , mean(b), by=eval(f) ]
> > > 
> > > If that works, you were very close; you just needed to use eval
> > > in the by. I _think_ this makes sense as syntax: something needs to
> > > signal to the reader of the query that f is not a column name but a
> > > pre-defined expression, for clarity.
> > > 
> > > I basically need to have another look at this and tidy up the
> > > documentation and examples. I might add a FAQ on it. There were big
> > > changes in this area internally when grouping was sped up; e.g. it's
> > > very recent that by can be list(), by used to be just a character
> > > string.
> > > 
> > > You can do the same thing for j, by the way, kind of like a macro.
> > > There might already be a FAQ on that.
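As a rough base R analogue of keeping both j and by as reusable quoted expressions, the hypothetical helper grouped_eval() below splits a data.frame by a quoted by expression and evaluates a quoted j within each group. It only illustrates the macro idea and is not data.table code; the data.table form would presumably mirror by=eval(f), i.e. DT[ , eval(jexpr), by=eval(f) ], but that is an assumption here rather than something shown in this thread.

```r
# Hypothetical helper (not data.table): evaluate a quoted j within each group
# defined by a quoted by expression, mimicking DT[ , j, by=by ].
grouped_eval <- function(data, j, by) {
  env <- parent.frame()                       # where the caller's variables live
  groups <- eval(by, envir = data, enclos = env)
  pieces <- split(data, groups, drop = TRUE)  # one data.frame per group
  sapply(pieces, function(piece) eval(j, envir = piece, enclos = env))
}

DF <- data.frame(a = 1:5, b = 11:50, d = c("A", "B", "C", "D"))
jexpr <- quote(mean(b))     # a reusable "macro" for j
f     <- quote(list(d))     # a reusable "macro" for by
grouped_eval(DF, jexpr, f)  # named vector: A = 29, B = 30, C = 31, D = 32
```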
> > > 
> > > It's quite neat, actually, what R allows. I don't believe that in SQL
> > > you can as easily create expressions for criteria (select, group by
> > > and where) and re-use them like this.
> > > 
> > > I'll need an example for #2, as I don't quite follow that. Maybe it
> > > drops out of the answer above?
> > > 
> > > Matthew
> > > 
> > > 
> > > > I am trying to compute some values in a data.table by dynamically
> > > > generating the "by" criteria. However, I am unable to figure out how
> > > > to do it. (I had to resort to using the "plyr" package.)
> > > >
> > > > Questions:
> > > > 1) Why am I unable to pass a variable for the "by" criteria? The
> > > > comments in the code indicate that it should be possible.
> > > > 2) Assuming the issue is a bug (and will be fixed), what is a "good"
> > > > way for me to create the criteria dynamically? (This is more of a
> > > > generic R question, I suppose.)
> > > >
> > > > -----
> > > >
> > > > Question #1 -- Unable to pass a variable for the "by" criteria
> > > >
> > > > The comments in the code state: "The by expression also see variables
> > > > in the calling frame, just like j... but from v1.3 is e.g.
> > > > bycriteria = quote(list(colA,colB%%100)); DT[...,by=bycriteria]"
> > > >
> > > > Then the following code should work, but it does not.
> > > >
> > > > DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D"))
> > > > DT[ , mean(b), by=d ]         # Works
> > > > DT[ , mean(b), by=list(d) ]   # Works
> > > > f <- quote( list(d) )
> > > > DT[ , mean(b), by=f ]         # This does not work
> > > >
> > > > The response is:
> > > > Error in `[.data.table`(DT, , mean(b), by = f) :
> > > >   column 1 of 'by' list does not evaluate to integer e.g. the by should be
> > > >   a list of expressions. Do not quote column names when using
> > > >   by=list(...).
> > > >
> > > > What is going on?
> > > >
> > > > -----
> > > >
> > > > Question #2 -- Tips for dynamically generating criteria
> > > >
> > > > What are some tips and generally accepted approaches (in R) for
> > > > generating criteria dynamically?
> > > >
> > > > The first step is to generate a list of columns to group by. How should I
> > > > structure the function that gathers this info? Is it better to get them
> > > > straight as strings, or should I get the variables directly and then use
> > > > either match.call() or deparse(substitute(x)) to convert them to strings?
> > > > The aes() function in ggplot2 uses the match.call() approach. Or is the
> > > > conversion to strings even required? (The plyr package accepted a vector
> > > > of strings, and a few other formats, for its "by" criteria.)
> > > >
> > > > The next step is to create the "language" or "symbol" object itself. I
> > > > would appreciate some guidance on how I can assemble my required
> > > > columns into the by criteria dynamically.
> > > >
> > > > (I understand that it is a generic R question, but since it is so closely
> > > > related to my question #1, I am asking both of them here.)
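For question #2, one common base R approach is sketched below: build the list(...) call from a character vector of column names with as.name() and as.call(), or capture unevaluated column arguments with substitute() and deparse() to get strings. make_by() and group_cols() are hypothetical names used for illustration; match.call(expand.dots = FALSE)$... would capture the same argument expressions as the substitute() route. (Later data.table releases also accept a character vector directly, e.g. by = c("d", "a"), which avoids building a call at all.)

```r
# Hypothetical helper: turn c("d", "a") into the call list(d, a)
make_by <- function(cols) {
  as.call(c(as.name("list"), lapply(cols, as.name)))
}

# Hypothetical helper: group_cols(d, a) returns c("d", "a")
# without evaluating d or a
group_cols <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  vapply(exprs, function(e) paste(deparse(e), collapse = ""), character(1),
         USE.NAMES = FALSE)
}

cols    <- group_cols(d, a)   # c("d", "a")
by_expr <- make_by(cols)
by_expr                       # prints list(d, a)

# The generated call can be evaluated against the data, analogous to by=eval(f)
DF <- data.frame(a = 1:5, b = 11:50, d = c("A", "B", "C", "D"))
groups <- eval(by_expr, envir = DF)
length(groups)                # 2
```

Working with strings keeps the helper usable from code that computes the column list at run time; the substitute() route suits interactive calls where the user types bare column names.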
> > > >
> > > > Thanks for your help.
> > > >
> > > >
> > > > Regards,
> > > > Harish
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > datatable-help at lists.r-forge.r-project.org
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > >