[datatable-help] Unable to have expression for "by" criteria

mdowle at mdowle.plus.com
Fri Jun 18 13:50:54 CEST 2010


Try this (works for me) :

f <- quote( list(d) )
DT[ , mean(b), by=eval(f) ]

If that works, you were very close; you just needed to wrap f in eval() in
the by.  I _think_ this makes sense as syntax: for clarity, something needs
to signal to the reader of the query that f is not a column name but a
pre-defined expression.
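As a minimal sketch of the point above (the small table here is made up
for illustration and is not the poster's data), the stored expression and
the literal list(d) should give the same groups and means:

```r
library(data.table)

# Hypothetical toy table: two groups with two rows each
DT <- data.table(b = c(1, 3, 10, 20), d = c("A", "A", "B", "B"))

# Store the grouping expression unevaluated
f <- quote(list(d))

# Literal grouping expression, for comparison
res_direct <- DT[, mean(b), by = list(d)]

# eval() signals that f is a stored expression, not a column called f
res_stored <- DT[, mean(b), by = eval(f)]
```

Both results should hold one row per value of d, with the group mean in
the second column.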

I need to have another look at this and tidy up the documentation and
examples. I might add a FAQ on it. There were big internal changes in this
area when grouping was sped up; for example, it is only very recently that
by can be a list(). by used to be just a character string.

You can do the same thing for j, by the way. It is kind of like a macro.
There might already be a FAQ on that.
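A rough sketch of the same idea for j, again on a made-up table (the name
j_expr is only illustrative):

```r
library(data.table)

# Hypothetical toy table
DT <- data.table(b = c(1, 3, 10, 20), d = c("A", "A", "B", "B"))

# Store the j expression unevaluated, like a macro
j_expr <- quote(mean(b))

# eval() in j runs the stored expression once per group of d
res <- DT[, eval(j_expr), by = d]
```

The stored expression can then be reused in any other query on a table
that has a column b.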

It's quite neat, actually, what R allows ... I don't believe that in SQL
you can create expressions for criteria (select, group by and where) and
re-use them as easily as this.
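To illustrate, here is a sketch that stores a "where", a "select" and a
"group by" expression separately and combines them in one query. The
table and the names where_expr, select_expr and group_expr are made up for
this example, and it assumes that eval() is accepted in i in the same way
as in j and by:

```r
library(data.table)

# Hypothetical toy table
DT <- data.table(a = 1:6,
                 b = c(1, 3, 10, 20, 5, 7),
                 d = c("A", "A", "B", "B", "C", "C"))

where_expr  <- quote(a > 1)                  # row filter (i)
select_expr <- quote(list(avg_b = mean(b)))  # what to compute (j)
group_expr  <- quote(list(d))                # grouping (by)

# Each stored expression is evaluated inside the query via eval()
res <- DT[eval(where_expr), eval(select_expr), by = eval(group_expr)]
```

Each of the three expressions can be swapped out or reused in another
query without touching the other two.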

I'll need an example for #2, as I don't quite follow it.  Maybe it drops
out of the answer above?
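If #2 is about building the by expression from column names held as
strings, one possible sketch (an assumption about what is being asked, not
a documented recipe; by_cols and the table are made up) is to assemble the
list() call with base R and pass it through eval() as above:

```r
library(data.table)

# Hypothetical toy table
DT <- data.table(a = c(1, 1, 2, 2),
                 b = c(1, 3, 10, 20),
                 d = c("A", "A", "A", "B"))

# Column names collected at run time (illustrative)
by_cols <- c("a", "d")

# Build the unevaluated call list(a, d) from the strings
by_expr <- as.call(c(as.name("list"), lapply(by_cols, as.name)))

# Use it exactly like a hand-written quote(list(a, d))
res <- DT[, mean(b), by = eval(by_expr)]
```

Here as.name() turns each string into a symbol and as.call() puts the
symbols together into the call that quote(list(a, d)) would have produced.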

Matthew


> I am trying to compute some values in a data.table by dynamically
> generating the "by" criteria.  However, I am unable to figure out how to
> do it.  (I had to resort to using the "plyr" package.)
>
> Questions:
> 1) Why am I unable to pass a variable for the "by" criteria?  The comments
> in the code indicate that it should be possible.
> 2) Assuming the issue is a bug (and will be fixed), what is a "good" way
> for me to accomplish dynamically creating a criteria?  (This is more of a
> generic R question I suppose.)
>
> -----
>
> Question #1 -- Unable to pass variable for "by" criteria
>
> The comments in the code state: "The by expression also see variables in
> the calling frame, just like j... but from v1.3 is e.g. bycriteria =
> quote(list(colA,colB%%100)); DT[...,by=bycriteria]"
>
> Then the following code should work, but it does not.
>
> DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D"))
> DT[ , mean(b), by=d ]         # Works
> DT[ , mean(b), by=list(d) ]   # Works
> f <- quote( list(d) )
> DT[ , mean(b), by=f ]         # This does not work
>
> The response is:
> Error in `[.data.table`(DT, , mean(b), by = f) :
>   column 1 of 'by' list does not evaluate to integer e.g. the by should be
> a list of expressions. Do not quote column names when using
> by=list(...).
>
> What is going on?
>
> -----
>
> Question #2 -- Tips for dynamically generating criteria
>
> What are some tips and generally accepted approaches (in R) to dynamically
> generate a criteria?
>
> The first step is to generate a list of columns to group by.  How should I
> structure the function that gathers this info?  Is it better to get them
> straight as strings, or should I get the variables directly and then use
> either match.call() or deparse(substitute(x)) to convert to strings?  The
> aes() function in ggplot2 uses the match.call() approach.  Or is the
> conversion to strings even required?  (The plyr package accepted a vector
> of strings, and a few other formats for its "by" criteria.)
>
> The next step is to generate the "language" or "symbol" object that I need
> to create.  I would appreciate some guidance on how I can put together my
> required columns into the by criteria dynamically.
>
> (I understand that it is a generic R question, but since it is so closely
> related to my question #1, I am asking both of them here.)
>
> Thanks for your help.
>
>
> Regards,
> Harish