[datatable-help] Unable to have expression for "by" criteria

Harish harishv_99 at yahoo.com
Fri Jun 18 08:24:40 CEST 2010


I am trying to compute some values in a data.table by dynamically generating the "by" criteria.  However, I am unable to figure out how to do it.  (I had to resort to using the "plyr" package.)

Questions:
1) Why am I unable to pass a variable for the "by" criteria?  The comments in the code indicate that it should be possible.
2) Assuming the issue is a bug (and will be fixed), what is a "good" way for me to create the "by" criteria dynamically?  (This is more of a generic R question, I suppose.)

-----

Question #1 -- Unable to pass variable for "by" criteria

The comments in the code state: "The by expression also see variables in the calling frame, just like j... but from v1.3 is e.g. bycriteria = quote(list(colA,colB%%100)); DT[...,by=bycriteria]"

Given that comment, the following code should work, but it does not.

library(data.table)
DT <- data.table( a=1:5, b=11:50, d=c("A","B","C","D"))
DT[ , mean(b), by=d ]         # Works
DT[ , mean(b), by=list(d) ]   # Works
f <- quote( list(d) )
DT[ , mean(b), by=f ]         # This does not work

The response is:
Error in `[.data.table`(DT, , mean(b), by = f) : 
  column 1 of 'by' list does not evaluate to integer e.g. the by should be a list of expressions. Do not quote column names when using by=list(...).

What is going on?

-----

Question #2 -- Tips for dynamically generating criteria

What are some tips and generally accepted approaches (in R) for generating such criteria dynamically?

The first step is to generate a list of columns to group by.  How should I structure the function that gathers this info?  Is it better to get them straight as strings, or should I get the variables directly and then use either match.call() or deparse(substitute(x)) to convert to strings?  The aes() function in ggplot2 uses the match.call() approach.  Or is the conversion to strings even required?  (The plyr package accepted a vector of strings, and a few other formats for its "by" criteria.)
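To make the "variables directly" option concrete, here is a minimal base-R sketch of capturing bare column names and converting them to strings with substitute() and deparse().  The helper name cols_as_strings is hypothetical and is not part of data.table, plyr, or ggplot2; it only illustrates the idea.

```r
# Hypothetical helper (illustrative name): capture the arguments
# unevaluated and return them as character strings.
cols_as_strings <- function(...) {
  # substitute(list(...)) yields the unevaluated call list(arg1, arg2, ...)
  captured <- as.list(substitute(list(...)))[-1]  # drop the leading `list` symbol
  vapply(captured, deparse, character(1))
}

# Bare names go in, strings come out; d and a are never evaluated,
# so they do not need to exist in the calling environment.
col_names <- cols_as_strings(d, a)
print(col_names)
```

Taking strings directly would skip this step entirely; the capture only matters if callers should be able to type bare column names.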

The next step is to build the "language" or "symbol" object that I need.  I would appreciate some guidance on how to assemble my required columns into the "by" criteria dynamically.
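For reference, one way to assemble such an object in base R is sketched below: a character vector of column names is turned into an unevaluated list(...) call, equivalent to what quote(list(d, a)) would give.  The helper name make_by is hypothetical, and the sketch makes no assumption about how data.table itself treats the result.

```r
# Hypothetical helper (illustrative name, not a data.table function):
# build the unevaluated call list(col1, col2, ...) from column-name strings.
make_by <- function(cols) {
  as.call(c(as.name("list"), lapply(cols, as.name)))
}

by_expr <- make_by(c("d", "a"))
print(by_expr)   # an unevaluated call, not a list of values

# The call can be evaluated against any list-like data, e.g. a data.frame,
# which yields a list holding the columns d and a.
df <- data.frame(a = 1:2, d = c("A", "B"), stringsAsFactors = FALSE)
by_cols <- eval(by_expr, df)
```

Whether data.table accepts a call built this way through a variable in "by" is exactly what question #1 asks.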

(I understand that it is a generic R question, but since it is so closely related to my question #1, I am asking both of them here.)

Thanks for your help.


Regards,
Harish



      

