[datatable-help] constructing expressions for the j argument from character vectors

Chris Neff caneff at gmail.com
Wed Sep 28 17:04:34 CEST 2011


How about using paste() to build the whole list(...) call as a single
character string, then parse(text = var.list) to turn var.list into an
expression that you can eval() in the j argument of data.table?
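
Something along these lines might work as a starting point. It is only a
sketch of the paste/parse idea, not a tested solution for your real data:
build_j is a hypothetical helper name, uniqueSummary3 is the function you
described but had not written, and it assumes every entry in vars is a
syntactically valid column name (no spaces or special characters).

```r
library(data.table)

## Hypothetical helper: build the j expression from a character vector
## of column names.
build_j <- function(vars) {
  ## One named term per column, e.g. "dx = length(unique(dx))"
  terms <- paste0(vars, " = length(unique(", vars, "))")
  ## Collapse into a single string: "list(dx = ..., rx = ...)"
  j_text <- paste0("list(", paste(terms, collapse = ", "), ")")
  ## parse() returns an expression vector; [[1]] extracts the call
  parse(text = j_text)[[1]]
}

uniqueSummary3 <- function(df, key, vars) {
  DT <- as.data.table(df)
  j_expr <- build_j(vars)
  ## eval() in j makes data.table evaluate the constructed call per group
  summaryDT <- DT[, eval(j_expr), by = key]
  ## Keep only the count columns and average each one
  sapply(summaryDT[, vars, with = FALSE], mean)
}
```

With the test data from your message, the call would then be
uniqueSummary3(df = testData, key = "id", vars = c("dx", "rx", "clinic")).
Naming each term inside the list means the result columns carry the
variable names rather than V1, V2, ..., so no column arithmetic is needed
to pick them out afterwards.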

On 28 September 2011 10:30, Erik Iverson <erikriverson at gmail.com> wrote:
> Hello,
>
> Thank you for providing the data.table package, I think it will be
> very useful to me going forward.  I have a question about passing
> around expressions, and have come up with an example to show what I'm
> after.
>
> library(data.table)
> ## test data
> N <- 500000
> set.seed(100)
> testData <- data.frame(id     = c(sample(1:10000, N, replace = TRUE)),
>                       clinic = c(sample(1:10,    N, replace = TRUE)),
>                       dx     = c(sample(1:200,    N, replace = TRUE)),
>                       rx     = c(sample(1:1000,    N, replace = TRUE)))
>
> ## want to know mean number of dx per ID
> mean(tapply(testData$dx, testData$id,
>            function(x) length(unique(x)))) ## 44.2212
>
> ## in my real use case, I want to run this with different 'by'
> ## variables, so let's write a function and try to use data.table,
> ## call the function uniqueSummary1
>
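> uniqueSummary1 <- function(df, key) {
>  DT <- data.table(df)
>  setkeyv(DT, key)  ## replaces the deprecated key(DT) <- key
>
>  summaryDT <- DT[, list(length(unique(dx)),
>                         length(unique(rx))), by = key]
>
>  ## mean() no longer accepts a data.frame, so average column-wise
>  sapply(summaryDT[, list(V1, V2)], mean)
>
> }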
>
> ## agrees with tapply
> uniqueSummary1(df  = testData, key = c("id"))
>
> ## The above works great, but isn't general, since in my real use
> ## case, I won't know dx and rx are the variables of interest. I want
> ## to be able to pass them in as arguments. This is exactly what FAQ
> ## 1.6 is, so let's use that solution to define uniqueSummary2
>
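> uniqueSummary2 <- function(df, key, vars) {
>  DT <- data.table(df)
>  setkeyv(DT, key)  ## replaces the deprecated key(DT) <- key
>
>  ## capture the unevaluated list(...) call passed as 'vars'
>  sList <- substitute(vars)
>  summaryDT <- DT[, eval(sList), by = key]
>  ncols <- ncol(summaryDT)
>
>  ## length(sList) counts 'list' itself, hence the + 2;
>  ## mean() no longer accepts a data.frame, so average column-wise
>  sapply(summaryDT[,(ncols-length(sList) + 2):ncols, with = FALSE], mean)
> }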
>
> uniqueSummary2(df = testData, key = c("id"),
>               vars = list(length(unique(dx)),
>                 length(unique(rx)),
>                 length(unique(clinic))))
>
> ## uniqueSummary2 is better, but relies on me repeating the
> ## "length(unique())" bit several times.  Ideally, I'd just like to
> ## pass in a list of QUOTED vars to summarize, like the following
> ## hypothetical call to my yet-unwritten uniqueSummary3 function:
>
> uniqueSummary3(df = testData, key = c("id"),
>               vars = c("dx", "rx", "clinic"))
>
> I assume I can somehow construct the expression for the j index inside
> my function, based on the 'vars' character vector, but am stuck on
> how.  Any ideas?
>
> Thanks so much,
> Erik