[datatable-help] Using list valued columns with by

Tue Feb 21 23:32:54 CET 2012

Hi,

Not sure, but how about this:

f = quote(list(a=mean(y), b=list(rep(y[1],3))))
data[,eval(f),by=x]
     x           a                                  b
[1,] 1 -0.07760762 -0.1715334, -0.1715334, -0.1715334
[2,] 2  0.36923570          1.01892, 1.01892, 1.01892

or,

f = quote(list(V1=list(c(mean(y), rep(y[1],3)))))
data[,eval(f),by=x]
     x                                                 V1
[1,] 1 -0.07760762, -0.17153338, -0.17153338, -0.17153338
[2,] 2         0.3692357, 1.0189195, 1.0189195, 1.0189195

or, in functional form:

f <- function(y) list(a=mean(y), b=list(rep(y[1],3)) )
data[, f(y), by=x]
     x           a                                  b
[1,] 1 -0.07760762 -0.1715334, -0.1715334, -0.1715334
[2,] 2  0.36923570          1.01892, 1.01892, 1.01892

Or, if an f() that returns a list of vectors is given and can't be
changed, how about:

f <- function(y) list(a=mean(y), b=rep(y[1],3) )
data[, lapply(f(y),list), by=x]
     x          a                                  b
[1,] 1  0.2377054    -1.302181, -1.302181, -1.302181
[2,] 2 -0.3548439 -0.5239135, -0.5239135, -0.5239135
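
A self-contained sketch of that last form, in case anyone wants to run
it. The seed and the name 'result' are only there for illustration and
are not from the thread; it assumes a reasonably recent data.table.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
data <- data.table(x = rep(1:2, each = 10), y = rnorm(20), key = "x")

# f() is taken as given: it returns a list of plain vectors of unequal length
f <- function(y) list(a = mean(y), b = rep(y[1], 3))

# Wrapping every element of f(y) in list() makes each one a length-1 list,
# so each group contributes exactly one row and both a and b are list columns
result <- data[, lapply(f(y), list), by = x]

nrow(result)      # 2: one row per value of x
result$b[[1]]     # the length-3 vector stored for the first group
```

Each cell of a and b is then an ordinary vector that you pull out with
[[ ]], so the extra [[1]] step does not go away, it just moves to the
point where you read a cell.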

Always open to ideas on how to make this easier. It was suggested recently
that list() columns should be created by default when grouping, rather
than repeating the group values to match the longest item of j. Might be
an idea. Then to get the current behaviour you'd have to unlist()
afterwards. Probably more efficient in most cases, but quite a big
change.
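
To make the unlist() step concrete, here is a rough sketch of going from
a list column back to the current long form. It uses today's syntax and
is only an illustration, not how a new default would be implemented;
'nested' and 'flat' are made-up names.

```r
library(data.table)

data <- data.table(x = rep(1:2, each = 10), y = rnorm(20), key = "x")

# One row per group, with the length-3 vector held in a list column
nested <- data[, list(b = list(rep(y[1], 3))), by = x]

# Unlisting within each group repeats the group value alongside every
# element, which is the shape grouping gives today without the list()
flat <- nested[, list(b = unlist(b)), by = x]

nrow(nested)   # 2
nrow(flat)     # 6: three rows for each value of x
```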

Matthew 

On Tue, 2012-02-21 at 15:18 -0500, Chris Neff wrote:
> Hi all,
> 
> A colleague asked me a question, and while I found a solution it
> doesn't seem quite optimal.
> 
> The example:
> 
> data <- data.table(x=rep(1:2,each=10), y=rnorm(20), key="x")
> 
> f <- function(y) {
>   return( list(a=mean(y), b=rep(y[1],10)) )
> }
> 
> result <- data[, list(f(y)), by=x]
> 
> 
> What happens is that V1 in result alternates between f(y)$a and
> f(y)$b, giving 4 rows, 2 for each value of x. What I want instead is
> for result to have 2 rows, with V1 being the list that gets returned
> from f(y).
> 
> I have found that this works:
> 
> result <- data[, list(list(f(y))), by=x]
> 
> But then I have to do:
> 
> result[J(1),][,V1][[1]]
> 
> to get the same thing I would get from f(data[J(1),][,y]).  I want
> to lose the [[1]] but I can't seem to see how I would do so.  Really
> what I would envision is like with sapply, I want to do
> 
> 
> result <- data[, f(y), by=x, simplify=FALSE]
> 
> But of course simplify isn't an argument for data.table. Thoughts?
> 
> -Chris