[datatable-help] Summing over many variables. A new approach; a new problem

Joseph Voelkel jgvcqa at rit.edu
Thu Jan 13 21:17:26 CET 2011


In an earlier post, I had a large number of variables in a data table that I wanted to summarize. For example, I had A1-A30 and wanted to find the mean of A1-A30, by certain groups.

While I (with help!) eventually found a solution for this, I am finding out that I need something more sophisticated. For example, sometimes I want to only find the mean of A1-A10, or I may want to find a more complex function of the Ai's. I would like to be able to vary these functions of A1-A30 very easily.

For this reason, I decided to restructure the data table so that, for each row, A1-A30 (30 individual numbers) is expressed as A, a list of length 1 whose first (and only) entry is the vector of the 30 A1-A30 values.
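The restructuring described above might be sketched as follows, in base R on a plain data frame. This is only an illustration under stated assumptions: the names `wide`, `long`, and `a_cols` are hypothetical, and four columns A1-A4 stand in for A1-A30.

```r
# Hypothetical wide data: columns A1..A4 stand in for A1-A30
wide <- data.frame(index = 1:3,
                   A1 = c(11, 21, 31), A2 = c(12, 22, 32),
                   A3 = c(13, 23, 33), A4 = c(14, 24, 34))
a_cols <- paste0("A", 1:4)

# Collapse the A columns into one list column: one numeric vector per row
long <- data.frame(index = wide$index)
long$A <- lapply(seq_len(nrow(wide)),
                 function(i) unlist(wide[i, a_cols], use.names = FALSE))

# Varying the summary is now a matter of changing one function
all_means   <- sapply(long$A, mean)                       # mean of A1-A4
first_means <- sapply(long$A, function(a) mean(a[1:2]))   # mean of A1-A2 only
```

With the values in one vector per row, switching from the mean of all the Ai's to the mean of a subset, or to some other function, only changes the function passed to `sapply`.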

The problem I am now having is shown below (first the prompts, code, and output; then just the pure code in case you want to try it). While I can make this work for data frames, I am hoping to find a way to make it work for data tables. (This can be really useful, by the way. In a recent project, my data frame included lists of 4D arrays, which was the perfect structure for, e.g., summing over a variety of dimensions.)


> #create matrix that includes list elements A
> (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
     index var A
[1,] 1     101 Integer,5
[2,] 2     102 Integer,5
[3,] 3     103 Integer,11
> class(mat)
[1] "matrix"
> # convert to data frame and "fix" the first two entries
> (df<-as.data.frame(mat))
  index var                                          A
1     1 101                         11, 12, 13, 14, 15
2     2 102                         21, 22, 23, 24, 25
3     3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> class(df$index) # a list, because mat is a list-mode matrix
[1] "list"
> df$index<-as.integer(df$index) # convert to integer
> df$var<-as.integer(df$var) # likewise
> # convert to data table
> dt<-data.table(df)
> setkey(dt,index)
>
> # try some operations
> dt[,A] # works
[[1]]
[1] 11 12 13 14 15

[[2]]
[1] 21 22 23 24 25

[[3]]
[1] 31 32 33 34 35 36 37 38 39 40 41

> dt[,mean(A)] # Does not work. each row of A is a list
[1] NA
Warning message:
In mean.default(A) : argument is not numeric or logical: returning NA
> dt[,mean(unlist(A))] # But here is an easy fix to make this work
[1] 27.42857
>
> dt[,mean(var),by=index] # works (of course)
     index  V1
[1,]     1 101
[2,]     2 102
[3,]     3 103
>
> dt[,mean(unlist(A)),by=index] # does not work!
Error in `[.data.table`(dt, , mean(unlist(A)), by = index) :
  only integer,double,logical and character vectors are allowed so far. Type 19 would need to be added.
>
>

#### Pure code ####
library(data.table)
# create matrix that includes list elements A
(mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
class(mat)
# convert to data frame and "fix" the first two entries
(df<-as.data.frame(mat))
class(df$index) # a list, because mat is a list-mode matrix
df$index<-as.integer(df$index) # convert to integer
df$var<-as.integer(df$var) # likewise
# convert to data table
dt<-data.table(df)
setkey(dt,index)

# try some operations
dt[,A] # works
dt[,mean(A)] # Does not work. each row of A is a list
dt[,mean(unlist(A))] # But here is an easy fix to make this work

dt[,mean(var),by=index] # works (of course)

dt[,mean(unlist(A)),by=index] # does not work!
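
For comparison, the data frame route mentioned earlier can be sketched in base R. This is a minimal sketch that assumes the same three rows as the example; it builds the list column directly rather than through `cbind`, and the names `row_means` and `by_index` are introduced only for illustration.

```r
# Same data as above, with A built directly as a list column
df <- data.frame(index = 1:3, var = 101:103)
df$A <- list(11:15, 21:25, 31:41)

# One mean per row (here each index value occurs exactly once)
row_means <- sapply(df$A, mean)

# Grouped version: pool the A vectors of all rows sharing an index
by_index <- tapply(seq_len(nrow(df)), df$index,
                   function(i) mean(unlist(df$A[i])))
```

The `tapply` form is the closer analogue of the `by=index` call, since it also covers the case where several rows share the same index value.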




Joseph G. Voelkel, Ph.D.
Professor, Center for Quality and Applied Statistics
Kate Gleason College of Engineering
Rochester Institute of Technology
V 585-475-2231
F 585-475-5959
joseph.voelkel at rit.edu
