[datatable-help] Summing over many variables. A new approach; a new problem

Short, Tom TShort at epri.com
Fri Jan 14 00:37:26 CET 2011


Matthew fixed your case, but the following workaround may be helpful if
you have other types of data stored in a data table. Basically, you use
one table to index another. Of course, you need to remember to keep the
two tables in sync. A table cannot index itself in this manner.

> dt1 <- dt[,1:2,with=FALSE]
> dt2 <- dt[,3, with=FALSE]
> # try some operations
> dt1[,mean(dt2[index, unlist(A)]),by=index]
     index V1
[1,]     1 13
[2,]     2 23
[3,]     3 36
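
For anyone who wants to try the workaround without rebuilding the table from the matrix in the quoted message, here is a minimal self-contained sketch. It assumes a reasonably recent data.table in which data.table() accepts a list argument as a list column; the name res is only for illustration.

```r
library(data.table)

# Rebuild the example table directly: two atomic columns and one list column.
# (Assumption: data.table() stores the list argument as a list column.)
dt <- data.table(index = 1:3,
                 var   = 101:103,
                 A     = list(11:15, 21:25, 31:41))

# Split into a table of atomic columns and a table holding the list column.
dt1 <- dt[, 1:2, with = FALSE]
dt2 <- dt[, 3, with = FALSE]

# Group dt1 by index; within each group, the value of index is used as a
# row number into dt2, and the list entries found there are unlisted and
# averaged.
res <- dt1[, mean(dt2[index, unlist(A)]), by = index]

# Group means: 13, 23, 36 (the values shown in the output above).
res$V1
```

This only lines up because row i of dt2 corresponds to index value i in dt1, which is the "keep them in sync" caveat mentioned above.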

- Tom
 

> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of Matthew Dowle
> Sent: Thursday, January 13, 2011 17:21
> To: Joseph Voelkel
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Summing over many variables. A new approach; a new problem
> 
> Which is now implemented and committed. Either
> install.packages(...,type="source") from R-Forge on unix/mac, 
> or wait a day or two for the R-Forge binary if you're on Windows.
> Thanks for the nudge on this one.
> 
> > dt = data.table(a=c(1,1,2,3,3),key="a")
> > dt$b=list(1:2,1:3,1:4,1:5,1:6)
> > dt
>      a                b
> [1,] 1             1, 2
> [2,] 1          1, 2, 3
> [3,] 2       1, 2, 3, 4
> [4,] 3    1, 2, 3, 4, 5
> [5,] 3 1, 2, 3, 4, 5, 6
> > dt[,mean(unlist(b)),by=a]
>      a       V1
> [1,] 1 1.800000
> [2,] 2 2.500000
> [3,] 3 3.272727
> > dt[,sapply(b,mean),by=a]
>      a  V1
> [1,] 1 1.5
> [2,] 1 2.0
> [3,] 2 2.5
> [4,] 3 3.0
> [5,] 3 3.5
> > 
> 
> 
> On Thu, 2011-01-13 at 21:07 +0000, Matthew Dowle wrote:
> > Hi Joseph,
> > You've found feature request #1092 'Make 'by' work for list() columns' :
> > 
> > https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1092&group_id=240&atid=978
> > 
> > Notes on the FR have this though :
> >    Currently type 19 isn't supported in dogroups (both input and 
> > output). This might be straightforward (with luck) to implement.
> >    See
> >    http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
> >    Note this is related but different to FR#202 since a list() column
> > *is* a vector [is.vector()=TRUE].
> > 
> > Matthew
> > 
> > On Thu, 2011-01-13 at 15:17 -0500, Joseph Voelkel wrote:
> > > > #create matrix that includes list elements A
> > > 
> > > >
> > > 
> (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(3
> > > 1:41))))
> > > 
> > >      index var A         
> > > 
> > > [1,] 1     101 Integer,5 
> > > 
> > > > [2,] 2     102 Integer,5 
> > > > 
> > > > [3,] 3     103 Integer,11
> > > 
> > > > class(mat)
> > > 
> > > [1] "matrix"
> > > 
> > > > # convert to data frame and "fix" the first two entries
> > > 
> > > > (df<-as.data.frame(mat))
> > > 
> > >   index var                                          A
> > > 
> > > 1     1 101                         11, 12, 13, 14, 15
> > > 
> > > 2     2 102          &n bsp;&nbs ;         21, 22, 23, 24, 25
> > > 
> > > 3     3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> > > 
> > > > class(df$index) # because mat is atomic
> > > 
> > > [1] "list"
> > > 
> > > > df$index<-as.integer(df$index) # convert to integer
> > > 
> > > > df$var<-as.integer(df$var) # likewise
> > > 
> > > > > # convert to data table
> > > 
> > > > dt<-data.table(df)
> > > 
> > > > setkey(dt,index)
> > > 
> > > > 
> > > 
> > > > # try some operations
> > > 
> > > > dt[,A] # works
> > > 
> > > [[1]]
> > > 
> > > [1] 11 12 13 14 15
> > > 
> > >  
> > > 
> > > [[2]]< /p>
> > > 
> > >  
> > > 
> > > [[3]]
> > > 
> > > [1] 31 32 33 34 35 36 37 38 39 40 41
> > > 
> > >  
> > > 
> > > > dt[,mean(A)] # Does not work. each row of A is a list
> > > 
> > > [1] NA
> > > 
> > > Warning message:
> > > 
> > > > In mean.default(A) : argument is not numeric or logical: returning NA
> > > 
> > > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > > 
> > > [1] 27.42857
> > > 
> > > > 
> > > 
> > > > dt[,mean(var),by=index] # works (of course)
> > > 
> > >      index  V1
> > > 
> > > [1,]     1 101
> > > 
> > > [2,]     2 102
> > > 
> > > [3, 3 103
> > > 
> > > > 
> > > 
> > > > dt[,mean(unlist(A)),by=index] # does not work! 
> > > 
> > > Error in `[.data.table`(dt, , mean(unlist(A)), by = index) : 
> > > 
> > > >   only integer,double,logical and character vectors are allowed so far. Type 19 would need to be added.
> > > 
> > > > 
> > > 
> > > > 
> > > 
> > >  
> > > 
> > > #### Pure code ####
> > > 
> > > #create matrix that includes list elements A
> > > 
> > > 
> > > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > > 
> > > class(mat)
> > > 
> > > # convert to data frame and "fix" the first two entries
> > > 
> > > (df<-as.data.frame(mat))
> > > 
> > > > class(df$index) # because mat is atomic
> > > 
> > > df$index<-as.integer(df$index) # convert to integer
> > > 
> > > df$var<-as.integer(df$var) # likewise
> > > 
> > > > # convert to data table
> > > 
> > > dt<-data.table(df)
> > > 
> > > setkey(dt,index)
> > > 
> > >  
> > > 
> > > # try some operations
> > > 
> > > dt[,A] # works
> > > 
> > > dt[,mean(A)] # Does not work. each row of A is a list
> > > 
> > > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > > 
> > >  
> > > 
> > > dt[,mean(var),by=index] # works (of course)
> > > 
> > >  
> > > 
> > > dt[,mean(unlist(A)),by=index] # does not work! 
> > > 
> > >   
> > 
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > 
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 