[datatable-help] Summing over many variables. A new approach; a new problem

Joseph Voelkel jgvcqa at rit.edu
Fri Jan 14 19:18:17 CET 2011


Great! Thanks, Matthew. This should be a big help. (It was also nice to see that others are using lists as well. Use of lists here--well, in data frames--was an "aha" moment for me about a year ago.)

-----Original Message-----
From: Matthew Dowle [mailto:mdowlenoreply at virginmedia.com] On Behalf Of Matthew Dowle
Sent: Thursday, January 13, 2011 5:21 PM
To: Joseph Voelkel
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Summing over many variables. A new approach; a new problem

Which is now implemented and committed. Either
install.packages(...,type="source") from R-Forge on unix/mac, or wait a
day or two for the R-Forge binary if you're on Windows.
Thanks for the nudge on this one.

> dt = data.table(a=c(1,1,2,3,3),key="a")
> dt$b=list(1:2,1:3,1:4,1:5,1:6)
> dt
     a                b
[1,] 1             1, 2
[2,] 1          1, 2, 3
[3,] 2       1, 2, 3, 4
[4,] 3    1, 2, 3, 4, 5
[5,] 3 1, 2, 3, 4, 5, 6
> dt[,mean(unlist(b)),by=a]
     a       V1
[1,] 1 1.800000
[2,] 2 2.500000
[3,] 3 3.272727
> dt[,sapply(b,mean),by=a]
     a  V1
[1,] 1 1.5
[2,] 1 2.0
[3,] 2 2.5
[4,] 3 3.0
[5,] 3 3.5
> 
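For anyone who cannot install the new build yet, the two grouped results above can be cross-checked without data.table. This is only a minimal sketch in base R, not how data.table computes them internally; the names `pooled` and `per_row` are illustrative assumptions and do not come from the package.

```r
# Same toy data as in the example above, held in a plain vector and a list.
a <- c(1, 1, 2, 3, 3)
b <- list(1:2, 1:3, 1:4, 1:5, 1:6)

# Pooled mean per group: split the list by a, flatten each piece, then average.
# This mirrors dt[, mean(unlist(b)), by = a].
pooled <- sapply(split(b, a), function(x) mean(unlist(x)))

# Per-row mean: one value per list element, labelled with its group.
# This mirrors dt[, sapply(b, mean), by = a].
per_row <- data.frame(a = a, V1 = sapply(b, mean))

print(pooled)
print(per_row)
```

The first form pools all list elements in a group before averaging, while the second averages each element separately, which is why the two calls return three and five rows respectively.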


On Thu, 2011-01-13 at 21:07 +0000, Matthew Dowle wrote:
> Hi Joseph,
> You've found feature request #1092 'Make 'by' work for list() columns' :
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1092&group_id=240&atid=978
> 
> Notes on the FR have this though :
>    Currently type 19 isn't supported in dogroups (both input and
> output). This might be straightforward (with luck) to implement.
>    See
> http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
>    Note this is related but different to FR#202 since a list() column
> *is* a vector [is.vector()=TRUE].
> 
> Matthew
> 
> On Thu, 2011-01-13 at 15:17 -0500, Joseph Voelkel wrote:
> > > #create matrix that includes list elements A
> > 
> > >
> > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > 
> >      index var A         
> > 
> > [1,] 1     101 Integer,5 
> > 
> > [2,] 2     102 Integer,5 
> > 
> > [3,] 3     103 Integer,11
> > 
> > > class(mat)
> > 
> > [1] "matrix"
> > 
> > > # convert to data frame and "fix" the first two entries
> > 
> > > (df<-as.data.frame(mat))
> > 
> >   index var                                          A
> > 
> > 1     1 101                         11, 12, 13, 14, 15
> > 
> > 2     2 102                         21, 22, 23, 24, 25
> > 
> > 3     3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> > 
> > > class(df$index) # because mat is atomic
> > 
> > [1] "list"
> > 
> > > df$index<-as.integer(df$index) # convert to integer
> > 
> > > df$var<-as.integer(df$var) # likewise
> > 
> > > # convert to data table
> > 
> > > dt<-data.table(df)
> > 
> > > setkey(dt,index)
> > 
> > > 
> > 
> > > # try some operations
> > 
> > > dt[,A] # works
> > 
> > [[1]]
> > 
> > [1] 11 12 13 14 15
> > 
> >  
> > 
> > [[2]]
> > 
> > [1] 21 22 23 24 25
> > 
> >  
> > 
> > [[3]]
> > 
> > [1] 31 32 33 34 35 36 37 38 39 40 41
> > 
> >  
> > 
> > > dt[,mean(A)] # Does not work. each row of A is a list
> > 
> > [1] NA
> > 
> > Warning message:
> > 
> > In mean.default(A) : argument is not numeric or logical: returning NA
> > 
> > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > 
> > [1] 27.42857
> > 
> > > 
> > 
> > > dt[,mean(var),by=index] # works (of course)
> > 
> >      index  V1
> > 
> > [1,]     1 101
> > 
> > [2,]     2 102
> > 
> > [3,]     3 103
> > 
> > > 
> > 
> > > dt[,mean(unlist(A)),by=index] # does not work! 
> > 
> > Error in `[.data.table`(dt, , mean(unlist(A)), by = index) : 
> > 
> >   only integer,double,logical and character vectors are allowed so
> > far. Type 19 would need to be added.
> > 
> > > 
> > 
> > > 
> > 
> >  
> > 
> > #### Pure code ####
> > 
> > #create matrix that includes list elements A
> > 
> > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > 
> > class(mat)
> > 
> > # convert to data frame and "fix" the first two entries
> > 
> > (df<-as.data.frame(mat))
> > 
> > class(df$index) # because mat is atomic
> > 
> > df$index<-as.integer(df$index) # convert to integer
> > 
> > df$var<-as.integer(df$var) # likewise
> > 
> > # convert to data table
> > 
> > dt<-data.table(df)
> > 
> > setkey(dt,index)
> > 
> >  
> > 
> > # try some operations
> > 
> > dt[,A] # works
> > 
> > dt[,mean(A)] # Does not work. each row of A is a list
> > 
> > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > 
> >  
> > 
> > dt[,mean(var),by=index] # works (of course)
> > 
> >  
> > 
> > dt[,mean(unlist(A)),by=index] # does not work! 
> > 
> >   
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



