[datatable-help] Summing over many variables. A new approach; a new problem
Joseph Voelkel
jgvcqa at rit.edu
Fri Jan 14 19:18:17 CET 2011
Great! Thanks, Matthew. This should be a big help. (It was also nice to see that others are using lists as well. Use of lists here--well, in data frames--was an "aha" moment for me about a year ago.)
-----Original Message-----
From: Matthew Dowle [mailto:mdowlenoreply at virginmedia.com] On Behalf Of Matthew Dowle
Sent: Thursday, January 13, 2011 5:21 PM
To: Joseph Voelkel
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Summing over many variables. A new approach; a new problem
Which is now implemented and committed. Either
install.packages(...,type="source") from R-Forge on unix/mac, or wait a
day or two for the R-Forge binary if you're on Windows.
Thanks for the nudge on this one.
> dt = data.table(a=c(1,1,2,3,3),key="a")
> dt$b=list(1:2,1:3,1:4,1:5,1:6)
> dt
a b
[1,] 1 1, 2
[2,] 1 1, 2, 3
[3,] 2 1, 2, 3, 4
[4,] 3 1, 2, 3, 4, 5
[5,] 3 1, 2, 3, 4, 5, 6
> dt[,mean(unlist(b)),by=a]
a V1
[1,] 1 1.800000
[2,] 2 2.500000
[3,] 3 3.272727
> dt[,sapply(b,mean),by=a]
a V1
[1,] 1 1.5
[2,] 1 2.0
[3,] 2 2.5
[4,] 3 3.0
[5,] 3 3.5
>
On Thu, 2011-01-13 at 21:07 +0000, Matthew Dowle wrote:
> Hi Joseph,
> You've found feature request #1092 'Make 'by' work for list() columns' :
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1092&group_id=240&atid=978
>
> Notes on the FR have this though :
> Currently type 19 isn't supported in dogroups (both input and
> output). This might be straightforward (with luck) to implement.
> See
> http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
> Note this is related but different to FR#202 since a list() column
> *is* a vector [is.vector()=TRUE].
>
> Matthew
>
> On Thu, 2011-01-13 at 15:17 -0500, Joseph Voelkel wrote:
> > > #create matrix that includes list elements A
> >
> > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> >
> > index var A
> >
> > [1,] 1 101 Integer,5
> >
> > [2,] 2 102 Integer,5
> >
> > [3,] 3 103 Integer,11
> >
> > > class(mat)
> >
> > [1] "matrix"
> >
> > > # convert to data frame and "fix" the first two entries
> >
> > > (df<-as.data.frame(mat))
> >
> > index var A
> >
> > 1 1 101 11, 12, 13, 14, 15
> >
> > 2 2 102 21, 22, 23, 24, 25
> >
> > 3 3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> >
> > > class(df$index) # because mat is atomic
> >
> > [1] "list"
> >
> > > df$index<-as.integer(df$index) # convert to integer
> >
> > > df$var<-as.integer(df$var) # likewise
> >
> > > # convert to data table
> >
> > > dt<-data.table(df)
> >
> > > setkey(dt,index)
> >
> > >
> >
> > > # try some operations
> >
> > > dt[,A] # works
> >
> > [[1]]
> >
> > [1] 11 12 13 14 15
> >
> >
> >
> > [[2]]
> >
> > [1] 21 22 23 24 25
> >
> >
> >
> > [[3]]
> >
> > [1] 31 32 33 34 35 36 37 38 39 40 41
> >
> >
> >
> > > dt[,mean(A)] # Does not work. each row of A is a list
> >
> > [1] NA
> >
> > Warning message:
> >
> > In mean.default(A) : argument is not numeric or logical: returning NA
> >
> > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> >
> > [1] 27.42857
> >
> > >
> >
> > > dt[,mean(var),by=index] # works (of course)
> >
> > index V1
> >
> > [1,] 1 101
> >
> > [2,] 2 102
> >
> > [3,] 3 103
> >
> > >
> >
> > > dt[,mean(unlist(A)),by=index] # does not work!
> >
> > Error in `[.data.table`(dt, , mean(unlist(A)), by = index) :
> >
> > only integer,double,logical and character vectors are allowed so
> > far. Type 19 would need to be added.
> >
> > >
> >
> > >
> >
> >
> >
> > #### Pure code ####
> >
> > #create matrix that includes list elements A
> >
> > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> >
> > class(mat)
> >
> > # convert to data frame and "fix" the first two entries
> >
> > (df<-as.data.frame(mat))
> >
> > class(df$index) # because mat is atomic
> >
> > df$index<-as.integer(df$index) # convert to integer
> >
> > df$var<-as.integer(df$var) # likewise
> >
> > # convert to data table
> >
> > dt<-data.table(df)
> >
> > setkey(dt,index)
> >
> >
> >
> > # try some operations
> >
> > dt[,A] # works
> >
> > dt[,mean(A)] # Does not work. each row of A is a list
> >
> > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> >
> >
> >
> > dt[,mean(var),by=index] # works (of course)
> >
> >
> >
> > dt[,mean(unlist(A)),by=index] # does not work!
> >
> >
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help