[datatable-help] Summing over many variables. A new approach; a new problem
Short, Tom
TShort at epri.com
Fri Jan 14 00:37:26 CET 2011
Matthew fixed your case, but the following workaround may be helpful if
you have other types of data stuffed in a data table. Basically, you use
one table to index another. Of course, you need to remember to keep them
in sync. It doesn't work for a table to index itself in this manner.
> dt1 <- dt[,1:2,with=FALSE]
> dt2 <- dt[,3, with=FALSE]
> # try some operations
> dt1[,mean(dt2[index, unlist(A)]),by=index]
index V1
[1,] 1 13
[2,] 2 23
[3,] 3 36
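
For anyone who wants to try the idea from a clean session, here is a minimal self-contained sketch. It assumes a current version of data.table. The example data is rebuilt directly with data.table() instead of going through the matrix and data frame steps, and the lookup into the second table uses plain list subsetting (dt2$A[index]) instead of dt2[index, unlist(A)], so treat it as a variation on the workaround above and not the exact code. It relies on the values of index being the row numbers of dt2, which is the "keep them in sync" caveat.

```r
library(data.table)

# Example data rebuilt for illustration: two atomic columns and one list column
dt <- data.table(index = 1:3, var = 101:103,
                 A = list(11:15, 21:25, 31:41))

# Split into a table of atomic columns and a table holding the list column.
# Both tables must stay in the same row order.
dt1 <- dt[, .(index, var)]
dt2 <- dt[, .(A)]

# Group dt1 by index and use each index value as a row number into dt2
res <- dt1[, mean(unlist(dt2$A[index])), by = index]
print(res)
```

If rows are ever reordered or dropped in one table but not the other, the row-number lookup silently returns the wrong list elements, so it is worth checking nrow(dt1) == nrow(dt2) before grouping.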
- Tom
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> On Behalf Of Matthew Dowle
> Sent: Thursday, January 13, 2011 17:21
> To: Joseph Voelkel
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Summing over many variables. A
> new approach; a new problem
>
> Which is now implemented and committed. Either
> install.packages(...,type="source") from R-Forge on unix/mac,
> or wait a day or two for the R-Forge binary if you're on Windows.
> Thanks for the nudge on this one.
>
> > dt = data.table(a=c(1,1,2,3,3),key="a")
> > dt$b=list(1:2,1:3,1:4,1:5,1:6)
> > dt
> a b
> [1,] 1 1, 2
> [2,] 1 1, 2, 3
> [3,] 2 1, 2, 3, 4
> [4,] 3 1, 2, 3, 4, 5
> [5,] 3 1, 2, 3, 4, 5, 6
> > dt[,mean(unlist(b)),by=a]
> a V1
> [1,] 1 1.800000
> [2,] 2 2.500000
> [3,] 3 3.272727
> > dt[,sapply(b,mean),by=a]
> a V1
> [1,] 1 1.5
> [2,] 1 2.0
> [3,] 2 2.5
> [4,] 3 3.0
> [5,] 3 3.5
> >
>
>
> On Thu, 2011-01-13 at 21:07 +0000, Matthew Dowle wrote:
> > Hi Joseph,
> > You've found feature request #1092 'Make 'by' work for list() columns':
> >
> > https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1092&group_id=240&atid=978
> >
> > Notes on the FR have this though :
> > Currently type 19 isn't supported in dogroups (both input and
> > output). This might be straightforward (with luck) to implement.
> > See
> > http://r.789695.n4.nabble.com/Suggest-a-cool-feature-Use-data-table-like-a-sorted-indexed-data-list-tp2544213p2544213.html
> > Note this is related but different to FR#202 since a list() column
> > *is* a vector [is.vector()=TRUE].
> >
> > Matthew
> >
> > On Thu, 2011-01-13 at 15:17 -0500, Joseph Voelkel wrote:
> > > > #create matrix that includes list elements A
> > >
> > > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > >
> > > index var A
> > >
> > > [1,] 1 101 Integer,5
> > >
> > > [2,] 2   ; 102 In class=MsoNormal>[3,] 3 103 Integer,11
> > >
> > > > class(mat)
> > >
> > > [1] "matrix"
> > >
> > > > # convert to data frame and "fix" the first two entries
> > >
> > > > (df<-as.data.frame(mat))
> > >
> > > index var A
> > >
> > > 1 1 101 11, 12, 13, 14, 15
> > >
> > > 2 2 102 &n bsp;&nbs ; 21, 22, 23, 24, 25
> > >
> > > 3 3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41
> > >
> > > > class(df$index) # because mat is atomic
> > >
> > > [1] "list"
> > >
> > > > df$index<-as.integer(df$index) # convert to integer
> > >
> > > > df$var<-as.integer(df$var) # likewise
> > >
> > > > > # convert to data table
> > >
> > > > dt<-data.table(df)
> > >
> > > > setkey(dt,index)
> > >
> > > >
> > >
> > > > # try some operations
> > >
> > > > dt[,A] # works
> > >
> > > [[1]]
> > >
> > > [1] 11 12 13 14 15
> > >
> > >
> > >
> > > [[2]]< /p>
> > >
> > >
> > >
> > > [[3]]
> > >
> > > [1] 31 32 33 34 35 36 37 38 39 40 41
> > >
> > >
> > >
> > > > dt[,mean(A)] # Does not work. each row of A is a list
> > >
> > > [1] NA
> > >
> > > Warning message:
> > >
> > > > In mean.default(A) : argument is not numeric or logical: returning NA
> > >
> > > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > >
> > > [1] 27.42857
> > >
> > > >
> > >
> > > > dt[,mean(var),by=index] # works (of course)
> > >
> > > index V1
> > >
> > > [1,] 1 101
> > >
> > > [2,] 2 102
> > >
> > > [3, 3 103
> > >
> > > >
> > >
> > > > dt[,mean(unlist(A)),by=index] # does not work!
> > >
> > > Error in `[.data.table`(dt, , mean(unlist(A)), by = index) :
> > >
> > > > only integer,double,logical and character vectors are allowed so far. Type 19 would need to be added.
> > >
> > > >
> > >
> > > >
> > >
> > >
> > >
> > > #### Pure code ####
> > >
> > > #create matrix that includes list elements A
> > >
> > >
> > > > (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))
> > >
> > > class(mat)
> > >
> > > # convert to data frame and "fix" the first two entries
> > >
> > > (df<-as.data.frame(mat))
> > >
> > > > class(df$index) # because mat is atomic
> > >
> > > df$index<-as.integer(df$index) # convert to integer
> > >
> > > df$var<-as.integer(df$var) # likewise
> > >
> > > > # convert to data table
> > >
> > > dt<-data.table(df)
> > >
> > > setkey(dt,index)
> > >
> > >
> > >
> > > # try some operations
> > >
> > > dt[,A] # works
> > >
> > > dt[,mean(A)] # Does not work. each row of A is a list
> > >
> > > > dt[,mean(unlist(A))] # But here is an easy fix to make this work
> > >
> > >
> > >
> > > dt[,mean(var),by=index] # works (of course)
> > >
> > >
> > >
> > > dt[,mean(unlist(A)),by=index] # does not work!
> > >
> > >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> >
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>