<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal>In an earlier post, I had a large number of variables in a data table that I wanted to summarize. For example, I had A1-A30 and wanted to find the mean of A1-A30 within certain groups.<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>While I (with help!) eventually found a solution for this, I am finding that I need something more sophisticated. For example, sometimes I want the mean of only A1-A10, or I may want a more complex function of the Ai's. I would like to be able to vary these functions of A1-A30 very easily.<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>For this reason, I decided to restructure the data table so that, for each row, A1-A30 (30 individual numbers) is expressed as A, a list of length 1 whose first (and only) entry is the vector of the 30 A1-A30 values.<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>The problem I am now having is shown below (first the console session with prompts and output, then the bare code in case you want to try it). While I can make this work for data frames, I am hoping to find a way to make it work for data tables. (This can be really useful, by the way. In a recent project, my data frame included lists of 4D arrays, which was the perfect structure for, e.g., summing over a variety of dimensions.)<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>> # create matrix that includes list elements A<o:p></o:p></p>
<p class=MsoNormal>> (mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))<o:p></o:p></p>
<p class=MsoNormal> index var A <o:p></o:p></p>
<p class=MsoNormal>[1,] 1 101 Integer,5 <o:p></o:p></p>
<p class=MsoNormal>[2,] 2 102 Integer,5 <o:p></o:p></p>
<p class=MsoNormal>[3,] 3 103 Integer,11<o:p></o:p></p>
<p class=MsoNormal>> class(mat)<o:p></o:p></p>
<p class=MsoNormal>[1] "matrix"<o:p></o:p></p>
<p class=MsoNormal>> # convert to data frame and "fix" the first two entries<o:p></o:p></p>
<p class=MsoNormal>> (df<-as.data.frame(mat))<o:p></o:p></p>
<p class=MsoNormal> index var A<o:p></o:p></p>
<p class=MsoNormal>1 1 101 11, 12, 13, 14, 15<o:p></o:p></p>
<p class=MsoNormal>2 2 102 21, 22, 23, 24, 25<o:p></o:p></p>
<p class=MsoNormal>3 3 103 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41<o:p></o:p></p>
<p class=MsoNormal>> class(df$index) # because mat is a list matrix<o:p></o:p></p>
<p class=MsoNormal>[1] "list"<o:p></o:p></p>
<p class=MsoNormal>> df$index<-as.integer(df$index) # convert to integer<o:p></o:p></p>
<p class=MsoNormal>> df$var<-as.integer(df$var) # likewise<o:p></o:p></p>
<p class=MsoNormal>> # convert to data table<o:p></o:p></p>
<p class=MsoNormal>> dt<-data.table(df)<o:p></o:p></p>
<p class=MsoNormal>> setkey(dt,index)<o:p></o:p></p>
<p class=MsoNormal>> <o:p></o:p></p>
<p class=MsoNormal>> # try some operations<o:p></o:p></p>
<p class=MsoNormal>> dt[,A] # works<o:p></o:p></p>
<p class=MsoNormal>[[1]]<o:p></o:p></p>
<p class=MsoNormal>[1] 11 12 13 14 15<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>[[2]]<o:p></o:p></p>
<p class=MsoNormal>[1] 21 22 23 24 25<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>[[3]]<o:p></o:p></p>
<p class=MsoNormal> [1] 31 32 33 34 35 36 37 38 39 40 41<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>> dt[,mean(A)] # does not work: each row of A is a list<o:p></o:p></p>
<p class=MsoNormal>[1] NA<o:p></o:p></p>
<p class=MsoNormal>Warning message:<o:p></o:p></p>
<p class=MsoNormal>In mean.default(A) : argument is not numeric or logical: returning NA<o:p></o:p></p>
<p class=MsoNormal>> dt[,mean(unlist(A))] # but here is an easy fix to make this work<o:p></o:p></p>
<p class=MsoNormal>[1] 27.42857<o:p></o:p></p>
<p class=MsoNormal>> <o:p></o:p></p>
<p class=MsoNormal>> dt[,mean(var),by=index] # works (of course)<o:p></o:p></p>
<p class=MsoNormal> index V1<o:p></o:p></p>
<p class=MsoNormal>[1,] 1 101<o:p></o:p></p>
<p class=MsoNormal>[2,] 2 102<o:p></o:p></p>
<p class=MsoNormal>[3,] 3 103<o:p></o:p></p>
<p class=MsoNormal>> <o:p></o:p></p>
<p class=MsoNormal>> dt[,mean(unlist(A)),by=index] # does not work! <o:p></o:p></p>
<p class=MsoNormal>Error in `[.data.table`(dt, , mean(unlist(A)), by = index) : <o:p></o:p></p>
<p class=MsoNormal> only integer,double,logical and character vectors are allowed so far. Type 19 would need to be added.<o:p></o:p></p>
<p class=MsoNormal>> <o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>#### Pure code ####<o:p></o:p></p>
<p class=MsoNormal>library(data.table)<o:p></o:p></p>
<p class=MsoNormal># create matrix that includes list elements A<o:p></o:p></p>
<p class=MsoNormal>(mat<-cbind(index=1:3,var=101:103,A=c(list(11:15),list(21:25),list(31:41))))<o:p></o:p></p>
<p class=MsoNormal>class(mat)<o:p></o:p></p>
<p class=MsoNormal># convert to data frame and "fix" the first two entries<o:p></o:p></p>
<p class=MsoNormal>(df<-as.data.frame(mat))<o:p></o:p></p>
<p class=MsoNormal>class(df$index) # because mat is a list matrix<o:p></o:p></p>
<p class=MsoNormal>df$index<-as.integer(df$index) # convert to integer<o:p></o:p></p>
<p class=MsoNormal>df$var<-as.integer(df$var) # likewise<o:p></o:p></p>
<p class=MsoNormal># convert to data table<o:p></o:p></p>
<p class=MsoNormal>dt<-data.table(df)<o:p></o:p></p>
<p class=MsoNormal>setkey(dt,index)<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal># try some operations<o:p></o:p></p>
<p class=MsoNormal>dt[,A] # works<o:p></o:p></p>
<p class=MsoNormal>dt[,mean(A)] # does not work: each row of A is a list<o:p></o:p></p>
<p class=MsoNormal>dt[,mean(unlist(A))] # but here is an easy fix to make this work<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>dt[,mean(var),by=index] # works (of course)<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>dt[,mean(unlist(A)),by=index] # does not work! <o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>Joseph G. Voelkel, Ph.D.<o:p></o:p></p>
<p class=MsoNormal>Professor, Center for Quality and Applied Statistics<o:p></o:p></p>
<p class=MsoNormal>Kate Gleason College of Engineering<o:p></o:p></p>
<p class=MsoNormal>Rochester Institute of Technology<o:p></o:p></p>
<p class=MsoNormal>V 585-475-2231<o:p></o:p></p>
<p class=MsoNormal>F 585-475-5959<o:p></o:p></p>
<p class=MsoNormal>joseph.voelkel@rit.edu<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p></div></body></html>