[datatable-help] Data table syntax

David Winsemius dwinsemius at comcast.net
Sun Sep 5 19:02:44 CEST 2010


On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:

>
> On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:
>
>> Thanks for the invaluable help on my previous questions. The
>> speed-up in creating summary tables has been immense and I’m
>> enthused about all the possibilities going forward.
>>
>> I’m currently stuck trying to put together syntax for a “long”
>> format table. In the example below, each case is a unique Student by
>> Year combination. What I’m trying to do is take
>> such a table, aggregate on the student’s current year (i.e., 2009
>> in this data) SCHOOL_NUMBER, and calculate their mean score in the
>> previous year (i.e., 2008 in this data).
>>
>> If the file were “wide”, with each case representing a unique  
>> student with separate variables for the year, then it would be easy  
>> to break on the 2009 SCHOOL_NUMBER and take the
>> mean of the 2008 SCORE.

But there is only one 2008 SCORE for each student???

>>
>> Is conversion of long to wide necessary to do this?
>
> Probably not. Are you familiar with the "ave" function in base R?

I am having some difficulty understanding the structure of the desired  
output. I initially thought it might be something like:
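# Helper: read a whitespace-delimited table from a character string.
rd.txt <- function(txt, header = TRUE, ...) {
  con <- textConnection(txt)
  on.exit(close(con))  # close only this connection, not every open one
  read.table(con, header = header, ...)
}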
txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
          1           100 2008    39
          1           200 2009    48
          2           100 2008    64
          2           200 2009    73
          3           100 2008    35
          3           200 2009    35
          4           100 2008    52
          4           200 2009    61
          5           100 2008    51
          5           200 2009    58
          6           300 2008    45
          6           400 2009    55
          7           300 2008    69
          7           400 2009    77
          8           300 2008    47
          8           400 2009    47
          9           300 2008    57
          9           400 2009    58
         10           300 2008    47
         10           400 2009    53")
library(data.table)
dtxt <- data.table(txt)

 > # ave() returns a vector as long as its input
 > dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))]
 > dtxt
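
To make the ave() behaviour concrete, here is a minimal base R sketch using only the first four rows of the example data. The names `id`, `score` and `per_row_mean` are made up for this illustration and are not part of the code above.

```r
# First two students from the example: two scores each (2008, 2009).
id    <- c(1, 1, 2, 2)
score <- c(39, 48, 64, 73)

# ave() computes the group mean and repeats it for every row of the group,
# so the result lines up with the original rows.
per_row_mean <- ave(score, id)
per_row_mean
length(per_row_mean) == length(score)
```

Each student's mean appears once per row for that student, which is why the result can be assigned straight back as a new column.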

But now I am wondering if you wanted:

 > # tapply() returns a vector only as long as the product of category levels
 > dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)]
    1    2    3    4    5    6    7    8    9   10
43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0
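
A third possible reading is the mean of the previous year's (2008) scores, grouped by the school each student attends in the current year (2009). The following is only a sketch of that reading in base R, without converting to wide format. It assumes exactly one row per student per year, and the names `long`, `prev`, `curr`, `both` and `prior_mean` are made up for this illustration.

```r
# Rebuild the example data in long format (one row per student-year).
long <- data.frame(
  STUDENT_ID    = rep(1:10, each = 2),
  SCHOOL_NUMBER = c(rep(c(100, 200), 5), rep(c(300, 400), 5)),
  YEAR          = rep(c(2008, 2009), 10),
  SCORE         = c(39, 48, 64, 73, 35, 35, 52, 61, 51, 58,
                    45, 55, 69, 77, 47, 47, 57, 58, 47, 53)
)

# Previous-year scores and current-year schools, one row per student each.
prev <- long[long$YEAR == 2008, c("STUDENT_ID", "SCORE")]
curr <- long[long$YEAR == 2009, c("STUDENT_ID", "SCHOOL_NUMBER")]

# Attach each student's 2008 score to his or her 2009 school.
both <- merge(curr, prev, by = "STUDENT_ID")

# Mean 2008 score within each 2009 school.
prior_mean <- tapply(both$SCORE, both$SCHOOL_NUMBER, mean)
prior_mean
```

With these data the result has one entry per 2009 school (200 and 400). The same self-join idea should carry over to a keyed data.table, but that variant is not shown here.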

>
>>
>>
>>      STUDENT_ID SCHOOL_NUMBER YEAR SCORE
>> [1,]          1           100 2008    39
>> [2,]          1           200 2009    48
>> [3,]          2           100 2008    64
>> [4,]          2           200 2009    73
>> [5,]          3           100 2008    35
>> [6,]          3           200 2009    35
>> [7,]          4           100 2008    52
>> [8,]          4           200 2009    61
>> [9,]          5           100 2008    51
>> [10,]          5           200 2009    58
>> [11,]          6           300 2008    45
>> [12,]          6           400 2009    55
>> [13,]          7           300 2008    69
>> [14,]          7           400 2009    77
>> [15,]          8           300 2008    47
>> [16,]          8           400 2009    47
>> [17,]          9           300 2008    57
>> [18,]          9           400 2009    58
>> [19,]         10           300 2008    47
>> [20,]         10           400 2009    53
>>
>>
>> Thanks,
>>
>> Damian
>


David Winsemius, MD
West Hartford, CT


