[datatable-help] Data table syntax
David Winsemius
dwinsemius at comcast.net
Sun Sep 5 19:02:44 CEST 2010
On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:
>
> On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:
>
>> Thanks for the invaluable help on my previous questions. The
>> speed-up in creating summary tables has been immense and I’m
>> enthused about all the possibilities going forward.
>>
>> I’m currently stuck trying to put together syntax for a “long”
>> form table. In the example below, each case is a unique Student by
>> Year combination. What I’m trying to do is take
>> such a table, aggregate on the student’s current year (i.e., 2009
>> in this data) SCHOOL_NUMBER, and calculate their mean score in the
>> previous year (i.e., 2008 in this data).
>>
>> If the file were “wide”, with each case representing a unique
>> student with separate variables for the year, then it would be easy
>> to break on the 2009 SCHOOL_NUMBER and take the
>> mean of the 2008 SCORE.
But there is only one 2008 SCORE for each student???
>>
>> Is conversion of long to wide necessary to do this?
>
> Probably not. Are you familiar with the "ave" function in base R?
I am having some difficulty understanding the structure of the desired
output. I initially thought it might be something like:
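library(data.table)

# Helper: read a whitespace-delimited table from a character string.
rd.txt <- function(txt, header = TRUE, ...) {
    con <- textConnection(txt)
    on.exit(close(con))   # close only this connection, not all open ones
    read.table(con, header = header, ...)
}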
txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
1 100 2008 39
1 200 2009 48
2 100 2008 64
2 200 2009 73
3 100 2008 35
3 200 2009 35
4 100 2008 52
4 200 2009 61
5 100 2008 51
5 200 2009 58
6 300 2008 45
6 400 2009 55
7 300 2008 69
7 400 2009 77
8 300 2008 47
8 400 2009 47
9 300 2008 57
9 400 2009 58
10 300 2008 47
10 400 2009 53")
dtxt <- data.table(txt)
> # ave() returns a vector as long as its input
> dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))]
> dtxt
But now I am wondering if you wanted:
> # tapply() returns a vector only as long as the number of
> # category-level combinations (here, one value per STUDENT_ID)
> dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)]
1 2 3 4 5 6 7 8 9 10
43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0
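If the goal is instead what the question literally describes (group on each
student's 2009 SCHOOL_NUMBER and average the 2008 SCORE), one possible reading
can be sketched in base R without reshaping to wide. This is only a sketch
under that assumption: the names long, prior, current, joined and prior_mean
are illustrative and not from the original post, and the data are retyped
from the example in this thread.

```r
# Long-format data from the thread: one row per student-year.
long <- data.frame(
  STUDENT_ID    = rep(1:10, each = 2),
  SCHOOL_NUMBER = c(rep(c(100, 200), 5), rep(c(300, 400), 5)),
  YEAR          = rep(c(2008, 2009), 10),
  SCORE         = c(39, 48, 64, 73, 35, 35, 52, 61, 51, 58,
                    45, 55, 69, 77, 47, 47, 57, 58, 47, 53)
)

# Split by year, keeping only the columns each side contributes.
prior   <- long[long$YEAR == 2008, c("STUDENT_ID", "SCORE")]
current <- long[long$YEAR == 2009, c("STUDENT_ID", "SCHOOL_NUMBER")]

# Attach each student's 2009 school to that student's 2008 score.
joined <- merge(current, prior, by = "STUDENT_ID")

# Mean 2008 score within each 2009 school.
prior_mean <- tapply(joined$SCORE, joined$SCHOOL_NUMBER, mean)
prior_mean
# 200: 48.2, 400: 53.0
```

The same split, join, and aggregate steps should carry over to a keyed
data.table join, which would presumably be faster on large tables.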
>
>>
>>
>> STUDENT_ID SCHOOL_NUMBER YEAR SCORE
>> [1,] 1 100 2008 39
>> [2,] 1 200 2009 48
>> [3,] 2 100 2008 64
>> [4,] 2 200 2009 73
>> [5,] 3 100 2008 35
>> [6,] 3 200 2009 35
>> [7,] 4 100 2008 52
>> [8,] 4 200 2009 61
>> [9,] 5 100 2008 51
>> [10,] 5 200 2009 58
>> [11,] 6 300 2008 45
>> [12,] 6 400 2009 55
>> [13,] 7 300 2008 69
>> [14,] 7 400 2009 77
>> [15,] 8 300 2008 47
>> [16,] 8 400 2009 47
>> [17,] 9 300 2008 57
>> [18,] 9 400 2009 58
>> [19,] 10 300 2008 47
>> [20,] 10 400 2009 53
>>
>>
>> Thanks,
>>
>> Damian
>
David Winsemius, MD
West Hartford, CT