[datatable-help] Data table syntax
Damian Betebenner
dbetebenner at nciea.org
Sun Sep 5 21:02:46 CEST 2010
Hi David,
Thanks for the quick and thoughtful reply.
Sorry for not being clearer.
The students attended two schools in 2009 (200 and 400). I'd like to group on those two schools and, for the students in each one, calculate the mean of their 2008 scores.
Thus, the output would have 2 rows:
YEAR, 2009_SCHOOL_NUMBER, 2008_SCORE_MEAN
2009 200 48.2
2009 400 53
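A minimal base R sketch of the computation described above, using the example data from the original question. This is only one possible approach and is not data.table syntax; the names scores, cur, prev, both and result are made up for illustration.

```r
# Example data from this thread, in long format (one row per student-year).
scores <- read.table(text = "STUDENT_ID SCHOOL_NUMBER YEAR SCORE
1 100 2008 39
1 200 2009 48
2 100 2008 64
2 200 2009 73
3 100 2008 35
3 200 2009 35
4 100 2008 52
4 200 2009 61
5 100 2008 51
5 200 2009 58
6 300 2008 45
6 400 2009 55
7 300 2008 69
7 400 2009 77
8 300 2008 47
8 400 2009 47
9 300 2008 57
9 400 2009 58
10 300 2008 47
10 400 2009 53", header = TRUE)

# The 2009 rows supply each student's school; the 2008 rows supply the score.
cur  <- scores[scores$YEAR == 2009, c("STUDENT_ID", "SCHOOL_NUMBER")]
prev <- scores[scores$YEAR == 2008, c("STUDENT_ID", "SCORE")]

# Attach each student's 2008 score to their 2009 school, then average by school.
both   <- merge(cur, prev, by = "STUDENT_ID")
result <- aggregate(SCORE ~ SCHOOL_NUMBER, data = both, FUN = mean)
names(result)[2] <- "SCORE_2008_MEAN"
result
```

The self-join on STUDENT_ID does the work that a long-to-wide reshape would otherwise do, so the table can stay in long format.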
Thanks for considering this,
Damian
Damian Betebenner
Center for Assessment
PO Box 351
Dover, NH 03821-0351
Phone (office): (603) 516-7900
Phone (cell): (857) 234-2474
Fax: (603) 516-7910
dbetebenner at nciea.org
www.nciea.org
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Sunday, September 05, 2010 1:03 PM
To: David Winsemius
Cc: Damian Betebenner; datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Data table syntax
On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:
>
> On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:
>
>> Thanks for the invaluable help on my previous questions. The speed-up
>> in creating summary tables has been immense and I'm enthused about
>> all the possibilities going forward.
>>
>> I'm currently stuck in trying to put together syntax for a "long"
>> format table. In the example below, each case is a unique Student by
>> Year combination. What I'm trying to do is take
>> such a table, aggregate on the student's current year (i.e., 2009
>> in this data) SCHOOL_NUMBER, and calculate their mean score in the
>> previous year (i.e., 2008 in this data).
>>
>> If the file were "wide", with each case representing a unique
>> student with separate variables for the year, then it would be easy
>> to break on the 2009 SCHOOL_NUMBER and take the
>> mean of the 2008 SCORE.
But there is only one 2008 SCORE for each student???
>>
>> Is conversion of long to wide necessary to do this?
>
> Probably not. Are you familiar with the "ave" function in base R?
I am having some difficulty understanding the structure of the desired
output. I initially thought it might be something like:
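# Helper: read a whitespace-delimited table from a character string.
# read.table(text = ...) avoids opening a connection by hand, so there is
# no need for closeAllConnections(), which would close unrelated ones too.
rd.txt <- function(txt, header = TRUE, ...) {
  read.table(text = txt, header = header, ...)
}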
txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
1 100 2008 39
1 200 2009 48
2 100 2008 64
2 200 2009 73
3 100 2008 35
3 200 2009 35
4 100 2008 52
4 200 2009 61
5 100 2008 51
5 200 2009 58
6 300 2008 45
6 400 2009 55
7 300 2008 69
7 400 2009 77
8 300 2008 47
8 400 2009 47
9 300 2008 57
9 400 2009 58
10 300 2008 47
10 400 2009 53")
library(data.table)
dtxt <- data.table(txt)
> dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))]  # returns a vector as long as its input
> dtxt
But now I am wondering if you wanted:
> dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)]  # returns a vector only as long as the product of category levels
   1    2    3    4    5    6    7    8    9   10
43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0
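To make the difference between the two calls concrete, here is a small base R sketch on the first four rows of the example. The vector names score, student, per_row and per_group are invented for illustration only.

```r
# First two students from the example, two rows (years) each.
score   <- c(39, 48, 64, 73)
student <- c(1, 1, 2, 2)

# ave() repeats each group's mean on every row of that group,
# so the result lines up with the input and can be stored as a column.
per_row <- ave(score, student)

# tapply() collapses to one value per group, named by the group level.
per_group <- tapply(score, student, mean)

length(per_row)    # same length as score
length(per_group)  # one entry per distinct student
```

So ave() suits adding a per-group summary alongside the original rows, while tapply() suits building a summary table with one entry per group.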
>
>>
>>
>> STUDENT_ID SCHOOL_NUMBER YEAR SCORE
>> [1,] 1 100 2008 39
>> [2,] 1 200 2009 48
>> [3,] 2 100 2008 64
>> [4,] 2 200 2009 73
>> [5,] 3 100 2008 35
>> [6,] 3 200 2009 35
>> [7,] 4 100 2008 52
>> [8,] 4 200 2009 61
>> [9,] 5 100 2008 51
>> [10,] 5 200 2009 58
>> [11,] 6 300 2008 45
>> [12,] 6 400 2009 55
>> [13,] 7 300 2008 69
>> [14,] 7 400 2009 77
>> [15,] 8 300 2008 47
>> [16,] 8 400 2009 47
>> [17,] 9 300 2008 57
>> [18,] 9 400 2009 58
>> [19,] 10 300 2008 47
>> [20,] 10 400 2009 53
>>
>>
>> Thanks,
>>
>> Damian
>
David Winsemius, MD
West Hartford, CT