[datatable-help] Data table syntax
David Winsemius
dwinsemius at comcast.net
Sun Sep 5 22:43:02 CEST 2010
On Sep 5, 2010, at 3:02 PM, Damian Betebenner wrote:
> Hi David,
>
> Thanks for the quick and thoughtful reply.
>
> Sorry for not being clearer.
>
> There are two schools that the students attended in 2009 (200 and
> 400). I'd like to break on those and calculate, for all the students
> in each of those two schools, the mean of their 2008 scores.
>
> Thus, the output would have 2 rows:
>
> YEAR, 2009_SCHOOL_NUMBER, 2008_SCORE_MEAN
>
> 2009 200 54.4
> 2009 400 53
I don't see what sort of linkage you have between the 2008 and 2009
school numbers, but see if this satisfies:
> dtxt[YEAR==2008 & STUDENT_ID %in% dtxt[YEAR==2009, STUDENT_ID], mean(SCORE), by=SCHOOL_NUMBER]
SCHOOL_NUMBER V1
[1,] 100 48.2
[2,] 300 53.0
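The query above groups by the 2008 SCHOOL_NUMBER (100 and 300), whereas the question asks for the 2009 school (200 and 400). A minimal sketch of one way to get that, assuming a current data.table release (the `:=` by-group assignment operator postdates this 2010 thread) and that each student has exactly one 2009 row: first tag every row of a student with that student's 2009 school, then average the 2008 scores by that tag. The column names `SCHOOL_2009` and `SCORE_2008_MEAN` are illustrative only, not part of the original data.

```r
library(data.table)

# Same toy data as in the thread: one row per STUDENT_ID x YEAR
dtxt <- data.table(
  STUDENT_ID    = rep(1:10, each = 2),
  SCHOOL_NUMBER = c(rep(c(100, 200), 5), rep(c(300, 400), 5)),
  YEAR          = rep(c(2008, 2009), 10),
  SCORE         = c(39, 48, 64, 73, 35, 35, 52, 61, 51, 58,
                    45, 55, 69, 77, 47, 47, 57, 58, 47, 53)
)

# Copy each student's 2009 school onto all of that student's rows.
# Assumes exactly one 2009 row per student (true for this data).
dtxt[, SCHOOL_2009 := SCHOOL_NUMBER[YEAR == 2009], by = STUDENT_ID]

# Mean of the 2008 scores, broken down by the 2009 school
res <- dtxt[YEAR == 2008, list(SCORE_2008_MEAN = mean(SCORE)), by = SCHOOL_2009]
print(res)
```

On this data the means come out to 48.2 for school 200 (2008 scores 39, 64, 35, 52, 51) and 53.0 for school 400, so no long-to-wide conversion is needed.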
>
> Thanks for considering this,
>
> Damian
>
>
> Damian Betebenner
> Center for Assessment
> PO Box 351
> Dover, NH 03821-0351
>
> Phone (office): (603) 516-7900
> Phone (cell): (857) 234-2474
> Fax: (603) 516-7910
>
> dbetebenner at nciea.org
> www.nciea.org
>
>
>
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Sunday, September 05, 2010 1:03 PM
> To: David Winsemius
> Cc: Damian Betebenner; datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Data table syntax
>
>
> On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:
>
>>
>> On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:
>>
>>> Thanks for the invaluable help on my previous questions. The speedup
>>> in creating summary tables has been immense, and I'm enthused about
>>> all the possibilities going forward.
>>>
>>> I'm currently stuck trying to put together syntax for a "long"
>>> format table. In the example below, each case is a unique Student by
>>> Year combination. What I'm trying to do is take such a table,
>>> aggregate on the student's current-year (i.e., 2009 in this data)
>>> SCHOOL_NUMBER, and calculate their mean score in the previous year
>>> (i.e., 2008 in this data).
>>>
>>> If the file were "wide", with each case representing a unique
>>> student and separate variables for each year, then it would be easy
>>> to break on the 2009 SCHOOL_NUMBER and take the mean of the 2008
>>> SCORE.
>
> But there is only one 2008 SCORE for each student???
>
>>>
>>> Is conversion of long to wide necessary to do this?
>>
>> Probably not. Are you familiar with the "ave" function in base R?
>
> I am having some difficulty understanding the structure of the desired
> output. I initially thought it might be something like:
> rd.txt <- function(txt, header=TRUE, ...) {
>   rd <- read.table(textConnection(txt), header=header, ...)
>   closeAllConnections()
>   rd
> }
> txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
> 1 100 2008 39
> 1 200 2009 48
> 2 100 2008 64
> 2 200 2009 73
> 3 100 2008 35
> 3 200 2009 35
> 4 100 2008 52
> 4 200 2009 61
> 5 100 2008 51
> 5 200 2009 58
> 6 300 2008 45
> 6 400 2009 55
> 7 300 2008 69
> 7 400 2009 77
> 8 300 2008 47
> 8 400 2009 47
> 9 300 2008 57
> 9 400 2009 58
> 10 300 2008 47
> 10 400 2009 53")
> library(data.table)
> dtxt <- data.table(txt)
>
>> dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))] # returns a vector as long as its input
>> dtxt
>
> But now I am wondering if you wanted:
>
>> dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)] # returns vector only as long as product of category levels.
> 1 2 3 4 5 6 7 8 9 10
> 43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0
>
>>
>>>
>>>
>>> STUDENT_ID SCHOOL_NUMBER YEAR SCORE
>>> [1,] 1 100 2008 39
>>> [2,] 1 200 2009 48
>>> [3,] 2 100 2008 64
>>> [4,] 2 200 2009 73
>>> [5,] 3 100 2008 35
>>> [6,] 3 200 2009 35
>>> [7,] 4 100 2008 52
>>> [8,] 4 200 2009 61
>>> [9,] 5 100 2008 51
>>> [10,] 5 200 2009 58
>>> [11,] 6 300 2008 45
>>> [12,] 6 400 2009 55
>>> [13,] 7 300 2008 69
>>> [14,] 7 400 2009 77
>>> [15,] 8 300 2008 47
>>> [16,] 8 400 2009 47
>>> [17,] 9 300 2008 57
>>> [18,] 9 400 2009 58
>>> [19,] 10 300 2008 47
>>> [20,] 10 400 2009 53
>>>
>>>
>>> Thanks,
>>>
>>> Damian
>>
>
>
> David Winsemius, MD
> West Hartford, CT
>
David Winsemius, MD
West Hartford, CT
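The original question also asked whether converting long to wide is necessary. The reply suggests it probably is not, but for comparison, here is a base-R sketch of the wide route, assuming a plain data.frame holding the same toy data (`reshape()` is used on a data.frame rather than a data.table here). `reshape()` produces one row per student with year-suffixed columns such as `SCORE.2008` and `SCHOOL_NUMBER.2009`, after which the breakdown is a single `tapply()` call. The object names `long`, `wide`, and `means` are illustrative.

```r
# Toy data from the thread as a plain data.frame
long <- data.frame(
  STUDENT_ID    = rep(1:10, each = 2),
  SCHOOL_NUMBER = c(rep(c(100, 200), 5), rep(c(300, 400), 5)),
  YEAR          = rep(c(2008, 2009), 10),
  SCORE         = c(39, 48, 64, 73, 35, 35, 52, 61, 51, 58,
                    45, 55, 69, 77, 47, 47, 57, 58, 47, 53)
)

# Long -> wide: one row per student; both SCHOOL_NUMBER and SCORE vary
# by year, so each gets a ".2008" and a ".2009" column
wide <- reshape(long,
                idvar     = "STUDENT_ID",
                timevar   = "YEAR",
                v.names   = c("SCHOOL_NUMBER", "SCORE"),
                direction = "wide")

# Mean 2008 score, broken down by the 2009 school
means <- tapply(wide$SCORE.2008, wide$SCHOOL_NUMBER.2009, mean)
print(means)
```

The wide form makes the "break on the 2009 school, average the 2008 score" step a one-liner, at the cost of an explicit reshape; the grouped long-format queries avoid that extra copy of the data.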