[datatable-help] Data table syntax

David Winsemius dwinsemius at comcast.net
Sun Sep 5 22:43:02 CEST 2010


On Sep 5, 2010, at 3:02 PM, Damian Betebenner wrote:

> Hi David,
>
> Thanks for the quick and thoughtful reply.
>
> Sorry for not being clearer.
>
> There are two schools that the students attended in 2009 (200 and  
> 400). I'd like to break on those, and calculate the mean for all the  
> students in those two schools but for their 2008 scores.
>
> Thus, the output would have 2 rows:
>
> YEAR, 2009_SCHOOL_NUMBER, 2008_SCORE_MEAN
>
> 2009  200    54.4
> 2009  400    53

I dont see what sort of linkage you have between the 2008 and 2009  
school-numbers but see if this satisfies:

 > dtxt[YEAR==2008 & STUDENT_ID %in% dtxt[YEAR==2009, STUDENT_ID],  
mean(SCORE), by=SCHOOL_NUMBER]
      SCHOOL_NUMBER   V1
[1,]           100 48.2
[2,]           300 53.0

>
> Thanks for considering this,
>
> Damian
>
>
> Damian Betebenner
> Center for Assessment
> PO Box 351
> Dover, NH   03821-0351
>
> Phone (office): (603) 516-7900
> Phone (cell): (857) 234-2474
> Fax: (603) 516-7910
>
> dbetebenner at nciea.org
> www.nciea.org
>
>
>
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Sunday, September 05, 2010 1:03 PM
> To: David Winsemius
> Cc: Damian Betebenner; datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Data table syntax
>
>
> On Sep 5, 2010, at 12:43 PM, David Winsemius wrote:
>
>>
>> On Sep 5, 2010, at 11:38 AM, Damian Betebenner wrote:
>>
>>> Thanks for the invaluable help on my previous questions. The speed
>>> up in create summary tables has been immense and I'm enthused about
>>> all the possibilities going forward.
>>>
>>> I'm currently stuck in trying to put together syntax for a "long"
>>> for table. In the example below, each case is a unique Student by
>>> Year combination. What I'm trying to do is take
>>> such a table, aggregate on the student's  current year (i.e., 2009
>>> in this data) SCHOOL_NUMBER, and calculate their mean score in the
>>> previous year (i.e., 2008 in this data).
>>>
>>> If the file were "wide", with each case representing a unique
>>> student with separate variables for the year, then it would be easy
>>> to break on the 2009 SCHOOL_NUMBER and take the
>>> mean of the 2008 SCORE.
>
> But there is only one 2008 SCORE for each student???
>
>>>
>>> Is conversion of long to wide necessary to do this?
>>
>> Probably not. Are you familiar with the "ave" function in base R?
>
> I am having some difficulty understanding the structure of the desired
> output. I initially thought it might be something like:
>  rd.txt <-
> function(txt, header=TRUE, ...) {
>      rd <- read.table(textConnection(txt), header=header, ...)
>        closeAllConnections()
>      rd }
> txt <- rd.txt("STUDENT_ID SCHOOL_NUMBER YEAR SCORE
>          1           100 2008    39
>          1           200 2009    48
>          2           100 2008    64
>          2           200 2009    73
>          3           100 2008    35
>          3           200 2009    35
>          4           100 2008    52
>          4           200 2009    61
>          5           100 2008    51
>          5           200 2009    58
>          6           300 2008    45
>          6           400 2009    55
>          7           300 2008    69
>          7           400 2009    77
>          8           300 2008    47
>          8           400 2009    47
>          9           300 2008    57
>          9           400 2009    58
>         10           300 2008    47
>         10           400 2009    53")
> dtxt <- data.table(txt)
>
>> dtxt$avScr <- dtxt[ , ave(SCORE, list(STUDENT_ID))]  # returns a
> vector as long as its input
>> dtxt
>
> But now I am wondering if you wanted:
>
>> dtxt[ , tapply(SCORE, list(STUDENT_ID), mean)]  # returns vector
> only as long as product of category levels.
>    1    2    3    4    5    6    7    8    9   10
> 43.5 68.5 35.0 56.5 54.5 50.0 73.0 47.0 57.5 50.0
>
>>
>>>
>>>
>>>     STUDENT_ID SCHOOL_NUMBER YEAR SCORE
>>> [1,]          1           100 2008    39
>>> [2,]          1           200 2009    48
>>> [3,]          2           100 2008    64
>>> [4,]          2           200 2009    73
>>> [5,]          3           100 2008    35
>>> [6,]          3           200 2009    35
>>> [7,]          4           100 2008    52
>>> [8,]          4           200 2009    61
>>> [9,]          5           100 2008    51
>>> [10,]          5           200 2009    58
>>> [11,]          6           300 2008    45
>>> [12,]          6           400 2009    55
>>> [13,]          7           300 2008    69
>>> [14,]          7           400 2009    77
>>> [15,]          8           300 2008    47
>>> [16,]          8           400 2009    47
>>> [17,]          9           300 2008    57
>>> [18,]          9           400 2009    58
>>> [19,]         10           300 2008    47
>>> [20,]         10           400 2009    53
>>>
>>>
>>> Thanks,
>>>
>>> Damian
>>
>
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT



More information about the datatable-help mailing list