[datatable-help] using paste function while grouping gives strange results

Steve Harman stvharman at gmail.com
Fri May 6 04:54:48 CEST 2011


Hello

I have a data.table called dt (created with the data.table package) in
which each student can have multiple records:

coursecode    student_id
----------    ----------
NA            1
NA            1
NA            1
....          1
....          1
NA            2
101           2
102           2
NA            2
103           2

I am trying to group by student_id and concatenate the coursecode
strings within each student's records. The value is mostly NA, but it
can also be a real course code (because of messy real-life data,
coursecode was not always entered). There are 999999 records.

So I expected results like:

1 NA NA NA ....
2 NA 101 102 NA 103 ....
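A minimal base-R sketch of the intended per-student concatenation, assuming a small hypothetical data frame `dt_like` that mirrors the sample rows above (the elided `....` rows are not reproduced). `tapply()` forwards `collapse = " "` to `paste()`, so each student yields a single string. In data.table the analogous query would presumably be `dt[, paste(coursecode, collapse = " "), by = student_id]`, though that line is not exercised here.

```r
# Hypothetical toy data mirroring the sample rows above
# (the elided "...." rows are not reproduced).
dt_like <- data.frame(
  coursecode = c(NA, NA, NA, NA, "101", "102", NA, "103"),
  student_id = c(1, 1, 1, 2, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Group by student_id and join each group's codes into one string.
# paste() renders NA as the literal string "NA".
codes <- tapply(dt_like$coursecode, dt_like$student_id,
                paste, collapse = " ")
print(codes)
```

The result has one entry per student ("1" and "2") instead of one per record.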

However, as shown below, the result has 999999 rows (one per record
rather than one per student) and the coursecode values are not
concatenated.

> codes <- dt[, paste(coursecode), by = student_id]
> codes
     student_id V1
 [1,]          1 NA
 [2,]          1 NA
 [3,]          1 NA
 [4,]          1 NA
 [5,]          1 NA
 [6,]          1 NA
 [7,]          1 NA
 [8,]          1 NA
 [9,]          1 NA
[10,]          1 NA
First 10 rows of 999999 printed.

If I repeat the same query on a numeric column with an aggregation
function such as sum or mean, the number of rows returned is correct:
it equals the number of students.

I was wondering whether the problem is the NAs or the use of paste as
the aggregation function. As an alternative I could concatenate the
strings in MySQL via RMySQL, but I would prefer to use data.table if
possible.
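The difference between the numeric aggregates and paste can be seen on a single group in base R. This is a sketch using a hypothetical vector `one_student`; it assumes that data.table stacks a multi-element `j` result into one row per element, which is consistent with the 999999-row output above.

```r
# Hypothetical codes for one student, taken from the sample rows above
one_student <- c(NA, "101", "102", NA, "103")

# sum() and mean() reduce a group to a single value ...
n_sum <- length(sum(c(1, 2, 3)))

# ... but paste() with a single argument is vectorised: it returns one
# string per element, so a grouped result keeps one row per record.
n_paste <- length(paste(one_student))

# collapse = " " reduces the group to one string; NAs are not the
# problem, paste() simply renders them as "NA".
joined <- paste(one_student, collapse = " ")

cat(n_sum, n_paste, joined, sep = "\n")
```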

Thanks in advance,

Steve