[datatable-help] data.table - grouping character values

Tue May 10 13:56:36 CEST 2011

Dear Datatable users,

I'm interested in using the data.table package to reduce and group
high-dimensional data (from next-generation sequencing).
I have a data.table which shows, for each read (feature), the
associated annotation.
Because one feature can have several annotations, I would like to
group these annotations per feature.
Numerical operations on the annotations work well (for instance
length()), but functions on string values do not seem to work (for
instance paste()).
In the end, I would like to have something like:

                    reads         annot
[1,]  1279_1000_530_F3-ad Simple_repeat, LINE
[2,]  1279_1000_940_F3-ad         snRNA
...

Do you have any suggestions on how to do that?

Thanks,
Regards,
Nicolas Servant

> dt[1:20, ]

                     reads         annot
 [1,]  1279_1000_530_F3-ad Simple_repeat
 [2,]  1279_1000_530_F3-ad          LINE
 [3,]  1279_1000_940_F3-ad         snRNA
 [4,]  1279_1000_940_F3-ad         snRNA
 [5,] 1279_1018_1051_F3-ad Simple_repeat

> g <- dt[, length(annot), by = reads]
> head(g)

                    reads V1
[1,]  1279_1000_530_F3-ad  2
[2,]  1279_1000_940_F3-ad  2
[3,] 1279_1018_1051_F3-ad  1
[4,]   1279_1019_49_F3-ad 13
[5,]  1279_1019_571_F3-ad 14
[6,]  1279_1024_555_F3-ad  1
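
For reference, a minimal sketch of the grouping described above. It is an assumption, not something confirmed in this thread, that the paste() attempt was made without the collapse argument: in that case paste() returns one string per row instead of a single string per group. The toy table below is copied from the first rows shown above, and the name res is only illustrative. unique() is used because the desired output lists snRNA only once, and as.character() guards against annot being stored as a factor.

```r
library(data.table)

# Toy data copied from the first five rows shown above
dt <- data.table(
  reads = c("1279_1000_530_F3-ad", "1279_1000_530_F3-ad",
            "1279_1000_940_F3-ad", "1279_1000_940_F3-ad",
            "1279_1018_1051_F3-ad"),
  annot = c("Simple_repeat", "LINE", "snRNA", "snRNA", "Simple_repeat")
)

# One row per read: distinct annotations joined into a single string.
# collapse = ", " makes paste() return one string per group;
# as.character() is a precaution in case annot is a factor.
res <- dt[, list(annot = paste(unique(as.character(annot)), collapse = ", ")),
          by = reads]

print(res)
```

Dropping unique() would keep repeated annotations (e.g. "snRNA, snRNA"), which may be preferable if the counts matter.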

-- 
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE

Email: Nicolas.Servant at curie.fr
Tel: 01 56 24 69 85
http://bioinfo.curie.fr/