[datatable-help] data.table - grouping character values

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue May 10 15:16:37 CEST 2011


Hi,

On Tue, May 10, 2011 at 7:56 AM, Nicolas Servant
<Nicolas.Servant at curie.fr> wrote:
> Dear Datatable users,
>
> I'm interested in using the data.table package to reduce and group
> high-dimensional data (from next-generation sequencing).
> I have a data.table that shows, for each read (feature), the
> associated annotation.
> Because one feature can have several annotations, I would like to
> group these annotations per feature.
> Numerical operations on the annotations work well (for instance
> length()), but functions on string values do not seem to work (for
> instance paste()).
> In the end, I would like to have something like:
>
>                    reads         annot
> [1,]  1279_1000_530_F3-ad Simple_repeat, LINE
> [2,]  1279_1000_940_F3-ad         snRNA
> ...
> Do you have any suggestions for doing that?
>
>>dt[1:20,]
>
>                     reads         annot
>  [1,]  1279_1000_530_F3-ad Simple_repeat
>  [2,]  1279_1000_530_F3-ad          LINE
>  [3,]  1279_1000_940_F3-ad         snRNA
>  [4,]  1279_1000_940_F3-ad         snRNA
>  [5,] 1279_1018_1051_F3-ad Simple_repeat

A similar question using paste came up a few days ago ... this should
do the trick:

R> key(dt) <- 'reads'
R> dt[, paste(annot, collapse=','), by=reads]
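
For reference, here is a minimal, self-contained sketch of the same idea. The toy table only copies the five rows shown in the question, so it is an assumption about the real data. Two things go beyond the two lines above: setkey() is used as the by-reference way to set the key, and unique() is my addition so that a read annotated twice with snRNA is reported once, as in the desired output. Naming the result column annot via list() is likewise just a convenience.

```r
library(data.table)

# Toy data copied from the first five rows shown in the question
# (an assumption: the real table is much larger).
dt <- data.table(
  reads = c("1279_1000_530_F3-ad", "1279_1000_530_F3-ad",
            "1279_1000_940_F3-ad", "1279_1000_940_F3-ad",
            "1279_1018_1051_F3-ad"),
  annot = c("Simple_repeat", "LINE", "snRNA", "snRNA", "Simple_repeat")
)

# Key (and sort) the table by read id, by reference.
setkey(dt, reads)

# One row per read: collapse its distinct annotations into one string.
# unique() drops repeated annotations within a read; remove it if
# duplicates should be kept.
res <- dt[, list(annot = paste(unique(annot), collapse = ", ")), by = reads]
print(res)
```

With this toy input, res has one row per read, and the first read gets "Simple_repeat, LINE". Without list(annot = ...), the collapsed column is named V1 by default.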

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

