[datatable-help] Idea/feature request

Andreas Borg andreas.borg at unimedizin-mainz.de
Tue May 10 15:38:22 CEST 2011


Hi all,

I support this proposal (original message below). One more suggestion: 
it might be useful if the proposed ".BY" object had only a single row 
with the current values of the grouping variables, instead of as many 
(duplicate) rows as there are in the group. Whatever computation one 
wants to do with .BY would then need to be executed only once, and the 
result recycled for each row in the group.
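A minimal base-R sketch of the suggested semantics, assuming a plain data.frame rather than a data.table. The helper `group_apply_once` and the `sales`/`lookup` tables are hypothetical, made up for illustration. The helper builds a one-row `.BY` list holding the current key values, calls the user's function once per group, and recycles that single result across the group's rows. (Later data.table releases did add a `.BY` symbol along these lines, a list holding one length-1 value per `by` column, but the code below does not use data.table.)

```r
# Hypothetical helper, not part of data.table or base R: calls `fun` once per
# group of `keys`, passing a one-row `.BY` list of the group's key values and
# the group's rows, then recycles the single result over that group's rows.
group_apply_once <- function(df, keys, fun) {
  # Row indices for each distinct combination of the key columns
  idx <- split(seq_len(nrow(df)), df[keys], drop = TRUE)
  result <- rep(NA, nrow(df))
  for (rows in idx) {
    # .BY: one value per grouping column, taken from the group's first row
    .BY <- as.list(df[rows[1], keys, drop = FALSE])
    # Computed once per group...
    value <- fun(.BY, df[rows, , drop = FALSE])
    # ...and recycled for every row in the group
    result[rows] <- value
  }
  result
}

# Example: one lookup per (colA, colB) group in a second, hypothetical table
sales  <- data.frame(colA = c("x", "x", "y"), colB = c(1, 1, 2),
                     v = c(10, 20, 30), stringsAsFactors = FALSE)
lookup <- data.frame(colA = c("x", "y"), colB = c(1, 2),
                     rate = c(0.5, 2), stringsAsFactors = FALSE)
sales$rate <- group_apply_once(sales, c("colA", "colB"), function(.BY, sd) {
  lookup$rate[lookup$colA == .BY$colA & lookup$colB == .BY$colB]
})
print(sales$rate)  # 0.5 0.5 2.0
```

Here the lookup runs twice (once per distinct key pair) rather than three times (once per row), which is the saving the suggestion is after.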

Anyway, is there any news on this topic?

Andreas


> Hi,
>
> Now that data.table has dropped the "by" columns in .SD (since 1.5.3),
> I find myself wanting a way to figure out what subset I'm working on
> when I'm processing data in groups.
>
> The reason is that I sometimes want to fetch relevant info
> from another data.table that relates to the subset I'm currently
> operating on. I know I can join the two tables together before
> processing the groups, but sometimes I work with large data.tables and
> don't want to merge them all at once and then iterate over the new,
> agglomerated (huge) one.
>
> Would anyone else find it helpful to have data.table inject something
> like a ".by" (or .BY) variable into the scope of the current
> processing group, so we can reference it as we see fit?
>
> For example:
>
> result <- some.big.data.table[, by=list(colA, colB), {
>   ## Sometimes I want to know what the current values of
>   ## colA and colB are in here to get some more info. Maybe
>   ## we can have .BY:
>
>   xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
>   ## ...
> }]
>
> You know? Does anyone else find themselves in this boat too, or is
> there a better way to do what I'm after already?
>
> Thanks,
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>   

-- 
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN
der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Phone +49 (0) 6131 175062
E-Mail: borg at imbei.uni-mainz.de



