[datatable-help] Idea/feature request

Tue May 10 16:29:44 CEST 2011

Hi Andreas,

On Tue, May 10, 2011 at 9:38 AM, Andreas Borg
<andreas.borg at unimedizin-mainz.de> wrote:
> Hi all,
>
> I support this proposal (original message below). One more suggestion on
> this: It might be useful if the proposed ".BY" object would have only a
> single row with the current values of the grouping variables, instead of as
> many (duplicate) rows as there are in the group. Whatever computation one
> wants to do with .BY would then need to be executed only once, and the
> result recycled for each row in the group.
>
> Anyway, is there any news on this topic?

It is on the radar:

https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1313&group_id=240&atid=978

I was planning on letting Matthew crank this one out since it lives in
the C guts of data.table, but maybe I can take a look at it, too.

Although I initially proposed the .BY idea, I think Matthew's follow-up
(I forget the details) might have questioned the reasoning behind using .BY
instead of just injecting the grouping variables directly into the scope.

Now that you've brought this back up, which would you prefer? To take
my (admittedly contrived) original example:

result <- some.big.data.table[, by=list(colA, colB), {
 ## Sometimes I want to know what the current values of
 ## colA and colB are in here to get some more info. Maybe
 ## we can have .BY:

 xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
 ## ...
}]

Should it be `J(.BY[1], .BY[2])`, or do you think something like
`J(colA, colB)` is more natural?

I also agree with you that the .BY values only need to have length 1
(and not, say, the same length as nrow(.SD)).
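To make the length-1 idea concrete, here is a rough sketch that emulates it in base R rather than inside data.table (the proposed .BY doesn't exist yet, so `dt`, `label` and the use of split() are just illustrative stand-ins): each group gets a single row of grouping values, the key-dependent work runs once, and that one result is recycled over the group's rows.

```r
# Hypothetical sketch: emulate a one-row ".BY" per group using base R only.
# 'dt', 'colA', 'colB', 'x' and 'label' are illustrative names, not data.table API.
dt <- data.frame(colA = c("a", "a", "b"), colB = c(1, 1, 2), x = 1:3,
                 stringsAsFactors = FALSE)

per_group <- lapply(
  # Split into one data frame per observed (colA, colB) combination
  split(dt, dt[c("colA", "colB")], drop = TRUE),
  function(sd) {
    # A single row holding the current grouping values, like the proposed .BY
    by <- sd[1, c("colA", "colB")]
    # Work that depends only on the group keys runs once per group ...
    label <- paste(by$colA, by$colB, sep = "-")
    # ... and the single result is recycled across the group's rows
    data.frame(sd, label = rep(label, nrow(sd)), stringsAsFactors = FALSE)
  }
)

result <- do.call(rbind, per_group)
rownames(result) <- NULL
print(result)
```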

Thanks,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact