[datatable-help] Idea/feature request
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue May 10 16:29:44 CEST 2011
Hi Andreas,
On Tue, May 10, 2011 at 9:38 AM, Andreas Borg
<andreas.borg at unimedizin-mainz.de> wrote:
> Hi all,
>
> I support this proposal (original message below). One more suggestion on
> this: It might be useful if the proposed ".BY" object would have only a
> single row with the current values of the grouping variables instead of as
> many (duplicate) rows as the group has. Whatever computation one wants to do
> with .BY would need to be executed only once and the result recycled for
> each row in the group.
>
> Anyway, is there any news on this topic?
It is on the radar:
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1313&group_id=240&atid=978
I was planning on letting Matthew crank this one out since it is in
the C guts of data.table, but maybe I can take a look at it, too.
Although I initially proposed the .BY thing, I think Matthew's
follow-up (I forget the details) questioned the reasoning behind using
.BY instead of just injecting the variables into the scope without .BY.
Now that you've brought this back up, what do you think you would
prefer? For example, using my (admittedly contrived) original example:
result <- some.big.data.table[, by=list(colA, colB), {
  ## Sometimes I want to know what the current values of
  ## colA and colB are in here to get some more info. Maybe
  ## we can have .BY:
  xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
  ## ...
}]
Should it be `J(.BY[1], .BY[2])` or is something like `J(colA, colB)`
more natural, you think?
I think I also agree with you that the length of the BY values only
needs to be 1 (and not, say, the same as what nrow(.SD) would be).
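To make the length-1 point concrete, here is a rough base-R emulation of the semantics I have in mind. No data.table is involved, and `big`, `more` and `by.vals` are made-up names standing in for the real objects, so take it as a sketch of the idea and not as how data.table would implement it:

```r
## Hypothetical data, invented purely for illustration.
big <- data.frame(colA = c("a", "a", "b", "b", "b"),
                  colB = c(1, 1, 2, 2, 2),
                  val  = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)
more <- data.frame(colA = c("a", "b"),
                   colB = c(1, 2),
                   weight = c(0.5, 2),
                   stringsAsFactors = FALSE)

## Split into groups, then give each group a one-row "by" value.
groups <- split(big, list(big$colA, big$colB), drop = TRUE)
result <- lapply(groups, function(sd) {
  ## One row of grouping values, however many rows sd has.
  by.vals <- sd[1, c("colA", "colB")]
  ## The lookup runs once per group, not once per row.
  xref <- merge(by.vals, more)
  data.frame(by.vals, total = sum(sd$val) * xref$weight)
})
result <- do.call(rbind, result)
```

Here `by.vals` plays the role of the proposed .BY: the cross-reference lookup happens once per group and its result is reused for the whole group.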
Thanks,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact