[datatable-help] Idea/feature request
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Mar 1 20:41:52 CET 2011
Hi,
Now that data.table has dropped the "by" columns in .SD (since 1.5.3),
I find myself wanting a way to figure out what subset I'm working on
when I'm processing data in groups.
The reason is that I want to (at times) fetch/retrieve relevant info
from another data.table that relates to the current subset I'm
operating on. I know I can join the two tables together first before
processing the groups, but sometimes I work with large data.tables and
don't want to merge them at once and then iterate over the new
agglomerated/huge one.
Would anyone else find it helpful to have data.table inject something
like a ".by" (or .BY) variable into the scope of the current
processing group, so we can reference it as we see fit?
For example:
result <- some.big.data.table[, by=list(colA, colB), {
  ## Sometimes I want to know what the current values of
  ## colA and colB are in here to get some more info. Maybe
  ## we can have .BY:
  xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
  ## ...
}]
You know? Does anyone else find themselves in this boat too, or is
there a better way to do what I'm after already?
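To make the idea concrete, here is a minimal, self-contained sketch of the per-group lookup described above. It assumes a data.table release that exposes the current grouping values as a list named .BY (that variable is not available in 1.5.3). The tables `big` and `more.data`, and their columns, are made up for illustration only.

```r
library(data.table)

## Hypothetical example data; every name here is illustrative.
big <- data.table(colA  = c("a", "a", "b", "b"),
                  colB  = c(1L, 1L, 2L, 2L),
                  value = c(10, 20, 30, 40))
more.data <- data.table(colA   = c("a", "b"),
                        colB   = c(1L, 2L),
                        weight = c(0.5, 2))
setkey(more.data, colA, colB)

result <- big[, {
  ## Assumption: .BY is a list with one length-1 element per 'by'
  ## column, holding the values that identify the current group.
  xref <- more.data[J(.BY[[1]], .BY[[2]])]  ## keyed lookup for this group only
  list(total = sum(value) * xref$weight)
}, by = list(colA, colB)]
```

The point of the sketch is that the lookup happens once per group against the small keyed table, so the two tables never have to be merged up front.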
Thanks,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact