[datatable-help] Idea/feature request

Tue Mar 1 20:41:52 CET 2011

Hi,

Now that data.table has dropped the "by" columns in .SD (since 1.5.3),
I find myself wanting a way to figure out which subset I'm working on
when I'm processing data in groups.

The reason is that I sometimes want to fetch relevant info from
another data.table that relates to the current subset I'm operating
on. I know I can join the two tables together first before processing
the groups, but sometimes I work with large data.tables and don't want
to merge them all at once and then iterate over the resulting huge
table.

Would anyone else find it helpful to have data.table inject something
like a ".by" (or .BY) variable into the scope of the current
processing group so we can reference it as we see fit?

For example:

result <- some.big.data.table[, by=list(colA, colB), {
  ## Sometimes I want to know what the current values of
  ## colA and colB are in here to get some more info. Maybe
  ## we can have .BY:

  xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
  ## ...
}]

You know? Does anyone else find themselves in this boat too, or is
there a better way to do what I'm after already?

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact