[datatable-help] Idea/feature request

Matthew Dowle mdowle at mdowle.plus.com
Wed Mar 2 10:22:50 CET 2011


Yes, good idea. I thought the grouping columns would already be in scope
actually, that was an oversight. Intention was just to change .SD. The
first item of each group needs to be installed in scope then. If you
feel like looking: internally .SD points to itself so you can use
variables alongside .SD. It's the upper level that needs a small loop to
set them. I didn't like the .BY at first glance, but thinking about it
maybe that could be useful too in cases where you write generic code that
works when the by column names vary. You could have .BY pointing to
itself, too. .BY would contain a set of 1-length vectors. There's a FAQ
that answers why j needs grp[1] rather than grp; with this change that
would be better, and the FAQ would no longer be needed, as grp would be
length 1 anyway. Sorry, I realise that's all English and no R examples;
will add later if needed.
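
A rough sketch of the idea, in plain base R rather than data.table's
actual internals: for each group, the first item of every grouping column
is put in scope under its own name and also collected into .BY, next to
.SD. The helper eval_by_group and the sales/lookup tables are made up
purely for illustration.

```r
## Hypothetical helper (an assumption, not data.table code): evaluate `j`
## once per group with .SD, .BY and the grouping values in scope.
eval_by_group <- function(df, by, j) {
  j <- substitute(j)            # capture the expression unevaluated
  caller <- parent.frame()      # where other tables (e.g. lookup) live
  groups <- split(df, df[by], drop = TRUE)
  lapply(groups, function(g) {
    ## .BY: a named list of length-1 vectors, one per grouping column
    by_vals <- lapply(g[by], function(col) col[1])
    ## .SD: the rows of this group without the grouping columns
    sd <- g[setdiff(names(g), by)]
    scope <- c(list(.BY = by_vals, .SD = sd), by_vals, as.list(sd))
    eval(j, scope, caller)
  })
}

sales <- data.frame(region = c("east", "east", "west"),
                    amount = c(10, 20, 5),
                    stringsAsFactors = FALSE)
lookup <- data.frame(region = c("east", "west"),
                     rate   = c(0.1, 0.2),
                     stringsAsFactors = FALSE)

## Use .BY to fetch the matching row from another table, per group
res <- eval_by_group(sales, "region", {
  rate <- lookup$rate[lookup$region == .BY$region]
  sum(amount) * rate
})
```

Because each element of .BY has length 1, the lookup needs no [1]
subscripting, and generic code can use .BY[[1]] without knowing the by
column names.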
Matthew

On Tue, 2011-03-01 at 14:41 -0500, Steve Lianoglou wrote:
> Hi,
> 
> Now that data.table has dropped the "by" columns in .SD (since 1.5.3),
> I find myself wanting a way to figure out what subset I'm working on
> when I'm processing data in groups.
> 
> The reason is that I want to (at times) fetch/retrieve relevant info
> from another data.table that relates to the current subset I'm
> operating on. I know I can join the two tables together first before
> processing the groups, but sometimes I work with large data.tables and
> don't want to merge them at once and then iterate over the new
> agglomerated/huge one.
> 
> Would anyone else find it helpful to have data.table inject something
> like a ".by" (or .BY) variable into the scope of the current
> processing group so we can reference it as we see fit?
> 
> For example:
> 
> result <- some.big.data.table[, by=list(colA, colB), {
>   ## Sometimes I want to know what the current values of
>   ## colA and colB are in here to get some more info. Maybe
>   ## we can have .BY:
> 
>   xref <- more.data[J(.BY[1], .BY[2]), mult='all'] ## or something
>   ## ...
> }]
> 
> You know? Does anyone else find themselves in this boat too, or is
> there a better way to do what I'm after already?
> 
> Thanks,
> -steve
> 



