[datatable-help] Feature Idea

Mon Jul 11 21:42:06 CEST 2011

I've found that "by" does not need a key.  For example,

> temp <- data.table(Index1=1:4,Index2=c(4,2,2,1),Values=c(10,10,10,30))  # no key set here!
> temp[,sum(Values),by=Index2,bysameorder=TRUE]
     Index2 V1
[1,]      4 10
[2,]      2 20
[3,]      1 30
> temp[,sum(Values),by=Index2,bysameorder=FALSE]
     Index2 V1
[1,]      1 30
[2,]      2 20
[3,]      4 10

Nevertheless, "bysameorder" changes the initial ordering.

But, more generally, is there a way to attach a key "on the fly"?

Suppose I wanted to extract all table values where Index2 is equal to 1.  Is there a better way to do this than:
> setkey(temp,"Index2")
> temp[J(1),]
     Index2 Index1 Values
[1,]      1      4     30

Thanks,
Alex
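
For readers of the archive, here is a minimal sketch of two ways to pull the rows where Index2 equals 1 without calling setkey() first. It assumes a data.table release recent enough to support the `on=` argument, which was added after this thread was written. The names `scan_result` and `join_result` are illustrative only.

```r
library(data.table)

# Same table as in the message above; no key is set.
temp <- data.table(Index1 = 1:4, Index2 = c(4, 2, 2, 1), Values = c(10, 10, 10, 30))

# Option 1: plain logical subset (a vector scan); no key required.
scan_result <- temp[Index2 == 1]

# Option 2: ad hoc join on a named column via `on=` (assumes a release
# that has this argument); again no setkey() call.
join_result <- temp[.(1), on = "Index2"]

# Neither call sets a key on `temp` or reorders its rows.
print(scan_result)
print(join_result)
```

The vector scan is the simplest choice for a one-off lookup; setting a key is still likely to pay off when the same column is queried repeatedly on a large table.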

-----Original Message-----
From: datatable-help-bounces at r-forge.wu-wien.ac.at [mailto:datatable-help-bounces at r-forge.wu-wien.ac.at] On Behalf Of Matthew Dowle
Sent: Saturday, July 09, 2011 3:54 AM
To: Steve Lianoglou
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Feature Idea

(I think) it already does that. It's just that it sets a key on the result by default (which does the re-ordering of the grouped results at the end). If that's true, then we could provide a way to not call setkey at the end. There is also the 'bysameorder' argument which might already be doing something similar.

Matthew 

On Fri, 2011-07-08 at 14:29 -0400, Steve Lianoglou wrote:
> Hi,
> 
> I find myself often wanting to use a data.table for its quick 
> aggregate&summary mojo, but I want to keep the ordering of my data as 
> I have it, and not as it would be if I set the appropriate keys for my 
> aggregation/summary.
> 
> How would you folks feel if I add a `by` (or dt.by) method for a data.table, eg:
> 
> result <- by(some.data.table, would.be.keys, { ... }, ...)
> 
> Which does the aggregate/summary encoded within { ... }, but the 
> result is returned in the same order as `some.data.table` was in when 
> it was passed into the function -- if { ... } returned as many rows as 
> were in the original data.table, then it's 1-for-1, but if you are 
> summarizing groups of rows, the summary would be in the same
> (appearance) order as it is in `some.data.table`.
> 
> The { ... } block would essentially be anything you can put in the `j` 
> part of a data.table[i, j, ...].
> 
> The `...` dots after { ... } may be extra params that can get passed 
> into a "normal" data.table[i,j,...] call (haven't thought about that 
> yet, tho).
> 
> If I can get some consensus on whether or not it's worthwhile to put 
> such a function into the data.table package, I'll go ahead and add an 
> initial implementation -- otherwise I can just keep it in my personal 
> utility belt whenever I need to use it.
> 
> Thanks,
> -steve
> 
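
The behaviour requested above can be sketched with grouping as it works in later data.table releases. This is an illustration, not the implementation discussed in the thread. It assumes a release where `by=` returns groups in order of first appearance, `keyby=` sorts them and sets a key, and `:=` can add a per-group value to every row. The names `dt`, `appearance`, `sorted`, `total` and `group_total` are made up for the example.

```r
library(data.table)

# Unkeyed table; the groups first appear in the order 4, 2, 1.
dt <- data.table(Index1 = 1:4, Index2 = c(4, 2, 2, 1), Values = c(10, 10, 10, 30))

# Summary case: one row per group, in order of first appearance.
appearance <- dt[, .(total = sum(Values)), by = Index2]

# Same summary, but sorted by the grouping column (and keyed on it).
sorted <- dt[, .(total = sum(Values)), keyby = Index2]

# One-for-one case: add the group total to every row by reference;
# the rows of `dt` stay in their original order.
dt[, group_total := sum(Values), by = Index2]

print(appearance)
print(sorted)
print(dt)
```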

_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help