[datatable-help] columns in .SD with grouping ad-hoc using "by"

Matthew Dowle mdowle at mdowle.plus.com
Sun May 12 12:58:08 CEST 2013


 

On 12.05.2013 09:12, Arunkumar Srinivasan wrote: 

> Hi,
> Suppose you've a data.table, say:
> require(data.table)
> DT <- data.table(x = 1:5, y = 6:10)
>
> Suppose you want to group by "x %/% 2" (= 0, 1, 1, 2, 2) and then
> calculate the sum of each column for each group, then one would do:
> DT[, grp := x %/% 2]
> DT[, list(x.sum=sum(x), y.sum=sum(y)), by = grp] # avoid .SD in case of few columns

I know this isn't the main point (keep scrolling down), but just as an
aside:
DT[, lapply(.SD, sum), by = grp, .SDcols=c("x","y")] # intended way to avoid .SD in case of a few columns
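
For anyone following along, here is a small self-contained sketch of the
aside above, assuming the same DT as in the quoted message. The names
`explicit` and `via_sd` are only illustrative and do not come from the
thread.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)
DT[, grp := x %/% 2]   # helper grouping column: 0, 1, 1, 2, 2

# Spell out each aggregate by hand (fine for a couple of columns).
explicit <- DT[, list(x = sum(x), y = sum(y)), by = grp]

# Let .SD carry only the columns named in .SDcols.
via_sd <- DT[, lapply(.SD, sum), by = grp, .SDcols = c("x", "y")]

# Both forms should hold the same per-group sums.
print(all.equal(explicit, via_sd))
```

The .SDcols form scales better as the number of columns grows, since
only the vector of column names has to change.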

> Now, assume that you've many many columns which would make the use of
> `.SD` sensible.
> DT[, lapply(.SD, sum), by = grp]
>
>    grp x  y
> 1:   0 1  6
> 2:   1 5 15
> 3:   2 9 19
>
> The issue is that if you create the grouping column ad-hoc, then the
> column from which the ad-hoc grouping column is derived is not
> available to .SD. Let me illustrate this:
>
> DT <- data.table(x = 1:5, y = 6:10)
> DT[, lapply(.SD, sum), by = (grp=x %/% 2)] # ad-hoc creation of grouping column
>
>    grp  y
> 1:   0  6
> 2:   1 15
> 3:   2 19
>
> I think it'd be nice to have the column available to `.SD` so that we
> can save creating a temporary column, grouping and then deleting it,
> as "technically" it *is* a new column (meaning, "x" must still be
> available). Any take on this?

.BY is already available to j for that reason; does that work? .BY
isn't a column of .SD because i) it's the same value for every row of
.SD, i.e. .BY[[1]] is length 1 and contains the value for this
particular group (replicating the same value would be wasteful), but
more significantly ii) it is often a character group name, where running
an aggregation function like sum() would trip up on it.
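
To make point i) concrete, here is a minimal sketch of what .BY looks
like inside j. It assumes the ad-hoc grouping is written as
by = list(grp = x %/% 2), and the result column names (`by_len`,
`n_rows`, `y.sum`) are made up for illustration.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

res <- DT[, list(
  by_len = length(.BY[[1]]),  # .BY holds one value per group, not one per row
  n_rows = .N,                # number of rows in this group
  y.sum  = sum(y)
), by = list(grp = x %/% 2)]

print(res)
```

Here `by_len` should be 1 for every group even where `n_rows` is 2,
which is the "length 1" behaviour described above.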

> Arun