[datatable-help] columns in .SD with grouping ad-hoc using "by"
Arunkumar Srinivasan
aragorn168b at gmail.com
Sun May 12 10:12:17 CEST 2013
Hi,
Suppose you have a data.table, say:
require(data.table)
DT <- data.table(x = 1:5, y = 6:10)
Suppose you want to group by "x %/% 2" (= 0, 1, 1, 2, 2) and then calculate the sum of each column for each group. One would do:
DT[, grp := x %/% 2]
DT[, list(x.sum=sum(x), y.sum=sum(y)), by = grp] # avoid .SD in case of few columns
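As a quick check of the grouping key used above, integer division by 2 maps x = 1..5 onto three groups. A minimal base-R sketch (the name `grp_values` is only illustrative and not part of the example):

```r
x <- 1:5

# %/% is integer division: 1 %/% 2 = 0, 2 %/% 2 = 1, 3 %/% 2 = 1, ...
grp_values <- x %/% 2
print(grp_values)

# Number of rows that fall into each group
print(table(grp_values))
```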
Now, assume that you have many, many columns, which would make the use of `.SD` sensible:
DT[, lapply(.SD, sum), by = grp]
   grp x  y
1:   0 1  6
2:   1 5 15
3:   2 9 19
The issue is that if you create the grouping column ad-hoc, then the column from which the ad-hoc grouping column is derived is not available to .SD. Let me illustrate this:
DT <- data.table(x = 1:5, y = 6:10)
DT[, lapply(.SD, sum), by = (grp=x %/% 2)] # ad-hoc creation of grouping column
   grp  y
1:   0  6
2:   1 15
3:   2 19
I think it would be nice to have that column available to `.SD`, so that we can avoid creating a temporary column, grouping by it and then deleting it. After all, "technically" the grouping column *is* a new column, which means "x" should still be available. Any take on this?
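For reference, the temporary-column route mentioned above (create, group, delete) can be sketched as follows. This is only a minimal sketch on the same toy data; the name `res` is illustrative.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

# 1. Create the grouping column explicitly, so "x" stays an ordinary column.
DT[, grp := x %/% 2]

# 2. Sum every non-grouping column per group via .SD ("x" is included here).
res <- DT[, lapply(.SD, sum), by = grp]

# 3. Drop the temporary grouping column from DT again.
DT[, grp := NULL]

print(res)
```

After step 3, DT is back to its original two columns, while `res` holds the per-group sums of both "x" and "y".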
Arun