[datatable-help] columns in .SD with grouping ad-hoc using "by"
Matthew Dowle
mdowle at mdowle.plus.com
Sun May 12 12:58:08 CEST 2013
On 12.05.2013 09:12, Arunkumar Srinivasan wrote:
> Hi,
> Suppose you've a data.table, say:
> require(data.table)
> DT <- data.table(x = 1:5, y = 6:10)
>
> Suppose you want to group by "x %/% 2" ( = 0, 1,1, 2,2) and then
> calculate the sum of each column for each group, then one would do:
> DT[, grp := x %/% 2]
> DT[, list(x.sum=sum(x), y.sum=sum(y)), by = grp] # avoid .SD in case of few columns
I know this isn't the main point (keep scrolling down) but just as an
aside:
DT[, lapply(.SD, sum), by = grp, .SDcols=c("x","y")] # intended way to avoid .SD in case of a few columns
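As a rough illustration of the aside above, the following sketch runs the explicit list(...) form and the .SDcols form on the same five-row table so the two results can be compared. The column names x.sum and y.sum come from the original post; the variable names explicit and via_sd are introduced here only for illustration.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)
DT[, grp := x %/% 2]   # grouping column: 0, 1, 1, 2, 2

# Explicit aggregation: every column is named by hand
explicit <- DT[, list(x.sum = sum(x), y.sum = sum(y)), by = grp]

# .SDcols restricts .SD to the listed columns, so lapply() only sees x and y
via_sd <- DT[, lapply(.SD, sum), by = grp, .SDcols = c("x", "y")]

print(explicit)
print(via_sd)
```

The two results should differ only in their column names (x.sum/y.sum versus x/y).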
> Now, assume that you've many many columns which would make the use of
> `.SD` sensible.
> DT[, lapply(.SD, sum), by = grp]
>
>    grp x  y
> 1:   0 1  6
> 2:   1 5 15
> 3:   2 9 19
>
> The issue is that if you create the grouping column ad-hoc, then the
> column from which the ad-hoc grouping column is derived is not
> available to .SD. Let me illustrate this:
>
> DT <- data.table(x = 1:5, y = 6:10)
> DT[, lapply(.SD, sum), by = (grp=x %/% 2)] # ad-hoc creation of grouping column
>
>    grp  y
> 1:   0  6
> 2:   1 15
> 3:   2 19
> I think it'd be nice to have the column available to `.SD` so that we
> can save creating a temporary column, grouping and then deleting it,
> as "technically" it *is* a new column (meaning, "x" must still be
> available). Any take on this?
.BY is available to j already for that reason, does that work? .BY
isn't a column of .SD because i) it's the same value for every row of
.SD, i.e. .BY[[1]] is length 1 and contains this particular group
(replicating the same value would be wasteful), but more significantly
ii) it is often a character group name where running an aggregation
function like sum() would trip up on it.
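A minimal sketch of the points above, assuming a reasonably current data.table release. It checks that .BY[[1]] has length 1 within each group, and shows that listing the source column in .SDcols appears to be one way to keep "x" in .SD when the grouping is created ad-hoc. The names by_info and keep_x are illustrative only, and by = list(grp = ...) is used here instead of the parenthesised form from the quoted post.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

# .BY is a list holding the current group's value(s); each element is length 1
by_info <- DT[, list(n = .N, by.len = length(.BY[[1]])),
              by = list(grp = x %/% 2)]

# Naming x in .SDcols keeps it in .SD even though the grouping is derived from x
keep_x <- DT[, lapply(.SD, sum), by = list(grp = x %/% 2),
             .SDcols = c("x", "y")]

print(by_info)
print(keep_x)
```

With this approach no temporary grouping column needs to be added to DT and deleted afterwards.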
> Arun