<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body>
<p>On 12.05.2013 09:12, Arunkumar Srinivasan wrote:</p>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<div>Hi, </div>
<div>Suppose you have a data.table, say:</div>
<div>require(data.table)</div>
<div>DT &lt;- data.table(x = 1:5, y = 6:10)</div>
<div>
<div>Suppose you want to group by "x %/% 2" (= 0, 1, 1, 2, 2) and then calculate the sum of each column within each group. One would do:</div>
<div>DT[, grp := x %/% 2]</div>
<div>DT[, list(x.sum=sum(x), y.sum=sum(y)), by = grp] # avoid .SD in case of few columns</div>
</div>
</blockquote>
<div>I know this isn't the main point (keep scrolling down), but just as an aside:</div>
<div>DT[, lapply(.SD, sum), by = grp, .SDcols=c("x","y")] # the intended way to limit .SD to a few columns</div>
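<div>To make the aside concrete, here is a minimal runnable sketch of .SDcols. The extra character column "z" is hypothetical (it is not in the original example) and is only there to show that .SDcols keeps sum() away from columns it cannot handle:</div>

```r
library(data.table)

# Same data as in the quoted example, plus a hypothetical character column z
DT <- data.table(x = 1:5, y = 6:10, z = letters[1:5])

# Explicit grouping column, as in the quoted message
DT[, grp := x %/% 2]

# .SD is restricted to x and y, so sum() is never applied to z
res <- DT[, lapply(.SD, sum), by = grp, .SDcols = c("x", "y")]
print(res)
```

<div>Without .SDcols, the character column z would be part of .SD and sum() would fail on it.</div>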
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<div>
<div>Now, assume that you have many columns, which would make the use of `.SD` sensible.</div>
<div>DT[, lapply(.SD, sum), by = grp]</div>
<div>
<div> grp x y</div>
<div>1: 0 1 6</div>
<div>2: 1 5 15</div>
<div>3: 2 9 19</div>
</div>
<div>The issue is that if you create the grouping column ad-hoc, then the column from which the ad-hoc grouping column is derived is not available to .SD. Let me illustrate this:</div>
<div>
<div>DT &lt;- data.table(x = 1:5, y = 6:10)</div>
</div>
<div>DT[, lapply(.SD, sum), by = (grp=x %/% 2)] # ad-hoc creation of grouping column</div>
<div>
<div>
<div> grp y</div>
<div>1: 0 6</div>
<div>2: 1 15</div>
<div>3: 2 19</div>
</div>
</div>
<div>I think it'd be nice to have the column available to `.SD`, so that we can avoid creating a temporary column, grouping by it and then deleting it. "Technically" the grouping column *is* a new column (meaning "x" should still be available). Any take on this?</div>
</div>
</blockquote>
<div>
<div>.BY is already available to j for that reason; does that work? .BY isn't a column of .SD because i) it has the same value for every row of .SD, i.e. .BY[[1]] is length 1 and contains the value of this particular group (replicating that value would be wasteful), but more significantly ii) it is often a character group name, which an aggregation function like sum() would trip up on.</div>
</div>
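<div>A minimal sketch of what I mean, using the same DT as above. The output column names (grp.value, n, y.sum) are made up for illustration:</div>

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

# .BY is a list with one element per grouping variable; each element is
# length 1 and holds the value of the current group
res <- DT[, list(grp.value = .BY[[1]], n = .N, y.sum = sum(y)),
          by = list(grp = x %/% 2)]
print(res)

# Point ii): an aggregation such as sum() fails on a character group name
is_err <- inherits(tryCatch(sum("a"), error = function(e) e), "error")
print(is_err)
```

<div>Here grp.value simply repeats grp once per group, which is why carrying it as a full-length column inside .SD would add nothing.</div>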
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<div>
<div>Arun</div>
</div>
</blockquote>
</body></html>