<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">

<html><body>

<p>On 12.05.2013 12:54, Arunkumar Srinivasan wrote:</p>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>I just realised that I sent it only to Matthew Dowle. So, sending it again. Sorry @Matthew for the double email.</div>

<div>

<div>Matthew,</div>

<div>>> .BY is available to j already for that reason, does that work? .BY isn't a column of .SD because i) it's the same value for every row of .SD i.e. .BY[[1]] is length 1 and contains this particular group (replicating the same value would be wasteful)</div>

<div>DT[, print(.BY), by = list(grp = x %/% 2)]</div>

<div>

<div>$grp</div>

<div>[1] 0</div>

<div>$grp</div>

<div>[1] 1</div>

<div>$grp</div>

<div>[1] 2</div>

</div>

<div>

<div>DT[, print(.SD), by = list(grp = x %/% 2)] # no column "x"</div>

<div>

<div>   y</div>

<div>1: 6</div>

<div>   y</div>

<div>1: 7</div>

<div>2: 8</div>

<div>    y</div>

<div>1:  9</div>

<div>2: 10</div>

</div>

<div>My question is not why the BY column is unavailable in .SD. Rather, since .BY does not contain column "x" (it contains the result of x %/% 2), why does .SD not have "x"? It's as if grp = x %/% 2 is a "new column", so my point is that "x" should be available to .SD.</div>

</div>

</div>

</blockquote>

<div>Oh, I see now. Yes, data.table inspects the expressions used in 'by', treats any columns used there as grouping columns, and excludes them from .SD. An example is a date column containing daily observations: DT[, lapply(.SD,sum), by=month(date)] would not wish to sum() the "date" column.</div>
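<div>A minimal sketch of the month(date) case, for illustration only. The table and column names (daily, value, mon) are hypothetical and not from this thread, and it assumes a data.table version that provides as.IDate() and month():</div>

<div>

<pre>library(data.table)

# Hypothetical daily observations (made up for this sketch)
daily = data.table(
  date  = as.IDate(c("2013-01-05", "2013-01-20", "2013-02-03")),
  value = c(1, 2, 4)
)

# "date" is used inside by, so it is left out of .SD and only "value" is summed
res = daily[, lapply(.SD, sum), by = list(mon = month(date))]
print(names(res))   # "mon" "value"
print(res$value)    # 3 4</pre>

</div>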

<div>In ?data.table I've just changed :</div>

<div><code>.SD</code><span> is a </span><code>data.table</code><span> containing the </span><strong>S</strong><span>ubset of </span><code>x</code><span>'s </span><strong>D</strong><span>ata for each group, excluding the group column(s).</span></div>

<div>to</div>

<div><code>.SD</code><span> is a </span><code>data.table</code><span> containing the </span><strong>S</strong><span>ubset of </span><code>x</code><span>'s </span><strong>D</strong><span>ata for each group, excluding any columns used in 'by' (or 'keyby').</span></div>

<div>Further answer below ...</div>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<div>

<div>>> but more significantly  ii) it is often a character group name where running an aggregation function like sum() would trip up on it.</div>

<div>Again, I don't think so, because I am not asking for .BY columns to be in .SD.</div>

<div>DT[, grp := x %/% 2]</div>

<div>DT[, lapply(.SD, sum), by=grp]</div>

<div>must be equal to:</div>

<div>DT[, lapply(.SD, sum), by = list(grp = x%/%2)] # here, "x" should be available to .SD as it's not the grouping column</div>

</div>

</div>

</blockquote>

<div>This makes sense in this case because x can be sum()-ed, but it isn't true in general, as in the month(date) case above.</div>

<div>In these cases you can use .SDcols to include all columns, even the ones used in 'by':</div>

<div>

<pre>> DT[, lapply(.SD, sum), by=list(grp=x%/%2)]<br />   grp  y<br />1:   0  6<br />2:   1 15<br />3:   2 19<br />> DT[, lapply(.SD, sum), by=list(grp=x%/%2), .SDcols=names(DT)]<br />   grp x  y<br />1:   0 1  6<br />2:   1 5 15<br />3:   2 9 19<br />> DT[, print(.SD), by = list(grp = x %/% 2), .SDcols=names(DT)]</pre>

</div>

<div>

<pre>   x y<br />1: 1 6<br />   x y<br />1: 2 7<br />2: 3 8<br />   x  y<br />1: 4  9<br />2: 5 10</pre>

</div>
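<div>As a rough check of the grp := x %/% 2 comparison raised earlier, the sketch below groups once on the expression (with .SDcols) and once on a precomputed grp column, then compares the two results. The contents of DT (x = 1:5, y = 6:10) are an assumption inferred from the printed output above; the original post does not show how DT was built. The names by_expr, by_col and DT2 are likewise only illustrative:</div>

<div>

<pre>library(data.table)

# Assumed data, reconstructed from the printed groups above
DT = data.table(x = 1:5, y = 6:10)

# Group on the expression, asking explicitly for x and y in .SD
by_expr = DT[, lapply(.SD, sum), by = list(grp = x %/% 2), .SDcols = c("x", "y")]

# Group on a real column instead; work on a copy so DT itself is unchanged
DT2 = copy(DT)
DT2[, grp := x %/% 2]
by_col = DT2[, lapply(.SD, sum), by = grp]

print(all.equal(by_expr, by_col))   # expected TRUE</pre>

</div>

<div>Listing the columns in .SDcols by name, rather than using names(DT), keeps a non-summable column out of the sum if one is added later.</div>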

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<div></div>

<div>Arun</div>

</div>

<div></div>

</blockquote>


</body></html>