<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><p>Hi everybody,</p>
<p><a href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2722&group_id=240&atid=978">FR #2722</a> has now been implemented and committed. It'd be great if people who use the devel version could test it and let us know whether everything works as expected. </p>
<p><strong>Here's an explanation of what the FR is and what's being optimised:</strong> <br>
Given a <code>data.table</code> with the 4 columns <code>x, y, z, grp</code>, a query like: </p>
<pre><code>DT[, c(sum(y), lapply(.SD, sum), .N, .I, lapply(.SD, mean)), by=grp]
</code></pre>
<p>is usually quite slow, because it relies on <a href="http://stackoverflow.com/questions/20459519/apply-function-on-a-subset-of-columns-sdcols-whilst-applying-a-different-func/20460441#comment31613362_20460441"><code>eval</code> with <code>lapply</code></a> for every group. This will now be optimised to:</p>
<pre><code>DT[, list(sum(y), sum(x), sum(y), sum(z), .N, .I, mean(x), mean(y), mean(z)), by=grp]
</code></pre>
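<p>The idea behind this rewrite can be sketched on the unevaluated <code>j</code> expression. This is only an illustration under assumptions, not data.table's actual internal code: <code>expand_lapply_sd()</code> is a hypothetical helper, and it assumes the <code>.SD</code> column names are already known and passed in as <code>sdcols</code>. Each <code>lapply(.SD, fun, ...)</code> argument of <code>c(...)</code> is replaced by one <code>fun(col, ...)</code> call per column, and all other arguments are kept as they are:</p>
<pre><code># Hypothetical helper (not data.table's internal implementation): rewrite
# c(..., lapply(.SD, fun, ...), ...) into list(...) with one call per .SD column.
expand_lapply_sd <- function(jsub, sdcols) {
  stopifnot(is.call(jsub), identical(jsub[[1L]], as.name("c")))
  args <- as.list(jsub)[-1L]
  arg_names <- names(args)
  if (is.null(arg_names)) arg_names <- rep("", length(args))
  out <- list()
  out_names <- character(0L)
  for (i in seq_along(args)) {
    a <- args[[i]]
    is_lapply_sd <- is.call(a) && identical(a[[1L]], as.name("lapply")) &&
      length(a) >= 3L && identical(a[[2L]], as.name(".SD"))
    if (is_lapply_sd) {
      fun <- a[[3L]]
      extra <- as.list(a)[-(1:3)]   # extra arguments, e.g. na.rm = TRUE, go to every call
      for (col in sdcols) {
        out[[length(out) + 1L]] <- as.call(c(list(fun, as.name(col)), extra))
        out_names <- c(out_names, col)
      }
    } else {
      out[[length(out) + 1L]] <- a  # sum(y), .N, etc. are kept unchanged
      out_names <- c(out_names, arg_names[i])
    }
  }
  names(out) <- out_names
  as.call(c(list(as.name("list")), out))
}

j <- quote(c(sum(y), lapply(.SD, sum), .N, lapply(.SD, mean)))
expand_lapply_sd(j, sdcols = c("x", "y", "z"))
# returns the call:
# list(sum(y), x = sum(x), y = sum(y), z = sum(z), .N, x = mean(x), y = mean(y), z = mean(z))
</code></pre>
<p>Naming the expanded calls after their columns (<code>x = sum(x)</code> and so on) is a choice made in this sketch, so that the result keeps the column names <code>lapply(.SD, fun)</code> would have given.</p>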
<p>However, we don't optimise when <code>.SD</code> appears inside <code>c(...)</code> in <code>j</code> in any form other than <code>lapply(.SD, fun)</code>, because <code>.SD</code> can be used in quite a few different ways:</p>
<pre><code>DT[, c(.SD, .SD[1], .SD+a, .SD[x>1], .SD[J(.)], .SD[.(.)], lapply(.SD, sum)), by=grp]
</code></pre>
<p>Also, consider the case <code>.SD[sample(.N, 1)]</code>: it obviously can't be optimised to <code>list(x=x[sample(.)], y=y[sample(.)], z=z[sample(.)])</code>, because each <code>sample()</code> call would draw its own index, and the columns would no longer come from the same row. So the expression inside <code>.SD[...]</code> would first have to be evaluated, its type checked (<code>logical</code>, <code>numeric</code>, <code>integer</code>, <code>data.table</code>?), and only then optimised accordingly. </p>
<p>Therefore, this is postponed, and it may not be possible to do cleanly at all. However, we haven't come across such a case here on the mailing list or on SO yet, so I'm assuming it's very rare, which is good. </p>
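<p>To make the <code>.SD[sample(.N, 1)]</code> case concrete, here is a small sketch on a hypothetical toy table in which <code>y == x + 10</code> and <code>z == x + 20</code> on every row. <code>.SD[sample(.N, 1L)]</code> picks one row per group, so that relationship always survives; the naive column-wise expansion calls <code>sample()</code> once per column, so <code>x</code>, <code>y</code> and <code>z</code> can come from different rows:</p>
<pre><code>library(data.table)
set.seed(42L)
# Hypothetical toy table: on every row, y == x + 10 and z == x + 20.
DT <- data.table(grp = rep(1:2, each = 5L), x = 1:10, y = 11:20, z = 21:30)

# Correct semantics: one row index per group, so x, y and z come from the same row.
one_row <- DT[, .SD[sample(.N, 1L)], by = grp]

# Naive expansion (what the optimisation must not do): each sample() call
# draws its own index, independently for every column.
naive <- DT[, list(x = x[sample(.N, 1L)], y = y[sample(.N, 1L)],
                   z = z[sample(.N, 1L)]), by = grp]

with(one_row, all(y == x + 10L & z == x + 20L))  # always TRUE
with(naive,   all(y == x + 10L & z == x + 20L))  # TRUE only by chance
</code></pre>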
<p><em>Summary:</em> The most common cases should now be very fast. Here's a benchmark comparing the timings with and without the optimisation:</p>
<pre><code>require(data.table)
set.seed(1L)
dt <- data.table(x=rep(1:1e6, each=10), y=sample(10), z=sample(2))
options(datatable.verbose=TRUE) # not pasting verbose messages here.
# without optimisation
options(datatable.optimize=0L)
system.time(ans1 <- dt[, c(bla = sum(y), lapply(.SD, mean)), by=x])
# user system elapsed
# 90.705 5.184 121.274
# with optimisation
options(datatable.optimize=Inf)
system.time(ans2 <- dt[, c(bla = sum(y), lapply(.SD, mean)), by=x])
# user system elapsed
# 0.450 0.128 0.690
</code></pre>
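<p>Besides being faster, both settings should also give the same answer. Below is a scaled-down sketch of the same benchmark (10,000 groups instead of 1,000,000, chosen here only to keep it quick) that checks this and restores the previous <code>datatable.optimize</code> value afterwards:</p>
<pre><code>library(data.table)
set.seed(1L)
# Same shape as the benchmark above, but with 1e4 groups of 10 rows each.
dt <- data.table(x = rep(1:1e4, each = 10L), y = sample(10L), z = sample(2L))

old <- options(datatable.optimize = 0L)  # without optimisation
ans1 <- dt[, c(bla = sum(y), lapply(.SD, mean)), by = x]
options(datatable.optimize = Inf)        # with optimisation
ans2 <- dt[, c(bla = sum(y), lapply(.SD, mean)), by = x]
options(old)                             # restore the previous setting

stopifnot(identical(names(ans2), c("x", "bla", "y", "z")))
stopifnot(isTRUE(all.equal(ans1, ans2)))
</code></pre>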
<p>Note that the case <code>DT[, c(sum(y), lapply(.SD, sum)), by=grp, .SDcols=..]</code> is still not implemented (<a href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5222&group_id=240&atid=975">FR #5222</a>): it currently fails with an <code>object not found</code> error, and the optimised version fails in the same way. Once that FR is taken care of, the optimisation will work for this case automatically.</p>
<p><style>body{font-family:Helvetica,Arial;font-size:13px}</style><style>body {
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
padding:1em;
margin:auto;
background:#fefefe;
}
h1, h2, h3, h4, h5, h6 {
font-weight: bold;
}
h1 {
color: #000000;
font-size: 28pt;
}
h2 {
border-bottom: 1px solid #CCCCCC;
color: #000000;
font-size: 24px;
}
h3 {
font-size: 18px;
}
h4 {
font-size: 16px;
}
h5 {
font-size: 14px;
}
h6 {
color: #777777;
background-color: inherit;
font-size: 14px;
}
hr {
height: 0.2em;
border: 0;
color: #CCCCCC;
background-color: #CCCCCC;
}
p, blockquote, ul, ol, dl, li, table, pre {
margin: 15px 0;
}
a, a:visited {
color: #4183C4;
background-color: inherit;
text-decoration: none;
}
#message {
border-radius: 6px;
border: 1px solid #ccc;
display:block;
width:100%;
height:60px;
margin:6px 0px;
}
button, #ws {
font-size: 12 pt;
padding: 4px 6px;
border-radius: 5px;
border: 1px solid #bbb;
background-color: #eee;
}
code, pre, #ws, #message {
font-family: Monaco;
font-size: 10pt;
border-radius: 3px;
background-color: #F8F8F8;
color: inherit;
}
code {
border: 1px solid #EAEAEA;
margin: 0 2px;
padding: 0 5px;
}
pre {
border: 1px solid #CCCCCC;
overflow: auto;
padding: 4px 8px;
}
pre > code {
border: 0;
margin: 0;
padding: 0;
}
#ws { background-color: #f8f8f8; }
.send { color:#77bb77; }
.server { color:#7799bb; }
.error { color:#AA0000; }</style></p><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><br><div id="bloop_sign_1395193339468618752" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">Arun</div></div><p></p></body></html>