<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><p>Your example doesn’t work without <code>allow.cartesian=TRUE</code>.</p>
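<p>For reference, here is a minimal sketch of what <code>allow.cartesian</code> guards against. The tables <code>x</code> and <code>i</code> are made up for illustration, and the exact row threshold may differ between versions; my understanding is that the join is refused when duplicate key values in <code>i</code> would make the result much larger than the inputs.</p>
<pre><code>library(data.table)

# Toy tables (hypothetical): every row of i matches all three rows of x
x <- data.table(k = c(1L, 1L, 1L), a = 1:3, key = "k")
i <- data.table(k = c(1L, 1L, 1L), b = 4:6)

# 3 * 3 = 9 result rows from 3 + 3 input rows, so the plain join is refused
res_err <- tryCatch(x[i], error = function(e) e)
inherits(res_err, "error")

# Opting in explicitly lets the large join through
res_ok <- x[i, allow.cartesian = TRUE]
nrow(res_ok)
</code></pre>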
<p>You <em>shouldn’t</em> be using <code>by=.EACHI</code> here. That grouping was implicit in earlier versions, and it is what made the join slow. Please re-read the README.</p>
<p>Here’s the function I tested on 1.9.3:</p>
<pre><code>library(data.table)

# Name the rolled column directly in j
calc1 <- function(d) {
  d[hit == 1][d, list(hittime = time), roll = -20, allow.cartesian = TRUE][!is.na(hittime)]
}
# Select the column as-is, then rename it by reference
calc2 <- function(d) {
  temp <- d[hit == 1][d, list(time), roll = -20, allow.cartesian = TRUE]
  setnames(temp, 1, "hittime")
  temp[!is.na(hittime)]
}
# Generate sample data
set.seed(12312391)
data <- data.table(
  group = sample(1e3, 1e7, replace = TRUE),
  time = ceiling(runif(1e7, 0, 1e5)),
  hit = rbinom(1e7, 1, prob = 0.1),
  key = c("group", "time"))
system.time(ans1 <- calc1(data))
#    user  system elapsed
#   2.083   0.189   2.344
system.time(ans2 <- calc2(data))
#    user  system elapsed
#   2.012   0.241   2.426
identical(ans1, ans2) # [1] TRUE
</code></pre>
<pre><code>You write:
I also don't see any way to refer to the different time vs. hittime without renaming the second time column.
</code></pre>
<p>I don’t quite follow what you mean, but if I understand correctly, this is the issue you’re referring to: <a href="https://github.com/Rdatatable/data.table/issues/471">https://github.com/Rdatatable/data.table/issues/471</a></p>
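<p>As a sketch of one way to tell the two <code>time</code> columns apart: later data.table releases (1.9.8 onward, as far as I know, so not the 1.9.3 discussed here) accept <code>x.</code> and <code>i.</code> prefixes in <code>j</code>. The tables <code>hits</code> and <code>events</code> below are made-up toy data, not the data from this thread.</p>
<pre><code>library(data.table)

# Toy data (hypothetical): two hits and three events in a single group
hits <- data.table(group = c(1L, 1L), time = c(10, 50), key = c("group", "time"))
events <- data.table(group = c(1L, 1L, 1L), time = c(5, 45, 60))

# i.time is the event's own time; x.time is the time of the next hit,
# rolled backwards by at most 20 units (NA when there is no such hit)
res <- hits[events, list(time = i.time, hittime = x.time), roll = -20]
res[!is.na(hittime)]
</code></pre>
<p>With the prefixes, neither column has to be renamed after the join.</p>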
<pre><code>You write:
You mention some FR's, but they're hard to find without the specific numbers.
</code></pre>
<p>I meant the first two points under <strong>NEW FEATURES</strong> within <code>Changes in v1.9.3</code>: the one that starts with <code>by=.EACHI runs j for each group in x that each row of i joins to.</code> and the one that starts with <code>Accordingly, X[Y, j] now does what X[Y][, j] did.</code></p>
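<p>To illustrate those two points, here is a small sketch on made-up tables <code>X</code> and <code>Y</code> (hypothetical names and values), assuming 1.9.3 or later behaviour:</p>
<pre><code>library(data.table)

# Toy tables (hypothetical)
X <- data.table(id = c("a", "a", "b", "c"), v = c(1L, 2L, 10L, 100L), key = "id")
Y <- data.table(id = c("a", "b"))

# Without by: j runs once over all rows of X that Y joins to
total <- X[Y, sum(v)]

# Same as joining first and then running j on the result
same <- identical(total, X[Y][, sum(v)])

# With by=.EACHI: j runs once per row of Y (the old implicit by-without-by)
each <- X[Y, sum(v), by = .EACHI]
</code></pre>
<p>Here <code>total</code> is a single sum over the joined rows, while <code>each</code> has one row per <code>id</code> in <code>Y</code>.</p>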
<p>Maybe we should start numbering the fixes for easy reference. Will note it down.</p>
<pre><code>You write: Where can I find the 1.9.3 reference manual?
</code></pre>
<p>This is a development version. Necessary changes will be reflected in the corresponding <code>?...</code> help entries. When we find some time, the introduction and FAQs will be updated as well, but not yet. </p>
<p>If you don’t wish to keep up-to-date by looking at the NEWS, you’ll have to wait until the next stable release on CRAN.</p>
<pre><code>You write: On my system (MacOSX), build_vignettes=TRUE gives an error in texi2dvi -- would that have generated the refman? If so, how do I fix that?
</code></pre>
<p>I’m guessing it’s a pdfLaTeX error. If so, you’ll have to install whatever the error message says is missing on your system. Sorry, I can’t help you much there.</p>
<div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div> <div id="bloop_sign_1404163299649908992" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">Arun</div></div> <div style="color:black"><br>From: <span style="color:black">Stavros Macrakis (Σταῦρος Μακράκης)</span> <a href="mailto:macrakis@alum.mit.edu">macrakis@alum.mit.edu</a><br>Reply: <span style="color:black">Stavros Macrakis (Σταῦρος Μακράκης)</span> <a href="mailto:macrakis@alum.mit.edu">macrakis@alum.mit.edu</a><br>Date: <span style="color:black">June 30, 2014 at 10:40:24 PM</span><br>To: <span style="color:black">Arunkumar Srinivasan</span> <a href="mailto:aragorn168b@gmail.com">aragorn168b@gmail.com</a><br>Cc: <span style="color:black">datatable-help@r-forge.wu-wien.ac.at</span> <a href="mailto:datatable-help@r-forge.wu-wien.ac.at">datatable-help@r-forge.wu-wien.ac.at</a><br>Subject: <span style="color:black"> Re: [datatable-help] Speeding up column references with roll <br></span></div><br> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>
<div dir="ltr">
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">OK, I'm
retesting in 1.9.3, adding by=.EACHI. I don't see any significant
difference in the timings -- setnames is still 25% faster than
list(hittime=time). What exactly was fixed?</div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
<br></div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">I also
don't see any way to refer to the different time vs. hittime
without renaming the second time column.</div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
<br></div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">You
mention some FR's, but they're hard to find without the specific
numbers.</div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
<br></div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">Where can
I find the 1.9.3 reference manual? I think it would be easier to
understand for me than the incremental changes in the New Features
listings. On my system (MacOSX), build_vignettes=TRUE gives an
error in texi2dvi -- would that have generated the refman? If so,
how do I fix that?</div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
<br></div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
Thanks,</div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
<br></div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:#330000">
-s</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Jun 30, 2014 at 1:00 PM, Arunkumar
Srinivasan <span dir="ltr"><<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
Once again, this has been fixed in 1.9.3. A join now requires an explicit `by=.EACHI`
to perform a by-without-by.</div>
<div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
<a href="https://github.com/Rdatatable/data.table/blob/master/README.md" target="_blank">https://github.com/Rdatatable/data.table/blob/master/README.md</a></div>
<div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
Have a look at the first FR (by = .EACHI runs ...) that's been
implemented in 1.9.3. It changes what a join returns (these changes
have been discussed for quite some time) in order to bring more
consistency to the DT[i, j, by] syntax.
Also have a look at the second FR and the links it points to for
the discussions.</div>
<div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
<br></div>
<div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
In general, it's better to test with the devel version (and have a
look at README) for any bugs you may encounter.</div>
<div style="font-family:Helvetica,Arial;font-size:13px;color:rgba(0,0,0,1.0);margin:0px;line-height:auto">
<br></div>
<div>
<div style="font-family:helvetica,arial;font-size:13px">Arun</div>
</div>
<div style="color:black"><br>
From: <span style="color:black">Stavros Macrakis (Σταῦρος
Μακράκης)</span> <a href="mailto:macrakis@alum.mit.edu" target="_blank">macrakis@alum.mit.edu</a><br>
Reply: <span style="color:black">Stavros Macrakis (Σταῦρος
Μακράκης)</span> <a href="mailto:macrakis@alum.mit.edu" target="_blank">macrakis@alum.mit.edu</a><br>
Date: <span style="color:black">June 30, 2014 at 5:38:10
PM</span><br>
To: <span style="color:black"><a href="mailto:datatable-help@r-forge.wu-wien.ac.at" target="_blank">datatable-help@r-forge.wu-wien.ac.at</a></span> <a href="mailto:datatable-help@r-forge.wu-wien.ac.at" target="_blank">datatable-help@r-forge.wu-wien.ac.at</a><br>
Subject: <span style="color:black">[datatable-help] Speeding
up column references with roll<br></span></div>
<br>
<blockquote type="cite">
<div>
<div>
<div>
<div class="h5">
<div dir="ltr">
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:rgb(51,0,0)">
<span>In the following example, it is about 15-25% faster to use
setnames rather than j=list(name=var). Is there some better
approach to referencing the other joined column when using
roll?</span></div>
<div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:rgb(51,0,0)">
<span><br></span></div>
<div class="gmail_default">
<div class="gmail_default"><span><span style="color:rgb(51,0,0);font-family:'courier new',monospace"># Use
j=list(name=var)</span><br></span></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">calc1 <- function(d) {</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> d[ hit==1</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> ][
d,list(hittime=time),roll=-20</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> ][ !is.na(hittime)</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> ]</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">}</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"><br></font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"># Use setnames</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">calc2 <- function(d) {</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> temp <- d[ hit==1</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">
][ d,time,roll=-20</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">
]</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">
setnames(temp,3,"hittime")</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> temp[!is.na(hittime)]</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">}</font></div>
<div class="gmail_default" style="color:rgb(51,0,0);font-family:georgia,serif;font-size:small">
<br></div>
</div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"># Generate sample data</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">set.seed(12312391)</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace">data <- data.table(</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> group =
sample(1e3,1e7,replace=T),</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> time =
ceiling(runif(1e7, 0, 1e5)),</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> hit =
rbinom(1e7, 1, p = 0.1),</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"> key=c("group","time"))</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"><br></font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"># Timing</font></div>
<div class="gmail_default"><font color="#330000" face="courier new, monospace"><br></font></div>
<div class="gmail_default"><span style="color:rgb(51,0,0);font-family:'courier new',monospace">system.time(replicate(10,{gc();calc1(data)})) =&gt; 69 sec<br>
system.time(replicate(10,{gc();calc2(data)})) =&gt; 52 sec</span><br></div>
</div>
</div>
</div>
_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br></div>
</div></div></span></blockquote><p></p></body></html>