<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body>
<p> </p>
<p>And this nice answer by Michael might be of interest too:</p>
<p>http://stackoverflow.com/a/13694673/403310</p>
<p> </p>
<p>On 22.03.2013 11:05, Matthew Dowle wrote:</p>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<p> </p>
<p>Whilst what Rick and Michael said is very true, I suspect that you've found that setting a key on a *numeric* type column is much slower than setkey on an *integer* column. There was an awful (but correct) benchmark on S.O. recently and that's what I replied, but I can't find it now. All I can think is that the OP deleted the question, which would be a shame. If that OP is watching, and that is what happened, please can they undelete it.</p>
<p>Also, you have a setkey(DT) there, with no columns specified. In that case, it will key all the columns; think of a key-only table. But if you have numeric value columns in there as well, or any non-key columns at all, then keying them too is wasteful.</p>
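<p>A minimal sketch of that difference, assuming a toy table whose column names (bb, V1) mirror the posted code; the data itself is made up for illustration:</p>
<pre>
library(data.table)

# Hypothetical toy data: 'bb' is the grouping column, 'V1' a numeric value column.
set.seed(1)
DT &lt;- data.table(bb = sample(1:100, 1e5, replace = TRUE),
                 V1 = runif(1e5))

DT_all &lt;- copy(DT)
setkey(DT_all)        # no columns given: every column becomes part of the key
key(DT_all)           # "bb" "V1"

DT_bb &lt;- copy(DT)
setkey(DT_bb, bb)     # key only the grouping column
key(DT_bb)            # "bb"
</pre>
<p>Keying only bb means the sort does not have to order the value column as well.</p>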
<p>Anyway, in the code you posted, try changing</p>
<pre> as.numeric(aa)</pre>
<p>to</p>
<pre> as.integer(aa)</pre>
<p>and you should see setkey run dramatically faster. Then what Rick and Michael said applies from there.</p>
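<p>One way to check this on your own machine is sketched below. The vector aa here is a made-up stand-in for the one in the posted code, and the size of the gap depends on the data.table version installed, so no particular timings are claimed:</p>
<pre>
library(data.table)

set.seed(1)
n  &lt;- 1e6
aa &lt;- sample(1:1000, n, replace = TRUE)   # hypothetical stand-in for the poster's 'aa'

# Same values, stored once as double and once as integer.
DT_num &lt;- data.table(bb = as.numeric(aa), V1 = 1L)
DT_int &lt;- data.table(bb = as.integer(aa), V1 = 1L)

# Time only the setkey call on each table.
t_num &lt;- system.time(setkey(DT_num, bb))[["elapsed"]]
t_int &lt;- system.time(setkey(DT_int, bb))[["elapsed"]]
print(c(numeric = t_num, integer = t_int))
</pre>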
<p>Matthew</p>
<p>On 22.03.2013 04:31, Ricardo Saporta wrote:</p>
<blockquote style="padding-left: 5px; border-left: #1010ff 2px solid; margin-left: 5px; width: 100%;"><span><span style="line-height: normal; background-color: rgba;">When you set the key, it sorts the table -- this is part of what allows for the speed. <br />This initial sorting is what is slowing down your benchmarks. <br /><br />While it makes sense to include the initial sort time if you are trying to get a 'full' comparison, in most practical applications, you will only be setting the key once. <br /><br />Therefore, if you want to see what sort of speed increases you are actually getting, create your DTs first, then benchmark the specific operations of interest. <br /><br />Also, searching Stack Overflow for [r] data.table and benchmarks will produce several useful results. <br /><br />Cheers<br />Rick<br /></span></span><br />On Thursday, March 21, 2013, ekbrown wrote:<br />
<blockquote class="gmail_quote" style="margin: 0 0 0 .8ex; border-left: 1px #ccc solid; padding-left: 1ex;">Hello. I'm new to data.table(). I am apparently not setting the keys<br /> correctly to get the increase in speed talked about in the vignettes, as I<br /> get a (much) quicker time *without* keys set. Take a look at the following<br /> benchmarking tests. Any ideas? Thanks. Earl Brown<br /><br /> > library("data.table")<br /> > library("rbenchmark")<br /> ><br /> > # generates random data<br /> > num.files > num.words > logical.vector > file.names ><br /> > # defines functions<br /> > benDTNoKey + dt + dt[,sum(V1), by = bb][,V1]<br /> + }<br /> ><br /> > benDTWithKey + dt + setkey(dt)<br /> + dt[,sum(V1), by = bb][,V1]<br /> + }<br /> ><br /> > benTapply ><br /> > # runs benchmarking<br /> > benchmark(benTapply(logical.vector, file.names),<br /> > benDTWithKey(logical.vector, file.names), benDTNoKey(logical.vector,<br /> > file.names), replications = 10, columns = c("test", "replications",<br /> > "elapsed"))<br /> test replications elapsed<br /> 3 benDTNoKey(logical.vector, file.names) 10 *0.753*<br /> 2 benDTWithKey(logical.vector, file.names) 10 *4.776*<br /> 1 benTapply(logical.vector, file.names) 10 6.218<br /> ><br /> > # tests for sameness among results<br /> > one > two > three > identical(as.integer(one), as.integer(two))<br /> [1] TRUE<br /> > identical(as.integer(two), as.integer(three))<br /> [1] TRUE<br /><br /><br /><br /> --<br /> View this message in context: <a href="http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html">http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html</a><br /> Sent from the datatable-help mailing list archive at Nabble.com.<br /> _______________________________________________<br /> datatable-help mailing list<br /><a>datatable-help@lists.r-forge.r-project.org</a><br /><a 
href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></blockquote>
<br /><br />-- <br />
<div style="color: #222222; font-family: arial,sans-serif; font-size: 13px; background-color: #ffffff;">
<div style="font-size: 13px;">Ricardo Saporta</div>
<div style="font-size: 13px;">Graduate Student, Data Analytics</div>
<div style="font-size: 13px;"><span style="font-size: 13px;">Rutgers University, New Jersey</span></div>
<div style="font-size: 13px;"><span style="font-size: 13px;">e: </span><a style="color: #1155cc; font-size: 13px;" href="mailto:saporta@rutgers.edu">saporta@rutgers.edu</a></div>
</div>
</blockquote>
<p> </p>
<div> </div>
</blockquote>
<p> </p>
<div> </div>
</body></html>