<font><span style="line-height:normal;background-color:rgba(255,255,255,0)">When you set the key, data.table sorts the table by the key columns -- this is part of what makes later keyed operations fast. <br>That initial sort is what is slowing down your benchmark: benDTWithKey() calls setkey() inside the timed function, so the sort is repeated on every replication. <br>
<br>While it makes sense to include the initial sort time if you want a 'full' comparison, in most practical applications you will set the key only once. <br><br>Therefore, to see what speed increase you are actually getting, create your data.tables (and set their keys) first, then benchmark only the specific operations of interest. <br>
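<br>A minimal sketch of that setup, reusing the random data from the original post. The names dtNoKey and dtKeyed are hypothetical, and it assumes the rbenchmark package is installed: <br>

```r
library(data.table)
library(rbenchmark)

# Same synthetic data as in the original post
num.files <- 2000
num.words <- 1000000
logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
file.names <- rep(1:num.files, length.out = num.words)

# Build both tables once, outside the timed code (hypothetical names)
dtNoKey <- data.table(V1 = as.numeric(logical.vector), bb = file.names)
dtKeyed <- copy(dtNoKey)
setkey(dtKeyed, bb)  # the one-time sort happens here and is not timed

# Time only the grouped sum on each table
benchmark(
  noKey = dtNoKey[, sum(V1), by = bb],
  keyed = dtKeyed[, sum(V1), by = bb],
  replications = 10,
  columns = c("test", "replications", "elapsed")
)
```

Whether the keyed table comes out ahead for this particular grouped sum depends on the data and the data.table version, so treat the timings as something to measure rather than assume. <br>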
<br>Also, searching Stack Overflow for [r] data.table and benchmarks will turn up several useful results. <br><br>Cheers<br>Rick<br></span></font><br>On Thursday, March 21, 2013, ekbrown wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello. I'm new to data.table(). I am apparently not setting the keys<br>
correctly to get the increase in speed talked about in the vignettes, as I<br>
get a (much) quicker time *without* keys set. Take a look at the following<br>
benchmarking tests. Any ideas? Thanks. Earl Brown<br>
<br>
> library("data.table")<br>
> library("rbenchmark")<br>
><br>
> # generates random data<br>
> num.files <- 2000<br>
> num.words <- 1000000<br>
> logical.vector <- sample(c(TRUE, FALSE), num.words, replace=T)<br>
> file.names <- rep(1:num.files, length.out=num.words)<br>
><br>
> # defines functions<br>
> benDTNoKey <- function(aa, bb) {<br>
+ dt <- data.table(as.numeric(aa), bb)<br>
+ dt[,sum(V1), by = bb][,V1]<br>
+ }<br>
><br>
> benDTWithKey <- function(aa, bb) {<br>
+ dt <- data.table(as.numeric(aa), bb)<br>
+ setkey(dt)<br>
+ dt[,sum(V1), by = bb][,V1]<br>
+ }<br>
><br>
> benTapply <- function(aa, bb) tapply(aa, bb, sum)<br>
><br>
> # runs benchmarking<br>
> benchmark(benTapply(logical.vector, file.names),<br>
+ benDTWithKey(logical.vector, file.names),<br>
+ benDTNoKey(logical.vector, file.names),<br>
+ replications = 10,<br>
+ columns = c("test", "replications", "elapsed"))<br>
test replications elapsed<br>
3 benDTNoKey(logical.vector, file.names) 10 *0.753*<br>
2 benDTWithKey(logical.vector, file.names) 10 *4.776*<br>
1 benTapply(logical.vector, file.names) 10 6.218<br>
><br>
> # tests for sameness among results<br>
> one <- benTapply(logical.vector, file.names)<br>
> two <- benDTWithKey(logical.vector, file.names)<br>
> three <- benDTNoKey(logical.vector, file.names)<br>
> identical(as.integer(one), as.integer(two))<br>
[1] TRUE<br>
> identical(as.integer(two), as.integer(three))<br>
[1] TRUE<br>
<br>
<br>
<br>
--<br>
View this message in context: <a href="http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html" target="_blank">http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html</a><br>
Sent from the datatable-help mailing list archive at Nabble.com.<br>
</blockquote><br><br>-- <br><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)"><div style="font-size:13px">Ricardo Saporta</div><div style="font-size:13px">Graduate Student, Data Analytics</div>
<div style="font-size:13px"><span style="font-size:13px">Rutgers University, New Jersey</span></div><div style="font-size:13px"><span style="font-size:13px">e: </span><a href="mailto:saporta@rutgers.edu" style="color:rgb(17,85,204);font-size:13px" target="_blank">saporta@rutgers.edu</a></div>
<div><br></div></div><br>