[datatable-help] Quicker w/o keys set

Michael Nelson michael.nelson at sydney.edu.au
Fri Mar 22 03:43:11 CET 2013


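Don't include the key setting within the benchmark. setkey() sorts the table, so calling it inside the timed function means every replication pays for the sort as well as the grouping.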
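
A rough sketch of what that could look like, reusing the data from the message below. The names dt.nokey, dt.key and groupSum are made up for illustration, the seed is added only for reproducibility, and system.time() stands in for rbenchmark so that the example needs nothing beyond data.table. It also assumes the intended key is the grouping column bb; setkey(dt) with no column names keys on all columns, which may not be what was wanted here.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the run is reproducible

# Same data as in the original message
num.files <- 2000
num.words <- 1000000
logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
file.names <- rep(1:num.files, length.out = num.words)

# Build both tables once, outside the timed region
dt.nokey <- data.table(V1 = as.numeric(logical.vector), bb = file.names)
dt.key <- copy(dt.nokey)
setkey(dt.key, bb)  # assumption: key on the grouping column only

# The operation being timed: grouped sum only (hypothetical helper name)
groupSum <- function(dt) dt[, sum(V1), by = bb][, V1]

# Time 10 replications of the grouping step for each table
time.nokey <- system.time(for (i in 1:10) res.nokey <- groupSum(dt.nokey))
time.key <- system.time(for (i in 1:10) res.key <- groupSum(dt.key))

print(c(nokey = time.nokey[["elapsed"]], key = time.key[["elapsed"]]))

# Both approaches should give the same per-file sums
print(identical(res.nokey, res.key))
```

With the table construction and the setkey() call moved out of the timed code, the two timings compare only the grouping step, which is the part a key might be expected to help with.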



________________________________________
From: datatable-help-bounces at lists.r-forge.r-project.org [datatable-help-bounces at lists.r-forge.r-project.org] on behalf of ekbrown [ekbrown at ksu.edu]
Sent: Friday, 22 March 2013 1:39 PM
To: datatable-help at lists.r-forge.r-project.org
Subject: [datatable-help] Quicker w/o keys set

Hello. I'm new to data.table(). I am apparently not setting the keys
correctly to get the increase in speed talked about in the vignettes, as I
get a (much) quicker time *without* keys set. Take a look at the following
benchmarking tests. Any ideas? Thanks. Earl Brown

> library("data.table")
> library("rbenchmark")
>
> # generates random data
> num.files <- 2000
> num.words <- 1000000
> logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
> file.names <- rep(1:num.files, length.out=num.words)
>
> # defines functions
> benDTNoKey <- function(aa, bb) {
+       dt <- data.table(as.numeric(aa), bb)
+       dt[,sum(V1), by = bb][,V1]
+ }
>
> benDTWithKey <- function(aa, bb) {
+       dt <- data.table(as.numeric(aa), bb)
+       setkey(dt)
+       dt[,sum(V1), by = bb][,V1]
+ }
>
> benTapply <- function(aa, bb) tapply(aa, bb, sum)
>
> # runs benchmarking
> benchmark(benTapply(logical.vector, file.names),
+           benDTWithKey(logical.vector, file.names),
+           benDTNoKey(logical.vector, file.names),
+           replications = 10,
+           columns = c("test", "replications", "elapsed"))
                                      test replications elapsed
3   benDTNoKey(logical.vector, file.names)           10   *0.753*
2 benDTWithKey(logical.vector, file.names)           10   *4.776*
1    benTapply(logical.vector, file.names)           10   6.218
>
> # tests for sameness among results
> one <- benTapply(logical.vector, file.names)
> two <- benDTWithKey(logical.vector, file.names)
> three <- benDTNoKey(logical.vector, file.names)
> identical(as.integer(one), as.integer(two))
[1] TRUE
> identical(as.integer(two), as.integer(three))
[1] TRUE



--
View this message in context: http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html