[datatable-help] Quicker w/o keys set

ekbrown ekbrown at ksu.edu
Fri Mar 22 03:39:43 CET 2013


Hello. I'm new to data.table. I am apparently not setting the keys
correctly, because I don't see the speed-up described in the vignettes; in
fact, I get (much) quicker times *without* keys set. Take a look at the
following benchmarks. Any ideas? Thanks. Earl Brown

> library("data.table")
> library("rbenchmark")
> 
> # generates random data
> num.files <- 2000
> num.words <- 1000000
> logical.vector <- sample(c(TRUE, FALSE), num.words, replace=TRUE)
> file.names <- rep(1:num.files, length.out=num.words)
> 
> # defines functions
> benDTNoKey <- function(aa, bb) {
+ 	dt <- data.table(as.numeric(aa), bb)
+ 	dt[,sum(V1), by = bb][,V1]
+ }
> 
> benDTWithKey <- function(aa, bb) {
+ 	dt <- data.table(as.numeric(aa), bb)
+ 	setkey(dt)
+ 	dt[,sum(V1), by = bb][,V1]
+ }
> 
> benTapply <- function(aa, bb) tapply(aa, bb, sum)
> 
> # runs benchmarking
> benchmark(benTapply(logical.vector, file.names),
+           benDTWithKey(logical.vector, file.names),
+           benDTNoKey(logical.vector, file.names),
+           replications = 10,
+           columns = c("test", "replications", "elapsed"))
                                      test replications elapsed
3   benDTNoKey(logical.vector, file.names)           10   *0.753*
2 benDTWithKey(logical.vector, file.names)           10   *4.776*
1    benTapply(logical.vector, file.names)           10   6.218
> 
> # tests for sameness among results
> one <- benTapply(logical.vector, file.names)
> two <- benDTWithKey(logical.vector, file.names)
> three <- benDTNoKey(logical.vector, file.names)
> identical(as.integer(one), as.integer(two))
[1] TRUE
> identical(as.integer(two), as.integer(three))
[1] TRUE
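
A minimal sketch for narrowing this down, offered as a diagnostic rather than an answer. benDTWithKey() above calls setkey() inside the timed function, so every replication pays for the sort, and setkey(dt) with no column arguments keys on all columns rather than on bb alone. The sketch times the three pieces separately. The fixed seed, the explicit column names, the use of system.time() in place of rbenchmark, and the names dt.keyed, t.nokey, t.setkey, t.keyed and timings are all assumptions made for illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so repeated runs are comparable
num.files <- 2000
num.words <- 1000000
logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
file.names <- rep(1:num.files, length.out = num.words)

# Build the table once, outside any timing
dt <- data.table(V1 = as.numeric(logical.vector), bb = file.names)

# 1. Grouping on the unkeyed table
t.nokey <- system.time(res.nokey <- dt[, sum(V1), by = bb])

# 2. The sort alone, keyed on the grouping column only
#    (copy() so that dt itself stays unkeyed)
dt.keyed <- copy(dt)
t.setkey <- system.time(setkey(dt.keyed, bb))

# 3. Grouping on the table that is already keyed by bb
t.keyed <- system.time(res.keyed <- dt.keyed[, sum(V1), by = bb])

# Elapsed seconds for each piece, side by side
timings <- c(nokey  = t.nokey[["elapsed"]],
             setkey = t.setkey[["elapsed"]],
             keyed  = t.keyed[["elapsed"]])
print(timings)
```

Comparing the setkey entry with the other two shows how much of the benDTWithKey() time is the sort itself rather than the grouping.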


