[datatable-help] Quicker w/o keys set
ekbrown
ekbrown at ksu.edu
Fri Mar 22 03:39:43 CET 2013
Hello. I'm new to data.table. I'm apparently not setting the keys
correctly to get the speed increase described in the vignettes, because I
get a (much) quicker time *without* a key set. Take a look at the following
benchmarking tests. Any ideas?

Thanks.
Earl Brown
> library("data.table")
> library("rbenchmark")
>
> # generates random data
> num.files <- 2000
> num.words <- 1000000
> logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
> file.names <- rep(1:num.files, length.out=num.words)
>
> # defines functions
> benDTNoKey <- function(aa, bb) {
+ dt <- data.table(as.numeric(aa), bb)
+ dt[,sum(V1), by = bb][,V1]
+ }
>
> benDTWithKey <- function(aa, bb) {
+ dt <- data.table(as.numeric(aa), bb)
+ setkey(dt)  # no columns given, so the key is all columns: V1, then bb
+ dt[,sum(V1), by = bb][,V1]
+ }
>
> benTapply <- function(aa, bb) tapply(aa, bb, sum)
>
> # runs benchmarking
> benchmark(benTapply(logical.vector, file.names),
+           benDTWithKey(logical.vector, file.names),
+           benDTNoKey(logical.vector, file.names),
+           replications = 10, columns = c("test", "replications", "elapsed"))
test replications elapsed
3 benDTNoKey(logical.vector, file.names) 10 *0.753*
2 benDTWithKey(logical.vector, file.names) 10 *4.776*
1 benTapply(logical.vector, file.names) 10 6.218
>
> # tests for sameness among results
> one <- benTapply(logical.vector, file.names)
> two <- benDTWithKey(logical.vector, file.names)
> three <- benDTNoKey(logical.vector, file.names)
> identical(as.integer(one), as.integer(two))
[1] TRUE
> identical(as.integer(two), as.integer(three))
[1] TRUE
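One thing worth checking: benDTWithKey() builds the table and runs setkey() inside every replication, so its timing includes a full sort of the million-row table each time. Also, setkey(dt) called with no columns keys on all of them (V1, then bb), so the key's leading column is not the grouping column bb. The R sketch below is a hedged illustration, not a diagnosis: it reuses the post's simulated data (the set.seed() call and the explicit column names V1 and bb are assumptions added for reproducibility), times keying on bb once, and then times only the grouped sum with and without that key. No particular timings are claimed.

```r
library(data.table)

# Same simulated data as in the post; set.seed() is an added assumption
# so that repeated runs use identical data.
set.seed(1)
num.files <- 2000
num.words <- 1000000
logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
file.names <- rep(1:num.files, length.out = num.words)

# Build the table once, outside any timing. Naming the columns V1 and bb
# explicitly is an assumption that mirrors the names used in the post.
dt <- data.table(V1 = as.numeric(logical.vector), bb = file.names)

# setkey() with no columns keys on every column, in order: V1, then bb.
dt.all <- copy(dt)
setkey(dt.all)
print(key(dt.all))

# Key on the grouping column instead, and time that sort on its own.
dt.bb <- copy(dt)
sort.time <- system.time(setkey(dt.bb, bb))

# Time only the grouped sum (10 repetitions, as in the post),
# with the key on bb and with no key at all.
keyed.time <- system.time(for (i in seq_len(10)) dt.bb[, sum(V1), by = bb])
unkeyed.time <- system.time(for (i in seq_len(10)) dt[, sum(V1), by = bb])

timings <- c(setkey.bb = sort.time[["elapsed"]],
             grouped.keyed = keyed.time[["elapsed"]],
             grouped.unkeyed = unkeyed.time[["elapsed"]])
print(timings)

# Both versions should give the same per-file sums in the same order.
res.keyed <- dt.bb[, sum(V1), by = bb]
res.unkeyed <- dt[, sum(V1), by = bb]
```

Because setkey() sorts the table by reference, its cost is paid once per table; in benDTWithKey() it is paid again on every call, so a key tends to pay off only when the same keyed table is reused for several operations.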
--
View this message in context: http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html
Sent from the datatable-help mailing list archive at Nabble.com.