[datatable-help] Quicker w/o keys set

Ricardo Saporta saporta at scarletmail.rutgers.edu
Fri Mar 22 05:31:01 CET 2013


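When you set the key, data.table sorts the table by the key column(s) -- this is part of what allows for
the speed.
That sort is what is slowing down your benchmarks: benDTWithKey() calls setkey() inside the function,
so the table is re-sorted in every one of the 10 replications.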
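
To make the sort concrete, here is a tiny sketch with made-up values (the column names V1 and bb only mirror your functions). setkey() reorders the rows in place and records the key column as an attribute:

```r
library("data.table")

# Toy table with made-up values; column names mirror benDTWithKey()
dt <- data.table(V1 = c(1, 0, 1, 1), bb = c(3L, 1L, 2L, 1L))

setkey(dt, bb)  # physically reorders the rows by bb, by reference
print(key(dt))  # the key is stored as an attribute: "bb"
print(dt)       # rows are now in bb order: 1, 1, 2, 3
```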

While it makes sense to include the initial sort time if you are trying to
get a 'full' comparison, in most practical applications you will only
set the key once.

Therefore, if you want to see what sort of speed increases you are actually
getting, create your DTs (and set their keys) first, then benchmark only the
specific operations of interest.
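
A minimal sketch of that approach, reusing your data generation. The object names (dt.nokey, dt.key, res.nokey, res.key) are just for illustration, and I've used base R's system.time() instead of rbenchmark only to keep it self-contained. One assumption worth flagging: I key on bb alone, since that is the grouping column; setkey(dt) with no columns, as in benDTWithKey(), keys on all columns, with V1 first.

```r
library("data.table")

# Same random data as in the original post
num.files <- 2000
num.words <- 1000000
logical.vector <- sample(c(TRUE, FALSE), num.words, replace = TRUE)
file.names <- rep(1:num.files, length.out = num.words)

# Build the tables ONCE, outside any timed code
dt.nokey <- data.table(V1 = as.numeric(logical.vector), bb = file.names)
dt.key <- copy(dt.nokey)  # independent copy, so setkey() leaves dt.nokey unsorted
setkey(dt.key, bb)        # one-time sort on the grouping column (not timed)

# Time only the operation of interest: the grouped sum, 10 replications each
t.nokey <- system.time(for (i in 1:10) res.nokey <- dt.nokey[, sum(V1), by = bb])
t.key   <- system.time(for (i in 1:10) res.key   <- dt.key[, sum(V1), by = bb])

print(c(nokey = t.nokey[["elapsed"]], key = t.key[["elapsed"]]))

# Both versions should return the same per-file sums
stopifnot(identical(res.nokey$V1, res.key$V1))
```

Timings will vary with your machine and data.table version, so I won't guess at the numbers; the point is that setkey() runs once, outside the timed code.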

Also, searching Stack Overflow for [r] data.table benchmarks will
turn up several useful results.

Cheers
Rick

On Thursday, March 21, 2013, ekbrown wrote:

> Hello. I'm new to data.table(). I am apparently not setting the keys
> correctly to get the increase in speed talked about in the vignettes, as I
> get a (much) quicker time *without* keys set. Take a look at the following
> benchmarking tests. Any ideas? Thanks. Earl Brown
>
> > library("data.table")
> > library("rbenchmark")
> >
> > # generates random data
> > num.files <- 2000
> > num.words <- 1000000
> > logical.vector <- sample(c(TRUE, FALSE), num.words, replace=T)
> > file.names <- rep(1:num.files, length.out=num.words)
> >
> > # defines functions
> > benDTNoKey <- function(aa, bb) {
> +       dt <- data.table(as.numeric(aa), bb)
> +       dt[,sum(V1), by = bb][,V1]
> + }
> >
> > benDTWithKey <- function(aa, bb) {
> +       dt <- data.table(as.numeric(aa), bb)
> +       setkey(dt)
> +       dt[,sum(V1), by = bb][,V1]
> + }
> >
> > benTapply <- function(aa, bb) tapply(aa, bb, sum)
> >
> > # runs benchmarking
> > benchmark(benTapply(logical.vector, file.names),
> +           benDTWithKey(logical.vector, file.names),
> +           benDTNoKey(logical.vector, file.names),
> +           replications = 10,
> +           columns = c("test", "replications", "elapsed"))
>                                       test replications elapsed
> 3   benDTNoKey(logical.vector, file.names)           10   *0.753*
> 2 benDTWithKey(logical.vector, file.names)           10   *4.776*
> 1    benTapply(logical.vector, file.names)           10   6.218
> >
> > # tests for sameness among results
> > one <- benTapply(logical.vector, file.names)
> > two <- benDTWithKey(logical.vector, file.names)
> > three <- benDTNoKey(logical.vector, file.names)
> > identical(as.integer(one), as.integer(two))
> [1] TRUE
> > identical(as.integer(two), as.integer(three))
> [1] TRUE
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Quicker-w-o-keys-set-tp4662157.html
> Sent from the datatable-help mailing list archive at Nabble.com.
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>


-- 
Ricardo Saporta
Graduate Student, Data Analytics
Rutgers University, New Jersey
e: saporta at rutgers.edu

