[datatable-help] Bigger table, smaller access time: how is this possible?

Matthew Dowle mdowle at mdowle.plus.com
Mon Nov 28 12:23:46 CET 2011


Hi,
Welcome to the list. Quick first response.

Comparing single runs that differ by 4 ms is not usually very robust,
because of timing overhead and cache effects. We usually prefer
differences of many seconds or minutes, and even then take the minimum of
3 repeated runs, using something like the rbenchmark or microbenchmark
packages.
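A minimal base-R sketch of the "minimum of repeated runs" idea, assuming no extra packages are installed. The helper name time_min() is hypothetical (it is not part of rbenchmark or microbenchmark), and sort() on random numbers is just a stand-in workload.

```r
# time_min() is a hypothetical helper: run a zero-argument function 'reps'
# times and return the smallest elapsed time in seconds.
time_min <- function(expr_fun, reps = 3) {
  # Elapsed seconds of each individual run
  elapsed <- vapply(seq_len(reps), function(r) {
    system.time(expr_fun())[["elapsed"]]
  }, numeric(1))
  # The fastest run is the one least disturbed by overhead, GC and cache warm-up
  min(elapsed)
}

# Stand-in workload for illustration only
x <- rnorm(1e6)
t_sort <- time_min(function() sort(x))
print(t_sort)
```

With the microbenchmark package installed, something like microbenchmark(sort(x), times = 3) does the repetition for you and reports the min along with other summaries.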

as.character(as.hexmode()) will add those strings to R's global string
cache. The second time will be faster, as all those strings are already
cached. Whether that explains this case I don't know, but it seems
plausible as the difference is only 4 ms. That part could be split out,
repeated and timed separately.
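A rough sketch of timing the string-building step on its own, under stated assumptions: n1 is set to 2e5 (much larger than the 1200 series names in the question) only so the timings rise above system.time()'s resolution, and the first result is kept in a variable because system.time() runs gc() first by default, which could otherwise free the cached strings between runs.

```r
# Assumed size, chosen only so the timings are measurable
n1 <- 2e5

# First run: the strings are new and must be created and cached.
# 'keep' holds a reference so the cached strings are not garbage collected.
t_first <- system.time(keep <- as.character(as.hexmode(1:n1)))[["elapsed"]]

# Repeat runs: identical strings already exist in the cache and can be reused
t_again <- vapply(seq_len(3), function(r) {
  system.time(as.character(as.hexmode(1:n1)))[["elapsed"]]
}, numeric(1))

print(c(first = t_first, repeats = t_again))
```

If the repeats come out consistently faster than the first run, that would be consistent with the caching explanation; if not, something else is going on.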

I think a simpler example would be possible, too. I didn't see the reason
for the loop through 0:1, and at 4 ms something like that might be making
a tiny difference.
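One possible shape for a simpler comparison, as a sketch: build each table once, outside any loop, then time only the keyed join, repeated, keeping the fastest run. The helpers make_table() and min_elapsed() are hypothetical, and the lookup name is taken from the same vector used to build the table so its formatting always matches. This assumes data.table is installed; the 6e6-row table needs a few hundred MB of memory.

```r
library(data.table)

# Hypothetical helper mirroring the question's construction: n rows,
# n/5000 distinct series names, keyed on seriesname. Also returns one
# existing name (the (n/5000)/4-th) to look up.
make_table <- function(n) {
  n1 <- n / 5000
  nms <- as.character(as.hexmode(1:n1))
  dt <- data.table(index = 1:n,
                   seriesname = rep(nms, each = 5000),
                   value = rnorm(n))
  setkey(dt, seriesname)
  list(dt = dt, key = nms[n1 / 4])
}

# Hypothetical helper: fastest elapsed time of 'reps' runs of f()
min_elapsed <- function(f, reps = 3) {
  min(vapply(seq_len(reps), function(r) system.time(f())[["elapsed"]], numeric(1)))
}

# Tables are built once; building is not part of the timing
small <- make_table(6e5)
big   <- make_table(6e6)

# Only the keyed join is timed; the lookup string already exists
res <- c(small = min_elapsed(function() small$dt[J(small$key)]),
         big   = min_elapsed(function() big$dt[J(big$key)]))
print(res)
```

Since a keyed join is a binary search, both times would be expected to be tiny and close together, which is why a 4 ms gap between single runs says little.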

HTH, Matthew

> Dear all,
>
> Please see my reproducible example below. My question is why does the 2nd
> table, which is bigger, have a smaller access time?
>
>> library(xtable)
>> library(data.table)
> data.table 1.7.2  For help type: help("data.table")
>> start.size<-6e+5
>>
>> time.data.table<-list()
>>
>> for (i in 0:1){
> + n<-start.size*10^i
> + n1<-n/5000
> +
> + my.data.table<-data.table(index=1:n,seriesname=rep(as.character(as.hexmode(1:n1)),each=5000),value=rnorm(n))
> + setkey(my.data.table,"seriesname")
> +
> + time.data.table[[i+1]]<-system.time(my.data.table[J(as.character(as.hexmode(n1/4))),])
> + }
>
>>
>> rbind(time.data.table[[1]],time.data.table[[2]])
>      user.self sys.self elapsed user.child sys.child
> [1,]     0.008        0   0.008          0         0
> [2,]     0.004        0   0.004          0         0
>> time.data.table[[1]]
>    user  system elapsed
>   0.008   0.000   0.008
>> time.data.table[[2]]
>    user  system elapsed
>   0.004   0.000   0.004
>>
>
> Many thanks,
> Ashim
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
