[datatable-help] Bigger table, smaller access time: how is this possible?

Ashim Kapoor ashimkapoor at gmail.com
Tue Nov 29 06:26:48 CET 2011


Dear Matthew,

Many thanks for your email.

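Following your advice, I split out the as.character(as.hexmode()) call and ran
the code many times. The results swing both ways: in the five runs below, the
elapsed time is 0.005 s for both tables every time, and only the user/system
split changes.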



> library(xtable)
> library(data.table)
> start.size<-6e+5
>
> time.data.table<-list()
>
> for (i in 0:1){
+ n<-start.size*10^i
+ n1<-n/5000
+ my.data.table<-data.table(index=1:n,seriesname=rep(as.character(as.hexmode(1:n1)),each=5000),value=rnorm(n))
+ setkey(my.data.table,"seriesname")
+ searchitem<-as.character(as.hexmode(n1))
+ time.data.table[[i+1]]<-system.time(my.data.table[J(searchitem)])
+ }
>
> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.008        0   0.005          0         0
[2,]     0.008        0   0.005          0         0

> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.008        0   0.005          0         0
[2,]     0.004        0   0.005          0         0

> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.004        0   0.005          0         0
[2,]     0.004        0   0.005          0         0

> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.008        0   0.005          0         0
[2,]     0.008        0   0.005          0         0

> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.004    0.004   0.005          0         0
[2,]     0.009    0.000   0.005          0         0

Thank you,
Ashim
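Each of the five runs above times a single lookup, so, as the quoted reply below points out, a swing of a few milliseconds is within overhead and cache noise. A minimal sketch of the repeated-run approach follows, in base R plus data.table, assuming a current data.table release rather than the 1.7.2 used in this thread. `make_table()` and `min_lookup_time()` are hypothetical helper names introduced here for illustration; the rbenchmark or microbenchmark packages mentioned in the reply do the same job more carefully.

```r
library(data.table)

# Hypothetical helper: build a keyed table of n rows holding n/5000 series
# of 5000 rows each, as in the transcript above.
make_table <- function(n) {
  n1 <- n / 5000
  dt <- data.table(index = 1:n,
                   seriesname = rep(as.character(as.hexmode(1:n1)), each = 5000),
                   value = rnorm(n))
  setkey(dt, seriesname)
  dt
}

# Hypothetical helper: run `reps` keyed lookups per trial and return the
# smallest mean elapsed time per lookup (in seconds) over `trials` trials.
min_lookup_time <- function(dt, key_value, reps = 200L, trials = 3L) {
  force(key_value)  # evaluate the search value before any timing starts
  per_trial <- vapply(seq_len(trials), function(trial) {
    elapsed <- system.time(
      for (r in seq_len(reps)) dt[J(key_value)]
    )[["elapsed"]]
    elapsed / reps
  }, numeric(1))
  min(per_trial)
}

set.seed(1)
small <- make_table(6e5)
big   <- make_table(6e6)

# The search strings are built once, outside the timed code.
small_key <- as.character(as.hexmode(6e5 / 5000))
big_key   <- as.character(as.hexmode(6e6 / 5000))

timings <- c(small = min_lookup_time(small, small_key),
             big   = min_lookup_time(big, big_key))
print(timings)
```

Because a keyed lookup is a binary search on the key and both lookups here return 5000 rows, the per-lookup cost would be expected to differ only slightly between the two sizes; averaging over many lookups and taking the minimum of several trials is what gives such a small difference a chance to show above timer noise.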


On Mon, Nov 28, 2011 at 4:53 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Hi,
> Welcome to the list. Quick first response.
>
> Comparing 4 ms differences between single runs is not usually robust,
> because of overhead and cache effects. We usually prefer differences of
> many seconds or minutes, and even then take the minimum of 3 repeated
> runs, using something like the rbenchmark or microbenchmark packages.
>
> as.character(as.hexmode()) will install those strings in R's global string
> cache. The 2nd time will be faster, as all those strings are already
> cached. Whether that explains this case I don't know, but it seems
> plausible as it's only 4 ms. That part could be split out, repeated and
> timed separately.
>
> I think a simpler example would be possible, too. I missed the reason why
> it's in a loop over 0:1, and at the 4 ms scale something like that might
> make a tiny difference.
>
> HTH, Matthew
>
> > Dear all,
> >
> > Please see my reproducible example below. My question is: why does the 2nd
> > table, which is bigger, have a smaller access time?
> >
> >> library(xtable)
> >> library(data.table)
> > data.table 1.7.2  For help type: help("data.table")
> >> start.size<-6e+5
> >>
> >> time.data.table<-list()
> >>
> >> for (i in 0:1){
> > + n<-start.size*10^i
> > + n1<-n/5000
> > + my.data.table<-data.table(index=1:n,seriesname=rep(as.character(as.hexmode(1:n1)),each=5000),value=rnorm(n))
> > + setkey(my.data.table,"seriesname")
> > + time.data.table[[i+1]]<-system.time(my.data.table[J(as.character(as.hexmode(n1/4))),])
> > + }
> >
> >>
> >> rbind(time.data.table[[1]],time.data.table[[2]])
> >      user.self sys.self elapsed user.child sys.child
> > [1,]     0.008        0   0.008          0         0
> > [2,]     0.004        0   0.004          0         0
> >> time.data.table[[1]]
> >    user  system elapsed
> >   0.008   0.000   0.008
> >> time.data.table[[2]]
> >    user  system elapsed
> >   0.004   0.000   0.004
> >>
> >
> > Many thanks,
> > Ashim
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>

