[datatable-help] Bigger table, smaller access time - how is this possible?

Ashim Kapoor ashimkapoor at gmail.com
Thu Dec 1 11:34:13 CET 2011


Dear Matthew,

I did some research on the internet about interpreting the statistics
returned by time.

In this discussion [1], I read the following paragraph:

User+Sys will tell you how much actual CPU time your process used. Note
that this is across all CPUs, so if the process has multiple threads it
could potentially exceed the wall clock time reported by Real. Note that in
the output these figures include the User and Sys time of all child
processes as well, although the underlying system calls return the
statistics for the process and its children separately.
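
To relate that paragraph to R: a minimal sketch, assuming only base R, of how the five fields returned by system.time() map onto "User", "Sys" and "Real". The helper name cpu_time is made up for illustration and is not part of R.

```r
# Hypothetical helper (not part of base R): total CPU time charged to this
# R process and its child processes, i.e. the "User+Sys" figure quoted above.
cpu_time <- function(tm) {
  fields <- c("user.self", "sys.self", "user.child", "sys.child")
  # Child times can be NA on platforms that do not report them.
  sum(unclass(tm)[fields], na.rm = TRUE)
}

# Any workload will do; sorting random numbers is just a stand-in.
tm <- system.time(for (k in 1:20) sort(runif(1e5)))

names(tm)         # user.self, sys.self, elapsed, user.child, sys.child
cpu_time(tm)      # CPU seconds (User + Sys)
tm[["elapsed"]]   # wall-clock seconds ("Real")
```

For a single-threaded workload like this one, the CPU total should normally stay close to or below the elapsed time.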



So when I compute user + sys for the following two items (from my previous
email):

> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.008        0   0.005          0         0
[2,]     0.004        0   0.005          0         0

> rbind(time.data.table[[1]],time.data.table[[2]])
     user.self sys.self elapsed user.child sys.child
[1,]     0.004    0.004   0.005          0         0
[2,]     0.009    0.000   0.005          0         0

I see it "swinging" both ways.
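
One way to get past single 4ms readings might be to time many calls in one go and repeat that a few times, along the lines of the "minimum of 3 repeated runs" advice quoted below. This is only a sketch in base R: time_per_call and lookup are made-up names, and the workload is a stand-in, not the keyed data.table join itself.

```r
# Hypothetical helper: time `reps` calls of f() inside one system.time()
# call, repeat that `runs` times, and return elapsed seconds per call.
time_per_call <- function(f, reps = 100, runs = 5) {
  sapply(seq_len(runs), function(r) {
    system.time(for (k in seq_len(reps)) f())[["elapsed"]] / reps
  })
}

# Stand-in workload (an assumption): a lookup in a sorted base R vector.
# The data.table join my.data.table[J(searchitem)] would go here instead.
x <- sort(runif(1e5))
lookup <- function() findInterval(0.5, x)

per_call <- time_per_call(lookup)
min(per_call)     # the minimum is the run least disturbed by other activity
range(per_call)   # the spread gives a feel for how much is noise
```

If the spread between runs is as large as the difference between the two tables, the difference is probably not meaningful.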


[1]
http://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1

Thank you,
Ashim

On Tue, Nov 29, 2011 at 1:42 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> I don't follow. The elapsed time is 0.005 seconds in all cases. The
> times are extremely small anyway (5ms), it seems to be just noise.
>
> We're used to seeing examples like the one in the examples section of
> help(":=") where 591s is reduced to 1.1s. A 500 times speedup. But, more
> importantly, where the wall clock time (10 minutes) is meaningful, worth
> saving, and (hopefully) the readers understand the saving scales; i.e.,
> 10 minutes saving can easily be hours with larger data.
>
> We can talk on the 5ms scale, too, but you'll need to be much more
> precise and read up on the subject first, please.
>
>
> On Tue, 2011-11-29 at 10:56 +0530, Ashim Kapoor wrote:
> > Dear Matthew,
> >
> > Many thanks for your email.
> >
> > Following your advice I split out the as.character(as.hexmode( )) and
> > ran it many times. The results swing both ways.
> >
> >
> >
> > > library(xtable)
> > > library(data.table)
> > > start.size<-6e+5
> > >
> > > time.data.table<-list()
> > >
> > > for (i in 0:1){
> > + n<-start.size*10^i
> > + n1<-n/5000
> > + my.data.table<-data.table(index=1:n,seriesname=rep(as.character(as.hexmode(1:n1)),each=5000),value=rnorm(n))
> > + setkey(my.data.table,"seriesname")
> > + searchitem<-as.character(as.hexmode(n1))
> > + time.data.table[[i+1]]<-system.time(my.data.table[J(searchitem)])
> > + }
> > >
> > > rbind(time.data.table[[1]],time.data.table[[2]])
> >      user.self sys.self elapsed user.child sys.child
> > [1,]     0.008        0   0.005          0         0
> > [2,]     0.008        0   0.005          0         0
> >
> > > rbind(time.data.table[[1]],time.data.table[[2]])
> >      user.self sys.self elapsed user.child sys.child
> > [1,]     0.008        0   0.005          0         0
> > [2,]     0.004        0   0.005          0         0
> >
> > > rbind(time.data.table[[1]],time.data.table[[2]])
> >      user.self sys.self elapsed user.child sys.child
> > [1,]     0.004        0   0.005          0         0
> > [2,]     0.004        0   0.005          0         0
> >
> > > rbind(time.data.table[[1]],time.data.table[[2]])
> >      user.self sys.self elapsed user.child sys.child
> > [1,]     0.008        0   0.005          0         0
> > [2,]     0.008        0   0.005          0         0
> >
> > > rbind(time.data.table[[1]],time.data.table[[2]])
> >      user.self sys.self elapsed user.child sys.child
> > [1,]     0.004    0.004   0.005          0         0
> > [2,]     0.009    0.000   0.005          0         0
> >
> > Thank you,
> > Ashim
> >
> >
> > On Mon, Nov 28, 2011 at 4:53 PM, Matthew Dowle
> > <mdowle at mdowle.plus.com> wrote:
> >
> >         Hi,
> >         Welcome to the list. Quick first response..
> >
> >         Comparing differences of 4ms of single runs is not usually very
> >         robust due to overhead and cache effects. We usually prefer
> >         differences of many seconds or minutes and even then take the
> >         minimum of 3 repeated runs, using something like packages
> >         rbenchmark or microbenchmark.
> >
> >         as.character(as.hexmode()) will install those strings in R's
> >         global string cache. The 2nd time will be faster as all those
> >         strings are already cached.  Whether that explains this case I
> >         don't know, seems plausible as it's only 4ms. That part could be
> >         split out, repeated and timed separately.
> >
> >         Think a simpler example would be possible, too. I missed the
> >         reason why it's in a loop through 0:1 and for 4ms something like
> >         that might be making a tiny difference.
> >
> >         HTH, Matthew
> >
> >         > Dear all,
> >         >
> >         > Please see my reproducible example below. My question is: why
> >         > does the 2nd table, which is bigger, have a smaller access time?
> >         >
> >         >> library(xtable)
> >         >> library(data.table)
> >         > data.table 1.7.2  For help type: help("data.table")
> >         >> start.size<-6e+5
> >         >>
> >         >> time.data.table<-list()
> >         >>
> >         >> for (i in 0:1){
> >         > + n<-start.size*10^i
> >         > + n1<-n/5000
> >         > + my.data.table<-data.table(index=1:n,seriesname=rep(as.character(as.hexmode(1:n1)),each=5000),value=rnorm(n))
> >         > + setkey(my.data.table,"seriesname")
> >         > + time.data.table[[i+1]]<-system.time(my.data.table[J(as.character(as.hexmode(n1/4))),])
> >         > + }
> >         >
> >         >>
> >         >> rbind(time.data.table[[1]],time.data.table[[2]])
> >         >      user.self sys.self elapsed user.child sys.child
> >         > [1,]     0.008        0   0.008          0         0
> >         > [2,]     0.004        0   0.004          0         0
> >         >> time.data.table[[1]]
> >         >    user  system elapsed
> >         >   0.008   0.000   0.008
> >         >> time.data.table[[2]]
> >         >    user  system elapsed
> >         >   0.004   0.000   0.004
> >         >>
> >         >
> >         > Many thanks,
> >         > Ashim
> >
> >         > _______________________________________________
> >         > datatable-help mailing list
> >         > datatable-help at lists.r-forge.r-project.org
> >         > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
> >
> >
>
>
>