[datatable-help] Slow execution: Extracting last value in each group

Arunkumar Srinivasan aragorn168b at gmail.com
Fri Aug 16 15:47:07 CEST 2013


Frank, 
Great, thank you. So basically it's the `.Call` into the C code (`Cdogroups`) that's taking the time. Perhaps it has to do with the version of C? I still have trouble using gdb with R, so I can't help much with debugging there. Hopefully someone else can lend a hand.
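
In case it helps whoever picks this up, here is a rough, self-contained sketch for reproducing the comparison. The table below is only a hypothetical stand-in for dt1 (the column `value`, the number of dates and the group size are my assumptions, not the original data), so absolute timings will differ from the ones reported in this thread.

```r
library(data.table)

set.seed(1)
# Hypothetical stand-in for dt1: many small groups, keyed by Date
n_dates <- 2000L
dt1 <- data.table(
  Date  = rep(as.Date("2013-01-01") + seq_len(n_dates) - 1L, each = 5L),
  value = rnorm(n_dates * 5L)
)
setkey(dt1, Date)

# Row-index approach: collect the last row number of each group, subset once
t_a <- system.time(a <- dt1[dt1[, .I[.N], by = "Date"]$V1])

# Keyed join on the unique dates, keeping only the last match per date
t_b <- system.time(b <- dt1[J(unique(Date)), mult = "last"])

# .SD approach: takes the last row of .SD within each group
t_d <- system.time(d <- dt1[, .SD[.N], by = "Date"])

# Elapsed seconds for each approach
print(c(a = t_a[["elapsed"]], b = t_b[["elapsed"]], d = t_d[["elapsed"]]))

# Compare contents only, since data.table attributes may differ between results
stopifnot(isTRUE(all.equal(as.data.frame(a), as.data.frame(d))))
```

Comparing through as.data.frame() mirrors Frank's last check: it looks at the values and ignores any data.table-specific attributes.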

Arun


On Friday, August 16, 2013 at 3:43 PM, Frank Erickson wrote:

> Hi Arun,
> 
> Yup, Windows (see below).
> 
> I tried debugonce, but didn't really know what I was looking for. Every step was instantaneous except this one:
> 
> debug: ans = .Call(Cdogroups, x, xcols, groups, grpcols, jiscols, grporder, 
>     o__, f__, len__, jsub, SDenv, cols, newnames, verbose)
> 
> 
> --Frank
> 
> sessionInfo() 
> 
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
> [5] LC_TIME=English_United States.1252    
> 
> attached base packages: 
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages: 
> [1] rbenchmark_1.0.0 data.table_1.8.8
> 
> loaded via a namespace (and not attached): 
> [1] tools_3.0.1
> 
> 
> 
> 
> On Fri, Aug 16, 2013 at 5:37 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
> > Frank, 
> > Is it a Windows machine as well?
> > And could you try to use `debugonce` to find out the line(s) where it's slow?
> > 
> > Arun
> > 
> > 
> > On Friday, August 16, 2013 at 12:34 PM, Frank Erickson wrote:
> > 
> > > I get similar timings to Arun, with the data.table call being a lot slower than the others. If data.table is not optimized for that .SD expression, perhaps that is okay because, as Arun pointed out, there are alternatives. I can't guess why it would perform differently on different hardware, though.
> > > 
> > > # alternatives:
> > > a <- dt1[dt1[, .I[.N], by='Date']$V1]
> > > b <- dt1[J(unique(Date)),,mult='last'] # a little slower
> > > d <- dt1[, .SD[.N], by='Date'] # 600x slower; it would take ages to benchmark
> > > identical(a,b) # true
> > > identical(a,d) # false
> > > identical(as.data.frame(d),as.data.frame(a)) # true
> > > 
> > > --Frank
> > > 
> > 
> 
