[datatable-help] Slow execution: Extracting last value in each group

Frank Erickson FErickson at psu.edu
Fri Aug 16 15:43:52 CEST 2013


Hi Arun,

Yup, Windows (see below).

I tried debugonce, but didn't really know what I was looking for. Every
step was instantaneous except this one:

debug: ans = .Call(Cdogroups, x, xcols, groups, grpcols, jiscols, grporder,
    o__, f__, len__, jsub, SDenv, cols, newnames, verbose)
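
For anyone trying to reproduce this, a minimal sketch of timing the grouped call from outside the debugger might look like the following. It assumes a toy table (the sizes and the column `x` are made up for illustration, not the original `dt1`) and uses the `verbose` argument of `[.data.table` to print what each stage of the query is doing.

```r
library(data.table)

# Hypothetical toy data: 2,000 dates with 5 rows each (not the original dt1)
set.seed(1)
dt1 <- data.table(Date = rep(as.Date("2013-01-01") + 0:1999, each = 5),
                  x    = rnorm(10000))

# verbose = TRUE asks data.table to report on the stages of the query,
# including the grouping step; system.time() wraps the whole call
timing <- system.time(
  d <- dt1[, .SD[.N], by = "Date", verbose = TRUE]
)
print(timing)
```

The timings will depend heavily on the data.table version and the machine, so this only shows where to look, not what numbers to expect.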

--Frank

sessionInfo()

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rbenchmark_1.0.0 data.table_1.8.8

loaded via a namespace (and not attached):
[1] tools_3.0.1



On Fri, Aug 16, 2013 at 5:37 AM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

> Frank,
> Is it a Windows machine as well?
> And could you try to use `debugonce` to find out the line(s) where it's
> slow?
>
> Arun
>
> On Friday, August 16, 2013 at 12:34 PM, Frank Erickson wrote:
>
> I get timings similar to Arun's, with the data.table call being a lot
> slower than the others. If data.table is not optimized for that .SD
> expression, perhaps that is okay because, as Arun pointed out, there are
> alternatives. I can't guess why it would perform differently on different
> hardware, though...
>
> # alternatives:
> a <- dt1[dt1[, .I[.N], by='Date']$V1]
> b <- dt1[J(unique(Date)),,mult='last'] # a little slower
> d <- dt1[, .SD[.N], by='Date'] # 600x slower; it would take ages to benchmark
> identical(a,b) # true
> identical(a,d) # false
> identical(as.data.frame(d),as.data.frame(a)) # true
>
> --Frank
>
>
>
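
The three alternatives quoted above can be compared on a small made-up table. This is only a sketch: the data and the column `val` are hypothetical, the table is keyed on `Date` because the `J()` join needs a key, and the join uses `unique(dt1$Date)` explicitly so that it does not depend on how `i` is scoped.

```r
library(data.table)

# Hypothetical toy data: 1,000 dates with 10 rows each
set.seed(1)
dt1 <- data.table(Date = rep(as.Date("2013-01-01") + 0:999, each = 10),
                  val  = rnorm(10000))
setkey(dt1, Date)  # the J() join below requires a key

a <- dt1[dt1[, .I[.N], by = "Date"]$V1]       # row numbers of the last row per group
b <- dt1[J(unique(dt1$Date)), mult = "last"]  # keyed join, keeping the last match
d <- dt1[, .SD[.N], by = "Date"]              # subset .SD within each group

# Compare contents only; attributes such as the key may differ between results
stopifnot(isTRUE(all.equal(as.data.frame(a), as.data.frame(b))))
stopifnot(isTRUE(all.equal(as.data.frame(a), as.data.frame(d))))

# Rough single-run timings; results vary by data.table version and machine
print(system.time(dt1[dt1[, .I[.N], by = "Date"]$V1]))
print(system.time(dt1[J(unique(dt1$Date)), mult = "last"]))
print(system.time(dt1[, .SD[.N], by = "Date"]))
```

Comparing via `as.data.frame` mirrors the check in the quoted message, where `identical(a,d)` was false but the data frame versions matched.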