[datatable-help] Speeding up column references with roll

Stavros Macrakis (Σταῦρος Μακράκης) macrakis at alum.mit.edu
Mon Jun 30 22:40:24 CEST 2014


OK, I'm retesting with 1.9.3, now adding by=.EACHI. I don't see any significant
difference in the timings: setnames is still about 25% faster than
list(hittime=time). What exactly was fixed?

I also don't see any way to distinguish the two time columns (time vs.
hittime) without renaming the second one.
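On the question of distinguishing the two time columns, here is a minimal sketch that assumes a much newer data.table than 1.9.3: it relies on the `on=` argument and the `x.` column prefix, both of which were added in later releases. `calc1_eachi` is a hypothetical name and the toy input is made up. With `by = .EACHI`, the result keeps the join columns taken from `d`, while `x.time` names the time of the matched hit row directly, so no `setnames` step is needed.

```r
library(data.table)

# Hypothetical variant of calc1. Assumes a data.table release that supports
# `on=` and the `x.` prefix (both added after 1.9.3).
calc1_eachi <- function(d) {
  hits <- d[hit == 1]
  # For each row of d, roll backwards (NOCB) up to 20 time units to the next hit.
  # x.time is the time of the matched row in `hits`; it is NA when nothing matches.
  res <- hits[d, .(hittime = x.time), on = .(group, time),
              roll = -20, by = .EACHI]
  res[!is.na(hittime)]
}

# Tiny illustrative input (made up)
d <- data.table(group = 1L, time = c(1, 10, 50), hit = c(0L, 1L, 0L),
                key = c("group", "time"))
calc1_eachi(d)
```

On this toy input, the rows at time 1 and time 10 both roll to the hit at time 10, while the row at time 50 has no later hit and is dropped.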

You mention some FRs, but they're hard to find without the specific
numbers.

Where can I find the 1.9.3 reference manual? I think it would be easier for
me to understand than the incremental changes in the New Features
listings. On my system (Mac OS X), build_vignettes=TRUE gives an error in
texi2dvi. Would that have generated the refman? If so, how do I fix that?

Thanks,

               -s


On Mon, Jun 30, 2014 at 1:00 PM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

> Once again, this has been fixed in 1.9.3. A join now requires `by=.EACHI`
> (explicitly) to perform a by-without-by.
> https://github.com/Rdatatable/data.table/blob/master/README.md
> Have a look at the first FR fixed in 1.9.3 (by = .EACHI runs ...). These
> changes, which have been under discussion for quite some time, alter the
> way joins return results, to bring more consistency to the DT[i, j, by]
> syntax. Also have a look at the second FR and the links it points to for
> the discussions.
>
> In general, it's better to test against the devel version (and check the
> README) for any bugs you may encounter.
>
> Arun
>
> From: Stavros Macrakis (Σταῦρος Μακράκης) macrakis at alum.mit.edu
> Reply: Stavros Macrakis (Σταῦρος Μακράκης) macrakis at alum.mit.edu
> Date: June 30, 2014 at 5:38:10 PM
> To: datatable-help at r-forge.wu-wien.ac.at
> datatable-help at r-forge.wu-wien.ac.at
> Subject:  [datatable-help] Speeding up column references with roll
>
> In the following example, it is about 15-25% faster to use setnames
> than j=list(name=var). Is there a better approach to referencing
> the other joined column when using roll?
>
> library(data.table)
>
> # Use j=list(name=var)
> calc1 <- function(d) {
>   # Roll each row of d backwards (NOCB) onto the hit rows, up to 20 time units
>   d[ hit==1
>    ][ d,list(hittime=time),roll=-20
>    ][ !is.na(hittime)
>    ]
> }
>
> # Use setnames
> calc2 <- function(d) {
>   temp <- d[ hit==1
>            ][ d,time,roll=-20
>            ]
>   # Rename the third (joined) time column
>   setnames(temp,3,"hittime")
>   temp[!is.na(hittime)]
> }
>
> # Generate sample data
> set.seed(12312391)
> data <- data.table(
>           group = sample(1e3,1e7,replace=TRUE),
>           time = ceiling(runif(1e7, 0, 1e5)),
>           hit = rbinom(1e7, 1, p = 0.1),
>   key=c("group","time"))
>
> # Timing
> system.time(replicate(10,{gc();calc1(data)}))  # => 69 sec
> system.time(replicate(10,{gc();calc2(data)}))  # => 52 sec
>  _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
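Arun's point that a join now needs an explicit `by=.EACHI` to evaluate `j` once per row of `i` can be seen on a toy example. This is a minimal sketch assuming data.table 1.9.3 or later; the tables `X` and `Y` are made up for illustration.

```r
library(data.table)

# Toy tables (made up): X is keyed on id, Y supplies the values to join on.
X <- data.table(id = c("a", "a", "b"), v = 1:3, key = "id")
Y <- data.table(id = c("a", "b"))

# Without by: j is evaluated once over the whole join result
X[Y, sum(v)]                # 6

# With by = .EACHI: j is evaluated once per row of Y (the old by-without-by)
X[Y, sum(v), by = .EACHI]   # one row per id: a -> 3, b -> 3
```

Before this change, the first call behaved like the second (an implicit by-without-by), which is why code written for older versions can return differently shaped results under 1.9.3.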