[datatable-help] Faster "CJ"
Arunkumar Srinivasan
aragorn168b at gmail.com
Fri Aug 23 12:21:59 CEST 2013
Filed this as FR #4849 here:
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=4849&group_id=240&atid=978
Arun
On Friday, August 23, 2013 at 11:49 AM, Arunkumar Srinivasan wrote:
> Hi everybody,
>
> I think a faster version of the `CJ` function is possible. Currently the sort happens at the very end, via `setkey`, which operates on the data *after* all the combinations have been generated and therefore has to sort a very large number of rows.
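To make the cost difference concrete: for inputs of lengths n_1, ..., n_k, a post-hoc `setkey` has to order prod(n) rows, whereas sorting each input first touches only sum(n) elements. A quick base-R check using the input sizes from the benchmark below (1000 values for `w`, 12 letters each for `x`, `y`, `z`); this counts elements only and makes no claim about the exact sorting algorithm `setkey` uses:

```r
# Input lengths of w, x, y, z in the benchmark
n <- c(1000L, 12L, 12L, 12L)

# Rows a post-hoc setkey() has to put in order
rows_after_join <- prod(n)

# Elements sorted if each input vector is sorted up front
elements_before_join <- sum(n)

cat("rows sorted after join:     ", rows_after_join, "\n")
cat("elements sorted before join:", elements_before_join, "\n")
```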
>
> A faster way is to sort each input vector first (before building any combinations) and then mark the result as keyed with this hack:
>
> setattr(l, 'sorted', names(l))
>
> Only two changes are needed (see the bottom of this post).
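Why sorting the inputs first is enough: `CJ` builds column i by repeating each element of input i `each = prod(n[(i+1):k])` times and recycling to `prod(n)` rows, so the first column varies slowest and the last fastest. If every input is already sorted, the rows come out in lexicographic order with no further sorting. The sketch below illustrates this in base R with a plain `data.frame`; the helper name `cross_join_sorted` is hypothetical and is not part of data.table:

```r
# Hypothetical helper (not part of data.table): a cross join built the way
# CJ builds it, with rep(each = , length.out = ), but sorting every input first.
cross_join_sorted <- function(...) {
  l <- lapply(list(...), sort, na.last = TRUE)
  n <- lengths(l)
  nr <- prod(n)
  # each[i] = product of the lengths of all inputs after i (1 for the last),
  # so the first column varies slowest and the last one fastest
  each <- rev(cumprod(c(1, rev(n)[-length(n)])))
  for (i in seq_along(l)) {
    l[[i]] <- rep(l[[i]], each = each[i], length.out = nr)
  }
  names(l) <- paste0("V", seq_along(l))
  as.data.frame(l, stringsAsFactors = FALSE)
}

# Usage: unsorted inputs in, lexicographically ordered rows out
res <- cross_join_sorted(c(30, 10, 20), c("b", "a"))
print(res)

# order() over all columns returns the identity permutation,
# i.e. the rows need no post-hoc sort
stopifnot(identical(do.call(order, unname(as.list(res))), seq_len(nrow(res))))
```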
>
> ---------
> First, some benchmarks of `CJ_fast` (defined below) against `CJ` on relatively large data:
>
> w <- sample(1e4, 1e3)
> x <- sample(letters, 12)
> y <- sample(letters, 12)
> z <- sample(letters, 12)
>
> system.time(t1 <- do.call(CJ, list(w,x,y,z)))
> user system elapsed
> 0.775 0.052 0.835
>
> system.time(t2 <- do.call(CJ_fast, list(w,x,y,z)))
> user system elapsed
> 0.220 0.001 0.221
>
>
> identical(t1, t2)
> [1] TRUE
> ---------
>
> The function (only two changes relative to `CJ`, marked in the comments):
>
> CJ_fast <- function(...)
> {
>     l = list(...)
>     # 1) SORT HERE: sort each input vector up front (sum(n) elements rather
>     #    than prod(n) rows); doing it outside the if() also sorts a single input
>     l = lapply(l, sort, na.last = TRUE)
>     if (length(l) > 1) {
>         n = sapply(l, length)
>         nrow = prod(n)
>         # x[i] = how many times in a row each element of input i is repeated
>         x = c(rev(data.table:::take(cumprod(rev(n)))), 1L)
>         for (i in seq_along(x))
>             l[[i]] = rep(l[[i]], each = x[i], length.out = nrow)
>     }
>     setattr(l, "row.names", .set_row_names(length(l[[1]])))
>     setattr(l, "class", c("data.table", "data.frame"))
>     vnames = names(l)
>     if (is.null(vnames))
>         vnames = rep("", length(l))
>     tt = vnames == ""
>     if (any(tt)) {
>         vnames[tt] = paste("V", which(tt), sep = "")
>         setattr(l, "names", vnames)
>     }
>     data.table:::settruelength(l, 0L)
>     l = alloc.col(l)
>     # 2) REPLACE SETKEY WITH ATTRIBUTE "SORTED": the rows are already in
>     #    key order, so mark the table as keyed instead of re-sorting it
>     setattr(l, "sorted", names(l))
>     l
> }
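Change 2 relies on data.table recording a table's key in its `sorted` attribute, which is what `key()` reads. `setattr()` stamps that attribute by reference without checking the row order, so it is only safe when the rows are already sorted, as they are after change 1. A minimal illustration, assuming a current data.table installation (the table `dt` is made up for the example):

```r
library(data.table)

# A small table whose rows are already in (a, b) order
dt <- data.table(a = c(1L, 1L, 2L), b = c("x", "y", "x"))

# setkey() would sort (a no-op here) and then set the key; setattr() only
# stamps the attribute, so the caller must guarantee the order is correct
setattr(dt, "sorted", c("a", "b"))

print(key(dt))
```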
>
>
> Arun
>