[datatable-help] data.table and aggregating out-of-order columns in result from by

Steve Lianoglou lianoglou.steve at gene.com
Wed Apr 16 19:11:09 CEST 2014


Hi,

On Wed, Apr 16, 2014 at 9:41 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Clayton,
>
> Thanks for posting it here. Here's the first follow-up. Here's an example:
>
> require(data.table) ## 1.9.3 comm 1263
> dt <- data.table(x=1:1e7, y=1:1e7)
>
> ## data.table optimisation removes names
> system.time(ans1 <- dt[, list(z=y), by=x])
>
> #   user  system elapsed
> #  7.193   0.275   7.859
>
> ## data.table can't optimise to remove names
> foo <- function(x) list(z=x)
> system.time(ans2 <- dt[, foo(y), by=x])
> #   user  system elapsed
> # 16.020   0.179  16.411
>
>> identical(ans1, ans2)
> [1] TRUE
>
> This is without checking for names, for each of the 1e7 groups.

Do you think the ~2x difference in speed really comes from the "names"
optimization, or from the overhead of invoking a function within each
group in the second example?
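
One way to tease the two apart might be to time a third variant that
still calls a function per group but returns an unnamed list. This is
only a rough sketch: `bar` is a made-up helper, I've shrunk the data to
1e5 rows so it runs quickly, and data.table may treat a literal list()
in j differently from an opaque function call for reasons other than
names, so the result would be suggestive at best.

```r
library(data.table)

n <- 1e5  # smaller than the 1e7 above, just to keep the run short
dt <- data.table(x = seq_len(n), y = seq_len(n))

foo <- function(x) list(z = x)  # function call, named result (as in ans2)
bar <- function(x) list(x)      # hypothetical helper: function call, unnamed result

# literal list() in j, as in the first example
t_inline  <- system.time(ans1 <- dt[, list(z = y), by = x])
# function call returning a named list, as in the second example
t_named   <- system.time(ans2 <- dt[, foo(y), by = x])
# function call returning an unnamed list
t_unnamed <- system.time(ans3 <- dt[, bar(y), by = x])

# user / system / elapsed for the three variants, side by side
print(rbind(inline = t_inline, named_fn = t_named, unnamed_fn = t_unnamed)[, 1:3])
```

If unnamed_fn lands close to named_fn, the function call would seem to
be the main cost; if it lands close to inline, the names handling would
look like the better explanation.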

-steve

-- 
Steve Lianoglou
Computational Biologist
Genentech

