[datatable-help] data.table and aggregating out-of-order columns in result from by
Steve Lianoglou
lianoglou.steve at gene.com
Wed Apr 16 19:11:09 CEST 2014
Hi,
On Wed, Apr 16, 2014 at 9:41 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Clayton,
>
> Thanks for posting it here. Here's the first follow-up. Here's an example:
>
> require(data.table) ## v1.9.3, commit 1263
> dt <- data.table(x=1:1e7, y=1:1e7)
>
> ## data.table optimisation removes names
> system.time(ans1 <- dt[, list(z=y), by=x])
>
> # user system elapsed
> # 7.193 0.275 7.859
>
> ## data.table can't optimise to remove names
> foo <- function(x) list(z=x)
> system.time(ans2 <- dt[, foo(y), by=x])
> # user system elapsed
> # 16.020 0.179 16.411
>
> identical(ans1, ans2)
> # [1] TRUE
>
> This is without checking for names, for each of the 1e7 groups.
Do you think the ~2x difference in speed is really a result of an
optimization based on the "names" thing, or is it due to the mechanics
required to invoke a function within each grouping of the second
example?
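
One rough way to separate the two effects would be to add a third case that still calls a function per group but returns an unnamed list. The sketch below assumes a much smaller table (1e5 rows rather than 1e7) so it runs quickly, and `bar` is a hypothetical helper that is not part of the original example. If the `named_fn` and `unnamed_fn` timings come out close to each other and both well above `inline`, that would point at function-call overhead rather than names handling. The timings are machine-dependent, so none are shown here.

```r
library(data.table)

# Assumption: 1e5 groups instead of the 1e7 used in the quoted example
n  <- 1e5L
dt <- data.table(x = seq_len(n), y = seq_len(n))

foo <- function(x) list(z = x)  # function call, named result (from the quoted example)
bar <- function(x) list(x)      # hypothetical helper: function call, unnamed result

# Inline list(): data.table can see the names in j directly
t_inline  <- system.time(ans1 <- dt[, list(z = y), by = x])
# Function returning a named list for every group
t_named   <- system.time(ans2 <- dt[, foo(y), by = x])
# Function returning an unnamed list for every group (result column is V1)
t_unnamed <- system.time(ans3 <- dt[, bar(y), by = x])

# user / system / elapsed for the three variants, one row each
print(rbind(inline = t_inline, named_fn = t_named, unnamed_fn = t_unnamed)[, 1:3])
```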
-steve
--
Steve Lianoglou
Computational Biologist
Genentech