[datatable-help] 'by' on a numeric column produces inconsistent output
Kevin Ushey
kevinushey at gmail.com
Thu Dec 19 02:54:57 CET 2013
I'm cross-posting this from the GitHub mirror:
https://github.com/arunsrinivasan/datatable/issues/2
For reference, I only see this with the latest R-Forge version of
data.table (1.8.11), not the CRAN version.
-----
library(data.table, lib.loc="/Users/kevinushey/Library/R/3.1/library")
set.seed(32)
n <- 3
dt <- data.table(
  y=rnorm(n),
  by=round( rnorm(n), 1)  # numeric grouping column, rounded to 1 decimal
)
# first call
dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]
# exactly the same call, run a second time
dt[,
  list(max=max(y, na.rm=TRUE)),
  by=list(by)
]
This produces the output:
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by         max
1: 0.4  0.01464054
2: 0.4  0.87328871
3: 0.7 -1.02794620
>
> dt[,
+ list(max=max(y, na.rm=TRUE)),
+ by=list(by)
+ ]
    by        max
1: 0.4  0.8732887
2: 0.7 -1.0279462
For some reason, the first result is wrong: the two rows with by == 0.4
come back as separate groups. The second (and all subsequent) calls
return the correct output. Any idea what's going on?
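As a cross-check on which of the two outputs is the right one, here is a minimal sketch that rebuilds the same inputs and groups them with base R only, so it does not depend on any data.table version. The helper has_duplicate_groups() is a hypothetical name made up for this example, not part of data.table; it only flags a grouped result in which the same grouping value appears on more than one row.

```r
# Rebuild the same inputs as in the report, using base R only.
set.seed(32)
n <- 3
y <- rnorm(n)
by <- round(rnorm(n), 1)

# Reference result: one row per distinct value of `by`.
expected <- aggregate(y, by = list(by = by), FUN = max, na.rm = TRUE)
names(expected)[2] <- "max"

# Hypothetical helper (not part of data.table): TRUE if a grouped result
# contains the same grouping value on more than one row.
has_duplicate_groups <- function(res, col = "by") {
  anyDuplicated(res[[col]]) > 0
}

print(expected)
print(has_duplicate_groups(expected))
```

The same helper can be applied to the result of the dt[, ..., by=list(by)] call above; a correct grouping should give FALSE.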
> sessionInfo()
R Under development (unstable) (2013-12-12 r64453)
Platform: x86_64-apple-darwin13.0.0 (64-bit)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.11 knitr_1.5 devtools_1.4.1.99 BiocInstaller_1.13.3
loaded via a namespace (and not attached):
 [1] compiler_3.1.0 digest_0.6.4 evaluate_0.5.1 formatR_0.10 httr_0.2 memoise_0.1
 [7] parallel_3.1.0 plyr_1.8 RCurl_1.95-4.1 reshape2_1.2.2 stringr_0.6.2 tools_3.1.0
[13] whisker_0.3-2
---
Kevin