[datatable-help] Potential bug with sorting/summarizing by POSIXct and logical column
Michael Nelson
michael.nelson at sydney.edu.au
Tue Feb 26 01:40:02 CET 2013
I can't replicate this problem using data.table 1.8.7 (installed about 3 weeks ago) on
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)
Michael
________________________________
From: datatable-help-bounces at lists.r-forge.r-project.org [datatable-help-bounces at lists.r-forge.r-project.org] on behalf of Victor Kryukov [victor.kryukov at gmail.com]
Sent: Tuesday, 26 February 2013 9:26 AM
To: datatable-help at lists.r-forge.r-project.org
Subject: [datatable-help] Potential bug with sorting/summarizing by POSIXct and logical column
Hello,
I've encountered what looks like a bug when sorting/summarizing by a POSIXct column and a logical column, which may or may not be related to the following bug:
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2552&group_id=240&atid=975
Here are all the details: http://stackoverflow.com/questions/15077232/data-table-not-summarizing-properly-by-two-columns
Here is the test case:
library(data.table)
# First, some data
data <- data.table(structure(list(
month = structure(c(1356998400, 1356998400, 1356998400,
1359676800, 1354320000, 1359676800, 1359676800, 1356998400, 1356998400,
1354320000, 1354320000, 1354320000, 1359676800, 1359676800, 1359676800,
1356998400, 1359676800, 1359676800, 1356998400, 1359676800, 1359676800,
1359676800, 1359676800, 1354320000, 1354320000), class = c("POSIXct",
"POSIXt"), tzone = "UTC"),
portal = c(TRUE, TRUE, FALSE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
),
satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L,
9L, 2L, 8L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L, 9L,
10L, 10L)),
.Names = c("month", "portal", "satisfaction"),
row.names = c(NA, -25L), class = "data.frame"))
# Summarizing by month and portal with tapply works:
> tapply(data$satisfaction, list(data$month, data$portal), mean)
           FALSE      TRUE
2012-12-01   8.5  8.000000
2013-01-01  10.0 10.000000
2013-02-01   9.0  9.545455
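As a cross-check that does not depend on data.table, here is a minimal base-R sketch: aggregate() computes the same means as the tapply() call above, but in long format (one row per month/portal pair), which makes it directly comparable with data.table's `by` output. It assumes only base R and rebuilds the example data as a plain data.frame; the names `secs`, `dat` and `ref` are illustrative.

```r
# Rebuild the example data as a plain data.frame (same values as above).
secs <- c(1356998400, 1356998400, 1356998400, 1359676800, 1354320000,
          1359676800, 1359676800, 1356998400, 1356998400, 1354320000,
          1354320000, 1354320000, 1359676800, 1359676800, 1359676800,
          1356998400, 1359676800, 1359676800, 1356998400, 1359676800,
          1359676800, 1359676800, 1359676800, 1354320000, 1354320000)
dat <- data.frame(
  month = as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
  portal = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
             TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
             TRUE, TRUE, TRUE, TRUE, TRUE),
  satisfaction = c(10L, 10L, 10L, 9L, 10L, 10L, 9L, 10L, 10L, 9L, 2L, 8L,
                   10L, 9L, 10L, 10L, 9L, 10L, 10L, 10L, 9L, 10L, 9L, 10L, 10L)
)

# Long-format reference: mean satisfaction per (month, portal) pair.
ref <- aggregate(satisfaction ~ month + portal, data = dat, FUN = mean)
print(ref)
```

This should give six rows, one per non-empty cell of the tapply() table, with the same means.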
# Summarizing with the 'by' argument of data.table does not:
> data[, mean(satisfaction), by = 'month,portal']
> data[, mean(satisfaction), by = list(month, portal)]
month portal V1
1: 2013-01-01 FALSE 10.000000
2: 2013-02-01 TRUE 9.000000
3: 2013-01-01 TRUE 10.000000
4: 2012-12-01 FALSE 8.500000
5: 2012-12-01 TRUE 7.333333
6: 2013-02-01 TRUE 9.666667
7: 2013-02-01 FALSE 9.000000
8: 2012-12-01 TRUE 10.000000
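A quick way to confirm the symptom, sketched in base R: in a correct grouped summary each (month, portal) pair appears exactly once, whereas the output above has 8 rows for only 6 distinct pairs. The helper one_row_per_group() is hypothetical, and the reported output is typed in by hand with Date months instead of POSIXct to keep the sketch short.

```r
# Hypothetical helper: TRUE if a grouped summary has exactly one row per
# distinct combination of the grouping columns named in `by`.
one_row_per_group <- function(res, by) {
  anyDuplicated(as.data.frame(res)[, by, drop = FALSE]) == 0L
}

# The grouped result reported above, typed in by hand (Date instead of POSIXct).
reported <- data.frame(
  month  = as.Date(c("2013-01-01", "2013-02-01", "2013-01-01", "2012-12-01",
                     "2012-12-01", "2013-02-01", "2013-02-01", "2012-12-01")),
  portal = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  V1     = c(10, 9, 10, 8.5, 7.333333, 9.666667, 9, 10)
)

# Returns FALSE: (2013-02-01, TRUE) and (2012-12-01, TRUE) each appear twice.
one_row_per_group(reported, c("month", "portal"))
```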
# Summarizing only this year's data works:
> library(lubridate)  # provides ymd()
> data[month >= ymd(20130101), mean(satisfaction), by = 'month,portal']
month portal V1
1: 2013-01-01 TRUE 10.000000
2: 2013-01-01 FALSE 10.000000
3: 2013-02-01 TRUE 9.545455
4: 2013-02-01 FALSE 9.000000
Yours Sincerely,
Victor Kryukov