[datatable-help] NA in character column interfering with join in multi-column key?
J R
fe292a at gmail.com
Mon Jan 28 22:10:04 CET 2013
My understanding is that NA values cannot be joined to, but are still allowed in the key.
Here is a minimal example of unexpected behavior I ran into while looking up complete rows against a three-column character key (using Rev. 800). Lines beginning with ## show output.
library(data.table)
# Try to look up the first row of X. The predictor column is NA in the second row.
X <- data.table(response = c("BL", "BL", "BL"),
                predictor = c("AN", NA, "PT"),
                dataset = c("cd", "hi", "cd"),
                fitid = 1:3,
                key = "response,predictor,dataset")
Y <- data.table(response = "BL", predictor = "AN", dataset = "cd")
X[Y]
# fitid = NA ==> no match was found
## response predictor dataset fitid
##1: BL AN cd NA
# merge() finds no match when Y is the first argument:
merge(Y, X, by = c("response", "predictor", "dataset"))
## Empty data.table (0 rows) of 4 cols: response,predictor,dataset,fitid
# but does find it when X is the first argument:
merge(X, Y, by = c("response", "predictor", "dataset"))
## response predictor dataset fitid
## 1: BL AN cd 1
# merge.data.frame gives the expected answer:
merge.data.frame(Y, X)
## response predictor dataset fitid
## 1 BL AN cd 1
# With a different key column order, both X[Y] and merge() give the expected result:
setkey(X, dataset, predictor, response)
setcolorder(Y, key(X))
X[Y]
## dataset predictor response fitid
## 1: cd AN BL 1
merge(Y, X, by = c("dataset", "predictor", "response"))
## dataset predictor response fitid
##1: cd AN BL 1
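As an order-independent cross-check, here is a minimal base R sketch (data.table is not needed) that looks up Y in X without any sorting or binary search, so the key column order cannot matter. The helper `lookup_rows` and the "\r" separator are my own illustration, not part of data.table; following the understanding stated at the top, a row with NA in any key column is treated as unmatchable.

```r
# Base R only: plain data.frames, no data.table involved.
X <- data.frame(response  = c("BL", "BL", "BL"),
                predictor = c("AN", NA, "PT"),
                dataset   = c("cd", "hi", "cd"),
                fitid     = 1:3,
                stringsAsFactors = FALSE)
Y <- data.frame(response = "BL", predictor = "AN", dataset = "cd",
                stringsAsFactors = FALSE)

# Hypothetical helper (not a data.table function): for each row of `y`,
# return the index of the first row of `x` that matches on every column
# in `cols`, or NA if there is none. A row with NA in any key column gets
# an NA composite key, and incomparables = NA stops NA from matching NA.
# Assumption: key values never contain the "\r" separator.
lookup_rows <- function(x, y, cols) {
  composite <- function(d) {
    key <- do.call(paste, c(as.list(d[cols]), sep = "\r"))
    key[!complete.cases(d[cols])] <- NA_character_
    key
  }
  match(composite(y), composite(x), incomparables = NA)
}

idx <- lookup_rows(X, Y, c("response", "predictor", "dataset"))
X$fitid[idx]
## [1] 1
# Reversing the key column order gives the same row:
lookup_rows(X, Y, c("dataset", "predictor", "response"))
## [1] 1
```

Because `match()` hashes the composite keys rather than searching a sorted key, the NA in the second row of X cannot affect where the lookup for ("BL", "AN", "cd") lands.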
sessionInfo()
##R version 2.15.2 (2012-10-26)
##Platform: x86_64-w64-mingw32/x64 (64-bit)
##locale:
##[1] LC_COLLATE=English_United States.1252
##[2] LC_CTYPE=English_United States.1252
##[3] LC_MONETARY=English_United States.1252
##[4] LC_NUMERIC=C
##[5] LC_TIME=English_United States.1252
##attached base packages:
##[1] stats graphics grDevices utils datasets methods base
##other attached packages:
##[1] data.table_1.8.7