[datatable-help] NA in character column interfering with join in multi-column key?

J R fe292a at gmail.com
Mon Jan 28 22:10:04 CET 2013


My understanding is that NA values in a key cannot be joined to, but
are still allowed in the key.
Here's a minimal example of some unexpected behavior I ran into while
trying to look up complete rows against a 3-column character key
(using Rev. 800). Lines starting with ## are output.


# Trying to look up the first row of X. The second key column (predictor) has NA in its second row.

X <- data.table(response = c("BL", "BL", "BL"),
                predictor = c("AN", NA, "PT"),
                dataset = c("cd", "hi", "cd"),
                fitid = 1:3,
                key = "response,predictor,dataset")
Y <- data.table(response = "BL", predictor = "AN", dataset = "cd")

X[Y]
#  fitid = NA ==> no match was found
##   response predictor dataset fitid
##1:       BL        AN      cd    NA

# merge() returns no rows in this order:
merge(Y, X, by = c("response", "predictor", "dataset"))
## Empty data.table (0 rows) of 4 cols: response,predictor,dataset,fitid

# but does in this order:
merge(X, Y, by = c("response", "predictor", "dataset"))
##   response predictor dataset fitid
##1:       BL        AN      cd     1

# merge.data.frame gives expected answer:
merge.data.frame(Y, X)
##  response predictor dataset fitid
## 1       BL        AN      cd     1

# Different key order gives expected behavior for both X[Y] and merge:
setkey(X, dataset, predictor, response)
setcolorder(Y, key(X))

X[Y]
##   dataset predictor response fitid
##1:      cd        AN       BL     1
merge(Y, X, by = c("dataset", "predictor", "response"))
##   dataset predictor response fitid
##1:      cd        AN       BL     1
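
For anyone reproducing this, here is a minimal sketch of a check that the NA really sits in one of the key columns. It rebuilds the example table under a separate name; `X_chk` and `na_counts` are names made up for this illustration and are not part of data.table.

```r
library(data.table)

# Rebuild the example table under a separate (hypothetical) name
X_chk <- data.table(response = c("BL", "BL", "BL"),
                    predictor = c("AN", NA, "PT"),
                    dataset = c("cd", "hi", "cd"),
                    fitid = 1:3)
setkeyv(X_chk, c("response", "predictor", "dataset"))

# Count NA values in each key column; the result is named by key column
na_counts <- sapply(key(X_chk), function(col) sum(is.na(X_chk[[col]])))
print(na_counts)
# predictor is the only key column that contains an NA (one row)
```

This only confirms where the NA is; it does not explain why the join result depends on the key order.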

sessionInfo()
##R version 2.15.2 (2012-10-26)
##Platform: x86_64-w64-mingw32/x64 (64-bit)

##locale:
##[1] LC_COLLATE=English_United States.1252
##[2] LC_CTYPE=English_United States.1252
##[3] LC_MONETARY=English_United States.1252
##[4] LC_NUMERIC=C
##[5] LC_TIME=English_United States.1252

##attached base packages:
##[1] stats     graphics  grDevices utils     datasets  methods   base

##other attached packages:
##[1] data.table_1.8.7

