[datatable-help] FR #5072 reg.
Arunkumar Srinivasan
aragorn168b at gmail.com
Wed Nov 13 22:24:41 CET 2013
Hi everybody,
Regarding FR #5072 here: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5072&group_id=240&atid=975
Let's take two data.tables X and Y, with Y keyed on a single column, "V1". data.table currently handles Y[X] differently when Y's V1 is a factor, depending on whether 1) X's V1 is also a factor or 2) X's V1 is not a factor. Let me illustrate:
case 1:
# X and Y are factors
require(data.table)
X <- data.table(V1=factor(c("A", "B", "C")))
Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")
> Y[X] # X is a factor
V1
1: A
2: B
3: C
> Y[X]$V1
[1] A B C
Levels: A B C
** Note that when both X and Y are factors, only the levels of X appear in the joined result (no D/E).
case 2:
# X is **not** a factor
require(data.table)
X <- data.table(V1=c("A", "B", "C"))
Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")
> Y[X] # X is not a factor
V1
1: NA
2: B
3: NA
> Y[X]$V1
[1] <NA> B <NA>
Levels: B D E
** Note that the result contains NA because the join retains the levels of "Y": the values of X that are not among Y's levels (A and C) become NA.
The first question is: why this difference? Should the result depend on whether X is a factor? What do you think the intended result should be?
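To make the candidate behaviours concrete, here is a small base-R sketch of what happens to the X values under each choice of levels. It does not use data.table and is not how the join is implemented internally; the variable names (x_chr, y_fac, keep_y, keep_x, keep_all) are only for illustration.

```r
x_chr <- c("A", "B", "C")           # X's column as character
y_fac <- factor(c("B", "D", "E"))   # Y's column as a factor

# Retain Y's levels (as in case 2): X values outside levels(Y) become NA.
keep_y <- factor(x_chr, levels = levels(y_fac))

# Retain X's values (as in case 1): levels come from X alone.
keep_x <- factor(x_chr)

# A third option: use the union of both, so nothing is lost.
keep_all <- factor(x_chr, levels = sort(union(levels(y_fac), x_chr)))

print(keep_y)
print(keep_x)
print(keep_all)
```

Which of these the join should produce is exactly the question above; the sketch only shows that the NA values in case 2 follow from mapping X onto Y's levels.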
The side effect shows up in "merge", which uses this join internally (hence FR #5072). For example, with X and Y as in case 2:
> merge(X, Y, by="V1", all=TRUE)
V1
1: NA
2: NA
3: B
4: D
5: E
> merge(X, Y, by="V1", all=TRUE)$V1
[1] <NA> <NA> B D E
Levels: B D E
The second question is: is this the intended result?
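Until that is settled, one possible workaround (a sketch only, not a proposed fix for the FR) is to convert Y's factor column to character before merging, so that no values of X are mapped onto Y's levels. Y_chr and res are illustrative names; copy() is used so that the original Y keeps its factor column.

```r
library(data.table)

X <- data.table(V1 = c("A", "B", "C"))
Y <- data.table(V1 = factor(c("B", "D", "E")), key = "V1")

# Work on a copy so the original Y is left untouched.
Y_chr <- copy(Y)
Y_chr[, V1 := as.character(V1)]   # factor -> character
setkey(Y_chr, V1)                 # restore the key after modifying the column

# Both V1 columns are now character, so the full outer merge
# is no longer restricted to Y's factor levels.
res <- merge(X, Y_chr, by = "V1", all = TRUE)
print(res)
```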
Arun