[datatable-help] FR #5072 reg.

Wed Nov 13 22:24:41 CET 2013

Hi everybody,  
Regarding FR #5072 here: https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5072&group_id=240&atid=975

Let's take two data.tables X and Y, each with a single column "V1", where Y is keyed on "V1". When the key column of Y is a factor, data.table currently handles Y[X] differently depending on whether 1) X's column is also a factor or 2) X's column is not a factor (e.g. character). Let me illustrate this:

case 1:
# X and Y are factors
require(data.table)
X <- data.table(V1=factor(c("A", "B", "C")))
Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")

> Y[X] # X is a factor
  V1
1:  A
2:  B
3:  C

> Y[X]$V1
[1] A B C
Levels: A B C

** Note that when both X and Y are factors, only the levels of X appear in the joined result (no D/E).

case 2:
# X is **not** a factor
require(data.table)
X <- data.table(V1=c("A", "B", "C"))
Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")

> Y[X] # X is not a factor
   V1
1: NA
2:  B
3: NA

> Y[X]$V1
[1] <NA> B    <NA>
Levels: B D E

** Note that the result now contains NA: the join retains the levels of Y, so the values of X that are not levels of Y ("A" and "C") become NA.
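
A rough base R analogy of what seems to happen in case 2 (only a sketch of the observed behaviour, not data.table's actual internals; the names x_vals, y_levels and coerced are made up for illustration): the character values of X are mapped onto the levels of Y, so anything that is not a level of Y turns into NA.

```r
# Base R analogy only; this is NOT data.table's internal code.
x_vals   <- c("A", "B", "C")                   # character column of X
y_levels <- levels(factor(c("B", "D", "E")))   # levels of Y's key column

# Map X's values onto Y's levels: values that are not levels become NA
coerced <- factor(x_vals, levels = y_levels)
print(coerced)
# [1] <NA> B    <NA>
# Levels: B D E
```

This prints the same values and levels as Y[X]$V1 above.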

The first question is: why this difference? Should the result depend on whether X is a factor or not? What do you think the intended result should be?

The side effect shows up in "merge", which internally relies on this join behaviour (hence FR #5072). For example, with X and Y from case 2:

> merge(X, Y, by="V1", all=TRUE)
   V1
1: NA
2: NA
3:  B
4:  D
5:  E

> merge(X, Y, by="V1", all=TRUE)$V1
[1] <NA> <NA> B    D    E
Levels: B D E

The second question is: Is this the intended result?
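
Until this is settled, one possible workaround (a minimal sketch, assuming it is acceptable for V1 to come back as character rather than factor; Xc, Yc and res are names made up for illustration) is to convert the factor column to character on a copy before merging, so that both sides are compared as plain strings:

```r
library(data.table)

X <- data.table(V1 = c("A", "B", "C"))
Y <- data.table(V1 = factor(c("B", "D", "E")), key = "V1")

# Work on copies so the original X and Y are left untouched
Yc <- copy(Y)
Yc[, V1 := as.character(V1)]   # factor -> character
setkey(Yc, V1)                 # (re)set the key after changing the column

Xc <- copy(X)
setkey(Xc, V1)

# Both V1 columns are now character, so no level matching is involved
res <- merge(Xc, Yc, by = "V1", all = TRUE)
print(res$V1)
```

With both columns as character, I would expect the full outer merge to return all five values A to E with no NA. If a factor is needed afterwards, V1 can be converted back with factor().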

Arun
