[datatable-help] FR #5072 reg.

Eduard Antonyan eduard.antonyan at gmail.com
Wed Nov 13 22:55:52 CET 2013


I think case 1 and case 2 should have the same output, and I think the
merge should combine factor levels the way rbind does.
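
For reference, the rbind behaviour I mean can be sketched in base R with
plain data.frames. This is only an illustration under that assumption (the
thread itself is about data.table); the values mirror the quoted example
below:

```r
# Base R sketch: rbind on data.frames unions the factor levels of a column.
X <- data.frame(V1 = factor(c("A", "B", "C")))
Y <- data.frame(V1 = factor(c("B", "D", "E")))

combined <- rbind(X, Y)

# Every level from either input is kept, so no value turns into NA.
levels(combined$V1)
anyNA(combined$V1)
```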

Btw, another factor issue shows up when the per-group results of the
j-expression are rbind'ed:

dt = data.table(a = 1:2)

dt[, factor('a', levels = letters[1:.I]), by = a]$V1
#[1] a a
#Levels: a

but if you print the j-expression for each group, it's evident that factor
level information gets lost (the second group's level 'b' is dropped):

dt[, print(factor('a', levels = letters[1:.I])), by = a]
#[1] a
#Levels: a
#[1] a
#Levels: a b




On Wed, Nov 13, 2013 at 3:24 PM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:

>  Hi everybody,
> Regarding FR #5072 here:
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5072&group_id=240&atid=975
>
> Let's take two data.tables X and Y with key set to one column, "V1".
> When Y's V1 is a factor, data.table currently handles Y[X] differently
> depending on whether 1) X's V1 is a factor or 2) X's V1 is not a factor.
> Let me illustrate this:
>
> case 1:
> # X and Y are factors
> require(data.table)
> X <- data.table(V1=factor(c("A", "B", "C")))
> Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")
>
> > Y[X] # X is a factor
>   V1
> 1:  A
> 2:  B
> 3:  C
> > Y[X]$V1
> [1] A B C
> Levels: A B C
>
> ** Note that when both X and Y are factors, only the levels of X are in
> the join'd result (no D/E).
>
> case 2:
> # X is **not** a factor
> require(data.table)
> X <- data.table(V1=c("A", "B", "C"))
> Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")
> > Y[X] # X is not a factor
>    V1
> 1: NA
> 2:  B
> 3: NA
>
> > Y[X]$V1
> [1] <NA> B    <NA>
> Levels: B D E
>
> ** Note that the results have "NA" in them because the join retains only
> the levels from "Y".
>
> The first question is: Why this difference? Should there be a difference
> between when X is or is not a factor? What do you guys think should be the
> intended result?
>
> The side effect shows up in "merge", which relies on this join behaviour
> internally (hence FR #5072). For example:
>
> merge(X, Y, by="V1", all=TRUE)
>    V1
> 1: NA
> 2: NA
> 3:  B
> 4:  D
> 5:  E
>
> > merge(X, Y, by="V1", all=TRUE)$V1
> [1] <NA> <NA> B    D    E
> Levels: B D E
>
> The second question is: Is this the intended result?
>
> Arun