<div dir="ltr">I think case 1 and case 2 should have the same output, and I think merge should combine factor levels the way rbind does.<div><br></div><div>Btw, another factor issue shows up when the per-group results of the j-expression are rbind'ed:</div>
<div><br></div><div><div>dt = data.table(a = 1:2)</div><div><div><br></div><div>dt[, factor('a', levels = letters[1:.I]), by = a]$V1</div><div>#[1] a a</div><div>#Levels: a</div></div><div><br></div><div>but if you print the j-expression, it's evident that the level information gets lost when the groups are combined:</div>
<div><br></div><div><span style="font-family:arial,sans-serif;font-size:13px"><div>dt[, print(factor('a', levels = letters[1:.I])), by = a]</div><div>#[1] a</div><div>#Levels: a</div><div>#[1] a</div><div>#Levels: a b</div>
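<div><br></div><div>To make the expected behaviour concrete, here is a minimal sketch in base R of what "combining factor levels" could look like. combine_factors is a hypothetical helper written only for illustration. It is not a data.table function, and it is not a claim about how rbind or data.table is implemented internally:</div>
<pre><code># Hypothetical helper (not part of data.table): combine two factors and
# keep the union of their levels, in order of first appearance.
combine_factors = function(f1, f2) {
  lv = union(levels(f1), levels(f2))
  factor(c(as.character(f1), as.character(f2)), levels = lv)
}

# The two per-group results produced by the j-expression above
g1 = factor('a', levels = letters[1:1])
g2 = factor('a', levels = letters[1:2])
combined = combine_factors(g1, g2)
levels(combined)                 # "a" "b"

# The same rule applied to the V1 columns of X and Y from case 1
x = factor(c("A", "B", "C"))
y = factor(c("B", "D", "E"))
levels(combine_factors(x, y))    # "A" "B" "C" "D" "E"
</code></pre>
<div>Under that rule the grouped result would keep both levels a and b, and a merge of X and Y would carry levels A to E instead of dropping some of them.</div>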
<div><br></div></span></div><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Nov 13, 2013 at 3:24 PM, Arunkumar Srinivasan <span dir="ltr"><<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
Hi everybody,
</div><div>Regarding FR #5072 here: <a href="https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5072&group_id=240&atid=975" target="_blank">https://r-forge.r-project.org/tracker/index.php?func=detail&aid=5072&group_id=240&atid=975</a></div>
<div><br></div><div>Let's take two data.tables X and Y, each with a single column "V1", with Y keyed on it. When Y's V1 is a factor, data.table currently handles Y[X] differently depending on whether X's V1 1) is a factor or 2) is not a factor. Let me illustrate this:</div>
<div><br></div><div>case 1:</div><div># X and Y are factors</div><div>require(data.table)</div><div>X <- data.table(V1=factor(c("A", "B", "C")))</div><div>Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")</div>
<div><br></div><div>> Y[X] # X is a factor</div><div><div> V1</div><div>1: A</div><div>2: B</div><div>3: C</div></div><div><div>> Y[X]$V1</div><div>[1] A B C</div><div>Levels: A B C</div></div>
<div><div><br></div><div>** Note that when both X and Y are factors, only the levels of X are in the join'd result (no D/E).</div><div><br></div><div><div>case 2:</div><div># X is **not** a factor</div>
<div>require(data.table)</div><div>X <- data.table(V1=c("A", "B", "C"))</div><div>Y <- data.table(V1=factor(c("B", "D", "E")), key="V1")</div></div>
<div><div>> Y[X] # X is not a factor</div><div> V1</div><div>1: NA</div><div>2: B</div><div>3: NA</div></div><div><br></div><div><div>> Y[X]$V1</div><div>[1] <NA> B <NA></div><div>Levels: B D E</div>
</div><div><br></div><div>** Note that the result contains NA because the join retains only the levels of Y, so the values of X that are not levels of Y ("A" and "C") cannot be represented.</div><div><br></div><div>The first question is: Why this difference? Should the result depend on whether X is a factor or not? What do you guys think should be the intended result?</div>
<div><br></div><div>The side effect shows up in merge, which uses this join internally (hence FR #5072). For example, with X and Y from case 2:</div><div><br></div><div>> merge(X, Y, by="V1", all=TRUE)</div><div><div>
V1</div>
<div>1: NA</div><div>2: NA</div><div>3: B</div><div>4: D</div><div>5: E</div></div><div><br></div><div><div>> merge(X, Y, by="V1", all=TRUE)$V1</div><div>[1] <NA> <NA> B D E</div><div>Levels: B D E</div>
</div><div><br></div><div>The second question is: Is this the intended result?</div><div><br></div><div>Arun</div><div><br></div></div>
<br>_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br></blockquote></div><br></div>