I have raised a bug report, which can be found under #1922.<br><br><div class="gmail_quote">On Mon, Apr 2, 2012 at 2:17 AM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Thanks for the example and data, very clear.<br>
<br>
Yes, the problem looks to be factors with unused levels when joined to a<br>
character column, as you suggested. Workarounds are to drop the unused<br>
levels or convert to character, as you found.<br>
<br>
A fix is a bit more involved and I won't have time for it for a while. Could<br>
you please file a bug report so it doesn't get forgotten?<br>
<br>
Thanks.<br>
<div class="HOEnZb"><div class="h5"><br>
On Fri, 2012-03-30 at 16:51 +0100, Matthew Dowle wrote:<br>
> Hi. A quick read suggests it's not intended and that's a bug. Just convert<br>
> the columns to character for now, and it should work. Character columns<br>
> are now preferred going forward, so I'd be suggesting that anyway even if<br>
> it worked.<br>
><br>
> > So if this is expected behavior, should data.table<br>
> > at least give a warning that says something like "You are joining two<br>
> > data.tables where one key column is a factor and the other is character.<br>
> > That is probably not your intention. Convert the factor column to<br>
> > character or vice versa."?<br>
><br>
> Yes. It should be converting to character (with a warning) in this case.<br>
> Thought that's what I coded and tested. Will investigate...<br>
><br>
> > Hi all,<br>
> ><br>
> > Here is the problem I needed dput for: <a href="http://www.fileuploadx.de/287440" target="_blank">http://www.fileuploadx.de/287440</a><br>
> > (sorry, I know this file host is annoying because you have to wait before<br>
> > you can download the file; I hope you have a coffee machine close by ;-)<br>
> ><br>
> > In this attachment, I load two data.tables, DT1 and DT2, that I<br>
> > want to join, i.e. DT2[DT1], on the key columns "Company_Code"<br>
> > and "intDatum" in DT1 and "DSCD" and "intDatum" in DT2. However, while<br>
> > "DSCD" is a character column, "Company_Code" is a factor column.<br>
> > As you can see from the long structure object, it has plenty of<br>
> > levels (even though the actual data.tables are very small).<br>
> ><br>
> > Now, when I try to join those with DT2[DT1], I get:<br>
> ><br>
> > DSCD intDatum MONTH MV SICClass<br>
> > [1,] 997859 151 <NA> NA 44<br>
> > [2,] 997859 152 <NA> NA 44<br>
> > [3,] 998064 151 <NA> NA 15<br>
> > [4,] 998064 152 <NA> NA 15<br>
> > [5,] 142268 151 <NA> NA 53<br>
> > [6,] 142268 152 <NA> NA 53<br>
> > [7,] 142859 151 <NA> NA 56<br>
> > [8,] 142859 152 <NA> NA 56<br>
> > [9,] 143415 151 <NA> NA 63<br>
> > [10,] 143415 152 <NA> NA 63<br>
> > [11,] 307045 151 <NA> NA 15<br>
> > [12,] 307045 152 <NA> NA 15<br>
> ><br>
> > Basically, data.table finds no values for MV and MONTH for any DSCD<br>
> > and intDatum combination. However, as DT2[DSCD=="142268"] clearly<br>
> > shows, there are values for that DSCD:<br>
> ><br>
> > DSCD MONTH MV intDatum<br>
> > [1,] 142268 1997-08-28 1901.12 151<br>
> > [2,] 142268 1997-09-28 1829.00 152<br>
> ><br>
> > Those, however, only show up in the join after I get rid of all the<br>
> > unused levels (equivalently, I can also convert Company_Code to<br>
> > a character column):<br>
> ><br>
> > DT1[, Company_Code := factor(Company_Code)]<br>
> > DT2[DT1]<br>
> ><br>
> > DSCD intDatum MONTH MV SICClass<br>
> > [1,] 997859 151 <NA> NA 44<br>
> > [2,] 997859 152 <NA> NA 44<br>
> > [3,] 998064 151 <NA> NA 15<br>
> > [4,] 998064 152 <NA> NA 15<br>
> > [5,] 142268 151 1997-08-28 1901.12 53<br>
> > [6,] 142268 152 1997-09-28 1829.00 53<br>
> > [7,] 142859 151 <NA> NA 56<br>
> > [8,] 142859 152 <NA> NA 56<br>
> > [9,] 143415 151 <NA> NA 63<br>
> > [10,] 143415 152 <NA> NA 63<br>
> > [11,] 307045 151 <NA> NA 15<br>
> > [12,] 307045 152 <NA> NA 15<br>
> ><br>
> > I'm pretty sure this behavior only started with version 1.8.0,<br>
> > probably because data.table previously coerced every key column to factor<br>
> > (see the NEWS for 1.8.0). So my question is: is what happens here intended<br>
> > behavior? To be honest, I have been working with R for a while now, and<br>
> > factors are one of those things that I never got. I just don't see<br>
> > their use, and every so often they cause me huge problems (as in this<br>
> > case). So I'm probably doing something stupid here. The nasty thing<br>
> > about this issue is that most of the time the joins just work as<br>
> > expected (believe me, I tried to produce a simple example with one<br>
> > factor column and one character column that would reproduce this behavior,<br>
> > but no matter what I did, the joins always worked as<br>
> > expected). So if this is expected behavior, should data.table<br>
> > at least give a warning that says something like "You are joining two<br>
> > data.tables where one key column is a factor and the other is character.<br>
> > That is probably not your intention. Convert the factor column to<br>
> > character or vice versa."?<br>
> ><br>
> > Thanks for your help!<br>
> ><br>
> > Christoph<br>
> > _______________________________________________<br>
> > datatable-help mailing list<br>
> > <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
> > <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
><br>
<br>
<br>
</div></div></blockquote></div><br><br>