<div class="gmail_quote">Hi together,
<div><br></div><div>here is the problem I needed dput for: <a href="http://www.fileuploadx.de/287440">http://www.fileuploadx.de/287440</a> (sorry, I know that this filehoster is annoying because you have to wait until you can download the file; I hope you have a coffee machine close by ;-)</div>
<div><br></div><div>In this attachment, I basically load in two data.tables DT1 and DT2 that I want to join, i.e. DT2[DT1], according to the keyed columns "Company_Code" and "intDatum" in DT1 and "DSCD" and "intDatum" in DT2. However, while "DSCD" is formatted as a character-column, "Company_Code" is formatted as a factor-column. As you can see from the long structure-object, there are plenty of levels here (the actual data.tables are very small).</div>
<div><br></div><div>Now, when I try to join those with DT2[DT1], I get:</div><div><br></div><div><span style="border-collapse:separate"><span style="text-align:-webkit-left"><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
DSCD intDatum MONTH MV SICClass
[1,] 997859 151 <NA> NA 44
[2,] 997859 152 <NA> NA 44
[3,] 998064 151 <NA> NA 15
[4,] 998064 152 <NA> NA 15
[5,] 142268 151 <NA> NA 53
[6,] 142268 152 <NA> NA 53
[7,] 142859 151 <NA> NA 56
[8,] 142859 152 <NA> NA 56
[9,] 143415 151 <NA> NA 63
[10,] 143415 152 <NA> NA 63
[11,] 307045 151 <NA> NA 15
[12,] 307045 152 <NA> NA 15</pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
Basically, data.table finds no values for MV and MONTH for any DSCD and intDatum combination. However, as DT2[DSCD=="142268"] clearly shows, there are values for that DSCD:</pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;margin-top:0px;margin-bottom:0px">
<span style="border-collapse:separate"><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
DSCD MONTH MV intDatum
[1,] 142268 1997-08-28 1901.12 151
[2,] 142268 1997-09-28 1829.00 152</pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
Those, however, only show up in the join after i get rid of all the unused levels (equivalently, I can also transform the Company_Code to a character column):</pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;margin-top:0px;margin-bottom:0px">
<font face="'Lucida Console'"><span style="font-size:12px;line-height:14px;white-space:pre-wrap">DT1[, Company_Code := factor(Company_Code)]
DT2[DT1]</span></font></pre><pre style="outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;margin-top:0px;margin-bottom:0px">
<span style="border-collapse:separate;font-family:Arial;white-space:normal;font-size:medium"><span style="font-family:'Lucida Console';font-size:12px;line-height:14px;white-space:pre-wrap"><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
DSCD intDatum MONTH MV SICClass
[1,] 997859 151 <NA> NA 44
[2,] 997859 152 <NA> NA 44
[3,] 998064 151 <NA> NA 15
[4,] 998064 152 <NA> NA 15
[5,] 142268 151 1997-08-28 1901.12 53
[6,] 142268 152 1997-09-28 1829.00 53
[7,] 142859 151 <NA> NA 56
[8,] 142859 152 <NA> NA 56
[9,] 143415 151 <NA> NA 63
[10,] 143415 152 <NA> NA 63
[11,] 307045 151 <NA> NA 15
[12,] 307045 152 <NA> NA 15</pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
I'm pretty sure this behaviour occurred only with version 1.8.0, probably because data.table coerced every key to factor before (see the NEWS to 1.8.0). So my question is: Is what happens here intended behavior? I'm honest with you: I'm working now for a while with R and factors are one of those things that I never got. I just don't see their use and every so often they cause me huge problems (as in this case). So I'm probably making something stupid here. The nasty thing about this issue here is that mostly, however, the joins just work as expected (believe me, I tried to produce a simple example with one column factor and one character that would reproduce this behavior, but no matter what I did, the joins afterwards always worked as expected). So in case this is expected behavior, should data.table give at least a warning that says something like "You join two data.tables whereby one keyed column is a factor, one is a character. That is probably not your intention. Convert the factor column to character or vice versa."?</pre>
<pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
Thanks for your help!</pre><span class="HOEnZb"><font color="#888888"><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
<br></pre><pre style="font-family:'Lucida Console';font-size:9pt!important;outline-style:none;outline-width:initial;outline-color:initial;border-top-style:none;border-right-style:none;border-bottom-style:none;border-left-style:none;border-width:initial;border-color:initial;white-space:pre-wrap!important;margin-top:0px;margin-bottom:0px;line-height:1.2">
Christoph</pre></font></span></pre></span></span></pre></span></pre></span></span></div></div><br>