[datatable-help] bug in merge when a table is keyed?

Arunkumar Srinivasan aragorn168b at gmail.com
Tue Feb 24 00:23:06 CET 2015


Hi Carlos,

It’d be helpful to have an MRE showing how you ended up with a data.table that has a key set when it isn’t actually ordered by that key. Also, could you please test on the devel version as well (I don’t know which version you’re running)?
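
In the meantime, a check along these lines might help narrow down where the key goes stale. This is only a sketch: it assumes a single-column key, and it rebuilds a table like the yy2 quoted below by setting the "sorted" attribute by hand with setattr(), which is a guess at how such a state could arise and not a claim about what happened in your code. The names claimed_key and key_is_stale are made up for the illustration.

```r
library(data.table)

# Hypothetical reconstruction: same rows as yy2, with the "sorted" attribute
# set by hand so the table claims a key that its row order does not satisfy.
yy2 <- data.table(
  Spp = c("eugra", "vicr", "festuca"),
  rel_cover = c(0.048780487804878, 0.0609756097560976, 0.0975609756097561)
)
setattr(yy2, "sorted", "Spp")  # going under the hood, for illustration only

claimed_key  <- key(yy2)                         # what the table says it is keyed by
key_is_stale <- is.unsorted(yy2[[claimed_key]])  # TRUE when rows are not in key order

print(claimed_key)
print(key_is_stale)
```

Running the same two lines after each step of the larger script should show the first step at which the key is present but the rows are out of order.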

-- 
Arun

On 22 Feb 2015 at 00:41:51, Carlos Alberto Arnillas (carlosalberto.arnillas at gmail.com) wrote:

Hello  
I am running the latest version of R and data.table. However, I found a  
problem that I think was reported for previous versions and that I  
assumed had been fixed.  

Here is the data (obtained with dput() from a larger script):  
yy1 <- structure(list(Spp = c("vicr", "festuca"),  
rel_cover = c(0.0365853658536585,  
0.0609756097560976)),  
row.names = c(NA, -2L), class =  
c("data.table", "data.frame"),  
.Names = c("Spp", "rel_cover"))  

yy2 <- structure(list(Spp = c("eugra", "vicr", "festuca"),  
rel_cover = c(0.048780487804878,  
0.0609756097560976, 0.0975609756097561)),  
row.names = c(NA, -3L),  
class = c("data.table", "data.frame"),  
.Names = c("Spp", "rel_cover"), sorted = "Spp")  
> yy2  
Spp rel_cover  
1: eugra 0.04878049  
2: vicr 0.06097561  
3: festuca 0.09756098  

For some reason, the yy2 dataset has a key assigned (Spp), but the rows  
are not actually sorted by it (in fact, I never sorted that dataset, or  
the one I created it from, by that variable). Then, if I try to merge  
the two tables, I get a wrong result:  

> merge(yy1,yy2, by="Spp",all=T)  
Spp rel_cover.x rel_cover.y  
1: eugra NA 0.04878049  
2: festuca 0.06097561 NA  
3: festuca NA 0.09756098  
4: vicr 0.03658537 0.06097561  

However, if I set the key on each table, I first get a warning  
and then the right result:  

> setkey(yy1, Spp)  
> setkey(yy2, Spp)  
Warning message:  
In setkeyv(x, cols, verbose = verbose, physical = physical) :  
Already keyed by this key but had invalid row order, key rebuilt. If  
you didn't go under the hood please let datatable-help know so the  
root cause can be fixed.  


> merge(yy1,yy2, by="Spp",all=T)  
Spp rel_cover.x rel_cover.y  
1: eugra NA 0.04878049  
2: festuca 0.06097561 0.09756098  
3: vicr 0.03658537 0.06097561  


As a temporary workaround, I am using merge.data.frame, but I  
would prefer to keep all my data in data.table.  
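
One way to stay within data.table might be to drop the suspect key before merging, so that merge() cannot rely on it. The sketch below assumes the only problem is the stale "sorted" attribute, which is imitated here with setattr() on tables rebuilt from the dput() output above; that step is an illustration, not something the original code is known to do.

```r
library(data.table)

yy1 <- data.table(Spp = c("vicr", "festuca"),
                  rel_cover = c(0.0365853658536585, 0.0609756097560976))
yy2 <- data.table(Spp = c("eugra", "vicr", "festuca"),
                  rel_cover = c(0.048780487804878, 0.0609756097560976,
                                0.0975609756097561))
setattr(yy2, "sorted", "Spp")  # assumption: imitates the stale key on yy2

setkey(yy2, NULL)  # remove the key without reordering any rows

# With no key on either table, the merge has to match on the Spp values themselves.
res <- merge(yy1, yy2, by = "Spp", all = TRUE)
print(res)
```

This avoids the round trip through merge.data.frame, although it does not explain how the key became invalid in the first place.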

If it is not a bug and there is something I can do to fix it, please let me know.  

Thanks in advance  

Carlos Alberto  