[datatable-help] bug in merge when a table is keyed?

Carlos Alberto Arnillas carlosalberto.arnillas at gmail.com
Sun Feb 22 00:41:20 CET 2015


Hello,

I am running the latest versions of R and data.table. However, I found
a problem that I think was reported for previous versions and that I
assumed had been fixed.

Here is the data (obtained with dput() from a larger script):
yy1 <- structure(list(Spp = c("vicr", "festuca"),
                      rel_cover = c(0.0365853658536585, 0.0609756097560976)),
                 row.names = c(NA, -2L),
                 class = c("data.table", "data.frame"),
                 .Names = c("Spp", "rel_cover"))

yy2 <- structure(list(Spp = c("eugra", "vicr", "festuca"),
                      rel_cover = c(0.048780487804878, 0.0609756097560976,
                                    0.0975609756097561)),
                 row.names = c(NA, -3L),
                 class = c("data.table", "data.frame"),
                 .Names = c("Spp", "rel_cover"),
                 sorted = "Spp")
> yy2
       Spp  rel_cover
1:   eugra 0.04878049
2:    vicr 0.06097561
3: festuca 0.09756098

For some reason, yy2 has a key assigned (Spp), but its rows are not
actually sorted by that key (in fact, I never sorted that dataset, or
the one I created it from, by that variable). If I then try to merge
the two tables, I get a wrong result, with festuca split across two rows:

> merge(yy1, yy2, by = "Spp", all = TRUE)
       Spp rel_cover.x rel_cover.y
1:   eugra          NA  0.04878049
2: festuca  0.06097561          NA
3: festuca          NA  0.09756098
4:    vicr  0.03658537  0.06097561
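
The mismatch between the claimed key and the actual row order can be
checked directly. This is only a minimal sketch: it rebuilds yy2 with
data.table() and sets the "sorted" attribute by hand with setattr(),
which is an assumption made purely to simulate the state shown above,
not an explanation of how the key got there. The names claimed_key and
rows_unsorted are illustrative.

```r
library(data.table)

yy2 <- data.table(
  Spp = c("eugra", "vicr", "festuca"),
  rel_cover = c(0.048780487804878, 0.0609756097560976, 0.0975609756097561)
)

# Assumption: simulate the inconsistent state by marking the table as keyed
# on Spp without sorting its rows.
setattr(yy2, "sorted", "Spp")

claimed_key   <- key(yy2)              # the key the table claims to have
rows_unsorted <- is.unsorted(yy2$Spp)  # TRUE when rows violate that order
```

If rows_unsorted is TRUE while claimed_key is "Spp", the key does not
describe the real row order, so any operation that trusts the key may
mismatch rows.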

However, if I set the key on each table, I first get a warning and
then the right result:

> setkey(yy1, Spp)
> setkey(yy2, Spp)
Warning message:
In setkeyv(x, cols, verbose = verbose, physical = physical) :
  Already keyed by this key but had invalid row order, key rebuilt. If
you didn't go under the hood please let datatable-help know so the
root cause can be fixed.


> merge(yy1, yy2, by = "Spp", all = TRUE)
       Spp rel_cover.x rel_cover.y
1:   eugra          NA  0.04878049
2: festuca  0.06097561  0.09756098
3:    vicr  0.03658537  0.06097561


As a temporary workaround, I am using merge.data.frame, but I would
prefer to keep all my data in data.table.
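
One possible way to stay within data.table, sketched under the
assumption that the only problem is the stale "sorted" attribute, is to
drop that attribute and rebuild the key before merging. The tables are
recreated with data.table(), and the bad key is simulated with
setattr(); both steps are assumptions for illustration only.

```r
library(data.table)

yy1 <- data.table(
  Spp = c("vicr", "festuca"),
  rel_cover = c(0.0365853658536585, 0.0609756097560976)
)
yy2 <- data.table(
  Spp = c("eugra", "vicr", "festuca"),
  rel_cover = c(0.048780487804878, 0.0609756097560976, 0.0975609756097561)
)

# Assumption: simulate the stale key (table marked as keyed, rows unsorted).
setattr(yy2, "sorted", "Spp")

# Drop the stale key attribute without touching the rows ...
setattr(yy2, "sorted", NULL)

# ... then rebuild real keys, which physically sorts both tables by Spp.
setkey(yy1, Spp)
setkey(yy2, Spp)

merged <- merge(yy1, yy2, by = "Spp", all = TRUE)
```

With the keys rebuilt, merged should have one row per species, matching
the second merge output above.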

If it is not a bug and there is something I can do to fix it, please let me know.

Thanks in advance

Carlos Alberto

