[datatable-help] bug in merge when a table is keyed?
Carlos Alberto Arnillas
carlosalberto.arnillas at gmail.com
Sun Feb 22 00:41:20 CET 2015
Hello
I am running the latest versions of R and data.table. However, I found a
problem that I think was reported for previous versions and that I
assumed had been fixed.
Here is the data (as obtained with dput from a larger script):
yy1 <- structure(list(Spp = c("vicr", "festuca"),
                      rel_cover = c(0.0365853658536585,
                                    0.0609756097560976)),
                 row.names = c(NA, -2L),
                 class = c("data.table", "data.frame"),
                 .Names = c("Spp", "rel_cover"))
yy2 <- structure(list(Spp = c("eugra", "vicr", "festuca"),
                      rel_cover = c(0.048780487804878,
                                    0.0609756097560976,
                                    0.0975609756097561)),
                 row.names = c(NA, -3L),
                 class = c("data.table", "data.frame"),
                 .Names = c("Spp", "rel_cover"), sorted = "Spp")
> yy2
Spp rel_cover
1: eugra 0.04878049
2: vicr 0.06097561
3: festuca 0.09756098
For some reason, yy2 has a key assigned (Spp), but its rows are not
actually sorted by that key (in fact, I never sorted that dataset, or the
one I used to create it, by that variable). Then, if I try to merge the
two tables, I get a wrong result:
> merge(yy1,yy2, by="Spp",all=T)
Spp rel_cover.x rel_cover.y
1: eugra NA 0.04878049
2: festuca 0.06097561 NA
3: festuca NA 0.09756098
4: vicr 0.03658537 0.06097561
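One way to confirm that the key is stale is to compare the key recorded
in the "sorted" attribute with the actual row order. The following is a
minimal sketch in base R, assuming a single key column. The names
yy2_check, claimed_key and key_is_stale are illustrative, and the table
is rebuilt as a plain data.frame so the check runs without data.table
loaded. Note that is.unsorted compares strings in the session locale,
whereas data.table orders keys in the C locale, so this is only an
approximate check.

```r
# yy2 rebuilt from the dput output above as a plain data.frame that carries
# the same "sorted" attribute (assumption: only the attribute and the row
# order matter for this check).
yy2_check <- structure(
  list(Spp = c("eugra", "vicr", "festuca"),
       rel_cover = c(0.048780487804878, 0.0609756097560976,
                     0.0975609756097561)),
  row.names = c(NA, -3L),
  class = "data.frame",
  sorted = "Spp"
)

# The key the table claims to be sorted by, or NULL if there is none.
claimed_key <- attr(yy2_check, "sorted")

# The key is stale if a key is claimed but the column is not in
# ascending order. Here "vicr" comes before "festuca", so this is TRUE.
key_is_stale <- !is.null(claimed_key) &&
  is.unsorted(yy2_check[[claimed_key]])
print(key_is_stale)
```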
However, if I set the key on each table, I first get a warning and then
the right result:
> setkey(yy1, Spp)
> setkey(yy2, Spp)
Warning message:
In setkeyv(x, cols, verbose = verbose, physical = physical) :
Already keyed by this key but had invalid row order, key rebuilt. If
you didn't go under the hood please let datatable-help know so the
root cause can be fixed.
> merge(yy1,yy2, by="Spp",all=T)
Spp rel_cover.x rel_cover.y
1: eugra NA 0.04878049
2: festuca 0.06097561 0.09756098
3: vicr 0.03658537 0.06097561
As a temporary workaround, I am using merge.data.frame, but I
would prefer to keep all my data in data.table.
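An alternative that stays within data.table may be to drop the stale key
before merging, so that merge does not rely on a sort order that does not
hold. This is a sketch rather than a confirmed fix. It assumes the
problem lies only in the "sorted" attribute, rebuilds the two tables with
data.table() instead of the dput output, and recreates the stale key with
setattr purely for illustration.

```r
library(data.table)

yy1 <- data.table(Spp = c("vicr", "festuca"),
                  rel_cover = c(0.0365853658536585, 0.0609756097560976))
yy2 <- data.table(Spp = c("eugra", "vicr", "festuca"),
                  rel_cover = c(0.048780487804878, 0.0609756097560976,
                                0.0975609756097561))

# Recreate the faulty state for illustration: mark yy2 as keyed by Spp
# without actually sorting it.
setattr(yy2, "sorted", "Spp")

# Drop the stale key. The row order is left untouched.
setattr(yy2, "sorted", NULL)

# With no key claimed on yy2, merge cannot take the stale order for granted.
res <- merge(yy1, yy2, by = "Spp", all = TRUE)
print(res)
```

Calling setkey(yy2, Spp) afterwards would re-sort the table and restore a
valid key if keyed access is needed later.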
If it is not a bug and there is something I can do to fix it, please let
me know.
Thanks in advance
Carlos Alberto