[datatable-help] Merge bug in v 1.7.8 and 1.7.9?

DM tb2usd at gmail.com
Tue Jan 31 16:02:45 CET 2012


As promised, here is a reproducible example:

    library(data.table)
    set.seed(0)
    dfA = data.frame(i = 1:20, j = rep(1:2, 10), k = rep(1:4, 5), A = rnorm(20))
    dfB = data.frame(j = rep(1:2, 2), k = 1:4, B = rnorm(4))

    dtA = as.data.table(dfA)
    dtB = as.data.table(dfB)

    dtC = merge(dtA, dtB, by = c("j","k"), all.x = TRUE)


The first rows of dtC are:

>     dtC
>        i j k            A          B
>  [1,]  1 1 1  1.262954285 -0.2242679
>  [2,]  5 1 1  0.414641434 -0.2242679
>  [3,]  9 1 1 -0.005767173  0.1333364
>  [4,] 13 1 1 -1.147657009  0.3773956
>

While the entries for dtB are:

>     dtB
>      j k          B
> [1,] 1 1 -0.2242679
> [2,] 2 2  0.3773956
> [3,] 1 3  0.1333364
> [4,] 2 4  0.8041895
>

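The "B" entries in dtC clearly come from dtB, but not from the expected
rows. All four rows shown above have (j, k) = (1, 1), so each should carry
B = -0.2242679 from the first row of dtB; instead, rows 3 and 4 carry the B
values of dtB rows 3 and 2, respectively.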
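
For anyone who wants to check a merge result mechanically, here is a rough
sketch of a consistency check. The helper name `check_B` is made up for this
example and is not part of data.table; it assumes the lookup table has unique
(j, k) pairs, as dtB does here. It uses base R only, so the reference merge
below runs on the plain data frames:

```r
# Hypothetical helper (not part of data.table): TRUE when every row of
# `merged` carries the B value that `lookup` assigns to its (j, k) pair,
# with NA expected where `lookup` has no matching pair.
check_B = function(merged, lookup) {
  merged = as.data.frame(merged)
  lookup = as.data.frame(lookup)
  # Position of each merged row's (j, k) pair in the lookup table
  idx = match(paste(merged$j, merged$k), paste(lookup$j, lookup$k))
  expected = lookup$B[idx]
  isTRUE(all.equal(merged$B, expected))
}

# Same data as in the example above
set.seed(0)
dfA = data.frame(i = 1:20, j = rep(1:2, 10), k = rep(1:4, 5), A = rnorm(20))
dfB = data.frame(j = rep(1:2, 2), k = 1:4, B = rnorm(4))

# Reference merge on the plain data frames (base R merge.data.frame)
dfC = merge(dfA, dfB, by = c("j", "k"), all.x = TRUE)
check_B(dfC, dfB)
```

With data.table loaded, the same call on the objects from the example would
be `check_B(dtC, dtB)`; a FALSE result there would confirm the mismatch
described above.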







On Tue, Jan 31, 2012 at 8:53 AM, DM <tb2usd at gmail.com> wrote:

> Good morning,
>
> I found a discrepancy in results from two scripts that were run with the
> same data with versions 1.7.6 and 1.7.8 (and now 1.7.9) of data.table.  The
> root cause appears to be a change in `merge.data.table`.
>
> I have two objects dtA and dtB, with columns (i, j, k, A) and (j, k, B),
> respectively.  I merge these via:
>
> dtC = merge(dtA, dtB, by = c("k", "j"), all.x = TRUE)
>
> NB: In dtA, many rows share the same values of (j, k) (i.e. these are not
> unique per row), while they are unique in dtB.  In addition, no keys are
> assigned to either dtA or dtB, though it seems V1.7.9 creates keys (k,j)
> for dtC (I haven't yet checked V1.7.6, but it doesn't bother me).
>
> In V1.7.6, for dtC rows with matching (j,k) entries, the B entries also
> match, which is the expected behavior.  In V1.7.8 and V1.7.9, this is no
> longer the case.  I am not sure where the B entries come from.
>
> I will attempt to generate a reproducible example, and follow up.
>