[datatable-help] Setting the key of a table produced by merging reorders original table if key column was used as by column

Amelia Hardjasa amelia.hardjasa at pulseenergy.com
Mon Jul 21 21:24:08 CEST 2014


In data.table version 1.9.2: when merging two data.tables with
merge.data.table, if the "by" column is the same as the key column of
at least one table, setting the key of the merged table reorders the
original table without changing its key. Setting the original table's
key again then gives this warning:

Warning message:
In setkeyv(x, cols, verbose = verbose) :
  Already keyed by this key but had invalid row order, key rebuilt. If
you didn't go under the hood please let datatable-help know so the
root cause can be fixed.

Presumably this is because when the key and by are the same, a copy is
not made/rekeyed (?merge.data.table: "Note that if the specified
columns in by is not the key (or head of the key) of x or y, then a
copy is first rekeyed prior to performing the merge"). The silent
reordering doesn't seem like desired behaviour, however.

Minimal example is below. The second case uses a different column for
the merge by and no problem is seen.


library(data.table)

# Case 1: the merge "by" column (X) is also the key of dt.1
dt.1 <- data.table(Y = c(rep("a", 2), rep("b", 2)), X = 1:2, key = "X")
dt.2 <- data.table(X = 2:1, Z = c("123", "456"))
dt.3 <- merge(dt.1, dt.2, by = "X", all.x = TRUE)
str(dt.1)        # keyed by X, ordered by X
setkey(dt.3, Y)
str(dt.1)        # still keyed by X, but now ordered by Y
setkey(dt.1, X)  # gives the warning above

# Case 2: the merge "by" column (X) differs from the key of dt.1 (Y)
dt.1 <- data.table(Y = c(rep("b", 2), rep("a", 2)), X = 2:1, key = "Y")
dt.2 <- data.table(X = 2:1, Z = c("123", "456"))
dt.3 <- merge(dt.1, dt.2, by = "X", all.x = TRUE)
str(dt.1)        # keyed by Y, ordered by Y
setkey(dt.3, X)
str(dt.1)        # remains keyed by Y, ordered by Y
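
A possible workaround for the first case, sketched under the unconfirmed
assumption that the merged table shares column vectors with dt.1: deep-copy
the merge result with copy() before setting its key, so that setkey() on
dt.3 cannot touch dt.1. The explicit X values below are the recycled values
from the example above, written out in full.

```r
library(data.table)

dt.1 <- data.table(Y = c("a", "a", "b", "b"), X = c(1L, 2L, 1L, 2L), key = "X")
dt.2 <- data.table(X = 2:1, Z = c("123", "456"))

# copy() makes a deep copy, so dt.3 shares no columns with dt.1
# (assumption: shared columns are what lets setkey(dt.3, ...) reorder dt.1)
dt.3 <- copy(merge(dt.1, dt.2, by = "X", all.x = TRUE))
setkey(dt.3, Y)

# Check that dt.1 is still sorted by its key column
key(dt.1)
is.unsorted(dt.1$X)
```

The extra copy costs memory proportional to the size of the merged table, so
this is only a stopgap and not a fix for the underlying behaviour.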


Thanks for any help,
Amelia
