[datatable-help] Cartesian join invalid key order - bug report

Shir Levkowitz levkowitz at dc-energy.com
Wed Apr 10 16:46:24 CEST 2013


I have encountered a bug in the Cartesian join of two data.tables: the resulting data.table still carries a key, but its rows are not sorted by that full key. I am using data.table v1.8.8. Please let me know if this issue has already been reported or if there is any insight regarding it.

Thank you,
Shir Levkowitz



-------------------------------------------------

library(data.table)

###### set up our example data tables
set.seed(1)  # arbitrary seed, added only so the random example is repeatable
test1 <- data.table(a=sample(1:3, 100, replace=TRUE),
                    b=sample(1:3, 100, replace=TRUE),
                    c=sample(1:10, 100, replace=TRUE))
setkey(test1, a,b,c)

test2 <- data.table(p=sample(1:3, 100, replace=TRUE),
                    q=sample(1:3, 100, replace=TRUE),
                    r=sample(1:100),
                    w=sample(1:100))
setkey(test2, p,q)


###### a cartesian join - this is where the issue arises
test.join <- test1[test2,nomatch=0, allow.cartesian=TRUE]

### have a look at the key
k <- key(test.join)
k

### if we do a group by, we don't get the right aggregation
test.gb <- test.join[,.N,by='a,b,c']
test.gb[a == 1 & b == 1 & c == 1,]
### when really what we want is:
test.agg <- aggregate(r ~ a+b+c, test.join, length)
subset(test.agg, a == 1 & b == 1 & c == 1)

### if we set the same key, we get a warning
setkeyv(test.join, k)

>> Warning message: 
In setkeyv(test.join, k) : Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.
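
One way to confirm the row order independently of data.table's own key bookkeeping is to compare the rows against what base R's order() would produce. The following is only a minimal sketch: the helper name is_sorted_by is hypothetical (it is not part of data.table), and the two small data.frames are made-up illustrations rather than the tables from the example above.

```r
# Hypothetical helper (not part of data.table): TRUE when the rows of `x`
# are already in ascending order of the columns named in `cols`.
is_sorted_by <- function(x, cols) {
  # order() leaves ties in their original position, so input that is
  # already sorted yields exactly 1..n
  ord <- do.call(order, unname(as.list(x)[cols]))
  identical(ord, seq_len(nrow(x)))
}

# Small base-R illustration with plain data.frames (made-up values)
ok  <- data.frame(a = c(1, 1, 2), b = c(1, 2, 1))   # sorted by a, then b
bad <- data.frame(a = c(1, 1, 2), b = c(2, 1, 1))   # b out of order within a == 1

is_sorted_by(ok,  c("a", "b"))
is_sorted_by(bad, c("a", "b"))
```

Since a data.table is also a list of columns, the same call should work on the join result, e.g. is_sorted_by(test.join, key(test.join)); if the key really has an invalid row order, this would be expected to return FALSE before the setkeyv() call and TRUE after it.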

