<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">I have encountered a bug in Cartesian joins of two data.tables: the resulting data.table reports a key, but its rows are not actually sorted by that full key. This is in data.table v1.8.8. Please let me know whether this issue has already been reported, or if you have any insight into its cause.<div><br></div><div>Thank you,</div><div>Shir Levkowitz<br><div><br></div><div><br></div><div><br></div><div>-------------------------------------------------</div><div><br></div><div><div><font face="Courier">library(data.table)</font></div><div><font face="Courier"><br></font></div><div><font face="Courier">###### set up our example data tables</font></div><div><font face="Courier">test1 &lt;- data.table(a=sample(1:3, 100, replace=TRUE),</font></div><div><font face="Courier"> b=sample(1:3, 100, replace=TRUE),</font></div><div><font face="Courier"> c=sample(1:10, 100, replace=TRUE))</font></div><div><font face="Courier">setkey(test1, a, b, c)</font></div><div><font face="Courier"><br></font></div><div><font face="Courier">test2 &lt;- data.table(p=sample(1:3, 100, replace=TRUE),</font></div><div><font face="Courier"> q=sample(1:3, 100, replace=TRUE),</font></div><div><font face="Courier"> r=sample(1:100),</font></div><div><font face="Courier"> w=sample(1:100))</font></div><div><font face="Courier">setkey(test2, p, q)</font></div></div><div><font face="Courier"><br></font></div><div><font face="Courier"><br></font></div><div><div><font face="Courier">###### a Cartesian join: this is where the issue arises</font></div><div><font face="Courier">test.join &lt;- test1[test2, nomatch=0, allow.cartesian=TRUE]</font></div></div><div><font face="Courier"><br></font></div><div><font face="Courier">### have a look at the key</font></div><div><font face="Courier">k &lt;- key(test.join)</font></div><div><font face="Courier">k</font></div><div><font 
face="Courier"><br></font></div><div><font face="Courier">### if we group by the key columns, we don't get the right counts</font></div><div><font face="Courier">test.gb &lt;- test.join[, .N, by='a,b,c']</font></div><div><font face="Courier">test.gb[a == 1 & b == 1 & c == 1,]</font></div><div><font face="Courier">### whereas the correct counts are:</font></div><div><font face="Courier">test.agg &lt;- aggregate(r ~ a + b + c, test.join, length)</font></div><div><font face="Courier">subset(test.agg, a == 1 & b == 1 & c == 1)</font></div><div><font face="Courier"><br></font></div><div><font face="Courier">### if we set the same key again, we get a warning</font></div><div><font face="Courier">setkeyv(test.join, k)</font></div><div><br></div><div><font face="Courier">>> Warning message: </font></div><div><span style="font-family: Courier; ">In setkeyv(test.join, k) : </span><span style="font-family: Courier; ">Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.</span></div><div><font face="Courier"><br></font></div><div><div><br></div></div></div></body></html>