<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Lucida Console";
panose-1:2 11 6 9 4 5 4 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Courier New";}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal>The code below generates the following warning:<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal style='word-break:break-all'><span style='font-size:10.0pt;font-family:"Lucida Console";color:black;background:#E1E2E5'>In setkeyv(x, cols, verbose = verbose) :<o:p></o:p></span></p><p class=MsoNormal style='word-break:break-all'><span style='font-size:10.0pt;font-family:"Lucida Console";color:black;background:#E1E2E5'> Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.<o:p></o:p></span></p><p class=MsoNormal style='word-break:break-all'><span style='font-size:10.0pt;font-family:"Lucida Console";color:black;background:#E1E2E5'><o:p> </o:p></span></p><p class=MsoNormal>This is my first attempt at using data.table, so I probably did something dumb, but maybe the report is useful to someone. The first case (conflictsTable1) is the one that gives the warning.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I’m also surprised at the timings. I wrote the original algorithm using a data.frame and ddply, and I expected data.table to be substantially faster; the opposite is true.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The algorithm does the following: certain columns in the table are keys and the others are values, in the sense that all rows with the same set of keys should have the same set of values. 
Find all the key sets for which this is not true and return those key sets together with their conflicting value sets.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Any insight into the performance would be appreciated.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Regards,<o:p></o:p></p><p class=MsoNormal>Ron<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>library(data.table)<o:p></o:p></p><p class=MsoNormal>library(plyr)<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal># data.table variant 1: key the group on all of its columns, then take the unique rows.<o:p></o:p></p><p class=MsoNormal># This is the variant that triggers the warning.<o:p></o:p></p><p class=MsoNormal>conflictsTable1 <- function(f) {<o:p></o:p></p><p class=MsoNormal> u <- unique(setkey(f))<o:p></o:p></p><p class=MsoNormal> if (nrow(u) == 1) return(NULL)<o:p></o:p></p><p class=MsoNormal> u<o:p></o:p></p><p class=MsoNormal>}<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal># data.table variant 2: take the unique rows without setting a key first.<o:p></o:p></p><p class=MsoNormal>conflictsTable2 <- function(f) {<o:p></o:p></p><p class=MsoNormal> u <- unique(f)<o:p></o:p></p><p class=MsoNormal> if (nrow(u) == 1) return(NULL)<o:p></o:p></p><p class=MsoNormal> u<o:p></o:p></p><p class=MsoNormal>}<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal># plyr variant: the same logic, applied by ddply to each per-id piece.<o:p></o:p></p><p class=MsoNormal>conflictsFrame <- function(f) {<o:p></o:p></p><p class=MsoNormal> u <- unique(f)<o:p></o:p></p><p class=MsoNormal> if (nrow(u) == 1) return(NULL)<o:p></o:p></p><p class=MsoNormal> u<o:p></o:p></p><p class=MsoNormal>}<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal># Test data: N rows with randomly sampled ids (so some ids repeat) and three value columns.<o:p></o:p></p><p class=MsoNormal>N <- 10000<o:p></o:p></p><p class=MsoNormal>test <- data.table(id=as.character(10000*sample(1:N,N,replace=TRUE)), x1=rnorm(N), x2=rnorm(N), x3=rnorm(N))<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>setkey(test,id)<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>print(system.time(ut1 <- test[, conflictsTable1(.SD), by=id]))<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>print(system.time(ut2 <- test[, conflictsTable2(.SD), by=id]))<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>print(system.time(uf <- 
ddply(test, .(id), conflictsFrame)))<o:p></o:p></p></div></body></html>