<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><p>Nicely reproducible post. Reproducible in v1.9.3 (latest commit) as well.</p>
<p>This is a tricky one. It happens because you’re setting a key on <code>.SD</code>, which should normally not be allowed. The first time your function calls <code>setkey</code>, <code>.SD</code> has no key (here), and since no columns are named, the key is set on all of its columns: <code>x1</code>, <code>x2</code> and <code>x3</code>.</p>
<p>When the next group (from the <code>by=</code>) is passed to your function, <code>.SD</code> still has its <code>key</code> set to <code>x1,x2,x3</code> (because <code>setkey</code> modifies the object by reference), but it now holds <strong>new</strong> data corresponding to <em>this</em> group. <code>setkey</code> then computes the sort order of this data while the key attribute claims it is already sorted. If that claim were true, the order would have to be <code>1:n</code>. It isn’t, because this group’s data is not sorted. <code>data.table</code> warns in exactly this scenario, and that’s why you get the warning.</p>
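<p>For illustration, here is a minimal sketch of the by-reference behaviour, outside of any grouping. The table <code>DT</code> and the helper <code>dedupe</code> are made up for this example; the point is only that calling <code>setkey</code> inside a function changes the caller’s object:</p>
<pre><code>library(data.table)

# Hypothetical helper with the same shape as conflictsTable1
dedupe <- function(f) unique(setkey(f))

# Made-up data: no key to begin with
DT <- data.table(x1 = c(2, 1, 2), x2 = c("b", "a", "b"))
key(DT)          # NULL

u <- dedupe(DT)  # setkey(f) keys (and reorders) DT itself, not a copy

key(DT)          # now "x1" "x2", although DT was never assigned to
</code></pre>
<p>Inside <code>[.data.table</code> the same thing happens to <code>.SD</code>, so the key set for one group is still attached when the next group’s rows arrive.</p>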
<p>To verify this, you can try:</p>
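<pre><code># Same as the original, but the key is removed again before returning
conflictsTable1 <- function(f) {
  u <- unique(setkey(f))        # keys f (i.e. .SD) by reference on all columns
  setattr(f, 'sorted', NULL)    # drop the key so the next group's .SD starts unkeyed
  if (nrow(u) == 1) return(NULL)
  u
}
</code></pre>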
<p>Basically, we reset the key of <code>f</code> (which is the same object as <code>.SD</code>, since it is only ever modified by reference) to <code>NULL</code> after every call, so that <code>.SD</code> for the next group does not have the key set.</p>
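<p>A small sketch of what that <code>setattr</code> call does, on a made-up table <code>DT</code>: it only drops the <code>sorted</code> attribute (which is where the key is stored) and does not touch the rows. <code>setkey(DT, NULL)</code> is the documented way to do the same on an ordinary data.table.</p>
<pre><code>library(data.table)

DT <- data.table(a = c(3L, 1L, 2L))   # made-up data
setkey(DT, a)
key(DT)                      # "a"
attr(DT, "sorted")           # "a" as well: the key lives in this attribute

setattr(DT, "sorted", NULL)  # remove the key by reference
key(DT)                      # NULL
DT$a                         # rows stay in the order setkey left them: 1 2 3
</code></pre>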
<p>The ideal scenario here, IIUC, is that <code>setkey(.SD)</code>, or <code>setkey</code> on anything pointing to <code>.SD</code>, should not be possible (locking the binding doesn’t seem to affect things done by reference). <code>.SD</code> should, however, retain the key of the data.table, if one was set, wherever possible.</p>
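<p>In the meantime, one way to keep the <code>setkey</code> call without touching <code>.SD</code> is to key a copy instead. This is only a sketch: <code>conflictsTableCopy</code> and the small table <code>toy</code> are made up here, and the extra <code>copy()</code> per group costs time and memory.</p>
<pre><code>library(data.table)

# Hypothetical variant: key a private copy, leave .SD alone
conflictsTableCopy <- function(f) {
  u <- unique(setkey(copy(f)))
  if (nrow(u) == 1) return(NULL)
  u
}

# Made-up data: only id "b" has conflicting values
toy <- data.table(id = c("a", "a", "b", "b", "c"),
                  x1 = c(1, 1, 2, 3, 4))
setkey(toy, id)

res <- toy[, conflictsTableCopy(.SD), by = id]
res   # the two rows of group "b"
</code></pre>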
<div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div> <div id="bloop_sign_1402704505278157056" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px">Arun</div></div> <div style="color:black"><br>From: <span style="color:black">Ron Hylton</span> <a href="mailto:rhylton@verizon.net">rhylton@verizon.net</a><br>Reply: <span style="color:black">Ron Hylton</span> <a href="mailto:rhylton@verizon.net">rhylton@verizon.net</a><br>Date: <span style="color:black">June 14, 2014 at 1:55:53 AM</span><br>To: <span style="color:black">datatable-help@lists.r-forge.r-project.org</span> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>Subject: <span style="color:black"> [datatable-help] data.table is asking for help <br></span></div><br> <blockquote type="cite" class="clean_bq"><span><div lang="EN-US" link="#0563C1" vlink="#954F72" xml:lang="EN-US"><div></div><div>
<div class="WordSection1">
<p class="MsoNormal">The code below generates the warning:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal" style="word-break:break-all"><span style="font-size:10.0pt;font-family:'Lucida Console';color:black;background:#E1E2E5">
In setkeyv(x, cols, verbose = verbose) :</span></p>
<p class="MsoNormal" style="word-break:break-all"><span style="font-size:10.0pt;font-family:'Lucida Console';color:black;background:#E1E2E5">
Already keyed by this key but had invalid row order, key
rebuilt. If you didn't go under the hood please let datatable-help
know so the root cause can be fixed.</span></p>
<p class="MsoNormal">This is my first attempt at using datatable so
I probably did something dumb, but maybe that’s useful for
someone. The first case is the one that gives the
warnings.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">I’m also surprised at the timings. I
wrote the original algorithm using dataframe & ddply and I
expected datatable to be substantially faster; the opposite is
true.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">The algorithm does the following:
Certain columns in the table are keys and others are values in the
sense that each row with the same set of keys should have the same
set of values. Find all the key sets for which this is not
true and return the keys sets + conflicting value sets.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Insight into the performance would be
appreciated.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Regards,</p>
<p class="MsoNormal">Ron</p>
<p class="MsoNormal"> </p>
<pre><code>library(data.table)
library(plyr)

conflictsTable1 <- function(f) {
  u <- unique(setkey(f))
  if (nrow(u) == 1) return(NULL)
  u
}

conflictsTable2 <- function(f) {
  u <- unique(f)
  if (nrow(u) == 1) return(NULL)
  u
}

conflictsFrame <- function(f) {
  u <- unique(f)
  if (nrow(u) == 1) return(NULL)
  u
}

N <- 10000
test <- data.table(id=as.character(10000*sample(1:N,N,replace=TRUE)),
                   x1=rnorm(N), x2=rnorm(N), x3=rnorm(N))

setkey(test,id)

print(system.time(ut1 <- test[, conflictsTable1(.SD), by=id]))

print(system.time(ut2 <- test[, conflictsTable2(.SD), by=id]))

print(system.time(uf <- ddply(test, .(id), conflictsFrame)))
</code></pre>
</div>
_______________________________________________
<br>datatable-help mailing list
<br>datatable-help@lists.r-forge.r-project.org
<br>https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</div></div></span></blockquote><p></p></body></html>