<div>I have a keyed data.table, DT, with 800k rows, of which about 0.5% are duplicates that need to be removed. </div><div><br></div><div>Using unique(DT), of course, whittles the whole table down to one row per key.</div><div><br>
</div><div>I would like to get results similar to unique.data.frame(DT).</div><div>There are two problems with using unique.data.frame: (1) speed and (2) loss of key(DT).</div><div><br></div><div>So instead I'm using a wrapper that </div>
<div> (1) caches key(DT), (2) removes the key, (3) calls unique on DT, and (4) then reapplies the key to the result.</div><div><br></div><div>However, this is convoluted (and also requires modifying setkey(.) and getdots(.)). </div><div>It occurs to me that I might be overlooking a simpler alternative. </div>
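<div>A minimal sketch of the same four steps, assuming data.table's setkeyv(), which takes the key columns as a character vector, can stand in for the modified setkeyE(), so no eval() wrapping is needed. The helper name uniqueRowsV and the toy table are hypothetical. Because setkey(DT, NULL) strips the key from the caller's table by reference, the sketch puts it back with on.exit().</div>

```r
library(data.table)

# Hypothetical toy table: rows 1 and 2 are exact duplicates; row 3 shares
# key value "a" with them but differs in column v.
DT <- data.table(id = c("a", "a", "a", "b"), v = c(1, 1, 2, 3))
setkey(DT, id)

# Hypothetical helper: drop fully duplicated rows, keep key(DT) on the result.
uniqueRowsV <- function(DT) {
  # Not keyed (or not a data.table): plain unique() already does the job
  if (!is.data.table(DT) || !haskey(DT))
    return(unique(DT))
  .key <- key(DT)              # (1) cache the key
  setkey(DT, NULL)             # (2) remove it; this modifies DT by reference...
  on.exit(setkeyv(DT, .key))   #     ...so restore it on the caller's table
  ans <- unique(DT)            # (3) unique over all columns
  setkeyv(ans, .key)           # (4) reapply the key from a character vector
  ans
}

res <- uniqueRowsV(DT)
print(res)                     # 3 rows: (a, 1), (a, 2), (b, 3)
key(res)                       # "id"
key(DT)                        # "id": the original table keeps its key
nrow(unique(DT, by = key(DT))) # 2: one row per key value
```

<div>In more recent data.table releases, unique() also takes a by argument (documented default: by = seq_along(x), i.e. all columns), so the one-row-per-key behaviour has to be requested explicitly with unique(DT, by = key(DT)), as in the last line above.</div>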
<div><br></div><div>Any thoughts? </div><div><br></div><div>Thanks, </div><div>Rick </div><div><br></div><div><br></div><div>_Here is what I am using_: </div><div><br></div><div> uniqueRows <- function(DT) { </div><div>
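# If not keyed (or not a DT), use regular unique(DT)</div><div> if (!is.data.table(DT) || !haskey(DT))</div><div> return(unique(DT))</div><div><br></div><div> .key <- key(DT) </div><div> setkey(DT, NULL)</div><div> # setkey(DT, NULL) works by reference, so restore the caller's key on exit</div><div> on.exit(setkeyv(DT, .key))</div>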
<div> setkeyE(unique(DT), eval(.key))</div><div> } </div><div><br></div><div><br></div><div> getdotsWithEval <- function () {</div><div> dots <- </div><div> as.character(match.call(sys.function(-1), call = sys.call(-1), </div>
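<div> expand.dots = FALSE)$...)</div><div><br></div><div> # eval(.key) must be evaluated in the caller of setkeyE() (e.g. uniqueRows), where .key lives</div><div> if (length(dots) == 1L && grepl("^eval\\(", dots) && grepl("\\)$", dots))</div><div> return(eval(parse(text = dots), envir = parent.frame(2)))</div><div> return(dots)</div>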
<div> }</div><div><br></div><div> setkeyE <- function (x, ..., verbose = getOption("datatable.verbose")) {</div><div> # SAME AS setkey(.) WITH ADDITION THAT </div><div> # IF KEY IS WRAPPED IN eval(.) IT WILL BE PARSED</div>
<div> if (is.character(x)) </div><div> stop("x may no longer be the character name of the data.table. The possibility was undocumented and has been removed.")</div><div> #** THIS IS THE MODIFIED LINE **#</div>
<div> # OLD**: cols = getdots()</div><div> cols <- getdotsWithEval()</div><div> if (!length(cols)) </div><div> cols = colnames(x)</div><div> else if (identical(cols, "NULL")) </div>
<div> cols = NULL</div><div> setkeyv(x, cols, verbose = verbose)</div><div> }</div><div><br></div><div><br></div>-- <br><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">
<div style="font-size:13px">Ricardo Saporta</div><div style="font-size:13px">Graduate Student, Data Analytics</div><div style="font-size:13px"><span style="font-size:13px">Rutgers University, New Jersey</span></div><div style="font-size:13px">
<span style="font-size:13px">e: </span><a href="mailto:saporta@rutgers.edu" target="_blank" style="color:rgb(17,85,204);font-size:13px">saporta@rutgers.edu</a></div><div><br></div></div>