[datatable-help] data.table is asking for help

Ron Hylton rhylton at verizon.net
Sat Jun 14 01:55:12 CEST 2014


The code below generates the warning:

In setkeyv(x, cols, verbose = verbose) :
  Already keyed by this key but had invalid row order, key rebuilt. If you didn't go under the hood please let datatable-help know so the root cause can be fixed.

This is my first attempt at using data.table, so I probably did something
dumb, but maybe this report is useful to someone.  The first case
(conflictsTable1) is the one that gives the warnings.
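
A note on the likely cause, offered as an assumption rather than a confirmed diagnosis: setkey() sorts a data.table by reference instead of returning a sorted copy, so calling it on .SD inside by = id reorders the group buffer that data.table manages internally. The sketch below shows that by-reference behaviour on an ordinary table, and that copy() avoids it; the tables dt and dt2 and the column a are made up for illustration.

```r
library(data.table)

# setkey() reorders in place: 'alias' is not a copy, just a second name
dt <- data.table(a = c(3, 1, 2))
alias <- dt
setkey(alias, a)
print(dt$a)       # dt itself is now sorted as well

# copy() makes an independent table, so keying it leaves the original alone
dt2 <- data.table(a = c(3, 1, 2))
sorted <- setkey(copy(dt2), a)
print(dt2$a)      # original order is kept
print(sorted$a)   # the copy is sorted
```

Because .SD is reused from one group to the next, the key attribute set while handling one group is presumably still attached when the next group's rows are loaded, which matches the warning's complaint of a key whose row order is invalid. Leaving out setkey(), as conflictsTable2 does, or keying copy(f) sidesteps this; as far as I know, newer data.table releases refuse setkey() on .SD outright.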

I'm also surprised by the timings.  I wrote the original algorithm using
data.frame and ddply, and I expected data.table to be substantially faster;
the opposite is true.

The algorithm does the following:  certain columns in the table are keys and
the others are values, in the sense that all rows with the same set of keys
should have the same set of values.  Find all the key sets for which this is
not true, and return those key sets together with their conflicting value sets.
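
The rule above can also be applied to the whole table in one pass instead of once per group. This is a minimal sketch, assuming a data.table recent enough that unique() and duplicated() accept a by argument and joins accept on =; findConflicts, toy, k and v are hypothetical names for illustration, not part of data.table.

```r
library(data.table)

# Hypothetical helper: for every key set that has more than one distinct
# value set, return all of its distinct (key + value) rows.
findConflicts <- function(dt, keys) {
  # Distinct rows over all columns; 'by' is explicit because older
  # data.table versions defaulted unique() to the key columns only
  u <- unique(dt, by = names(dt))
  # TRUE for the 2nd, 3rd, ... distinct value set of the same key set
  extra <- duplicated(u, by = keys)
  conflicted <- unique(u[extra, keys, with = FALSE])
  # Join back to pick up every value set of the conflicting key sets
  u[conflicted, on = keys]
}

toy <- data.table(k = c("a", "a", "b", "b", "c"),
                  v = c(1, 1, 2, 3, 4))
res <- findConflicts(toy, "k")
print(res)   # key "b" with values 2 and 3
```

For the test table in the code below this would be findConflicts(test, "id"); several key columns can be passed as a character vector such as c("k1", "k2") (again hypothetical column names).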

Insight into the performance would be appreciated.
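
On the performance question, one likely factor (my understanding, not measured here) is that by = id with an arbitrary R function calls that function once per group. With N = 10000 ids drawn with replacement there are on average about 6,300 distinct groups, each paying for building .SD and dispatching unique(). The sketch below times the per-group form against a single whole-table pass on the same kind of data and checks that both select the same rows; it assumes a current data.table (fsetequal() is a newer addition), and the seed and variable names are arbitrary. No particular timings are claimed.

```r
library(data.table)

set.seed(1)   # arbitrary seed, only for reproducibility
N <- 10000
test <- data.table(id = as.character(10000 * sample(1:N, N, replace = TRUE)),
                   x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N))
setkey(test, id)

# Per-group: one R-level call of unique(.SD) for every distinct id
t_group <- system.time(
  by_group <- test[, { s <- unique(.SD); if (nrow(s) > 1L) s }, by = id]
)

# Whole table: a single unique() pass, then a vectorised filter on id
t_whole <- system.time({
  u <- unique(test, by = names(test))
  whole <- u[id %in% id[duplicated(id)]]
})

print(rbind(per_group = t_group, whole_table = t_whole))
# Both approaches should select the same rows
print(fsetequal(by_group, whole))
```

How large the gap is depends on the machine and the data.table version, so it is worth rerunning rather than taking on trust.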

Regards,

Ron

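library(data.table)
library(plyr)

# Case 1 (gives the warnings). setkey(f) with no column names keys f on
# all of its columns, and setkey() sorts by reference, so this reorders
# the .SD passed in by data.table rather than a copy of it.
conflictsTable1 <- function(f) {
  u <- unique(setkey(f))
  if (nrow(u) == 1) return(NULL)   # one distinct value set: no conflict
  u
}

# Case 2: same check without setkey()
conflictsTable2 <- function(f) {
  u <- unique(f)
  if (nrow(u) == 1) return(NULL)
  u
}

# Same check for ddply(), which passes each id's rows as a data frame
conflictsFrame <- function(f) {
  u <- unique(f)
  if (nrow(u) == 1) return(NULL)
  u
}

# Test data: N rows, a repeated character key 'id', three numeric value columns
N <- 10000
test <- data.table(id=as.character(10000*sample(1:N,N,replace=TRUE)),
                   x1=rnorm(N), x2=rnorm(N), x3=rnorm(N))

setkey(test, id)

print(system.time(ut1 <- test[, conflictsTable1(.SD), by=id]))
print(system.time(ut2 <- test[, conflictsTable2(.SD), by=id]))
print(system.time(uf <- ddply(test, .(id), conflictsFrame)))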
