[datatable-help] Looking for a faster method

arun smartpink111 at yahoo.com
Sun Aug 25 22:27:13 CEST 2013


Hi,
I tried a data.table approach (see ?data.table) to solve the problem in the link below.

http://r.789695.n4.nabble.com/how-to-combine-apply-and-which-or-alternative-ways-to-do-so-td4674424.html#a4674434

But it was no faster than the tapply() version.

set.seed(24)
vec1<- sample(1e5,1e3,replace=FALSE)
set.seed(48)
vec2<- sample(1e3,1e6,replace=TRUE)
system.time({res1<- tapply(vec1,1:1e3,FUN=function(i) {which(vec2==i)})})
# user  system elapsed 
#  3.912   0.000   3.880 

system.time(res2<- sapply(vec1,function(x) which(vec2%in%x)))
#   user  system elapsed 
# 24.368   0.000  23.247 

vecR1<-unlist(res1)
names(vecR1)<-NULL
vecR2<- unlist(res2)
identical(vecR1,vecR2)
#[1] TRUE

library(data.table)
dt1<- data.table(vec1,Group=1:1e3,key='Group')
system.time({res3<- dt1[,list(list(which(vec1==vec2))),by=Group]})  # not that fast
#   user  system elapsed 
#  3.756   0.120   3.886 
identical(vecR1,unlist(res3$V1))
#[1] TRUE
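
All three versions above scan the whole of vec2 once for every element of vec1. One direction that might be worth timing is to group the positions of vec2 by value in a single pass and then look up each vec1 value. This is a sketch only: it is not part of the timings above and has not been benchmarked here, and the names pos, idx and res4 are made up for illustration.

```r
# Same inputs as above, repeated so the sketch runs on its own
set.seed(24)
vec1 <- sample(1e5, 1e3, replace = FALSE)
set.seed(48)
vec2 <- sample(1e3, 1e6, replace = TRUE)

# Group the positions of vec2 by value in one pass;
# the list names are the distinct values of vec2 (as character)
pos <- split(seq_along(vec2), vec2)

# Locate each vec1 value among those distinct values (NA if absent from vec2)
idx <- match(vec1, as.integer(names(pos)))

# Pull the stored positions; values not present in vec2 give integer(0),
# which is also what which(vec2 == x) returns for them
res4 <- lapply(idx, function(i) if (is.na(i)) integer(0) else pos[[i]])
```

If this behaves as intended, identical(vecR1, unlist(res4)) can be used to check it against the results above, and wrapping the split()/match()/lapply() lines in system.time() gives a timing comparable to the others.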



Is there a faster way?

Thanks.

A.K.

