[datatable-help] converting sql to data table with subqueries and exists clauses
DS
ds5j at excite.com
Thu Feb 23 02:24:27 CET 2012
I took a crack at converting the following sql to data table.
it is working off of the movielens rating data (userid,rating,itemid, timestamp)
> (select ave(rating) from data x
> where x.itemid=a.itemid and x.userid!=a.userid
> and exists (select 1 from
> (select * from data y where y.userid=x.userid) y,
> (select * from data z where z.userid=a.userid) z
> where y.itemid=z.itemid
> and y.itemid!=a.itemid
> and abs(y.rating-z.rating)>=4 )
> ) pred_rating_disimilar
> from data1 a')
>
my data table attempt is below but it takes 13 hours to run:
#create columns in dataframe to hold results
dt$meanDisimilar<-NA
dt$countDisimilar<-NA
system.time(
for (i in 1:nrow(data1))
{
a<- dt[i,]
# get all users for movie
b<- data.table(dt[dt$itemid==a$itemid & dt$userid!=a$userid,list(userid)])
# get movies for users of this movie
c<-data.table(dt[dt$userid==a$userid,list(userid,itemid,rating)])
# get all movies for other users who rated this movie
d<- data.table(dt[dt$userid==b$userid ,list(userid,itemid,rating)])
setkey(c,itemid)
setkey(d,itemid)
e<-c[d,]
names(e)<-c("itemid", "userid0", "rating0", "userid", "rating")
#get all users where share with thi user and where rating >=4
f<-e[ abs(rating0-rating)>=4,list(userid)]
# get avg rating for these users on this movie
dt[i,]$meanDisimilar <- dt[dt$userid==f$userid & dt$itemid==a$itemid,mean(rating)]
dt[i,]$countDisimilar <- dt[dt$userid==f$userid & dt$itemid==a$itemid,nrow(rating)]
#print(dt[dt$userid==f$userid & dt$itemid==a$itemid,mean(rating)]);
}
)
is there a more elegant compact way to do it.
More information about the datatable-help
mailing list