[datatable-help] converting sql to data table with subqueries and exists clauses

DS ds5j at excite.com
Thu Feb 23 02:24:27 CET 2012


I took a crack at converting the following sql to data table.

it is working off of the movielens rating data (userid,rating,itemid, timestamp)

> (select ave(rating) from data x
> where x.itemid=a.itemid and x.userid!=a.userid
> and exists (select 1 from
> (select * from data y where y.userid=x.userid) y,
> (select * from data z where z.userid=a.userid) z
> where y.itemid=z.itemid
> and y.itemid!=a.itemid
> and abs(y.rating-z.rating)>=4 )
> ) pred_rating_disimilar
> from data1 a')
> 

my data table attempt is below but it takes 13 hours to run:
#create columns in dataframe to hold results
dt$meanDisimilar<-NA
dt$countDisimilar<-NA

system.time(
for (i in 1:nrow(data1))
{

	a<- dt[i,]

	# get all users for movie
        b<- data.table(dt[dt$itemid==a$itemid & dt$userid!=a$userid,list(userid)])

        # get movies for users of this movie
	c<-data.table(dt[dt$userid==a$userid,list(userid,itemid,rating)])
   
	# get all movies for other users who rated this movie
	d<- data.table(dt[dt$userid==b$userid ,list(userid,itemid,rating)])

        setkey(c,itemid)
	 setkey(d,itemid)


	e<-c[d,]
	names(e)<-c("itemid", "userid0", "rating0", "userid", "rating")

	#get all users where share with thi user and where rating >=4
	f<-e[ abs(rating0-rating)>=4,list(userid)]

        # get avg rating for these users on this movie
	dt[i,]$meanDisimilar <- dt[dt$userid==f$userid & dt$itemid==a$itemid,mean(rating)]
	dt[i,]$countDisimilar <- dt[dt$userid==f$userid & dt$itemid==a$itemid,nrow(rating)]
	#print(dt[dt$userid==f$userid & dt$itemid==a$itemid,mean(rating)]);
}
)

is there a more elegant compact way to do it.



More information about the datatable-help mailing list