[datatable-help] converting sql to data table with subqueriesand exists clauses

Matthew Dowle mdowle at mdowle.plus.com
Sun Feb 26 18:00:12 CET 2012


So the new column should be:

1 when user voted 5 and there's another user who rated the same movie 1.
5 when user rated 1 and there's another user who voted the same movie 5.
NA otherwise

If that's right, then why is avg needed? Avg of a set of 1's is 1, and a
set of 5's is 5. Anyway, how about this :

DT = data.table(user=c(1,1,2,2,3,4,4),
                movie=c(2,3,3,4,5,4,7),
                rating=c(3,4,4,5,1,1,3))
setkey(DT, movie, rating)
DT[,newcol:=NA_integer_]
DT[ J(DT[rating==1L,unique(movie)],5L), newcol:=1L, mult="first"]
DT[ J(DT[rating==5L,unique(movie)],1L), newcol:=5L, mult="first"]
DT[order(user,movie)]
     user movie rating newcol
[1,]    1     2      3     NA
[2,]    1     3      4     NA
[3,]    2     3      4     NA
[4,]    2     4      5      1
[5,]    3     5      1     NA
[6,]    4     4      1      5
[7,]    4     7      3     NA


On Fri, 2012-02-24 at 18:45 -0500, DS wrote:
> A user could rate multiple times but lets please ignore that fact  as that is not the case in this data and i just made up fake data below and made a mistake.
> 
> The new column is what i want to compute using data table and add to the original data.
> 
> so the output should be:
> user,mv,rating,somedate, Avg_disimilar_users
> 1,2,3,date,NA
> 1,3,4,date,NA
> 2,3,4,date,NA
> 2,4,5,date,1
> 3,5,1,date,NA
> 4,4,1,date,5
> 4,7,3,date,na
> 
> input:
> 
> each row is : user, movie, rating, time.
> user,mv,rating,somedate
> 1,2,3,date
> 1,3,4,date,
> 2,3,4,date,
> 2,4,5,date,
> 3,5,1,date
> 4,4,1,date
> 4,7,3,date
> 
> so here for user 2, movie 4, the new column called avg disimilar rating should be 1.
> for user 4, movie 4, the new column should be 5.
> 
> the new column Avg_disimilar users which I want is the avg of users who rated differently on the movie in question on that row of the data.
> when users dont have movies in common 4 apart the result should be NA or empty.
> 
> Hope that helps.
> 
> Dhruv
> 
> 
> -----Original Message-----
> From: "Matthew Dowle" [mdowle at mdowle.plus.com]
> Date: 02/24/2012 04:43 AM
> To: ds5j at excite.com
> CC: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: converting sql to data table with subqueriesand exists clauses
> 
> 
> Dhruv,
> 
> But user 4 voted for movie 4 twice (first 1*, then 3*). Which is something
> I already asked about (but you haven't addressed). Further, you've
> described a new column. That implies a new value for every row. There are
> 7 rows but you've only stated what 2 of those 7 values should be.
> 
> Please read through this in full :
> 
>     http://www.catb.org/~esr/faqs/smart-questions.html
> 
> Matthew
> 
> each row is : user, movie, rating, time.
> user,mv,rating,somedate
> 1,2,3,date
> 1,3,4,date,
> 2,3,4,date,
> 2,4,5,date,
> 3,5,1,date
> 4,4,1,date
> 4,4,3,date
> 
> so here for user 2, movie 4, the new column called avg disimilar rating should be 1.
> for user 4, movie 4, the new column should be 5.
> 
> this column is a disimilar column.
> 
> Dhruv
> 
> 




More information about the datatable-help mailing list