[datatable-help] converting sql to data table with subqueriesand exists clauses
Matthew Dowle
mdowle at mdowle.plus.com
Sun Feb 26 18:00:12 CET 2012
So the new column should be:
1 when user voted 5 and there's another user who rated the same movie 1.
5 when user rated 1 and there's another user who voted the same movie 5.
NA otherwise
If that's right, then why is avg needed? Avg of a set of 1's is 1, and a
set of 5's is 5. Anyway, how about this :
DT = data.table(user=c(1,1,2,2,3,4,4),
movie=c(2,3,3,4,5,4,7),
rating=c(3,4,4,5,1,1,3))
setkey(DT, movie, rating)
DT[,newcol:=NA_integer_]
DT[ J(DT[rating==1L,unique(movie)],5L), newcol:=1L, mult="first"]
DT[ J(DT[rating==5L,unique(movie)],1L), newcol:=5L, mult="first"]
DT[order(user,movie)]
user movie rating newcol
[1,] 1 2 3 NA
[2,] 1 3 4 NA
[3,] 2 3 4 NA
[4,] 2 4 5 1
[5,] 3 5 1 NA
[6,] 4 4 1 5
[7,] 4 7 3 NA
On Fri, 2012-02-24 at 18:45 -0500, DS wrote:
> A user could rate multiple times but lets please ignore that fact as that is not the case in this data and i just made up fake data below and made a mistake.
>
> The new column is what i want to compute using data table and add to the original data.
>
> so the output should be:
> user,mv,rating,somedate, Avg_disimilar_users
> 1,2,3,date,NA
> 1,3,4,date,NA
> 2,3,4,date,NA
> 2,4,5,date,1
> 3,5,1,date,NA
> 4,4,1,date,5
> 4,7,3,date,na
>
> input:
>
> each row is : user, movie, rating, time.
> user,mv,rating,somedate
> 1,2,3,date
> 1,3,4,date,
> 2,3,4,date,
> 2,4,5,date,
> 3,5,1,date
> 4,4,1,date
> 4,7,3,date
>
> so here for user 2, movie 4, the new column called avg disimilar rating should be 1.
> for user 4, movie 4, the new column should be 5.
>
> the new column Avg_disimilar users which I want is the avg of users who rated differently on the movie in question on that row of the data.
> when users dont have movies in common 4 apart the result should be NA or empty.
>
> Hope that helps.
>
> Dhruv
>
>
> -----Original Message-----
> From: "Matthew Dowle" [mdowle at mdowle.plus.com]
> Date: 02/24/2012 04:43 AM
> To: ds5j at excite.com
> CC: datatable-help at r-forge.wu-wien.ac.at
> Subject: Re: converting sql to data table with subqueriesand exists clauses
>
>
> Dhruv,
>
> But user 4 voted for movie 4 twice (first 1*, then 3*). Which is something
> I already asked about (but you haven't addressed). Further, you've
> described a new column. That implies a new value for every row. There are
> 7 rows but you've only stated what 2 of those 7 values should be.
>
> Please read through this in full :
>
> http://www.catb.org/~esr/faqs/smart-questions.html
>
> Matthew
>
> each row is : user, movie, rating, time.
> user,mv,rating,somedate
> 1,2,3,date
> 1,3,4,date,
> 2,3,4,date,
> 2,4,5,date,
> 3,5,1,date
> 4,4,1,date
> 4,4,3,date
>
> so here for user 2, movie 4, the new column called avg disimilar rating should be 1.
> for user 4, movie 4, the new column should be 5.
>
> this column is a disimilar column.
>
> Dhruv
>
>
More information about the datatable-help
mailing list