CF-1 - SEAS

The BellKor 2008 Solution to the Netflix Prize by Leenarat Leelapanyalert Netflix Dataset • Over 100 million movie ratings with date-stamp (100,480,507 ratings) • M = 17,770 movies • N = 480,189 customers • 1 (star) = no interest, 5(stars) = strong interest • Dec 31, 1999 – Dec 31, 2005 The user-item matrix N*M = 8,532,958,530 elements 98.9% values are missing Netflix Competition • 4.2 from 100 million ratings – Training set (Probe set) – Qualifying set (Quiz set & Test set) • Scoring – Show RMSE achieved on the Quiz set – Best RMSE on the Test set → THE WINNER!! Outline • Necessary index letters • Baseline predictors – With temporal effects • Latent factor models – with temporal effects • Neighborhood models – with temporal effects • Integrated models • Extra: Shrinking towards recent actions Outline • Necessary index letters • Baseline predictors → Adjust deviations of – With temporal effects each user (rater, • Latent factor models customer) and item – with temporal effects (movie) • Neighborhood models – with temporal effects • Integrated models • Extra: Shrinking towards recent actions Outline • Necessary index letters • Baseline predictors – With temporal effects Compare between • Latent factor models → items and users – with temporal effects by SVD • Neighborhood models – with temporal effects • Integrated models • Extra: Shrinking towards recent actions Outline • Necessary index letters • Baseline predictors – with temporal effects • Latent factor models – with temporal effects Compute the • Neighborhood models→ relationship between items – with temporal effects • Integrated models (or users) • Extra: Shrinking towards recent actions Outline • Necessary index letters • Baseline predictors - with temporal effects • Latent factor models –with temporal effects • Neighborhood models –with temporal effects • Integrated models • Extra: Other methods – Shrinking towards recent actions – Blending multiple solutions Outline • Necessary index letters • Baseline predictors – with temporal effects • Latent factor models – with temporal effects • Neighborhood models – with temporal effects Combine Latent factor models and Neighborhood • Integrated models → models together • Extra:Shrinking towards recent actions Outline • Necessary index letters • Baseline predictors – with temporal effects • Latent factor models – with temporal effects • Neighborhood models – with temporal effects • Integrated models • Extra: Shrinking towards recent actions → New ideas Index Letters • • • • • • • • • u,v → i,j → rui → ^ rui → tui → K → R(u) → R(i) → N(u) → users, raters, or customers movies, or items the score by user u of movie i predicted value of rui the time of rating rui the training set which rui is known all the items for which rating by u the set of users who rated item i all items that can estimated u’s score Baseline Predictors (bui) bui    bu  bi µ → the overall average rating bu → deviations of user u bi → deviation of item i Example: µ = 3.7, Simha(bu) = -0.3, Titanic (bi) = 0.5 bui = 3.7 – 0.3 + 0.5 = 3.9 stars Estimate Parameter (bu, bi) – Formula bui    bu  bi bi  bu   uR (i ) (r   ) 1  R(i )  iR ( u ) (r    bi ) 2  R(u ) The regularization parameters (𝜆1,𝜆2) are determined by validation on the Probe set. In this case: 𝜆1 = 25, 𝜆2 = 10 Estimate Parameter (bu, bi) – The Least Squares Problem bui    bu  bi min b*  ( u ,i )K (rui    bu  bi ) 2  1 ( bu2   bi2 ) u i Estimate Parameter (bu, bi) – The Least Squares Problem bui    bu  bi min b*  ( u ,i )K (rui    bu  bi ) 2  1 ( bu2   bi2 ) to fit the given rating u i to avoid overfitting by penalizing the magnitudes of the parameters Time Change VS Baseline Predictors • An item’s popularity may change over time • Users change their baseline rating over time bui    bu  bi bui    bu (tui )  bi (tui ) bi(tui) • We do not expect movie likeability to fluctuate on a daily basis • Time periods → Bins • 30 bins bui    bu (tui )  bi (tui ) bi (t )  bi  bi , Bin(t ) bu(tui) • Unlike movies, user effects can change on a daily basis • Time deviation devu (t )  sign (t  tu )  t  tu  tu → the mean date of rating by tu t → the date that user u rated the movie β = 0.4 by validation on the Probe set bu(tui) bui    bu (tui )  bi (tui ) b (t )  bu   u  devu (t ) (1) u devu (t )  sign (t  tu )  t  tu • Suit well with gradual drifts  bu(tui) • How about sudden drifts? – Since we found that multiple ratings that a user gives in a single day b (t )  bu   u  devu (t )  but ( 3) u • A user rates on 40 different days on average • Thus, but requires about 40 parameters per user Baseline Predictors bui (t )    bu   u  devu (tui )  bu ,tui  bi  bi , Bin(tui ) Baseline Predictors Bu (user bias) Bi (movie bias) bui (t )    bu   u  devu (tui )  bu ,tui  bi  bi , Bin(tui ) Baseline Predictors Bu (user bias) Bi (movie bias) bui (t )    bu   u  devu (tui )  bu ,tui  bi  bi , Bin(tui ) • Movie bias is not completely user-independent bui (t )    bu   u  devu (tui )  bu ,tu i  (bi  bi , Bin(tu i ) )  cu (tui ) cu(t) → time-dependent scaling feature cu → (stable part) cut → (day-specific variable) cu (t )  cu  cut bui (t )    bu   u  devu (tui )  bu ,tu i  (bi  bi , Bin(tu i ) )  cu (tui ) RMSE = 0.9555 Frequencies (additional) • The number of ratings a user gave on a specific day SIGNIFICANT f ui  log a Fui Fui → the overall number of ratings that user u gave on day tui bui (t )    bu   u  devu (tui )  bu ,tui  (bi  bi , Bin(tui ) )  cu (tui )  bi , f ui bif → the bias specific for the item i at log-frequency f RMSE 0.9555 → 0.9278 Why Frequencies Work? • Bad when using with user-movie interaction terms • Nothing when using with user-related parameters • Rate a lot in a bulk → Not closely to the actual watching day – Positive approach – Negative approach • High frequencies (or bulk ratings) do not represent much change in people’s taste, but mostly biased selection of movies Predicting Future Days • The day-specific parameters should be set to default value • cu(tui) = cu • bu,t = 0 • The transient temporal model doesn’t attempt to capture future changes. Latent Factor Models • To transform both items and users to the same latent factor space – Obvious dimensions • Comedy VS Drama • Amount of action • Orientation to children – Less well defined dimensions • Depth of character development • Tool → SVD Singular Value Decomposition (SVD) • Factoring matrices into a series of linear approximations that expose the underlying structure of the matrix Singular Value Decomposition (SVD) Predicted Score = User Baseline Rating * Movie Average Score A B C Simha 4 4 4 4 Ateeq 5 5 5 5 Smith 3 3 3 3 Greg 4 4 4 Mcq 4 4 4 Ramin 4 4 4 4 Xiao 4 4 4 3 Wu 3 3 3 5 Riz 5 5 5 = 4 4 4 * 1 1 1 Singular Value Decomposition (SVD) Predicted Score = User Baseline Rating * Movie Average Score A B C Simha 4 4 5 Ateeq 4 5 5 Smith 3 3 2 Greg 4 5 4 Mcq 4 4 4 Ramin 3 5 4 Xiao 4 4 3 Wu 2 4 4 Riz 5 5 5 Singular Value Decomposition (SVD) Predicted Score = User Baseline Rating * Movie Average Score A B C Simha 3.95 4.64 4.34 4.34 Ateeq 4.27 5.02 4.69 4.69 Smith 2.42 2.85 2.66 2.66 Greg 3.97 4.67 4.36 Mcq 3.64 4.28 4.00 Ramin 3.69 4.33 4.05 3.66 Xiao 3.33 3.92 3.66 3.39 Wu 3.08 3.63 3.39 5.00 Riz 4.55 5.35 5.00 = 4.36 4.00 4.05 * 0.91 1.07 1.00 Singular Value Decomposition (SVD) Predicted Score = User Baseline Rating * Movie Average Score A B C Simha 3.95 4.64 4.34 5 Ateeq 4.27 5.02 4.69 3 2 Smith 2.42 2.85 2.66 4 5 4 Greg 3.97 4.67 4.36 Mcq 4 4 4 Mcq 3.64 4.28 4.00 Ramin 3 5 4 Ramin 3.69 4.33 4.05 Xiao 4 4 3 Xiao 3.33 3.92 3.66 Wu 2 4 4 Wu 3.08 3.63 3.39 Riz 5 5 5 Riz 4.55 5.35 5.00 A B C Simha 4 4 5 Ateeq 4 5 Smith 3 Greg - Singular Value Decomposition (SVD) Predicted Score = User Baseline Rating * Movie Average Score A B C Simha 0.05 -0.64 0.66 -0.18 Ateeq -0.28 -0.02 0.31 -0.38 Smith 0.58 0.15 -0.66 0.80 Greg 0.03 0.33 -0.36 Mcq 0.36 -0.28 0.00 0.67 -0.05 0.89 Ramin -0.69 = 0.15 0.35 -0.67 Xiao 0.67 0.08 -0.66 -1.29 Wu -1.08 0.37 0.61 0.44 Riz 0.45 -035 0.00 * 0.82 -0.20 -0.53 Singular Value Decomposition (SVD) Predicted Score = User Baseline Rating * Movie Average Score A B C Simha 4 4 5 4.34 -0.18 -0.90 Ateeq 4 5 5 4.69 -0.38 -0.15 Smith 3 3 2 2.66 0.80 0.40 4.36 0.15 0.47 4.00 0.35 -0.29 4.05 -0.67 0.68 = Greg 4 5 4 Mcq 4 4 4 Ramin 3 5 4 3.66 0.89 0.33 Xiao 4 4 3 3.39 -1.29 0.14 Wu 2 4 4 5.00 0.44 -0.36 Riz 5 5 5 * 0.91 1.07 1.00 0.82 -0.20 -0.53 -0.21 0.76 -0.62 Latent Factor Models rûi  bui  puT qi pu → user-factors vector qi → item-factors vector • Add implicit feedback – Asymmetric-SVD    12  12 rûi  bui  q  R(u )  (ruj  buj ) x j  N (u )  y j  jR ( u ) jN ( u )   T i – SVD++    12 rûi  bui  q  pu  N (u )  y j  jN ( u )   T i 60 factors RMSE = 0.8966 Temporal Effects • Time – Movie biases – go in and out of popularity over time bi – User biases – user change their baseline ratings over time bu – User preferences – genre, perception on actors and directors, household pu    12 rûi  bui (t )  q  pu (t )  N (u )  y j  jN ( u )   T i Temporal Effects b (t )  bu   u  devu (t ) (1) u b (t )  bu   u  devu (t )  but ( 3) u • The same way we treat user bias we can also treat the user preferences pu (t )T   pu1 (t ), pu 2 (t ),..., puf (t ) p (t )  puk   uk  devu (t ) (1) uk p (t )  p (t )  puk ,t (3) uk (1) uk k=1,2,…,f k=1,2,…,f RMSE    12 rûi  bui (t )  q  pu (t )  N (u )  y j  jN ( u )   T i p (t )  puk   uk  devu (t ) (1) uk p (t )  p (t )  puk ,t (3) uk (1) uk f = 500 RMSE = 0.8815 f = 500 RMSE = 0.8841 !! • Most accurate factor model (add frequencies) rûi  bui (t )  (q  q T i f = 500, RMSE = 0.8784 T i , f ui    12 ) pu (t )  N (u )  y j  jN ( u )   f = 2000, RMSE = 0.8762 Neighborhood Models • To compute the relationship between items • Evaluate the score of a user to an item based on ratings of similar items by the same user The Similarity Measure • The Pearson correlation coefficient, ρij The Similarity Measure • The Pearson correlation coefficient, ρij nij def sij  nij  2  ij ;λ2 = 100 sij – similarity nij – the number of users that rated both i and j • A weighted average of the ratings of neighborhood items rûi  bui   s (ruj  buj ) jS k ( i ;u ) ij  s jS k ( i ;u ) ij Sk(i;u) – the set of k items rated by u, which are most similar to i Problem With The Model rûi  bui   s (ruj  buj ) jS k ( i ;u ) ij  s jS k ( i ;u ) ij • Isolate the relations between 2 items • Fully rely on the neighbors, even if they are absent rûi  bui   (r jR ( u ) • The wij’s are not user specific • Sum over all item rated by u uj  buj ) wij Improving The Model rûi  bui   s (ruj  buj ) jS k ( i ;u ) ij  s jS k ( i ;u ) ij • Isolate the relations between 2 items • Fully rely on the neighbors, even if they are absent rûi  bui   (r jR ( u ) uj  buj ) wij • The wij’s are not user specific • Sum over all item rated by u rûi  bui   (r jR ( u ) uj  buj ) wij  c jN ( u ) • Not only what he rated, but also what he did not rate. • cij is expected to be high if j is predictive on i ij Improving The Model rûi  bui   (r jR ( u ) uj  buj ) wij  c jN ( u ) ij • The current model somewhat overemphasizes the dichotomy between heavy raters and those that rarely rate • Moderate this behavior by normalization rûi  bui  R(u )  12  (r jR ( u ) uj  buj ) wij  N (u )  12 c jN ( u ) ij • 𝛼 = 0 → non-normalized rule – encourages greater deviations • 𝛼 = 1 → fully normalized rule – eliminate the effect of number of rating • In this case, 𝛼 = 0.5 RMSE = 0.9002 Improving The Model rûi  bui  R(u )  12  (r uj jR ( u )  buj ) wij  N (u )  12 c jN ( u ) ij RMSE = 0.9002 • Reduce the model by pruning parameters rûi  bui  R (i; u ) k  12  (r uj  buj ) wij  N (i; u ) k jR ( i ;u ) k Sk(i) – the set of k items most similar i def R k (i; u )  R(u )  S k (i) def N k (i; u )  N (u )  S k (i) k = 17,770 → RMSE = 0.8906 k = 2000 → RMSE = 0.9067  12 c ij jN ( i ;u ) k Integrated Models • Baseline predictors + Factor models + Neighborhood models  (1)   12  12  12 k k rûi    b (t )  bi (t )  q  pu (t )  N (u )  y j   R (i; u ) (ruj  buj ) wij  N (i; u ) cij   jN ( u )  jR k ( i ;u ) jN k ( i ;u )  (1) u T i f = 170, k = 300 → RMSE = 0.8827 • Further improve accuracy, we add a more elaborated temporal model for the user bias  (1)   12  12  12 k k rûi    b (t )  bi (t )  q  pu (t )  N (u )  y j   R (i; u ) (ruj  buj ) wij  N (i; u) cij   jN ( u )  jR k ( i ;u ) jN k ( i ;u )  ( 3) u T i f = 170, k = 300 → RMSE = 0.8786 EXTRA: Shrinking Towards Recent Actions • To correct rui • Shrink rui towards the average rating of u on day t • The single day effect is among the strongest temporal effects in data   rûi  cut  rut α = 8 cut  nut  exp(  Vut ) β = 11   cut nut – the number of ratings u gave on day t rut – the mean rating of u at day t Vut – the variance of u’s ratings at day t Shrinking Towards Recent Actions • A stronger corrections accounts for periods longer than a single day • And tries to characterize the recent user behavior on similar movies wiju  sij  exp(  tui  tuj ) cut  nui  exp(  Vui ) nui  u w  ij u _ rated _ j rûi  cui  rui 1  cui rui  Vui  w r w u ij u _ rated _ j uj u ij u _ rated _ j  w  (r w u ij u _ rated _ j u ij u _ rated _ j uj )2  (rui ) 2 Q&A

CF-1 - SEAS

Related documents

Products

Support

CF-1 - SEAS

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib