Robert Bell
AT&T Labs-Research
In collaboration with
Chris Volinsky, AT&T Labs-Research
& Yehuda Koren, Yahoo! Research
“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules
• Goal: improve on Netflix’s existing movie recommendation technology
• Contest began October 2, 2006
• Prize
– Based on reduction in root mean squared error (RMSE) on test data
– $1,000,000 grand prize for 10% drop (19% for MSE)
– Or, $50,000 progress prize for best result each year
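A minimal sketch of the metric, to make the "10% drop (19% for MSE)" arithmetic concrete. The function name `rmse` and the toy numbers are illustrative assumptions, not the contest's scoring code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# MSE is RMSE squared, so cutting RMSE by 10% scales MSE by 0.9 ** 2 = 0.81,
# which is a 19% reduction in MSE.
mse_drop = 1 - 0.9 ** 2

# Toy check: errors of 1 and 0 give MSE 0.5.
toy = rmse([3, 4], [4, 4])
```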
• Training data
– 100 million ratings (from 1 to 5 stars)
– 6 years (2000-2005)
– 480,000 users
– 17,770 “movies”
• Test data
– Last few ratings of each user
– Split as shown on next slide
• Probe
– Ratings released
– Allows participants to assess methods directly
• Daily submissions allowed for combined Quiz/Test data
– Identity of Quiz cases withheld
– RMSE released for Quiz
– Test RMSE withheld
– Prizes based on Test RMSE
[Figure: distribution of ratings by star value (1 to 5), Training (mean = 3.60) vs. Probe (mean = 3.67); vertical axis runs from 0 to 40]
Something Happened in Early 2004
Most Loved Movies                            Avg rating   Count
The Shawshank Redemption                     4.593        137812
Lord of the Rings: The Return of the King    4.545        133597
The Green Mile                               4.306        180883
Lord of the Rings: The Two Towers            4.460        150676
Finding Nemo                                 4.415        139050
Raiders of the Lost Ark                      4.504        117456
Most Rated Movies
Miss Congeniality
Independence Day
The Patriot
The Day After Tomorrow
Pretty Woman
Pirates of the Caribbean
Highest Variance
The Royal Tenenbaums
Lost In Translation
Pearl Harbor
Miss Congeniality
Napoleon Dynamite
Fahrenheit 9/11
User ID    # Ratings   Mean Rating
305344     17,651      1.90
387418     17,432      1.81
2439493    16,560      1.22
1664010    15,811      4.26
2118461    14,829      4.08
1461435    9,820       1.37
1639792    9,764       1.33
1314869    9,739       2.95
1. Size of data
– Places premium on efficient algorithms
– Stretched memory limits of standard PCs
2. 99% of data are missing
– Eliminates many standard prediction methods
– Certainly not missing at random
3. Training and test data differ systematically
– Test ratings are later
– Test cases are spread uniformly across users
4. Countless factors may affect ratings
– Genre, movie/TV series/other
– Style of action, dialogue, plot, music, etc.
– Director, actors
– Rater’s mood
5. Large imbalance in training data
– Number of ratings per user or movie varies by several orders of magnitude
– Information to estimate individual parameters varies widely
Ratings per Movie in Training Data
Avg #ratings/movie: 5627
Avg #ratings/user: 208
• How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?
• Personalized recommendations of items (e.g., movies) to users
• Increasingly common
– To deal with explosive number of choices on the internet
– Netflix
– Amazon
– Many others
• A pre-specified list of attributes
• Score each item on all attributes
• User interest obtained for the same attributes
– Direct solicitation, or
– Estimated based on user rating, purchases, or other behavior
• Music recommendation system
• Songs rated on 400+ attributes
– Music genome project
– Roots, instrumentation, lyrics, vocals
• Two types of user feedback
– Seed songs
– Thumbs up/down for recommended songs
• Avoids need for:
– Determining “proper” content
– Collecting information about items or users
• Infers user-item relationships from purchases or ratings
• Used by Amazon and Netflix
• Two main CF tools
– Nearest neighbors
– Latent factor models
• Most common CF tool at the beginning of the contest
• Predict rating for a specific user-item pair based on ratings of
– Similar items
– By the same user
– Or vice versa
• $\hat{r}_{ui} = \frac{\sum_{j \in N(i;u)} s_{ij} r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}$
• Pearson correlation or cosine similarity
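A minimal sketch of the item-based neighborhood prediction described above: a similarity-weighted average of the user's own ratings on the most similar items. The function names, the choice of k, and the restriction to positive similarities are assumptions made for illustration; the slide does not specify them.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length rating vectors (0.0 if undefined)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float(xc @ yc / denom) if denom > 0 else 0.0

def predict_knn(user_ratings, sims_to_target, k=2):
    """Weighted average of the user's ratings on the k most similar items.

    user_ratings:   dict item -> rating r_uj given by user u
    sims_to_target: dict item -> similarity s_ij to the target item i
    Only items with positive similarity are used (an assumption of this sketch).
    """
    candidates = [(sims_to_target[j], r) for j, r in user_ratings.items()
                  if sims_to_target.get(j, 0.0) > 0]
    neighbors = sorted(candidates, reverse=True)[:k]   # the neighborhood N(i; u)
    if not neighbors:
        return None                                    # no usable neighbors
    numerator = sum(s * r for s, r in neighbors)
    denominator = sum(s for s, _ in neighbors)
    return numerator / denominator

# Toy data: the user rated items A, B, C; similarities to the target are made up.
prediction = predict_knn({"A": 5, "B": 3, "C": 1},
                         {"A": 0.8, "B": 0.2, "C": 0.1}, k=2)
```

With k = 2 the prediction uses items A and B only, giving (0.8 × 5 + 0.2 × 3) / (0.8 + 0.2).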
• Few modeling assumptions
• Few tuning parameters to learn
• Easy to explain to users
– Dear Amazon.com Customer, We've noticed that customers who have purchased or rated How Does the Show Go On: An Introduction to the Theater by Thomas Schumacher have also purchased Princess Protection Program #1: A Royal Makeover (Disney Early Readers).
• Models with latent classes of items and users
– Individual items and users are assigned to either a single class or a mixture of classes
• Neural networks
– Restricted Boltzmann machines
• Singular Value Decomposition (SVD)
– AKA matrix factorization
– Items and users described by unobserved factors
– Main method used by leaders of competition
• Dimension reduction technique for matrices
• Each item summarized by a d-dimensional vector $q_i$
• Similarly, each user summarized by $p_u$
• Choose d much smaller than number of items or users
– e.g., d = 50 << 18,000 or 480,000
• Predicted rating for Item i by User u
– Inner product of $q_i$ and $p_u$
– $\hat{r}_{ui} = q_i' p_u$ or $\hat{r}_{ui} = a_u + b_i + q_i' p_u$
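A minimal sketch of the factor-model prediction: the inner product of an item vector and a user vector, optionally shifted by a user offset and an item offset. The function name and the toy vectors (d = 2) are made up for illustration.

```python
import numpy as np

def predict_rating(q_i, p_u, a_u=0.0, b_i=0.0):
    """Predicted rating: optional user and item offsets plus the inner product q_i' p_u."""
    return a_u + b_i + float(np.dot(q_i, p_u))

# Toy example with d = 2 factors (values are illustrative only).
q_movie = np.array([1.0, 2.0])
p_user = np.array([3.0, 4.0])
plain = predict_rating(q_movie, p_user)                         # inner product only
with_offsets = predict_rating(q_movie, p_user, a_u=0.5, b_i=-0.25)
```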
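[Figure: movies placed in a two-factor space. One axis runs from serious to escapist, the other from geared towards females to geared towards males. Movies shown: Amadeus, Braveheart, The Color Purple, Sense and Sensibility, The Princess Diaries, Ocean’s 11, Lethal Weapon, Dumb and Dumber, The Lion King, Independence Day]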
[Figure: the two-factor movie map (serious vs. escapist, geared towards females vs. geared towards males) with two users, Dave and Gus, added]
• Want to minimize SSE for Test data
• One idea: Minimize SSE for Training data
– Want large d to capture all the signals
– But, Test RMSE begins to rise for d > 2
• Regularization is needed
– Allow rich model where there are sufficient data
– Shrink aggressively where data are scarce
• Minimize $\sum_{(u,i) \in \text{training}} (r_{ui} - p_u' q_i)^2 + \lambda \left( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \right)$
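A minimal sketch of the regularized criterion: squared error over the training ratings plus a penalty on the squared lengths of the user and item factor vectors. The function name, the single penalty weight `lam`, and the toy values are assumptions for illustration.

```python
import numpy as np

def regularized_sse(ratings, P, Q, lam):
    """Training SSE plus lam times the squared norms of all user and item factors.

    ratings: iterable of (user index, item index, rating)
    P:       array of user factors, one row per user
    Q:       array of item factors, one row per item
    """
    sse = sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)
    penalty = lam * ((P ** 2).sum() + (Q ** 2).sum())
    return float(sse + penalty)

# Toy check: one user, one item, one rating.
P = np.array([[1.0, 0.0]])
Q = np.array([[2.0, 1.0]])
value = regularized_sse([(0, 0, 3.0)], P, Q, lam=0.5)
```

Here the prediction is 2, so the squared error is 1, and the penalty is 0.5 × (1 + 5) = 3.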
[Figure, four animation frames: the two-factor movie map (serious vs. escapist, geared towards females vs. geared towards males) with user Gus]
• Fit by gradient descent
– Loop over observed ratings
– Update each relevant parameter
– Small step in each parameter, proportional to gradient
– Repeat until convergence
• Alternatively, fit by sequence of ridge regressions
– Fix item factors
– Loop over users, estimating user factors
– Do same to estimate item factors
– Repeat until convergence
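The two fitting strategies above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the contest: the function names, learning rate, penalty weight, and initialization are made up, and a fixed number of passes stands in for a convergence test. In the gradient-descent version the penalty is applied once per observed rating, so users and items with more ratings are effectively penalized more.

```python
import numpy as np

def fit_sgd(ratings, n_users, n_items, d=2, lam=0.05, lr=0.02, epochs=500, seed=0):
    """Stochastic gradient descent over observed (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, d))    # user factors p_u
    Q = 0.1 * rng.standard_normal((n_items, d))    # item factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()                    # use the pre-update p_u for q_i's step
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return P, Q

def fit_als(ratings, n_users, n_items, d=2, lam=0.05, sweeps=20, seed=0):
    """Alternating ridge regressions: fix item factors, solve users; then swap."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, d))
    Q = 0.1 * rng.standard_normal((n_items, d))
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def ridge(X, y):
        # Closed-form ridge solution (X'X + lam I)^{-1} X'y.
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    for _ in range(sweeps):
        for u, obs in enumerate(by_user):
            if obs:
                idx, y = zip(*obs)
                P[u] = ridge(Q[list(idx)], np.array(y, dtype=float))
        for i, obs in enumerate(by_item):
            if obs:
                idx, y = zip(*obs)
                Q[i] = ridge(P[list(idx)], np.array(y, dtype=float))
    return P, Q

def train_rmse(ratings, P, Q):
    """RMSE of the fitted model on the ratings it was trained on."""
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])))

# Toy rank-1 data: user 1 rates every item twice as high as user 0.
toy = [(0, 0, 1.0), (0, 1, 2.0), (0, 2, 1.5),
       (1, 0, 2.0), (1, 1, 4.0), (1, 2, 3.0)]
```

Calling `fit_als(toy, 2, 3)` or `fit_sgd(toy, 2, 3)` returns the two factor matrices; `train_rmse` reports how closely they reproduce the toy ratings.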
• Fine tune existing methods
• Incorporate alternative “effects”
• Incorporate a variety of modeling methods
• Careful regularization to avoid overfitting
• SVD uses all of a user’s ratings to train the user’s factors
• But what if the user is multiple people?
– Different factor values may apply to movies rated by Mom vs. Dad vs. the Kids
• This approach computes user factors, $p_u$, specific to the movie being predicted
– Given all the {$q_i$}, $p_u$ is the solution of a ridge regression
– Weighted ridge regressions with higher weights for movies similar to the target movie
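A minimal sketch of a weighted ridge regression for one user's factors, with item factors held fixed. How the weights are derived from similarity to the target movie is not given on the slide, so the weights here are simply passed in; the function name and toy values are assumptions.

```python
import numpy as np

def local_user_factors(Q_rated, ratings, weights, lam=1.0):
    """Solve min_p  sum_j w_j (r_j - q_j' p)^2 + lam * ||p||^2.

    Q_rated: item factors of the movies the user rated, one row per movie
    ratings: the user's ratings of those movies
    weights: non-negative weights, larger for movies similar to the target movie
    """
    Q_rated = np.asarray(Q_rated, dtype=float)
    r = np.asarray(ratings, dtype=float)
    w = np.asarray(weights, dtype=float)
    A = Q_rated.T @ (w[:, None] * Q_rated) + lam * np.eye(Q_rated.shape[1])
    b = Q_rated.T @ (w * r)
    return np.linalg.solve(A, b)

# Toy case with d = 1: two movies with identical factors but different ratings.
# Upweighting the first movie pulls the user factor towards its rating.
equal = local_user_factors([[1.0], [1.0]], [4.0, 2.0], [1.0, 1.0], lam=0.0)
tilted = local_user_factors([[1.0], [1.0]], [4.0, 2.0], [3.0, 1.0], lam=0.0)
```

With equal weights the solution is the plain average of the two ratings (3.0); with weights 3 and 1 it moves to 3.5.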
Improvement from Localized SVD
• Very limited feature set
– User, movie, date
– Places focus on models/algorithms
• Major steps forward associated with incorporating new data features
– What movies a user rated
– Temporal effects
• What you rate (and don’t) provides information about your preferences
• Paterek’s NSVD explicitly characterizes users by which movies they like
• Incorporate what a user rated into the user factor
– $\hat{r}_{ui} = a_u + b_i + q_i' \left( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$
• Substantially reduces RMSE
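A minimal sketch of a prediction that folds "what was rated" into the user factor: the explicit user vector is augmented by a normalized sum of per-item vectors over the set of movies the user rated. The function name, the array `Y` holding one vector per item, and the toy values are assumptions for illustration.

```python
import numpy as np

def predict_with_rated_set(a_u, b_i, q_i, p_u, Y, rated):
    """Offsets plus q_i' (p_u + |N(u)|^(-1/2) * sum of y_j over rated items)."""
    rated = list(rated)
    if rated:
        implicit = Y[rated].sum(axis=0) / np.sqrt(len(rated))
    else:
        implicit = np.zeros_like(p_u)       # user with no ratings: no implicit term
    return float(a_u + b_i + q_i @ (p_u + implicit))

# Toy example with d = 2 and a user who rated four movies (indices 0..3).
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
score = predict_with_rated_set(a_u=0.5, b_i=0.25,
                               q_i=np.array([1.0, 1.0]),
                               p_u=np.array([1.0, 0.0]),
                               Y=Y, rated=[0, 1, 2, 3])
```

The four item vectors sum to (4, 4); dividing by the square root of 4 gives (2, 2), so the effective user vector is (3, 2) and the score is 0.5 + 0.25 + 5.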
• User behavior may change over time
– Ratings go up or down
– Interests change
– For example, with addition of a new rater
• Allow user biases and/or factors to change over time
– $\hat{r}_{ui}(t) = a_u(t) + b_i(t) + q_i' \left( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$
– Model $a_u(t)$ and $p_u(t)$ as linear, unrestricted, or a sum of both types
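A minimal sketch of one way to read "linear, unrestricted, or a sum of both types" for a time-varying user bias: a linear trend around the user's mean rating date, plus an optional free offset for individual days. The parameterization and all names (`base`, `slope`, `t_mean`, `per_day`) are hypothetical; the slide does not give the exact form.

```python
def user_bias_at(t, base, slope, t_mean, per_day=None):
    """Time-varying user bias: linear trend plus an optional per-day offset.

    t:       day index of the rating
    base:    the user's static bias
    slope:   linear drift per day, measured from the user's mean rating day t_mean
    per_day: dict day -> unrestricted offset for that day (missing days get 0)
    """
    per_day = per_day or {}
    return base + slope * (t - t_mean) + per_day.get(t, 0.0)

# Linear only, then linear plus a day-specific offset (toy values).
linear_only = user_bias_at(10, base=0.2, slope=0.01, t_mean=5)
with_day_effect = user_bias_at(10, base=0.2, slope=0.01, t_mean=5, per_day={10: -0.1})
```

Setting `slope` to zero leaves only the unrestricted per-day part; leaving `per_day` empty gives the purely linear form.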
[Figure, four animation frames: the two-factor movie map (serious vs. escapist, geared towards females vs. geared towards males) with user Gus; the last frame is labeled "Gus +"]
• Allowed anyone to approach early leaders
– Powerful predictor
– Efficient
– Easy to program
• Flexibility to incorporate additional features
– Implicit feedback
– Temporal effects
– Neighborhood effects
• Accurate regularization is essential
[Figure: Factor models: RMSE vs. #parameters. Horizontal axis: millions of parameters (10 to 100000); vertical axis: RMSE (0.875 to 0.905). Curves: Basic SVD; … + What was Rated; … + Linear Time Factors; … + Per-Day User Biases; … + Per-Day User Factors]
#3: The Wisdom of Crowds (of Models)
• All models are wrong; some are useful – G. Box
• Used linear blends of many prediction sets
– 107 in Year 1
– Over 800 at the end
• Difficult, or impossible, to build the grand unified model
• Mega blends are not needed in practice
– A handful of simple models achieves 80 percent of the improvement of the full blend
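A minimal sketch of a linear blend: each model contributes one column of predictions, and the blend weights are fitted by (optionally ridge-regularized) least squares against known ratings. Fitting on a held-out set such as the Probe data is an assumption of this sketch, as are the function names and toy numbers.

```python
import numpy as np

def fit_blend(pred_matrix, targets, lam=0.0):
    """Least-squares blend weights; lam > 0 adds a ridge penalty on the weights.

    pred_matrix: one row per rating, one column per model
    targets:     the known ratings for those rows
    """
    X = np.asarray(pred_matrix, dtype=float)
    y = np.asarray(targets, dtype=float)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def blend(pred_matrix, weights):
    """Apply fitted weights to a matrix of model predictions."""
    return np.asarray(pred_matrix, dtype=float) @ weights

# Toy example: two models whose errors are exact opposites, so an equal-weight
# blend cancels them.
truth = np.array([1.0, 2.0, 3.0, 4.0])
noise = np.array([0.5, -0.5, 0.5, -0.5])
preds = np.column_stack([truth + noise, truth - noise])
weights = fit_blend(preds, truth)
blended = blend(preds, weights)
```

In this toy case each model alone has RMSE 0.5, while the fitted blend recovers the true ratings.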
• Yehuda Koren
– The engine of progress for the Netflix Prize
– Implicit feedback
– Temporal effects
– Nearest neighbor modeling
• Big Chaos: Michael Jahrer, Andreas Töscher (Year 2)
– Optimization of tuning parameters
– Blending methods
• Pragmatic Theory: Martin Chabbert, Martin Piotte (Year 3)
– Some movies age better than others
– Link functions
• The Ensemble: 0.856714
• BellKor’s Pragmatic Theory: 0.856704
• Both scores round to 0.8567
• Tie breaker is submission date/time
• AT&T donated its full share to organizations supporting science education
• Young Science Achievers Program
• New Jersey Institute of Technology pre-college and educational opportunity programs
• North Jersey Regional Science Fair
• Neighborhoods Focused on African American Youth
• Big Success for Netflix
– Lots of cheap labor, good publicity
– Already incorporated 6 percent improvement
– Potential for much more using other data they have
• Big advances to the science of recommender systems
– Regularized SVD
– Identification of new features
– Understanding nearest neighbors
– Contributions to literature
• Industrial strength data
• Very good design
• Accessibility to anyone with a PC
• Free flow of ideas
– Leaderboard
– Forum
– Workshop and papers
• Money?
• Need a conceptually simple task
• Winner-take-all has drawbacks
• Intellectual property and liability issues
• How many prizes can overlap?
• rbell@research.att.com
• www.netflixprize.com
– …/leaderboard
– …/community
• Click BellKor’s Pragmatic Chaos or The Ensemble on Leaderboard for details