BobBell_Dimacs1

Lessons from the Netflix Prize

Robert Bell

AT&T Labs-Research

In collaboration with

Chris Volinsky, AT&T Labs-Research

& Yehuda Koren, Yahoo! Research

“We’re quite curious, really. To the tune of one million dollars.” – Netflix Prize rules

• Goal to improve on Netflix’s existing movie recommendation technology

• Contest began October 2, 2006

• Prize

– Based on reduction in root mean squared error (RMSE) on test data

– $1,000,000 grand prize for 10% drop (19% for MSE)

– Or, $50,000 progress prize for best result each year
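The "19% for MSE" figure follows from squaring: cutting RMSE to 0.90 of its old value cuts MSE to 0.81 of its old value. A minimal sketch of the metric and that arithmetic; the helper name `rmse` and the four sample ratings are illustrative assumptions, not contest data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative ratings only; not contest data.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(predicted, actual))

# A 10% drop in RMSE multiplies MSE by 0.9**2 = 0.81, i.e. a 19% drop.
rmse_ratio = 0.90
mse_drop = 1 - rmse_ratio ** 2
print(f"MSE reduction: {mse_drop:.0%}")
```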


Data Details

• Training data

– 100 million ratings (from 1 to 5 stars)

– 6 years (2000-2005)

– 480,000 users

– 17,770 “movies”

• Test data

– Last few ratings of each user

– Split as shown on next slide


Test Data Split into Three Pieces

• Probe

– Ratings released

– Allows participants to assess methods directly

• Daily submissions allowed for combined Quiz/Test data

– Identity of Quiz cases withheld

– RMSE released for Quiz

– Test RMSE withheld

– Prizes based on Test RMSE


Higher Mean Rating in Probe Data

[Figure: histogram over Rating (1 to 5), vertical axis from 0 to 40, comparing Training (m = 3.60) with Probe (m = 3.67).]

Something Happened in Early 2004

[Figure: only the axis label 2004 survives.]

Data about the Movies

Most Loved Movies                                 Avg rating   Count

The Shawshank Redemption                          4.593        137812

Lord of the Rings: The Return of the King         4.545        133597

The Green Mile                                    4.306        180883

Lord of the Rings: The Two Towers                 4.460        150676

Finding Nemo                                      4.415        139050

Raiders of the Lost Ark                           4.504        117456

Most Rated Movies

Miss Congeniality

Independence Day

The Patriot

The Day After Tomorrow

Pretty Woman

Pirates of the Caribbean

Highest Variance

The Royal Tenenbaums

Lost In Translation

Pearl Harbor

Miss Congeniality

Napoleon Dynamite

Fahrenheit 9/11

Most Active Users

User ID     # Ratings   Mean Rating

305344      17,651      1.90

387418      17,432      1.81

2439493     16,560      1.22

1664010     15,811      4.26

2118461     14,829      4.08

1461435     9,820       1.37

1639792     9,764       1.33

1314869     9,739       2.95

Major Challenges

1. Size of data

– Places premium on efficient algorithms

– Stretched memory limits of standard PCs

2. 99% of data are missing

– Eliminates many standard prediction methods

– Certainly not missing at random

3. Training and test data differ systematically

– Test ratings are later

– Test cases are spread uniformly across users
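The "99% missing" figure can be checked against the rounded counts on the Data Details slide. A quick sketch of the arithmetic; because the inputs are rounded, the results are approximate.

```python
# Rounded counts from the Data Details slide.
n_ratings = 100_000_000
n_users = 480_000
n_movies = 17_770

# Fraction of the user-by-movie matrix that is actually observed.
density = n_ratings / (n_users * n_movies)
missing = 1 - density

ratings_per_movie = n_ratings / n_movies
ratings_per_user = n_ratings / n_users

print(f"observed: {density:.2%}, missing: {missing:.2%}")
print(f"avg ratings per movie: {ratings_per_movie:.0f}")
print(f"avg ratings per user:  {ratings_per_user:.0f}")
```

The per-movie and per-user averages agree with the 5627 and 208 quoted on the ratings-distribution slides.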


Major Challenges (cont.)

4. Countless factors may affect ratings

– Genre, movie/TV series/other

– Style of action, dialogue, plot, music et al.

– Director, actors

– Rater’s mood

5. Large imbalance in training data

– Number of ratings per user or movie varies by several orders of magnitude

– Information to estimate individual parameters varies widely


Ratings per Movie in Training Data

Avg #ratings/movie: 5627


Ratings per User in Training Data

Avg #ratings/user: 208


The Fundamental Challenge

• How can we estimate as much signal as possible where there are sufficient data, without overfitting where data are scarce?

Recommender Systems

• Personalized recommendations of items (e.g., movies) to users

• Increasingly common

– To deal with explosive number of choices on the internet

– Netflix

– Amazon

– Many others


Content Based Systems

• A pre-specified list of attributes

• Score each item on all attributes

• User interest obtained for the same attributes

– Direct solicitation, or

– Estimated based on user rating, purchases, or other behavior


Pandora

• Music recommendation system

• Songs rated on 400+ attributes

– Music genome project

– Roots, instrumentation, lyrics, vocals

• Two types of user feedback

– Seed songs

– Thumbs up/down for recommended songs


Collaborative Filtering (CF)

• Avoids need for:

– Determining “proper” content

– Collecting information about items or users

• Infers user-item relationships from purchases or ratings

• Used by Amazon and Netflix

• Two main CF tools

– Nearest neighbors

– Latent factor models


Nearest Neighbor Methods

• Most common CF tool at the beginning of the contest


• Predict rating for a specific user-item pair based on ratings of

– Similar items

– By the same user

– Or vice versa

• $\hat{r}_{ui} = \dfrac{\sum_{j \in N(i;u)} s_{ij}\, r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}$

• Similarity $s_{ij}$: Pearson correlation or cosine similarity
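The item-based version of this weighted average can be sketched as follows. This is a minimal illustration, not the contest implementation: the function names, the use of `np.nan` for missing ratings, the fallback to the global mean, and the 3-by-3 toy matrix are all assumptions.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict_item_knn(ratings, user, item, k=2):
    """Predict ratings[user, item] from the user's ratings of the k most similar items.

    ratings: 2-D array (users x items) with np.nan marking missing entries.
    """
    rated = [j for j in range(ratings.shape[1])
             if j != item and not np.isnan(ratings[user, j])]
    sims = []
    for j in rated:
        # Compare the two items over users who rated both of them.
        both = ~np.isnan(ratings[:, item]) & ~np.isnan(ratings[:, j])
        s = cosine_similarity(ratings[both, item], ratings[both, j]) if both.any() else 0.0
        sims.append((s, j))
    # N(i; u): the k items most similar to `item` among those rated by `user`.
    neighbors = sorted(sims, reverse=True)[:k]
    weight_sum = sum(s for s, _ in neighbors)
    if weight_sum <= 0:
        return float(np.nanmean(ratings))  # assumed fallback: global mean
    return sum(s * ratings[user, j] for s, j in neighbors) / weight_sum

# Toy data: user 2 has not rated item 0.
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [np.nan, 4.0, 1.0],
])
pred = predict_item_knn(R, user=2, item=0, k=2)
print(pred)
```

Because the weights are nonnegative here, the prediction always falls between the smallest and largest neighbor ratings.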


Merits of Nearest Neighbors

• Few modeling assumptions

• Few tuning parameters to learn

• Easy to explain to users

– Dear Amazon.com Customer, We've noticed that customers who have purchased or rated How Does the Show Go On: An Introduction to the Theater by Thomas Schumacher have also purchased Princess Protection Program #1: A Royal Makeover (Disney Early Readers).


Latent Factor Models

• Models with latent classes of items and users

– Individual items and users are assigned to either a single class or a mixture of classes

• Neural networks

– Restricted Boltzmann machines

• Singular Value Decomposition (SVD)

– AKA matrix factorization

– Items and users described by unobserved factors

– Main method used by leaders of competition


SVD

• Dimension reduction technique for matrices

• Each item summarized by a d-dimensional vector $q_i$

• Similarly, each user summarized by $p_u$

• Choose d much smaller than number of items or users

– e.g., d = 50 << 18,000 or 480,000

• Predicted rating for Item i by User u

– Inner product of $q_i$ and $p_u$

– $\hat{r}_{ui} = q_i' p_u$ or $\hat{r}_{ui} = \mu + a_u + b_i + q_i' p_u$ ($\mu$: overall mean rating; $a_u$, $b_i$: user and item biases)
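A small sketch of the factor-model prediction, with and without bias terms. The array names, toy sizes, random factor values, and zero biases are assumptions for illustration; the global mean 3.6 is the training mean quoted on the histogram slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 3   # toy sizes; the slide suggests d around 50

P = rng.normal(scale=0.1, size=(n_users, d))   # row u is the user factor p_u
Q = rng.normal(scale=0.1, size=(n_items, d))   # row i is the item factor q_i
mu = 3.6                                       # overall mean rating
a = np.zeros(n_users)                          # user biases a_u (zero in this toy)
b = np.zeros(n_items)                          # item biases b_i (zero in this toy)

def predict(u, i):
    """Biased factor-model prediction: mu + a_u + b_i + q_i' p_u."""
    return mu + a[u] + b[i] + Q[i] @ P[u]

# All user-item predictions at once: one matrix product plus broadcast biases.
all_predictions = mu + a[:, None] + b[None, :] + P @ Q.T
print(predict(1, 2), all_predictions.shape)
```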

[Figure: movies placed on two latent factors, serious vs. escapist and geared towards females vs. geared towards males. Movies shown: Amadeus, Braveheart, The Color Purple, Sense and Sensibility, The Princess Diaries, Ocean’s 11, Lethal Weapon, Dumb and Dumber, The Lion King, Independence Day.]

[Figure: the same map with two users, Dave and Gus, placed among the movies.]

Regularization for SVD

• Want to minimize SSE for Test data

• One idea: Minimize SSE for Training data

– Want large d to capture all the signals

– But, Test RMSE begins to rise for d > 2

• Regularization is needed

– Allow rich model where there are sufficient data

– Shrink aggressively where data are scarce

• Minimize $\sum_{\text{training}} (r_{ui} - p_u' q_i)^2 + \lambda \left( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \right)$
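The regularized objective can be evaluated directly. A minimal sketch, assuming ratings are stored as (user, item, rating) triples; the function name `regularized_sse`, the penalty weight `lam`, and the toy factors are illustrative choices.

```python
import numpy as np

def regularized_sse(ratings, P, Q, lam):
    """Training SSE plus an L2 penalty on all user and item factors.

    ratings: iterable of (u, i, r_ui) triples for observed ratings only.
    """
    sse = sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)
    penalty = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return sse + penalty

# Toy example with two users, two items, d = 2.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
P = np.array([[1.0, 0.5], [0.8, 0.2]])
Q = np.array([[2.0, 1.0], [1.0, 1.0]])

loss_plain = regularized_sse(ratings, P, Q, lam=0.0)   # SSE only
loss_reg = regularized_sse(ratings, P, Q, lam=0.1)     # SSE + penalty
print(loss_plain, loss_reg)
```

Larger `lam` pulls every factor toward zero; users and items with few ratings have little data to resist that pull, which is the "shrink aggressively where data are scarce" behavior.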

[Figure sequence, four slides: the same serious vs. escapist map with the user Gus placed among the movies.]

Estimation for SVD

• Fit by gradient descent

– Loop over observed ratings

– Update each relevant parameter

– Small step in each parameter, proportional to gradient

– Repeat until convergence

• Alternatively, fit by sequence of ridge regressions

– Fix item factors

– Loop over users, estimating user factors

– Do same to estimate item factors

– Repeat until convergence
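The gradient descent loop can be sketched in a few lines. This is a simplified illustration under stated assumptions: the penalty is applied per observed rating (a common stochastic variant, not necessarily the exact objective on the regularization slide), a fixed number of epochs stands in for a convergence test, and the step size `lr`, penalty `lam`, and toy ratings are made up.

```python
import numpy as np

def fit_sgd(ratings, n_users, n_items, d=2, lam=0.05, lr=0.02, epochs=200, seed=0):
    """Fit r_ui ~ p_u' q_i by stochastic gradient descent with an L2 penalty."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, d))
    Q = rng.normal(scale=0.1, size=(n_items, d))
    for _ in range(epochs):
        for u, i, r in ratings:           # loop over observed ratings
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()           # use the pre-update p_u for the q_i step
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return P, Q

def train_rmse(ratings, P, Q):
    errs = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errs))))

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]

P0, Q0 = fit_sgd(ratings, 3, 3, epochs=0)   # same seed, so this is the starting point
P, Q = fit_sgd(ratings, 3, 3)
rmse_before = train_rmse(ratings, P0, Q0)
rmse_after = train_rmse(ratings, P, Q)
print(rmse_before, rmse_after)
```

The alternating ridge regression approach replaces the inner updates with a closed-form solve for each user (items fixed), then for each item (users fixed).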


Improvements to Collaborative Filtering

• Fine tune existing methods

• Incorporate alternative “effects”

• Incorporate a variety of modeling methods

• Careful regularization to avoid overfitting

Localized SVD

• SVD uses all of a user’s ratings to train the user’s factors

• But what if the user is multiple people?

– Different factor values may apply to movies rated by Mom vs. Dad vs. the Kids

• This approach computes user factors, $p_u$, specific to the movie being predicted

– Given all the {$q_i$}, $p_u$ is the solution of a ridge regression

– Weighted ridge regressions with higher weights for movies similar to the target movie
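A weighted ridge regression for one user's factors has a closed form. The sketch below assumes the item factors are fixed, biases have already been removed from the ratings, and the similarity weights are supplied by the caller; the function name, the penalty `lam`, and the toy numbers are illustrative.

```python
import numpy as np

def user_factors_weighted_ridge(Q_rated, r, weights, lam):
    """Solve min_p sum_j w_j (r_j - q_j' p)^2 + lam * ||p||^2.

    Q_rated: (n, d) item factors for the n movies the user rated.
    r:       (n,) the user's ratings of those movies (biases assumed removed).
    weights: (n,) nonnegative weights, larger for movies similar to the target.
    """
    W = np.diag(weights)
    d = Q_rated.shape[1]
    A = Q_rated.T @ W @ Q_rated + lam * np.eye(d)
    rhs = Q_rated.T @ W @ r
    return np.linalg.solve(A, rhs)

Q_rated = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r = np.array([1.0, -1.0, 0.5])
lam = 0.1

# Equal weights reproduce the ordinary (global) ridge solution.
p_uniform = user_factors_weighted_ridge(Q_rated, r, np.ones(3), lam)
# Hypothetical weights: the first rated movie is most similar to the target.
p_local = user_factors_weighted_ridge(Q_rated, r, np.array([1.0, 0.1, 0.1]), lam)
print(p_uniform, p_local)
```

Changing the target movie changes the weights, so the same user gets a different $p_u$ for each prediction.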

Improvement from Localized SVD

Lesson #1: Data >> Models

• Very limited feature set

– User, movie, date

– Places focus on models/algorithms

• Major steps forward associated with incorporating new data features

– What movies a user rated

– Temporal effects


You are What You Rate

• What you rate (and don’t) provides information about your preferences

• Paterek’s NSVD explicitly characterizes users by which movies they like

• Incorporate what a user rated into the user factor

– $\hat{r}_{ui} = \mu + a_u + b_i + q_i' \left( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$

• Substantially reduces RMSE
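The prediction rule with the "what was rated" term can be sketched as below, where N(u) is the set of movies user u rated and each movie j carries an extra factor vector y_j. The function and argument names and all numeric values are assumptions for illustration.

```python
import numpy as np

def predict_with_rated_set(mu, a_u, b_i, q_i, p_u, Y, rated_items):
    """mu + a_u + b_i + q_i' (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j).

    Y:           (n_items, d) array whose row j is y_j.
    rated_items: list of item indices forming N(u).
    """
    if len(rated_items) > 0:
        implicit = Y[rated_items].sum(axis=0) / np.sqrt(len(rated_items))
    else:
        implicit = np.zeros_like(p_u)   # no rated items: plain biased model
    return mu + a_u + b_i + q_i @ (p_u + implicit)

Y = np.array([[0.1, 0.0], [0.0, 0.2], [0.3, 0.3], [0.1, 0.1]])
pred = predict_with_rated_set(
    mu=3.6, a_u=0.1, b_i=-0.2,
    q_i=np.array([1.0, 2.0]),
    p_u=np.array([0.5, -0.2]),
    Y=Y, rated_items=[0, 1, 2, 3],
)
print(pred)
```

Note that the y_j term depends only on which movies were rated, not on the ratings given, so it is available even for test cases where the rating itself is withheld.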


Temporal Effects

• User behavior may change over time

– Ratings go up or down

– Interests change

– For example, with addition of a new rater

• Allow user biases and/or factors to change over time

– $\hat{r}_{ui}(t) = \mu + a_u(t) + b_i(t) + q_i' \left( p_u(t) + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$

– Model $a_u(t)$ and $p_u(t)$ as linear, unrestricted, or a sum of both types
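One way to read "linear, unrestricted, or a sum of both types" for the user bias is a baseline plus a linear drift plus a free per-day offset. The sketch below is an assumed parameterization for illustration only; the names `slope`, `t_mean`, and `per_day` and the numbers are hypothetical, and time is measured in days.

```python
def user_bias(t, base, slope=0.0, t_mean=0.0, per_day=None):
    """a_u(t) = base + slope * (t - t_mean) + per_day[t].

    slope alone gives the linear form, per_day alone the unrestricted form,
    and both together the sum of the two. Days absent from per_day add 0.
    """
    day_effect = 0.0 if per_day is None else per_day.get(t, 0.0)
    return base + slope * (t - t_mean) + day_effect

# Hypothetical user: slow upward drift, plus one unusually harsh day (day 130).
params = dict(base=0.2, slope=0.001, t_mean=100, per_day={130: -0.5})
bias_day_100 = user_bias(100, **params)
bias_day_130 = user_bias(130, **params)
bias_day_200 = user_bias(200, **params)
print(bias_day_100, bias_day_130, bias_day_200)
```

The per-day term adds one parameter per user per rating day, which is why careful regularization matters more as these effects are added.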

[Figure sequence, four slides: the same serious vs. escapist map with the user Gus placed among the movies; the final slide labels the user "Gus +".]

#2: The Power of Regularized SVD Fit by Gradient Descent

• Allowed anyone to approach early leaders

– Powerful predictor

– Efficient

– Easy to program

• Flexibility to incorporate additional features

– Implicit feedback


– Neighborhood effects

• Accurate regularization is essential


Factor models: RMSE vs. #parameters

[Figure: RMSE (vertical axis, 0.875 to 0.905) against Millions of Parameters (horizontal axis, ticks at 10, 100, 1000, 10000, 100000), with one curve per model: Basic SVD; … + What was Rated; … + Linear Time Factors; … + Per-Day User Biases; … + Per-Day User Factors. Points along the curves carry numeric labels from 10 to 1500.]

#3: The Wisdom of Crowds (of Models)

• All models are wrong; some are useful – G. Box

• Used linear blends of many prediction sets

– 107 in Year 1

– Over 800 at the end

• Difficult, or impossible, to build the grand unified model

• Mega blends are not needed in practice

– A handful of simple models achieves 80 percent of the improvement of the full blend
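A linear blend can be fit by least squares on a set of ratings with known answers. The sketch below uses synthetic data and two hypothetical models; in practice the weights would be estimated on held-out ratings rather than on the data the models were trained on.

```python
import numpy as np

def fit_blend_weights(predictions, targets):
    """Least-squares weights for combining the columns of `predictions`."""
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Synthetic example: two noisy models of the same underlying ratings.
rng = np.random.default_rng(0)
truth = rng.uniform(1, 5, size=200)
model_a = truth + rng.normal(scale=1.0, size=200)
model_b = truth + rng.normal(scale=1.0, size=200)

X = np.column_stack([model_a, model_b])   # one column per prediction set
w = fit_blend_weights(X, truth)
blend = X @ w

print("weights:", w)
print("RMSE a, b, blend:", rmse(model_a, truth), rmse(model_b, truth), rmse(blend, truth))
```

On the data used to fit the weights, the blend can never be worse than any single model, since putting all weight on that model is one of the candidate solutions. The gain is largest when the models make different errors.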

#4: Find Good Teammates

• Yehuda Koren

– The engine of progress for the Netflix Prize

– Implicit feedback


– Nearest neighbor modeling

• Big Chaos: Michael Jahrer, Andreas Töscher (Year 2)

– Optimization of tuning parameters

– Blending methods

• Pragmatic Theory: Martin Chabbert, Martin Piotte (Year 3)

– Some movies age better than others

– Link functions


The Final Leaderboard


Test Set Results

• The Ensemble: 0.856714

• BellKor’s Pragmatic Chaos: 0.856704

• Both scores round to 0.8567

• Tie breaker is submission date/time

Final Test Set Leaderboard


Who Got the Money?

• AT&T donated its full share to organizations supporting science education

• Young Science Achievers Program

• New Jersey Institute of Technology pre-college and educational opportunity programs

• North Jersey Regional Science Fair

• Neighborhoods Focused on African American Youth

#5: Is This the Way to Do Science?

• Big Success for Netflix

– Lots of cheap labor, good publicity

– Already incorporated 6 percent improvement

– Potential for much more using other data they have

• Big advances to the science of recommender systems

– Regularized SVD

– Identification of new features

– Understanding nearest neighbors

– Contributions to literature


Why Did this Work so Well?

• Industrial strength data

• Very good design

• Accessibility to anyone with a PC

• Free flow of ideas

– Leaderboard

– Forum

– Workshop and papers

• Money?


But There are Limitations

• Need a conceptually simple task

• Winner-take-all has drawbacks

• Intellectual property and liability issues

• How many prizes can overlap?


Thank You!

• rbell@research.att.com

• www.netflixprize.com

– …/leaderboard

– …/community

• Click BellKor’s Pragmatic Chaos or The Ensemble on Leaderboard for details
