Marketing and CS Philip Chan

advertisement
Marketing and CS
Philip Chan
Enticing you to buy a product
1. What is the content of the ad?
2. Where to advertise?

TV, radio, newspaper, magazine, internet, …
3. Who is the target audience/customers?
 Which question is the most important?
Target customers
 The more you know about the customers

The more effective to find the “right”
customers
 Advertising kids’ toys


Where to advertise?
How to advertise?
Traditional vs Modern Media
 Traditional media (TV, Newspaper, …)


non-interactive
mostly broadcast
 Modern media (via internet)



interactive
more individualize
more information on individuals
Problems to Study
 Problem 1

Product Recommendation
 Problem 2
 Ranking Ad’s on Search Engines
Product Recommendation
Product Recommendation
 Shopping sites: amazon, netflix, …
 To sell more products

Recommend products the customers might
buy
Can you read minds?
 “Can you read minds?” (amazon.com
recruitment T-shirt)
 Why does amazon.com want employees who
can read minds?
Recommendation Systems
 amazon.com


based on what you have looked at, bought, on
your wish list, what similar customers bought,
…
recommends products
 netflix.com


based on your ratings of movies, what similar
customers rate, …
recommends movies
Netflix Prize (2006)
 Task
 Given customer ratings on some movies
 Predict customer ratings on other movies
 If John rates
 “Mission Impossible” a 5
 “Over the Hedge” a 3, and
 “Back to the Future” a 4,
 how would he rate “Harry Potter”, … ?
 Performance
 Error rate (accuracy)
 www.netflixprize.com
Performance of Algorithms
 Root Mean Square Error (RMSE)
n
 (real
i
 predictioni )
i
n
2
Cash Award
 Grand Prize



$1M
10% improvement
by 2011 (in 5 years)
Leader Board
 Announced on Oct 2, 2006
 Progress

www.netflixprize.com/community/viewtopic.php?id=386
 Improvement by the top algorithm
 after 1 week: ~ 0.9%
 after 2 weeks: ~ 4.5%
 after 1 month: ~ 5%
 after 1 year:
8.43%
 after 2 years:
9.44%
 after ~3 years: 10.06% [July 26, 2009]
Problem Formulation
 Given (input)
 Movie


MovieID, title, year
Customer:

CustID, MovieID, rating, date
 Find (output)
 Rating of a movie by a user
 Simplification: no actors/actresses, genre, …
Netflix Data (1998-2005)
 Customers
480,189 (ID: 1 – 2,649,429)
Movies
 17,770 (ID: 1 – 17,770)
 ID, title, year
Ratings given in Training Set
 100,480,507
 min=1; max=17,653; avg=209 ratings per customer
 Rating scale: 1 – 5
 Date
Ratings to predict in Qualifying Set
 2,817,131
About 1 GB (700 MB compressed)





Naïve Algorithm 1
 Calculate the average rating for each movie
 Always predict the movie average

with no regard to the customer
 RMSE =1.0515
 “improvement” = -11%
Naïve Algorithm 2
 Calculate the average rating for a customer
 Always predict the customer average

with no regard to the movies
 RMSE = 1.0422
 “Improvement” = -10%
Nearest Neighbor Algorithm
 Distance/Similarity for any pair of customers
 Find the most similar customer (nearest
neighbor)
Distance Function
• Given the ratings of 2 customers
• How do you calculate the distance between
them?
Customer Movie 1
Movie 2
Movie 3
Movie 4
Movie 5
A
2
3
4
0
5
B
3
3
0
5
1
Distance Function
• How to handle 0 [no rating]?
Customer Movie 1
Movie 2
Movie 3
Movie 4
Movie 5
A
2
3
4
0
5
B
3
3
0
5
1
Distance Function
• Consider we want to estimate Movie 4 for
Customer A
• Do we want to consider Customer C?
Customer Movie 1
Movie 2
Movie 3
Movie 4
Movie 5
A
2
3
4
?
5
B
3
3
0
5
1
C
2
5
5
0
2
Distance Function
• Do we want to consider Movie 4 in the
distance?
Customer Movie 1
Movie 2
Movie 3
Movie 4
Movie 5
A
2
3
4
?
5
B
3
3
0
5
1
Distance Function
• Distance(A,B) = 1 + 0 + 1 + 4 = 6
Customer Movie 1
Movie 2
Movie 3
Movie 4
Movie 5
A
2
3
4
?
5
B
3
3
0
5
1
Distance
1
0
1
4
Overall Algorithm
 To estimate movie M for a customer C
Overall Algorithm
 To estimate movie M for a customer C
1.
Consider only customers who rated movie M
Overall Algorithm
 To estimate movie M for a customer C
1.
2.
Consider only customers who rated movie M
Calculate distance to C from other customers
Overall Algorithm
 To estimate movie M for a customer C
1.
2.
3.
Consider only customers who rated movie M
Calculate distance to C from other customers
Find the closest customer X
Overall Algorithm
 To estimate movie M for a customer C
1.
2.
3.
4.
Consider only customers who rated movie M
Calculate distance to C from other customers
Find the closest customer X
output the rating of M by X
Euclidean Distance
 2-dimensional
 A: (x1 , y1 )
 B: (x2 , y2 )
 sqrt ( (x1 – x2 )2 + (y1 – y2 )2 )
 n-dimensional
 A: (a1 , a2 , ... , an)
 B: (b1 , b2 , ... , bn)
 sqrt( Si ( ai - bi )2 )
 Similarity
 1 / EuclideanDistance
Download