Marketing and CS Philip Chan Enticing you to buy a product 1. What is the content of the ad? 2. Where to advertise? TV, radio, newspaper, magazine, internet, … 3. Who is the target audience/customers? Which question is the most important? Target customers The more you know about the customers The more effective to find the “right” customers Advertising kids’ toys Where to advertise? How to advertise? Traditional vs Modern Media Traditional media (TV, Newspaper, …) non-interactive mostly broadcast Modern media (via internet) interactive more individualize more information on individuals Problems to Study Problem 1 Product Recommendation Problem 2 Ranking Ad’s on Search Engines Product Recommendation Product Recommendation Shopping sites: amazon, netflix, … To sell more products Recommend products the customers might buy Can you read minds? “Can you read minds?” (amazon.com recruitment T-shirt) Why does amazon.com want employees who can read minds? Recommendation Systems amazon.com based on what you have looked at, bought, on your wish list, what similar customers bought, … recommends products netflix.com based on your ratings of movies, what similar customers rate, … recommends movies Netflix Prize (2006) Task Given customer ratings on some movies Predict customer ratings on other movies If John rates “Mission Impossible” a 5 “Over the Hedge” a 3, and “Back to the Future” a 4, how would he rate “Harry Potter”, … ? Performance Error rate (accuracy) www.netflixprize.com Performance of Algorithms Root Mean Square Error (RMSE) n (real i predictioni ) i n 2 Cash Award Grand Prize $1M 10% improvement by 2011 (in 5 years) Leader Board Announced on Oct 2, 2006 Progress www.netflixprize.com/community/viewtopic.php?id=386 Improvement by the top algorithm after 1 week: ~ 0.9% after 2 weeks: ~ 4.5% after 1 month: ~ 5% after 1 year: 8.43% after 2 years: 9.44% after ~3 years: 10.06% [July 26, 2009] Problem Formulation Given (input) Movie MovieID, title, year Customer: CustID, MovieID, rating, date Find (output) Rating of a movie by a user Simplification: no actors/actresses, genre, … Netflix Data (1998-2005) Customers 480,189 (ID: 1 – 2,649,429) Movies 17,770 (ID: 1 – 17,770) ID, title, year Ratings given in Training Set 100,480,507 min=1; max=17,653; avg=209 ratings per customer Rating scale: 1 – 5 Date Ratings to predict in Qualifying Set 2,817,131 About 1 GB (700 MB compressed) Naïve Algorithm 1 Calculate the average rating for each movie Always predict the movie average with no regard to the customer RMSE =1.0515 “improvement” = -11% Naïve Algorithm 2 Calculate the average rating for a customer Always predict the customer average with no regard to the movies RMSE = 1.0422 “Improvement” = -10% Nearest Neighbor Algorithm Distance/Similarity for any pair of customers Find the most similar customer (nearest neighbor) Distance Function • Given the ratings of 2 customers • How do you calculate the distance between them? Customer Movie 1 Movie 2 Movie 3 Movie 4 Movie 5 A 2 3 4 0 5 B 3 3 0 5 1 Distance Function • How to handle 0 [no rating]? Customer Movie 1 Movie 2 Movie 3 Movie 4 Movie 5 A 2 3 4 0 5 B 3 3 0 5 1 Distance Function • Consider we want to estimate Movie 4 for Customer A • Do we want to consider Customer C? Customer Movie 1 Movie 2 Movie 3 Movie 4 Movie 5 A 2 3 4 ? 5 B 3 3 0 5 1 C 2 5 5 0 2 Distance Function • Do we want to consider Movie 4 in the distance? Customer Movie 1 Movie 2 Movie 3 Movie 4 Movie 5 A 2 3 4 ? 5 B 3 3 0 5 1 Distance Function • Distance(A,B) = 1 + 0 + 1 + 4 = 6 Customer Movie 1 Movie 2 Movie 3 Movie 4 Movie 5 A 2 3 4 ? 5 B 3 3 0 5 1 Distance 1 0 1 4 Overall Algorithm To estimate movie M for a customer C Overall Algorithm To estimate movie M for a customer C 1. Consider only customers who rated movie M Overall Algorithm To estimate movie M for a customer C 1. 2. Consider only customers who rated movie M Calculate distance to C from other customers Overall Algorithm To estimate movie M for a customer C 1. 2. 3. Consider only customers who rated movie M Calculate distance to C from other customers Find the closest customer X Overall Algorithm To estimate movie M for a customer C 1. 2. 3. 4. Consider only customers who rated movie M Calculate distance to C from other customers Find the closest customer X output the rating of M by X Euclidean Distance 2-dimensional A: (x1 , y1 ) B: (x2 , y2 ) sqrt ( (x1 – x2 )2 + (y1 – y2 )2 ) n-dimensional A: (a1 , a2 , ... , an) B: (b1 , b2 , ... , bn) sqrt( Si ( ai - bi )2 ) Similarity 1 / EuclideanDistance