Contextual Recommendation in Multi-User Devices
Raz Nissim, Michal Aharon, Eshcar Hillel, Amit Kagian, Ronny Lempel, Hayim Makabee
Recommendation in Personal Devices and Accounts
3/21/2016

Challenge: Recommendations in Shared Accounts and Devices

"I am a 34 yo man who enjoys action and sci-fi movies. This is what my children have done to my netflix account"

Our Focus: Recommendations for Smart TVs

Smart TVs can track what is being watched on them. Main problems:
- Inferring who has consumed each item in the past
- Identifying who is currently requesting the recommendations
- "Who" can be a subset of users

Solution: Using Context

Previous work used time of day as context.

Context in this Work: The Current Item Being Watched

This Work: Contextual Personalized Recommendations

The WatchItNext problem: it is 8:30pm and "House of Cards" is on. What should we recommend to be watched next on this device?
Implicit assumption: there is a good chance that whoever is in front of the set now will remain there. Technically, think of an HMM whose hidden state corresponds to who is watching the set, and whose states do not change too often.

WatchItNext Inputs and Output

Input: the available programs, a.k.a. the "line-up". Output: ranked recommendations.

Recommendation Settings: Exploratory and Habitual

One typically doesn't buy the same book twice, nor do people typically read the same news story twice. But people listen to the songs they like over and over again, and watch movies they like multiple times as well. In the TV setting, people regularly watch series and sports events.
- Habitual setting: all line-up items are eligible for recommendation to a device
- Exploratory setting: only items that were not previously watched on the device are eligible for recommendation

Contextual Recommendations in a Different Context

How can contextualized and personalized recommendations be served together?
[Diagram: quadrant contrasting Popular vs. Personalized and non-Contextual vs. Contextual recommendations]
Collaborative Filtering

A fundamental principle in recommender systems:
- Taps similarities in patterns of consumption/enjoyment of items by users
- Recommends to a user what users with detected similar tastes have consumed/enjoyed

Collaborative Filtering: Mathematical Abstraction

Consider a |U| x |I| consumption matrix R over users and items, where r_{u,i} = 1 whenever person u consumed item i; in other cases, r_{u,i} might be person u's rating of item i. The matrix R is typically very sparse, and often very large.
- Related task on ratings data: matrix completion. Predict users' ratings for items they have yet to rate, i.e. "complete" the missing values.
- Real-life task: top-k recommendation. Predict which yet-to-be-consumed items the user would most enjoy.

Collaborative Filtering: Matrix Factorization

Latent factor models (LFM) map both users and items to some f-dimensional space R^f, i.e. produce f-dimensional vectors v_u and w_i for each user u and item i, and define rating estimates as inner products: q_{u,i} = <v_u, w_i>. In matrix form, R (|U| x |I|) ≈ V (|U| x f) x W (f x |I|).
The main problem is finding a mapping of users and items to the latent factor space that produces "good" estimates. This is closely related to dimensionality reduction techniques for the ratings matrix R (e.g. Singular Value Decomposition).
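As a minimal sketch (not the authors' implementation), the inner-product scoring of a latent factor model can be written with NumPy; the factor matrices below are random stand-ins for learned ones, and all sizes are made up for illustration:

```python
import numpy as np

# Hypothetical sizes: 4 users, 6 items, f = 3 latent factors.
n_users, n_items, f = 4, 6, 3
rng = np.random.default_rng(0)

# In a real system V and W are learned (e.g. by SGD or ALS);
# random matrices stand in for them here.
V = rng.random((n_users, f))   # one f-dimensional vector v_u per user
W = rng.random((f, n_items))   # one f-dimensional vector w_i per item

# Rating estimate q_{u,i} = <v_u, w_i> for all pairs at once: R ~= V @ W.
Q = V @ W

# Top-k recommendation for user u: rank items by estimated score.
u, k = 0, 3
top_k = np.argsort(-Q[u])[:k]
```

Learning V and W (the "main problem" noted above) is what a real LFM trainer does; the scoring step itself is just this inner product.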
LFMs' Rise to Fame: The Netflix Prize

Used extensively by the Challenge winners, "BellKor's Pragmatic Chaos" (2006-2009).

Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan 2003]

Originally devised as a generative model of documents in a corpus, where documents are represented as bags-of-words. The |D| x |W| document-word count matrix is approximated by the product of V (|D| x k) and U (k x |W|), with each row scaled by the document's length:
- k is a parameter representing the number of "topics" in the corpus
- V is a stochastic matrix: V[d,t] = P(topic_t | document_d), t = 1,...,k
- U is a stochastic matrix: U[t,w] = P(word_w | topic_t), t = 1,...,k
- L is a vector holding the documents' lengths (#words per document)

Latent Dirichlet Allocation (cont.)

In our case: given a parameter k, the collection of devices (= documents) and their viewing histories (= bags of shows), output k "profiles", where each profile is a distribution over items, and associate each device with a distribution over the profiles. Profiles, hopefully, will represent viewing preferences such as:
- "Kids shows"
- "Cooking reality and home improvement"
- "News and Late Night"
- "History and Science"
- "Redneck reality: fishing & hunting shows, MMA"
A-priori probability of an item being watched on a device:
Score(item | device) = Σ_{profile=1..k} P(item | profile) x P(profile | device)

Contextualizing Recommendations: Three Main Approaches

1. Contextual pre-filtering: use context to restrict the data to be modeled
2. Contextual post-filtering: use context to filter or re-weight the recommendations produced by conventional models
3. Contextual modeling: incorporate the context information into the model itself
   - Typically requires denser data, due to many more parameters
   - Computationally intensive
   - E.g. Tensor Factorization (Karatzoglou et al., 2010)
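The LDA-based a-priori score, Score(item | device) = Σ_profile P(item | profile) x P(profile | device), amounts to a product of two stochastic matrices. A toy sketch with made-up distributions (not the paper's data):

```python
import numpy as np

# Hypothetical toy sizes: 2 devices, k = 3 profiles, 4 items.
# P(profile | device): each row sums to 1.
P_profile_given_device = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
])
# P(item | profile): each row sums to 1.
P_item_given_profile = np.array([
    [0.6, 0.2, 0.1, 0.1],   # e.g. a "Kids shows" profile
    [0.1, 0.5, 0.3, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])

# Score(item | device) = sum over profiles of
#   P(item | profile) * P(profile | device): a matrix product.
scores = P_profile_given_device @ P_item_given_profile

device = 0
ranking = np.argsort(-scores[device])  # items ranked for this device
```

Because both factors are stochastic, each device's scores form a proper distribution over items; here device 0, dominated by the first profile, ranks that profile's top item first.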
Main Contribution: The "3-Way" Technique

Learn a standard matrix factorization model (LFM/LDA). When recommending to a device d currently watching context item c, score each target item t as follows:
S(t follows c | d) = Σ_{j=1..k} v_d(j) x w_c(j) x w_t(j)
With LFM, this requires an additive shift to all vectors to get rid of negative values. The result is "Sequential LFM/LDA", a personalized contextual recommender: the score is high for targets that agree with both the context and the device. Again, there is no need to model context or change the learning algorithm; learn as usual, and only change how items are scored.

Data: Historical Viewing Logs

Triplets of the form (device ID, program ID, timestamp). We don't know who watched the device at that time; actually, we don't know whether anyone watched.
[Diagram: "Is anyone watching?" over a viewing timeline]

Data by the Numbers

Training data: three months' worth of viewership data.

Devices | Unique items* | Triplets
339,647 | 17,232 | more than 19M

Test data: derived from one month of viewership data.

Setting | Test instances | Average line-up size
Habitual | ~3.8M | 390
Exploratory | ~1.7M | 349

* Items are {movie, sports event, series}, not at the individual-episode level.

Metric: Avg. Rank Percentile (ARP)

Rank Percentile properties:
- Ranges in (0, 1]
- Higher is better
- Random scoring gets ~0.5 in large line-ups
[Diagram: a line-up ranked by score, with the item watched next receiving RP = 1.0, 0.75, 0.50 or 0.25 depending on its rank]
Note: with large line-ups, ARP is practically equivalent to average AUC.
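The 3-way scoring rule can be sketched as follows; the device and item vectors here are random non-negative stand-ins for a trained model (LDA vectors are non-negative by construction, and LFM vectors are assumed already shifted):

```python
import numpy as np

k = 80  # dimensionality used in the talk's experiments
rng = np.random.default_rng(1)

n_devices, n_items = 5, 10
# Stand-ins for learned, non-negative factors.
V = rng.random((n_devices, k))   # device vectors v_d
W = rng.random((n_items, k))     # item vectors w_i (context or target)

def three_way_score(d, c, t):
    """S(t follows c | d) = sum_j v_d(j) * w_c(j) * w_t(j)."""
    return float(np.sum(V[d] * W[c] * W[t]))

# Rank a line-up of candidate target items for device d in context c.
d, c = 0, 3
lineup = [t for t in range(n_items) if t != c]
ranked = sorted(lineup, key=lambda t: three_way_score(d, c, t), reverse=True)
```

Note how the learned model is untouched: only the scoring step changes, replacing the usual 2-way inner product with an element-wise 3-way product.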
Baselines

Name | Personalized? | Contextual?
General popularity | No | No
Sequential popularity | No | Yes
Temporal popularity | No | Yes
Device popularity* | Yes | No
LFM | Yes | No
LDA | Yes | No

* Only applicable to habitual recommendations.

Contextual Personalized Recommenders

- SequentialLDA [LFM]: 3-way element-wise multiplication of the device vector, the context-item vector and the target-item vector
- TemporalLDA [LFM]: regular LDA/LFM score, multiplied by Temporal Popularity
- TempSeqLDA [LFM]: 3-way score multiplied by Temporal Popularity
All LDA/LFM models are 80-dimensional.

Results (1): Sequential Context Matters

[Bar chart: ARP of LFM and LDA with no context, with the currently watched item as context, and with a random item as context; values shown: 0.642, 0.7123, 0.6175 and 0.6493, 0.7457, 0.5773]
Degradation when using a random item as context indicates that the correct context item reflects the current viewing session, and implicitly the current watchers of the device.

Results (2): Sequential Context Matters

Device Entropy: the entropy of p(topic | device) as computed by LDA on the training data; high values correspond to diverse distributions.

Results (3): Exploratory Setting

[Bar chart: ARP of the baselines and the contextual recommenders in the exploratory setting; values shown: 0.5365, 0.634, 0.642, 0.64926, 0.7123, 0.74572, 0.75878, 0.76202, 0.78168, 0.79725, 0.82809]

Results (4): Habitual Setting

[Bar chart: ARP in the habitual setting; values shown: 0.554, 0.73002, 0.73656, 0.77445, 0.79101, 0.7916, 0.862]

Conclusions

- Multi-user or shared devices pose challenging recommendation problems
- TV recommendations are characterized by two use cases: habitual and exploratory
- Sequential context helps: it "narrows" the topical variety of the program to be watched next on the device
- Intuitively, context serves to implicitly disambiguate the current user or users of the device
- The 3-Way technique is an effective way of incorporating sequential context that has no impact on learning
- Future: explore applications of Hidden Topic Markov Models [Gruber, Rosen-Zvi, Weiss 2007]

Thank You – Questions?
rlempel [at] yahoo-inc [dot] com