Contextual Recommendation in Multi-User Devices
Raz Nissim, Michal Aharon, Eshcar Hillel, Amit Kagian, Ronny Lempel, Hayim Makabee
Recommendation in Personal Devices and Accounts
Challenge: Recommendations in Shared Accounts and Devices

- “I am a 34 yo man who enjoys action and sci-fi movies. This is what my children have done to my netflix account”
Our Focus: Recommendations for Smart TVs

- Smart TVs can track what is being watched on them
- Main problems:
  - Inferring who has consumed each item in the past
  - Who is currently requesting the recommendations
  - “Who” can be a subset of users
Solution: Using Context

- Previous work: time of day
Context in this Work: Current Item Being Watched
This Work: Contextual Personalized Recommendations

WatchItNext problem:
- It is 8:30pm and “House of Cards” is on. What should we recommend to be watched next on this device?
- Implicit assumption: there’s a good chance whoever is in front of the set now will remain there
- Technically, think of an HMM where the hidden state corresponds to who is watching the set, and states don’t change too often
WatchItNext Inputs and Output

[Diagram: input is the set of available programs, a.k.a. the “line-up”; output is a ranked list of recommendations]
Recommendation Settings: Exploratory and Habitual

- One typically doesn’t buy the same book twice, nor do people typically read the same news story twice
- But people listen to the songs they like over and over again, and watch movies they like multiple times as well
- In the TV setting, people regularly watch series and sports events
- Habitual setting: all line-up items are eligible for recommendation to a device
- Exploratory setting: only items that were not previously watched on the device are eligible for recommendation (see the sketch after this list)
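To make the two settings concrete, here is a minimal sketch of the eligibility filtering each setting implies; the function and argument names are hypothetical, not from the paper:

```python
def eligible_items(lineup, watched_on_device, setting):
    """Return the line-up items a recommender may rank for this device."""
    if setting == "habitual":
        # everything currently airing is eligible, including re-watches
        return list(lineup)
    if setting == "exploratory":
        # only items never previously watched on this device
        return [item for item in lineup if item not in watched_on_device]
    raise ValueError(f"unknown setting: {setting}")

# e.g. eligible_items(["news", "cartoon"], {"news"}, "exploratory")
# -> ["cartoon"]
```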
Contextual Recommendations in a Different Context

How can contextualized and personalized recommendations be served together?

[Diagram: Popular, Contextual, Personalized]
Collaborative Filtering

- A fundamental principle in recommender systems
- Taps similarities in patterns of consumption/enjoyment of items by users
- Recommends to a user what users with detected similar tastes have consumed/enjoyed
Collaborative Filtering – Mathematical Abstraction

- Consider a consumption matrix R of users and items, of size |U| x |I|
  - r_{u,i} = 1 whenever person u consumed item i
  - In other cases, r_{u,i} might be person u’s rating of item i
- The matrix R is typically very sparse, and often very large
- Real-life task: top-k recommendation – predict which yet-to-be-consumed items the user would most enjoy
- Related task on ratings data: matrix completion – predict users’ ratings for items they have yet to rate, i.e. “complete” the missing values
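As an illustration of this abstraction, here is a minimal sketch, assuming binary consumption events, of building such a sparse matrix with scipy; the toy data is made up for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

# (user index, item index) consumption events; toy data
events = [(0, 2), (0, 5), (1, 2), (2, 0), (2, 5)]
rows, cols = zip(*events)
n_users, n_items = 3, 6

# r[u, i] = 1 whenever user u consumed item i; all other cells stay 0,
# so only observed events are stored -- R is typically very sparse
R = csr_matrix((np.ones(len(events)), (rows, cols)),
               shape=(n_users, n_items))
print(R.toarray())
```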
Collaborative Filtering – Matrix Factorization

- Latent factor models (LFM):
  - Map both users and items to some f-dimensional space R^f, i.e. produce f-dimensional vectors v_u and w_i for each user and item
  - Define rating estimates as inner products: q_{u,i} = <v_u, w_i>
  - Main problem: finding a mapping of users and items to the latent factor space that produces “good” estimates
- In matrix form: R (|U| x |I|) ≈ V (|U| x f) · W (f x |I|)
- Closely related to dimensionality reduction techniques of the ratings matrix R (e.g. Singular Value Decomposition)
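The slides do not specify a training procedure, so the following is a minimal sketch of one common way to fit such a model: plain stochastic gradient descent on the observed cells of R with L2 regularization. All hyperparameters are illustrative assumptions:

```python
import numpy as np

def train_lfm(events, n_users, n_items, f=80, lr=0.01, reg=0.05, epochs=20):
    """SGD matrix factorization over observed (u, i, r) triples."""
    rng = np.random.default_rng(0)
    V = rng.normal(scale=0.1, size=(n_users, f))   # user vectors v_u
    W = rng.normal(scale=0.1, size=(n_items, f))   # item vectors w_i
    for _ in range(epochs):
        for u, i, r in events:                     # observed cells r_{u,i}
            err = r - V[u] @ W[i]                  # r_{u,i} - <v_u, w_i>
            vu = V[u].copy()                       # keep old v_u for W update
            V[u] += lr * (err * W[i] - reg * V[u])
            W[i] += lr * (err * vu - reg * W[i])
    return V, W

# toy usage: estimate q_{0,3} as V[0] @ W[3]
V, W = train_lfm([(0, 2, 1.0), (1, 2, 1.0), (2, 0, 1.0)],
                 n_users=3, n_items=6)
```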
LFMs Rise to Fame: Netflix Prize

- Used extensively by the Netflix Prize challenge (2006-2009) winners, “BellKor’s Pragmatic Chaos”
Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan 2003]

- Originally devised as a generative model of documents in a corpus, where documents are represented as bags-of-words
- In matrix form: the |D| x |W| matrix of per-document word counts is approximated by V · U, with rows scaled by the document lengths L
  - k is a parameter representing the number of “topics” in the corpus
  - V (|D| x k) is a stochastic matrix: V[d,t] = P(topic_t | document_d), t=1,…,k
  - U (k x |W|) is a stochastic matrix: U[t,w] = P(word_w | topic_t), t=1,…,k
  - L is a vector holding the documents’ lengths (#words per document)
Latent Dirichlet Allocation (cont.)

- In our case: given a parameter k and the collection of devices (=documents) and their viewing history (=bags of shows), output:
  - k “profiles”, where each profile is a distribution over items
  - An association of each device to a distribution over the profiles
- Profiles, hopefully, will represent viewing preferences such as:
  - “Kids shows”
  - “Cooking reality and home improvement”
  - “News and Late Night”
  - “History and Science”
  - “Redneck reality: fishing & hunting shows, MMA”
- A-priori probability of an item being watched on a device:
  Score(item | device) = Σ_{profile=1,…,k} P(item | profile) × P(profile | device)
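As a sketch of this pipeline, the following uses scikit-learn's LDA implementation as a stand-in for whatever implementation the authors used, with devices playing the role of documents and items the role of words, ending with the Score(item | device) computation from the slide; the count matrix is toy data:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# toy device-item count matrix: counts[d, i] = times item i was watched
# on device d (in practice |D| x |I| and very sparse)
counts = np.array([[5, 0, 1, 0],
                   [0, 4, 0, 3],
                   [2, 1, 0, 0]])

k = 2  # number of "profiles" (topics)
lda = LatentDirichletAllocation(n_components=k, random_state=0)

# P(profile | device): per-device topic loadings, rows normalized to 1
p_profile_device = lda.fit_transform(counts)
p_profile_device /= p_profile_device.sum(axis=1, keepdims=True)

# P(item | profile): normalize the topic-word weight matrix row-wise
p_item_profile = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Score(item | device) = sum over profiles of
#   P(item | profile) * P(profile | device)  -> a |D| x |I| score matrix
scores = p_profile_device @ p_item_profile
```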
Contextualizing Recommendations: Three Main Approaches

1. Contextual pre-filtering: use context to restrict the data to be modeled
2. Contextual post-filtering: use context to filter or weight the recommendations produced by conventional models
3. Contextual modeling: context information is incorporated in the model itself
   - Typically requires denser data due to many more parameters
   - Computationally intensive
   - E.g. Tensor Factorization [Karatzoglou et al., 2010]
Main Contribution: “3-Way” Technique

- Learn a standard matrix factorization model (LFM/LDA)
- When recommending to a device d currently watching context item c, score each target item t as follows (see the sketch after this list):
  S(t follows c | d) = Σ_{j=1,…,k} v_d(j) · w_c(j) · w_t(j)
- With LFM, requires an additive shift to all vectors to get rid of negative values
- Results in “SequentialLFM/LDA” – a personalized contextual recommender
- Score is high for targets that agree with both context and device
- Again – no need to model context or change the learning algorithm; learn as usual, just apply the change when scoring
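A minimal sketch of the scoring step, assuming the model vectors are already non-negative (as with LDA, or LFM after the additive shift the slide mentions):

```python
import numpy as np

def three_way_score(v_d, w_c, W_lineup):
    """S(t follows c | d) = sum_j v_d(j) * w_c(j) * w_t(j), computed at
    once for every eligible target item t (the rows of W_lineup)."""
    # element-wise product of device and context vectors, then one
    # matrix-vector product gives the score of every line-up item
    return W_lineup @ (v_d * w_c)

# ranking for display: highest score first
# order = np.argsort(-three_way_score(v_d, w_c, W_lineup))
```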
Data: Historical Viewing Logs

- Triplets of the form (device ID, program ID, timestamp)
- Don’t know who watched the device at that time
- Actually, don’t know whether anyone watched

[Illustration: “Is anyone watching?” over a viewing timeline]
Data by the Numbers

- Training data: three months’ worth of viewership data

  Devices    Unique items*   Triplets
  339,647    17,232          more than 19M

- Test data: derived from one month of viewership data

  Setting       Test Instances   Average Line-up Size
  Habitual      ~3.8M            390
  Exploratory   ~1.7M            349

* Items are {movie, sports event, series} – not at the individual episode level
Metric: Avg. Rank Percentile (ARP)

Rank Percentile properties:
- Ranges in (0,1]
- Higher is better
- Random scores ~0.5 in large line-ups

[Illustration: the position of the actually-watched next item within the ranked line-up determines its RP, from 1.0 at the top of the ranking through 0.75, 0.50, and 0.25]

Note: with large line-ups, ARP is practically equivalent to average AUC (a computation sketch follows)
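A minimal sketch of the rank-percentile computation for a single test instance; averaging it over all test instances gives ARP. The pessimistic tie-breaking here is an assumption, not taken from the paper:

```python
import numpy as np

def rank_percentile(scores, watched_idx):
    """RP of the actually-watched item, given model scores for the line-up.
    Top-ranked item -> 1.0; bottom-ranked -> 1/n; random scoring -> ~0.5."""
    n = len(scores)
    # rank 1 = highest score; items tied with the watched item outrank it
    rank = 1 + np.sum(scores > scores[watched_idx])
    return (n - rank + 1) / n

# ARP = mean of rank_percentile over all test instances
```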
Baselines

Name                     Personalized?   Contextual?
General popularity       No              No
Sequential popularity    No              Yes
Temporal popularity      No              Yes
Device popularity*       Yes             No
LFM                      Yes             No
LDA                      Yes             No

* Only applicable to habitual recommendations
Contextual Personalized Recommenders

- SequentialLDA[LFM]: 3-way element-wise multiplication of device vector, context item and target item
- TemporalLDA[LFM]: regular LDA/LFM score, multiplied by Temporal Popularity
- TempSeqLDA[LFM]: 3-way score multiplied by Temporal Popularity
- All LDA/LFM models are 80-dimensional
Results (1): Sequential Context Matters

[Bar chart: ARP of LFM and LDA under No Context, Currently Watched Context, and Random Item Context; bar values shown: 0.5773, 0.6175, 0.642, 0.6493, 0.7123, 0.7457]

- Degradation when using a random item as context indicates that the correct context item reflects the current viewing session, and implicitly the current watchers of the device
Results (2): Sequential Context Matters

- Device Entropy: the entropy of p(topic | device) as computed by LDA on the training data; high values correspond to diverse distributions

[Chart omitted]
Results (3) - Exploratory Setting

[Bar chart: ARP of the baselines and contextual recommenders in the exploratory setting; bar values shown: 0.5365, 0.634, 0.642, 0.64926, 0.7123, 0.74572, 0.75878, 0.76202, 0.78168, 0.79725, 0.82809]
Results (4) - Habitual Setting

[Bar chart: ARP in the habitual setting; bar values shown: 0.554, 0.73002, 0.73656, 0.77445, 0.79101, 0.7916, 0.862]
Conclusions

- Multi-user or shared devices pose challenging recommendation problems
- TV recommendations are characterized by two use cases – habitual and exploratory
- Sequential context helps – it “narrows” the topical variety of the program to be watched next on the device
  - Intuitively, context serves to implicitly disambiguate the current user or users of the device
- The 3-Way technique is an effective way of incorporating sequential context, with no impact on learning
- Future: explore applications of Hidden Topic Markov Models [Gruber, Rosen-Zvi, Weiss 2007]
Thank You – Questions?
rlempel [at] yahoo-inc [dot] com