Scott Wen-tau Yih & Chris Meek Microsoft Research Keyword-driven Online Advertising Sponsored Search Ads with bid keywords that match the query Contextual Advertising (keyword-based) Ads with bid keywords that are relevant to the content To deliver relevant ads leads to problems related to phrase relevance measures. query flight to kyoto Are these ads relevant to the query? How relevant are the keywords behind the ads? Given a document d and a phrase ph, we want to measure whether ph is relevant to d (e.g., p(ph|d)) Applications – judging ad relevance Sponsored search (query vs. ad landing page) Ad relevance verification Whether a keyword/query is relevant to the page Contextual advertising (page vs. bid keyword) External keyword verification Whether the new keyword is relevant to the content page For in-document phrases, we can use keyword extractor (KEX) directly [Yih et al. WWW-06] Machine Learning model learned by logistic regression Use more than 10 categories of features e.g., position, format, hyperlink, etc. TrueCredit Digital Camera Review truecredit 0.879 Getnew immediate to The flagshipaccess of Canon’s your complete creditS80 report S-series, PowerShot from 3camera, credit bureaus. Just digital incorporates per month, including 8$14.95 megapixels for shooting $25K ID Theft still images andinsurance. a movie Contact mode thatTransUnion records an for more detail… 1024 x 768 pixels. impressive transunion 0.705 creditdocument? bureaus 0.637 What if the phrase is NOT in the KEX id theft … 0.138 Given a document d and a phrase ph that is not in d Estimate the probability that ph is relevant to d TrueCredit Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion for more detail… truecredit 0.879 transunion 0.705 credit bureaus 0.637 id theft … 0.138 credit bureau report ? credit report services ? equifax credit bureau ? equifax credit report ? exquifax ? equfax ? trans union canada … ? Given a document d and a phrase ph that is not in d Estimate the probability that ph is relevant to d Challenges How do we measure it? Lack of contextual information that in-doc phrases have Consistent with the probabilities of in-doc phrases May need some methods to calibrate probabilities Calibrated cosine similarity methods Treat in-doc and out-of-doc phrases equally Map cosine similarity scores to probabilities Regression methods based on semantic kernels Given robust in-doc phrase relevance measures Predict out-of-doc phrase relevance using similarity between the target phrase and in-doc phrases Regression methods achieve better empirical results Introduction Relevance measures using cosine similarity Out-of-doc phrase relevance measure using Gaussian process regression Experiments Conclusions Step 1: Estimate sim(d,ph) → R Represent d as a sparse word vector Words in document d, associated with weights Vec(d) = {‘truecredit’,0.9; ‘transunion’,0.7; ‘access’,0.1; … } Represent ph as a sparse word vector via query expansion Issue ph as a query to search engine; let the result page be document d’ Vec(ph) ← Vec(d’) sim(d,ph) = cosine(Vec(d),Vec(ph)) Choices of term-weighing schemes Bag of words (SimBin), TFIDF (SimTFIDF) Keyword Extraction (SimKEX) Step 2: Map sim(d,ph) to prob(ph|d) Via a sigmoid function where the weights are pre-learned [Platt ’00] sim (d , ph) f log 1 sim (d , ph) prob( ph | d ) 1 1 exp( f ) The sigmoid function can be used to combine multiple relevance scores 1 prob( ph | d ) m 1 exp( i i f i ) SimCombine: Combine SimBin, SimTFIDF & SimKEX Introduction Relevance Measures using cosine similarity Out-of-doc phrase relevance measure using Gaussian process regression Experiments Conclusions TrueCredit Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion… Relevant in-doc phrases: TrueCredit, TransUnion Out-of-doc phrases: credit bureau report vs. Olympics Which out-of-doc phrase is more relevant? Step 1: Estimate probabilities of in-doc phrases KEX(d) = {(‘truecredit’,0.88),(‘transunion’,0.71), (‘credit bureaus’,0.64), (‘id theft’,0.14)} Step 2: Represent each phrase as a TFIDF vector via query expansion x1=Vec(‘truecredit’), y1=0.88; x2=Vec(‘transunion’), y2=0.71 x3=Vec(‘credit bureaus’), y3=0.64; x4=Vec(‘id theft’), y4=0.14 Step 3: Represent the target phrase ph as a vector x =Vec(ph), y=? Step 4: Use a regression model to predict y Input: (x1, y1), …, (xn, yn) and x Output: y We don’t specify the functional form of the regression model Instead, we only need to specify the “kernel function” k(x1, x2): linear kernel, polynomial kernel, RBF kernel, etc. Conceptually, kernel function tells how similar x1 & x2 are Changing kernel function changes the regression function Linear kernel → Bayesian linear regression (x1,y1), (x2,y2), …, (xn,yn) GPR x kernel function e.g., k(xi,xj) = xi·xj y y k Τ (K n2 I ) 1 y O(N3) from matrix inversion, where N≤20 typically Introduction Relevance Measures using cosine similarity Out-of-doc phrase relevance measure using Gaussian process regression Experiments Conclusions From sponsored search ad-click logs (3-month period in 2007) Randomly select 867 English ad landing pages Each page is associated with the original query and ~10 related keywords (from internal query suggestion algorithms) Labeled 9,319 document-keyword pairs 4,381 (47%) relevant; 4,938 (53%) irrelevant Most keywords (81.9%) are out-of-document 10-fold cross-validation when learning is used Accuracy Quality of binary classification False positive and false negative are treated equally AUC (Area Under the ROC curve) Quality of ranking Equivalent to pair-wise accuracy Cross Entropy Quality of probability estimations -log2[p(ph|d)] if ph is labeled relevant to d -log2[1-p(ph|d)] if ph is labeled irrelevant to d Better 0.704 7 6 0.681 5 4 0.654 3 0.663 2 0.651 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Better 0.773 7 6 0.752 5 4 3 0.726 2 0.726 0.702 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Better 0.835 7 6 0.864 5 4 3 0.882 2 0.887 0.939 1 0 0.2 0.4 0.6 0.8 1 Phrase relevance measure is a crucial task for online advertising Our solution: similarity & regression based methods Consistent probabilities for out-of-doc phrases Similarity-based methods Simple and straightforward The combined approach can lead to decent performance Regression-based methods Achieved the best results in our experiments Quality depends on the in-doc relevance estimates & kernel Future Work – More machine learning techniques SimCombine An ML method using basic similarity measures as features Explore more features (e.g., query frequency, page quality) Other machine learning models Gaussian process regression Learning a better kernel function Kernel meta-training [Platt et al. NIPS-14] Maximum likelihood training