Please interrupt me at any point!

Online Advertising
Open lecture at Warsaw University, January 7/8, 2011
Ingmar Weber, Yahoo! Research Barcelona
ingmar@yahoo-inc.com

Disclaimers & Acknowledgments
• This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. or any other entity.
• Algorithms, techniques, features, etc. mentioned here might or might not be in use by Yahoo! or any other company.
• Some of the slides in this lecture are based on slides for “Introduction to Computational Advertising”, given by A. Broder and V. Josifovski at Stanford University. http://www.stanford.edu/class/msande239/

Goals of this Presentation
• Give an overview of the two main types of online advertising: (i) search advertising and (ii) display advertising
• Explain the key technical aspects behind them, with a focus on computational aspects
• This time: more breadth
• Next time: more depth (you tell me where!)

Types of Online Advertising
• Search Advertising (Part 1)
• Display Advertising (Part 2)
• E-mail Advertising
• Classifieds
• Sponsorships
• …

Part 0
Setting the Scene

Different Advertising Objectives
• Brand Advertising: you’re not expected to buy a Rolex watch tomorrow.
• Direct Marketing: tries to cause an (almost) immediate reaction.
• What’s different?

US Online Spending Share by Objective
[Chart: US online spending share by objective]
What’s bigger? Branding or direct response?
Lots of $$$ (or zloty)
• Poland’s state deficit in 2010: ~$11 billion
• Poland’s agriculture GDP: ~$32 billion

Part 1
Search Advertising

The Life of an Ad – Terminology
• “impression” / “pageview”
• “click”
• “click-through rate” (CTR): (# clicks)/(# impressions)
• “landing page”
• “target page”
• “conversion” or “action”
• “conversion rate”: (# conversions)/(# page visits)
• “tracking code”, e.g. <script type="text/javascript" src="http://www.yahoo.com/conversion.js"></script>

Search Advertising
• Advertisements are sold in auctions
  – Advertisers bid on search terms [show live]
• Different payment models
  – CPC (cost per click), the de-facto standard: advertiser pays $X when an ad gets clicked
  – CPA (cost per action), growing in popularity: advertiser pays $Y when a click on an ad leads to a (trans-)action/purchase
  – CPM (cost per mille [page impressions]), used for display ads: advertiser pays $Z for 1,000 ad impressions

Bidding for Search Terms
• Advertisers compete for search terms: “warsaw hotels”, “online advertising”, …
• A click has a different value for different advertisers: it depends on the profit margin and the conversion rate
• Example: MyComputer.com
  – 99% of web site visitors don’t purchase anything
  – 1% buy a computer – conversion rate (from click to transaction)
  – Profit per computer sold: $100
  – Expected profit per visitor: $1 – the value of a single visit/click
• Advertisers bid for a (good) slot in the results. Advertisers need to decide:
  – How much to bid? How would you do it?
• Search engines need to decide:
  – How should slots be assigned? There’s a ranked list of sponsored search results
  – How much should be paid per click?
  – Assumption: higher ranking => more clicks (higher CTR)
• The search engine decides the order/inclusion: slots are assigned to (successful) bidders
• When a user clicks on a sponsored search result, payment is made by the advertiser

How much do people typically pay?
• From $0.01 per click to $100.00 per click

How much does X cost?

Guess the most expensive search term?
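The MyComputer.com arithmetic and the relation between the three payment models can be made concrete in a few lines. This is a minimal sketch: the eCPM helper (expected revenue per 1,000 impressions) is an addition for illustration, being the usual way to put CPC, CPA and CPM prices on one scale, and the CTR used in the usage example is hypothetical.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = (# clicks) / (# impressions)."""
    return clicks / impressions


def conversion_rate(conversions: int, visits: int) -> float:
    """Conversion rate = (# conversions) / (# page visits)."""
    return conversions / visits


def value_per_click(conv_rate: float, profit_per_conversion: float) -> float:
    """Expected profit per visitor (= per click); bidding above it loses money on average."""
    return conv_rate * profit_per_conversion


def ecpm(price: float, model: str, ctr: float = 0.0, conv_rate: float = 0.0) -> float:
    """Expected revenue per 1,000 impressions for a price quoted as CPM, CPC or CPA."""
    if model == "CPM":
        return price                           # already quoted per 1,000 impressions
    if model == "CPC":
        return price * ctr * 1000              # paid once per click
    if model == "CPA":
        return price * ctr * conv_rate * 1000  # paid once per conversion
    raise ValueError(f"unknown pricing model: {model}")


if __name__ == "__main__":
    # MyComputer.com: 1% of visitors buy, $100 profit per computer sold
    v = value_per_click(conversion_rate(1, 100), 100.0)
    print(f"value per click: ${v:.2f}")  # prints "value per click: $1.00"
    # Hypothetical CPC ad: $0.50 per click at a 2% CTR
    print(f"eCPM: ${ecpm(0.50, 'CPC', ctr=0.02):.2f}")
```

The eCPM view is also why a search engine can rank a CPC ad against a CPM ad at all: both are reduced to expected money per impression.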
• Try to guess some expensive keywords
  – Clear (commercial) intent
  – Very high value of a new customer
• Keyword tool
  – Small competition …
• The winner is …
  – Mesothelioma

Exercise
• Build six teams
• Think of terms to bid on (exact match) and corresponding ads. You can choose the target page!
• You’ll get 5 EUR per team to target the US & Canada search market
• Ads will go live around 18h00 today (Friday) and we’ll look at the results tomorrow (Saturday) around 16h00

Exercise
• All ads will run under my account
• All keywords have to be “distinct” (the system doesn’t allow self-competition)
• Keywords are assigned in reverse round-robin fashion (1,2,3,3,2,1,1,2,3,…)
• Max 5 keywords and 1 ad per team
• The team with the largest number of clicks by 16h00 on Saturday wins
• Please, no cheating

Pricing of Ads
• How was it done?
• What was wrong with that?
• How is it done now?
• Does that solve all problems?

Historic Overture Mechanism
Slot assignment by bid order
• Assign the slots in the order of the bid values: higher bid => higher slot
• When a user clicks, you pay your bid value
  – You bid $1.00 per click? You pay $1.00 per click!
• Simple. Intuitive. Used for many years.
• What’s wrong with this?

End of story? – No, because …
Difficult for advertisers to “play” this “game”: there’s no equilibrium!
Scenario:
• Two available ad slots with CTR 5% and 4% respectively
• Three bidders with valuations $20, $18, $10 per click
What happens?
• Bidder 2 bids $10.01 to beat bidder 3 and to get a slot
• Bidder 1 will not pay more than $10.02
• Then bidder 2 bids $10.03
• Then bidder 1 bids $10.04
• … and the fun continues until roughly $11.60, where bidder 2 would rather take the second slot at $10.01 …
• … when it all collapses back to $10.01
Difficult to “play” this game optimally. Potential feeling of “being cheated”.

End of story? – And no, because …
Ads can have different motivations:
• Motivating an action/purchase/click
• Simply placing/marketing a brand
Want to get rid of high-bidding free riders: ebay could afford to bid for every term …
… because no one will click the ad!
• “Buy * on ebay!”, where * = world peace, grandmother, happiness, …
• ebay cares more about page impressions

Addressing the First Problem: Second Price Auction
If only a single slot exists, do the following:
• Assign the slot to the highest bidder.
  Ex: the slot goes to bidder 1, who bid $17.
• Let him pay the second-highest bid.
  Ex: bidder 1 pays $15, bidder 2’s bid.
Theorem (Vickrey ’61): Bidding truthfully is a dominant strategy in this setting. (Cf. stamp auctions, 1878+)

Second Price Auction Explained
[Cartoon: the ad slot is worth €1 to me; whether I bid €0.70, €0.90 or €1.50 instead, I either lose the item or could just as well have bid €1.00.]
Bidding “truthfully” is always best, regardless of what others do.
Only works for a single slot …

Addressing the First Problem: Generalized Second Price Auction
If many slots exist, do the following:
• Assign the slots in (decreasing) order of the bids.
• Let each one pay the next (lower) bid.
Called: generalized second price (GSP) auction.
Is bidding “truthfully” a dominant strategy? Are there any dominant strategies?

Addressing the First Problem: Generalized Second Price Auction
Same scenario again:
• Two available ad slots with CTR 5% and 4% respectively
• Three bidders with valuations $20, $18, $10 per click
What happens if everyone bids truthfully ($20, $18, $10 respectively)?
• Bidder 1: ($20 − $18) × 0.05 = $0.10 profit per page impression
• Bidder 2: ($18 − $10) × 0.04 = $0.32 profit per page impression
• Bidder 3: $0.00 profit per page impression
If bidder 1 bids $11 instead, he drops to the second slot and pays bidder 3’s $10:
• His profit is ($20 − $10) × 0.04 = $0.40 per page impression
Bidding “truthfully” is not a dominant strategy in GSP. In fact, no dominant strategy exists for GSP.

So, still saw-tooth under GSP?
• As long as you bid less than the bid above you, your payment doesn’t change …
• … but the guy above gets charged more.
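A minimal sketch of the GSP rule with plain bid ordering reproduces the profit numbers of the scenario above; with a single slot it reduces to Vickrey’s second price auction. The function names are hypothetical, and no reserve price is modeled: the lowest winner pays 0 if nobody bids below him.

```python
from typing import List, Tuple


def gsp(bids: List[float], ctrs: List[float]) -> List[Tuple[int, float]]:
    """Generalized second price auction with bid ordering.

    Returns one (bidder index, price per click) pair per filled slot, top slot first.
    Each winner pays the next lower bid; the last one pays 0.0 if nobody is below.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    result = []
    for slot in range(min(len(ctrs), len(bids))):
        price = bids[order[slot + 1]] if slot + 1 < len(order) else 0.0
        result.append((order[slot], price))
    return result


def profits(values: List[float], bids: List[float], ctrs: List[float]) -> List[float]:
    """Expected profit per page impression: (value - price per click) * CTR of the slot won."""
    out = [0.0] * len(values)
    for slot, (winner, price) in enumerate(gsp(bids, ctrs)):
        out[winner] = (values[winner] - price) * ctrs[slot]
    return out


if __name__ == "__main__":
    values = [20.0, 18.0, 10.0]  # valuations per click from the scenario
    ctrs = [0.05, 0.04]          # CTR of slot 1 and slot 2
    print(profits(values, values, ctrs))              # everyone bids truthfully
    print(profits(values, [11.0, 18.0, 10.0], ctrs))  # bidder 1 shades his bid to $11
```

Comparing the two printed lists shows the profitable deviation: bidder 1’s profit rises from $0.10 to $0.40 per page impression.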
So:
• Bidder 2 increases his bid to stay just slightly below bidder 1
  – No difference for his position/payment
  – But the payment of bidder 1 goes up
• Bidder 1 can “retaliate” by underbidding bidder 2
  – Bidder 1 now pays less (for a worse slot)
  – Bidder 2 now pays more (for a better slot)
• Bidder 1 and bidder 2 have swapped position and (kind of) bids.
The outcome is “locally envy-free” if these games don’t happen.

Locally Envy-Free Equilibria
“Internet Advertising and the GSP Auction: Selling Billions of Dollars Worth of Keywords”, Edelman et al., 2006
A (pure Nash) equilibrium is locally envy-free if for any rank i:
\( \alpha_i s_{g(i)} - p(i) \ge \alpha_{i-1} s_{g(i)} - p(i-1) \)
• \( \alpha_i \) = CTR at rank i (think “volume”)
• \( s_{g(i)} \) = value per click of the advertiser g(i) in rank i
• \( p(i) \) = cost for rank i
• small i = top of the page = high CTR

Locally Envy-Free Equilibria
Lemma 1: A locally envy-free equilibrium of the GSP game corresponds to a stable assignment.
Stable assignment: nobody wants to swap position and payment with anybody else.
Proof: No swap with positions below, as we have an equilibrium: the advertiser could just undercut the one below to make this swap. Remains to show: no swap with positions (far) above.

Locally Envy-Free Equilibria
Proof (ctd.): Claim: the resulting order is “assortative”, i.e. in decreasing order of the \( s_{g(i)} \):
\( \alpha_i s_{g(i)} - p(i) \ge \alpha_{i+1} s_{g(i)} - p(i+1) \) (equilibrium)
\( \alpha_{i+1} s_{g(i+1)} - p(i+1) \ge \alpha_i s_{g(i+1)} - p(i) \) (envy-free)
Adding both gives \( (\alpha_i - \alpha_{i+1}) s_{g(i)} \ge (\alpha_i - \alpha_{i+1}) s_{g(i+1)} \), so \( s_{g(i)} \ge s_{g(i+1)} \) since \( \alpha_i > \alpha_{i+1} \).

Locally Envy-Free Equilibria
Proof (ctd.): Suppose i wants to go to m < i:
\( \alpha_i s_{g(i)} - p(i) \ge \alpha_{i-1} s_{g(i)} - p(i-1) \)
\( \alpha_{i-1} s_{g(i-1)} - p(i-1) \ge \alpha_{i-2} s_{g(i-1)} - p(i-2) \)
…
\( \alpha_{m+1} s_{g(m+1)} - p(m+1) \ge \alpha_m s_{g(m+1)} - p(m) \)
Each line k reads \( (\alpha_{k-1} - \alpha_k) s_{g(k)} \le p(k-1) - p(k) \). Replace all \( s_{g(k)} \) by \( s_{g(i)} \): by the Claim \( s_{g(i)} \le s_{g(k)} \) for \( k \le i \), and \( \alpha_{k-1} > \alpha_k \), so every line still holds. Then add and cancel. Get:
\( \alpha_i s_{g(i)} - p(i) \ge \alpha_m s_{g(i)} - p(m) \)

Locally Envy-Free Equilibria
Lemma 2: When there are more advertisers than slots, any stable assignment corresponds to a locally envy-free equilibrium of the GSP game.
Could be an empty set … but:
Theorem: Bidding \( b_j = p^{V,(j-1)} / \alpha_{j-1} \) gives a locally envy-free equilibrium with VCG payments. Here \( p^{V,(j-1)} \) is the VCG payment for position j−1.
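The theorem can be checked numerically on the running example. This is a minimal sketch, assuming per-click values are sorted in decreasing order, there are more advertisers than slots, and payments \( p(i) \) are totals per page impression (CTR × price per click). The VCG payment formula used here (each advertiser pays the loss it imposes on those ranked below it) is the standard one; the helper names are hypothetical.

```python
from typing import List


def vcg_payments(values: List[float], ctrs: List[float]) -> List[float]:
    """Total VCG payment per page impression for each slot (top slot first).

    Slot k pays sum over j > k of (alpha_{j-1} - alpha_j) * s_j,
    with alpha = 0 past the last slot. Requires len(values) > len(ctrs).
    """
    n_slots = len(ctrs)
    alpha = list(ctrs) + [0.0]
    return [
        sum((alpha[j - 1] - alpha[j]) * values[j] for j in range(k + 1, n_slots + 1))
        for k in range(n_slots)
    ]


def equilibrium_bids(values: List[float], ctrs: List[float]) -> List[float]:
    """Bids b_j = p_V(j-1) / alpha_{j-1} from the theorem; the top bidder keeps its value."""
    pay = vcg_payments(values, ctrs)
    bids = list(values)  # anyone ranked below n_slots + 1 simply bids its value
    for j in range(1, len(ctrs) + 1):
        bids[j] = pay[j - 1] / ctrs[j - 1]
    return bids


def gsp_total_payments(bids: List[float], ctrs: List[float]) -> List[float]:
    """GSP with bids sorted decreasingly: slot k pays the next bid per click, times its CTR."""
    return [ctrs[k] * bids[k + 1] for k in range(len(ctrs))]


def is_locally_envy_free(values, ctrs, payments, tol=1e-9) -> bool:
    """Check alpha_i s_i - p(i) >= alpha_{i-1} s_i - p(i-1) for every slot below the top."""
    for i in range(1, len(ctrs)):
        stay = ctrs[i] * values[i] - payments[i]
        move_up = ctrs[i - 1] * values[i] - payments[i - 1]
        if stay < move_up - tol:
            return False
    return True


if __name__ == "__main__":
    values, ctrs = [20.0, 18.0, 10.0], [0.05, 0.04]
    bids = equilibrium_bids(values, ctrs)
    print(bids)  # bidder 2 bids about $11.60, bidder 3 bids $10
    print(gsp_total_payments(bids, ctrs), vcg_payments(values, ctrs))
```

With these bids, GSP charges bidder 1 about $11.60 and bidder 2 $10 per click, the same as the VCG payments, and bidder 2 is exactly indifferent between staying and moving up.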
Why is this of little practical relevance?

So, still saw-tooth under GSP?
• At least GSP has equilibria, though not in dominant strategies.
• GSP is “reasonably stable”.
• Payment depends on position, not on bid directly.

“Correct” Generalization of SP: Vickrey-Clarke-Groves Mechanism
• Assume “no ebay”: CTR depends only on the slot
• Assign the slots in bid order … (again)
• Advertiser X has to pay for the loss in (bid × clicks) that he causes the other advertisers:
  \( \text{payment per click of } X = \Big( \sum_{i \ne X} b_i \cdot CTR_i \text{ before X enters the game} \;-\; \sum_{i \ne X} b_i \cdot CTR_i \text{ after X enters} \Big) / CTR_X \)
• Example: … next slide …

“Correct” Generalization of SP: Vickrey-Clarke-Groves Mechanism
Same scenario again:
• 3 advertisers: bids $20, $18, $10 (their valuations)
• Two slots: CTR 5%, CTR 4% [think: 5 clicks, 4 clicks]
• Slots go to bids $20 and $18 respectively. Corresponding payments?
Advertiser 1:
• Without adv. 1, sum over adv. 2 and 3: $18 × 0.05 + $10 × 0.04 = $1.30
• With adv. 1, sum only over adv. 2: $18 × 0.04 = $0.72
Advertiser 2:
• Without adv. 2, sum over adv. 1 and 3: $20 × 0.05 + $10 × 0.04 = $1.40
• With adv. 2, sum only over adv. 1: $20 × 0.05 = $1.00
Payment by advertiser 1: ($1.30 − $0.72)/0.05 = $11.60 (per click)
Payment by advertiser 2: ($1.40 − $1.00)/0.04 = $10.00 (per click)

“Correct” Generalization of SP: Vickrey-Clarke-Groves Mechanism
Theorem: Bidding “truthfully” is a dominant strategy in this mechanism.
• VCG mechanism not used for web advertising!
• Still have the ebay problem …
• Vickrey got the Nobel prize in economics in ’96 (a few days before his death)

Addressing the “ebay Problem”
Slot assignment by revenue order
• Have weights for different advertisers
• Measure the probability of a click (= quality of the ad)
  – \( ctr_{ebay} = 0.001 \), \( ctr_{ingmar} = 0.01 \)
Revenue ordering vs.
bid ordering: 30% more revenue per page impression
• Assign slots in (decreasing) order of \( ctr_i \cdot b_i \) (~ revenue for the search engine)
• Pay the minimum bid needed to stay ahead: \( p_i = ctr_{i+1} \cdot b_{i+1} / ctr_i \)

GSP in Practice
• GSP with revenue ordering used by all major search engines
• But with modifications …
  – minimum price (“reserve price”)
  – number of slots is variable
  – quality of landing page, to avoid frustration
  – positional constraints
  – …

“Putting Nobel Prize-winning theories to work”?
“Google’s unique auction model uses Nobel Prize-winning economic theory to eliminate the winner’s curse – that feeling that you’ve paid too much. While the auction model lets advertisers bid on keywords, the AdWords™ Discounter makes sure that they only pay what they need in order to stay ahead of their nearest competitor.”
http://www.google.com/adsense/afs.pdf

Knowing the Click-Through Rates
• How do we know the click-through rates?
  – Estimated from past performance
• What if a new advertiser arrives? What is the problem?
  – If we show his ads, we lose the chance to show other good ads.
  – If we don’t show his ads, we might not discover a new high-performing ad.
• Solution: Explore-Exploit

Multi-Armed Bandits
[Figure: three slot machines. First, explore! Observed payoffs $1, $3, $2 (expect $2); $10, $4, $10 (expect $8); $6, $4, $8 (expect $6). Now, exploit!]

Multi-Armed Bandits
• Set of K bandits, i.e. real distributions \( B = \{R_1, \dots, R_K\} \)
• \( \mu_k = \mathrm{mean}(R_k) \), \( \mu^* = \max_k \mu_k \)
• Game is played for H rounds
• Regret: \( \rho(H) = H \mu^* - \sum_{t=1}^{H} r_t \), where \( r_t \) is the (random) reward at time t
• Want \( \rho(H)/H \to 0 \) with probability 1 as \( H \to \infty \)
Suggestions?

Multi-Armed Bandits
Epsilon-greedy strategy: the currently best bandit is selected for a fraction \( 1 - \epsilon \) of the rounds, and a bandit selected uniformly at random for a fraction \( \epsilon \).
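A minimal epsilon-greedy sketch for the CTR setting, assuming each arm is an ad and `pull(arm)` returns 1.0 for a click and 0.0 otherwise (any reward function works). Playing every arm once before the epsilon-greedy phase and the running-average update are implementation choices, not part of the slide; the CTRs in the usage example are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def epsilon_greedy(pull: Callable[[int], float], n_arms: int, rounds: int,
                   epsilon: float = 0.1, seed: int = 0) -> Tuple[List[int], List[float], float]:
    """Epsilon-greedy multi-armed bandit.

    pull(arm) returns the (random) reward for playing `arm`.
    Returns (number of plays per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(rounds):
        if t < n_arms:
            arm = t                                          # initialization: try every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)                      # explore: uniformly random arm
        else:
            arm = max(range(n_arms), key=means.__getitem__)  # exploit: currently best arm
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]    # running average of observed rewards
        total += reward
    return counts, means, total


if __name__ == "__main__":
    true_ctrs = [0.02, 0.05, 0.03]  # hypothetical click probabilities of three ads
    clicks = random.Random(42)
    counts, est, total = epsilon_greedy(
        lambda arm: 1.0 if clicks.random() < true_ctrs[arm] else 0.0,
        n_arms=3, rounds=100_000, epsilon=0.1)
    print("plays per ad:", counts)
    print("estimated CTRs:", [round(m, 4) for m in est])
```

With a fixed \( \epsilon \), a constant fraction of the rounds keeps going to randomly chosen arms, so \( \rho(H)/H \) does not go to 0; letting \( \epsilon \) shrink over time (e.g. roughly proportional to 1/t) is a common fix.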
• Restless bandit problem – distributions change
• Arm-acquiring bandit – new bandits arrive

Practical CTR Complications
• CTR also depends on the presence/absence of other ads
• And on what the user has seen in the past
• And on the quality of the search results
• Should we show the worst search results so that users are “desperate” and click the ads?

Fraud
• Click fraud
  – On an opponent’s paid search results (10%–20%)
  – On the contextual ads of your homepage
• Impression fraud
  – Give your opponent a lower CTR
  – Lowers the amount you’ll have to bid
• Other kinds?
• What should search engines do?
  – All search engines refrain from billing for fraudulent clicks
  – See the case “Lane’s Gifts v. Google”

Does CPA Solve Fraud?
• Click fraud no longer works: you only get charged for “actions”, aka conversions.
• End of story?
  – Now advertisers can cheat by underreporting conversions. Can Y!/G trust advertisers?
  – Have to hand over monitoring to the search engine. Can advertisers trust Y!/G?
  – Very, very sparse data to derive estimates. Hard for Y!/G to make optimal decisions.

Mobile Sponsored Search
• Mobile devices offer more context
  – Location
  – More short-term needs -> more monetizable
• More focused user attention
  – Can’t just open another tab while loading
• More positive associations
  – People tend to feel “closer” to their mobile

Summary of Part 1
• Search advertising is a multi-billion dollar business
• Allows very targeted advertising
• Fair payment model: you only pay for clicks (CPC)
• How much you pay depends on
  – Your bid
  – The fraction of people clicking your ad (CTR)
• Payment reasonably stable and “gaming” is difficult
• Practical problems such as learning CTRs and avoiding click fraud

Exercise
• 6 teams …

Part 2
Display Advertising

Historical Note: Banners
• Banners seem to be the oldest standard format in use
• According to Wikipedia, the first banner ad ever was sold in 1993 by Global Network Navigator (GNN) to Heller, Ehrman, White & McAuliffe, a law firm popular in Silicon Valley.
• GNN was a popular pre-Yahoo! directory, eventually sold to AOL in 1995
• Heller Ehrman White & McAuliffe was founded in 1890 and went bankrupt in 2008. In 1929 they negotiated the financing of the Bay Bridge.

Display Advertising
• Usually sold on a CPM basis
• Guaranteed delivery (GD): e.g. deliver 30 million impressions on finance.yahoo.com in Feb ’11
  – Typically large, “premium” campaigns
• Non-guaranteed delivery (NGD): sold in auctions on the spot market at varying prices
  – Typically smaller, ad-hoc campaigns

How much does it cost?

Components of a GD System
1. Forecast supply and demand
   – How many users will visit a page in a certain period?
2. Forecast NGD pricing
   – How much could we get on the spot market?
3. Admission control & pricing
   – 30m impressions in July 2011 on sports.yahoo.com: should we accept the contract? Can we meet the guarantee? What price should we charge? How are other contracts impacted?
4. “Optimal” allocation of impressions to active contracts
   – What is the objective function?
   – Cannot re-run after every impression due to scalability.
   – “Simple” (stochastic) packing problem?
   – Demand (long term) depends on the quality of the allocation!
   – “females, 30-50, high income” is more valuable than “teenage drop-outs”
   – Cannot use only low-value impressions to satisfy a contract
5. Ad serving

Optimal Allocation
• Optimal allocation
  – Maximize a stated objective function subject to supply and demand constraints
• What objective?
  – Value of the remaining inventory? – Good for the publisher
  – Maximize quality? – Good for the advertiser
• Need to balance utilities: publisher, advertiser, user, & network!

Representative Allocations
A. Ghosh et al., “Randomized Bidding for Maximally Representative Allocation”, Yahoo! Research Technical Report 2008-003
• Unless the targeting is very fine-grained, there is a wide spectrum of quality of impressions matching a typical contract
• Contract says: male, US, auto interests. What should be supplied to this contract?
  – Is it OK to supply 100% 15-year-old males, daydreaming about cars, with weekly allowances of $25?
  – The advertiser probably wants/expects a representative sample of the car-buying US male population

Publisher’s Potential Strategies
Assume the publisher has just one GD contract.
• Suboptimal strategy:
  – First deliver all impressions to the contract
  – Only after the contract is met, sell in the spot market
  – Why suboptimal? Bad for the publisher, because some of the GD pageviews may fetch a lot more money on the spot market than the contract value
• Better strategy
  – Put up every pageview on auction (as a seller)
  – Also place a bid on it for the contract (as a buyer)
  – Value determined by the probability & penalty of not fulfilling the contract

Publisher-Optimal Bid Strategy
• If the target is 30 million, place the smallest constant bid in each round so that exactly 30 million pageviews are won
• All excess inventory will be sold to someone else (not the GD contract) at a higher price
• “Unfair” to the GD contract
  – All impressions delivered are of low value
    • 2 a.m. viewers
    • viewers from poor neighborhoods
    • basically, viewers nobody wanted!

Volume vs. Price of Winning Bids on the Spot Market
[Figure: price density over price p; the volume of impressions sold at price p is proportional to the density.]
The price on the spot market is used as a proxy for the “quality” of an impression.

Publisher-Optimal Volume
[Figure: price density with an arrow] Find the position for the arrow such that the area before the arrow = d (the GD advertiser gets the cheapest stuff).

Advertiser-Optimal Volume
[Figure: price density with an arrow] Find the position for the arrow such that the area after the arrow = d (the GD advertiser gets the most expensive stuff).

Compromises
• The GD contract could get half of the bottom stuff and half of the top stuff
• More fine-grained:
  – Of the supply selling at every price, give a d/s fraction to the GD contract.
  – Then the price distribution in GD mirrors the intrinsic distribution in the total supply.
  – The objective function must penalize deviation from this ideal.
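The publisher-optimal, advertiser-optimal and representative (d/s at every price) allocations can be compared on a discretized price density. This is a minimal sketch, assuming the density is given as a histogram of supply per price bucket, sorted from cheapest to most expensive; the spot-market value of the impressions handed to the contract is what the publisher gives up. The bucket prices and volumes in the usage example are made up for illustration.

```python
from typing import List


def allocate(supply: List[float], d: float, mode: str) -> List[float]:
    """Split contract demand d across price buckets sorted from cheapest to most expensive.

    supply[k] = impressions that would sell at price bucket k on the spot market.
    'publisher':      fill d from the cheapest buckets (area before the arrow = d)
    'advertiser':     fill d from the most expensive buckets (area after the arrow = d)
    'representative': give the same fraction d/s of every bucket
    """
    s = sum(supply)
    if d > s:
        raise ValueError("demand exceeds supply")
    if mode == "representative":
        return [x * d / s for x in supply]
    if mode == "publisher":
        order = list(range(len(supply)))
    elif mode == "advertiser":
        order = list(reversed(range(len(supply))))
    else:
        raise ValueError(f"unknown mode: {mode}")
    alloc = [0.0] * len(supply)
    remaining = d
    for k in order:
        take = min(supply[k], remaining)
        alloc[k] = take
        remaining -= take
    return alloc


def spot_value(alloc: List[float], prices: List[float]) -> float:
    """Spot-market value of the impressions given to the GD contract."""
    return sum(a * p for a, p in zip(alloc, prices))


if __name__ == "__main__":
    prices = [1.0, 2.0, 4.0, 8.0]      # hypothetical CPM of each price bucket
    supply = [40.0, 30.0, 20.0, 10.0]  # hypothetical impressions (millions) per bucket
    for mode in ("publisher", "advertiser", "representative"):
        alloc = allocate(supply, 30.0, mode)
        print(mode, alloc, spot_value(alloc, prices))
```

In this made-up example the representative split lands between the two extremes in spot value, which is exactly the trade-off the objective function has to price.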
Problem Setting
• Assume the publisher knows the distribution of the external winning bid on the spot market
• Notation
  – p = price (winning bid)
  – f(p) = price density: the highest bid is drawn i.i.d. from f
  – s = total supply (inventory) of impressions
  – d = demand (GD volume) for the contract
  – t = target spend per impression (budget)
• d/s is the fraction of the total supply that needs to be delivered to the (unique!) contract

Find an Allocation a(p)
• a(p)/s = fractional allocation to GD at price p, that is:
  – There are \( s f(p)\,dp \) impressions available at price p (or rather in the interval \( [p, p+dp) \))
  – The GD contract gets \( \frac{a(p)}{s} \cdot s f(p)\,dp = a(p) f(p)\,dp \) impressions at price p
• Ideal: a(p)/s = d/s for all p
• Objective: stay close to this ideal; u measures the distance

Allocation Constraints
• a() is not assumed continuous a priori
• Deliver exactly the contract volume: \( \int a(p) f(p)\,dp = d \), with \( 0 \le a(p) \le s \)
• If indeed a(p)/s = d/s for all p, this constraint is satisfied!

Allocation Constraints
• \( \int p\, a(p) f(p)\,dp \) = the dollar amount “lost” due to meeting the contract. So we must have \( \int p\, a(p) f(p)\,dp \le t\, d \)
• Recall t = the average budget per impression: the publisher does not get more than this per impression from the contract.

Final Optimization Problem
Minimize over a() the distance u between a(p)/s and d/s, subject to the two allocation constraints above.
No solution if t (cost per impression) is too small.

Possible Distance: Kullback-Leibler Divergence
• K-L divergence can also be defined between two nonnegative (not necessarily normalized) functions

K-L Optimization Problem
Minimize over a() the K-L divergence between a(p)/s and d/s, subject to the same constraints.
Parameter t governs the revenue-fairness trade-off.

Bidding Strategy
• Now we have found an optimal allocation
  – At price p, give fraction a(p)/s to GD
• How can we implement the optimal allocation a(p) in the auction environment?
  – We have to bid randomly
  – Bidding the same amount each round is suboptimal

Stochastic Bidding
• Recall a(p)/s is the fraction of the supply available at price p that should be won for GD
• At price p, what fraction of the supply will be won for GD?
• Fraction won = \( \Pr\{\text{GD bid} > p\} = 1 - H(p) \)
  – H(p) is the GD bid distribution (cdf)
  – a(p)/s = 1 − H(p)
• Get a(p)/s from the optimization, convert to H(p)
  – Requires a(p) to be non-increasing
• Enter the auction with probability a(0)/s

Targeting
• Which ads could be shown on a page via the spot market?
• Only they participate in bidding for the impressions.

Contextual Targeting
How would you do it?
[Images of misplaced ads, taken from: http://tutorialfreakz.com/30-misplaced-ads/]

Demo
• Show textual ads
• Also sold on a CPC basis
• Which “queries” should be triggered by a page?

Phrase Extraction for Contextual Advertising
“Finding Advertising Keywords on Web Pages”, Yih et al., 2006
• Goal: given a page, find phrases that are good for placing ads
• Reverse search problem: given a page, find the queries that would match (summarize) the content of this page
• Select ads based on a single selected keyword:
  – Contextual advertising translated into the database approach of sponsored search
  – Reuse of the sponsored search infrastructure – lower cost
  – Ad networks earn less per impression in CA
    • Lower click-through rates (high variance)
    • Lower conversion (less clear intent)
    • Revenue share with the publisher

System Architecture
Input: web page
1. Preprocessor: process HTML -> text
2. Candidate Selector: generate candidates = candidate bid phrases
3. Classifier: score the candidates (machine learning?)
4. Postprocessor: combine scores -> probability of being “useful”
Output: bid phrases

1. Preprocessor
• Translate HTML into plain text
• Preserve the blocks in the original document
• Preserve info about outgoing anchor text, meta tags
• Open source HTML parser for scraping – BeautifulSoup
• Part-of-speech (POS) tagger – record the type of the word
• Chunker – detecting noun phrases

2. Candidate Selection
• All phrases of length up to 5 (including single words)
  – Within a single page block (sentence)
• Two dimensions of candidate selection:
  – Individual occurrences extracted separately vs.
combining all occurrences into one entry per page (separate vs. combined)
  – Keep phrases or break them up into individual words
• Label individual words with their relationship to a phrase (if phrases are broken up):
  – Beginning of a phrase
  – Inside a phrase
  – Last word of a phrase
  – …

3. Classifier
• Given a phrase, predict whether it is a “keyword” usable for selecting ads
  – “adverse effects of coffee” vs. “sat down on breakfast table”
• For the whole phrase: a single binary classifier
  – Logistic regression model \( P(Y=1 \mid x) = 1/(1 + e^{-w \cdot x}) \)
  – x is the vector of features of a given phrase
  – w is a vector of importance weights learned from the training set
• Decomposed: multi-label classifier (B, I, L, …)
  – \( P(Y_i = 1 \mid x) = e^{x \cdot w_i} / \sum_j e^{x \cdot w_j} \)

3. Classifier: Features
• Linguistic features: is a noun; is a proper name; is a noun phrase; are all words in the phrase of the same type
• Capitalization: any/all/first word capitalized
• Section-based features:
  – Hypertext: is the feature extracted from anchor text
  – Title, meta tags, URL
• IR features: tf, idf, log(tf), log(idf), sentence length, phrase length, relative location in the document
• Query log features: log(phrase frequency), log(first/second/interior word frequency)
• Feature reconciliation
  – Binary features: OR of all occurrences
  – Real-valued features: min
• Which are most important?

4. Postprocessor
• Score reconciliation: take the instance with the highest score
• Separate words -> phrase probability:
  – p1 = probability of a phrase: product of the confidences of the classification of each term
  – p0 = probability of all the words of the phrase being outside a keyword
  – score = p1/(p1 + p0)

Experiments: Data
• 828 pages
  – Indexed by MSN
  – Have ads
  – In the Internet Archive
  – One page per domain
  – Eliminate foreign and adult pages
• Editors (8) instructed to seek highly prominent keywords with advertising potential

Experiments: Metrics
• Editorial judgments
• Precision-recall might be too difficult
  – Too long for the judges to find all the relevant phrases
  – Showing them candidate phrases would influence the judges
• A proxy for precision-recall
  – top-1 = whether the top-1 result is in the list selected by the editor, counted across the set of pages
  – top-10 = % of top-10 results in the editor set, averaged over the set of pages

Experiments: Results
Best performance for combining occurrences and not breaking phrases up into words.

Demographic Targeting
[Image taken from: http://realblogging.com/christine-wade/targeted-ad-on-facebook-test-and-the-results/]

A Glimpse at my Own Work
http://clues.yahoo.com

Behavioral Targeting
“[…] for instance, if a visitor has a recent history of researching SUVs and is a regular visitor of Yahoo! Music, Yahoo! BT will have the insights to serve up a relevant SUV ad while the visitor is browsing the Yahoo! Music homepage.”
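Returning to the classifier and postprocessor steps of the keyword-extraction pipeline described earlier: the scoring formulas can be sketched in a few lines. This is a minimal illustration, not the implementation of Yih et al.; the label names B, I, L come from the slides, while "O" for “outside a keyword” and all function names are assumptions, and trained weights are not shown.

```python
import math
from typing import Dict, List, Sequence


def logistic(w: Sequence[float], x: Sequence[float]) -> float:
    """Whole-phrase classifier: P(Y=1 | x) = 1 / (1 + exp(-w . x))."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # same value, written to avoid overflow for very negative z
    return e / (1.0 + e)


def softmax_labels(weights: Dict[str, Sequence[float]], x: Sequence[float]) -> Dict[str, float]:
    """Decomposed classifier: P(Y_i = 1 | x) = exp(x . w_i) / sum_j exp(x . w_j)."""
    scores = {lab: sum(wi * xi for wi, xi in zip(w, x)) for lab, w in weights.items()}
    top = max(scores.values())
    exps = {lab: math.exp(v - top) for lab, v in scores.items()}  # shift by max for stability
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}


def phrase_score(word_probs: List[Dict[str, float]], labels: List[str]) -> float:
    """Postprocessor: score = p1 / (p1 + p0).

    word_probs[k] maps each label to the classifier's probability for word k;
    labels spells out the candidate phrase, e.g. ["B", "I", "L"].
    p1 = product of the confidences of that labelling,
    p0 = product of the probabilities that each word is outside a keyword ("O").
    """
    p1 = math.prod(probs[lab] for probs, lab in zip(word_probs, labels))
    p0 = math.prod(probs["O"] for probs in word_probs)
    return p1 / (p1 + p0)


if __name__ == "__main__":
    # Hypothetical per-word label probabilities for a two-word candidate phrase
    words = [{"B": 0.6, "I": 0.1, "L": 0.1, "O": 0.2},
             {"B": 0.1, "I": 0.2, "L": 0.5, "O": 0.2}]
    print(round(phrase_score(words, ["B", "L"]), 3))
```

Normalizing p1 against p0 only (rather than against every possible labelling) matches the slide’s definition: the score compares “this is the phrase” with “none of these words belongs to a keyword”.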
Summary of Part 2
• Display ads are usually less targeted than search ads
• Translates to lower CTRs
• Ads sold in contracts (GD) and on the spot market (NGD)
• Different targeting options
• Need lots of user data for good targeting
  – Yahoo!, Google, Facebook, …

Part 3
Afterthoughts

Banner Blindness
• People learn to ignore ads …
  – … even when they are highly relevant
  – “Banner Blindness: The Irony of Attention Grabbing on the World Wide Web”, Benway ’98
• Danger of falling CTRs due to overexposure
  – Might be beneficial to show less advertising

Search Result Bidding
• In current sponsored search systems, advertisers bid on query terms
• Could also bid on the search results
  – Show my ad whenever abc.com is returned
  – Show my ad whenever xyz appears in a snippet
• Why could this be useful?

Next time, Feb 25/26, 2011
• This time I focused on breadth
• Next time I’ll focus on depth
• Which topics did you find most interesting?
• Do you want more theory? More of an “economic overview”? More hands-on insights? More academic papers?

Paid Summer Internships at Y! Research Barcelona
• Cool location – best beach city in the world (National Geographic)
  http://travel.nationalgeographic.com/travel/top-10/beach-cities-photos/
  http://www.travelandleisure.com/articles/10-best-city-beaches-in-the-world
• Cool colleagues – international, dynamic, open environment
• Cool data – search, mail, toolbar, finance, Flickr, …
• Cool projects – the goal is *always* to publish at top venues
Deadline: JANUARY 15
http://barcelona.research.yahoo.net/internships

Dziękuję! (Thank you!)
ingmar@yahoo-inc.com
http://www.couchsurfing.org