Please interrupt me at any point!
Online Advertising
Open lecture at Warsaw University
January 7/8, 2011
Ingmar Weber
Yahoo! Research Barcelona
Disclaimers & Acknowledgments
• This talk presents the opinions of the author. It
does not necessarily reflect the views of Yahoo!
Inc. or any other entity.
• Algorithms, techniques, features, etc. mentioned
here might or might not be in use by Yahoo! or
any other company.
• Some of the slides in this lecture are based on
slides for “Introduction to Computational
Advertising”, given by A. Broder and V. Josifovski
at Stanford University.
Goals of this Presentation
• Give an overview of the two main types of
online advertising: (i) search advertising
and (ii) display advertising
• Explain the key technical aspects behind them,
with a focus on computational aspects
• This time: more breadth
• Next time: more depth (you tell me where!)
Types of Online Advertising
Search Advertising
Display Advertising
E-mail Advertising
Part 1
Part 2
Part 0
Setting the Scene
Different Advertising Objectives
Brand Advertising
You’re not expected to
buy a Rolex watch
What’s different?
Direct Marketing
Tries to cause an
(almost) immediate
US Online Spending share by objective
What’s bigger?
Branding or direct response?
Lots of $$$ (or zloty)
Poland’s state deficit in 2010: ~$11 billion
Poland’s agriculture GDP: ~$32 billion
Part 1
Search Advertising
The Life of an Ad - Terminology
“click-through rate”:
(# clicks)/(# impressions)
“landing page”
“target page”
“conversion” or “action”
“conversion rate”:
(# conversions)/(# page visits)
“tracking code”
<script type="text/javascript"
Search Advertising
Advertisements are sold in auctions
Advertisers bid on search terms [show live]
Different payment models
de-facto standard
CPC (cost per click)
Advertiser pays $X when an ad gets clicked
CPA (cost per action)
growing popularity
Advertiser pays $Y when a click on an ad leads to a
CPM (cost per mille [page impressions]) used for display ads
Advertiser pays $Z for 1,000 ad impressions
Bidding for search terms
Advertisers compete for search terms
“warsaw hotels”, “online advertising”, …
A click has a different value for different advertisers
Value of a click depends on profit margin and conversion rate
99% of web site visitors don’t purchase anything
1% buy a computer
Profit per computer sold $100
Expected profit per visitor $1 – value of a single visit/click
Advertisers need to decide:
* How much to bid per click? $0.01 per click: would you do it?
Search engine needs to decide:
* How should slots be assigned?
* The order/inclusion in the ranked list of sponsored search results
What’s the most expensive search term?
Slots are assigned to (successful) bidders
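The arithmetic of the computer-shop example (1% of visitors buy, $100 profit per sale) can be sketched in a few lines. The function name `value_per_click` is made up for this illustration; the numbers are the ones from the slide.

```python
def value_per_click(conversion_rate: float, profit_per_conversion: float) -> float:
    """Expected profit generated by a single click (one site visitor)."""
    return conversion_rate * profit_per_conversion


# Numbers from the slide: 1% of visitors buy, $100 profit per computer sold.
value = value_per_click(conversion_rate=0.01, profit_per_conversion=100.0)

# Paying more per click than this value loses money on average,
# so the value acts as an upper bound on a sensible CPC bid.
print(f"Value of a single click: ${value:.2f}")
```

A bid of $0.01 per click would be far below this bound. Whether such a bid ever wins a slot depends on the competing bids.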
When a user clicks on a sponsored search result …
… payment is made by the advertiser
How much do people typically pay?
How much does X cost?
• Try to guess some expensive key words
– Clear (commercial) intent
– Very high value for new customer
• Keyword tool
– Small competition …
• The winner is …
– Mesothelioma
• Build six teams
• Think of terms to bid on (exact match) and
corresponding ads. You can choose the
target page!
• You’ll get 5 EUR per team to target the
US&Canada search market
• Ads will go live around 18h00 today
(Friday) and we’ll look at the results
tomorrow (Saturday) around 16h00
• All ads will run under my account
• All keywords have to be “distinct” (system
doesn’t allow self-competition)
• Assigned in reversing round robin fashion
• Max 5 key words and 1 ad per team
• The team with the largest number of clicks
by 16h00 on Saturday wins
• Please, no cheating
Pricing of Ads
How was it done?
What was wrong with that?
How is it done now?
Does that solve all problems?
Historic Overture mechanism
Slot assignment by bid order
Assign the slots in the order of the bid values
higher bid => higher slot
When a user clicks, you pay your bid value
You bid $1.00 per click? - You pay $1.00 per click!
Simple. - Intuitive. - Used for many years.
What’s wrong with this?
End of story? – No, because …
Difficult for advertisers to “play” this “game”:
There’s no equilibrium!
• Two available ad slots with CTR 5% and 4% respectively
• Three bidders with valuations $20, $18, $10 per click
What happens?
Bidder 2 bids $10.01 to beat Bidder 3 and to get a slot
Bidder 1 will not pay more than $10.02
Then bidder 2 bids $10.03
Then bidder 1 bids $10.04
… and the fun continues until $14
… when it all collapses back to $10.01
End of story? – And no, because …
Ads can have different motivations
– Motivating an action/purchase/click
– Simply placing/marketing a brand
Want to get rid of high-bidding free riders.
ebay could afford to bid for every term …
... because no one will click the ad!
“Buy * on ebay!”
* = world peace, grandmother, happiness, …
ebay cares more about page impressions
Addressing the first problem:
Second price auction
If only a single slot exists, do the following:
Assign the slot to the highest bidder.
Ex: Slot goes to Bidder 1 who bid $17.
Let him pay the second highest bid.
Ex: Bidder 1 pays $15, Bidder 2’s bid.
Theorem (Vickrey ‘61): Bidding truthfully is a
dominant strategy in this setting.
(cf. stamp auctions 1878+)
Second Price Auction Explained
This ad slot is worth
€1 to me.
He’s “lying”.
I bid €0.80!
Pays €0.70.
Your title here
Your cool ad
text goes here.
I bid €0.70!
Bidding “truthfully” is always best.
Regardless of what others do.
Only works for a single slot …
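A minimal sketch of the single-slot second price auction, to check the "truthful is best" claim on the numbers above. The function names and the bidder name "me" are made up for this illustration, and ties between equal bids are not handled specially.

```python
def second_price_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, price): highest bidder wins, pays the second highest bid."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


def utility(value: float, my_bid: float, other_bids: dict[str, float]) -> float:
    """Profit of bidder 'me' with the given true value when bidding my_bid."""
    bids = dict(other_bids)
    bids["me"] = my_bid
    winner, price = second_price_auction(bids)
    return value - price if winner == "me" else 0.0


# Example from the slide: Bidder 1 bids $17, Bidder 2 bids $15.
print(second_price_auction({"bidder1": 17.0, "bidder2": 15.0}))

# The slot is worth 1.00 to me, the rival bids 0.70.
for bid in (1.00, 0.80, 0.60):
    print(bid, utility(1.00, bid, {"rival": 0.70}))
```

Bidding 0.80 instead of 1.00 gives the same profit here, bidding 0.60 loses the slot, and overbidding can only lead to wins at a price above the true value.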
Addressing the first problem:
Generalized second price auction
If many slots exist, do the following:
Assign the slots in (decreasing) order of the bids.
Let each one pay the next (lower) bid.
Called: Generalized second price (GSP) auction
Is bidding “truthfully” a dominant strategy?
Are there any dominant strategies?
Addressing the first problem:
Generalized second price auction
Same scenario again:
• Two available ad slots with CTR 5% and 4% respectively
• Three bidders with valuations $20, $18, $10 per click
What happens if everyone bids truthfully ($20, $18, $10 respectively)?
Bidder 1: ($20-$18)*0.05 = $0.10 profit per page impression
Bidder 2: ($18-$10)*0.04 = $0.32 profit per page impression
Bidder 3:
$0.00 profit per page impression
If bidder 1 bids $11 instead …
… his profit is ($20-$10)*0.04 = $0.40 per page impression
Bidding “truthfully” is not a dominant strategy in GSP.
In fact, no dominant strategy exists for GSP.
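The GSP example can be replayed with a small sketch. The helper names `gsp` and `profit_per_impression` are made up for this illustration; bidders are indexed from 0, so index 0 is "bidder 1" of the slide, and ties are not handled specially.

```python
def gsp(bids, ctrs):
    """Generalized second price auction.

    bids: per-click bids, index = bidder.
    ctrs: click-through rates of the slots, best slot first.
    Returns {bidder: (slot_index, price_per_click)} for bidders that get a slot.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    outcome = {}
    for slot, bidder in enumerate(order[:len(ctrs)]):
        # Each winner pays the next lower bid (0 if nobody is below).
        next_bid = bids[order[slot + 1]] if slot + 1 < len(order) else 0.0
        outcome[bidder] = (slot, next_bid)
    return outcome


def profit_per_impression(bidder, values, bids, ctrs):
    """(value - price) * CTR of the slot won, or 0 without a slot."""
    outcome = gsp(bids, ctrs)
    if bidder not in outcome:
        return 0.0
    slot, price = outcome[bidder]
    return (values[bidder] - price) * ctrs[slot]


values = [20.0, 18.0, 10.0]
ctrs = [0.05, 0.04]

truthful = [profit_per_impression(b, values, values, ctrs) for b in range(3)]
shaded = profit_per_impression(0, values, [11.0, 18.0, 10.0], ctrs)
print(truthful, shaded)
```

With truthful bids, bidder 1 earns less per impression than after shading the bid to $11, which is the deviation described on the slide.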
So, still saw-tooth under GSP?
As long as you bid less than the higher bid, your payment doesn’t
change …
… but the guy above gets charged more. So:
Bidder 2 increases bid to stay just slightly below bidder 1
No difference for his position/payment
But payment of bidder 1 goes up
Bidder 1 can “retaliate” by underbidding bidder 2
Bidder 1 now pays less (for a worse slot)
Bidder 2 now pays more (for a better slot)
Bidder 1 and bidder 2 have swapped position and (kind of) bids.
“locally envy-free” if these games don’t happen.
Locally envy-free equilibria
“Internet Advertising and the GSP Auction: Selling Billions of
Dollars Worth of Keywords”, Edelman et al., 2006
A (pure Nash) equilibrium is locally envy-free
if for any rank i:
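\( \alpha_i s_{g(i)} - p(i) \ge \alpha_{i-1} s_{g(i)} - p(i-1) \)
\( \alpha_i \) = CTR at rank i (think “volume”)
\( s_{g(i)} \) = value per click of the advertiser at rank i
p(i) = cost for rank i
small i = low rank = high CTR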
Locally envy-free equilibria
Lemma 1: A locally envy-free equilibrium of
the GSP game corresponds to a stable
assignment.
Stable assignment: nobody wants to swap position and payment with
anybody else
Proof: No swap with positions below as we
have an equilibrium: could just undercut
advertiser to make this swap.
Remains to show: no swap with positions
(far) above.
Locally envy-free equilibria
Proof (ctd):
Claim: resulting order is “assortative”, i.e. in
the order of the sg(i):
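\( \alpha_i s_{g(i)} - p(i) \ge \alpha_{i+1} s_{g(i)} - p(i+1) \) (equilibrium)
\( \alpha_{i+1} s_{g(i+1)} - p(i+1) \ge \alpha_i s_{g(i+1)} - p(i) \) (envy-free)
Adding the two: \( (\alpha_i - \alpha_{i+1}) s_{g(i)} \ge (\alpha_i - \alpha_{i+1}) s_{g(i+1)} \)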
Locally envy-free equilibria
Proof (ctd): Suppose i wants to go to m<i
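\( \alpha_i s_{g(i)} - p(i) \ge \alpha_{i-1} s_{g(i)} - p(i-1) \)
\( \alpha_{i-1} s_{g(i-1)} - p(i-1) \ge \alpha_{i-2} s_{g(i-1)} - p(i-2) \)
…
\( \alpha_{m+1} s_{g(m+1)} - p(m+1) \ge \alpha_m s_{g(m+1)} - p(m) \)
Replace all \( s_{g(x)} \) by \( s_{g(i)} \) (using Claim and \( \alpha_j > \alpha_{j+1} \)). Then add and cancel. Get:
\( \alpha_i s_{g(i)} - p(i) \ge \alpha_m s_{g(i)} - p(m) \)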
Locally envy-free equilibria
Lemma 2: When there are more advertisers
than slots, then any stable assignment
corresponds to a locally envy free
equilibrium of the GSP game.
Could be an empty set …but
Theorem: Bidding \( b_j = p^{V,(j-1)} / \alpha_{j-1} \) gives a
locally envy-free equilibrium with VCG
payments. Here \( p^{V,(j-1)} \) are VCG payments.
Why is this of little practical relevance?
So, still saw-tooth under GSP?
At least GSP has equilibria, though not in
dominant strategies.
GSP is “reasonably stable”.
Payment depends on position, not on bid
“Correct” generalization of SP:
Vickrey-Clarke-Groves Mechanism
Assume “no ebay”: CTR depends only on slot
Assign the slots in bid order … (again)
Advertiser X has to pay for loss in (bid * clicks)
(sum of \( b_i \cdot \mathrm{CTR}_i \) of other players before X enters the game − sum of \( b_i \cdot \mathrm{CTR}_i \) of other players after X enters) / \( \mathrm{CTR}_X \)
Example: …. next slide …
“Correct” generalization of SP:
Vickrey-Clarke-Groves Mechanism
Same scenario again:
3 advertisers: bids $20, $18, $10 (their valuations)
Two slots: CTR 5%, CTR 4% [think: 5 clicks, 4 clicks]
Slots go to bids $20 and $18 respectively.
Corresponding payments?
Advertiser 1:
W/o adv. 1, sum over adv. 2 and 3
$18*0.05 + $10*0.04 = $1.30
W/ adv. 1, sum only over adv. 2
$18*0.04 = $0.72
Advertiser 2:
Without adv. 2, sum over adv. 1 and 3
$20*0.05 + $10*0.04 = $1.40
With adv. 2, sum only over adv. 1
$20*0.05 = $1.00
Payment by advertiser 1:
($1.30-$0.72)/0.05 = $11.6 (per click)
Payment by advertiser 2:
($1.40-$1.00)/0.04 = $10 (per click)
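The VCG payments of this example can be recomputed with a short sketch. It assumes, as on the slide, that CTR depends only on the slot and that slots are assigned in bid order; the helper names `assign` and `vcg_payment_per_click` are made up for this illustration.

```python
def assign(bidders, bids, ctrs):
    """Assign slots in bid order. Returns {bidder: CTR of the slot obtained}."""
    order = sorted(bidders, key=lambda i: bids[i], reverse=True)
    return {bidder: ctrs[k] for k, bidder in enumerate(order[:len(ctrs)])}


def vcg_payment_per_click(x, bids, ctrs):
    """Loss in (bid * CTR) that bidder x causes to the others, per click of x."""
    everyone = list(range(len(bids)))
    with_x = assign(everyone, bids, ctrs)
    if x not in with_x:
        return 0.0  # no slot, no clicks, no payment
    without_x = assign([i for i in everyone if i != x], bids, ctrs)
    others_without_x = sum(bids[i] * ctr for i, ctr in without_x.items())
    others_with_x = sum(bids[i] * ctr for i, ctr in with_x.items() if i != x)
    return (others_without_x - others_with_x) / with_x[x]


bids = [20.0, 18.0, 10.0]
ctrs = [0.05, 0.04]
payments = [vcg_payment_per_click(x, bids, ctrs) for x in range(3)]
print(payments)
```

Index 0 corresponds to advertiser 1 and index 1 to advertiser 2 of the slide; the third advertiser gets no slot and pays nothing.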
“Correct” generalization of SP:
Vickrey-Clarke-Groves Mechanism
Bidding “truthfully” is a dominant strategy in
this mechanism.
VCG mechanism not used for web advertising!
Still have ebay problem …
Vickrey got Nobel prize in economics in ‘96
(a few days before his death)
Addressing the “ebay problem”
Slot assignment by revenue order
Have weights for different advertisers
Measure probability of click (= quality of ad)
\( \mathrm{ctr}_{\text{ebay}} = 0.001 \), \( \mathrm{ctr}_{\text{ingmar}} = 0.01 \)
Revenue ordering vs. bid ordering
30% more revenue per page impression
Assign slots in (decreasing) order of
\( \mathrm{ctr}_i \cdot b_i \) (~ revenue for search engine)
Pay minimum bid needed to stay ahead:
\( p_i = \mathrm{ctr}_{i+1} \cdot b_{i+1} / \mathrm{ctr}_i \)
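A minimal sketch of slot assignment by revenue order with the "stay ahead" price. The CTRs are the ones from the slide; the bids ($5.00 and $1.00) and the function name `gsp_revenue_order` are assumptions made only for this illustration, and slot position effects on CTR are ignored.

```python
def gsp_revenue_order(ads, num_slots):
    """Rank ads by ctr * bid; each winner pays the minimum bid needed to stay ahead.

    ads: {name: (bid_per_click, ctr)}
    Returns a list of (name, price_per_click) in slot order.
    """
    ranked = sorted(ads, key=lambda name: ads[name][0] * ads[name][1], reverse=True)
    result = []
    for k, name in enumerate(ranked[:num_slots]):
        if k + 1 < len(ranked):
            next_bid, next_ctr = ads[ranked[k + 1]]
            price = next_ctr * next_bid / ads[name][1]
        else:
            price = 0.0  # nobody below: no competitor to stay ahead of
        result.append((name, price))
    return result


# Hypothetical bids; CTRs as on the slide.
ads = {"ebay": (5.00, 0.001), "ingmar": (1.00, 0.01)}
slots = gsp_revenue_order(ads, num_slots=1)
print(slots)
```

Under bid ordering the high bid would win the slot. Under revenue ordering the ad with the higher ctr * bid wins and pays less than its own bid.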
GSP in Practice
• GSP with revenue ordering used by all
major search engines
• But with modifications …
– minimum price (“reserve price”)
– number of slots is variable
– quality of landing page to avoid frustration
– positional constraints
“Putting Nobel Prize-winning
theories to work” ?
Google’s unique auction model uses
Nobel Prize-winning economic theory to
eliminate the winner’s curse – that feeling
that you’ve paid too much. While the
auction model lets advertisers bid on
keywords, the AdWords™ Discounter
makes sure that they only pay what they
need in order to stay ahead of their
nearest competitor.
Knowing the Click-Through Rates
• How do we know the click-through rates?
– Estimated from past performance
• What if a new advertiser arrives? What’s the problem?
– If we show his ads, lose chance to show other
good ads.
– If we don’t show his ads, might not discover a
new high-performing ad.
Solution: Explore-Exploit
Multi-Armed Bandits
$3 Expect $2
$4 Expect $8
First, explore!
Now, exploit!
Expect $6
Multi-Armed Bandits
• Set of K bandits, i.e. real distributions
\( B = \{R_1, \dots, R_K\} \)
\( \mu_k = \mathrm{mean}(R_k) \), \( \mu^* = \max_k \{\mu_k\} \)
Game is played for H rounds
Regret: \( \rho(H) = H \mu^* - \sum_{t=1}^{H} r_t \) where \( r_t \) is the
(random) reward at time t
Want \( \rho(H)/H \to 0 \) with probability 1 as \( H \to \infty \)
Multi-Armed Bandits
Epsilon-greedy strategy:
The currently best bandit is selected for a fraction
of \( 1 - \epsilon \) of the rounds, and a bandit selected
uniformly at random for a fraction of \( \epsilon \).
Restless Bandit Problem – distributions change
Arm Acquiring Bandit – new bandits arrive
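A minimal simulation sketch of the epsilon-greedy strategy. It assumes Bernoulli rewards (think: click or no click) with made-up means; the function name, the parameter values and the fixed random seed are choices for this illustration only.

```python
import random


def epsilon_greedy(means, horizon, epsilon, seed=0):
    """Play `horizon` rounds on Bernoulli bandits with the given true means.

    Returns (pulls per bandit, regret) with regret = H * best mean - sum of rewards.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    reward_sum = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random bandit
        else:
            # exploit: bandit with the best empirical mean so far
            estimates = [totals[a] / pulls[a] if pulls[a] else 0.0 for a in range(k)]
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    regret = horizon * max(means) - reward_sum
    return pulls, regret


pulls, regret = epsilon_greedy(means=[0.1, 0.5, 0.9], horizon=5000, epsilon=0.1)
print(pulls, regret / 5000)
```

With a constant epsilon a fixed share of rounds is always spent exploring, so the average regret does not go to 0; an epsilon that shrinks over time would be needed for that.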
Practical CTR Complications
• CTR also depends on the presence/absence of
other ads
• And on what the user has seen in the past
• And on quality of search results
• Should we show the worst search results
so that users are “desperate” and click the ads?
• Click fraud
– On opponent's paid search results (10%-20%)
– On the contextual ads of your homepage
• Impression fraud
Other kinds?
– Give your opponent a lower CTR
– Lowers the amount you’ll have to bid
• What should search engines do?
– No search engine bills for fraudulent clicks
– See case “Lane’s Gifts v. Google”
Does CPA Solve Fraud?
Click fraud no longer works. Only get charged for
“actions”, aka conversion.
End of story?
Now advertisers can cheat by underreporting
conversions. Can Y!/G trust advertisers?
Have to hand over monitoring to search engine. Can
advertisers trust Y!/G?
Very, very sparse data to derive estimates. Hard for
Y!/G to make optimal decisions.
Mobile Sponsored Search
• Mobile devices offer more context
– Location
– More short-term needs -> more monetizable
• More focused user attention
– Can’t just open another tab while loading
• More positive associations
– People tend to feel “closer” to their mobile
Summary of Part 1
Search advertising is a multi-billion dollar business
Allows very targeted advertising
Fair payment model: you only pay for clicks (CPC)
How much you pay depends on
– Your bid
– Fraction of people clicking your ad (CTR)
• Payment reasonably stable and “gaming” is difficult
• Practical problems such as learning CTRs and
avoiding click fraud
• 6 teams …
Part 2
Display Advertising
Display Advertising
Historical note: banners
• Banners seem to be the oldest standard format
in use
• According to Wikipedia the first banner ad ever
was sold in 1993 by Global Network Navigator
(GNN) to Heller, Ehrman, White, & McAuliffe, a
legal firm popular in Silicon Valley.
• GNN was a popular pre-Yahoo! directory
eventually sold to AOL in 1995
• Heller Ehrman White & McAuliffe was started in
1890 and went bankrupt in 2008. In 1929 they
negotiated the financing of the Bay Bridge.
Display Advertising
• Usually sold on a CPM basis
• Guaranteed delivery (GD): deliver 30 million
impressions on finance.yahoo.com in Feb ’11
– Typically large, “premium” campaigns
• Non-guaranteed delivery (NGD): sold in
auctions on the spot market at varying prices
– Typically smaller, ad-hoc campaigns
How much does it cost?
Components of a GD system
1. Forecast supply and demand
How many users will visit a page in a certain period?
2. Forecast NGD pricing
How much could we get on the spot market?
3. Admission control & pricing
30m impressions in July 2011 on sports.yahoo.com
Should we accept the contract? Can we meet the guarantee?
What price should we charge? How are other contracts impacted?
4. “Optimal” allocation of impressions to active contracts
What is the objective function?
Cannot re-run after every impression due to scalability.
“Simple” (stochastic) packing problem?
5. Ad serving
Demand (long term) depends on quality of allocation!
“females, 30-50, high income” more valuable than “teenager drop-outs”
Cannot only use low value impressions to satisfy contract
Optimal Allocation
• Optimal allocation
– Maximize a stated objective function subject to supply
and demand constraints
• What objective?
– Value of the remaining inventory? - Good for publisher
– Maximize quality? - Good for advertiser
• Need to balance utilities: publisher, advertiser, user,
& network!
Representative Allocations
A. Ghosh & al., “Randomized Bidding for Maximally
Representative Allocation”, Yahoo! Research Technical
Report 2008-003
• Unless the targeting is very fine-grained there is
a wide spectrum of quality of impressions
matching a typical contract
• Contract says: Male, US, auto interests. What
should be supplied to this contract?
– Is it OK to supply 100% 15 year-old males,
daydreaming about cars, weekly allowances $25 ?
– Advertiser probably wants/expects a representative
sample of car-buying US male population
Publisher’s potential strategies
Assume publisher has just one GD contract
• Suboptimal strategy:
Why suboptimal?
– Deliver first all impressions to the contract
– Only after the contract is met, sell in spot market
• Bad for the publisher because some of the GD pageviews
may fetch a lot more money on the spot market than from the contract
• Better strategy
– Put up every pageview on auction (as a seller)
– Also place a bid on it for the contract (as a buyer)
– Value determined by probability & penalty of not fulfilling the contract
Publisher-optimal bid strategy
• If target is 30 million, place the smallest constant
bid in each round so that exactly 30 million
pageviews are won
• All excess inventory will be sold to someone else
(not the GD contract) at a higher price.
• “Unfair” to the GD contract
– All impressions delivered are of low value
• 2 a.m. viewers
• viewers from poor neighborhoods
• basically, viewers nobody wanted!
Volume vs. price of winning bids on
spot market
Volume = number of
impressions sold at p
~ price density
Price on spot market used as proxy
for “quality” of impression
Price p
Find position for the arrow such that area
before the arrow = d (GD Advertiser gets
the cheapest stuff)
Find position for the arrow such that area
after the arrow = d (GD Advertiser gets the
most expensive stuff)
• The GD contract could get half of the
bottom stuff and half of the top stuff
• More fine-grained:
– Of the supply selling at every price, give d/s
fraction to the GD contract.
– Then, price distribution in GD mirrors the
intrinsic distribution in the total supply.
– Objective function must penalize deviation
from this ideal.
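The three options (cheapest stuff, most expensive stuff, a d/s fraction at every price) can be compared on a toy supply. This is a discrete sketch with made-up prices; the function name `allocate` and the strategy labels are hypothetical, and the "representative" variant simply takes evenly spaced impressions from the price-sorted supply.

```python
def allocate(prices, demand, strategy):
    """Pick `demand` impressions for the GD contract from the spot prices."""
    if not 0 < demand <= len(prices):
        raise ValueError("demand must be between 1 and the total supply")
    ordered = sorted(prices)
    supply = len(ordered)
    if strategy == "cheapest":
        return ordered[:demand]
    if strategy == "most_expensive":
        return ordered[-demand:]
    if strategy == "representative":
        # Evenly spaced picks: roughly a d/s fraction of the supply at every price level.
        return [ordered[(j * supply) // demand] for j in range(demand)]
    raise ValueError(f"unknown strategy: {strategy}")


def mean(values):
    return sum(values) / len(values)


prices = [float(p) for p in range(1, 11)]  # toy supply: s = 10 impressions
d = 5
for name in ("cheapest", "representative", "most_expensive"):
    print(name, mean(allocate(prices, d, name)))
```

The average spot price of the delivered impressions serves here as the proxy for quality, as on the slide.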
Problem setting
• Assume the publisher knows the distribution of
the external winning bid on the spot market
• Notation
– p = price (winning bid)
– f(p) = price density = the highest bid is drawn i.i.d.
from f
– s = total supply (inventory) of impressions
– d = demand (GD volume) for the contract
– t = target spend per impression (budget)
• d/s is the fraction of the total supply that needs
to be delivered to the (unique!) contract
Find an allocation a(p)
• a(p)/s = fractional allocation to GD at price
p, that is:
– There are s*f(p)*dp impressions available at
price p (or rather in the interval [p, p+dp))
– The GD contract gets
a(p)/s * s*f(p)*dp = a(p)*f(p)*dp
impressions at price p
• Ideal: a(p)/s = d/s for all p
• Objective: close to this ideal
• u measures distance
Allocation Constraints
• a() is not assumed continuous a priori
• If indeed a(p)/s = d/s for all p, constraint is
Allocation Constraints
= the dollar amount “lost” due to meeting the
contract. So we must have
• Recall t = the average budget per
impression. Publisher does get more than
this per impression.
Final Optimization Problem
over a()
Subject to
No solution if t (cost per impression) is too small.
Possible distance:
Kullback-Leibler divergence
• K-L divergence between two nonnegative
functions is
K-L Optimization Problem
over a()
Subject to
Parameter t governs revenue-fairness trade-off
Bidding strategy
• Now we have found an optimal allocation
– At price p give fraction a(p)/s to GD
• How can we implement the optimal
allocation a(p) in the auction environment?
– We have to bid randomly
– Bidding the same amount each round is
Stochastic Bidding
• Recall a(p)/s is the fraction of supply available at
price p that should be won for GD
• At price p, what fraction of the supply will be won
for GD?
• Fraction won = prob{GD bid > p} = 1 – H(p)
– H(p) is the GD bid distribution (cdf)
– a(p)/s = 1 – H(p)
• Get a(p)/s from optimization, convert to H(p)
– a(p) non-increasing
• Enter auction with probability a(0)/s
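A minimal sketch of how a random GD bid could be drawn so that prob{GD bid > p} = a(p)/s. It assumes a piecewise-constant, non-increasing a(p)/s given on made-up price intervals; the function names and the example numbers are hypothetical.

```python
import random


def gd_bid(u, breakpoints, fractions):
    """Map a uniform draw u in [0, 1) to a GD bid, or None to stay out of the auction.

    breakpoints: increasing prices p_0 = 0 < p_1 < ... < p_n.
    fractions[k]: a(p)/s for p in [p_k, p_{k+1}), non-increasing; 0 above p_n.
    """
    if u >= fractions[0]:
        return None  # happens with probability 1 - a(0)/s
    bid = None
    for k, fraction in enumerate(fractions):
        if fraction > u:
            bid = breakpoints[k + 1]  # wins every external price below p_{k+1}
    return bid


def empirical_win_fraction(price, breakpoints, fractions, n, rng):
    """Share of n random rounds in which the GD bid exceeds the external price."""
    wins = 0
    for _ in range(n):
        bid = gd_bid(rng.random(), breakpoints, fractions)
        if bid is not None and bid > price:
            wins += 1
    return wins / n


breakpoints = [0.0, 1.0, 2.0, 3.0]
fractions = [0.5, 0.4, 0.2]  # hypothetical a(p)/s values
rng = random.Random(0)
for price in (0.5, 1.5, 2.5):
    print(price, empirical_win_fraction(price, breakpoints, fractions, 20000, rng))
```

The empirical win fractions should come out close to the a(p)/s values of the corresponding price intervals.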
• Which ads could be shown on a page via
the spot market?
• Only they participate in bidding for the
Contextual Targeting
Contextual Targeting
How would you do it?
Taken from: http://tutorialfreakz.com/30-misplaced-ads/
• Show textual ads
• Also sold on a CPC basis
• Which “queries” should be triggered by
Phrase Extraction for Contextual Advertising
“Finding Advertising Keywords on Web Pages”, Yih et al., 2006
• Goal: given a page, find phrases that are good for placing ads
• Reverse search problem: given a page, find the
queries that would match (summarize) the content of this page
• Select ads based on a single selected keyword:
– Contextual Advertising translated into database approach of
Sponsored Search
– Reuse of the Sponsored Search infrastructure – lower cost
– Ad Networks earn less per impression in CA
• Lower click-through rates (high-variance)
• Lower conversion (less clear intent)
• revenue share with the publisher
System Architecture
Input: web page
1. Preprocessor
process html -> text
2. Candidate Selector
generate candidates = candidate bid phrases
3. Classifier
Machine learning?
score the candidates
4. Postprocessor
Combine scores -> probability of being “useful”
Output: bid phrases
1. Preprocessor
• Translate HTML into plain text
• Preserve the blocks in the original document
• Preserve info about outgoing anchor text,
meta tags
• Open source HTML parser for scraping –
• Part-of-Speech (POS) tagger – record the
type of the word
• Chunker – detecting noun phrases
2. Candidate Selection
• All phrases of length up to 5 (including single words)
– Within a single page block (sentence)
• Two dimensions of candidate selection:
– Individual occurrences extracted separately vs. combining all
occurrences into entry per page (separate vs. combined)
– Keep phrases or break up into individual words
• Label individual words with their relationship with a
phrase (if phrases are broken up):
Beginning of a phrase
Inside a phrase
Last word of a phrase
3. Classifier
• Given a phrase predict if it is “keyword” usable for
selecting ads
– “adverse effects of coffee” vs. “sat down on breakfast table”
• For the whole phrase a single binary classifier
– Logistic regression model \( P(Y=1 \mid x) = 1/(1 + e^{-w \cdot x}) \)
– x is vector of features of a given phrase
– w is a vector of importance weights learned from the
training set
• Decomposed – multi label classifier (B,I,L,…)
– \( P(Y_i = 1 \mid x) = e^{x \cdot w_i} / \sum_j e^{x \cdot w_j} \)
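The two scoring formulas can be sketched directly. The weights and feature vectors below are made up for illustration; in the system described they would be learned from the training set, and this sketch says nothing about how that training is done.

```python
import math


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def logistic(w, x):
    """Binary classifier: P(Y=1 | x) = 1 / (1 + exp(-w.x))."""
    return 1.0 / (1.0 + math.exp(-dot(w, x)))


def softmax(weights, x):
    """Multi-label classifier: P(Y_i=1 | x) = exp(x.w_i) / sum_j exp(x.w_j)."""
    scores = [dot(w, x) for w in weights]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


x = [1.0, 0.0, 2.0]  # hypothetical features of one candidate phrase
print(logistic([0.5, -1.0, 0.25], x))
print(softmax([[0.5, 0.0, 0.1], [0.1, 0.2, 0.3], [0.0, 0.0, 0.0]], x))
```

Subtracting the largest score before exponentiating does not change the softmax result; it only avoids overflow for large scores.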
3. Classifier: Features
• Linguistic features: is a noun; is a proper name; is a
noun phrase; are all words in the phrase of the same type
• Capitalization: any/all/first word capitalization
• Section based features:
– Hypertext – is the feature extracted from anchor text
– Title, Meta tags, URL
• IR features: tf, idf, log(tf), log(idf), sentence length, phrase
length, relative location in the document
• Query log features: log(phrase frequency),
log(first/second/interior word frequency)
• Feature reconciliation
– Binary features – OR of all occurrences
– Real valued features – min
4. Postprocessor
• Score reconciliation: instance with the
highest score
• Separate words -> phrase probability:
– p1= probability of a phrase: product of the
confidence of the classification of each term
– p0 = probability of all the words of the phrase
being outside a keyword
– score = p1/(p1+p0)
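The phrase score for the broken-up (per-word) classifier can be sketched as follows. The per-word probabilities are made-up numbers, and treating p0 as the product of the per-word "outside" probabilities is an assumption of this sketch.

```python
import math


def phrase_score(label_confidences, outside_probs):
    """score = p1 / (p1 + p0) for one candidate phrase.

    label_confidences: per-word confidence of its in-phrase label (beginning/inside/last).
    outside_probs: per-word probability of being outside any keyword.
    """
    p1 = math.prod(label_confidences)
    p0 = math.prod(outside_probs)
    if p1 + p0 == 0.0:
        return 0.0  # no evidence either way
    return p1 / (p1 + p0)


# Hypothetical two-word phrase: first word labelled "beginning", second "last".
score = phrase_score([0.8, 0.7], [0.1, 0.2])
print(score)
```

The normalisation by p1 + p0 turns the two products into a value between 0 and 1 that can be compared across phrases of the same length.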
Experiments: Data
828 pages
Indexed by MSN
Have ads
In the Internet Archive
One page per domain
Eliminate foreign and adult pages
Editors (8) instructed to seek highly prominent
keywords with advertising potential
Experiments: Metrics
• Editorial judgments
• Precision-recall – might be too difficult
– Too long for the judges to find all the relevant phrases
– Given a phrase – influence the judges
• A proxy for Precision-Recall
– top-1 = top-1 result is in the list selected by the editor,
count across the set of pages
– top-10 = % of top-10 results in the editor set,
averaged over the set of pages
Experiments: Results
Best performance for combining occurrences and not breaking up into words.
Demographic Targeting
Image is taken from:
A Glimpse at my Own Work
Behavioral Targeting
[…] for instance, if a visitor has a recent history of
researching SUVs and is a regular visitor of
Yahoo! Music, Yahoo! BT will have the insights to
serve up a relevant SUV ad while the visitor is
browsing the Yahoo! Music homepage.
Summary of Part 2
Display ads usually less targeted than search ads
Translates to lower CTRs
Ads sold in contracts (GD) and on the spot (NGD)
Different targeting options
Need lots of user data for good targeting
– Yahoo!, Google, Facebook, …
Part 3
Banner Blindness
• People learn to ignore ads …
… even when they are highly relevant
– “Banner Blindness: The Irony of Attention
Grabbing on the World Wide Web”, Benway ‘98
• Danger of falling CTRs due to overexposure
– Might be beneficial to show less advertising
Search Result Bidding
• In current sponsored search systems,
advertisers bid on query terms
• Could also bid on the search results
– Show my ad whenever abc.com is returned
– Show my ad whenever xyz appears in a snippet
Why could this be useful?
Next time, Feb 25/26, 2011
• This time I focused on breadth
• Next time I’ll focus on depth
• Which topics did you find most interesting?
• Do you want more theory? More of an
“economic overview”? More hands-on
insights? More academic papers?
Paid Summer Internships at Y!
Research Barcelona
• Cool location
– Best beach city in the world (NG)
• Cool colleagues
– international, dynamic, open environment
• Cool data
– search, mail, toolbar, finance, Flickr, …
• Cool projects
– The goal is *always* to publish at top venues
Deadline JANUARY 15