>> Kamal Jain: It's my pleasure to introduce Mohammad Mahdian. He's from Yahoo, where they don't even record their talks. He has been there for two years already. Before that he was a post-doc researcher at Microsoft for two years, and before that he was a Microsoft fellow. So here is Mohammad. >> Mohammad Mahdian: Thank you. Thanks, Kamal. It's good to be back. So I'm going to talk about externalities in online advertising, and this is based on two papers: one is joint work with Arpita Ghosh that already appears in the (inaudible), and the second one is joint work with David Kempe which is going to appear in the (inaudible) workshop. So let me start with some introduction, although I'm sure that most of the people in the room already know about this. Online advertising is a huge business. It was already 21 billion dollars in 2007, and it's one of the fastest growing segments of the advertising business. The standard in this business is that advertisers specify how much they are willing to pay for each impression or click or whatever else you are selling. The publishers decide which ads to show on a page based on the values that the advertisers submit and also based on the estimates that they have of quality measures of the advertisers, like the click-through rate, the probability that the user clicks on their ad, or other measures of quality. An important implicit assumption in all of the models that are standard in the business is that the value of an ad only depends on the ad that's shown and where it's shown on the page, but not on other factors, like the other ads that are shown on the page. And this is obviously wrong: the different advertisers are actually competing for the same thing. They are competing for a user's attention, and a user's attention is a limited resource, therefore you should naturally expect that the value that an ad receives is a function of the other ads that are shown on the same page near it. So for example, if you increase the number of ads on a page, that is going to decrease the value to each of those advertisers. That's intuitive. And not only that, the identity of the ads could also matter. For example, if I show an ad for Toyota next to an ad for Honda, presumably that detracts attention from the ad for Toyota more than if I show an ad for Toyota next to an ad for Ford, just because Toyota and Honda are essentially targeting the same market segment, whereas Toyota and Ford are probably targeting different market segments. Or a better example: if you search for Harry Potter, an ad for the Harry Potter movie probably detracts attention from an ad for the Harry Potter book less than another ad for the Harry Potter book would. Okay. So basically this becomes a problem that in the economics literature is called externalities: an effect that one agent can have on other agents simply by receiving an item. And the problem becomes essentially a problem of mechanism design with externalities, which has been studied in the economics literature in different contexts. So let me just mention a few related works, the first ones in the economics literature and the last one in the CS literature. These are generally about designing auctions when there are externalities. They're not entirely relevant to what I'm going to talk about, which is mostly modeling the externalities in the context of online advertising.
The last one is much more relevant. There are also a couple of papers that I'm going to refer to later in the talk, about the effect of the different links on a page on the click-through rate of each link. The first one is an eye-tracking experiment by a group at Cornell, and the second one is a click-log evaluation by people at MSR. Okay. So in this talk I'm going to go over two models that we propose for externalities in online advertising. The first model is based on a rational choice model for the viewers, the users of the search engine or whatever publisher shows the advertisements. For this model, we are going to focus on lead generation advertising, which I'm going to define in a couple of slides. The second model I'm going to talk about is based on a probabilistic model for viewer behavior. For each of these, essentially what we are going to do is assume a model for how users behave, and then based on that derive how placing one ad on the page, or sending one lead to an advertiser, affects the other ads that the same user sees. So for each of these models, we will discuss the computational complexity of the winner determination problem: assuming that the values follow a model like this, how should we decide which ads to show on the same page? And following that, there's a brief discussion of incentive compatible mechanism design. Okay. So let me start by talking about lead generation advertising, which is a segment of the online advertising business. I'm sure you have seen it if you have ever tried to buy car insurance or a mortgage, or even to buy a car: there are a number of Websites that you can go to, you enter your information, and they contact a number of mortgage companies, for example, or car insurance companies, and each of those companies will contact you directly with quotes for car insurance or whatever else they're providing. This is mostly used by segments of the market like mortgage firms, insurance companies, auto dealers, and the distance education industry, like the University of Phoenix and so on. So basically a lead is credible information provided by a user, and the lead generation companies collect these leads and sell them to advertisers. And the advertisers directly contact potential customers. By the way, this is a huge segment of the market. In 2006, the latest year I could find data for, this was 1.3 billion dollars, which was about eight percent of total online advertising revenue. Now, there is an obvious trade-off here: if your information is sent to ten mortgage companies, each of those mortgage companies has a lower chance of getting your business. And this is something that in reality people have to deal with. I've talked to people who are trying to design a lead generation business, and this is the real question for them: to how many advertisers should they send these leads? So here's an abstract model for this problem. Let's say we have n bidders; each bidder is an advertiser, but the value function that the bidder has depends on the set of all advertisers that are winning in this auction, so it's a function from the set of all subsets of {1, ..., n} to the non-negative real numbers. And V_i(S) is the value to bidder i, assuming that the set of winners is S.
Now, we want to design incentive compatible mechanisms that maximize advertiser welfare, which is basically the sum of the values that advertisers receive. And the classical result in the economics literature is the VCG mechanism: if you can actually find a set that maximizes this function, then there are simple payment schemes that induce incentive compatible reporting of values. So this is the abstract model at a very high level. Let me get into the specifics of the first model I want to define for externalities. Assume we have advertisers numbered 1 through n. Now, a user type (here a user is a member of the audience of the advertising) specifies the preference that this user has over advertisers 1 through n, as well as some outside option. So for example if you're buying a car, you've probably already walked into a dealership, you have some price quotes from them, but you are also researching online, you are going to receive some quotes, and you're going to compare all those quotes as well as the outside option that you have. I'm going to denote the outside option by 0. Now, we have a prior on user types, a distribution of how the preferences go. And advertiser i receives a value v_i, which is a fixed number, if this advertiser is chosen by the user, which means that this advertiser is the most preferred by the user among all the advertisements that this user sees. Given this, V_i(S) can be defined as the number v_i times the probability that i is preferred to everything in S union {0}, the outside option, where the probability is over the random type of the user. Notice that in this model, the value of the set, the sum of the values to the advertisers, is not necessarily a monotone function, so it's not necessarily best for you to send the lead to as many advertisers as possible. This is intuitive, and the reason is that if you have an advertiser that a lot of the users actually prefer but that has a very low value, adding this advertiser to your set is going to decrease the overall value of the set. And notice that here by the value I mean the value to the advertisers. If you want to take the value to the users into account, that would be a whole different story. Okay. So now in order to look at the complexity of this problem, I have to define an input representation, because here I'm assuming that I'm given a distribution over user types, which in general can't be given concisely. Yes? >>: (Inaudible) because if you are the only search engine in town, you can wipe out, let's say, Southwest Airlines, because they don't pay anything for the ad, but if the guy across town does show it and people like it, you will lose business. >> Mohammad Mahdian: You're talking about the user side. So on the advertiser side, I'm trying to model the existence of other options by having this outside option here, right. So basically if you're a user, you've searched other search engines, you might have physically walked into retail stores and gotten quotes, and you have some outside option based on all of those things. That is captured here; it's already included in the preferences of the user. So in some sense you do want to keep the users happy to some extent as well, as much as the outside option forces you to, basically. Okay. So yeah, that's good, feel free to interrupt me any time.
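[Editor's note: to make the rational choice model concrete, here is a minimal sketch of the total value of a winner set S, assuming the explicit representation of user types (preference permutations with probabilities) that is defined next. All names and numbers are illustrative, not from the paper.]

```python
# A minimal sketch of the rational choice value model, assuming the
# explicit representation defined next: each user type is a preference
# permutation over advertisers {1..n} plus the outside option 0, with a
# probability. Names and numbers are illustrative only.

def total_value(S, values, types):
    """Sum over i in S of V_i(S), where V_i(S) = v_i * Pr[user prefers
    i to everything else in S and to the outside option 0]."""
    total = 0.0
    options = set(S) | {0}
    for prob, ranking in types:
        # The user picks her most preferred option among S union {0}.
        choice = next(x for x in ranking if x in options)
        if choice != 0:               # the outside option yields no value
            total += prob * values[choice]
    return total

values = {1: 10.0, 2: 1.0}
types = [(0.6, [2, 1, 0]),            # 60% of users: prefer ad 2, then 1
         (0.4, [1, 0, 2])]            # 40% of users: prefer ad 1, then opt out

print(total_value({1}, values, types))     # 10.0
print(total_value({1, 2}, values, types))  # 4.6
```

This toy instance also shows the non-monotonicity just mentioned: adding the popular but low-value advertiser 2 decreases the total value from 10.0 to 4.6.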
If Jennifer and Kirsten were here, I could be sure that I would be interrupted, but okay. So now I need to talk about the input representation. We need an input representation of users' preferences over advertisers. And the simplest representation we could think of is the explicit representation. Assume that there are m types of users, a fixed small number of types, and a user is of type i with probability p_i, so the p_i all add up to one. The preferences of users of type i are given by a permutation over the set of all advertisers, actually I should say all advertisers union the outside option, so a permutation over the set {0, 1, ..., n}, plus a probability for this type, and that's it. So for this explicit representation, the winner determination problem becomes essentially this: you are given n non-negative values v_1 through v_n, the values to the advertisers if they receive the business from the user; we have a number K, which is basically a bound on how many advertisers can possibly receive the lead; and we have m permutations, pi_1 through pi_m, of {0, ..., n}. Each permutation corresponds to one type of user, and a probability p_j is associated with each permutation. And the question is to find a set S of cardinality at most K that maximizes this function, which is the same expression as on the previous slide: we are summing, over all types of users, the probability of the type times the sum, over all advertisers i in the winning set S, of the value of the advertiser times an indicator variable which is one if and only if this advertiser is preferred to all the other advertisers in S and to the outside option by this particular user type. So this is an optimization problem, and the question is whether we can solve it efficiently. First of all, and this is actually not hard to see, the winner determination problem is NP-hard, and the proof is pretty simple. The idea is that even if all the values are equal, the problem is essentially a weighted version of the maximum K coverage problem. If you don't know what the maximum K coverage problem is, it's basically: you're given a number of sets and you want to select K of these sets to maximize the size of their union. So you basically want to minimize their overlap and maximize the size of their union. So why is maximum K coverage a special case of this problem? If all the values are the same, the only thing that you care about is how many of the users you are actually capturing, right? So now advertiser j is going to correspond to the set of user types i that rank j above the outside option. These are the only user types that, if advertiser j is shown, are going to go with this advertiser, or at least with some advertiser. Now, the problem becomes to find a set of at most K advertisers to maximize the weight of the set of covered user types, which is exactly the objective function in maximum K coverage. And the maximum K coverage problem is NP-hard, so that means that this problem is also computationally hard. But on the positive side, for the maximum K coverage problem there is a simple, intuitive greedy algorithm that achieves a constant factor approximation, and it also works very well in practice; a sketch of it applied to our setting follows.
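[Editor's note: a sketch of the greedy coverage heuristic in the equal-values special case just described; function names are illustrative, not from the paper.]

```python
# Greedy maximum K coverage applied to the equal-values special case:
# advertiser j "covers" the user types that rank j above the outside
# option, and each type's weight is its probability.

def greedy_lead_allocation(n, K, types):
    """types: list of (probability, ranking) pairs as before.
    Returns a set of at most K advertisers chosen greedily to
    maximize the total probability mass of covered user types."""
    covers = {j: set() for j in range(1, n + 1)}
    for t, (prob, ranking) in enumerate(types):
        for j in ranking:
            if j == 0:        # advertisers ranked below the outside
                break         # option are never chosen by this type
            covers[j].add(t)

    chosen, covered = set(), set()
    while len(chosen) < K:
        # Marginal gain: probability mass of newly covered user types.
        gain = lambda j: sum(types[t][0] for t in covers[j] - covered)
        best = max((j for j in range(1, n + 1) if j not in chosen),
                   key=gain, default=None)
        if best is None or gain(best) == 0:
            break
        chosen.add(best)
        covered |= covers[best]
    return chosen
```

The classical analysis of greedy coverage gives the e/(e-1) factor mentioned above; running it while ignoring the values loses at most an additional factor of the value spread.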
So for a while our hope was that maybe we could do the same for this problem, maybe there is a simple algorithm that solves this problem with a constant approximation factor. We actually spent a lot of time on this, but eventually came up with a hardness of approximation proof which is pretty strong. We can show that the winner determination problem is hard to approximate within any factor better than n to the 1 minus epsilon, which is a pretty strong negative result. And the proof is a very simple reduction from the maximum independent set problem. I'm going to talk about the proof, just because it gives some intuition about what type of hard instances are out there. So this is actually a pretty simple reduction. You are given a graph G that has n vertices, and the problem is to find the maximum independent set in this graph. If you don't know what that means, it's a set of vertices with no edge between them. This problem is NP-hard, and it's very hard to approximate. Now, given this graph, I want to construct an instance of the winner determination problem that is essentially as hard as solving the maximum independent set problem in the graph. So if I have n vertices in the graph, I'm going to set the number of user types and also the number of advertisers to n, and I'm going to set K to n as well. So K was the upper bound on the number of advertisers that could win; this means that there is effectively no upper bound on the number of advertisers that can win. Each vertex corresponds to one advertiser and also to one user type. Now, to the advertiser corresponding to node i, I'm going to give a value of L to the power i, where L is a really large number. So the values of different advertisers are going to be very, very different. Notice that I'm numbering the nodes from 1 through n, so for node 1 the value of the corresponding advertiser is L to the power 1, for node 2 it's L squared, and so on and so forth. Also, I'm going to define the set N_i: this is the set of neighbors of i in the graph that have index less than i. And the permutation pi_i is defined this way: I rank all the elements of N_i before i (it doesn't matter in which order I rank the elements of N_i), then I put i, and then I put the outside option. After the outside option it doesn't really matter, I can put anything. And I set the probability of this permutation to some constant C divided by L to the i, where the constant C is set so that the probabilities sum up to one. So now let's see what's happening here. I have n user types and n advertisers. The value of the i-th advertiser is L to the i, and the probability of the i-th user type is C divided by L to the i. And notice that in the preference of the i-th user type, the only things that this user ranks before the outside option are i and also the neighbors of i that have index less than i. So the expected value that this user type contributes if she's assigned to advertiser i is going to be C divided by L to the i times L to the i, which is C, a constant. Whereas if she's assigned to any advertiser in N_i, then the value is going to be C divided by L to the i times something that's much, much smaller than L to the i; it's L to the j for some j less than i.
So the value, if it happens that this user is assigned to an advertiser in N_i, is going to be much, much lower than if she's assigned to i. So basically, in the total value that we get, the only dominant terms are going to be the ones that correspond to user types that are assigned to the advertiser with the same index. And as a result, we can prove this result. The point is that if you have two winning advertisers in your set that are connected by an edge in the graph, then at least one of those advertisers is not going to be able to derive this high value. And as a result we get that the value of the optimal set is going to be something between C times the size of the maximum independent set in the graph and the same value plus some small number. So what this shows is that the problem is essentially as hard as solving the maximum independent set problem in the graph, which we all know is a pretty difficult problem. But on the positive side, notice that this reduction uses instances where the advertisers' values have a large span: you have advertisers that have a very, very low value for the lead and other advertisers that have a very high value, which is not exactly a realistic situation. That's actually the nice thing about looking at the computational complexity of the problem and looking at the reduction, because it gives you a feeling for what the hard instances are, and then you can try to modify your approach to target instances that are not ruled out by those reductions. So basically here the question would be: if we have a bound R on the maximum value divided by the minimum value, can we get something better? Can we have an algorithm with a better factor? There is a simple R times e/(e-1) approximation algorithm: just completely ignore the values, assume that all the values are the same, and then run the maximum K coverage greedy algorithm. Basically we're losing a factor of R because we're ignoring the values and another factor of e/(e-1) because of the greedy maximum K coverage algorithm. But can we do better than this? In fact, we can. There is a further technique that we can apply here. We can divide the advertisers into log R buckets, where each bucket corresponds to the advertisers that have value in some interval, in exponentially increasing intervals. So here I've defined the i-th bucket as the set of all advertisers that have value between e to the i minus 1 and e to the i, where e is the base of the natural logarithm. Now, given this bucketing of the advertisers, notice that within each bucket the values of the advertisers are off by at most a factor of e. And if you look at the optimal solution of the problem, this optimal solution is deriving some of its value from each of these buckets. So since there are log R different buckets, there must be at least one bucket that, in the optimal solution, gives at least a one over log R fraction of the value. So that means that if I pick even a random bucket and solve maximum K coverage for this bucket, I'm going to get a factor that's at most (log R + 1) times e squared over (e - 1). The log R + 1 comes from the fact that I'm only looking at one bucket, instead of all of the buckets.
There is a factor of e that comes from the fact that I'm ignoring the values within each bucket, and the values could be off by a factor of e, and there is another e/(e-1) that comes from the greedy maximum K coverage algorithm. So here is one positive result: there is an approximation algorithm with a factor equal to this number. And it's very easy to de-randomize this algorithm: instead of picking a random bucket, you can just pick the bucket that gives the maximum value. Now, if you want to use the VCG, the Vickrey-Clarke-Groves, payment scheme to turn this approximation algorithm into an incentive compatible mechanism, a known result says that this is only possible if the allocation rule is monotone. What that means is that if I'm an advertiser and I increase my value, the algorithm should not drop me from the set of winners; it should not be the case that increasing one's value decreases the likelihood that this person will be one of the winners. This algorithm, as I stated it, is not monotone, because when you increase your value you might fall into a different bucket and the competition might be tougher in that bucket. But in fact it's relatively straightforward to turn this algorithm into a monotone algorithm by making the buckets essentially overlapping. And therefore we get a monotone allocation rule, and using this and the VCG payments we can get an incentive compatible deterministic mechanism that approximates social welfare within a factor on the order of log of v_max divided by v_min. One nice thing is that at the end of the day, when you look at the algorithm, it's actually pretty simple and intuitive. Basically what it's doing is taking one threshold for the value, looking only at the advertisers that fall above this threshold, and for those advertisers essentially solving the problem using the maximum K coverage greedy algorithm. And the threshold is chosen in a way that maximizes the value. Okay. So that's one positive result. There are a couple of other positive results that I'm just going to mention. For other special cases of the preferences we can also get exact polynomial-time algorithms. Again, the point is that if you look at the reduction, you see that the preferences that we are giving the users are very different from one user to another. If the preferences are correlated in some sense, then we have a better picture. For example, if the preferences are single-peaked, and I'm not going to define this precisely, but it essentially means that there's a spectrum of everything from right to left, each user has an ideal point, and each user essentially ranks things based on the distance between his ideal point and the object; in this case we can actually get an exact algorithm for the optimization problem. And also if all the preferences are, in some notion, perturbations of a single ranking, then we can get an exact algorithm. Both of these algorithms use dynamic programming; there are a lot of details there, but there is nothing fundamentally difficult. Okay. So that's pretty much all I wanted to say about the first model. I'm going to get back to this at the end and have some discussion. Yes? >>: (Inaudible) but do you assume that the value (inaudible) independent of the user type?
>> Mohammad Mahdian: Well, I am assuming that. The reason we are assuming that is basically that you can't necessarily observe the user type here. So we are assuming that there are these user types, but when a user comes to your search engine you don't necessarily know what type this user is. >>: Right. >> Mohammad Mahdian: But obviously you might be able to use external information, like targeting information, to deduce things about the user type. That's a very good question actually, but basically this assumption is not losing much in generality, because if you actually have external information that lets you target things better, you can essentially separate these markets. Like, if based on additional information you can guess whether this user is looking for Harry Potter the book or Harry Potter the movie, you can basically assume that you have two different markets here and solve the optimization problem independently for each of these markets. And presumably your set of winners should be different if you actually have some external information. That's what targeting means, really. >>: (Inaudible) but those users have (inaudible). So if we can assume that for a single user type (inaudible) is the same (inaudible). >> Mohammad Mahdian: Say it again? >>: So the reason (inaudible) is because (inaudible) I assume that they are intended for different user classes which have permutations of preferences. >> Mohammad Mahdian: Okay, let's see if I'm understanding the question here. Basically, if you have some external information that helps to classify the users, if you can tell user 1 from user 2, then you have a separate market for user 1 and a separate market for user 2, and for each of those you can do the optimization independently. >>: (Inaudible). All the users have the same (inaudible) in that case? >> Mohammad Mahdian: All the users have the same preference permutation but the values are different; that's your question, right? The values of the advertisers for different user types. If they all have the same preference permutation then, sure, basically you can assume that the advertisers have the average value for these users and you can just merge them into one, if you can't observe them. Yes? >>: In the model of very large changes in value and very large changes in probability, you don't (inaudible) optimize it (inaudible) model. >> Mohammad Mahdian: What was that? >>: The (inaudible) very low probability, very high gain (inaudible) for some users. >> Mohammad Mahdian: Thanks for the comment. That was a good question; maybe we can discuss it afterwards. I think there might be some interesting problem there. Even if user types are unobservable, is there a way to take advantage of differences in permutations in order to target more profitable user types? That's a good question. Yes? >>: (Inaudible) monotone, do you (inaudible) the benefit of the search engine? >> Mohammad Mahdian: You mean the revenue? >>: Yes. >> Mohammad Mahdian: No, these mechanisms are based on VCG; obviously they're all trying to maximize the social welfare, which is the sum of the values to the advertisers and to the search engine. If you want to optimize for the revenue, and this is not a theoretical result, but basically the general approach is to use a mechanism like this but set the right reserve prices for the different items.
And that usually gets you close to the maximum revenue. The difficulty in theoretically analyzing and designing an algorithm like that is that you usually need to make assumptions about the distribution of user types, or at least do some sort of sampling to deduce things about the distribution of user types yourself, which makes the result not as clean and probably not as directly applicable in practice. Yes? >>: (Inaudible) is that really the case? Because if a user is looking for something, I would think intuitively that whichever advertiser is deriving the least value (inaudible) best choice. >> Mohammad Mahdian: Okay. So that's also another good question. Basically we are taking the user values and advertiser values as exogenously given, but your question is whether these things are also determined in the game. I have not looked at that. That is a good question. Okay. So let's move on to the second part of the talk, which is the other type of model for externalities. In this part we're looking at a probabilistic model for user behavior, and we're defining a model for externalities based on this probabilistic model. Our focus is going to be on sponsored search ads, and the main characteristic of sponsored search ads, in comparison with the lead generation setting of the previous part, is that the ads are listed in a column. For example, in a Google search you see that all the advertisements are on the side, they are listed in some order, and presumably the higher positions have higher value for the advertisers than the lower ones. So we're going to look at a probabilistic model for how users view and click on these ads that's motivated by the click-log analysis and eye-tracking experiments in the couple of papers that I mentioned earlier. This model was proposed by Craswell, Zoeter, Taylor and Ramsey from MSR for organic search results at the last WSDM conference. They actually did a click-log analysis that confirms that this model is a better model for estimating the click-through rate than the model that I'm going to define in the next slide. And also, independently, in the context of ad auctions this was studied at the same time by a group at Google; they have written a paper which has some overlap with ours that's going to appear at the same conference. Okay. So let me define precisely what's going on in sponsored search advertising. For each search the system shows K ads, where typically K is something like 8, 10, 12, in a sorted order, and the click-through rate of an ad is the probability that this ad is clicked on. The way it's estimated usually relies on an assumption that's known as separability. The separability assumption says that the click-through rate of ad i placed in position j, where positions are 1 through K, is the product of two terms. One term only depends on i, so let's call it alpha_i, and the other term only depends on j, let's call it lambda_j. An interpretation of this assumption is that when the user is looking at the advertisements, the user views position j with probability lambda_j, so this is the probability that the user even sees this position, and then, assuming that the user sees this position, she's going to click on the ad in this position with a probability that depends on the quality of the ad, which is alpha_i.
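[Editor's note: a toy illustration of the separability assumption just defined; all numbers are made up. Under separability, expected welfare factors neatly, and sorting ads by v_i times alpha_i into positions of decreasing lambda_j maximizes it.]

```python
# Toy illustration of separability: CTR(ad i, position j) = alpha[i] * lambda_[j].
alpha = {"adA": 0.20, "adB": 0.10, "adC": 0.05}  # ad-only click factor
lambda_ = [1.0, 0.5, 0.3]                        # position-only view factor
value = {"adA": 1.0, "adB": 3.0, "adC": 2.0}     # value per click

# With decreasing lambda, placing ads in decreasing order of
# value * alpha maximizes expected welfare.
ranking = sorted(alpha, key=lambda i: value[i] * alpha[i], reverse=True)
welfare = sum(value[i] * alpha[i] * lambda_[j]
              for j, i in enumerate(ranking))
print(ranking, welfare)   # ['adB', 'adA', 'adC'] 0.43
```

The cascade model defined next replaces the fixed position factor lambda_j with a product of continuation probabilities that depends on which ads are shown above.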
So this is a standard assumption, and as far as I know, all of the auction engines for sponsored search are basically built on this assumption; the click-through rate learning, everything, is based on this assumption. Now, what I'm going to talk about is a model that's essentially an alternative proposal. We assume that each ad is specified by three parameters. One parameter is the value, which is what we had before: this is the value that the advertiser receives if there is a click on their ad. Notice that I've switched from the value per lead that we had in the previous part of the talk to value per click; I'm assuming that we have some fixed value per click, which is not an entirely accurate assumption, but I'm going to go with it for now. There is also another parameter, a probability q_i that if a user views the ad, she will click on it. And finally there is a probability c_i that if a user views the ad, she will continue viewing the next ad as well. Now, this additional parameter c_i allows for situations where you see an ad that's really good, that already satisfies your purpose, so you click on it and you don't look at the other ads afterwards. And it can also model situations that are completely opposite: you see an ad that is so bad that you just give up looking at the ads and you don't look at the ads afterwards. So now we assume that the user starts from the top position, the first position. She looks at this first ad with probability one; that's just a matter of scaling, it's not really an assumption. And then with probability c_1, where 1 is the index of the advertisement in that spot, she's going to look at the second ad, and so on and so forth. So given this model for user behavior -- feel free to interrupt me if there is anything that -- >>: (Inaudible) independent of whether or not she clicks? Is it -- >> Mohammad Mahdian: Very good question. I'm assuming that these probabilities are independent of the previous ones, but this is an assumption without loss of generality, because essentially I'm looking at the aggregate probability. The probability might actually be different depending on the context, but for the purpose of the optimization I only care about the aggregate probability. Right? Since when I'm deciding which ads to place, I don't have the information on whether this user is going to click on an ad or not, so I can't make decisions based on that. And as a result, it's enough to have the aggregate continuation probability. But that's a very good question; most likely the probability is not independent of whether you are clicking or not. Yes? >>: (Inaudible). >> Mohammad Mahdian: So you're suggesting that c_i should always be at most 1 minus q_i, because if you are clicking on an ad, you presumably are not going to click on future ads. That assumption sounds reasonable, but I'm not going to make it; I don't need that assumption, basically. >>: (Inaudible) if a user really wants to buy something they can click on the first ad, and if they don't like it they go back. >> Mohammad Mahdian: Right. Right. Exactly. Statistically this probably happens for a small fraction of the users, but presumably there could be a large benefit in looking at those small fractions. I'm not going to make an assumption like this, basically. Yes?
>>: The c_i could be strongly dependent on the design, I guess. If with one click you open the ad in a new tab, you might look at multiple ads. >> Mohammad Mahdian: Sure. I'm sure these things depend a lot on the user interface design, but basically here I'm focusing on the part that's really a function of the particular ad that I'm placing there; I want to see what's the effect of this particular ad. And actually, just to clarify the connection with the previous work, with the Craswell et al. results in the context of organic search results: first of all, their model doesn't have the values, just because for organic search results there are no bids, you just want to maximize the click-through rate; and they also assume that c_i is actually precisely equal to -- what is it? Oh, one minus q_i, yes, that's correct. So basically what they're assuming is that you click on an ad with probability q_i, and if you don't click on this ad, you go on to the next ad. That's the assumption they are making, which seems a little restrictive, but the thing is that even with this assumption, they show based on the click logs that this is a better fit to the click-log data than the separability model. Which is pretty surprising, actually. Okay. So now, given this model, formally, if ads 1 through K are displayed in this order, the probability that ad i is clicked on is going to be the product of c_1 through c_(i-1), times q_i. And therefore the winner determination problem becomes to find an ordering of at most K ads that maximizes this function: v_1 q_1, plus v_2 c_1 q_2, plus v_3 c_1 c_2 q_3, and so on and so forth; basically the sum over all i of the click probability of ad i times v_i. Now, here, as it turns out, the problem is much easier from a computational complexity perspective. There is a lemma that shows that if there is no limit on the number of ad slots that are shown, the optimal ordering is to sort all the ads in decreasing order of v_i times q_i divided by (1 minus c_i). So this is the parameter that the ads need to be sorted by. You can think of this as essentially the value of the advertiser times some squashed version of the click rate of the advertiser. And that's the optimal ordering if you don't have any limit on the number of ads that you can show on the side of the page. Obviously you do have a limit on the number of ads that you can show on the side of the page, but still the problem can be solved. I'm not going to prove this lemma, but the proof is based on a simple exchange argument; if you've seen proofs that greedy scheduling algorithms are optimal, you know exactly what I'm talking about. You assume that you have the optimal ordering, and you show that if this ordering is violated among two consecutive elements, then by switching them you increase the value. So it's pretty simple. Now, if you do have a limit on the number of ad slots that you can show, the optimal ordering might be somewhat different from this, because, for example, for the last ad that you are showing, in the last slot, there is nothing after it, so you don't really care about the continuation probability.
But as it turns out, you can still solve the problem optimally, because we can show that if you select the set of ads that are shown on the page, these ads should still be sorted based on this order; so now the problem becomes only to select the set of ads that you show on the page. And this can be done using a basic dynamic programming approach: given the ordering, you can do it with dynamic programming (see the sketch below). Okay. So that's pretty much a complete picture for this model. There are a number of generalizations of the model. First of all, you could have position-dependent multipliers. In this model I was assuming that the probability that you go from ad i to ad i+1, that is, that you view ad i+1 given that you viewed ad i, is a term that only depends on the advertiser displayed in slot i. But in general, you could assume that this probability depends not only on the ad but also on the slot itself. To some extent this addresses the earlier question about interface design: there could be things that depend on the slot, for example, if something falls off the page then the probability of going to that ad is significantly lower, no matter what ad is shown there. For this model, with position-dependent multipliers as well, there is a simple 4-approximation algorithm, and the algorithm is simple and intuitive and seems applicable in practice. Theoretically, we can also get a quasi-polynomial time approximation scheme, so basically for any factor that you want, we can approximate the problem within that factor in time that's not quite polynomial but something like K to the power log K. So yes? >>: (Inaudible) similar results when (inaudible). >> Mohammad Mahdian: I haven't thought about that. Some of the techniques might apply there, but I have not thought about that. All right. So from a practical perspective, the quasi-polynomial time approximation scheme is a bit too slow to be applicable. And also, theoretically, another interesting question is that we don't have an NP-hardness proof in this case. We've actually tried to prove NP-hardness but haven't been able to, so it's not entirely clear what the theoretical picture is for this problem. Another generalization is when you have multiple ad slates. For example, usually for sponsored search you have a top slate of ads and an east slate of ads, and in this case you might assume that the users behave slightly differently, so they don't necessarily jump from the bottom ad on the north to the top ad on the east; they might be looking at the different ad slates with different probabilities. There is a generalization for this case, and in fact we can approximate the problem pretty well if the number of different ad slates is constant, which is pretty realistic; usually the number of different ad slates is at most two or three. Okay. So that's it. I'm just going to conclude by discussing a number of interesting open directions. First of all, our contribution was to define models for externalities in online advertising based on assumptions about how users behave, and we discussed the computational hardness of the winner determination problem, which is the fundamental problem once you have a model for externalities. Now, this is a very interesting field, and it's pretty new.
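[Editor's note: a minimal sketch of the cascade-model winner determination just described: sort the ads by v_i q_i / (1 - c_i), then run a dynamic program over that order to select at most K ads. It assumes every c_i < 1; names are illustrative, not from the paper.]

```python
# Cascade-model winner determination: the optimal selection respects
# the sorted order by v*q/(1-c), so a dynamic program over that order
# suffices. Assumes every continuation probability c < 1.

def choose_ads(ads, K):
    """ads: list of (v, q, c) triples. Returns the maximum expected
    value of showing at most K ads in the optimal order."""
    order = sorted(ads, key=lambda a: a[0] * a[1] / (1 - a[2]),
                   reverse=True)
    # best[t] = max value obtainable from the remaining suffix using
    # at most t slots, assuming the user reaches it with probability 1.
    best = [0.0] * (K + 1)
    for v, q, c in reversed(order):
        for t in range(K, 0, -1):
            # Either skip this ad, or show it first in the suffix:
            # immediate value v*q, then continue with probability c.
            best[t] = max(best[t], v * q + c * best[t - 1])
    return best[K]

print(choose_ads([(5.0, 0.2, 0.9), (3.0, 0.5, 0.4), (1.0, 0.9, 0.3)], 2))
# 2.35: show the (5.0, 0.2, 0.9) ad first, then the (3.0, 0.5, 0.4) ad.
```

The backward pass works like a 0/1 knapsack: iterating t downward ensures best[t-1] still refers to the suffix without the current ad.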
People have not really looked at externalities much, and this is something with a lot of potential in it, because the business is huge; as I said, in 2007 there were 21 billion dollars spent on online advertising, which is a huge number. And the whole business is basically based on this assumption that the click-through rates are separable, which is obviously wrong. So there is a lot that could be gained here. There are heuristic approaches that seem to have helped. For example, if you look at the data, Google shows fewer ads than both Microsoft and Yahoo, whereas their revenue per search is considerably higher than both Microsoft's and Yahoo's, and that's puzzling. One justification for that is that there is better targeting there: even though this effect is not explicitly taken into account in estimating the click rate, they still take into account the fact that showing more ads doesn't necessarily increase your total revenue. Now, on the theoretical front we are still at the beginning; we still need to define models that are both interesting from a practical point of view and also make sense theoretically. First of all, there is the question of experimental evaluation of these models. The only published result that I know of on this is the work by Craswell et al. for organic search results, and obviously there is much more to be done here. There is also the question of whether we can come up with a more general Markovian model for user behavior. Basically, the model that we are talking about assumes that the users are following some sort of Markov chain, where the parameters of this Markov chain come from the advertisers that are placed in the different spots. An interesting question is whether we can generalize this: presumably users are not just in two states of clicking or not clicking, viewing or not viewing, and if we can come up with something more complicated, that might be a more realistic model of how users behave. There is also the question of combining the models I've talked about. These were basically two models: one assuming that the users act perfectly rationally, so they see a set of advertisements and select the one that maximizes their utility; the other assuming that the users are basically just Markov chains, they just transition with given probabilities. And there is a third class of models; there is a paper by Susan Athey and Glenn Ellison that is looking essentially at something like this. There is also an element of signaling here: if you're showing an ad at the top, you're signaling to the user that the quality of this ad is higher than the other ones, and that by itself could increase the probability of this ad being clicked on, independent of the position and also independent of the ad identity. So it would be interesting to look at combinations of these models, because the reality is probably somewhere in between: users don't look at all the ads and select the best ones, but users do care about which ads they perceive as better than others, and users do look at the confidence vote that you are giving to the different advertisements by placing them higher or lower. So that would be a very interesting theoretical direction. As for an experimental direction, there is the question of learning externalities.
So one problem with all of these models is that the more complicated your model gets, the harder it becomes to learn the parameters of the model and do anything practical based on it, so we have to be careful not to make the models too complicated, and there is the question of learning the parameters of these models. For example, one very specific theoretical question is, for the cascade model: is there an algorithm, similar to algorithms for the multi-armed bandit problem, that would converge over time to the optimal ranking of the advertisements? I don't know the answer to that, and that's an interesting theoretical question. Then there's the connection to the literature on diversity. In the web search research literature there are a lot of papers that start with the assumption that we want to define an ordering of the search results, and we all know that diversity is good, so it's good to incorporate some element of diversity in the search results; then the question is how we should incorporate it, and so on and so forth. But they start with the assumption that it's good to have diversity. One nice feature of these models for externalities is that they don't start with this assumption, but they can result in diversity. For example, in the rational choice model, if you assume that you have different user types with different preferences, like one user type cares about Harry Potter the book and the other user type cares about Harry Potter the movie, then in order to maximize your click-through rate, without explicitly incorporating any element of diversity, you do have to have a diverse search result. So that's an interesting way to look at the diversity problem, both for web search results and also for advertising: basically looking at diversity as a way to increase the click-through rate. And I haven't seen anything done based on this approach. Another interesting direction is to look at long-term externalities. Here I was looking basically at short-term externalities: if I'm showing this ad next to another ad, how is that going to affect the click-through rate of the second ad? But there is another effect, the long-term externality: if you keep showing good ads to the users, the users will become much more likely to click on your ads. And that's another factor distinguishing Google from Yahoo and Microsoft. That's one of the hypotheses for why they have a higher revenue per search than Microsoft or Yahoo: because of being better at showing ads that are more relevant, and as a result the average click-through rate of the users is higher on Google than on Microsoft or Yahoo, for example. For traditional forms of advertising these things have been looked at. The difference between traditional advertising and online advertising is that in traditional advertising there is not much that can be measured, whereas in online advertising everything is logged, so you can measure basically everything, and you can try to learn things based on this and optimize your allocation based on this. For traditional advertising, what has been looked at in the economics literature is basically truth-in-advertising regulations.
Sometimes advertisers can benefit by pushing the government to establish regulations that do not let advertisers lie in their advertisements, because even though this limits them, in the long run it can benefit them, since it increases trust. There is a similar question in online advertising, and it would be interesting to actually look at this theoretically. Finally, there's the example of online dating, which is another very interesting special case of the lead generation business. It's actually one of the fastest growing segments of the lead generation industry, because if you think of it, this is really a lead generation business, and it hasn't really been looked at this way. I mean, most of the online dating businesses so far are based on fixed-fee subscriptions, but this is really a matter that people care about. And I've actually talked to people at startups that are doing online dating, and this is a real question for them: if somebody is searching for a possible partner, what search results should they show them? Should they show them the most, quote unquote, desirable people, or should they try to diversify in order to take into account the externalities that these people are imposing on each other? >>: (Inaudible). >> Mohammad Mahdian: Obviously, yeah, there are all those problems as well. Yeah. But anyway, this issue of externalities in the context of lead generation for online dating is an interesting research problem that has not been looked at. Okay. That's it. (Applause)