Optimal Ad Ranking for Profit Maximization Raju Balakrishnan rajub@asu.edu Subbarao Kambhampati rao@asu.edu School of Computing and Informatics Arizona State University Tempe AZ 85287 ABSTRACT Despite the enormous commercial importance of on-line advertisements (ads), there has been little work done to clarify the basis for ranking and displaying them. Most existing methods rank ads as if the user views each of them in isolation. We will consider a more realistic user model that induces three mutual influences between displayed ads: (i) positional bias (for viewing ads placed higher up) (ii) similar ad fatigue (which reduces interest in an ad when similar ads have already been displayed above it) and (iii) browsing impatience (which accounts for the user abandoning the ad viewing based on the ads already seen). We will show that in general, when the inter-ad similarity is taken into account, optimal ranking is NP-hard. Ignoring inter-ad similarity, we state and prove the optimal ranking function for sorting the ads that is sensitive to the other two factors. We will show that the known ad ranking strategies correspond to restricted special cases of our ranking function. We also provide simulation studies that establish the effectiveness of its generality. Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: Online Information Services 1. INTRODUCTION Online keyword advertisements (ads) are a multi-billion dollar business. The basic idea is to show a ranked list of advertisements to a user searching for related information. Individual vendors bid for certain keywords. When these keywords are seen to be relevant to the user’s search, the search engines will show, typically in a separate pane, a list of these ads. If the user clicks on any of the ads that are shown, then the search engine gets the bid amount as the revenue. Confirmed details of the exact ranking strategies used by the various search engines are of course hard to get. Yahoo! is said to rank its ads in terms of their bid amount;1 while Google and Microsoft are said to rank the ads in terms of the expected profit [11, 1], assessed as product of relevance and bid amount. There is little work done towards explaining which models are optimal under what assumptions. The few existing academic efforts on ad ranking all implicitly assume that the list of displayed ads can be decided by selecting individual ads in isolation (see Section 2). Unfortunately, such a ranking is insensitive to the mutual influences between the displayed ads induced by the typical page browsing patterns of users (c.f. [4, 5, 3]): Positional Bias: There is a bias for viewing ads positioned higher up in the list. Similar Ad Fatigue: The perceived relevance of an ad is reduced by the presence of similar ads higher up in the list [9]. Browsing Abandonment: The user is not compelled to view all the ads and may abandon browsing at any point of time either because of impatience or satiation. The abandonment likelihood may thus depend on the (number and content of the) ads already viewed. In this paper, we aim to investigate the optimal ranking of ads, that is sensitive to the mutual influences between displayed ads induced by the user browsing patterns. We start by showing that optimal ranking to maximize expected profit considering similar ad fatigue is NP-hard. For the case where similarities between the displayed ads can be ignored, we provide a simple ranking function for displaying ads that correctly combines the bid amount, ad relevance and abandonment probabilities. After proving the optimality of this ranking, we will show how it subsumes the ranking strategies said to be used by Yahoo!, Google and Microsoft. Our analysis shows these latter ranking strategies to be optimal only under significantly restrictive assumptions. Finally, to gain an understanding of the practical impact of the ranking, we provide a simulation study that compares the proposed strategy to the Google and Yahoo! models. While the immediate motivation for our work is ad ranking, we believe that the methodology can also be adapted to ranking in other e-commerce scenarios such as vendor recommendations. 2. Copyright is held by the owner/author. Proceedings of the 11th Inter- national Workshop on Web and Databases (WebDB 2008), June 13, 2008, Vancouver, Canada. RELATED WORK 1 This strategy was adapted by Yahoo! from Overture, but said to have changed in the past year. Ranking is an extensively researched problem in IR. While there has been some work in the IR community on browsing model sensitive ranking of search results, it is not directly applicable to ad ranking because the latter is influenced by the competing pulls of ad relevance and ad profit. Robertson [12] claimed that a retrieval order based on Probability Ranking Principle (PRP) leads to the largest number of relevant documents in a set of retrieved documents than any other policy. Later Gordon and Lenk [7, 8] identified the required assumptions for the optimality of the ranking according to PRP. The ad placement problem has attracted considerable research, mainly due to the popularity of search engine advertisements. Feng et al. [6] compare the ad-placement strategies of different search engines. Richardson et al. [11] tackle the problem of predicting the click-through rates for rarely clicked ads and Clarke et al. [2] examine the influence of caption features on click-through rates of the ads. Mehta et al. [10] propose a placement of ads that also considers the budget limits of each bidder and the bid amount of ads, formulating the problem as an online bipartite matching problem. All of these efforts ignore the mutual influence between ads induced by the user browsing patterns. Craswell et al. [3] empirically evaluated different models of user search result browsing to analyze the positional bias of click-through rates. They found that the cascade model— a model assuming linear traversal through the results ending with a clicked result—predicts the observed bias accurately. We adapt this model with extensions for our work. 3. PROBLEM SETUP Our aim is to rank the ads to maximize expected profit. The ranked placement of ads is denoted as an ordered set A, A = ha1 , a2 , .., an i, where the subscripts denote the position of ai in the ranked list starting from the highest ranked result first. The profit of ad ai is denoted by $(ai ). We use the basic cascade browsing model suggested by Craswell et al. [3] augmented by two extensions. In the basic cascade model the user views search results from top to bottom, deciding whether to click each result before moving to the next. Each ad ai in the ranked list, is either clicked with probability equal to its relevance R(ai ) or skipped with probability (1−R(ai )). A user who clicks an ad never comes back, and a user who skips always continues. Clicking on a result means the user must have skipped the results above and decided to click on the result. That is, in this model, probability of clicking ai (Pc (ai )) is, Pc (ai ) = R(ai ) i−1 Y (1 − R(aj )) (1) j=1 The static distribution of rapidly decreasing view probabilities in eye tracking studies [5], and the dynamic behavior of eye fixations gradually disseminating down the search results [4] are direct implications of cascade model. Three unrealistic assumptions in this model, as partly acknowledged by Craswell et al. [3] are: A1. The user keeps going down until he clicks a result. (In reality, user may abandon checking the search results without finding the required results.) A2. The click probability of an ad is dependent only on its position and is independent of the ads above it. (In reality, the presence of similar ads in the higher position reduces the click probability.) A3. The user will always click only one document. (In reality, multiple clicks are commonly observed in ads and search.) We base our analysis on a model that relaxes all three assumptions. To relax A1, we allow for the user abandoning search without finding a relevant result. The abandonment probability, γ(ai ), is modeled as a function of the result ai at the position. Incorporating this probability, Equation 1 changes as, Pc (ai ) = R(ai ) i−1 Y 1 − R(ai ) + γ(ai ) (2) j=1 Since the abandonment probability and number of ads clicked may vary depending on the nature of the application, we make no assumptions about the nature of the function γ(ai ) or the number of selections in deriving the proposed ranking strategy.2 To relax A2, we generalize the relevance function R(ai ) to include the mutual influence of similar ads on relevance. The relevance of an ad for the user is reduced by a similar ad placed above in the list, and this reduced residual relevance is denoted by Rr . Residual relevance depends on the set of results higher in the ranked list. Formally, the residual relevance Rr (ai |ha1 , a2 , .., ai−1 i) is the probability of user clicking ai , after seeing ai and all the ads above ai . Substituting residual relevance Rr for relevance function R in Equation 2 gives us context dependent click probabilities. For brevity we denote Rr (ai |ha1 , a2 , .., ai−1 i) by Rr (ai ) in the rest of this paper. Using the residual relevance we can rewrite click probability Pc as Pc (ai ) = Rr (ai ) i−1 Y 1 − Rr (ai ) + γ(ai ) (3) j=1 Regarding A3, though we derive our initial ranking model assuming single click, we prove that the proposed ranking is also optimal for multiple clicks. The user model may be schematically represented as a flow graph as shown in Figure 1. Labels on the edges refer to the probability of the user traversing them. Each vertex in the figure corresponds to a view epoch (see below), and the flow balance holds at each vertex. Starting from the top ad, the probability of the user clicking the first ad is R(a1 ) and probability of him abandoning browsing is γ(a1 ). The user goes beyond the first ad with probability 1 − (R(a1 ) + γ(a1 )) and so on for the subsequent results. 4. OPTIMAL AD RANKING Using the notations introduced, we formally define the ranking problem and derive optimal ranking in this section. The formal problem statement is, Choose the optimal ranking Aopt = ha1 , a2 , .., aN i of N 2 Other than the assumption that the probability of user leaving the search results at a position is independent of the results above. 1 γ ( a1 ) R(a 1 ) 1 − ( R ( a 1 ) + γ ( a 1 )) [ 1 − ( R ( a 1 ) + γ ( a 1 ))] γ ( a 2 ) [ 1 − ( R ( a 1 ) + γ ( a 1 ))] R ( a 2 ) [ 1 − ( R ( a 1 ) + γ ( a 1 ))][ 1 − ( R ( a 2 ) + γ ( a 2 ))] Figure 1: Flow Graph for user browsing the first two ads. The labels are the view probabilities and ai denotes the ad at the ith position ads to maximize the expected profit E($) = N X $(ai )Pc (ai ) (4) i=1 where N is the total number of ads to be ranked. Substituting click probability Pc from Equation 2 in Equation 4 we get, E($) = N X $(ai )Rr (ai ) i=1 i−1 Y [1 − (Rr (ai ) + γ(ai ))] (5) j=1 Note that we use residual relevance Rr here instead of absolute relevance R. Now we consider the optimal ranking of ads to maximize the objective function in Equation 5. Unfortunately, optimal ranking considering residual relevance turns out to be NPHard as proved below. Theorem 1. Ranking to optimize expected profit in Equation 5 considering inter-ad similarities is NP-Hard. The proof is by reduction from the independent set problem. See Appendix A-1 for complete proof. We now focus on optimal placement ignoring similarity. Replacing Rr by R, Equation 5 becomes, E($) = N X $(ai )R(ai ) i=1 i−1 Y [1 − (R(ai ) + γ(ai ))] (6) j=1 The optimal ranking maximizing this expected profit can be shown to be a sorting problem with a special ranking function: Theorem 2. The expected profit in Equation 6 is maximum if the ads are placed in descending order of the value of the ranking function RF , RF (ai ) = $(ai )R(ai ) R(ai ) + γ(ai ) (7) Proof Sketch: The proof is by induction on the number of ads in the set of ads to be ranked. We consider two placements P1 and P2 , where P1 is the placement according to the proposed strategy and P2 is a competing placement. We formulate the difference of expected profit of P2 from expected profit of P1 as (E($(P1 )) − E($(P2 ))). We show that this difference is always greater than or equal to zero, based on inductive hypothesis and on the fact that P1 is ordered in descending order of RF . See Appendix A-2 for complete proof.2 RF can be understood as the profit generated per unit click probability consumed by the ad. Referring to Figure 1, this ordering is intuitive since the top ads in the ranked list have more view probabilities, and placing ads with higher profit per consumed view probability is likely to increase profit. The expected profit in Theorem 2 considers ranking for single click case, and Theorem 3 extends the results to multiple clicks. Theorem 3. The expected profit is maximum even for multiple clicks if the ads are placed in descending order of value of RF as proposed in Theorem 2. Proof Sketch: We proved that ordering according to RF provides maximum expected profit for single click. Multiple clicks are the same as the user restarting her browsing from the result immediately below the last clicked ad. A simple induction on number of clicks based on this idea, using single click as base case, is sufficient to prove that the proposed placement provides maximum expected profit for multiple clicks. See Appendix A-3 for the complete proof.2 Mining the parameters for RF: Since the existing ad ranking strategies already have to mine the ad relevance, the only additional parameter needed to implement RF is the abandonment probability γ(ai ). These can be mined from click logs, similar to click through rates [11]. For example, assuming a single click model, the abandonment probability of the first ad a1 is, OCP2 (a2 ) + R(a1 ) γ(a1 ) = 1 − R(a2 ) where OCP2 (a2 ) is the observed click probability on a2 when a2 is displayed in the second position, i.e. if the ad a is displayed 100 times in the second position and is clicked 5 5 times, OCP2 (a) is 100 . As the computation progresses, the abandonment probabilities will converge to unique values irrespective of the position used to compute OCP . Similarly, the abandonment probabilities of the ads in the lower positions can be mined once we know the abandonment probabilities and relevance of the ads above. Initially, the abandonment probabilities can be calculated for all the ads placed in the top positions. The ads in the positions below can be used to mine probabilities—knowing abandon probabilities of ads above—as computation progresses. 5. ANALYZING EXISTING STRATEGIES In this section we compare the existing ad placement strategies to the ranking function RF in Equation 7. We will see that they all correspond to specific assumptions on the abandonment probability γ(ai ). Special Case 1: Ranking by Bid Amount: If we assume that the user never abandons browsing (i.e., ∀i γ(ai ) = 0), then Equation 7 reduces to RF (ai ) = $(ai ) This means that the ads are ranked purely in terms of their bid amount, a strategy that was attributed to Yahoo! (and Overture) until last year. When γ(ai ) = 0, we essentially have a user with infinite patience and will keep browsing downwards until he finds the relevant ad. So, to maximize profit, it makes perfect sense to rank ads by bid amount. More generally, for small abandonment probabilities, ranking by bid amount (the erstwhile Yahoo! placement) is near optimal. 40 RF Profit Profit x Relevance 35 30 25 Profit → Special Case 2: Ranking by Expected Bid Amount: Another approximation to γ(ai ) is to assume that it is negatively proportional to the relevance of the ad ai –the more relevant the current result, the less likely the user is to abandon the search. Specifically, if we have ∀i γ(ai ) ≈ k − R(ai ) (for some constant k between 0 and 1), then the Equation 7 reduces to, 20 15 10 $(ai )R(ai ) ∝ $(ai )R(ai ) RF (ai ) ≈ k This shows that ranking ads by their individual expected profit, as is supposedly done by Google and Microsoft, is near optimal as long as abandonment probability is negatively proportional to the relevance. 6. SIMULATION STUDIES The discussion in the previous section shows that the existing ad placement strategies are optimal only under more restrictive assumptions on the abandonment probabilities. This suggests that the revenues (and profits) may be improved by ranking using RF . To quantify the potential increases in expected profits offered by RF , we performed a large number of simulation runs comparing it with the other two existing strategies. In our experiments we assigned the relevance values as a uniform random number between 0 and α (0 ≤ α ≤ 1) and values of abandonment probabilities as uniform random between 0 and 1 − α (This will make sure that ∀i R(ai ) + γ(ai ) ≤ 1). The profits (bid amounts) for ads are assigned uniformly random between $0 and $10. Note that uniform random distribution is the maximum entropy probability distribution and makes least assumptions about the distribution of bid amounts. The number of relevant ads (corresponding to the number of bids on a query) is set to 50. Simulated users are made to click between 1 − 10 ads. We performed the simulation for multiple clicks. The number of ads clicked is set as a random number generated in a zipf distribution with exponent 1.5, with ranks increasing from 1 to 10. This is reasonable since a power law is most intuitive for the distribution of the number of clicks. Simulated users browse down the list. Users click an ad with probability equal to the relevance of the ad and abandon search with a probability equal to the abandonment probability of the ad. The set of ads to be placed is created at random for each run. For the same set of ads, three runs— one with each placement strategy—are performed. For each value of α, 200,000 rounds with each placement strategy are executed. The results for the three strategies are shown in Figure 2. We see that the proposed strategy (RF ) is the winner for all values of α. Ranking by profit strategy loses for low values of α and reaches the optimal value for large values of α, as discussed in Section 5. Note that for α = 1 (γ(ai ) = 0) ranking by profit placement gives exactly the same profit as the optimal placement. The profit from RF exceeds the profit from competing strategy by 40 − 50% for some values of α. For example for α = 0.3 P rof it × Relevance (competing strategy) gives an expected profit of $5.36 while RF gives a profit of $7.81 (exceeds by 45.7%) and for α = 0.4 5 0 0 0.1 0.2 0.3 0.4 0.5 α→ 0.6 0.7 0.8 0.9 1 Figure 2: Comparison of proposed placement (RF ) with the existing ad placement strategies. Each point corresponds to 2 × 105 simulation runs. Relevances are uniformly random in [0, α] and abandonment probabilities are a uniformly random in [0, 1 − α]. P rof it × Relevance gives a profit of $7.47, as against $10.15 by RF (exceeds by 35.9%). This is highly significant for a multi billion dollar market and millions of users. 7. CONCLUSION This paper addresses the problem of the optimal ranking of ads considering (i) relevance of ads (ii) reduction of click probability of ads due to the other relevant ads above (iii) abandonment of search due to user impatience (iv) reduction of click probability due to similar ads above. We prove that optimal ranking considering the inter-ad similarity (iv) is NP-Hard. Further, we propose an optimal and tractable ad placement strategy considering the other factors—factors (i), (ii), and (iii). We prove optimality of the expected profit from the proposed placement strategy. We compare the existing placement strategies to the proposed strategy and enumerate the assumptions under which the existing strategies are optimal. A simulation study verifies our results and gives an idea of the amount by which the proposed strategy can outperform the existing ones. In future, we hope to investigate approximation schemes for considering inter-ad similarity. We are also exploring the possibility of validating the ranking model on actual search ad data. Another interesting direction would be to generalize our analysis to consider the advertiser budget limits (c.f. [10]). Finally, although we focused on ad placement, the ranking can be extended to other related e-commerce scenarios such as vendor recommendations (e.g. Amazon’s product recommendations), and search results in e-commerce portals. Acknowledgements: This research is supported in part by the ONR grant N000140610058 and a 2008 Google research award. 8. [1] H. K. Bhargava and J. Feng. Paid placement strategies for internet search engiens. In WWW Proceedings, pages 117–123. ACM, May 2002. [2] C. L. A. Clarke, E. Agichtein, S. Dumais, and R. W. White. The influence of caption features on clickthrough patterns in web search. In Proceedings of SIGIR, pages 135–142. ACM, July 2007. [3] N. Craswell, O. Zoeter, M. Tayler, and B. Ramsey. An experimental comparison of click position bias models. In Proceedings of WSDM, pages 87–94, February 2008. [4] Enquiro-Research. The eye tracking example, http://www.enquiroresearch.com/eye-trackingexamples.aspx. [5] Enquiro-Research. The eye tracking study, http://www.enquiroresearch.com/images/eyetracking2sample.pdf. [6] J. Feng, H. K. Bhargava, and D. Pennock. Comparison of allocation rules for paid placement advertising in search engines. In ICEC Proceedings, pages 294–299. ACM, May 2003. [7] M. G. Gordon and P. Lenk. A utility theory examination of probability ranking principle in information retrieval. In Journal of American Society of Information Science, volume 41, pages 703–714, 1991. [8] M. G. Gordon and P. Lenk. When is probability ranking principle suboptimal? In Journal of American Society of Information Science, volume 42, 1992. [9] T. Hernandez and S. Kambhampati. Improving text collection selection techniques with coverage and overlap statistics. In Proceedings of WWW, 2005. [10] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. Adwords and generalized on-line matching. In ACM Transactions on Computational Logic, volume 5, August 2007. [11] M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: Estimating the click-through rate for new ads. In Proceedings of WWW, May 2007. [12] S. E. Robertson. The probability ranking principle in information retreival. In Journal Documentation, volume 33, pages 294–304, 1977. A-1 APPENDIX Proof of Theorem 1 Proof. Independent set problem can be formulated as a ranking problem considering similarities. Consider an unweighed graph G of n vertices {a1 , a2 , ..an } represented as an adjacency matrix with a value 1 in forward diagonal. This conversion is clearly polynomial time. Now, consider the values in the adjacency matrix as binary similarity values between the ads to be ranked. Let the ads have the same relevance, profitability and abandonment probability. In this set of n ads from {a1 , a2 , .., an }, the optimal ranking will have k pairwise independent ads as the top k ads. But the set of k independent ads corresponds to an independent set in graph G. A-2 P1 REFERENCES Proof of Theorem 2 Proof. Induction on number of ads in the set to be ranked. 1 2 u-1 u u+1 n n+1 P2 a1 a2 1 2 au − 1 au au + 1 u+1 au au au n an + 1 u-1 u an an + 1 diff2 a1 a2 diff1 n+1 − 1 + 1 + 2 au Figure 3: Two ad-placement, P1 optimal for n+1 ads according to the proposed strategy, and P2 maximal profit ordering based on inductive hypothesis and the (n + 1)th newly added ad al out of order. dif f1 is decrease in profit due to change in position of al downwards, and dif f2 is increase in in profit due to raise in position of ads—au to a(n+1) —by one in P2 . Base Case: single ad, ordering does not matter, true. Inductive Hypothesis: The profit for a set of n ads is maximized if they are are ordered in the proposed order. Induction: For a set of n + 1 ads, Consider two sequences of placements P1 = ha1 , a2 , .., an+1 i P2 = ha01 , a02 , a0n+1 i Let P1 be the placement in proposed order and P2 be the placement out of order. We have to prove E($1) ≥ E($2) where E($1) and E($2) are expected profits from P1 and P2 respectively. Since P1 is in the proposed order, RF (an+1 ) ≤ ∀i RF (ai ). As our proof is to handle the worst case scenario, we have to prove the hypothesis for the maximum value of E($2). Inductive hypothesis says that the profit is maximum for the set of size n if they are in descending order of RF . So for a worst case scenario we assume that the n ads in P2 are in order of descending RF , and (n + 1)th out of order. Since we are having the same set of ads in P1 and P2 we name them using the position in P1 . This placement is shown in Figure 3. Here P1 and P2 are same except for the fact that au is present in the last position in P2 . instead of at the uth position in P1 . The entire set of ads au+1 to an+1 in P1 is shifted up by one position in P2 . So we need to prove that the reduction in profit due to change in position in au (which is denoted by diff1 in Figure 3) is greater than increase in profit due to lifting of one position for au+1 to an+1 (diff2 in Figure 3). Note that the profit generated by a1 to au−1 is exactly same for P1 and P2 . We have to prove diff1 ≥ diff2 . Let us denote the expected profit of an ad ai (or a group of ads) from P1 as $1 (ai ) and expected profit from P2 by $2 (ai ). $1 (au ) = R(au )$(au ) u−1 Y 1 − R(ai ) + γ(ai ) i=1 Qn+1 $2 (au ) = R(au )$(au ) 1 − R(ai ) + γ(ai ) 1 − R(au ) + γ(au ) i=1 (Denominator accounts for the absence of au ) diff1 $1 (au ) − $2 (au ) Substituting by values of $1 (au ) and $2 (au ), "u−1 Y = R(au )$(au ) 1 − R(ai ) + γ(ai ) − ≤ # Taking the common factor outside, u−1 Y = R(au )$(au ) 1 − R(ai ) + γ(ai ) × i=1 " n+1 Y 1− # 1 − R(ai ) + γ(ai ) i=u+1 Qn Note: = 1 − R(ai ) + γ(ai ) 1 − R(au ) + γ(au ) i=u j=u+1 Canceling Common Factor from RHS, n+1 Y 1− 1 − R(ai ) + γ(ai ) − ! n+1 Y 1 − R(ai ) + γ(ai ) i=u+1 i=u+1 Let S be sequence of ads from au+1 to an+1 $1 (S) = n+1 X R(ai )$(ai ) i=u+1 $2 (S) = diff2 = diff2 = i−1 Y 1 − R(ai ) + γ(ai ) i=u+1 Qi−1 j=1 1 − R(ai ) + γ(ai ) 1 − R(au ) + γ(au ) i=u+1 R(ai )$(ai ) R(ai )$(ai ) i=u+1 " i−1 Y 1 − R(aj ) + γ(aj ) # (A-2) Dividing diff1 -diff2 by common factor, Let diff1 − diff2 ∆ = Qu−1 1 − R(ai ) + γ(ai ) i=1 " # n+1 Y ∆ = R(au )$(au ) 1 − 1 − R(ai ) + γ(ai ) − i=u+1 n+1 X i−1 Y 1 − (r1 .r2 ....rn ) − [(1 − r1 ) + (1 − r2 ).r1 + (1 − r3 ).r1 .r2 + ... + (1 − rn ).r1 .r2 .rn−1 ] = 1 − (r1 .r2 ....rn ) + [(1 − r1 ) + (r1 − r1 .r2 ) + (r1 .r2 − r1 r2 r3 ) + ... +r1 .r2 ..rn−1 − r1 .r2 ..rn−1 .rn ] Terms will cancel out and this function will be 0 on simplification, irrespective of the values of the ri s. This means r∆ ≥ 0 i.e the difference E($1 ) − E($2 ) ≥ 0. This completes the induction. A-3 1 − R(aj ) + γ(aj ) × Proof of Theorem 3 Proof. Induction on number of clicks. Base Case: Single click, proved in Theorem 2. Inductive Hypothesis: True for n clicks. 1 −1 For n+1th click, user starts browsing down at the position 1 − R(au ) + γ(au ) next to the nth clicked ad. let nth clicked ad be ac . Since We have to prove ∆ > 0 there is only one click remaining, optimal ordering of ads is in the descending order of RF by base case. But we know Simplifying and dividing by R(au ) + γ(au ) " # that the ads above ac+1 (a1 to ac ) are no longer relevant to n+1 Y R(au )$(au ) = 1− 1 − R(ai ) + γ(ai ) − the user since he has seen them already. Since the relevance R(au ) + γ(au ) values of these ads are zero now (hence RF s of these ads are i=u+1 ( n+1 ) also zero), the optimal order is to remove the ads from a1 to i−1 X Y ac and reorder the remaining ads in descending order of RF . R(ai )$(ai ) 1 − R(aj ) + γ(aj ) (A-3) Since the relevance and abandonment probabilities of these i=u+1 j=u+1 ads—ads except a1 to ac —remain unchanged, the resulting Since au+1 to an+1 is placed below au in P1 , optimal sequence will be the sub-sequence of original ranking R(au )$(au ) R(ai )$(ai ) n+1 starting from ac to the end of the list, which is exactly the ∀i=u+1 ≤ R(ai ) + γ(ai ) R(au ) + γ(au ) sequence the user traverses in original ranked list after nth click on ac i.e. the original ranked list is optimal for n + 1 clicks. i=u+1 " r∆ R(ai )$(ai ) = × j=1 1 −1 1 − R(au ) + γ(au ) ( For the above function, substitute the values ri = 1 − R(au+i ) + γ(au+i ) RHS j=u+1 Now, RHS is, $2 (S) − $1 (S) n+1 X Y i−1 1 − R(aj ) + γ(aj ) R(ai ) + γ(ai ) n+1 X j=1 Pn+1 R(au )$(au ) R(ai ) + γ(ai ) (A-4) R(au ) + γ(au ) Note that the value of r∆ in A-3 is strictly decreasing with value of $(ai )R(ai ) factor in second term. Using the condition in constraint A-4 for $(ai )R(ai ) in second term of Equation A-3 we get, # " n+1 Y R(au )$(au ) − 1− 1 − R(ai ) + γ(ai ) r∆ ≥ R(au ) + γ(au ) i=u+1 ( n+1 X R(au )$(au ) R(ai ) + γ(ai ) R(a u ) + γ(au ) i=u+1 (A-1) ) i−1 Y 1 − R(aj ) + γ(aj ) i=1 Qn+1 i=1 1 − (R(ai ) + γ(ai )) 1 − R(au ) + γ(au ) ∀n+1 i=u+1 R(ai )$(ai ) ⇒ = j=u #)