Harini Sridharan, Stephen Duraski

Behavioral Targeting (BT) uses a user's web-browsing behavior to determine which ads to display to that user. Online advertisers use it to increase the effectiveness of their campaigns. As of the writing of this paper, how much BT can truly help online advertising in search engines is underexplored in academia. First, we aim to answer empirically whether BT truly has the ability to help online advertising. Second, we aim to answer how much BT can help online advertising, using commonly used evaluation metrics. Finally, we aim to answer which BT strategy works better than others for ad delivery.

We use 7 days of ad click-through log data from a commercial search engine, dated June 1st to 7th, 2008. The dataset includes users' web page clicks and ad clicks. We do not include any demographic or geographic data, to stay clear of privacy concerns.

“BT uses information collected on an individual's web-browsing behavior, such as the pages they have visited or the searches they have made, to select which advertisements to display to that individual. Practitioners believe this helps them deliver their online advertisements to the users who are most likely to be influenced by them.” [15]

We measure the effectiveness of BT using the click-through rate (CTR).

1. Does BT truly have the ability to help online advertising? To answer this question, we validate the basic assumption of BT, i.e. whether users who clicked the same ad have similar browsing and search behaviors while users who clicked different ads have relatively different Web behaviors.
2. How much can BT help online advertising using commonly used evaluation metrics? To answer this question, we use the difference between ad CTR before and after applying BT strategies as the measurement, i.e. the degree of CTR improvement is taken as a measure of how much BT can help online advertising. A statistical t-test is used to establish the significance of our experimental results.
3. Which BT strategy works better than others for ad delivery? We consider two types of BT strategy: (1) representing user behavior by users' clicked pages and (2) representing user behavior by users' search queries. In addition, the time span over which the user behavior occurred in the log data is also considered for user representation.

We adopt the classical Term Frequency Inverse Document Frequency (TFIDF) indexing [8], treating each user as a document and each URL as a term, to represent users mathematically. All users are represented by a real-valued matrix $U \in \mathbb{R}^{g \times l}$, where $g$ is the total number of users and $l$ is the total number of URLs visited. Each user is a row of $U$, a real-valued vector with entries

$$u_{ij} = \bigl(\log(\#\,\text{times user } i \text{ clicked URL } j) + 1\bigr) \times \log\frac{g}{\#\,\text{users who clicked URL } j}, \qquad i = 1, 2, \dots, g,\; j = 1, 2, \dots, l.$$

We also build the user behavioral profile by simply treating all terms that appear in a user's search queries as that user's previous behavior. With this method, each user is represented in the Bag of Words (BOW) model. We use Porter stemming [3] to stem terms, then remove stop words and terms that appear only once in a user's query text. This removes 470,712 terms and retains 294,208. We use the same TFIDF indexing to index users by query terms.

Many commercial BT systems use long-term user behavior, while many others use short-term behavior. There is no existing evidence showing which strategy is better. As a preliminary survey, we take 1 day of user behavior as the short-term profile and 7 days of user behavior as the long-term profile.

We validate and compare four BT strategies in this paper:
1. LP: long-term user behavior over all seven days, represented by pageviews
2. LQ: long-term user behavior over all seven days, represented by query terms
3. SP: short-term user behavior (1 day), represented by pageviews
4. SQ: short-term user behavior (1 day), represented by query terms

The data is 7 days of CTR data from June 1st to 7th, 2008. We use user IDs associated with cookies stored on the users' OS to identify individual users. No other user information, such as demographic or geographic data, is used. To filter robots, any user with more than 100 clicks per day is removed. The data covers 6,426,633 unique users and 335,170 unique ads within the 7 days. Ads with fewer than 30 clicks over the 7 days are removed, leaving 17,901 ads; the results are averaged over these ads.

Let $A = \{a_1, a_2, \dots, a_n\}$ be the set of $n$ ads in our dataset. For each ad $a_i$, let $Q_i = \{q_{i1}, q_{i2}, \dots, q_{im_i}\}$ be all the queries for which $a_i$ was displayed or clicked, and let $U_i = \{u_{i1}, u_{i2}, \dots, u_{im_i}\}$ be the users who were shown or clicked $a_i$. To indicate whether user $u_{ij}$ clicked $a_i$, we use

$$\delta(u_{ij}) = \begin{cases} 1 & \text{if } u_{ij} \text{ clicked } a_i \\ 0 & \text{otherwise} \end{cases}$$

In this work, we use two common clustering algorithms, k-means [10] and CLUTO [7], for user segmentation. Suppose the users are segmented into $K$ segments according to their behaviors. We use the function

$$G(U_i) = \{g_1(U_i), g_2(U_i), \dots, g_K(U_i)\}, \qquad i = 1, 2, \dots, n$$

to represent the distribution of $U_i$ under a given user segmentation, where $g_k(U_i)$ stands for all users in $U_i$ who are grouped into the $k$-th user segment. Thus the $k$-th user segment can be represented by

$$g_k = \bigcup_{i=1,2,\dots,n} g_k(U_i)$$

We first represent the users by their behaviors using the different BT strategies. After that, we group the users according to their behaviors with the commonly used clustering algorithms.
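The two steps just described (TFIDF user representation, then clustering into segments) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `tfidf_user_vectors`, `cosine` and `kmeans_segments`, the toy `clicks` log and its URLs are all hypothetical, and the clustering is a small cosine-similarity variant of k-means with a naive deterministic initialization rather than the k-means [10] or CLUTO [7] tools used in the study. The IDF numerator is assumed to be the number of users, since each user plays the role of a document.

```python
import math
from collections import Counter


def tfidf_user_vectors(clicks):
    """Map each user to a sparse TFIDF vector {url: weight}.

    clicks: dict user_id -> list of clicked URLs (one entry per click).
    Weight = (log(tf) + 1) * log(g / df), with g = number of users
    (assumption: users are the "documents", URLs are the "terms").
    """
    g = len(clicks)
    tf = {user: Counter(urls) for user, urls in clicks.items()}
    # df[url] = number of distinct users who clicked that URL
    df = Counter(url for counts in tf.values() for url in counts)
    return {
        user: {url: (math.log(n) + 1.0) * math.log(g / df[url])
               for url, n in counts.items()}
        for user, counts in tf.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans_segments(vectors, k, iterations=10):
    """Assign each user to one of k segments (cosine-based k-means variant)."""
    users = sorted(vectors)
    # Naive deterministic initialization: the first k users act as centroids.
    centroids = [dict(vectors[u]) for u in users[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: most similar centroid wins (ties go to the lowest index).
        assignment = {
            u: max(range(k), key=lambda c: cosine(vectors[u], centroids[c]))
            for u in users
        }
        # Update step: each centroid becomes the mean vector of its members.
        for c in range(k):
            members = [vectors[u] for u in users if assignment[u] == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)  # adds weights term by term
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return assignment


# Hypothetical toy click log: two sports readers and two car readers.
clicks = {
    "u1": ["sports.example/a", "sports.example/a", "sports.example/b"],
    "u2": ["sports.example/a", "sports.example/b"],
    "u3": ["cars.example/x", "cars.example/y"],
    "u4": ["cars.example/x", "cars.example/x", "cars.example/y"],
}
vectors = tfidf_user_vectors(clicks)
segments = kmeans_segments(vectors, k=2)
print(segments)
```

On this toy log the two sports readers end up in one segment and the two car readers in the other. At the scale of the real dataset, a sparse-matrix library would be the more practical choice.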
Finally, we evaluate how much BT can help online advertising by delivering ads to good user segments.

Within- and Between-Ads User Similarity

The basic assumption of BT is that users with similar browsing behavior have similar interests and are therefore inclined to click the same ads. If this is true, the similarity between users who clicked the same ad must be larger than the similarity between users who clicked different ads.

Click-through rate

Once we have validated the basic assumption that similar users click similar ads, we have to show that BT can help online advertising. Ad performance is generally measured by either click-through rate or revenue; since advertiser revenue is difficult for us to obtain, we use CTR. If user segments exist in which the CTR of an ad improves substantially over the same ad shown without user segmentation, then BT is valuable.

The improvement of CTR after user segmentation can only validate the precision of BT strategies in finding potentially interested users, so we also calculate recall. The larger the precision, the more accurately we can segment the clickers of $a_i$. The larger the recall, the better the coverage we achieve in collecting all clickers of $a_i$. Combining precision and recall gives us the F-measure; the higher the F-measure, the better the performance achieved with BT. Users who clicked $a_i$ are positive instances, and users who were shown $a_i$ but did not click it are negative instances. Precision and recall are defined as:

$$\mathrm{Pre}(a_i \mid g_k) = \mathrm{CTR}(a_i \mid g_k)$$

$$\mathrm{Rec}(a_i \mid g_k) = \frac{\sum_{u_{ij} \in g_k(U_i)} \delta(u_{ij})}{\sum_{j=1}^{m_i} \delta(u_{ij})}$$

where $m_i$ is the number of users in $U_i$. The integrated F-measure is:

$$F(a_i \mid g_k) = \frac{2\,\mathrm{Pre}(a_i \mid g_k)\,\mathrm{Rec}(a_i \mid g_k)}{\mathrm{Pre}(a_i \mid g_k) + \mathrm{Rec}(a_i \mid g_k)}$$

Intuitively, if the clickers of an ad $a_i$ dominate some user segments and seldom appear in others, we can easily deliver our targeted ads to them by selecting the segments they dominate.
However, suppose the clickers of $a_i$ are uniformly distributed across all user segments. Then, if we aim to deliver the targeted ad to more interested users, we must simultaneously deliver it to more users who are not interested in it. The larger the entropy, the more uniformly the users who clicked ad $a_i$ are distributed among the user segments. The smaller the entropy, the better the results we will achieve.

The within- and between-ads similarity is used to validate whether users who clicked the same ad have similar behaviors and users who clicked different ads have relatively different behaviors. Let

$$S_w = \frac{1}{n}\sum_{i} S_w(a_i), \qquad S_b = \frac{1}{n^2}\sum_{i}\sum_{s} S_b(a_i, a_s)$$

be the average within- and between-ads user similarity over the dataset collected, and let $R$ be the average ratio between them. Detailed results for each user representation strategy are tabulated below.

Strategy   S_w      S_b      R
LP         0.1417   0.0252   28.9217
LQ         0.2239   0.0196   44.2908
SP         0.1532   0.0281   24.5086
SQ         0.2594   0.0161   91.1890

The fact that the average $S_w$ is greater than the average $S_b$ implies that users who clicked the same ad are more similar than those who clicked different ads. To validate whether the difference between $S_w$ and $S_b$ is significant, a paired t-test was performed, and the resulting p-values were all less than 0.05. This implies that the within-ad user similarity is statistically significantly larger than the between-ads similarity.

Using the clustering algorithms to group users into 20, 40, 80 and 160 clusters (irrespective of the clustering algorithm used), we calculated

$$\Delta(a_i) = \frac{\mathrm{CTR}(a_i \mid g^*(a_i)) - \mathrm{CTR}(a_i)}{\mathrm{CTR}(a_i)}$$

which represents the degree of CTR improvement of ad $a_i$ by user segmentation in BT, where $g^*(a_i)$ is the user segment in which $a_i$ achieves its highest CTR. Ad CTR improved by as much as 670%.

The reason short-term user behavior is more effective than long-term user behavior for targeted advertising is that users have multiple interests that change rapidly.
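The segment-level measures used in this evaluation can be sketched for a single ad as follows. This is an illustrative sketch under stated assumptions: the names `ctr` and `segment_metrics` and the toy click data are hypothetical, the best segment is taken to be the one with the highest CTR, and entropy is assumed to be the Shannon entropy of how the ad's clickers are spread over the segments, which the text describes only qualitatively.

```python
import math


def ctr(click_flags):
    """CTR of one ad over a list of 0/1 click indicators delta(u_ij)."""
    return sum(click_flags) / len(click_flags) if click_flags else 0.0


def segment_metrics(clicked, segment, k):
    """Per-segment precision, recall and F-measure for one ad.

    clicked[j] is delta(u_ij); segment[j] is the segment index of user u_ij.
    Also returns the CTR improvement of the best segment and the entropy
    of the clicker distribution over segments (assumed Shannon entropy).
    """
    total_clicks = sum(clicked)
    base_ctr = ctr(clicked)
    rows = []
    for seg in range(k):
        in_seg = [c for c, s in zip(clicked, segment) if s == seg]
        pre = ctr(in_seg)  # precision equals the CTR inside the segment
        rec = sum(in_seg) / total_clicks if total_clicks else 0.0
        f = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        rows.append((pre, rec, f))
    # Assumption: the best segment is the one with the highest CTR.
    best_ctr = max(pre for pre, _, _ in rows)
    delta = (best_ctr - base_ctr) / base_ctr if base_ctr else 0.0
    # Recall of segment k is the share of clickers that fall in segment k.
    entropy = -sum(rec * math.log(rec) for _, rec, _ in rows if rec > 0)
    return rows, delta, entropy


# Hypothetical toy data: one ad shown to 8 users split into 2 segments.
clicked = [1, 1, 1, 0, 0, 0, 0, 1]
segment = [0, 0, 0, 0, 1, 1, 1, 1]
rows, delta, entropy = segment_metrics(clicked, segment, k=2)
print(rows, delta, entropy)
```

In the toy data the clickers concentrate in segment 0, so the entropy falls below its maximum of ln K, which is reached only when the clickers are spread evenly over all K segments.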
Search queries work slightly better than page clicks for BT, as queries are strongly correlated with the ads displayed, while page clicks show no strong correlation with them in the dataset we analyzed.

In this work, we provide a systematic study of the click-through log of a commercial search engine, so that one can validate and compare different behavioral targeting strategies. This is also the first systematic study of BT on real-world ad click-through data in academia.

1. Demographic and geographic data were not considered during the study due to privacy concerns.
2. Ranking of user segments for a given ad has not been explored.
3. In general, user behavior modelling for BT is underexplored, so we did not have many resources to start with.
4. Algorithms for large-scale user data with rapidly changing user behavior could not be studied, as BT requires large-scale user data that grows incrementally.

The following conclusions can be drawn from our experiments:
1. Users who clicked the same ad are more similar than users who clicked different ads.
2. Averaged over all the ads we collected, ad CTR can be improved by as much as 670% using fundamental clustering algorithms.
3. Tracking short-term user behavior can perform better than tracking long-term user behavior for the user representation strategies.

Modeling the Impact of Short and Long Term Behavior on Search Personalization - Paul N. Bennett, Ryen W. White, Wei Chu, Susan T. Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui

This 2012 paper studies customizing search engine results based on user behavior. While its focus was not ads, it concluded that short-term behavior is more effective than long-term behavior once a session has been ongoing.
Learning to Rank audience for behavioral targeting in display ads - Jian Tang, Ning Liu, Jun Yun, Yelong Shen, B Gao, S Yan, Ming Zhang
Online Targeting, behavioral advertising and privacy - Avi Goldfarb, Catherine E Tucker
Transfer learning for behavioral targeting - Tianqi Chen, Jun Yan, Guirong Xue, Zheng Chen
Analyzing content-level properties of the web adversphere - Yong Wang, Daniel Burgener, Aleksandar Kuzmanovic, Gabriel Maciá-Fernández
ISP-enabled behavioral ad targeting without deep packet inspection - Gabriel Maciá-Fernández, Yong Wang, Rafael Rodríguez-Gómez, Aleksandar Kuzmanovic
Linking visual concept detection with viewer demographics - Adrian Ulges, Markus Koch, Damian Borth

Using more advanced user segmentation algorithms may lead to better results. Better user representation strategies, such as search sessions, the content of user-clicked pages and user browsing trails, can also be explored.