Mining Cross-network Association for YouTube Video Promotion
Ming Yan
Institute of Automation, Chinese Academy of Sciences
May 15, 2014

Outline
• Motivation
• Three-stage Framework
• Some Visualization
• Further Discussion

Background
• Large quantities of videos are consumed on YouTube, and the trend is growing year by year.
  • More than 1 billion unique users visit YouTube each month.
  • Over 6 billion hours of video are watched each month on YouTube.
  • 100 hours of video are uploaded to YouTube every minute.
• YouTube exhibits limited propagation efficiency, and many videos remain unknown to the wider public.
  • Long tail effect in the video view count distribution.
  • Short active life span for most videos.

Background
• YouTube video popularity is limited by its internal mechanisms:
  • Internal search
  • Related video recommendation
  • Channel subscription
  • Front page highlight
• External referrers such as social media websites have become important sources that lead users to YouTube videos.
• Twitter has been growing quickly as the top referrer source for web video discovery.

Motivation
• For a specific YouTube video, identify proper Twitter followees with the goal of maximizing video dissemination to their followers.
[Figure: Twitter followee, YouTube video, Twitter follower (watch); annotation: "Got 1 billion views in 5 months"]

Challenge
• The heterogeneous knowledge association between YouTube video and Twitter followee
  • user-perceived
• How to define the "properness" of a candidate Twitter followee for a specific YouTube video
  • interestness
  • virtual cost
Our Twitter followee identification scheme aims to find the optimal Twitter followee, whose followers are more likely to show interest in the target video.
User-perceived Solution
• Illustration example
[Figure: user association across the two networks; labels: "better promotion referrer", "follow", "view", "favor"]

Framework
• Three Stages
  • Heterogeneous Topic Modeling
  • Cross-network Topic Association
  • Referrer Identification

Heterogeneous Topic Modeling
Input
• YouTube video $v \in \mathcal{V}$: $[\mathbf{w}_v, \mathbf{f}_v]$
• Twitter users $u \in \mathcal{U}^T$ with their follower set
Output
• Twitter user distribution $U^T$
• YouTube video distribution $V$
Approach
• On YouTube Side: Propose an inverse Corr-LDA (iCorr-LDA) model to discover the YouTube video multimodal topics.
• On Twitter Side: Standard LDA on the Twitter followee-follower social graph.
  • user as document
  • user's followees as word
[Figure: the followees $\mathcal{U}_u^{followee}$ of a Twitter user $u$ (listed as Username @TwitterID, e.g. ACM Multimedia 2014 @acmmm14, NBA @NBA, Britney Spears @britneyspears, Bill Gates @BillGates) go through LDA to give the Twitter user distribution $U^T$, $p(\mathbf{z}^T \mid u)$; YouTube videos $\mathcal{V}$ with $\mathbf{w}$, $\mathbf{f}$ go through iCorr-LDA to give the YouTube video distribution $V$, $p(\mathbf{z}^Y \mid v)$. Graphical-model symbols: $\alpha$, $\theta$, $z$, $w$, $\beta$, $y$, $f$, $\mu$, $\sigma$, $N$, $M$, $|\mathcal{V}|$]

Cross-network Topic Association
Input
• Twitter user and video distributions $U^T$ and $V$ (output of stage 1)
• YouTube, Twitter and the overlapped user sets $\mathcal{U}^Y$, $\mathcal{U}^T$, $\mathcal{U}^o$
• YouTube user's interested video set $\mathcal{V}_u$
Output
• Distribution transfer function $\mathcal{F}: \mathbf{u}^Y \to \mathbf{u}^T$ ($\mathbf{u}^Y$: the aggregated YouTube user distribution)
Approach
• YouTube User Aggregation
• Association Mining
[Figure: the video distribution $V$ is aggregated over each user's interested videos $\mathcal{V}_u$ into the YouTube user distribution $U^Y$, $p(\mathbf{z}^Y \mid u)$; association mining learns $\mathcal{F}$ between $\mathbf{z}^Y$ and $\mathbf{z}^T$ from the overlapped users $\mathcal{U}^o$]

Cross-network Topic Association
• YouTube User Aggregation
[Figure: user $u$'s interested videos $v_1, v_2, \dots, v_n$, each with $p(z_k^Y \mid v)$, combined with weights $w_1, w_2, \dots, w_n$ into $p(z_k^Y \mid u_i)$]

$$p(z_k^Y \mid u_i) = \sum_{v \in \mathcal{V}_{u_i}} \frac{N_v^{(f)} + N_v^{(w)}}{N^{(f)} + N^{(w)}} \cdot p(z_k^Y \mid v)$$

$N_v^{(f)}, N_v^{(w)}$: the total number of keyframes and words in video $v$
$N^{(f)}, N^{(w)}$: the total number of keyframes and words in $u_i$'s video set $\mathcal{V}_{u_i}$

Cross-network Topic Association
• Association Mining
Goal:
• To obtain the association between the YouTube video space and the Twitter user space (i.e., $\mathcal{F}: \mathbf{u}^Y \to \mathbf{u}^T$)
Approach:
• Transition Probability-based Association
• Regression-based Association
• Latent Attribute-based Association
Explicit association/transition matrix $A$: $A = [a_{ij}]$, s.t. $\mathbf{u}^Y \to \mathbf{u}^T$
[Figure: overlapped users $\mathcal{U}^o$ shared between $\mathcal{U}^T$ and $\mathcal{U}^Y$, linking topic spaces $\mathbf{z}^T$ and $\mathbf{z}^Y$]

Cross-network Topic Association
• Transition Probability-based Association
• Regression-based Association
  • $U_o^T, U_o^Y$: the overlapped users' distribution matrices in Twitter and YouTube
  • $q = 1$: lasso problem, which can be solved effectively by LARS and the feature-sign algorithm
  • $q = 2$: ridge regression problem, which has an analytical solution

Cross-network Topic Association
• Latent Attribute-based Association (non-linear)
  • only on overlapped users
  • on all users
• Innovation: To discover the shared latent structure behind the two topic spaces. (After projection to the latent attribute spaces, a user's YouTube and Twitter distributions share the same coefficient.)
• Only on overlapped users: shared latent user attribute
  With a simple transformation, the problem can be solved efficiently by the sparse coding algorithm.
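The aggregation and regression-based association steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not preserve the regression objective, so the code assumes the textbook ridge form $\min_A \|U_o^Y A - U_o^T\|_F^2 + \lambda \|A\|_F^2$ with its standard closed-form solution. The function names, the value of `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def aggregate_user(video_topics, n_keyframes, n_words):
    """Aggregate p(z^Y | v) over a user's interested videos.

    Each video is weighted by (N_v^(f) + N_v^(w)) / (N^(f) + N^(w)),
    i.e. its share of all keyframes and words in the user's video set.
    """
    sizes = np.asarray(n_keyframes, dtype=float) + np.asarray(n_words, dtype=float)
    weights = sizes / sizes.sum()
    return weights @ np.asarray(video_topics, dtype=float)

def fit_ridge_association(U_o_Y, U_o_T, lam=0.1):
    """ASSUMED ridge objective (q = 2); returns A with u^Y @ A ~ u^T.

    Closed form: A = (U_Y^T U_Y + lam * I)^(-1) U_Y^T U_T.
    """
    k_y = U_o_Y.shape[1]
    gram = U_o_Y.T @ U_o_Y + lam * np.eye(k_y)
    return np.linalg.solve(gram, U_o_Y.T @ U_o_T)

# Toy data (hypothetical): 4 YouTube topics, 6 Twitter topics.
rng = np.random.default_rng(0)
video_topics = rng.dirichlet(np.ones(4), size=3)   # three interested videos
u_Y = aggregate_user(video_topics, n_keyframes=[10, 20, 5], n_words=[30, 40, 15])

U_o_Y = rng.dirichlet(np.ones(4), size=50)         # 50 overlapped users, YouTube side
U_o_T = rng.dirichlet(np.ones(6), size=50)         # same users, Twitter side
A = fit_ridge_association(U_o_Y, U_o_T, lam=0.1)

u_T_pred = u_Y @ A                                  # transferred distribution
```

Note that a plain ridge fit does not force `u_T_pred` to be non-negative or to sum to one; whether and how the original work normalizes the transferred vector is not stated on the slides.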
Cross-network Topic Association
• Latent attribute discovery on all users (plenty of non-overlapped users are considered in this scheme)
• Objective function, with $S^Y = [S_o, S_{non}^Y]$, $S^T = [S_o, S_{non}^T]$
• Iteratively solved via three sub-problems

Referrer Identification
Input
• Distribution transfer function $\mathcal{F}$
• Test videos $\mathbf{v}_t$
• Twitter followee set $\mathcal{U}^{followee}$
Output
• Twitter followee rank for each video $v \in \mathbf{v}_t$
Approach
• Direct product-based matching
• Weighted product-based matching
[Figure: test YouTube video $\mathbf{v}_t$ ($\mathbf{w}$, $\mathbf{f}$) gives $p(\mathbf{z}^Y \mid u_t)$; distribution transfer gives $p(\mathbf{z}^T \mid u_t)$, which is matched against the candidate Twitter followees $\mathcal{U}_t^{followee}$]

Referrer Identification
• Direct product-based matching
• Weighted product-based matching
  • The Ranking SVM algorithm is used to train the weights:
    • Feature:
    • Training label: a designed properness score
      • one term in charge of the coverage of the interested audiences
      • one term in charge of the virtual cost
  • With the learnt model parameter $h^*$:

Some Visualization

Further Discussion
Some Extensible Applications
• Examining the value of Twitter followees (our work can be viewed as valuing a Twitter followee w.r.t. promotion efficiency for YouTube videos; e.g., the followee has a lot of young female followers)
• Advertising (advertising media selection in our work; e.g., anchor text generation, i.e., optimizing the video description for promotion, and advertising slot bidding, i.e., followee reshare time selection)
• Other user-bridged cross-network applications
Challenge
• Data is hard to get!
[Figure labels: Tweet, Topic 1, user, Taobao, Topic 2, recommend, Video, Advertisement]
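Returning to the Referrer Identification stage, direct product-based matching can be illustrated with a short sketch. The slides do not preserve the matching formula, so this assumes the score is the inner product between the video's transferred distribution in Twitter topic space and each candidate followee's topic distribution. The function name, the followee handles, and the numbers are hypothetical.

```python
import numpy as np

def rank_followees(video_topic_T, followee_topics_T, followee_ids):
    """Rank candidate followees by an ASSUMED inner-product score.

    video_topic_T: (K,) transferred distribution of the test video
    followee_topics_T: (n, K) one Twitter-topic distribution per candidate
    """
    scores = np.asarray(followee_topics_T, dtype=float) @ np.asarray(video_topic_T, dtype=float)
    order = np.argsort(-scores, kind="stable")      # highest score first
    return [(followee_ids[i], float(scores[i])) for i in order]

# Hypothetical example with three Twitter topics and three candidates.
video_topic_T = np.array([0.7, 0.2, 0.1])
followee_topics_T = np.array([
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
])
followee_ids = ["@followee_a", "@followee_b", "@followee_c"]

ranking = rank_followees(video_topic_T, followee_topics_T, followee_ids)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The weighted variant would replace the plain inner product with per-topic weights learnt by Ranking SVM; that step is not sketched here because the feature and label definitions are not preserved on the slides.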