Modeling Dynamic Multi-topic Discussions in Online Forums

Hao Wu, Jiajun Bu, Chun Chen, Can Wang, Guang Qiu, Lijun Zhang and Jianfeng Shen*
Zhejiang University, China
*Zhejiang Health Information Center, China
July 13, AAAI’2010, Atlanta, GA, USA

Social Media
• Web 2.0 applications socialize users online
• Online forums
  – A distinct platform for knowledge sharing and information exchange
  – Reveal how information propagates on the Internet
Modeling the process of topic discussions and predicting user activity is an interesting problem!

Benefits of Modeling
• Understand online human interactions and group forming (social network analysis)
• Improve applications, e.g., recommender systems
• Track new ideas and technology
• Mine opinions about products (user reviews)

Environment of Online Forums
• Great complexity: 433,839 threads, 13,599,245 posts
• Randomness
  – Usually no well-defined friendships or co-authorships
  – Users are free to post
  – Topic drifts within a single thread
• Open questions
  – What are the mechanisms underlying users' participation?
  – From which perspective should the process of topic discussion be viewed?
  – How can the properties of topics and temporal features be used for modeling?
  – How should the importance of a user in discussions be measured?
Modeling dynamic multi-topic discussions is challenging!

Outline
• Motivation and Intuitions
• Topic Flow Models
• Experimental Results
• Summary

Topic Flow Model (TFM)
• A newcomer reads some of the previous comments before posting.
• The information (topic) flows from early participants to late participants.
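• Figure legend: reply link, topic flow
• Topic diffuses through the underlying social network

Basic Topic Flow Model (B-TFM)
• Thread documents $D$; a single thread document is $d \in D$
  – $R_{ij}^d$: frequency of the link between users $i$ and $j$ in thread $d$
  – $C_i^d$: frequency of user $i$ in thread $d$
• Social network built from the thread documents
  – Peer-influence: $w_{ij} = \sum_{d \in D} R_{ij}^d$
  – Self-preference: $y_i = \sum_{d \in D} C_i^d$
• Normalization: $S = \lambda D^{-1} W + (1 - \lambda)\,\mathbf{1}\mathbf{1}^T / n$, $q_i = y_i / \lVert y \rVert_1$
• Random walk with restart (topic flow): $p^{(t+1)} = \alpha S^T p^{(t)} + (1 - \alpha)\, q$
• ParticipationRank: $p^* = (1 - \alpha)(I - \alpha S^T)^{-1} q$, which measures the susceptibility of a user to an 'infective' topic
• Notation: $\lambda$ and $\alpha$ (and $\delta$ in TT-TFM) are stand-ins for weight symbols that are illegible in the source; $D$ in $D^{-1} W$ is the normalizing diagonal matrix, not the thread set

Topic-specific TFM (T-TFM)
• Interaction patterns differ across topics (e.g., iPhone vs. FIFA World Cup)
• $w_{ij}^z = \sum_{d \in D} P(z \mid d)\, R_{ij}^d$
• $y_i^z = \sum_{d \in D} P(z \mid d)\, C_i^d$
• $P(z \mid d)$ is obtained using Latent Dirichlet Allocation [Blei 2003]

Time-sensitive T-TFM (TT-TFM)
• Forgetting mechanism: influence fades as time lapses from the past to now
• $w_{ij}^z = \sum_{d \in D} \exp(-\delta\, t_d)\, P(z \mid d)\, R_{ij}^d$
• $y_i^z = \sum_{d \in D} \exp(-\delta\, t_d)\, P(z \mid d)\, C_i^d$
• Time lapse factor: $\exp(-\delta\, t_d)$

Evaluation: Prediction
• ParticipationRank $p$ (indicator)
  – The willingness of a user to participate in the discussion of a topic
• Train on past data, then use the ranking to predict whether a user joins the discussion (posts at least once)
• Synthesized ranking for T-TFM and TT-TFM: $p_F^* = \sum_{z \in Z} \sum_{d \in D} P(z \mid d)\, p_z^*$

Experiments
• Dataset (www.honda-tech.com)
  – Two communities: Drag Racing and Honda/Acura
  – One year of data, from 09/01/2008 to 08/31/2009
  – … posted more than the average number of posts per user.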
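The random walk with restart behind ParticipationRank, together with the topic and time weighting of T-TFM and TT-TFM, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the thread dictionary keys (`R`, `C`, `p_topic`, `age`), the parameter names `lam`, `alpha`, `decay` and their default values are all hypothetical, and the uniform fallback for users with no outgoing links is a choice the slides do not specify.

```python
import numpy as np


def build_graph(threads, n_users, topic=None, decay=0.0):
    """Aggregate peer-influence W and self-preference y over threads.

    Each thread is a dict with hypothetical keys (not from the slides):
      "R": (n, n) link frequencies between users, "C": (n,) post counts,
      "p_topic": array of P(z | d), "age": time elapsed since the thread.
    """
    W = np.zeros((n_users, n_users))
    y = np.zeros(n_users)
    for d in threads:
        # TT-TFM forgetting weight; equals 1 when decay is 0 (B-TFM, T-TFM).
        weight = np.exp(-decay * d.get("age", 0.0))
        if topic is not None:
            weight *= d["p_topic"][topic]  # T-TFM weight P(z | d)
        W += weight * np.asarray(d["R"], dtype=float)
        y += weight * np.asarray(d["C"], dtype=float)
    return W, y


def transition_matrix(W, lam):
    """Row-normalize W and smooth it with a uniform jump of weight 1 - lam."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # Assumption: a user with no outgoing weight gets a uniform row.
    P = np.divide(W, row_sums, out=np.full_like(W, 1.0 / n), where=row_sums > 0)
    return lam * P + (1.0 - lam) / n


def participation_rank(W, y, lam=0.85, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- alpha * S^T p + (1 - alpha) * q until it stops changing.

    lam and alpha defaults are placeholders; y must have a positive total.
    """
    S = transition_matrix(W, lam)
    q = np.asarray(y, dtype=float) / np.sum(y)
    p = q.copy()
    for _ in range(max_iter):
        p_next = alpha * (S.T @ p) + (1.0 - alpha) * q
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def participation_rank_closed_form(W, y, lam=0.85, alpha=0.5):
    """Solve (I - alpha * S^T) p = (1 - alpha) * q directly."""
    S = transition_matrix(W, lam)
    q = np.asarray(y, dtype=float) / np.sum(y)
    n = S.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S.T, (1.0 - alpha) * q)


if __name__ == "__main__":
    # Toy data for three users and two threads (made up for illustration).
    threads = [
        {"R": [[0, 1, 0], [0, 0, 2], [1, 0, 0]], "C": [1, 2, 1],
         "p_topic": [0.7, 0.3], "age": 2.0},
        {"R": [[0, 0, 1], [1, 0, 0], [0, 0, 0]], "C": [2, 0, 1],
         "p_topic": [0.2, 0.8], "age": 10.0},
    ]
    W, y = build_graph(threads, n_users=3, topic=0, decay=0.01)
    print(participation_rank(W, y))
```

Calling `build_graph` with `topic=None` and `decay=0.0` reproduces the plain B-TFM sums. The iterative and closed-form versions should agree up to the tolerance; the linear solve is convenient for small communities, while the iteration avoids forming and solving a dense system for large ones.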
• Evaluation Results
  – Divide the data into 12 consecutive time windows
  – Generate a ranking from each month of data, and predict user posting activity in the following week

Model Selection
• [symbol missing] = 0.3 and 0.1
• T = 30 and 40
• [symbol missing] = 0.01

Summary
• An intuitive model of discussions in online forums
• Topic Flow Models (TFM)
  – Consider both peer-influence and self-preference
  – Property of latent topics
  – Temporal feature: forgetting mechanism
• Evaluation on prediction of user activity
• Future work:
  – Utilize the web structure of online forums
  – More data sets, e.g., …
  – Build a recommendation system

Thanks! Any questions?