Personalizing Web Page Recommendation via Collaborative Filtering and Topic-Aware Markov Model
Qingyan Yang, Ju Fan, Jianyong Wang, Lizhu Zhou
Database Research Group, DCS&T, Tsinghua University

Agenda
• Motivation
• Recommender framework
• Experimental evaluation
• Conclusions

4/8/2015 DB Group, DCS&T, Tsinghua University

Motivation
• The Web is growing explosively
  ▪ By the end of 2009 (source: the 25th Internet Report, 2010)
    ◦ 33,600,000,000 Web pages in China
    ◦ Twice as many as in 2003
• Finding desired information is becoming more difficult
  ▪ Users often wander aimlessly on the Web without reaching pages that interest them
  ▪ Or spend a long time finding the information they expect

Web page recommendation
• Objective
  ▪ To understand users' navigation behavior
  ▪ To show pages that match a user's interests at a specific time
• Existing popular solutions
  ▪ Markov model and its variants
  ▪ Temporal relation is important: if the browsing sequence is "A B C … A B C … A B C", then C is recommended when A and B are visited one after another

Limitations
• No personalized recommendations
  ▪ All users receive the same results
• Topic information of pages is neglected
  ▪ Two pages that are visited sequentially may be very different in terms of topics
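
The Markov-model baseline described under "Existing popular solutions" can be sketched roughly as follows. This is a minimal illustration, assuming a second-order model in which successor counts are keyed by the last two pages visited; the names `train_markov` and `recommend` and the toy browsing sequence are hypothetical and do not come from the original work.

```python
from collections import Counter, defaultdict

def train_markov(sequence, order=2):
    """Count which page follows each run of `order` consecutive pages."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - order):
        state = tuple(sequence[i:i + order])
        counts[state][sequence[i + order]] += 1
    return counts

def recommend(counts, recent_pages, order=2):
    """Return the most frequent successor of the last `order` pages, or None."""
    state = tuple(recent_pages[-order:])
    if state not in counts:
        return None  # this state was never observed in training
    return counts[state].most_common(1)[0][0]

# Toy browsing sequence (hypothetical): "A B C" recurring with other pages between.
history = list("ABCXABCYABC")
model = train_markov(history)
print(recommend(model, ["A", "B"]))
```

Because the counts are pooled over one shared history, every user who visits A and then B gets the same suggestion, which is the first limitation listed above.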

PIGEON: our solution
• Personalized Web page recommendation
• Two novel features
  ▪ Personalization
    ◦ Meet the preferences of different users
    [Figure text: "I am a blog about finance"]
  ▪ Topical coherence
    ◦ Be relevant to the user's present mission

Recommender framework
[Figure]

Data representation
• Navigation graph
  ▪ Node: Web page
  ▪ Edge: jump relation
  ▪ Weight: relation frequency
  [Figure: navigation graph over pages A to M with weighted edges]
• Jump relation

  Time        User ID   IP address     Target  Source
  (09:44:44)  (0e0c…)   (211.90.-.-)   A       ()
  (09:44:58)  (0e0c…)   (211.90.-.-)   B       A
  (10:14:29)  (0e0c…)   (211.90.-.-)   G       A

Topic discovery
• Basic idea
  ▪ We assume that pages with similar URLs, or pages involved in the same jump relations, are topically related
• URL features
  ▪ Keywords, e.g., from http://dblp.uni-trier.de/db/index.html
  ▪ Expanded by manifold-based keyword propagation
• Web page clustering
  ▪ Each cluster represents one topic

Example
[Figure: navigation graph over pages A to M with weighted edges]

Topic-Aware Markov Model
• Take n-grams as states.
  e.g., n=2, browsing sequence A B C D B C A
  ▪ Temporal states: AB, BC, CD, DB, CA
  ▪ Topical subsequences: A C C A and B D B
  ▪ Topical states: AC, CC, CA, BD, DB
• Web page preference score
  ▪ Maximum likelihood estimation
  ▪ e.g., P(D|BC) = f(BCD)/f(BC) = 1/2

Personalized Recommender
• Collaborative filtering
  ▪ Basic idea: $\tilde{s}(u, p) = k \sum_{u'} \mathrm{sim}(u, u')\, s(u', p)$
    ◦ sim(u, u'): user similarity
    ◦ s(u', p): Web page preference
    ◦ u: active user; p: Web page

User Similarity
• User profile
  ▪ A set of topics
• Similarity measurement
  ▪ Topic similarity
  ▪ Maximum weight matching
  ▪ e.g., $\mathrm{sim}(u_1, u_2) = \frac{0.9 + 0.8 + 1.0}{3} = 0.9$
  [Figure: matched topic pairs with weights 0.9, 1.0, 0.8]

Experiment settings
• Data set
  ▪ 1,402,371 records of 375 users in 34 days
  ▪ First 30 days for training and the last 4 days for testing
• Metrics: precision and recall
• Comparative methods

  Method    Temporal  Topical  Personalized
  Baseline  Y         -        -
  TAMM      Y         Y        -
  PIGEON    Y         Y        Y

Experimental evaluation
[Figure: two panels, 1st-order model and 2nd-order model]

Conclusions
• By taking user similarities into account, we can recommend Web pages that meet different users' preferences.
• We discover the topics users are interested in using an effective graph-based clustering algorithm.
• We devise a topic-aware Markov model to learn navigation patterns that contribute to topically coherent recommendations.

THANKS
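
Backup: a rough sketch of how the pieces on the "Topic-Aware Markov Model", "Personalized Recommender" and "User Similarity" slides might fit together. Several things here are assumptions rather than the authors' implementation: the function names, the toy sequence (chosen so that P(D|BC) comes out to 1/2 as on the slide), the off-diagonal topic similarities, the brute-force matching (only practical for tiny, equally sized profiles), and the choice k = 1 / (sum of similarities), which the slides do not define.

```python
from collections import Counter
from itertools import permutations

def preference(sequence, state, page):
    """MLE score f(state + page) / f(state) over one browsing sequence."""
    n = len(state)
    grams = Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))
    longer = Counter(tuple(sequence[i:i + n + 1]) for i in range(len(sequence) - n))
    total = grams[tuple(state)]
    return longer[tuple(state) + (page,)] / total if total else 0.0

def user_similarity(topic_sim):
    """Mean weight of a maximum weight matching between two topic profiles.

    topic_sim[i][j] is the similarity of topic i of one user to topic j of the
    other. Brute force over permutations; assumes a small square matrix.
    """
    n = len(topic_sim)
    best = max(
        sum(topic_sim[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n

def cf_score(similarities, preferences):
    """k * sum over u' of sim(u, u') * s(u', p), assuming k = 1 / sum of sims."""
    total_sim = sum(similarities.values())
    if total_sim == 0:
        return 0.0
    weighted = sum(
        sim * preferences.get(other, 0.0) for other, sim in similarities.items()
    )
    return weighted / total_sim

# Preference of a neighbor "u1" for page D after visiting B, C (toy sequence).
s_u1 = preference(list("ABCDBCA"), ("B", "C"), "D")
# Diagonal weights follow the slide; off-diagonal values are hypothetical.
sim = user_similarity([[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.2, 0.5, 1.0]])
# Second neighbor "u2" and its numbers are made up for illustration.
score = cf_score({"u1": sim, "u2": 0.3}, {"u1": s_u1, "u2": 1.0})
print(s_u1, round(sim, 2), round(score, 3))
```

For realistic profile sizes the permutation search would need to be replaced by a proper assignment algorithm; it is used here only to keep the sketch dependency-free.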