Contact Prediction, Routing and Fast Information Spreading in Social Networks
Kazem Jahanbakhsh
Computer Science Department
University of Victoria
August 2012

Outline
• Problem Definition and the Context
• Routing in Mobile Social Settings
• Human Mobility and Contact Event
• Collecting Contact Data
• Contact Prediction
• Hidden Contact Prediction
• Fast Information Spreading
• Conclusions, Major Contributions and Future Work

Problem Definition
Message routing, human contact prediction and fast information spreading in the context of human social networks.

Routing in Mobile Social Settings
• Motivation: First empirical evaluation of Milgram's experiment in mobile settings
• Social Profile: Set of social characteristics for a user:
  o Affiliation, Hometown, Language, Nationality, Interests and so on
• Goal: Designing an efficient routing algorithm
• Efficiency: Minimizing message forwardings & maximizing the probability of message delivery
• Assumptions & Constraints:
  o Message delivery in physical proximity
  o Sender knows the destination's social profile

Social-Greedy Routing Algorithms
• Approach: a greedy strategy based on computing similarities between people's social profiles
  o Social-Greedy I: The sender forwards the message "m" to nodes that are socially closer to the destination.
  o Social-Greedy II & III: Variations of Social-Greedy I.
• Our work differs from previous work in that we use only people's social profiles for routing!
• Real Data: Infocom 2006 contact trace, 79 people, a brief version of social profiles

SDR & Cost
[Figure: two panels, Successful Delivery Ratio (%) and Total Delivery Cost, each plotted against TTL (hour). Schemes shown: Epidemic, Bubble Rap, Social-Greedy I, LABEL and Waiting.]
Performance Results for Different Routing Schemes (TTL=9h)

Human Mobility & Contact Data
[Figure: positions of Kenny and Eric at 10:00 AM and 10:10 AM. Contact Event: 10:00-10:10 AM.]

Contact Graphs
[Figure: example contact graph with nodes Kyle, Eric, Butters, Katy, Jack, Kenny and Sara.]

Collecting Data from Different Social Settings

Real Data Descriptions
Dataset         Inf 05    Inf 06    MIT       Camb      Roller
Sensors         41        79        97        36        62
Length          3 days    4 days    246 days  11 days   3 hours
Scanning Time   120 sec   120 sec   300 sec   600 sec   15 sec
Ext. Nodes      206       4321      20698     11367     1050
Total Cont.     227657    28216     285512    41587     132511
Ext. Cont.      57056     5757      183135    30714     72365
Ext. Cont. %    25%       20%       64%       74%       55%

Dataset         No. of Nodes   No. of Edges
Facebook        63731          817090

Contact Prediction: Problem Definition and Assumptions

Social Information & Small-World Network Properties
• Birds of a Feather (Homophily)
• Using Social Profiles:
  o Jaccard Social Similarity (Jac)
  o Social Foci Similarity (Foci)
  o Max Social Similarity (Max)
• Using Contact Graphs:
  o Transitivity:
    • Number of Common Neighbors (NCN)
  o Low Diameter:
    • Shortest Path (SP)
    • Random Walk (RW)
• How to reconstruct?
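
The similarity measures listed on this slide can be illustrated with a minimal sketch. It assumes that a social profile is a set of attribute strings and that a contact graph is a dictionary mapping each person to the set of people they met. The function names, the toy profiles and the toy edges are hypothetical and are not taken from the Infocom 2006 data; the exact definitions used in the talk may differ.

```python
from collections import deque


def jaccard_similarity(profile_u, profile_v):
    """Jaccard similarity of two social profiles given as sets of attributes."""
    union = profile_u | profile_v
    if not union:
        return 0.0  # two empty profiles: define similarity as 0
    return len(profile_u & profile_v) / len(union)


def common_neighbors(graph, u, v):
    """Number of Common Neighbors (NCN) of u and v in the contact graph."""
    return len(graph.get(u, set()) & graph.get(v, set()))


def shortest_path_length(graph, source, target):
    """Hop count of the shortest path (SP) via breadth-first search.

    Returns None when target cannot be reached from source.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, set()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy data (names borrowed from the contact graph slide,
# attributes and edges invented for illustration only).
profiles = {
    "Kyle": {"UVic", "English", "networks"},
    "Eric": {"UVic", "English", "databases"},
}
contacts = {
    "Kyle": {"Eric", "Kenny"},
    "Eric": {"Kyle", "Kenny", "Butters"},
    "Kenny": {"Kyle", "Eric"},
    "Butters": {"Eric"},
}

jac = jaccard_similarity(profiles["Kyle"], profiles["Eric"])  # 2 shared of 4 total
ncn = common_neighbors(contacts, "Kyle", "Butters")           # only Eric is shared
sp = shortest_path_length(contacts, "Kyle", "Butters")        # Kyle-Eric-Butters
```

A pair with a higher Jaccard score or NCN value, or a shorter path, would be ranked as more likely to be in contact under the homophily and small-world assumptions stated above.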
Contact Prediction Results
[Figure, Infocom 2006: two panels plotting Correct prediction % against Population %. "Performance of Contact Graph Structure" compares NCN, SP, RW and Random. "Performance of Social Similarities" compares Jaccard, Focus, MAX and Random, with the population ordered by decreasing social similarity.]

Hidden Contact Prediction

Hidden Contact Prediction: Reconstruction Algorithm
• Methods:
  o Time-Spatial Locality: NCN, Jaccard & MIN
  o Contact Rates: Popularity
  o Social Similarity: Foci & Jaccard
  o Social Similarity-NCN: Foci-NCN
• Algorithm:
  o For each L_k, compute and store the quadruples (u, v, k, sim(u,v)) in L_sim
  o Sort L_sim in descending order of similarity score
  o Output the first Rank quadruples

Hidden Contact Prediction Results
[Figure: Infocom 2006]

Supervised Learning Approach
• Techniques:
  o Logistic Regression
  o K-Nearest Neighbor (KNN)
• Extracted Features:
  o Contact Graph-based (Degree, Product of degrees, NCN)
  o Contact Duration
  o Social Profiles
  o Static Sensors

Prediction Results (Logistic Regression/KNN)
Session Type   Keynote     Lunch Break   Coffee Break
TPR            0.18/0.24   0.37/0.40     0.41/0.43
FPR            0.03/0.08   0.04/0.07     0.02/0.02
Accuracy       81%/78%     84%/81%       92%/92%
RMSE           0.42/0.40   0.39/0.36     0.26/0.24

Fast Information Spreading in Social Networks
• Input: social graph G=(V,E) & a unique message for each node
• Communication Model: synchronized
• Constraints: no global information & one contact per round
• Termination: when every node receives all messages
• Goal: analyzing running times of three information spreading algorithms

Information Spreading Algorithms
• Random push-pull:
  o In each round, every node randomly chooses one of its neighbors for message exchange
• Doerr:
  o In each round, every node randomly chooses one of its neighbors, excluding the one it has just contacted
• Censor: Hybrid strategy:
  o Even rounds: each node runs random push-pull
  o Odd rounds: each node chooses one of its neighbors in a sequential manner from its Bottleneck List

Empirical Results from Facebook Graph
[Figure: two panels, "Running Times on Original Facebook Graph" and "Running Times Without 1-whiskers", each plotting Number of Finished Nodes against Round # for Random Push-Pull, Doerr and Censor. An inset sketches a 1-whisker attached to the well connected core, with nodes u, v, w, x and labels Cu, Cx.]

Conclusions & Future Work
• Major Contributions:
• Social-Greedy Algorithm:
  o Suitable for bootstrapping wireless devices
• Contact Prediction:
  o Social Similarity methods, SP and RW outperform random
  o Foci-NCN provides the best precision results
  o Supervised learning is an effective technique for contact prediction
• Information spreading:
  o Censor performs well for spreading information in social networks
• Future Work:
  o Proposing more efficient predictors for large geographical spaces
  o Final Goal: Predicting where people go next and who they will meet there!

Hidden Contacts Prediction Results
[Figure, MIT Campus: "Performance Evaluation (no of external nodes = 73)". The Percentage of True Positives against log2 Rank for NCN, Jac, Min, Pop and Rand.]

Supervised Learning Results
Session Type    Keynote   Lunch Break   Coffee Break
degree          4         5             5
degree          7         7             7
degree prod.    3         3             6
ncn             1         1             2
total overlap   2         2             1
social          5         6             4
ncsn            6         4             3
Ranking Features

Examples of 1-Whiskers
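
As a companion to the Fast Information Spreading slides, the random push-pull protocol can be sketched as a small synchronous simulation. This is a minimal illustration under stated assumptions, not the code used for the Facebook experiments: the graph is an undirected, connected adjacency dictionary, each node's message is identified by the node itself, and every exchange in a round uses the state from the start of that round. The function name and the `max_rounds` guard are hypothetical.

```python
import random


def random_push_pull(graph, seed=None, max_rounds=10_000):
    """Simulate random push-pull and return the number of rounds needed.

    graph: dict mapping each node to a set of its neighbors (undirected).
    Each node starts with one unique message, labeled by the node itself.
    """
    rng = random.Random(seed)
    known = {u: {u} for u in graph}   # messages currently held by each node
    everything = set(graph)           # termination: every node holds all of these
    rounds = 0
    while any(msgs != everything for msgs in known.values()):
        if rounds >= max_rounds:
            raise RuntimeError("did not finish; is the graph connected?")
        # Synchronous round: read from `known`, write to `updated`.
        updated = {u: set(msgs) for u, msgs in known.items()}
        for u, neighbors in graph.items():
            if not neighbors:
                continue
            v = rng.choice(sorted(neighbors))  # one contact per node per round
            exchanged = known[u] | known[v]    # push and pull in one exchange
            updated[u] |= exchanged
            updated[v] |= exchanged
        known = updated
        rounds += 1
    return rounds


# Hypothetical toy graph: a path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
rounds_on_path = random_push_pull(path, seed=0)
```

On this three-node path both endpoints can only contact the middle node, so the run always takes two rounds regardless of the seed. The Doerr variant described above would differ only in excluding the neighbor contacted in the previous round when choosing v.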