Diffusion in (Social) Networks

Rajesh Sharma
http://rajshpec.github.io/
rajesh.sharma@unibo.it
October, 2014

This presentation is based on several works, including some with: Prof. Danilo Montesi (University of Bologna, Italy), Prof. Matteo Magnani (Uppsala University, Sweden), Prof. Anwitaman Datta (NTU, Singapore), Prof. Mostafa Salehi (University of Tehran, Iran)
*Some slides’ content is from Jure Leskovec’s course work.

Agenda
• Preliminary
– Overview of Networks
– Diffusion on Monoplex Networks
• Models, Algorithms etc.
• Algorithm for diffusion in decentralized settings.
• Diffusion on Multilayer Networks.
• Models, Algorithms etc.
• Conclusion & Future work.

Networks: collections of objects where some pairs of objects are connected by links
(Figure panels: protein-protein, transportation (metro), ISP (routers etc.), human diseases, sexual contact, food web, friendship, recipe, co-citation)

Network Really Matters
• If you want to understand the structure of the Web, it is hopeless without working with the Web’s topology.
• If you want to understand the spread of diseases, can you do it without social networks?
• If you want to understand dissemination of news or evolution of science, it is hopeless without considering the information networks.

Networks & Diffusion
• Networks: human-human network; transportation network; communication network (e.g., OSN, Internet, mobile)
• What diffuses: idea, innovation, goods, virus, rumor, behavior

Effect of Diffusion in ML Networks
Internal Entity
• A diffusion process happening in a network and affecting internal entities.
• Example:
– Influence (product, behavior etc.)
External Entity
• A diffusion process happening in a network and affecting an external entity.
• Example:
– Effect of tweets on stock prices

Diffusion Dynamics: What can be done?
A) Models:
• Decision Based Models
– Independent Contagion Model
– Threshold Model
– Questions:
• Finding influential nodes
• Detecting cascades
• Epidemic Based Models
– SIS: Susceptible-Infected-Susceptible (e.g., flu)
– SIR: Susceptible-Infected-Recovered (e.g., chicken pox)
– Question:
• Will the virus take over the network?
B) Explanatory/Empirical Analysis
• Infer the underlying spreading cascade.
• Questions
– What does diffusion look like?
– What do cascades look like?
C) Algorithms
– Influence maximization
– Outbreak detection
– etc.

Information Dissemination: Algorithm
• Objectives
– Effective
• High precision (low spam) & recall (good coverage)
– Efficient
• Low latency, low duplication
• Challenges: decentralized settings
– No global list, no explicit subscriptions or coordination
• Intuition
– Use social links in each hop
• Locally available (interest) information
• Less likely to be spammed
• Easier accountability

Approach/Algorithm
• Two logically independent mechanisms/phases
– Control phase (runs in the background)
• collect neighbor nodes’ information (interest, degree)
• dissemination behavior (forwarding behavior, activeness)
– Propagation of messages using selective gossip
[4] Anwitaman Datta and Rajesh Sharma, GoDisco: Selective Gossip based Dissemination of Information in Social Community based Overlays, ICDCN 2011 (best paper award in Networking track)

Intuitions for designing selective gossip
• Social science principles
– Reciprocity based incentives
– Social triads to reduce duplicates
• Feedback
– Learning & adapting to neighbor interests
• Interest communities
– Naturally clustered
• But there may be isolated islands

Information agent (IA) categories
• Interest classification:
– Main category (MC)
– Subcategory (SC)
• Order of preference
– shared main category
– irrelevant but good forwarding history
– irrelevant but well connected (high degree)

Approach
(Figure: example overlay network with labeled nodes, including a, b, c, e and j referred to below.)
• If there are any relevant neighbors
– Forward to all relevant neighbors
• Duplication saving: social triad
– a & b don’t send to each other
– Not for cases like c
• What about non-relevant neighbors?
– With probability p
– Send to e (closely related)
• α, β, γ can be changed
• Feedback mechanism
• Boundary nodes
– αh + βd + γa (h: history, d: degree, a: activeness)
– c selects j
– j starts a random walk

Message Dissemination

More on Information Dissemination
• Swarm Particle Approach [2]
– Communities: multi-dimensional network (based on relations)
– Particle swarm technique: mobility (particles/agents can move)
– Orthogonal to GoDisco (as multi-dimensional and mobility)
• GoDisco++ [3]
– Took the best out of the ICDCN 2011 and 2012 approaches.
– Social sciences plus multi-dimensional network.
[3] Rajesh Sharma and Anwitaman Datta, Decentralized information dissemination in multidimensional semantic social overlays, ICDCN 2012, Hong Kong.
[4] Rajesh Sharma and Anwitaman Datta, GoDisco++: A Gossip algorithm for information dissemination in multi-dimensional community networks, Journal of Pervasive and Mobile Computing, Oct. 2012

Multilayer Networks
• Multiplex networks
– Every node is present in every network.
– Multiple types of relationships.
• Interconnected networks
– Not every node is present in every network.
– Multiple networks.
• Model
– Diffusion modeling: cascade process
• C1: (v4, l2)
• C2: (v4, l1)
• Diffusion network: aggregation of cascades C1 and C2
[5] Mostafa Salehi, Rajesh Sharma, Moreno Marzolla, Danilo Montesi, Payam Siyari, and Matteo Magnani, Spreading processes in Multilayer Networks, under review at IEEE Transactions on Network Science and Engineering.

4 possibilities of diffusion in ML
• Same-node inter-layer
– Cascade switches layer but remains on the same node
– Facebook post is shared on Twitter
• Other-node inter-layer
– Cascade continues spreading to another node in another layer
– The spread of a disease in an interconnected network of cities
• Other-node intra-layer
– Cascade continues spreading through the same layer.
– Retweeting a post in Twitter
• Same-node intra-layer
– ??

Dependent variables used in different diffusion studies

Milgram Experiment (late 1960s)
• The navigation problem
– Small world community.
• The experiment setup
– One target (Massachusetts)
– Many originators (Nebraska)
– Acquaintance chains of letters
• Output
– Six degrees of separation
• New version (2003) by Dodds et al.
– Multiple sources and targets
– Web based experiment

History of Diffusion (Time Line)
• 1967: Milgram, navigation in small world [1]
• 1975: Epidemic model [2]
• 1978: Granovetter, Threshold Model
• 1998: SW (Small World)
• 1999: SF (Scale Free)
• 2001: Vespignani, underlying network is important
• Other timeline marks (1993, 2014, 2015, ??) and entries whose positions are not recoverable from the slide: Internet; AIDS impact on Swedish population; Wiki, Friendster, Myspace, FB, Blogs, Flickr, Youtube, smartphones.

Milgram Reloaded!
• Attempt to understand the navigation process
• Multiple networks (FB, Twitter, WhatsApp etc.)
• Across the globe
• Multiple originators
• Multiple targets (geographically), originator <--> target impact
• Multilingual
• Output: average path length, network usage
(Figure: originators O1–O5 and targets T1–T6.)

Milgram Reloaded!
• What data will we ask for?*
– Who are you: email ID or phone no.
– Network: through what network you received it.
– Who sent it to you: ID of the person
– Which networks are you going to use to move the message towards its destination?
• Web link: http://m.web.cs.unibo.it/
• If you have comments or feedback, please contact:
– rajesh.sharma@unibo.it or rajshpec@gmail.com

Reasoning about Networks
• How do we reason about networks?
– Empirical: study network data to find organizational principles
• How do we measure and quantify networks?
– Mathematical models: graph theory and statistical models
• Models allow us to understand behaviors and distinguish surprising from expected phenomena.
– Algorithms: for analyzing graphs
• Hard computational challenges

Networks: Structure & Process
• What do we study in networks?
– Structure and evolution:
• What is the structure of a network?
• Why and how did it come to have such structure?
– Processes and dynamics:
• Networks provide a “skeleton” for spreading of information, behavior, diseases
• How do information and diseases spread?

Networks: Impact
• Companies: Google (382.61B), Cisco (125.29B), Facebook (207.04B), Twitter (25.32B), LinkedIn (28.9B)
• Predicting epidemics: flu
• Intelligence and fighting (cyber) terrorism: find the leaders/hubs of terrorist organizations/regimes
• Financial impact: recession in Europe (who is lending to whom)

Networks: Size Matters
• Network data: orders of magnitude
– 436-node network of email exchange at a corporate research lab [Adamic-Adar, SocNets ‘03]
– 43,553-node network of email exchange at a university [Kossinets-Watts, Science ‘06]
– 4.4-million-node network of declared friendships on a blogging community [Liben-Nowell et al., PNAS ‘05]
– 240-million-node network of communication on Microsoft Messenger [Leskovec-Horvitz, WWW ’08]
– 800-million-node Facebook network [Backstrom et al. ‘1

Group Activity
• Big data: network (and non-network) data (mostly from the web).
– Understanding and analysis
• A few examples:
– Impact of tweets on:
• Financial patterns.
• Reputation of companies
– Community patterns in networks: information dissemination.
– GPS data: insurance fraud

Thank you !! Questions?
Rajesh Sharma
University of Bologna
http://rajshpec.github.io/
rajesh.sharma@unibo.it
Research Group: http://sigsna.net/impact/
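
Backup: selective gossip forwarding, a sketch
The forwarding rule from the Approach slide (forward to all relevant neighbors; with probability p also pick a non-relevant neighbor ranked by αh + βd + γa) could be sketched roughly as below. This is only one possible reading of the slide, not the GoDisco implementation: the `Neighbor` record, the `relevant` flag, the default weights 0.5/0.3/0.2, and the assumption that h, d and a are normalized to [0, 1] are all illustrative, and the social-triad duplicate suppression and the random walk at boundary nodes are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Neighbor:
    # Hypothetical record of what the control phase is assumed to collect.
    name: str
    relevant: bool     # shares the message's main category
    history: float     # h: past forwarding behavior, assumed in [0, 1]
    degree: float      # d: normalized degree, assumed in [0, 1]
    activeness: float  # a: assumed in [0, 1]


def score(n, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted score alpha*h + beta*d + gamma*a (weights are illustrative)."""
    return alpha * n.history + beta * n.degree + gamma * n.activeness


def next_hops(neighbors, p, rng, alpha=0.5, beta=0.3, gamma=0.2):
    """Pick forwarding targets for one gossip hop.

    All relevant neighbors are selected; with probability p the
    best-scoring non-relevant neighbor is added as well.
    """
    targets = [n for n in neighbors if n.relevant]
    others = [n for n in neighbors if not n.relevant]
    if others and rng.random() < p:
        targets.append(max(others, key=lambda n: score(n, alpha, beta, gamma)))
    return targets


if __name__ == "__main__":
    nbrs = [
        Neighbor("b", True, 0.1, 0.1, 0.1),
        Neighbor("e", False, 0.8, 0.5, 1.0),
        Neighbor("c", False, 0.2, 0.9, 0.1),
    ]
    chosen = next_hops(nbrs, p=0.5, rng=random.Random(0))
    print([n.name for n in chosen])
```

Passing the random generator in as `rng` keeps a run reproducible, and changing α, β, γ per node is where a feedback mechanism could plug in.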