Understanding and Managing Cascades on Large Graphs
B. Aditya Prakash (Carnegie Mellon University, Virginia Tech); Christos Faloutsos (Carnegie Mellon University)
Graph Analytics wkshp B. A. Prakash; C. Faloutsos
From: VLDB’12 tutorial • http://vldb.org/pvldb/vol5/p2024_badityaprakash_vldb2012.pdf

Networks are everywhere!
• Facebook Network [2010]
• Gene Regulatory Network [Decourty 2008]
• Human Disease Network [Barabasi 2007]
• The Internet [2005]

Dynamical Processes over networks are also everywhere!

Why do we care?
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
........

Why do we care? (1: Epidemiology)
• Dynamical Processes over networks: diseases over contact networks [AJPH 2007]
• CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

Why do we care? (1: Epidemiology)
• Dynamical Processes over networks [US-MEDICARE NETWORK 2005]
– Each circle is a hospital
– ~3000 hospitals
– More than 30,000 patients transferred
• Problem: Given k units of disinfectant, whom to immunize?

Why do we care? (1: Epidemiology)
• CURRENT PRACTICE vs. OUR METHOD [US-MEDICARE NETWORK 2005]: ~6x fewer!
• Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

Why do we care? (2: Online Diffusion)
• > 800m users, ~$1B revenue [WSJ 2010]
• ~100m active users
• > 50m users

Why do we care? (2: Online Diffusion)
• Dynamical Processes over networks: Social Media Marketing
[Figure labels: Celebrity, Followers, “Buy Versace™!”]
Why do we care? (3: To change the world?)
• Dynamical Processes over networks: Social networks and Collaborative Action

High Impact – Multiple Settings
• epidemic out-breaks
• products/viruses
• transmit s/w patches
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?

Research Theme
• DATA: Large real-world networks & processes
• ANALYSIS: Understanding
• POLICY/ACTION: Managing

Research Theme – Public Health
• DATA: Modeling # patient transfers
• ANALYSIS: Will an epidemic happen?
• POLICY/ACTION: How to control out-breaks?

Research Theme – Social Media
• DATA: Modeling Tweets spreading
• ANALYSIS: # cascades in future?
• POLICY/ACTION: How to market better?

In this tutorial
• Given propagation models, Q1: Will an epidemic happen? (ANALYSIS: Understanding)
• Q2: How to immunize and control out-breaks better? (POLICY/ACTION: Managing)
• Q3: How do #hashtags spread? (DATA: Large real-world networks & processes)

Outline
• Motivation
• Part 1: Understanding Epidemics (Theory)
• Part 2: Policy and Action (Algorithms)
• Part 3: Learning Models (Empirical Studies)
• Conclusion

Part 1: Theory
• Q1: What is the epidemic threshold?
• Q2: How do viruses compete?

A fundamental question
• Strong Virus: Epidemic?
• Weak Virus: Epidemic? (example: static graph)
Problem Statement
[Figure: # infected vs. time; above the threshold (epidemic), below it (extinction)]
• Separate the regimes?
• Find a condition under which
– virus will die out exponentially quickly
– regardless of initial infection condition

Threshold (static version): Problem Statement
• Given:
– Graph G, and
– Virus specs (attack prob. etc.)
• Find:
– A condition for virus extinction/invasion

Threshold: Why important?
• Accelerating simulations
• Forecasting (‘What-if’ scenarios)
• Design of contagion and/or topology
• A great handle to manipulate the spreading
– Immunization
– Maximize collaboration
…..

Part 1: Theory
• Q1: What is the epidemic threshold?
– Background
– Result and Intuition (Static Graphs)
– Proof Ideas (Static Graphs)
– Bonus: Dynamic Graphs
• Q2: How do viruses compete?

“SIR” model: life immunity (mumps)
• Each node in the graph is in one of three states
– Susceptible (i.e. healthy)
– Infected
– Removed (i.e. can’t get infected again)
[Figure: infection spreading over t=1, t=2, t=3; label: Prob. δ]

Terminology: continued
• Other virus propagation models (“VPM”)
– SIS: susceptible-infected-susceptible, flu-like
– SIRS: temporary immunity, like pertussis
– SEIR: mumps-like, with virus incubation (E = Exposed)
….………….
• Underlying contact-network – ‘who-can-infect-whom’

Related Work
R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press, 1991.
A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, 2010.
F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969.
D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks.
ACM TISSEC, 10(4), 2008.
D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of epidemics. IEEE INFOCOM, 2005.
Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free networks and the effective immunization. arXiv:cond-mat/0305549 v2, Aug. 6 2003.
H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000.
H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer Lecture Notes in Biomathematics, 46, 1984.
J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991.
J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE Computer Society Symposium on Research in Security and Privacy, 1993.
R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters 86, 14, 2001.
………
All are about either:
• Structured topologies (cliques, block-diagonals, hierarchies, random)
• Specific virus propagation models
• Static graphs

What should the answer look like?
• Answer should depend on:
– Graph
– Virus Propagation Model (VPM)
• But how??
– Graph – average degree? max. degree? diameter?
– VPM – which parameters?
– How to combine – linear? quadratic? exponential?
[candidate formulas combining d_avg, d_max, and the diameter]
Static Graphs: Our Main Result
• Informally, for
– any arbitrary topology (adjacency matrix A)
– any virus propagation model (VPM) in standard literature
• the epidemic threshold depends only
1. on λ, the first eigenvalue of A, and
2. on some constant C_VPM, determined by the virus propagation model
• No epidemic if λ · C_VPM < 1
In Prakash+ ICDM 2011 (Selected among best papers).

Our thresholds for some models
• s = effective strength
• s < 1: below threshold
• Threshold (tipping point): s = 1
• Models covered: SIS, SIR, SIRS, SEIR; SIV, SEIV; SI1I2V1V2 (H.I.V.). In each case s = λ · C_VPM, with a model-specific constant C_VPM.

Our result: Intuition for λ
“Official” definition:
• Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)].
• Doesn’t give much intuition!
“Un-official” Intuition
• λ ~ # paths in the graph
• $A^k \approx \lambda^k \, u \, u^{T}$
• $A^k(i, j)$ = # of paths i → j of length k

Largest Eigenvalue (λ)
[Figure: four example graphs on N nodes (N = 1000), ordered from lower to higher λ, i.e. better connectivity: λ ≈ 2; λ ≈ 2; λ = √N (31.67); λ = N-1 (999)]

Examples: Simulations – SIR (mumps)
[Figure: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]
PORTLAND graph: 31 million links, 6 million nodes

Examples: Simulations – SIRS (pertussis)
[Figure: (a) Infection profile: fraction of infections vs. time ticks; (b) “Take-off” plot: footprint vs. effective strength]
PORTLAND graph: 31 million links, 6 million nodes
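The threshold condition and the example values of λ above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' code. The helper names (`largest_eigenvalue`, `chain`, `star`, `clique`, `effective_strength`) are made up for illustration, graphs are assumed undirected with at least one edge, and the SIS-style constant β/δ is assumed for C_VPM. N = 100 is used instead of 1000 to keep the run short.

```python
import math

def largest_eigenvalue(adj, max_iter=2000, tol=1e-12):
    """Estimate the largest eigenvalue of an undirected graph.

    adj maps each node 0..n-1 to the list of its neighbours.
    Power iteration runs on A + d_max*I: the shift keeps the iteration
    from oscillating on bipartite graphs (chains, stars). The value
    returned is the Rayleigh quotient of A itself.
    """
    n = len(adj)
    shift = max(len(nbrs) for nbrs in adj.values())
    x = [1.0 / math.sqrt(n)] * n  # unit-norm start vector
    lam = 0.0
    for _ in range(max_iter):
        y = [shift * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        new_lam = sum(a * b for a, b in zip(x, y)) - shift
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam

def chain(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def star(n):
    return {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}

def clique(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def effective_strength(lam, beta, delta):
    """s = lambda * beta / delta (assumed SIS-style constant C_VPM = beta/delta)."""
    return lam * beta / delta

if __name__ == "__main__":
    for name, graph in (("chain", chain(100)), ("star", star(100)), ("clique", clique(100))):
        lam = largest_eigenvalue(graph)
        s = effective_strength(lam, beta=0.01, delta=0.2)
        print(name, round(lam, 3), "epidemic possible" if s >= 1 else "no epidemic")
```

With the same β and δ, the clique ends up above the threshold while the chain and the star stay below it, which is the sense in which higher λ means higher vulnerability.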
Proof Sketch
• Model-based: General VPM structure
• Graph-based: Topology and stability
• Result: λ · C_VPM < 1

Models and more models
Model: Used for
SIR: Mumps
SIS: Flu
SIRS: Pertussis
SEIR: Chicken-pox
……..
SICR: Tuberculosis
MSIR: Measles
SIV: Sensor Stability
SI1I2V1V2: H.I.V.
……….

Ingredient 1: Our generalized model
[Figure: state diagram with Susceptible, Infected, and Vigilant states, connected by endogenous and exogenous transitions]

Special case
[Figure: Susceptible, Infected, Vigilant]

Special case: H.I.V. (SI1I2V1V2)
• Multiple Infectious, Vigilant states
• “Non-terminal” and “Terminal” states

Ingredient 2: NLDS + Stability
• View as a NLDS – discrete time non-linear dynamical system (NLDS)
– Probability vector (size mN x 1): specifies the state of the system at time t, with one block of size N (number of nodes in the graph) for each of S, I, V
– Non-linear function: explicitly gives the evolution of the system
• Threshold ↔ Stability of NLDS

Special case: SIR
• Probability vector of size 3N x 1, with blocks S, I, R
• The NLDS uses the probability that node i is not attacked by any of its infectious neighbors

Fixed Point
• (1, 1, …, 0, 0, …, 0, 0, …): state when no node is infected
• Q: Is it stable?

Stability for SIR
• Stable under threshold
• Unstable above threshold
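To make the stability argument for the SIR special case concrete, here is one way the NLDS and its linearization could be written. This is a sketch under assumptions, not the paper's proof: β is taken as the per-contact infection probability, δ as the per-step removal probability, and only the infected block of the linearization is tracked.

```latex
\begin{aligned}
\zeta_i(t) &= \prod_{j=1}^{N} \bigl(1 - \beta\, A_{ij}\, P_{I,j}(t)\bigr)
  && \text{(node $i$ is not attacked by any infectious neighbor)} \\
P_{S,i}(t+1) &= P_{S,i}(t)\,\zeta_i(t) \\
P_{I,i}(t+1) &= P_{S,i}(t)\,\bigl(1 - \zeta_i(t)\bigr) + (1-\delta)\,P_{I,i}(t) \\
P_{R,i}(t+1) &= P_{R,i}(t) + \delta\,P_{I,i}(t) \\[4pt]
\text{Near } P_S = \mathbf{1},\ P_I = \mathbf{0}:\quad
\zeta_i(t) &\approx 1 - \beta \sum_{j} A_{ij} P_{I,j}(t)
\;\Longrightarrow\;
P_I(t+1) \approx \bigl(\beta A + (1-\delta) I\bigr)\, P_I(t) \\
\text{Infections die out if}\quad
\beta\lambda + 1 - \delta &< 1
\;\Longleftrightarrow\;
\lambda \cdot \frac{\beta}{\delta} < 1
\end{aligned}
```

Under these assumptions the condition has the form λ · C_VPM < 1 with C_VPM = β/δ, matching the shape of the main result.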
See paper for full proof
• Model-based: General VPM structure
• Graph-based: Topology and stability
• Result: λ · C_VPM < 1

Dynamic Graphs: Epidemic?
• Alternating behaviors
– DAY (e.g., work): 8 x 8 adjacency matrix
– NIGHT (e.g., home): 8 x 8 adjacency matrix

Model Description
• SIS model
– recovery rate δ
– infection rate β
[Figure: node X with neighbors N1, N2, N3; states Healthy and Infected; transition labels Prob. β and Prob. δ]
• Set of T arbitrary graphs: day, night, weekend, …..

Our result: Dynamic Graphs Threshold
• Informally, NO epidemic if eig(S) < 1
• eig(S) = largest eigenvalue of the system matrix S (a single number!)
• S = [definition of the system matrix, shown on the slide]
In Prakash+, ECML-PKDD 2010

Infection-profile
[Figure: log(fraction infected) vs. time for MIT Reality Mining and Synthetic; curves ABOVE, AT, and BELOW the threshold]

“Take-off” plots
[Figure: footprint (# infected @ “steady state”) on a log scale for Synthetic and MIT Reality; EPIDEMIC and NO EPIDEMIC regions separated by our threshold]

Part 1: Theory
• Q1: What is the epidemic threshold?
• Q2: What happens when viruses compete?
– Mutually-exclusive viruses
– Interacting viruses

Competing Contagions
• iPhone v Android
• Blu-ray v HD-DVD
• Attack v Retreat
• Biological: common flu/avian flu, pneumococcal inf etc
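For the dynamic-graph threshold above, a small numerical sketch may help. It assumes the system matrix is the product of per-snapshot matrices (1 − δ)·I + β·A_t, which is how the cited ECML-PKDD 2010 paper is usually summarized; treat that form, and the helper names `matmul`, `system_matrix`, and `largest_eigenvalue`, as assumptions for illustration. The power iteration also assumes the product is a primitive non-negative matrix.

```python
def matmul(X, Y):
    """Dense product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def system_matrix(graphs, beta, delta):
    """Assumed form: S = prod_t ((1 - delta) * I + beta * A_t).

    graphs is a list of dense 0/1 adjacency matrices in the order the
    snapshots occur; later snapshots multiply from the left.
    """
    n = len(graphs[0])
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for A in graphs:
        step = [[(1.0 - delta) * (i == j) + beta * A[i][j] for j in range(n)]
                for i in range(n)]
        S = matmul(step, S)
    return S

def largest_eigenvalue(M, iters=500):
    """Power iteration for a non-negative matrix, normalised by the entry sum."""
    n = len(M)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)  # valid estimate because sum(x) == 1 and entries are >= 0
        x = [v / lam for v in y]
    return lam

# Hypothetical 4-node example: contacts 0-1 and 2-3 by day, 0-3 and 1-2 by night.
DAY = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
NIGHT = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

if __name__ == "__main__":
    for beta in (0.2, 0.5):
        eig = largest_eigenvalue(system_matrix([DAY, NIGHT], beta, delta=0.4))
        print(beta, round(eig, 4), "no epidemic" if eig < 1 else "epidemic possible")
```

In this toy example every node has exactly one contact in each snapshot, so the eigenvalue is (1 − δ + β)², which makes the two cases easy to verify by hand.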
A simple model
• Modified flu-like
• Mutual Immunity (“pick one of the two”)
• Susceptible-Infected1-Infected2-Susceptible
[Figure labels: Virus 1, Virus 2]

Question: What happens in the end?
[Figure: number of infections over time (green: virus 1, red: virus 2); footprint @ steady state = ? as a function of the two strengths]
ASSUME: Virus 1 is stronger than Virus 2

Answer: Winner-Takes-All
[Figure: number of infections over time (green: virus 1, red: virus 2)]
ASSUME: Virus 1 is stronger than Virus 2

Our Result: Winner-Takes-All
Given our model, and any graph, the weaker virus always dies-out completely
1. The stronger survives only if it is above threshold
2. Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2)
3. Strength(Virus) = λ β / δ (same as before!)
In Prakash+ WWW 2012

Real Examples [Google Search Trends data]
• Reddit v Digg
• Blu-Ray v HD-DVD

A simple model: SI1|2S
• Modified flu-like (SIS)
• Susceptible-Infected1 or 2-Susceptible
• Interaction Factor ε
– Full Mutual Immunity: ε = 0
– Partial Mutual Immunity (competition): 0 < ε < 1
– Cooperation: ε > 1
[Figure labels: Virus 1 & Virus 2]

Question: What happens in the end?
[Figure: footprint (fraction of population) vs. time for viruses k1 and k2, and joint infections i1,2, in three panels: ε = 0, winner takes all; ε = 1, co-exist independently; ε = 2, viruses cooperate]
What about for 0 < ε < 1? Is there a point at which both viruses can co-exist?
ASSUME: Virus 1 is stronger than Virus 2

Answer: Yes! There is a phase transition
[Figure: three plots of footprint (fraction of population) vs. time for k1, k2, and i1,2]
ASSUME: Virus 1 is stronger than Virus 2

Our Result: Viruses can Co-exist
Given our model and a fully connected graph, there exists an ε_critical such that for ε ≥ ε_critical, there is a fixed point where both viruses survive.
1. The stronger survives only if it is above threshold
2. Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2)
3. Strength(Virus) σ = N β / δ
In Beutel+ KDD 2012

Real Examples [Google Search Trends data]
• Hulu v Blockbuster
• Chrome v Firefox
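The winner-takes-all behaviour of the mutually exclusive case (ε = 0) can be illustrated with a mean-field sketch on a fully connected graph. This is an assumption-laden toy, not the model from the cited papers: it tracks only the infected fractions i1 and i2, integrates them with simple Euler steps, and uses the strength σ = Nβ/δ from the slides. The function name `simulate_competition` and all parameter values are made up for illustration.

```python
def simulate_competition(nbeta1, delta1, nbeta2, delta2,
                         i1=0.01, i2=0.01, dt=0.1, steps=20000):
    """Mean-field SI1I2S dynamics with full mutual immunity (epsilon = 0).

    nbeta1 and nbeta2 stand for N * beta of each virus, so the strengths
    are sigma1 = nbeta1 / delta1 and sigma2 = nbeta2 / delta2.
    Returns the infected fractions (i1, i2) after steps * dt time units.
    """
    for _ in range(steps):
        s = 1.0 - i1 - i2                      # susceptible fraction
        d1 = nbeta1 * i1 * s - delta1 * i1     # new infections minus recoveries
        d2 = nbeta2 * i2 * s - delta2 * i2
        i1 += dt * d1
        i2 += dt * d2
    return i1, i2

if __name__ == "__main__":
    # Virus 1 has strength 4, virus 2 has strength 3; both are above threshold.
    final1, final2 = simulate_competition(0.4, 0.1, 0.3, 0.1)
    print("virus 1 footprint:", round(final1, 4))
    print("virus 2 footprint:", final2)
```

In this toy the stronger virus settles at a footprint of 1 − 1/σ1, and the weaker one decays towards zero even though it would survive on its own.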
Part 2: Algorithms
• Q3: Whom to immunize?
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?

Full Static Immunization
• Given: a graph A, virus prop. model and budget k;
• Find: k ‘best’ nodes for immunization (removal).
[Figure: example graph with candidate nodes marked “?”, k = 2]

Part 2: Algorithms
• Q3: Whom to immunize?
– Full Immunization (Static Graphs)
– Full Immunization (Dynamic Graphs)
– Fractional Immunization
• Q4: How to detect outbreaks?
• Q5: Who are the culprits?

Challenges
• Given a graph A, budget k,
– Q1 (Metric): How to measure the ‘shield-value’ for a set of nodes (S)?
– Q2 (Algorithm): How to find a set of k nodes with highest ‘shield-value’?

Proposed vulnerability measure λ
• λ is the epidemic threshold
• Increasing λ means increasing vulnerability: “Safe” → “Vulnerable” → “Deadly”

A1: “Eigen-Drop”: an ideal shield value
• Eigen-Drop(S): Δλ = λ - λ_s
[Figure: original graph vs. the same graph without nodes {2, 6}]

(Q2) - Direct Algorithm too expensive!
• Immunize k nodes which maximize Δλ: S = argmax Δλ
• Combinatorial!
• Complexity:
– Example:
• 1,000 nodes, with 10,000 edges
• It takes 0.01 seconds to compute λ
• It takes 2,615 years to find 5-best nodes!
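The cost estimate in the example above can be reproduced with a few lines of arithmetic. The sketch assumes brute force means one λ computation per 5-node subset and a 365-day year; the function name `brute_force_years` is made up. The result lands at roughly 2,600 years, in line with the slide's figure (the exact number depends on the year length assumed).

```python
import math

def brute_force_years(n_nodes, k, seconds_per_eigenvalue):
    """Subsets to try and total time in years, at one eigenvalue computation per subset."""
    subsets = math.comb(n_nodes, k)
    seconds = subsets * seconds_per_eigenvalue
    return subsets, seconds / (365 * 24 * 3600)

if __name__ == "__main__":
    subsets, years = brute_force_years(1000, 5, 0.01)
    print(f"{subsets:,} subsets, about {years:,.0f} years")
```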
A2: Our Solution
• Part 1: Shield Value
– Carefully approximate Eigen-drop (Δλ)
– Matrix perturbation theory
• Part 2: Algorithm
– Greedily pick best node at each step
– Near-optimal due to submodularity
• NetShield (linear complexity)
– O(nk² + m), n = # nodes; m = # edges
In Tong, Prakash+ ICDM 2010

Our Solution: Part 1
• Approximate Eigen-drop (Δλ)
• Δλ ≈ SV(S) [expression shown on the slide]
– Result using Matrix perturbation theory
– u(i) == ‘eigenscore’ ~~ pagerank(i), where A u = λ u

[Figure: the same graph three times: Original Graph; Select by P1 (node importance); Select by P1+P2 (node importance and set diversity)]

Our Solution: Part 2: NetShield
• We prove that: SV(S) is sub-modular (& monotone non-decreasing)
• NetShield: Greedily add best node at each step
• Corollary: Greedy algorithm works
1. NetShield is near-optimal (w.r.t. max SV(S))
2. NetShield is O(nk² + m)
Footnote: near-optimal means SV(S_NetShield) >= (1-1/e) SV(S_Opt)

Experiment: Immunization quality
[Figure: log(fraction of infected nodes) vs. time for PageRank, Betweenness (shortest path), Degree, Acquaintance, Eigs (=HITS), and NetShield; lower is better]

Full Dynamic Immunization
• Given: Set of T arbitrary graphs (day, night, weekend, …..)
• Find: k ‘best’ nodes to immunize (remove)
In Prakash+ ECML-PKDD 2010
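The greedy shield-value idea behind NetShield can be sketched as follows. The shield value is assumed here to have the form SV(S) = Σ_{i∈S} 2λ·u(i)² − Σ_{i,j∈S} A(i,j)·u(i)·u(j), as reported in the cited ICDM 2010 paper; the first sum rewards node importance and the second penalizes picking nodes that are adjacent to each other. The code is a plain greedy loop for illustration, without the bookkeeping that gives NetShield its O(nk² + m) running time, and the helper names are hypothetical.

```python
import math

def leading_eigenpair(A, iters=1000):
    """Power iteration on A + d_max*I for a symmetric 0/1 matrix A (lists of lists)."""
    n = len(A)
    shift = max(sum(row) for row in A)
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        v = [shift * u[i] + sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    lam = sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))
    return lam, u

def shield_value(A, lam, u, S):
    """Assumed form of SV(S): node importance minus overlap among chosen nodes."""
    importance = sum(2.0 * lam * u[i] ** 2 for i in S)
    overlap = sum(A[i][j] * u[i] * u[j] for i in S for j in S)
    return importance - overlap

def greedy_shield(A, k):
    """Add, k times, the node that gives the largest shield value so far."""
    lam, u = leading_eigenpair(A)
    chosen = []
    for _ in range(k):
        rest = [i for i in range(len(A)) if i not in chosen]
        best = max(rest, key=lambda i: shield_value(A, lam, u, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical test graph: a star with centre 0 and five leaves.
STAR = [[0, 1, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0, 0] for _ in range(5)]

if __name__ == "__main__":
    print(greedy_shield(STAR, 1))
```

On the star, the centre is chosen first, and its shield value equals λ, which is also the true eigen-drop from removing it (the remaining graph has no edges).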
Full Dynamic Immunization
• Our solution
– Recall theorem
– Simple: reduce λ of the day/night matrix product
• Goal: max eigendrop Δλ, where Δλ = λ_before - λ_after
• No competing policy for comparison
• We propose and evaluate many policies

Performance of Policies
[Figure: footprint after k=6 immunizations for each policy, MIT Reality Mining; lower is better]

Fractional Immunization of Networks
B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
Under Submission

Full Static Immunization
• Given: a graph A, virus prop. model and budget k;
• Find: k ‘best’ nodes for immunization (removal).
[Figure: example graph with candidate nodes marked “?”, k = 2]

Fractional Asymmetric Immunization
• Fractional Effect [f(x) = 0.5^x]
• Asymmetric Effect
[Figure: # antidotes = 3]

Fractional Asymmetric Immunization
[Figure: drug-resistant bacteria (like XDR-TB) moving between a hospital and another hospital]
Fractional Asymmetric Immunization
• Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved?
[Figure: a hospital and another hospital]

Our Algorithm “SMART-ALLOC”
• [US-MEDICARE NETWORK 2005]
– Each circle is a hospital, ~3000 hospitals
– More than 30,000 patients transferred
• CURRENT PRACTICE vs. SMART-ALLOC: ~6x fewer!

Running Time
• Wall-clock time (lower is better): Simulations > 1 week; SMART-ALLOC 14 secs
• > 30,000x speed-up!

Experiments
• PENN-NETWORK: ~5x (K = 200)
• SECOND-LIFE: ~2.5x (K = 2000)
• Lower is better

Outbreak detection
• Problems of finding sources of contamination in water networks and finding “hot” stories on blogs are isomorphic.
– Minimize time to detection, population affected
– Maximize probability of detection.
– Minimize sensor placement cost.
[Figure labels: Blogs, Posts, Links, Information cascade]
• J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance. “Cost-effective Outbreak Detection in Networks”, KDD 2007

CELF: Main idea
• Given a graph G(V,E)
• and a budget of B sensors
• and data on how contaminations spread over the network:
– for each contamination i we know the time T(i, u) when it contaminated node u
• Minimize time to detect outbreak
• CELF algorithm uses submodularity and lazy evaluation

Blogs: Comparison to heuristics
[Figure: benefit (higher = better)]
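The "submodularity and lazy evaluation" idea behind CELF can be sketched with a priority queue of stale marginal gains. This is a simplified illustration, not the published algorithm: it ignores sensor costs, and it swaps the detection-time objective for a simpler one, the number of contamination scenarios that at least one chosen sensor detects. The names `lazy_greedy`, `DETECTS`, and `coverage_gain` are hypothetical.

```python
import heapq

def lazy_greedy(candidates, gain, budget):
    """Greedy selection for a monotone submodular objective, with lazy re-evaluation.

    gain(item, chosen) must return the marginal gain of adding item to chosen.
    A stored gain can only shrink as chosen grows, so an entry whose gain was
    computed against the current chosen set and still tops the heap is the best.
    """
    chosen = []
    heap = [(-gain(c, chosen), c, 0) for c in candidates]  # (neg. gain, item, stamp)
    heapq.heapify(heap)
    while heap and len(chosen) < budget:
        neg_gain, item, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            chosen.append(item)  # gain is fresh, so this item is the true best
        else:
            heapq.heappush(heap, (-gain(item, chosen), item, len(chosen)))
    return chosen

# Hypothetical data: which contamination scenarios each sensor location detects.
DETECTS = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}

def coverage_gain(item, chosen):
    covered = set().union(*(DETECTS[c] for c in chosen)) if chosen else set()
    return len(DETECTS[item] - covered)

if __name__ == "__main__":
    print(lazy_greedy(DETECTS, coverage_gain, budget=2))
```

Ties are broken by item name here. The saving comes from re-evaluating only the entries that reach the top of the heap instead of every candidate in every round.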
“Best 10 blogs to read”
NP: number of posts, IL: in-links, OLO: blog out links, OLA: all out links

k   PA score  Blog                                    NP     IL     OLO    OLA
1   0.1283    http://instapundit.com                  4593   4636   1890   5255
2   0.1822    http://donsurber.blogspot.com           1534   1206   679    3495
3   0.2224    http://sciencepolitics.blogspot.com     924    576    888    2701
4   0.2592    http://www.watcherofweasels.com         261    941    1733   3630
5   0.2923    http://michellemalkin.com               1839   12642  1179   6323
6   0.3152    http://blogometer.nationaljournal.com   189    2313   3669   9272
7   0.3353    http://themodulator.org                 475    717    1844   4944
8   0.3508    http://www.bloggersblog.com             895    247    1244   10201
9   0.3654    http://www.boingboing.net               5776   6337   1024   6183
10  0.3778    http://atrios.blogspot.com              4682   3205   795    3102

Detecting Culprits in Epidemics: Who and How many?
B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos. Under Submission

Problem definition
• 2-d grid; ‘+’ -> infected
• Who started it?

Culprits: Exoneration
[Figure]

Who are the culprits
• Two-part solution
– use MDL for number of seeds
– for a given number:
• exoneration = centrality + penalty
• our method uses smallest eigenvector of Laplacian submatrix
• Running time: linear! (in edges and nodes)

Culprits: Results
[Figure]
Part 3: Empirical Studies
• Q6: What do cascades look like?
• Q7: How does activity evolve over time?
• Q8: How does external influence act?

Cascading Behavior in Large Blog Graphs
• How does information propagate over the blogosphere?
[Figure labels: Blogs, Posts, Links, Information cascade]
J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.

Cascades on the Blogosphere
• Blogosphere: blogs + posts
• Blog network: links among blogs
• Post network: links among posts
• Cascades: a cascade is the graph induced by a time-ordered propagation of information (edges)
[Figure: blogs B1 to B4 with posts a to e]

Blog data
• 45,000 blogs participating in cascades
• All their posts for 3 months (Aug-Sept ‘05)
• 2.4 million posts
• ~5 million links (245,404 inside the dataset)
[Figure: number of posts vs. time [1 day]]

Popularity over time
• Post popularity drops-off – exponentially?
[Figure: # in links vs. lag (days after post); a post @t and its in-links @t + lag]

Popularity over time
• Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6
[Figure: # in links (log) vs. days after post (log), slope -1.6]
• close to -1.5: Barabasi’s stack model
• and like the zero-crossings of a random walk

-1.5 slope
J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
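Reading an exponent such as -1.6 off a log-log plot amounts to fitting a straight line to (log lag, log count). The sketch below does that with ordinary least squares on synthetic data; the function name `loglog_slope` and the data are made up, and a least-squares fit on log-log axes is only a quick check, not a rigorous power-law estimator.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); xs and ys must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var

if __name__ == "__main__":
    days = list(range(1, 31))
    in_links = [5.0 * d ** -1.5 for d in days]  # synthetic power-law decay
    print(round(loglog_slope(days, in_links), 3))
```

An exponential decay would instead give a straight line on linear-log axes, which is how the two hypotheses on the slide can be told apart.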
Topological patterns: Cascades
Procedure for gathering cascades:
• Find all initiators (nodes with out-degree 0)
• Follow in-links
• Produces directed acyclic graph
• Count cascade shapes (use our multi-level graph isomorphism testing algorithm)
[Figure: example post network on nodes a to e and the cascades extracted from it]

Topological Observations
• How do we measure how information flows through the network?
• Common cascade shapes extracted using algorithms in [Leskovec, Singh, Kleinberg; PAKDD 2006].

Topological Observations
• What graph properties do cascades exhibit?
• Cascade size distributions also follow power law.
• Observation 2: The probability of observing a cascade on n nodes follows a Zipf distribution: $p(n) \propto n^{-2}$
[Figure: count vs. cascade size (# of nodes), a = -2]

Topological Observations
• What graph properties do cascades exhibit?
• Stars and chains also follow a power law, with different exponents (star -3.1, chain -8.5).
[Figure: count vs. size of star (# nodes), a = -3.1; count vs. size of chain (# nodes), a = -8.5]

Blogs and structure
• Cascades take on different shapes (sorted by frequency)
• How can we use cascades to identify communities?

PCA on cascade types
• Perform PCA on sparse matrix (~44,000 blogs x ~9,000 cascade types).
• Use log(count+1)
• Project onto 2 PC…
[Figure: example matrix rows for slashdot and boingboing]

PCA on cascade types
• Observation: Content of blogs and cascade behavior are often related.
• Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).
M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, N. Glance. Finding Patterns in Blog Shapes and Blog Evolution. ICWSM 2007.
Rise and fall patterns in social media
• Meme (# of mentions in blogs)
– short phrases sourced from U.S. politics in 2008
– “you can put lipstick on a pig”
– “yes we can”

Rise and fall patterns in social media
• Can we find a unifying model, which includes these patterns?
– four classes on YouTube [Crane et al. ’08]
– six classes on Meme [Yang et al. ’11]
[Figure: example rise-and-fall curves for each class]

Rise and fall patterns in social media
• Answer: YES!
• We can represent all patterns by single model
[Figure: six panels of value vs. time, Original vs. SpikeM]
In Matsubara+ SIGKDD 2012

Main idea - SpikeM
- 1. Un-informed bloggers (uninformed about rumor)
- 2. External shock at time n_b (e.g., breaking news)
- 3.
Infection (word-of-mouth) Time n=0 Time n=nb Infectiveness of a blog-post at age n: b f (n) Time n=nb+1 f (n) = b * n-1.5 - Strength of infection (quality of news) - Decay function (how infective a blog posting is) Graph Analytics wkshp B. A. Prakash; C. Faloutsos β Power Law 139 SpikeM - with periodicity • Full equation of SpikeM n é ù DB(n +1) = p(n +1)× êU(n)× å (DB(t) + S(t))× f (n +1- t) + e ú ê ú ë t=n û b Periodicity 12pm Peak activity Bloggers change their activity over time activity 3am Low activity p(n) (e.g., daily, weekly, yearly) Time n Graph Analytics wkshp B. A. Prakash; C. Faloutsos 140 Details • Analysis – exponential rise and power-raw fall Liner-log Rise-part SI -> exponential Log-log Graph Analytics wkshp SpikeM -> exponential B. A. Prakash; C. Faloutsos 141 Details • Analysis – exponential rise and power-raw fall Liner-log Fall-part SI -> exponential SpikeM -> power law Graph Analytics wkshp B. A. Prakash; C. Faloutsos Log-log 142 Tail-part forecasts • SpikeM can capture tail part Graph Analytics wkshp B. A. Prakash; C. Faloutsos 143 “What-if” forecasting (1) First spike (2) Release date (3) Two weeks before release ? ? e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date Graph Analytics wkshp B. A. Prakash; C. Faloutsos 144 “What-if” forecasting –SpikeM can forecast not only tail-part, but also rise-part! (1) First spike (2) Release date (3) Two weeks before release • SpikeM can forecast upcoming spikes Graph Analytics wkshp B. A. Prakash; C. Faloutsos 145 Part 3: Empirical Studies • Q6: How do cascades look like? • Q7: How does activity evolve over time? • Q8: How does external influence act? Graph Analytics wkshp B. A. Prakash; C. Faloutsos 146 Tweets Diffusion: Problem Definition • Given: – Action log of people tweeting a #hashtag – A network of users • Find: – How external influence varies with #hashtags? ?? ?? ? ? Graph Analytics wkshp ? B. A. Prakash; C. Faloutsos ? 
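Looking back at the SpikeM slides, the full equation can be simulated in a few lines. The sketch below is a hedged illustration rather than the authors' code. The slides give the update for ΔB(n+1) and the decay f(n) = β · n^{-1.5}; everything else is an assumption made for this example: the bookkeeping U(n+1) = U(n) − ΔB(n+1), modelling S(t) as a single shock of size `s_b` at time `n_b`, the sinusoidal form of p(n), the clamp that keeps U(n) non-negative, and all parameter names and values.

```python
import math


def spikem(n_total, beta, n_b, s_b, eps=0.0, steps=100,
           p_a=0.0, p_p=24, p_s=0):
    """Simulate newly informed bloggers per time tick.

    n_total: number of bloggers, all un-informed at time 0 (assumed).
    beta:    strength of infection in f(n) = beta * n^-1.5.
    n_b:     time of the external shock; s_b is its size (assumed form).
    eps:     the additive term inside the brackets of the equation.
    p_a, p_p, p_s: amplitude, period, and shift of an assumed sinusoidal p(n).
    """
    def f(age):
        # Power-law decay of infectiveness with post age (from the slides).
        return beta * age ** -1.5

    def p(n):
        # Assumed periodicity term; equals 1.0 when p_a == 0.
        return 1.0 - 0.5 * p_a * (math.sin(2.0 * math.pi * (n + p_s) / p_p) + 1.0)

    delta_b = [0.0] * steps          # newly informed at each tick
    uninformed = [0.0] * steps       # U(n)
    uninformed[0] = float(n_total)

    for n in range(steps - 1):
        influence = 0.0
        for t in range(n_b, n + 1):  # empty while n < n_b
            shock = s_b if t == n_b else 0.0
            influence += (delta_b[t] + shock) * f(n + 1 - t)
        new = p(n + 1) * (uninformed[n] * influence + eps)
        # Clamp (an addition for this sketch) so U(n) never goes negative.
        new = min(max(new, 0.0), uninformed[n])
        delta_b[n + 1] = new
        uninformed[n + 1] = uninformed[n] - new
    return delta_b, uninformed


if __name__ == "__main__":
    series, _ = spikem(n_total=1000, beta=0.001, n_b=10, s_b=10.0)
    peak = max(range(len(series)), key=lambda i: series[i])
    print("peak tick:", peak, "peak value:", round(series[peak], 2))
```

With these hypothetical parameters nothing happens before the shock at `n_b`, activity then rises through word-of-mouth, and it falls off once the pool of un-informed bloggers shrinks and older posts lose infectiveness. Setting `p_a` above zero modulates the curve with the assumed daily cycle.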
Tweet Diffusion: Data
• Yahoo! Twitter firehose
• More than 750 million tweets (> 10 terabytes)
• Test-bed of > 6000 machines
– Hadoop+PIG system ver 0.20.204.0
• Took top 500 hashtags (by volume) in Feb 2011
• Network of users:
– connecting user X to user Y if X directed at least 3 @-messages to Y (or RT-ed a tweet)

Tweet Diffusion
• Propagation = Influence + External
• Developed a model
– takes the previous observations into account
– with parameters representing external influence
• Learn from previous data
– EM-style alternating-minimization algorithm
• Group tags according to learnt params

Results: External Influence vs Time
[Figure: external influence vs. time, with regions marked “External Effects” and “Word-of-mouth”. Labels in the plot, in extraction order: Long-running tags; #nowwatching, #nowplaying, #epictweets; #purpleglasses, #brits, #famouslies; #oscar, #25jan; Bursty, external events; #openfollow, #ihatequotes, #tweetmyjobs; Not trending]
• Can also use for Forecasting, Anomaly Detection!

Outline
• Motivation
• Part 1: Understanding Epidemics (Theory)
• Part 2: Policy and Action (Algorithms)
• Part 3: Learning Models (Empirical Studies)
• Conclusion

Conclusions
• Epidemic Threshold
– It’s the Eigenvalue
• Fast Immunization
– Max. drop in eigenvalue, linear-time near-optimal algorithm
• Bursts: SpikeM model
– Exponential growth, Power-law decay

[Figure: Propagation on Networks, linked to Theory & Algo., Biology, Physics, Comp. Systems, ML & Stats., Social Science, Econ.]

Publications
1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos). In WWW 2012, Lyon.
2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos). In IEEE ICDM 2011, Vancouver. (Invited to KAIS Journal Best Papers of ICDM.)
3. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash). In ICML 2011, Bellevue.
4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos). In IEEE NETWORKING 2011, Valencia, Spain.
5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos). In IEEE INFOCOM NetSciCom Workshop, 2011.
6. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos). In IEEE ICDM 2010, Sydney, Australia.
7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos). In ECML-PKDD 2010, Barcelona, Spain.
8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong). In SIGKDD 2010, Washington D.C.
9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash and Christos Faloutsos). In VLDB 2010, Singapore.
10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos). In PAKDD 2010, Hyderabad, India.
11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos and Christos Faloutsos). In ACM SIGKDD 2009, Paris, France.
12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju and Christos Faloutsos). In IEEE ICDM Large Data Workshop 2009, Miami.
13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agarwal, Jayant R. Haritsa and B. Aditya Prakash). In Intl. Journal on Data Mining and Knowledge Discovery (DMKD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.
14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng and B. Aditya Prakash). In IEEE ICDE 2007, Istanbul, Turkey.

Acknowledgements
Collaborators: Christos Faloutsos, Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei

Acknowledgements
Funding

Dynamical Processes on Large Networks
B. Aditya Prakash, Christos Faloutsos
Analysis, Policy/Action, Data
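As a closing note on the Tweet Diffusion data, the rule for building the user network (connect X to Y if X directed at least 3 @-messages to Y, or RT-ed a tweet) can be sketched as below. This is an illustrative reading, not the authors' pipeline: the input format (lists of `(x, y)` pairs), the function name `build_user_network`, and the interpretation that a single retweet of Y by X is enough for an edge are all assumptions.

```python
from collections import Counter


def build_user_network(mentions, retweets, min_mentions=3):
    """Return the set of directed edges (x, y) under the assumed rule.

    mentions: iterable of (x, y) pairs, one per @-message from x to y.
    retweets: iterable of (x, y) pairs, meaning x retweeted a tweet by y
              (assumed reading of "or RT-ed a tweet").
    """
    mention_counts = Counter(mentions)
    # Keep pairs with at least `min_mentions` directed @-messages.
    edges = {pair for pair, count in mention_counts.items()
             if count >= min_mentions}
    # Any retweet adds the edge regardless of the mention count.
    edges.update(retweets)
    return edges


if __name__ == "__main__":
    # Hypothetical action log.
    mentions = [("a", "b")] * 3 + [("a", "c")] * 2 + [("c", "a")]
    retweets = [("d", "a")]
    print(sorted(build_user_network(mentions, retweets)))
```

At the scale quoted on the slide (more than 750 million tweets), the same counting would be expressed as a grouped aggregation in a distributed system; the in-memory version only makes the edge rule concrete.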