Discrete Probability on Graphs: Estimation, Reconstruction & Optimization on Networks
Elchanan Mossel, UC Berkeley
At: IPAM, Mar 2007

Outline: Stochastic Models on Networks
• Disclaimer: big field; biased choice of examples … an applied view.
• Part 0: Two types of network problems.
• Part I: Estimation of statistical quantities in Gibbs measures / Markov random fields.
• Part II: Reconstruction of stochastic networks from observations.
– Tree networks. Directed acyclic graphs.
• Part III: Optimization over stochastic models defined on networks.
– Which functions of stochastic models can be (approximately) optimized efficiently?

Part 0: Two Types of Network Problems

Two Types of Network Problems
• Type 1: Structural network problems.
• Type 2: Distributional network problems.
• This talk: mostly distributional network problems.
• Examples of structural network problems:
– "Clustering": Partition a graph G = (V,E) into V = V₁,…,V_k such that each V_i is "big" and there is a small number of edges between V_i and V_j for i ≠ j.
– "Ranking": Given a random walk on a finite set, find the stationary distribution.
• Spectral techniques are applicable to both problems.

A Hard Structural Network Problem
• The "Graph Isomorphism Problem": Given two graphs (G,E) and (H,F), is there an "isomorphism", i.e. a one-to-one map f : G → H s.t. (v₁,v₂) ∈ E iff (f(v₁),f(v₂)) ∈ F?
• Clear: if two graphs are isomorphic, then they have the same spectral structure; but this is not enough …
• Other open problems exist in this area …

Part I: Estimation in Markov Random Fields

Gibbs Measures / Graphical Models
A Gibbs measure on a (finite) graph G = (V,E) is given by node potentials (Ψ_v : v ∈ V) and edge potentials (Ψ_e : e ∈ E).
The probability of σ = (σ(v) : v ∈ V) ∈ A^|V| is given by
P[σ] = Z⁻¹ × ∏_{v∈V} Ψ_v[σ(v)] × ∏_{e=(v,u)∈E} Ψ_e[σ(v), σ(u)].
Gibbs measures were introduced in statistical physics and are essential in machine learning. They are also known as Markov random fields, graphical models etc.

Message Passing Algorithms / The Replica Method
Statistical problem: Given a Gibbs measure, estimate P[σ(0) = a]. Equivalent to many other inference problems.
Computational view: The problem can be NP-hard (even to approximate) even in very simple cases.
Statistical physics view: Find dynamics / Markov chains that have P as their stationary measure.
Statistical physics insight: Rapid convergence of the dynamics corresponds to spatial decay of correlations. A very active area of research; fascinating challenges.
Artificial intelligence / neuroscience / replica view: Solve the problem by "message passing".

Message Passing Algorithms / The Replica Method
Message passing algorithms are used to estimate probabilities on graphical models. Examples: Warning Propagation, Sum-Product, Belief Propagation etc.
All of these algorithms perform an exact calculation on an associated computation tree.
Example: Belief Propagation (BP) is a popular method in AI/coding for estimating the marginal probabilities P[σ(0) = a] of a Gibbs measure on G. It is equivalent [Tatikonda-Jordan'02] to calculating the marginal probabilities P[σ(0) = a] on the computation tree T(G).
Question: How come message passing algorithms work in practice?
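When G itself is a tree, the computation tree is the graph itself and BP computes the marginals exactly. A minimal sum-product sketch, for intuition only (the data layout, function names and toy potentials below are my own, not from the talk):

```python
import numpy as np

# Sum-product on a tree-structured pairwise MRF; states are 0..q-1.
# node_pot[v]      : length-q array  (psi_v)
# edge_pot[(u,v)]  : q x q array     (psi_e), indexed [state_u, state_v]
# adj[v]           : list of neighbours of v

def bp_marginal(root, adj, node_pot, edge_pot):
    """Exact marginal P[sigma(root) = a] via upward message passing."""
    def pot(u, v):
        # an edge potential may be stored under either key order
        return edge_pot[(u, v)] if (u, v) in edge_pot else edge_pot[(v, u)].T

    def message(child, parent):
        # m_{child->parent}(x) = sum_y psi_child(y) * psi_e(y, x)
        #                        * product of messages into child from below
        belief = node_pot[child].copy()
        for w in adj[child]:
            if w != parent:
                belief *= message(w, child)
        return pot(child, parent).T @ belief

    belief = node_pot[root].copy()
    for w in adj[root]:
        belief *= message(w, root)
    return belief / belief.sum()

# toy example: path 1 - 0 - 2, binary states, ferromagnetic edge potentials
adj = {0: [1, 2], 1: [0], 2: [0]}
node_pot = {0: np.ones(2), 1: np.array([0.9, 0.1]), 2: np.ones(2)}
coupling = np.array([[0.8, 0.2], [0.2, 0.8]])
edge_pot = {(1, 0): coupling, (2, 0): coupling}
print(bp_marginal(0, adj, node_pot, edge_pot))  # biased toward state 0
```

On a graph with cycles the same local update rules are run anyway, and the question above is why the resulting fixed point is so often a good approximation.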
Message Passing Algorithms in Coding
In coding, BP is used to decode Low-Density Parity-Check (LDPC) codes [Gallager'62]. It was proved to be efficient by [Luby-Mitzenmacher-Shokrollahi-Spielman'98, Richardson-Urbanke'01].
Message passing algorithms work "because":
• LDPC factor graphs are locally "tree-like", and
• individual constraints "push" toward the correct code word.
The actual analysis uses a recursion of random variables on the tree.

Message Passing Algorithms: Random 3-SAT
[Figure: a factor graph on variables x₁,…,x₈ with m = αn clauses, above the clause-density axis α with marks at 1.63, 3.52, 3.95, 4.27 and 4.51; roughly, PLR succeeds up to ≈1.63, myopic algorithms up to ≈3.52, WalkSAT and Belief Propagation up to ≈3.95, and Survey Propagation up to ≈4.27; random 3-SAT is provably satisfiable below ≈3.52 and provably not satisfiable above ≈4.51.]

Message Passing Algorithms for Random 3-SAT
Message passing algorithms work because random-SAT graphs are locally "tree-like" and far away variables are uncorrelated:
• Speculation 1: For Belief Propagation, variables are uncorrelated in a "standard sense" when α ≤ 3.95.
• Thm (Maneva-M-Wainwright'05): Survey Propagation is just Belief Propagation on an extended Markov random field.
• Speculation 2: For Survey Propagation, variables are uncorrelated in the extended Markov random field for all α.
Speculations 1 & 2 are under heated discussion between physicists, computer scientists and mathematicians … [Photos: M. Talagrand, G. Parisi, B. Selman]

Decay of Correlation for the 3-SAT Extended MRF
[Figure: the extended MRF lives on partial assignments rather than on full assignments in {0,1}ⁿ, e.g. full assignments such as 01101 versus partial assignments with some coordinates replaced by stars; the weight of a partial assignment depends on its number of stars.]

Part II: Reconstructing Stochastic Networks from Observations

Main Problem:
• How to reconstruct the network topology from observations at a (sub)set of the nodes?
• The example: reconstructing trees.

Two Tree Inference Problems
In evolution:
• Phase transition: Given a tree of species / mothers, can we infer the ancestral sequence at the root from contemporary samples? There is a trade-off between noise and duplication.
• Reconstructing evolution: Is it possible to reconstruct evolutionary history from genetic sequences?

Defn: Markov Model on a Tree
Ising / BSC / CFN model:
• Tree: T = (V,E).
• Node states: s(v) ∈ {0,1}, v ∈ V (0: purines (A,G); 1: pyrimidines (C,T)).
• Number of leaves: n.
• Mutation probabilities: 0 < p_e < 1/2, e ∈ E.
[Figure: a tree with root r (carrying a sequence …001100011101000011000100…), internal nodes a, b, c and leaves 1,…,5; each edge carries a mutation probability p_{ra}, p_{rc}, p_{ab}, p_{a3}, p_{b1}, p_{b2}, p_{c4}, p_{c5}.]

Defn: Phylogenetic Reconstruction Problem
Phylogenetic reconstruction:
• Given: k i.i.d. samples at the n leaves.
• Task: fully reconstruct the model, i.e. find the tree and the mutation probabilities (and, if possible, do so efficiently).
Studied in:
• Biology (dozens of books, 1000s of papers): [Felsenstein'04].
• TCS (learning): [Ambainis-Desper-Farach-Kannan'97], [Farach-Kannan'96], [Cryan-Goldberg-Goldberg'02], [M-Roch].
• Combinatorial phylogeny: [Erdős-Steel-Székely-Warnow'97], [M'07].
[Figure: a 0/1 data matrix of k samples observed at the leaves s(1),…,s(5).]

Phase Transition for the Ising Model
Write θ = 1 − 2p.
• LOW temperature, 2θ² > 1: the "typical" boundary is biased.
• HIGH temperature, 2θ² < 1: the "typical" boundary carries no bias.
The transition at 2θ² = 1 was proved by [Bleher-Ruiz-Zagrebnov'95], [Ioffe'96], [Evans-Kenyon-Peres-Schulman'00], [Kenyon-Mossel-Peres'01], [Martinelli-Sinclair-Weitz'04], [Borgs-Chayes-M-Roch'06]. The "spin-glass" case was studied by [Chayes-Chayes-Sethna-Thouless'86].
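The 2θ² threshold is easy to observe numerically: broadcast a root bit down a complete binary tree, flipping it with probability p on every edge, and ask whether the majority of the leaves still remembers the root. A toy simulation sketch (mine, not from the talk; the model itself is the CFN model defined above):

```python
import random

# Broadcast (CFN / Ising) process on the complete binary tree of given depth:
# each edge flips the parent's bit independently with probability p.
# With theta = 1 - 2p, the leaf majority stays correlated with the root
# as the depth grows iff 2 * theta**2 > 1 (the Kesten-Stigum threshold).

def broadcast_leaves(depth, p, root_state=0):
    """Leaf states of one run of the broadcast process."""
    states = [root_state]
    for _ in range(depth):
        # two children per node, each with an independent flip
        states = [s ^ (random.random() < p) for s in states for _ in (0, 1)]
    return states

def majority_agreement(depth, p, trials=2000):
    """Fraction of runs in which the leaf majority equals the root state."""
    wins = 0
    for _ in range(trials):
        leaves = broadcast_leaves(depth, p)
        wins += 2 * sum(leaves) < len(leaves)   # majority of 0s = agreement
    return wins / trials

# 2*theta^2 = 1 at p = (1 - 1/sqrt(2))/2, roughly 0.146
for p in (0.05, 0.1, 0.25, 0.4):
    print(p, majority_agreement(depth=10, p=p))
```

For p below ≈0.146 (so 2θ² > 1) the agreement stays bounded away from 1/2 as the depth grows; for larger p it drifts to 1/2.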
Solvability for 2θ² > 1 was first proved by [Higuchi'77] (and [Kesten-Stigum'66]).

Steel's Favorite Conjecture
(n = # of leaves, k = # of samples)
• Conjecture: if the root-reconstruction problem is solvable, then the phylogeny can be reconstructed from k = Θ(log n) samples; if it is not solvable, then k = n^{Ω(1)} samples are needed.
• Proved, unsolvable side: k = n^{Ω(1)} [M'03 (J. Comp. Biol.)].
• Proved, solvable side: k = O(log n) for the random cluster model [M-Steel'04 (Math. Biosciences)] and for the CFN model [M'04 (Transactions of the AMS)], [Daskalakis-M-Roch (STOC'06)].

Polynomial Lower Bound at High Mutations
Proof: conditional independence + the data processing lemma.
[Figure: the k observed samples at the leaves are conditionally independent of the deep part of the tree given the states at an intermediate level.]
In fact [M'06 (IEEE/ACM Trans. Comp. Biol. & Bioinf.)]: the "shallow part" of the tree can be efficiently reconstructed when k = O(log n), for all mutation rates. This also works in practice [Daskalakis-Hill-Jaffe-Mihaescu-M-Rao (RECOMB'06)].

Reconstruction from Short Sequences
Thm [Daskalakis-M-Roch (STOC'06)]: Let T be a tree on n leaves such that for all e, θ_min < θ(e) < θ_max, with 2θ²_min > 1 and θ_max < 1. Then there exists a polynomial-time algorithm that uses sequences of length k = O(log n − log δ) to reconstruct the topology with probability 1 − δ, where the constant depends on (θ_min, θ_max).

Proof: Distance Methods
Associate to each edge e the weight −ln(1 − 2p_e). For any two leaves i and j,
−ln(1 − 2p_{i,j}) = Σ_e −ln(1 − 2p_e),
where the sum is over all edges e in the path connecting i and j.
Reconstruction algorithm:
• Estimate the p_{i,j} from the sequences.
• Deduce the topology of the tree.
Problem: this needs exponentially long sequences, since deep in the tree the estimates saturate.
ESSW: "log n"-radius neighborhoods determine the tree ⇒ poly(n) sequence length suffices.
[Figure: the example tree with edge weights θ(ra), θ(rc), θ(ab), θ(a3), θ(b1), θ(b2), θ(c4), θ(c5).]

Four-Point Method
For four leaves a, b, c, d, compare D(a,c) + D(b,d) with D(a,b) + D(c,d): the smaller sum identifies the quartet topology (e.g. ab|cd when D(a,b) + D(c,d) is smaller).
[Figure: the two candidate quartet topologies ab|cd and ac|bd.]
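The two primitives above (additive distance estimates and the four-point comparison) fit in a few lines of code. A hedged sketch (function names and the handling of saturated estimates are my choices; this is not the Daskalakis-M-Roch algorithm, just the building blocks it refines):

```python
import math

def cfn_distance(seq_i, seq_j):
    """Estimate d(i,j) = -ln(1 - 2*p_ij) from two aligned 0/1 sequences."""
    k = len(seq_i)
    p_hat = sum(a != b for a, b in zip(seq_i, seq_j)) / k
    if p_hat >= 0.5:
        return math.inf        # estimate saturated: leaves look unrelated
    return -math.log(1 - 2 * p_hat)

def four_point(seqs, a, b, c, d):
    """Return the quartet split {x,y}|{z,w} with the smallest distance sum."""
    D = lambda x, y: cfn_distance(seqs[x], seqs[y])
    splits = {
        (a, b, c, d): D(a, b) + D(c, d),
        (a, c, b, d): D(a, c) + D(b, d),
        (a, d, b, c): D(a, d) + D(b, c),
    }
    x, y, z, w = min(splits, key=splits.get)
    return {x, y}, {z, w}
```

The saturation branch is exactly why naive distance methods need long sequences on deep trees, and why ESSW restrict attention to short, "log n"-radius distances.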
Balanced Trees
Two-step algorithm [M, 2004]:
1) Reconstruct one (or a few) level(s) of the tree.
2) Infer the sequences at the roots of the reconstructed subtrees.
3) Start over.

General Trees
[Daskalakis, M, Roch, 2006]

Blindfolded Cherry Picking
Needs "only" one extra step in the algorithm. Main loop:
1) Distance estimation.
2) Identify cherries from the next level.
3) Sequence reconstruction.
4) Detect "fake cherries".

Blindfolded Cherry Picking I: Edge Disjointness
[Figure: a non-edge-disjoint reconstruction vs. the true tree.]

Blindfolded Cherry Picking II: Weight Estimation
[Figure.]

Blindfolded Cherry Picking III: Collisions
[Figure.]

Tree Reconstruction in a Nutshell
• Tree reconstruction can be solved from very short sequences whenever there exists a good estimator for root reconstruction.
• Similar techniques apply to other tree networks, for example reconstructing multicast networks (Liang-M-Yu, Bhamidi-Rajagopal-Roch).

Back to the General Problem:
• How to reconstruct the network topology from observations at a (sub)set of the nodes?
• Example 3: Reconstructing Markov random fields from observations at a subset of the nodes???

Part III: Optimization over Stochastic Networks

Motivating Problem
• Problem: Optimization over stochastic models defined on networks.
• Examples:
– Which genes to knock out in order to kill a cancer cell?
– Which computers to immunize in order to make a network robust?
– Which computers to attack in order to bring a network down?
– Which individuals to immunize in order to stop a disease from spreading?
– Viral marketing: which individuals to expose to a product so as to maximize its distribution?
• One case study: influence in social networks. Joint work with Sebastien Roch.

models of collective behavior
• examples:
– joining a riot
– adopting a product
– going to a movie
• model features:
– binary decision
– cascade effect
– network structure

viral marketing
• referrals, word-of-mouth can be very effective (ex.: Hotmail)
• viral marketing:
– goal: mining the network value of potential customers
– how: target a small set of trendsetters, the seeds
• example [Domingos-Richardson'02]:
– collaborative filtering system
– uses an MRF to compute the "influence" of each customer

independent cascade model
• when a node is activated:
– it gets one chance to activate each neighbour
– the probability of success from u to v is p_{u,v}
[Figure: a directed graph with activation probabilities such as 0.25, 0.33, 0.5, 0.75 and 1.0 on the edges.]

generalized models
• graph G = (V,E); initial activated set S₀
• generalized threshold model [Kempe-Kleinberg-Tardos'03,'05]:
– activation functions: f_u(S), where S is the set of activated nodes
– threshold values: λ_u uniform in [0,1]
– dynamics: at time t, set S_t to S_{t−1} and add all nodes with f_u(S_{t−1}) ≥ λ_u (note the process stops after (at most) n−1 steps)
• generalized cascade model [KKT'03,'05]:
– when node u is activated, it gets one chance to activate each neighbour; the probability of success from u to v is p_u(v,S), where S is the set of nodes that have already tried (and failed) to activate v
– assumption: the p_u(v,·)'s are "order-independent"
• theorem [KKT'03]: the two models are equivalent

influence maximization
• definition: the influence σ(S) of the initial seed S is the expected size of the infected set at termination,
σ(S) = E_S[|S_{n−1}|]
• definition: in the influence maximization problem (IMP), we want to find the seed S of fixed size k that maximizes the influence,
S* ∈ argmax{σ(S) : S ⊆ V, |S| ≤ k}
• theorem [KKT'03]: the IMP is NP-hard
– reduction from Set Cover: given a ground set U = {u₁,…,uₙ} and a collection of subsets S₁,…,Sₘ, build a bipartite independent cascade instance with an edge (u_i, S_j) iff u_i ∈ S_j, and ask: is there a seed S with |S| ≤ k and σ(S) ≥ n + k?

submodularity
• definition: a set function f : 2^V → ℝ is submodular if for all A, B ⊆ V,
f(A) + f(B) ≥ f(A∪B) + f(A∩B)
• example: f(S) = g(|S|) where g is concave
• interpretation: "discrete concavity" or "diminishing returns"; indeed, submodularity is equivalent to
∀ S ⊆ T, ∀ v ∈ V: f(T ∪ {v}) − f(T) ≤ f(S ∪ {v}) − f(S)
• threshold models:
– it is natural to assume that the activation functions have diminishing returns
– this is supported by the observations of [Leskovec-Adamic-Huberman'06] in the context of viral marketing

main result
• theorem [M-Roch'06; first conjectured in KKT'03]: in the generalized threshold model, if all activation functions are monotone and submodular, then the influence σ is also submodular
• corollary [M-Roch'06]: the IMP admits a (1 − e⁻¹ − ε)-approximation algorithm (for all ε > 0)
– this follows from a general result on the approximation of submodular functions [Nemhauser-Wolsey-Fisher'78]
• known special cases [KKT'03,'05]:
– linear threshold model, independent cascade model
– decreasing cascade model, "normalized" submodular threshold model:
S ⊆ T ⇒ p_u(v,S) ≥ p_u(v,T), or equivalently
(f_u(S ∪ {v}) − f_u(S)) / (1 − f_u(S)) ≥ (f_u(T ∪ {v}) − f_u(T)) / (1 − f_u(T))
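The corollary is algorithmic: since σ is monotone and submodular, greedy seed selection is within (1 − 1/e) of optimal, up to the Monte Carlo error of estimating σ. A hedged sketch under the independent cascade model (the graph encoding, trial counts and names are illustrative assumptions, not code from the talk):

```python
import random

def cascade_size(succ, seed):
    """One run of the independent cascade; succ[u] = list of (v, p_uv)."""
    active, frontier = set(seed), list(seed)
    while frontier:
        u = frontier.pop()
        for v, p in succ.get(u, []):
            if v not in active and random.random() < p:  # one chance per edge
                active.add(v)
                frontier.append(v)
    return len(active)

def influence(succ, seed, trials=300):
    """Monte Carlo estimate of sigma(seed) = E|S_{n-1}|."""
    return sum(cascade_size(succ, seed) for _ in range(trials)) / trials

def greedy_seeds(succ, nodes, k):
    """Greedily add the node with the largest estimated marginal gain."""
    seed = set()
    for _ in range(k):
        best = max((v for v in nodes if v not in seed),
                   key=lambda v: influence(succ, seed | {v}))
        seed.add(best)
    return seed

# toy instance
succ = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 1.0)], 2: [(3, 0.25)], 3: []}
print(greedy_seeds(succ, nodes=[0, 1, 2, 3], k=2))
```

The (1 − e⁻¹ − ε) guarantee is exactly the Nemhauser-Wolsey-Fisher bound for greedy maximization of a monotone submodular function, with ε absorbing the sampling error.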
related work
• sociology:
– threshold models: [Granovetter'78], [Morris'00]
– cascades: [Watts'02]
• data mining:
– viral marketing: [KKT'03,'05], [Domingos-Richardson'02]
– recommendation networks: [Leskovec-Singh-Kleinberg'05], [Leskovec-Adamic-Huberman'06]
• economics:
– game-theoretic point of view: [Ellison'93], [Young'02]
• probability theory:
– Markov random fields, Glauber dynamics
– percolation
– interacting particle systems: voter model, contact process

proof sketch: coupling
• we use the generalized threshold model
• arbitrary sets A, B; consider 4 processes:
– (A_t) started at A
– (B_t) started at B
– (C_t) started at A ∩ B
– (D_t) started at A ∪ B
• it suffices to couple the 4 processes in such a way that for all t,
C_t ⊆ A_t ∩ B_t and D_t ⊆ A_t ∪ B_t
• indeed, at termination,
|A_{n−1}| + |B_{n−1}| = |A_{n−1} ∩ B_{n−1}| + |A_{n−1} ∪ B_{n−1}| ≥ |C_{n−1}| + |D_{n−1}|
(note this works with |·| replaced by any monotone, submodular w)

proof ideas
• our goal: C_t ⊆ A_t ∩ B_t (1) and D_t ⊆ A_t ∪ B_t (2)
• antisense coupling:
– the obvious way to couple is to use the same λ_u's for all 4 processes
– this satisfies (1) but not (2)
– "antisense": using λ_u for (A_t) and 1 − λ_u for (B_t) "maximizes the union"
– we combine both couplings
• piecemeal growth:
– seed sets can be introduced in stages
– we add A ∩ B, then A \ B, and finally B \ A
• need-to-know:
– it is not necessary to pick all the λ's at the beginning
– we can unveil only what we need to know: is λ_v ∈ [f_v(S_{t−2}), f_v(S_{t−1})]?

piecemeal growth
• process started at S: (S_t)
• partition of S: S(1),…,S(K)
• consider the process (T_t):
– pick the λ's
– run the process with seed S(1) until termination
– add S(2) and continue until termination
– add S(3), and so on
• lemma: the sets S_{n−1} and T_{Kn−1} have the same distribution

antisense coupling
• disjoint sets: S, T
• partition of S: S(1),…,S(K)
• piecemeal process with seeds S(1),…,S(K),T: (S_t)
• consider the process (T_t):
– pick the λ's
– run the piecemeal process with seeds S(1),…,S(K) until termination
– add T and continue with the threshold values λ′_v = 1 − λ_v + f_v(T_{Kn−1})
• lemma: the sets S_{(K+1)n−1} and T_{(K+1)n−1} have the same distribution

need-to-know
• proof of the lemma:
– run the first K stages identically in both processes
– note that for all v not in S_{Kn−1} = T_{Kn−1}, λ_v is uniformly distributed in [f_v(T_{Kn−1}), 1]
– but λ′_v = 1 − λ_v + f_v(T_{Kn−1}) has the same distribution
[Figure: the question "is λ_v ∈ [f_v(S_{t−2}), f_v(S_{t−1})]?" asked in simulation 1 and simulation 2.]

proof I
[Figure: the coupled processes; the stage marked ANTI uses the antisense coupling.]

proof II
[Figure: the coupled processes; the stage marked ANTI uses the antisense coupling.]

proof III
• the new processes have the correct final distribution
• up to time 2n−1, B_t = C_t and A_t = D_t, so that C_t ⊆ A_t ∩ B_t and D_t ⊆ A_t ∪ B_t
• for time 2n, note that B_{2n−1} ⊆ D_{2n−1} and
B_{2n} = B_{2n−1} ∪ (T \ S), D_{2n} = D_{2n−1} ∪ (T \ S)
• so by monotonicity and submodularity,
f_v(B_{2n}) − f_v(B_{2n−1}) ≥ f_v(D_{2n}) − f_v(D_{2n−1})
• then proceed by induction

general result
• we have proved: theorem [Mossel-Roch'06]: in the generalized threshold model, if all activation functions are monotone and submodular, then for any monotone, submodular function w, the generalized influence
σ_w(S) = E_S[w(S_{n−1})]
is submodular
• note: a closure property for submodular functions!
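Because the theorem is a closure property, it can be spot-checked numerically: run the generalized threshold dynamics with any monotone submodular activation functions and compare the two sides of the submodularity inequality. A toy sketch (the graph and the activation functions are my own assumptions, chosen to be monotone and submodular):

```python
import random

def run_threshold(n, f, seed):
    """Generalized threshold process: v activates once f(v, S) >= lambda_v."""
    lam = [random.random() for _ in range(n)]
    active = set(seed)
    for _ in range(n - 1):               # the process stops within n-1 steps
        new = {v for v in range(n)
               if v not in active and f(v, active) >= lam[v]}
        if not new:
            break
        active |= new
    return len(active)

def sigma(n, f, seed, trials=4000):
    """Monte Carlo estimate of the influence sigma(seed)."""
    return sum(run_threshold(n, f, seed) for _ in range(trials)) / trials

# f(v, S) = 0.8 * (1 - 2^-|S ∩ N(v)|): monotone and submodular in S
nbrs = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
f = lambda v, S: 0.8 * (1 - 2.0 ** -len(S & nbrs[v]))

A, B = {0, 1}, {1, 2}
lhs = sigma(5, f, A) + sigma(5, f, B)
rhs = sigma(5, f, A | B) + sigma(5, f, A & B)
print(lhs, ">=?", rhs)   # the theorem says lhs >= rhs, up to sampling noise
```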
Future Research Directions
• Study optimization problems for other stochastic models defined on networks.
• And another annoying problem where discrete probability may help: are there (easily computable? probabilistic?) invariants of unlabelled graphs that uniquely determine them?
• Motivation: can one efficiently check whether two graphs are isomorphic?