Constrained Graph Construction Problems in Network Modeling

Zoltán Toroczkai
Department of Physics, University of Notre Dame

Collaborators:
Szabolcs Horvát (U. Notre Dame)
István Miklós (Rényi Inst. Math.)
Peter L. Erdős (Rényi Inst. Math.)
Kevin E. Bassler (U. Houston)
Charo I. Del Genio (U. Warwick)
Hyunju Kim (Arizona State)
László Székely (U. South Carolina)
Éva Czabarka (U. South Carolina)
Sponsors: [logos]

Chess Puzzle
Swap the positions of the white knights with those of the black knights.
[Figure: chessboard with files a-d and ranks 1-4]
I. Network representation
II. Redirected the process of thought to the “where” pathway.
[Figure: two layouts of the network on the squares b1, c3, a2, d1, c4, b2, c1, d3, b3, a1, c2]
- Not optimal: it only respects the relationships.
- Optimized layout: minimal edge crossings, minimal wire length.

This representation allows us to infer and exploit GLOBAL information quickly.
Global information is necessary for finding solutions fast (esp. for NP-hard problems).
- “Dumb” algorithms: representation independent, very inefficient.
- “Smart” algorithms: exploit the structure of the data / global information.
How do we know that there is global information in a dataset? How do we extract it? OPEN!

This is very typical, e.g.: the interareal network in the macaque cortex.
[Figure: interareal network in the macaque cortex]
N.T. Markov, et al. Science 342(6158), 1238406 (2013)

I understand a network if I can generate it (or similar versions of it).
We are looking for the essential factors that generate the global information within the structure.
Essential factors may appear through constraints.
Indeed, for the cortical network: wiring costs and cortical geometry.
M. Ercsey-Ravasz et al., Neuron 80, 184-197 (2013).
N.T. Markov, et al.
Science 342(6158), 1238406 (2013)
- Many features and network measures are captured.
- What is not captured: noise, or structures that need new constraints/info.

Data Driven Network Modeling
[Diagram: data, constraints, ensemble]
Typical scenarios (Data: Want):
o Partial info: good guesses about the rest
o Complete: plausible constraints capturing the data
Constraints can be imposed:
• precisely/verbatim: sharp constraints
• “softly”, via ensemble averages: average constraints
This setting defines a set of fundamental problems related to ensemble-based modeling of complex networks.

Sharp Constraints
Consider:
- the set of all simple graphs on N nodes: $\mathcal{G}(N)$
- a set of graph measures, or observables, $m_1, \dots, m_K$, with prescribed values $\bar{m}_1, \dots, \bar{m}_K$ (the constraints)
Def. 1: Sharply constrained ensemble:
$\mathcal{G}(\bar{\mathbf{m}}) = \{ G \in \mathcal{G}(N) : m_i(G) = \bar{m}_i,\ i = 1, \dots, K \}$,
i.e., all members of the ensemble have precisely the values given by the constraints for the corresponding graph measures.
There are 4 main problem classes related to network modeling with sharp constraints:
- Existence: Under what conditions on $\bar{\mathbf{m}}$ is $\mathcal{G}(\bar{\mathbf{m}}) \neq \emptyset$?
- Construction: How to build any (or all) members of $\mathcal{G}(\bar{\mathbf{m}})$?
- Sampling: How to sample members of $\mathcal{G}(\bar{\mathbf{m}})$ by some distribution (e.g., uniformly)?
- Counting: How to compute or estimate $|\mathcal{G}(\bar{\mathbf{m}})|$?

Typically studied problems:

Degree Sequence
Specifies the number of neighbors for all nodes:
- a sequence $\mathbf{d} = (d_1, \dots, d_N)$ for undirected graphs
- a sequence of (in-degree, out-degree) pairs for directed graphs
- one degree sequence per node class for bipartite graphs

Joint Degree Matrix (JDM)
A JDM specifies the number of edges between nodes of given degrees, for all degree pairs.
Partition the nodes into groups of given degrees (classes): $V_k = \{ v : d_v = k \}$.
Then $J_{kl}$ is the number of edges between $V_k$ and $V_l$.
A JDM is a stronger constraint than the degree sequence, which it also determines uniquely:
$|V_k| = \frac{1}{k} \left( J_{kk} + \sum_{l} J_{kl} \right)$.
One can think of the JDM as also specifying “two-point correlations” between nodal degrees.
Applications include, e.g., social networks, which are distinguished by positive degree correlations (assortative networks).
A.N. Patrinos & S.L. Hakimi. Discr. Math. 15, 347 (1976).
I. Stanton & A. Pinar. ACM J. Exp. Alg. 17(3), 3.5 (2012).

Existence
Def. 2: If $\mathcal{G}(\bar{\mathbf{m}}) \neq \emptyset$, we say that the constraint $\bar{\mathbf{m}}$ is graphical. Any graph in $\mathcal{G}(\bar{\mathbf{m}})$ is in this case called a graphical realization of $\bar{\mathbf{m}}$, and is said to realize $\bar{\mathbf{m}}$.

Degree Sequence
Well known, characterized: Erdős-Gallai (EG)/Fulkerson-Ryser type theorems.
E.g., for undirected graphs with $d_1 \ge d_2 \ge \dots \ge d_N$:
1) $\sum_{i=1}^{N} d_i$ must be even, and
2) $\sum_{i=1}^{k} d_i \le k(k-1) + \sum_{i=k+1}^{N} \min(k, d_i)$ must hold for all $1 \le k \le N$.
Havel-Hakimi algorithm: Given a graphical sequence, choose a node, and connect all its stubs to other nodes with the largest residual degrees. Repeat until all stubs are connected into edges.

Joint Degree Matrix (JDM)
Theorem. A $\Delta \times \Delta$ matrix $J$ is a graphical JDM iff:
1) $n_k := \left( J_{kk} + \sum_{l=1}^{\Delta} J_{kl} \right) / k$ is an integer for all $k$,
2) $J_{kk} \le \binom{n_k}{2}$ for all $k$,
3) $J_{kl} \le n_k n_l$ for all $k \ne l$.
É. Czabarka, A. Dutle, P.L. Erdős, I. Miklós. Disc. Appl. Math. 181, 283 (2015): a clean and short proof of this EG type theorem.
Others have also provided similar characterizations (Stanton-Pinar, Amanatidis-Green-Mihail).

Construction
How do we build any graph from the constrained ensemble? Efficiently?
o Direct construction: sequentially connect stubs (half-edges).
o Switches/Swaps: start from a realization, then move edges around via some operations (e.g., edge swaps/switches) to arrive at another member of the ensemble. What operations guarantee that all members can be reached this way?

Degree Sequences
o Direct construction
Theorem (KTEMS): Provides necessary and sufficient conditions for graphicality of degree sequences that are restricted with forbidden edges (non-edges, forbidden links) forming a k-star on an arbitrary node i.
Undirected graphs:
H. Kim, Z. Toroczkai, P.L. Erdős, I. Miklós and L.A. Székely. J. Phys. A: Math. Theor. 42, 392001 (2009).
J. Blitzstein, P. Diaconis. Internet Mathematics 6(4), 487-520 (2010).
Directed graphs:
P.L. Erdős, I. Miklós and Z. Toroczkai. Elec. J. Comb. 17(1), R66 (2010).
o Switches/Swaps
[Figure: nodes 1, 2, 3, 4 before and after a swap]
Swap the ends of two independent edges (2-swap):
- This preserves the degree sequence and connects the space of all realizations (Ryser).
- Start from a graphical realization (e.g., made by Havel-Hakimi), then do 2-swaps.

Joint Degree Matrix (JDM)
o Direct construction
Def.: Let $d_k(v)$ be the degree of node $v$ towards the degree class $V_k$. The degree “spectrum” of node $v$ is the vector $( d_1(v), \dots, d_\Delta(v) )$.
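The degree-sequence steps described above (the Erdős-Gallai conditions for existence and the Havel-Hakimi algorithm for construction) can be sketched in Python as follows. This is a minimal illustration for undirected simple graphs only; the function names `is_graphical` and `havel_hakimi` are hypothetical, and always picking the node with the largest residual degree is just one valid choice of the node to process.

```python
def is_graphical(degrees):
    """Erdős-Gallai test for an undirected simple graph."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(k, x) for x in d[k:])
        if lhs > rhs:
            return False
    return True


def havel_hakimi(degrees):
    """Return an edge list realizing `degrees`, or None if none exists."""
    residual = list(degrees)
    edges = []
    while True:
        # Assumption of this sketch: process the node with the largest
        # residual degree (any node would do for a graphical sequence).
        i = max(range(len(residual)), key=lambda v: residual[v])
        k = residual[i]
        if k == 0:
            return edges  # all stubs are connected into edges
        # The k other nodes with the largest residual degrees.
        others = sorted((v for v in range(len(residual)) if v != i),
                        key=lambda v: residual[v], reverse=True)[:k]
        if len(others) < k or residual[others[-1]] == 0:
            return None  # not enough free stubs: sequence is not graphical
        for v in others:
            residual[v] -= 1
            edges.append((i, v))
        residual[i] = 0
```

For example, `havel_hakimi([3, 3, 2, 2, 2])` returns six edges realizing that sequence, while `[3, 3, 1, 1]` fails the Erdős-Gallai test at k = 2 and has no realization.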
- Generate a degree spectrum (any), then build all bipartite graphs between the degree classes, then create all simple graphs within every degree class.
P.L. Erdős, I. Miklós, C.I. Del Genio, K.E. Bassler & Z. Toroczkai. New J. Phys. 17, 083052 (2015).
o Switches/Swaps
[Figure: nodes 1, 2, 3, 4 before and after a swap; two of the nodes are marked as being in the same degree class]
- Restricted Swap Operation (RSO): a 2-swap in which the two nodes that exchange neighbors belong to the same degree class.
- The RSO preserves the JDM and connects the space of all realizations of the JDM.
É. Czabarka, A. Dutle, P.L. Erdős, I. Miklós. Disc. Appl. Math. 181, 283 (2015).

Sampling
o Direct construction based importance sampling
o Markov Chain Monte Carlo (MCMC) based on switches
Requirements:
- Sample a realization in a number of steps that is polynomial in N (“in poly-time”).
- Obtain pseudo-random realizations via MCMC switching in poly-mixing time.

Degree Sequence
o Direct construction based
- undirected: C.I. Del Genio, H. Kim, Z. Toroczkai and K.E. Bassler. PLoS ONE 5(4), e10012 (2010).
- directed: H. Kim, C.I. Del Genio, K.E. Bassler and Z. Toroczkai. New J. Phys. 14, 023012 (2012).
o MCMC based on edge swaps
This is the most studied, in particular the Mixing Time Problem.
Definitions:
- The “supergraph” is the graph whose nodes are all the graphical realizations of the degree sequence. A “super-edge” $(G, G')$ means that a 2-swap in the graph $G$ takes it to the graph $G'$.
- The MCMC is a random walk on the supergraph, i.e., a Markov chain with probability transition matrix $P$.
Let $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_M \ge -1$ be the eigenvalues of $P$ and $\lambda^* = \max\{ \lambda_2, |\lambda_M| \}$. The relaxation time is $\tau_{rel} = 1 / (1 - \lambda^*)$.
Thus to show fast mixing one needs to find a polynomial upper bound (in the size of the graphs, N, the number of nodes) on the mixing time, or the relaxation time.

Conjecture (Kannan, Tetali, Vempala, 1999): The switch MCMC based on 2-swaps mixes rapidly over the set of all realizations of any graphical degree sequence.
This is still open!
- They have shown it only for regular bipartite graphs (same degrees everywhere).
R. Kannan, P. Tetali and S. Vempala. Rand. Struct. Alg. 14(4), 293-308 (1999).
- Cooper, Dyer and Greenhill have shown it for arbitrary regular undirected graphs.
C. Cooper, M. Dyer and C. Greenhill. Comb. Prob. Comp. 16(4), 557-593 (2007).
- Greenhill proved it for regular directed graphs.
C. Greenhill. Electronic J. Comb. 18(1), #P234 (2011).
- C. Greenhill proved it for general undirected graphs with bounded maximum degree.
Proc. 26th ACM-SIAM Symp. Discr. Alg., New York-Philadelphia, pp. 1564-1572 (2015). http://arxiv.org/abs/1412.5249
Additionally:
- Miklós, Erdős and Soukup have proved it for half-regular bipartite graphs (a very technical proof of over 50 pages).
I. Miklós, P.L. Erdős & L. Soukup. Electronic J. of Comb. 20(1), #P16, 1-51 (2013).
- Can we generate graphs uniformly at random that realize a given graphical degree sequence such that all realizations avoid creating edges from a forbidden subgraph?
This question was answered affirmatively for bipartite graphs with a half-regular bi-degree sequence (the degrees in one node class are all equal, those in the other class are arbitrary), where the forbidden subgraph consists of a k-star centered on some node and a 1-factor (a perfect matching) between the two node classes.
Theorem: There is a switch MCMC that mixes fast (in poly-time) on the state space of all such realizations.
P.L. Erdős, S.Z. Kiss, I. Miklós and L. Soukup. PLOS ONE, #e0131300 (2015). http://arxiv.org/abs/1301.7523v2

Joint Degree Matrix (JDM)
Theorem: The space of all realizations of any given JDM is connected via RSOs. The RSO-based MCMC is irreducible.
É. Czabarka, A. Dutle, P.L. Erdős, I. Miklós. Disc. Appl. Math. 181, 283 (2015).
Question: is the RSO-based MCMC mixing rapidly (poly-time in N) on the set of all realizations of a JDM?
Theorem: The restricted swap operation Markov chain mixes rapidly over the balanced realizations of any JDM, i.e., its relaxation time is bounded by a polynomial in N, where N is the number of nodes.
Def.: A realization of a JDM is balanced iff for all degree classes $V_k$, $V_l$ and all nodes $u, v \in V_k$: $|d_l(u) - d_l(v)| \le 1$, where $d_l(v)$ is the degree of node $v$ towards the degree class $V_l$.
P.L. Erdős, I. Miklós & Z. Toroczkai. SIAM J. Discr. Math. 29, 481 (2015). http://arxiv.org/abs/1307.5295
All graphical JDMs admit balanced realizations.
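To make the restricted swap operation concrete, here is a small Python sketch of one RSO attempt on an edge list, together with a helper that computes the JDM. It is an illustration under stated assumptions, not the implementation from the cited papers: the names `degrees`, `joint_degree_matrix` and `restricted_swap` are hypothetical, the JDM is stored as a counter keyed by sorted degree pairs, and rejected moves simply return the input unchanged. The sketch only demonstrates that the move preserves the degrees and the JDM; it says nothing about mixing times.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node that appears in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def joint_degree_matrix(edges):
    """Number of edges between degree classes, keyed by (k, l) with k <= l."""
    deg = degrees(edges)
    jdm = Counter()
    for u, v in edges:
        jdm[tuple(sorted((deg[u], deg[v])))] += 1
    return jdm


def restricted_swap(edges, rng):
    """Attempt one RSO: (a,b),(c,d) -> (a,d),(c,b) with deg(a) == deg(c)."""
    deg = degrees(edges)
    present = {frozenset(e) for e in edges}
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    # Randomize the orientation of both edges.
    if rng.random() < 0.5:
        a, b = b, a
    if rng.random() < 0.5:
        c, d = d, c
    # Restriction: the nodes exchanging neighbors are in the same degree class.
    if deg[a] != deg[c]:
        return edges
    # Reject moves that would create a self-loop or a multi-edge.
    if a == d or c == b:
        return edges
    if frozenset((a, d)) in present or frozenset((c, b)) in present:
        return edges
    new_edges = list(edges)
    new_edges[i] = (a, d)
    new_edges[j] = (c, b)
    return new_edges
```

Calling `restricted_swap` repeatedly, e.g. with `rng = random.Random(0)`, performs a lazy random walk over realizations; `joint_degree_matrix` returns the same counter before and after every step.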
A JDM realization is balanced if the degrees of nodes within a degree class towards another degree class are as uniformly distributed as possible, and this is true for all degree classes.

Counting
Compute or estimate the number of graphs in the constrained ensemble.
How constraining (or “non-random”) is a given set of constraints?
Computational hardness (“A ≥ B” means “A is harder than B”):
Counting ≥ U. Sampling ≥ Construction ≥ Existence
Def. 3:
- Fully Polynomial Almost Uniform Sampler (FPAUS) (sampling): an MCMC algorithm that generates graph samples almost uniformly, in poly-time.
- Fully Polynomial Randomized Approximation Scheme (FPRAS) (counting): an algorithm that estimates the number of graphs in the ensemble in poly-time.
• M.R. Jerrum, L.G. Valiant and V.V. Vazirani. Theor. Comput. Sci. 43, 169 (1986).
• V.V. Vazirani. Approximation Algorithms. Springer (2003).
• http://www.cc.gatech.edu/~vigoda/MCMC_Course/Sampling-Counting.pdf
Def. 4: A problem is self-reducible if the solutions to any of its instances can be recursively generated from solutions to smaller instances of the same problem, s.t. the number of branches at each recursion step is polynomially bounded by the size of the problem instance.
Implications (for self-reducible problems):
- Exact counter => exact uniform sampler
- FPRAS <=> FPAUS
M.R. Jerrum, L.G. Valiant and V.V. Vazirani. Theor. Comput. Sci. 43, 169 (1986).
Thus, if we have an FPAUS we can estimate the number of realizations efficiently.
The classical degree-based graph construction problem is not self-reducible.
Theorem: The degree sequence problem constrained by a 1-factor and a k-star is self-reducible.
P.L. Erdős, S.Z. Kiss, I. Miklós and L. Soukup. PLOS ONE, #e0131300 (2015). http://arxiv.org/abs/1301.7523v2
This implies that an FPRAS can be constructed to estimate the number of realizations.

Soft Constraints: Maximum Entropy Ensembles
Soft constraints: Find a distribution P(G) over the set of all simple graphs on N nodes, $\mathcal{G}(N)$, such that the ensemble averages of the graph measures $m_i(G)$ obey:
$\langle m_i \rangle = \sum_{G} m_i(G) P(G) = \bar{m}_i$, $i = 1, \dots, K$,
where the $\bar{m}_i$ are the constraints, e.g. given by data.
Graph measures: e.g., # of edges, # of two-stars, ...
There are many ways to choose probabilities P(G) that satisfy these! How do we choose the P(G)?
The Maximum Entropy Principle: Choose the distribution that maximizes the information entropy
$S[P] = -\sum_{G} P(G) \ln P(G)$
subject to the constraints $\langle m_i \rangle = \bar{m}_i$ and $\sum_{G} P(G) = 1$:
$P(G) = \frac{1}{Z(\boldsymbol{\beta})} e^{\boldsymbol{\beta} \cdot \mathbf{m}(G)}$, where $Z(\boldsymbol{\beta}) = \sum_{G} e^{\boldsymbol{\beta} \cdot \mathbf{m}(G)}$.
E. T. Jaynes, Physical Review 106, 620 (1957).
The parameters $\boldsymbol{\beta}$ control $\langle \mathbf{m} \rangle$. In practice, $\boldsymbol{\beta}$ is typically found numerically for a given $\bar{\mathbf{m}}$.
Equivalent treatment: use distributions over measures instead of over graphs:
$p(\mathbf{m}) = \frac{1}{Z(\boldsymbol{\beta})} \mathcal{N}(\mathbf{m}) e^{\boldsymbol{\beta} \cdot \mathbf{m}}$, where $\mathcal{N}(\mathbf{m})$ is the nr. of graphs in $\mathcal{G}(N)$ with property $\mathbf{m}$ (the density of states function).

The Degeneracy problem (Strauss 1986): The sampled graphs may not be representative of the averages. This happens when $p(\mathbf{m})$ is not unimodal.
[Figure: distribution of the # of edges, from sparse to dense]
o How does $p(\mathbf{m})$ become bimodal/multimodal?
o What can we do to eliminate/minimize this issue?

Using the MaxEnt: Example: terrorist cells
Data: pairs that interacted, triples that collaborated.
The probability that the 9 cells form a connected network? What is the most likely network?
Exact enum.: none connected!
[Figure: disconnected (Disctd.) vs. connected (Conctd.) networks]
Yet MaxEnt says that it is connected with 0.6 probability! (But none has 17 edges and 19 triangles!)

MaxEnt has been used extensively:
It is applicable to systems of any size.
E. T. Jaynes, Physical Review 106, 620 (1957); ibid. 108, 171 (1957).
Tool to study mesoscale systems!
R.V. Chamberlin. Phys. Rev. Lett. 82, 2520 (1999); R.V. Chamberlin. Nature 408, 337 (2000).
Nanothermodynamics: R.V. Chamberlin. Science 298, 1172 (2002); R. Balian. From Microphysics to Macrophysics: Methods and Applications of Statistical Physics (Springer) 2007.
Many applications:
- Image reconstruction: S.F. Gull, G.J. Daniell. Nature 272, 686 (1978) [real-space images from x-ray scattering data]
- Fluorescence of L-tryptophan: A.K. Livesey, J.C. Brochon. Biophys. J. 52, 693 (1987).
- Conformational states of poly-(L-proline) from single molecule Förster resonance energy transfer data: L.P. Watkins, H. Chang, H. Yang. J. Phys. Chem. A 110, 5191 (2006).
- Folding kinetics of dihydrofolate reductase: P.J. Steinbach, R. Ionescu, C.R. Matthews. Biophys. J. 82, 2244 (2002).
- CO ligand rebinding to a heme protein: P.J. Steinbach, K. Chu, H. Frauenfelder, et al. Biophys. J. 61, 235 (1991).
- Gene regulatory networks: A.M. Walczak, G. Tkacik, W. Bialek. Phys. Rev. E 81, 041905 (2010).
- Infotaxis of moths: M. Vergassola, E. Villermaux, B.I. Shraiman. Nature 446, 406 (2007).
- And many, many others...

THEOREM: The MaxEnt model is non-degenerate if and only if the density of states function is log-concave.
Sz. Horvát, É. Czabarka & Z. Toroczkai. Phys. Rev. Lett. 114, 158701 (2015).
[Figure: density of states for the terrorist network, and another example; axes: # of edges, # of two-stars]

A solution: We still use the same data as in the degenerate model; however, we consider a one-to-one transformation of the measures such that the corresponding density of states function is log-concave, and thus the corresponding model is non-degenerate.
We can still work in the same coordinate system, but the states are sampled by the non-degenerate model with the transformed constraints.
How to choose the transformation? The typical reason why the density of states is not log-concave is that its domain is not convex. Any transformation that convexifies the domain is good!
[Figure: the $(\langle m_| \rangle, \langle m_\vee \rangle)$ model vs. the $(\langle m_|^2 \rangle, \langle m_\vee \rangle)$ model, with $m_|$ the # of edges and $m_\vee$ the # of two-stars]

A data network example: Zachary’s Karate Club (ZKC)
Consider a MaxEnt model fitted to the ZKC data. Fit: the model is degenerate!
[Figure: fitted distributions before and after linearization to obtain a convex domain]
Let us try to predict the number of triangles.
Recall: the fitted model is degenerate. The distribution of triangles by the same model is also bimodal. Both observed values appear with very low probability in this model.
The linearized (or convexified) model produces a unimodal distribution. It predicts: [value lost in extraction]. Both observed values appear with high probability in this model.
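The role of the density of states in the MaxEnt discussion above can be explored on very small graphs by brute force. The following Python sketch enumerates all labeled graphs on n nodes, tabulates the density of states for a single scalar measure, checks log-concavity in that one-dimensional case, and fits the parameter numerically by bisection. Several things here are assumptions of the sketch rather than part of the talk: the sign convention P(G) ∝ exp(β m(G)), the restriction to one measure, and all function names (`density_of_states`, `is_log_concave`, `mean_value`, `fit_beta`). Enumeration is only feasible for a handful of nodes.

```python
from collections import Counter
from itertools import combinations
from math import exp


def density_of_states(n, measure):
    """Count labeled simple graphs on n nodes by the value of measure(edges)."""
    pairs = list(combinations(range(n), 2))
    counts = Counter()
    for mask in range(1 << len(pairs)):
        edges = [p for b, p in enumerate(pairs) if (mask >> b) & 1]
        counts[measure(edges)] += 1
    return counts


def num_edges(edges):
    return len(edges)


def is_log_concave(counts):
    """1D check on an integer support: no gaps and N(k)^2 >= N(k-1) N(k+1)."""
    lo, hi = min(counts), max(counts)
    seq = [counts.get(k, 0) for k in range(lo, hi + 1)]
    if any(x == 0 for x in seq):
        return False  # a gap means the support is not convex
    return all(seq[k] ** 2 >= seq[k - 1] * seq[k + 1]
               for k in range(1, len(seq) - 1))


def mean_value(counts, beta):
    """Ensemble average <m> under p(m) proportional to N(m) exp(beta * m)."""
    z = sum(c * exp(beta * m) for m, c in counts.items())
    return sum(m * c * exp(beta * m) for m, c in counts.items()) / z


def fit_beta(counts, target, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on beta; <m> is non-decreasing in beta (its slope is Var m)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_value(counts, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


counts = density_of_states(4, num_edges)  # 64 labeled graphs on 4 nodes
beta = fit_beta(counts, target=2.0)       # soft constraint: <# of edges> = 2
```

With the number of edges as the only measure, the density of states is a binomial coefficient, which is log-concave, so this toy model is non-degenerate. Other measures can be plugged in through the `measure` argument, but the one-dimensional check above does not replace the multi-dimensional condition of the theorem.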