Decay of Correlations and Inference in Graphical Models

by

Sidhant Misra

B.Tech., Electrical Engineering, Indian Institute of Technology, Kanpur (2008)
S.M., Electrical Engineering and Computer Science, MIT (2011)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, September 2014

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Signature redacted. Department of Electrical Engineering and Computer Science, August 28, 2014
Certified by: Signature redacted. David Gamarnik, Professor, Thesis Supervisor
Accepted by: Signature redacted. Leslie A. Kolodziejski, Chairman, Department Committee on Graduate Theses

Decay of Correlations and Inference in Graphical Models
by Sidhant Misra

Submitted to the Department of Electrical Engineering and Computer Science on August 28, 2014, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

We study the decay of correlations property in graphical models and its implications for efficient algorithms for inference in these models. We consider three specific problems: 1) the list coloring problem on graphs, 2) the MAX-CUT problem on graphs with random edge deletions, and 3) low rank matrix completion from an incomplete subset of a matrix's entries. For each problem, we analyze the conditions under which either spatial or temporal decay of correlations exists and provide approximate inference algorithms in the presence of these conditions. In the course of our study of 2), we also investigate the problem of the existence of a giant component in random multipartite graphs with a given degree sequence.

The list coloring problem on a graph $\mathcal{G}$ is a generalization of the classical graph coloring problem where each vertex is provided with a list of colors. We prove the Strong Spatial Mixing (SSM) property for this model for arbitrary bounded degree triangle-free graphs. Our results hold for any $\alpha > \alpha^*$ whenever the size of the list of each vertex $v$ is at least $\alpha\Delta(v) + \beta$, where $\Delta(v)$ is the degree of vertex $v$ and $\beta$ is a constant that only depends on $\alpha$. The result is obtained by proving the decay of correlations of marginal probabilities associated with the vertices of the graph, measured using a suitably chosen error function. The SSM property allows us to efficiently compute approximate marginal probabilities and the approximate log partition function in this model.

Finding a cut of maximum size in a graph is a well-known canonical NP-hard problem in algorithmic complexity theory. We consider a version of random weighted MAX-CUT on graphs with bounded degree $d$, which we refer to as the thinned MAX-CUT problem, where the weights assigned to the edges are i.i.d. Bernoulli random variables with mean $p$. We show that thinned MAX-CUT undergoes a computational hardness phase transition at $p = p_c = \frac{1}{d-1}$: the thinned MAX-CUT problem is efficiently solvable when $p < p_c$ and is NP-hard when $p > p_c$. We show that the computational hardness is closely related to the presence of large connected components in the underlying graph.

We consider the problem of reconstructing a low rank matrix $M$ from a subset of its entries $M_E$.
We describe three algorithms: Information Propagation (IP), a sequential decoding algorithm, and two iterative decentralized algorithms, namely Vertex Least Squares (VLS), which is the same as Alternating Minimization, and Edge Least Squares (ELS), which is a message-passing variation of VLS. We provide sufficient conditions on the structure of the revelation graph for IP to succeed and show that when $M$ has rank $r = O(1)$, this property is satisfied by Erdős–Rényi graphs with edge probability $\Omega\bigl((\frac{\log n}{n})^{1/r}\bigr)$. For VLS, we provide sufficient conditions in the special case of positive rank one matrices. For ELS, we provide simulation results which show that it performs better than VLS both in terms of convergence speed and sample complexity.

Thesis Supervisor: David Gamarnik
Title: Professor

Acknowledgments

First and foremost, I am extremely grateful to my advisor, Professor David Gamarnik, for his guidance and support. His patient and systematic approach was critical in helping me navigate my way through the early stages of research. He was very generous with his time; our discussions often lasted more than two hours, where we delved into the finer technical details. I learnt tremendously from our meetings and they left me with renewed enthusiasm. David has also been an extremely supportive mentor, and has offered a lot of encouragement and guidance in my career decisions.

I would like to thank my committee members Professor Patrick Jaillet and Professor Devavrat Shah for offering their time and support. Patrick brought fruitful research opportunities my way, and also gave me the wonderful opportunity to be a part of the Network Science course as a TA with him. Devavrat was kind enough to mentor me in my early days at MIT and introduced me to David. His continued encouragement and his valuable research insights have been incredibly helpful.

My time at MIT would not have been as enjoyable without the many friends and colleagues I met at LIDS. I would like to thank Ying Liu for all his help, the fun chats over lunch, and the topology and algebra study sessions. I would also like to thank my officemates James Saunderson and Matt Johnson for their enjoyable company and many stimulating whiteboard discussions.

This thesis would not have been possible without the support of my parents. Their unconditional love and encouragement have helped me overcome the many challenges I have encountered throughout the course of my PhD, and for that I am eternally grateful. Finally, I am very grateful to Aliaa for being my comfort and home, and for making my journey at MIT a wonderful experience.

Contents

1 Introduction 11
1.1 Graphical Models and Inference 11
1.2 Decay of Correlations 13
1.3 Organization of the thesis and contributions 15

2 Strong Spatial Mixing for List Coloring of Graphs 27
2.1 Introduction 27
2.2 Definitions and Main Result 30
2.3 Preliminary technical results 32
2.4 Proof of Theorem 2.3 38
2.5 Conclusion 49

3 Giant Component in Random Multipartite Graphs with Given Degree Sequences 51
3.1 Introduction 51
3.2 Definitions and preliminary concepts 55
3.3 Statements of the main results 59
3.4 Configuration Model 63
3.5 Exploration Process 66
3.6 Supercritical Case 70
3.7 Size of the Giant Component 87
3.8 Subcritical Case 94
3.9 Future Work 97

4 MAX-CUT on Bounded Degree Graphs with Random Edge Deletions 99
4.1 Introduction 99
4.2 Main Results 102
4.3 Proof of Theorem 4.1 103
4.4 Proof of Theorem 4.2 105
4.4.1 Construction of gadget for reduction 105
4.4.2 Properties of $\mathcal{W}$ 107
4.4.3 Properties of MAX-CUT on $\mathcal{W}$ 113
4.5 Conclusion 126

5 Algorithms for Low Rank Matrix Completion 127
5.1 Introduction 127
5.1.1 Formulation 127
5.1.2 Algorithms 129
5.2 Information Propagation Algorithm 130
5.3 Vertex Least Squares and Edge Least Squares Algorithms 135
5.4 Experiments 142
5.5 Discussion and Future Directions 144

List of Figures

4-1 Illustration of the bipartite graphs $J_u$ and $J_v$ in $\mathcal{W}$ associated with an edge $(u,v) \in E$ 107
5-1 RMS vs. number of iterations (normalized) for VLS and ELS 143
5-2 ELS: failure fraction vs. c for r = 2 with planted 3-regular graph (n = 100) 145
5-3 ELS: failure fraction vs. c for r = 3 with planted 4-regular graph (n = 100) 146
5-4 VLS: failure fraction vs. c for r = 2 with planted 3-regular graph (n = 100) 146

Chapter 1

Introduction

1.1 Graphical Models and Inference

Graphical models represent joint probability distributions with the help of directed or undirected graphs. These models aim at capturing the structure present in the joint distribution, such as conditional independencies, via the structure of the representing graph. By exploiting the structure it is often possible to produce efficient algorithms for performing inference tasks on these models. This has led to graphical models being useful in several applications, e.g., image processing, speech and language processing, error correcting codes, etc.

In this thesis we will be primarily concerned with undirected graphical models, where the joint probability distribution of a set of random variables $X_1, \ldots, X_n$ is represented on an undirected graph $\mathcal{G} = (V, E)$. The random variables $X_i$ are associated with the vertices in $V$. Undirected graphical models represent independencies via the so-called graph separation property. In particular, let $A, B, C \subset [n]$ be disjoint subsets of vertices in $V$. Let $X_A$, $X_B$ and $X_C$ denote the sub-vectors of $X$ corresponding to indices in $A$, $B$ and $C$ respectively. Then $X_A$ and $X_B$ are independent conditioned on $X_C$ whenever there is no path from $A$ to $B$ that does not pass through $C$, i.e., $C$ separates $A$ and $B$.
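The separation property is purely graph-theoretic and can be checked mechanically. The following minimal Python sketch (the function name and the adjacency-list input format are our own illustration, not from this thesis) verifies whether $C$ separates $A$ and $B$ by deleting the vertices of $C$ and searching for a surviving path from $A$ to $B$; when it returns True, the graph separation property certifies that $X_A$ and $X_B$ are independent conditioned on $X_C$.

```python
from collections import deque

def separates(adj, A, B, C):
    """Return True iff every path from A to B passes through C,
    i.e. A and B lie in different components once C is removed."""
    blocked, A, B = set(C), set(A), set(B)
    queue = deque(v for v in A if v not in blocked)
    seen = set(queue)
    while queue:
        u = queue.popleft()
        if u in B:
            return False               # found an A-B path avoiding C
        for w in adj[u]:
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Path 0-1-2-3: vertex 1 separates 0 from {2, 3},
# so X_0 is independent of (X_2, X_3) given X_1.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(separates(adj, {0}, {2, 3}, {1}))    # True
```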
The Hammersley–Clifford theorem [24] says that if the joint distribution on $\mathcal{G}$ satisfies $P(x) > 0$, then it can be factorized as

$$P(x) = \frac{1}{Z}\prod_{c \in \mathcal{C}} \phi_c(x_c), \qquad (1.1)$$

where $\mathcal{C}$ is the set of all maximal cliques of $\mathcal{G}$ and $Z = \sum_x \prod_{c \in \mathcal{C}} \phi_c(x_c)$ is the normalizing constant, known as the partition function of the distribution.

Broadly speaking, inference in graphical models refers to the following two problems: a) computation of marginal or conditional marginal probabilities, and b) computation of the Maximum A Posteriori (MAP) assignment. The first problem involves an integration or counting operation of the form

$$P(x_i) = \sum_{x_j,\, j \neq i} \frac{1}{Z}\prod_{c \in \mathcal{C}} \phi_c(x_c). \qquad (1.2)$$

The second problem of computing the MAP assignment is an optimization problem of the form

$$x^{\mathrm{MAP}} = \arg\max_x \prod_{c \in \mathcal{C}} \phi_c(x_c). \qquad (1.3)$$

Both of these problems are typically computationally difficult. In fact, in the general setting, computing MAP assignments is known to be an NP-complete problem and computing marginals is known to be a #P-complete problem. In such cases, it is often desirable to produce approximate solutions in polynomial time by constructing Polynomial Time Approximation Schemes (PTAS). More precisely, if we denote the quantity of interest, i.e., the marginal probability or MAP assignment associated with each vertex of a graph on $n$ vertices, by $\phi(v)$, then for a fixed $\epsilon > 0$ a PTAS computes an approximate solution $\hat\phi(v)$ which is at most a multiplicative factor of $(1 \pm \epsilon)$ away from $\phi(v)$ in time polynomial in $n$. If the algorithm also takes time polynomial in $1/\epsilon$, then it is called a Fully Polynomial Time Approximation Scheme (FPTAS). Approximation algorithms can be either randomized (in which case the abbreviations PRAS and FPRAS are commonly used) or deterministic in nature. For some graphical models, it may not be possible to find a PTAS unless P = NP. There is a substantial body of literature [30], [15], [1] exploring the existence of PTAS for NP-hard problems.

1.2 Decay of Correlations

The decay of correlations property is a property associated with either the graphical model itself (spatial correlation decay) or an iterative algorithm $\mathcal{A}$ that computes the quantity of interest $\phi(v)$ (temporal or computation tree correlation decay). It describes a long range independence phenomenon, where the dependence on boundary/initial conditions fades as one moves away from the boundary/initial time.

(a) Spatial correlation decay: The impact on $\phi(v)$ resulting from conditioning on values of vertices that are at a distance greater than $d$ from $v$ decays as $d$ increases. More precisely, for any vertex $v$ let $B(v,d)$ denote the set of all vertices of $\mathcal{G}$ at distance at most $d$ from $v$. Let $x_{\partial,1}$ and $x_{\partial,2}$ be two assignments on the boundary of $B(v,d)$. Then

$$1 - \epsilon(d) \le \frac{\phi(v;\, x_{\partial,1})}{\phi(v;\, x_{\partial,2})} \le 1 + \epsilon(d), \qquad (1.4)$$

where $\epsilon(d)$ is some decaying function of $d$ that dictates the rate of correlation decay.

(b) Temporal correlation decay: The impact of initial conditions on the quantity $\hat\phi(v)$ computed by $d$ iterations of the algorithm $\mathcal{A}$ decays as $d$ increases. Similar to spatial correlation decay, if we denote by $x^{(0)}_1$ and $x^{(0)}_2$ two initial conditions provided to the algorithm $\mathcal{A}$, then temporal decay of correlations refers to the statement for $\hat\phi(v)$ analogous to (1.4), i.e.,

$$1 - \epsilon(d) \le \frac{\hat\phi(v;\, x^{(0)}_1)}{\hat\phi(v;\, x^{(0)}_2)} \le 1 + \epsilon(d). \qquad (1.5)$$
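As a toy illustration of both the counting operation (1.2) and the spatial decay statement (1.4), the following brute-force sketch (entirely our own; the chain model and potential are hypothetical) computes the conditional marginal at one end of a chain under two different boundary values and prints their ratio, which approaches 1 as the boundary recedes:

```python
import itertools

def conditional_marginal(n, q, phi, v, boundary):
    """P(x_v | x_{n-1} = boundary) for a chain x_0 - ... - x_{n-1} with
    pairwise potential phi, by brute-force enumeration (illustrates (1.2))."""
    weights = [0.0] * q
    for x in itertools.product(range(q), repeat=n - 1):
        x = x + (boundary,)
        w = 1.0
        for i in range(n - 1):
            w *= phi(x[i], x[i + 1])
        weights[x[v]] += w
    Z = sum(weights)                    # conditional partition function
    return [w / Z for w in weights]

phi = lambda a, b: 2.0 if a == b else 1.0   # "agreement" potential
for n in (3, 6, 9):                         # boundary moved farther from v = 0
    p1 = conditional_marginal(n, 2, phi, 0, boundary=0)[0]
    p2 = conditional_marginal(n, 2, phi, 0, boundary=1)[0]
    print(n, round(p1 / p2, 4))             # ratio tends to 1 as in (1.4)
```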
The decay of correlations property has its origins in statistical physics, specifically in the study of interacting spin systems. In this context, the spatial decay of correlations property is referred to as spatial mixing, where it is used to describe the long range decay of correlations between spins with respect to the Gibbs measure associated with the system. Spatial mixing has implications for the uniqueness of the so-called infinite volume Gibbs measure [13], [21], [49]. The onset of spatial mixing is associated with a phase transition from multiple Gibbs measures to a unique Gibbs measure. On the other hand, temporal decay of correlations is referred to as temporal mixing in statistical physics, where it is most often used to describe the mixing time of the so-called heat bath Glauber dynamics. The Glauber dynamics is a Markov chain on the state space of the spin system whose steady state distribution is the Gibbs measure. Temporal mixing attracted a lot of attention in theoretical computer science, because it can be used to produce approximation schemes based on Markov Chain Monte Carlo (MCMC) methods. Spatial mixing as well has been used to produce approximation schemes which are deterministic in nature and are based on directly exploiting the spatial long range independence. In the context of theoretical computer science, this provides insight into the hardness of algorithmic approximations in several problems. There is a general conjecture that the onset of spatial mixing (phase transition to the unique Gibbs measure regime) coincides with the phase transition in the hardness of approximation. Specifically, the approximation problem is tractable (solvable in polynomial time) if and only if the corresponding Gibbs measure exhibits the decay of correlations property.

1.3 Organization of the thesis and contributions

In this section, we describe the problems we investigate in the thesis, along with a brief survey of the related literature. We state our main results and describe briefly the technical tools we used to establish them. We also state a few relevant open problems.

STRONG SPATIAL MIXING FOR THE LIST COLORING PROBLEM, CHAPTER 2

Formulation

The list coloring problem is a generalization of the classical graph coloring problem. In the list coloring problem on a graph $\mathcal{G} = (V, E)$ with maximum degree $\Delta$, each vertex $v$ is provided with a list of colors $L(v)$ it can choose from, where $L(v) \subseteq \{1, 2, \ldots, q\}$ and $\{1, 2, \ldots, q\}$ is the superset of all colors. In a valid coloring, each vertex is assigned a color from its own list such that no two neighbors in $\mathcal{G}$ are assigned the same color. It is well known that in this general setting, deciding whether a valid coloring exists is NP-hard and counting the number of proper colorings is #P-hard. One can associate a natural graphical model with the list coloring problem which represents the uniform distribution on the space of all valid colorings. The partition function of this graphical model is the total number of valid colorings. As mentioned earlier, computing the partition function, and hence performing the inference task of computing the marginal probabilities exactly, is computationally intractable. There is a large body of literature describing randomized approximation schemes to compute marginal probabilities and the number of valid colorings based on Markov Chain Monte Carlo (MCMC) methods. Most results are based on establishing fast mixing of the underlying Markov chain known as the Glauber dynamics [31], [26], [51], [42].
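To fix ideas, the sketch below gives one plausible minimal rendering of the heat bath Glauber dynamics for list colorings (our own illustration, not code from the works cited above): a uniformly random vertex is recolored uniformly among the colors in its list not currently used by its neighbors. Started from a valid coloring, every state it visits is a valid coloring, and a color is always available when each list is larger than the corresponding degree, as under conditions of the form $|L(v)| \ge \alpha\Delta(v) + \beta$ with $\alpha > 1$.

```python
import random

def glauber_step(adj, lists, coloring):
    """One heat-bath update of the list-coloring Glauber dynamics."""
    v = random.choice(list(adj))
    used = {coloring[u] for u in adj[v]}
    # Nonempty whenever |L(v)| exceeds the degree of v.
    available = [c for c in lists[v] if c not in used]
    coloring[v] = random.choice(available)
    return coloring

def run_chain(adj, lists, coloring, steps):
    """Iterate the dynamics; after many steps the state is approximately
    a uniform sample over valid list colorings (when the chain mixes)."""
    for _ in range(steps):
        coloring = glauber_step(adj, lists, coloring)
    return coloring

# Triangle-free 4-cycle with identical 3-color lists.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
lists = {v: [0, 1, 2] for v in adj}
print(run_chain(adj, lists, {0: 0, 1: 1, 2: 0, 3: 1}, 1000))
```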
Deterministic approximation schemes based on decay of correlations in the computation tree have also been studied in the literature [18]. These results are established with an assumption on the relation between the number of colors available and the degree, i.e., $|L| \ge \alpha\Delta + \beta$. Over time, considerable research effort has been directed towards relaxing the assumption by decreasing the value of $\alpha$ required in the aforementioned condition. In fact, it is known for the special case of $\Delta$-regular trees that spatial decay of correlations exists as early as $\alpha = 1$ and $\beta = 2$, which also marks the boundary of the phase transition into the unique Gibbs measure regime. It is conjectured that $\alpha = 1$ and $\beta = 2$ is sufficient for the existence of spatial decay of correlations and of approximation algorithms in the list coloring problem.

Contributions

In this thesis, we establish the existence of a strong version of spatial decay of correlations called Strong Spatial Mixing (SSM) for the list coloring problem whenever $\alpha > \alpha^* \approx 1.76$. This is the most general condition under which strong spatial mixing has been shown to exist in the literature, and is a step towards establishing SSM for $\alpha = 1$ as conjectured. As a corollary of our result, we also establish the uniqueness of the Gibbs measure for the list coloring problem and the existence of approximation schemes for computing marginal probabilities in this regime.

Our proof technique has two main components. The first component is a recursion we derive that relates the marginal probabilities at a vertex to the marginal probabilities of its neighbors. This recursion is a variation of the one derived in [18]; our recursion deals with the ratio of marginals rather than directly with the marginals themselves. The second component is the construction of an error function which is chosen suitably so that it interacts well with our recursion. To prove our result, we then show that the distance between two marginals induced by two different boundary conditions satisfies a certain contraction property with respect to the error function.

Open Problems

There are several problems still to be addressed. One is to tighten our result and establish SSM for $\alpha = 1$ and $\beta = 2$ as conjectured. Second, the SSM result we establish allows the construction of a PTAS for computing marginal probabilities, and for computing the exponent of the partition function up to a constant factor. However, it does not directly lead to an FPTAS for the partition function. It is still unresolved whether SSM by itself is sufficient for constructing an FPTAS.

GIANT COMPONENT IN RANDOM MULTIPARTITE GRAPHS, CHAPTER 3

Formulation

The problem of the existence of a giant component in random graphs was first studied by Erdős and Rényi. In their classical paper [16], they considered a random graph model on $n$ vertices and $m$ edges where each such possible graph is equally likely. They showed that if $m/n > 1 + c$, then with high probability as $n \to \infty$ there exists a component of size linear in $n$ in the random graph, and that the size of this component as a fraction of $n$ converges to a fixed constant. The degree distribution of the classical Erdős–Rényi random graph is asymptotically Poisson. However, in many applications the degree distribution associated with an underlying graph does not satisfy this property. For example, many so-called "scale-free" networks exhibit a power law distribution of degrees. This motivated the study of random graphs generated according to a given degree sequence.
The giant component problem on a random graph generated according to a given degree sequence was considered by Molloy and Reed [43]. They showed that if a random graph has an asymptotic degree sequence given by $\{p_j\}_{j \ge 1}$, then with high probability it has a giant component whenever the degree sequence satisfies $\sum_j j(j-2)p_j > 0$, along with some additional regularity conditions including a bounded second moment condition. A key concept in their proof was the construction of the so-called exploration process which reveals the random component of a vertex sequentially. They showed that whenever the degree sequence satisfies the aforementioned condition, the exploration process has a strictly positive initial drift, which when combined with the regularity conditions is sufficient to prove that at least one of the components must be of linear size. They also show that there is no giant component whenever $\sum_j j(j-2)p_j < 0$, and in [44] they also characterize the size of the giant component. Since then, there have been several papers strengthening the results of Molloy and Reed, and several beautiful analysis tools have been invented along the way. We defer a more detailed literature review to Chapter 3. We mention here a relatively recent paper by Bollobás and Riordan [8], where they use a branching process based analysis that offers several advantages, including tighter probability bounds and relaxing the finite second moment assumption.

In this thesis, we study random multipartite graphs with $p$ parts with given degree distributions. Here $p$ is a fixed positive integer. Each vertex is associated with a degree vector $d$, where each of its components $d_i$, $i \in [p]$, dictates the number of neighbors of the vertex in the corresponding part $i$ of the graph. Several real world networks naturally demonstrate a multipartite nature. The author-paper network, the actor-movie network, the network of company ownership, the financial contagion model, heterogeneous social networks, etc., are all multipartite [46], [9], [27] in nature. Examples of biological networks which exhibit multipartite structure include drug target networks, protein-protein interaction networks and human disease networks [22], [56], [45]. In many cases evidence suggests that explicitly modeling the multipartite structure results in more accurate models and predictions.

We initiated our study of the giant component problem in multipartite graphs because it was needed as a part of the proof of our hardness result in Chapter 4 for the MAX-CUT with edge deletions problem. However, the problem of the giant component is also of independent interest and we devote a whole chapter to it.

Contributions

We provide exact conditions on the degree distribution for the existence of a giant component in random multipartite graphs with high probability. In the case where a giant component exists, we also provide a characterization of its size in terms of parameters of the degree distribution. Our proofs involve a blend of techniques from Molloy and Reed [43] and Bollobás and Riordan [8], along with results from the theory of multidimensional branching processes. We show that whenever the matrix of means $M$ of the so-called edge-biased degree distribution satisfies the strong connectivity property, the existence of a giant component is governed by the Perron-Frobenius eigenvalue $\lambda_M$ of $M$.
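As a concrete rendering of this criterion (a hedged sketch of our own; the matrix $M$ is assumed to be supplied as a nonnegative array, and the function names are illustrative): supercriticality amounts to checking whether the spectral radius of $M$, which equals its Perron-Frobenius eigenvalue, exceeds 1. In the unipartite case $p = 1$, $M$ collapses to the scalar $\mathbb{E}[D(D-1)]/\mathbb{E}[D]$, and the test reduces to the Molloy-Reed condition $\sum_j j(j-2)p_j > 0$.

```python
import numpy as np

def is_supercritical(M, tol=1e-9):
    """Giant-component test: the Perron-Frobenius eigenvalue of the
    nonnegative mean matrix M of the edge-biased degree distribution
    exceeds 1 (computed here as the spectral radius)."""
    lam = max(abs(np.linalg.eigvals(np.asarray(M, dtype=float))))
    return lam > 1 + tol

# Unipartite sanity check: half the vertices of degree 1, half of degree 3.
p = {1: 0.5, 3: 0.5}
mean_offspring = (sum(j * (j - 1) * pj for j, pj in p.items())
                  / sum(j * pj for j, pj in p.items()))        # = 1.5
print(is_supercritical([[mean_offspring]]))   # True; sum j(j-2)p_j = 1 > 0
```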
Whenever $\lambda_M > 1$, we show that a certain Lyapunov function associated with the exploration process of Molloy and Reed has a strictly positive drift. The Lyapunov function we use is one often used in the analysis of multitype branching processes: a weighted $\ell_1$-norm Lyapunov function constructed using the Perron-Frobenius eigenvector of $M$. To establish results regarding the size of the giant component, we use a coupling argument from Bollobás and Riordan [8], relating the exploration process to the multitype branching process associated with the edge-biased degree distribution.

Open Problems

Our result only addresses the supercritical and subcritical cases, but leaves the critical case unresolved. The critical case has been studied in detail for unipartite random graphs with given degree distributions [34]. It may be possible to extend these results to the multipartite case as well.

MAX-CUT ON GRAPHS WITH RANDOM EDGE DELETIONS, CHAPTER 4

Formulation

The problem of finding a cut of maximum size in an arbitrary graph $\mathcal{G} = (V, E)$ is a well known NP-hard problem in algorithmic complexity theory. In fact, even when restricted to the set of 3-regular graphs, MAX-CUT is NP-hard with a constant factor approximability gap [5], i.e., it is NP-hard to find a cut whose size is at least $(1 - \epsilon_0)$ times the size of the maximum cut, where $1 - \epsilon_0 = 0.997$. The weighted MAX-CUT is a generalization of the MAX-CUT problem, where each edge $e \in E$ is associated with a weight $w_e$, and one is required to find a cut such that the sum of the weights of the edges in the cut is maximum. We study a randomized version of the weighted MAX-CUT problem, where each weight $w_e$ is a Ber($p$) random variable, for some $0 \le p \le 1$, and the weights associated with distinct edges are independent. Since the weights are binary, weighted MAX-CUT on the randomly weighted graph is equivalent to MAX-CUT on a thinned random graph where the edges associated with zero weights have been deleted. We call this problem the thinned MAX-CUT. The variable $p$ controls the amount of thinning.

It is particularly simple to analyze thinned MAX-CUT when $p$ takes one of the extreme values, i.e., $p = 1$ or $p = 0$. When $p = 1$, all edges are retained, the thinned graph is the same as the original graph, and finding the maximum cut remains computationally hard. On the other hand, when $p = 0$, the thinned graph has no edges and the MAX-CUT problem is trivial. This leads to a natural question of whether there is a hardness phase transition at some value $0 < p = p_c < 1$. Our study of the thinned MAX-CUT problem is motivated from the point of view of establishing a correspondence between phase transitions in hardness of computation and decay of correlations. Our formulation of the thinned MAX-CUT problem was inspired by a similar random weighted maximum independent set problem studied by Gamarnik et al. in [19].

Contributions

We identify a threshold for the hardness phase transition in the thinned MAX-CUT problem. We show that on the set of all graphs with degree at most $d$, the phase transition occurs at $p_c = \frac{1}{d-1}$. We show that this phase transition coincides with a phase transition in decay of correlations resulting from the connectivity properties of the random thinned graph. This result is a step towards showing the equivalence of spatial mixing and hardness of computation. For $p < p_c$ we show that the random graph resulting from edge deletions undergoes percolation, i.e., it disintegrates into disjoint connected components of size $O(\log n)$. The existence of a polynomial time algorithm to compute MAX-CUT then follows easily.
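The algorithmic half of this statement is straightforward. The sketch below (our own schematic, with illustrative names) thins the edge set with Ber($p$) weights, splits the surviving graph into connected components, and solves MAX-CUT exactly inside each component by enumeration; when every component has $O(\log n)$ vertices, the per-component cost $2^{|\mathrm{component}|}$ is polynomial in $n$.

```python
import itertools, random

def thin(edges, p):
    """Keep each edge independently with probability p."""
    return [e for e in edges if random.random() < p]

def components(n, edges):
    """Connected components of an n-vertex graph via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    comps = {}
    for v in range(n):
        comps.setdefault(find(v), []).append(v)
    return list(comps.values())

def thinned_max_cut(n, edges, p):
    """Exact MAX-CUT of the thinned graph, solved component by component."""
    kept = thin(edges, p)
    total = 0
    for comp in components(n, kept):
        cset = set(comp)
        inside = [(u, v) for u, v in kept if u in cset]
        best = 0
        for bits in itertools.product((0, 1), repeat=len(comp)):
            side = dict(zip(comp, bits))
            best = max(best, sum(side[u] != side[v] for u, v in inside))
        total += best
    return total
```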
For $p > p_c$, we show NP-hardness by constructing a reduction from approximate MAX-CUT on 3-regular graphs. Our reduction proof uses a random bipartite graph based gadget $\mathcal{W}$ similar to [50], where it was used to establish hardness of computation of the partition function of the hardcore model. Given a 3-regular graph $\mathcal{G}$, the gadget $\mathcal{W}$ is $d$-regular and is constructed by first replacing each vertex $v$ of $\mathcal{G}$ by a random bipartite graph $J_v$ of size $2n'$ consisting of two parts $R_v$ and $S_v$, and then adding connector edges between each pair of these bipartite graphs whenever there is an edge between the corresponding vertices in $\mathcal{G}$. The key result in our reduction argument is that the gadget $\mathcal{W}$ satisfies a certain polarization property which carries information about the MAX-CUT of the 3-regular graph $\mathcal{G}$ used to construct it. Namely, we show that any maximal cut of the thinned graph obtained from $\mathcal{W}$ must fully polarize each bipartite graph and include all its edges in the cut, i.e., either assign the value 1 to almost all vertices in $R_v$ and 0 to almost all vertices in $S_v$, or vice versa. Then, we show that the cut $C_\mathcal{G}$ of $\mathcal{G}$ obtained by assigning binary values to its vertices $v$ based on the polarity of $J_v$ must be at least $(1 - \epsilon_0)$-optimal. To establish this polarization property we draw heavily on the results and proof techniques from Chapter 3 about the giant component in random multipartite graphs.

Open Problems

While our result finds the exact hardness phase transition threshold for the thinned MAX-CUT problem, the same remains unresolved for the thinned maximum independent set problem in [19] from which our problem was originally inspired. There, only half of the result has been established: the region where decay of correlations holds and an approximation algorithm exists has been identified. It remains open whether, in the absence of decay of correlations, computing the maximum independent set is computationally hard.

ALGORITHMS FOR LOW RANK MATRIX COMPLETION, CHAPTER 5

Formulation

Matrix completion refers to the problem of recovering a low rank matrix from a subset of its entries. This problem arises in a vast number of applications that involve collaborative filtering, where one attempts to predict the unknown preferences of a certain user based on the collective known preferences of a large number of users. It attracted a lot of attention due to the famous Netflix Prize, which involved reconstructing the unknown movie preferences of Netflix users.

In matrix completion, there is an underlying low rank matrix $M \in \mathbb{R}^{n \times n}$ of rank $r \ll n$, i.e., $M = \alpha\beta^T$ where $\alpha, \beta \in \mathbb{R}^{n \times r}$. The values of the entries of $M$ on a subset of indices $E \subset [n] \times [n]$ are revealed. Denoting by $M_E$ the subset of revealed entries of $M$, the two major questions of matrix completion are: (a) Given $M_E$, is it possible to reconstruct $M$? (b) Can the reconstruction in (a) be done efficiently? Without any further assumptions, matrix completion is an ill posed problem with multiple solutions and is in general NP-hard [41]. However, under certain additional conditions the problem has been shown to be tractable. The most common assumption adopted in the literature is that the matrix $M$ is "incoherent" and the subset $E$ is chosen uniformly at random. The incoherence condition was introduced in [10], [11], where it was shown that a convex relaxation resulting in nuclear norm minimization succeeds under further assumptions on the size of $E$.
In [35] and [36], the authors use an algorithm consisting of a truncated singular value projection followed by a local minimization subroutine on the Grassmann manifold and show that it succeeds when $|E| = \Omega(nr\log n)$. In [28], it was shown that the local minimization in [35] can be successfully replaced by Alternating Minimization. The use of Belief Propagation (BP) for matrix factorization has also been studied heuristically by physicists in [33], where they perform a mean field analysis of the performance of BP.

Matrix completion can be recast as a bi-convex least squares optimization problem over a graph $\mathcal{G} = (V, E)$ whose edges represent the revealed entries of $M$. On this graph, Alternating Minimization can be interpreted as a local algorithm, where the optimization variables are associated with the vertices $V$. In each iteration, the variable on a vertex is updated using the values of its neighbors. Since in [28] Alternating Minimization is preceded by a Singular Value Projection (SVP), we call it a warm-start Alternating Minimization. In this thesis, we investigate Alternating Minimization when it is used with a cold start, which we call Vertex Least Squares (VLS). We also propose two new matrix completion algorithms.

Contributions

We analyze VLS with a cold start and prove that in the special case of positive rank one matrices, it can successfully reconstruct $M$ from $M_E$. More specifically, we show that if $M = \alpha\beta^T$ with $\alpha, \beta > 0$ and the graph $\mathcal{G}$ is connected, has bounded degree and has diameter of size $O(\log n)$, then VLS reconstructs $M$ up to a Root Mean Square (RMS) error of $\epsilon$ in time polynomial in $n$. We propose a new matrix completion algorithm called Edge Least Squares (ELS), which is a message passing variation of VLS. We show through simulations that ELS performs significantly better than VLS, both in terms of sample complexity and time till convergence. The superior cold start performance of ELS suggests that ELS with a warm start may be very successful, and perhaps better than VLS with a warm start. We also provide a simple direct decoding algorithm, which we call Information Propagation (IP). We prove that under certain strong connectivity properties of $\mathcal{G}$, Information Propagation can recover $M$ in linear time. We show that when $r = O(1)$, the required strong connectivity property is satisfied by a bipartite Erdős–Rényi graph $\mathcal{G}(n,p)$ with $p = \Omega\bigl((\frac{\log n}{n})^{1/r}\bigr)$.

Open Problems

It remains an open problem to provide a theoretical analysis proving the convergence of ELS. The full power of cold start VLS is also unresolved, and it may be possible to extend our proof for the rank one case to the higher rank case. Additionally, it may be possible to provide a theoretical analysis demonstrating the superior performance of ELS over VLS.

Chapter 2

Strong Spatial Mixing for List Coloring of Graphs

2.1 Introduction

In this chapter we study the problem of list colorings of a graph. We explore the strong spatial mixing property of list colorings on triangle-free graphs, which pertains to exponential decay of boundary effects when the list coloring is generated uniformly at random from the space of valid list colorings. This means that fixing the color of vertices far away from a vertex $v$ has negligible impact (exponentially decaying correlations) on the probability of $v$ being colored with a certain color in its list. Strong spatial mixing is an instance of the spatial decay of correlations property. A related but weaker notion is weak spatial mixing.
Strong spatial mixing is stronger than weak spatial mixing because it requires exponential decay of boundary effects even when some of the vertices near $v$ are conditioned to have fixed colors. Because of this added condition, strong spatial mixing is particularly useful in computing conditional marginal probabilities.

Jonasson [32] established weak spatial mixing on Kelly (regular) trees of any degree $\Delta$ whenever the number of colors $q$ is greater than or equal to $\Delta + 1$. However, the weakest conditions under which strong spatial mixing on Kelly trees has been established thus far are by Ge and Štefankovič [20], who proved strong spatial mixing when $q > \alpha^*\Delta + 1$, where $\alpha^* = 1.763\ldots$ is the unique solution of $xe^{-1/x} = 1$. For lattice graphs (or more generally triangle-free amenable graphs), strong spatial mixing for coloring was established by Goldberg, Martin and Paterson in [23] for the case $q > \alpha^*\Delta - \beta$ for a fixed constant $\beta$. In fact their approach can be extended to the case of the list coloring problem, but the graph amenability restriction still applies. In this chapter we generalize these results under an only mildly stronger condition. We establish the strong spatial mixing of list colorings on arbitrary bounded degree triangle-free graphs whenever the size of the list of each vertex $v$ is at least $\alpha\Delta(v) + \beta$, where $\Delta(v)$ is the degree of $v$, $\alpha$ satisfies $\alpha > \alpha^*$ and $\beta$ is a constant that only depends on $\alpha$.

The spatial mixing property is closely related to the uniqueness of the infinite volume Gibbs measure on the spin system defined by the list coloring problem. In fact, weak spatial mixing is a sufficient condition for the Gibbs measure to be unique. In its turn, strong spatial mixing is closely related to the problem of approximately counting the number of valid colorings of a graph, namely the partition function of the Gibbs measure. In particular, for amenable graphs strong spatial mixing implies rapid mixing of Glauber dynamics, which leads to efficient randomized approximation algorithms for computing the partition function, e.g. in [31], [26], [51], [42], etc. A decay of correlations property similar to strong spatial mixing has also been shown to lead to deterministic approximation algorithms for computing partition functions. This technique was introduced by Bandyopadhyay and Gamarnik [3] and Weitz [53], and has been subsequently employed by Gamarnik and Katz [18] for the list coloring problem. Since decay of correlations implies the uniqueness of the Gibbs measure on regular trees, and regular trees represent maximal growth of the size of the neighborhood for a given degree, it is a general conjecture that efficient approximability of the counting problem coincides with the uniqueness of the Gibbs measure on regular trees. More precisely, the conjecture states that there exists a polynomial time approximation algorithm for counting colorings of any arbitrary graph whenever $q \ge \Delta + 2$. We are still very far from proving this conjecture or even establishing strong spatial mixing under this condition.

The formulation in this chapter is similar to [18]. It was shown there that the logarithm of the ratio of the marginal probabilities at a given node induced by two different boundary conditions contracts in $\ell_\infty$ norm as the distance between the node and the boundary becomes large, whenever $|L(v)| \ge \alpha\Delta(v) + \beta$, where $\alpha > \alpha^{**} \approx 2.84$ and $\beta$ is a constant that only depends on $\alpha$.
In this chapter we measure the distance with respect to a conveniently chosen error function, which allows us to tighten the contraction argument and relax the required condition to $\alpha > \alpha^* \approx 1.76$. This also means that the Gibbs measure on such graphs is unique. Unlike [18], the result presented in this chapter unfortunately does not immediately lead to an algorithm for computing the partition function. It does, however, allow us to compute the marginal probabilities approximately in polynomial time. It also allows us to estimate the exponent of the number of valid colorings, namely to approximate the log-partition function in polynomial time.

The rest of the chapter is organized as follows. In Section 2.2 we introduce the notation, basic definitions and preliminary concepts. Also in this section we provide the statement of our main result and discuss in detail its implications and connections to previous results. In Section 2.3 we establish some preliminary technical results. In Section 2.4 we prove our main result of this chapter. We conclude in Section 2.5 with some final remarks.

2.2 Definitions and Main Result

We denote by $\mathcal{G} = (V, E)$ an infinite graph with the set of vertices and edges given by $V$ and $E$. For a fixed vertex $v \in V$ we denote by $\Delta(v)$ the degree of $v$, and by $\Delta$ the maximum degree of the graph, i.e. $\Delta = \max_{v \in V}\Delta(v) < \infty$. The distance between two vertices $v_1$ and $v_2$ in $V$ is denoted by $d(v_1, v_2)$, which might be infinite if $v_1$ and $v_2$ belong to two different connected components of $\mathcal{G}$. For two finite subsets of vertices $\Psi_1 \subset V$ and $\Psi_2 \subset V$, the distance between them is defined as $d(\Psi_1, \Psi_2) = \min\{d(v_1, v_2) : v_1 \in \Psi_1, v_2 \in \Psi_2\}$. We assume $\{1, 2, \ldots, q\}$ to be the set of all colors. Each vertex $v \in V$ is associated with a finite list of colors $L(v) \subseteq \{1, 2, \ldots, q\}$, and $\mathcal{L} = (L(v) : v \in V)$ is the sequence of lists. The total variational distance between two discrete measures $\mu_1$ and $\mu_2$ on a finite or countable sample space $\Omega$ is denoted by $\|\mu_1 - \mu_2\|$ and is defined as $\|\mu_1 - \mu_2\| = \frac{1}{2}\sum_{\omega \in \Omega}|\mu_1(\omega) - \mu_2(\omega)|$.

A valid list coloring $C$ of $\mathcal{G}$ is an assignment to each vertex $v \in V$ of a color $c(v) \in L(v)$ such that no two adjacent vertices have the same color. A measure $\mu$ on the set of all valid colorings of an infinite graph $\mathcal{G}$ is called an infinite volume Gibbs measure with the uniform specification if, for any finite region $\Psi \subseteq \mathcal{G}$, the distribution induced on $\Psi$ by $\mu$ conditioned on any coloring $C$ of the vertices $V\setminus\Psi$ is the uniform conditional distribution on the set of all valid colorings of $\Psi$. We denote this distribution by $\mu_\Psi^C$. For any finite subset $\Psi \subseteq \mathcal{G}$, let $\partial\Psi$ denote the boundary of $\Psi$, i.e. the set of vertices which are adjacent to some vertex in $\Psi$ but are not a part of $\Psi$.

Definition 2.1. The infinite volume Gibbs measure $\mu$ on $\mathcal{G}$ is said to have strong spatial mixing (with exponentially decaying correlations) if there exist positive constants $A$ and $\theta$ such that for any finite region $\Psi \subseteq \mathcal{G}$, any two partial colorings $C_1, C_2$ (i.e. vertices to which no color has been assigned are also allowed) of $V\setminus\Psi$ which differ only on a subset $W \subseteq \partial\Psi$, and any subset $\Lambda \subseteq \Psi$,

$$\|\mu_\Psi^{C_1} - \mu_\Psi^{C_2}\|_\Lambda \le A|\Lambda|e^{-\theta d(\Lambda, W)}. \qquad (2.1)$$

Here $\|\mu_\Psi^{C_1} - \mu_\Psi^{C_2}\|_\Lambda$ denotes the total variational distance between the two distributions $\mu_\Psi^{C_1}$ and $\mu_\Psi^{C_2}$ restricted to the set $\Lambda$.

We have used the definition of strong spatial mixing from Weitz's PhD thesis [52]. As mentioned in [52], this definition of strong spatial mixing is appropriate for general graphs. A similar definition is used in [23], where the set $W$ of disagreement was restricted to be a single vertex.
This definition is more relevant in the context of lattice graphs (or more generally amenable graphs), where the neighborhood of a vertex grows slowly with distance from the vertex. In that context, the definition involving a one vertex disagreement and the one we have adopted are essentially the same.

Let $\alpha^* = 1.76\ldots$ be the unique root of the equation $xe^{-1/x} = 1$. For our purposes we will assume that the graph list pair $(\mathcal{G}, \mathcal{L})$ satisfies the following.

Assumption 2.1. The graph $\mathcal{G}$ is triangle-free. The size of the list of each vertex $v$ satisfies

$$|L(v)| \ge \alpha\Delta(v) + \beta \qquad (2.2)$$

for some constant $\alpha > \alpha^*$, and $\beta = \beta(\alpha)$ is such that

$$(1 - 1/\beta)\,\alpha e^{-\frac{1}{\alpha}(1+1/\beta)} > 1.$$

Using the above assumption we now state our main result.

Theorem 2.1. Suppose Assumption 2.1 holds for the graph list pair $(\mathcal{G}, \mathcal{L})$. Then the Gibbs measure with the uniform specification on $(\mathcal{G}, \mathcal{L})$ satisfies strong spatial mixing with exponentially decaying correlations.

We establish some useful technical results in the next section before presenting the details of the proof in Section 2.4.

2.3 Preliminary technical results

The following theorem establishes strong spatial mixing for the special case when $\Lambda$ consists of a single vertex.

Theorem 2.2. Let $q$, $\Delta$, $\alpha$ and $\beta$ be given. There exist positive constants $B$ and $\gamma$ depending only on the preceding parameters such that the following holds for any graph list pair $(\mathcal{G}, \mathcal{L})$ satisfying Assumption 2.1. Given any finite region $\Psi \subseteq \mathcal{G}$, any two colorings $C_1, C_2$ of $V\setminus\Psi$ which differ only on a subset $W \subseteq \partial\Psi$, and any vertex $v \in \Psi$ and color $j \in L(v)$, we have

$$(1 - \epsilon) \le \frac{P(c(v) = j \mid C_1)}{P(c(v) = j \mid C_2)} \le (1 + \epsilon), \qquad (2.3)$$

where $\epsilon = Be^{-\gamma d(v, W)}$.

We will now show that Theorem 2.1 follows from Theorem 2.2.

Proof of Theorem 2.1. To prove this, we use induction on the size of the subset $\Lambda$. The base case with $|\Lambda| = 1$ is equivalent to the statement of Theorem 2.2. Assume that the statement of Theorem 2.1 is true whenever $|\Lambda| \le t$ for some integer $t \ge 1$. We will use this to prove that the statement holds when $|\Lambda| = t + 1$. Let the vertices in $\Lambda$ be $v_1, v_2, \ldots, v_{t+1}$. Let $\mathbf{v}_k = (v_1, \ldots, v_k)$ and $J_k = (j_1, \ldots, j_k)$, where $j_i \in L(v_i)$, $1 \le i \le k$. Also let $c(\mathbf{v}_k) = (c(v_1), c(v_2), \ldots, c(v_k))$ denote the coloring of the vertices $v_1, v_2, \ldots, v_k$. Then

$$P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_1) = P(c(\mathbf{v}_t) = J_t \mid C_1)\,P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_1) \le (1 + \epsilon)\,P(c(\mathbf{v}_t) = J_t \mid C_1)\,P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2),$$

where the inequality follows from Theorem 2.2. This gives

$$P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_1) - P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_2) \le \epsilon\,P(c(\mathbf{v}_t) = J_t \mid C_1)\,P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2) + P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2)\bigl\{P(c(\mathbf{v}_t) = J_t \mid C_1) - P(c(\mathbf{v}_t) = J_t \mid C_2)\bigr\}.$$

Similarly,

$$P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_1) - P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_2) \ge -\epsilon\,P(c(\mathbf{v}_t) = J_t \mid C_1)\,P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2) + P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2)\bigl\{P(c(\mathbf{v}_t) = J_t \mid C_1) - P(c(\mathbf{v}_t) = J_t \mid C_2)\bigr\}.$$

Combining the above, we get

$$\bigl|P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_1) - P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_2)\bigr| \le \epsilon\,P(c(\mathbf{v}_t) = J_t \mid C_1)\,P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2) + P(c(v_{t+1}) = j_{t+1} \mid c(\mathbf{v}_t) = J_t, C_2)\,\bigl|P(c(\mathbf{v}_t) = J_t \mid C_1) - P(c(\mathbf{v}_t) = J_t \mid C_2)\bigr|.$$

We can now bound the total variational distance $\|\mu^{C_1} - \mu^{C_2}\|_\Lambda$ as follows. Summing the preceding inequality over $j_i \in L(v_i)$, $1 \le i \le t+1$, the first term sums to at most $\epsilon$, and by the induction hypothesis the second term sums to at most $t\epsilon$, so that

$$\sum_{j_i \in L(v_i),\,1 \le i \le t+1} \bigl|P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_1) - P(c(\mathbf{v}_{t+1}) = J_{t+1} \mid C_2)\bigr| \le (t+1)\epsilon.$$

This completes the induction argument. □
So, in order to establish Theorem 2.1, it is enough to show that Theorem 2.2 is true. We claim that Theorem 2.2 follows from the theorem below, which establishes weak spatial mixing whenever Assumption 2.1 holds. In other words, under Assumption 2.1, strong spatial mixing of list colorings for marginals of a single vertex holds whenever weak spatial mixing holds. In fact, $\mathcal{G}$ need not be triangle-free for this implication to be true, as will be clear from the proof below.

Theorem 2.3. Let $q$, $\Delta$, $\alpha$ and $\beta$ be given. There exist positive constants $B$ and $\gamma$ depending only on the preceding parameters such that the following holds for any graph list pair $(\mathcal{G}, \mathcal{L})$ satisfying Assumption 2.1. Given any finite region $\Psi \subseteq \mathcal{G}$ and any two colorings $C_1, C_2$ of $V\setminus\Psi$, we have

$$(1 - \epsilon) \le \frac{P(c(v) = j \mid C_1)}{P(c(v) = j \mid C_2)} \le (1 + \epsilon), \qquad (2.4)$$

where $\epsilon = Be^{-\gamma d(v, \partial\Psi)}$.

We first show how Theorem 2.2 follows from Theorem 2.3.

Proof of Theorem 2.2. Consider two colorings $C_1, C_2$ of the boundary $\partial\Psi$ of $\Psi$ which differ only on a subset $W \subseteq \partial\Psi$, as in the statement of Theorem 2.2. Let $d = d(v, W)$. We first construct a new graph list pair $(\mathcal{G}', \mathcal{L}')$ from $(\mathcal{G}, \mathcal{L})$. Here $\mathcal{G}'$ is obtained from $\mathcal{G}$ by deleting all vertices in $\partial\Psi$ which are at a distance less than $d$ from $v$. Notice that $C_1$ and $C_2$ agree on all such vertices. Whenever a vertex $u$ is deleted from $\mathcal{G}$, we remove from the lists of the neighbors of $u$ the color $c(u)$, which is the color of $u$ under both $C_1$ and $C_2$. This defines the new list $\mathcal{L}'$. In this process, whenever a vertex loses a color from its list it also loses one of its edges. Also, for $\alpha > \alpha^* > 1$ we have $|L(v)| - 1 \ge \alpha(\Delta(v) - 1) + \beta$ whenever $|L(v)| \ge \alpha\Delta(v) + \beta$. Therefore, the new graph list pair $(\mathcal{G}', \mathcal{L}')$ also satisfies Assumption 2.1. Define the region $\Psi' \subseteq \mathcal{G}'$ as the ball of radius $d - 1$ centered at $v$. Let $D_1$ and $D_2$ be two colorings of $(\Psi')^c$ which agree with $C_1$ and $C_2$ respectively. From the way in which $(\mathcal{G}', \mathcal{L}')$ is constructed, we have

$$P_{\mathcal{G},\mathcal{L}}(c(v) = j \mid C_i) = P_{\mathcal{G}',\mathcal{L}'}(c(v) = j \mid D_i), \quad \text{for } i = 1, 2, \qquad (2.5)$$

where $P_{\mathcal{G},\mathcal{L}}(E)$ denotes the probability of the event $E$ in the graph list pair $(\mathcal{G}, \mathcal{L})$. If $V'$ is the set of all vertices of $\mathcal{G}'$, then $D_1$ and $D_2$ assign colors only to vertices in $V'\setminus\Psi'$. So we can apply Theorem 2.3 for the region $\Psi'$, and the proof is complete. □

So it is sufficient to prove Theorem 2.3, which we defer to Section 2.4. We use the rest of this section to discuss some implications of our result and the connections between our result and previously established results for strong spatial mixing for coloring of graphs. The statement in Theorem 2.3 is what is referred to as weak spatial mixing [52]. In general, weak spatial mixing is a weaker condition and does not imply strong spatial mixing. This is indeed the case when we consider the coloring problem of a graph $\mathcal{G}$ by $q$ colors, i.e. the case when the lists $L(v)$ are the same for all $v \in V$. However, interestingly, as the above argument shows, for the case of list coloring strong spatial mixing follows from weak spatial mixing when the graph list pair satisfies Assumption 2.1.

We observed that the strong spatial mixing result for amenable graphs in [23] also extends to the case of list colorings. Indeed, the proof technique only requires a local condition similar to the one in Assumption 2.1 that we have adopted, as opposed to a global condition like $q \ge \alpha\Delta + \beta$. Also, in [23] the factor $|\Lambda|$ in the definition (2.1) was shown to be unnecessary, which makes their statement stronger. We show that this stronger statement is also implied by our result. In particular, assuming Theorem 2.2 is true, we prove the following corollary.
In particular, assuming Theorem 2.2 is true, we prove the following corollary. 36 Corollary 2.1. Let q, A, a and ft be given. There exists positive constants B and -y depending only on the preceding parameters such that the following holds for any graph list pair (9,,C) satisfying Assumption 3.1. Given any finite region T C !, any two colorings C1 , C 2 of the boundary ni of T which differ at only one point f E ft, and any subset A C xF, | JAI- Proof. Let the color of f ; Be --d(Af), pIC211A (2.6) be ji in C1 and j2 in C2 . Let C(A) be the set of all possible colorings of the set A. - p421A = IP(u(f) = ji) - P(alc(f) = j2)1 ~ oEC(A) P(c(f)= jila) P(C(f) = 2 P2 ji) EC(h P(c(f) = j210) For any j E L(f), using Theorem 2.2 we have for E = Be-"Yd(Af), P(c(f) = jla) P(c(f) = ijo-) E,'EC(A) P(c(f) P(c(f) = j) ZO!'EC(A) = ijl')P(') P(c(f) = jjAO)P(u') YX'Ec(A) P(c(f) = < EOEC(A) P(c(f) = jlo')P(c') il')(1 + e)P(o') EEC(A) P(Cf) = jj)P() = 1+ E. 37 j Similarly we can also prove for any j E L(f) P(c(f) =lo 1-) P(c(f) = j) Therefore, pJC - p 21A (+ < f) - 1 )[P(u-) = 2f. orEC(A) The notion of strong spatial mixing we have adopted also implies the uniqueness of Gibbs measure on the spin system described by the list coloring problem. In fact Weak Spatial Miung described in Theorem 2.3 is sufficient for the uniqueness of Gibbs measure (see Theorem 2.2 and the discussion following Definition 2.3 in [52]). We summarize this in the corollary that follows. Corollary 2.2. Suppose the graph list pair g, L satisfy Assumption 3. 1. Then the infinite volume Gibbs measure on the list colorings of G is unique. Proof of Theorem 2.3 2.4 Let v E V be a fixed vertex of g. Let m = A(v) denote the degree of v and let Vi , v2, ... , m be the neighbors of v. The statement of the theorem is trivial if m = 0 (v is an isolated vertex). Let q, = IL(v)I and q,,,= L(vi)1. Also let g, be the graph obtained from g by deleting the vertex v. We begin by proving two useful recursions on the marginal probabilities in the following lemmas. 38 Lemma 2.1. Let j1, j2 E L(v). Let j,3j denote the list associated with graph~g 4 which is obtained from C by removing the color j, from the lists L(vk) for k < i and removing the color j2 from the lists L(vk) for k > i (if any of these lists do not contain the respective color then no change is made to them). Then we have Pgc(c(v) = ji) PgX-(c(v) = j2) m j= 1 - Pg.,L,, 2 (c(vi) = ji) - P9.,L,,j,(c(vi) = 32)' Proof Let Zge(M) denote the number of colorings of a finite graph g with the condition M satisfied. For example, Zgc(c(v) = j) denotes the number of valid colorings of g when the color of v is fixed to be j E L(v). We use a telescoping product argument to prove the lemma: Pg C(c(v) ji) PgC(c(v) = j2) = Zg"c(c(v) = ji) ZgX(c(v) = 32) Zg~,c(c(vi) # ji, 1 <i < M) Zg,,(C(Vi) 9k j2, 1 <- M < m) Pgg(c(vi) # ji 1 K i < m) Pg 4X(c(vi) #?2, 1< i <iM) M Pg,,g(c(vk) 0 j1, 1 < k < i, c(vk) 34 j2, Pg,(c(vk) = ji, 1 k <i - 1, c(v) Pg, L,,(C() c(v) Oi, k< j 2 jC(vk) 54j,17 <k< 0 2i P9,,(c(0)A M1 -Pg,,-14 i +l 5k 5m) j2, i<k<m) C(vk) j2, i+ <k<m) 1, c(vk) =A j2, i+ < k< m) 1, (c(vi) = ji) Pg,,(C(Vi) 1 = j2) 01 The following lemma was proved in [181. We provide the proof here for completeness. 39 Lemma 2.2. Let j E L(v). Let Ci denote the list associated with the graph G, which is obtained from C by removing the color j (if it exists) from the list L(Vk) for k < i. Then we have Pg C(c(v) = ]., EkEL(v) (1 - Pg,,r (c(vi) = j) () M, (c0e ~g.,c, = k)) Proof. 
Pg C(c(v) = j) ZgC(c(v) = j) ZkEL(v) ZgC(c(v) = k) Zg,,t(c(V ) /- j, 1 < i < M) m) XIEL(.) Zg.,e(c(vi) A k, 1 <i Pg,,i(c( ) =A , 1< i < m) kEv)Pgi,,tc(c(vi) =, k, 1 < i < m) (2.7) Now for any k E L(v), Pg,c(c(v) 4 k, 1 < i < m) Pg,,,c(c(vi) $ k) fPg,,c(c(vi) = kjc(vj k, 1 < 1 < k - 1) i=2 Pg,,, 4 , (c(v ) k). Substituting this into (2.7) completes the proof of the lemma. 0 Before proceeding to the proof of Theorem 2.3, we first establish upper and lower bounds on the marginal probabilities associated with vertex v. Lemma 2.3. Let v E T be such that d(v, &F) > 2. For every j E L(v) and for 40 1 = 1, 2 the following bounds hold. P(c(v) = jIC) < 1/a. (2.8) P(c(v) = jc) (2.9) (mae P(c(v) = jC) > q~1 (1 - 1) (2.10) Here q is the total number of possible colors. Proof. These bounds were proved in [18] with a different constant, i.e. a > a** ::: 2.84. Here we prove the bound when Assumption 3.1 holds. In this proof we assume I = 1. The case 1 = 2 follows by an identical argument. Let C denote the set of all possible colorings of the children v,... ,Vm of v. Note that for any c E C, P(c(v) = jic) I5 1/1. and (2.8) follows. To prove (2.9) we will show that for every coloring of the neighbors of the neighbors of v, the bound is satisfied. So, first fix a coloring c of the vertices at distance two from v. Conditioned on this coloring, define for j E L(v) the marginal tij = Pg.,, (c(vi) = ic). Note that by (2.8) we have t,, 1/8. Because. is triangle-free, there are no edges between the neighbors of v and once we condition on c, we have PG.,' (c(v) = ilc) = Pg,,C(c(vi) = jc). 41 So we obtain zt = jEL(v) z Pg,,(c(vi) = jic) < (2.11) 1. jEL(v)(lL(vj) From Lemma 2.2 we have n(1 - tik) =k]Lv P(c(v) = j|C1) (1- M EkEL(v) 1 ik) (2.12) i=1 kLV) Using Taylor expansion for log(1 - x), we obtain (1 - e ~ 2(e (1 1/p)2 > i=1 i=1 where 0 I eIog(1-'-k) tik) = So ik < tik. 9 ik satisfies (1 - t ) 0, ) 2 > 1/2 by Assumption 3.1. Thus, we obtain 1(1 - e-(1+1/.)tk / - E i=1. i1 Using the fact that arithmetic mean is greater than geometric mean and using (2.11), we get [J(1 - tik) kEL(v) i=1 > (1- tik) qv (kEL(v) i=1 > qexp -q,~1(l + 1/0)E i=1 kEL(v) > (am+ #)e-(1+1/h6)Q.>M > ame-(1+1/,6)Q. 42 ti,k Combining with (2.12) the proof of (2.9) is now complete. m 1 (1-tij) > (1-1/l)'" > (1-1/#). Also (1 - q, 5 q. So we have P(c(v) = jIC1 ) (1 = Z - ti) 0 For j E L(v), define Xj = PgX(c(v) = jIC1 ), = Pg,.:(c(V) = jjC2 ), and the vector of marginals x = (xj :j E L(v)), y = (y, j E L(v)). We define a suitably chosen error function which we will use to establish decay of correlations and prove Theorem 2.3. This error function E(x, y) is defined as E(x, y) = max log jEL(v) - yj min log j EL(v) yy By (2.10) of Lemma 2.3 we have xj, y, > 0 for j E L(v). So the above expression is well-defined. Let jj E L(v) (j2 E L(v)) achieve the maximum (minimum) in the above expression. Recall that for given ji, j2, we denote by 4 ,j 1 j the list associated with graph g, which is obtained from C by removing the color ji from the lists L(vk) 43 k) ) From (2.8) we have for k < i and removing the color 32 from the lists L(vk) for k > i. Define for each <<i < m and j E Ljj~ (vi) the marginals Yij 12 (c(vi) = ilCI), = Pg1,, 4 , = Pg.V,, 4J 12 (C(Vi) = jIC2). and the corresponding vector of marginals x = (xij : j E Lij,, (vi)), . y = (yi3 : j E LjjjJ2(V)) First we prove the following useful fact regarding the terms appearing in the definition of the error function. Lemma 2.4. With xj and y defined as before, we have max log jEL(v) min log Proof. 
We have L(v)(Xi - y) = ( 0 ( ) 0. jEL(v) X - EjEL(v) y = 0. Since the quantities xj and y, are non-negative for all j E L(v), there exists indices j, E L(v) and j2 E L(v) such that xjl yjl and xj, < y32 . The lemma follows by taking the logarithm of both sides of the preceding inequalities. E We are now ready to prove the following key result which shows that the distance between the marginals induced by the two different boundary conditions measured with respect to the metric defined by the error function E(x, y) contracts. 44 Let e E (0, 1) be such that 1 (1 - (1= Assumption 3.1 guarantees that such an e exists. Lemma 2.5. Let mj = Ag,(vi). Then 1 1 -E(x, y) ; (1 - e) max -- E(xi, yi). m (2.13) " ism>O The expression on the right hand side of (2.13) is interpreted to be 0 if mi = 0 for all i. Proof. If j, = j2, then E(x, y) = 0. Otherwise, (li) - log (.z\10 X 2 / E(x, y) = log (o - log ( 32 Introduce the following variables: zij = log (X ), - Wij = log (lii) 45 .j Using the recursion in Lemma 2.1 we have =[2log For j = j1,j2 ez2) (f- -o 1 -1-e'i'"2 ew hi2 ( E(x,y) = log (1 - ezij,) - log(1 -e log (1 - ezs2) - log (1 - ews2)1 let zgi= Wy = log (1 - e log (1 - e'i). Then we can rewrite E(x, y) as E(x, y) = (zj, - Wil) - (z3 2 - wj 2 ). (2.14) Define the continuous function f : [0, 1] -+ R as m f(t) log(1 - ezei) = i=1 1 _- ezil+(1i--z0)- i=1 Then we have f(0) = 0 and f(1) = zj - wj. Applying the mean value theorem, there exists t E (0, 1) such that z, - w 1 = f(1) - f(0) = f'(t). 46 eziJ+t(wrj%) ezij+t(wij-z-j) (to1, wi Zj j 2=1 Observe that if j Ljjjj2(vi), then Pg.,Le ,2(c(U) = -'4' jIC1) = PgQL-II2 (c(v) = jIC2 ) = 0. Hence for j 0 Ljjjj2(vj), we have we = zij. Also if mi = 0 then v is an isolated vertex in , and in this case we also have wjj = z 1 . Using this fact, we have zi - Wi = _- ezij+t(wqs-%)) (Wij - zij). Jrw2(v)(VO E i:mj>O From convexity of el and Lemma 2.3 we have 0 < ezs+'(w a-%) < tewsi + (1 - t)ei < gei11# Similarly, again using Lemma 2.3 we have 0< Combining we have for j 0 = 1 1 < 1 - ei+t(-) - -/ ji,j2, 1 - ezei+t(uwij-%) 1 - (1 - 1/j6) mjae-I(1+1/P) From Lemma 2.4 we have maxkEL,,,I Zik} 2 (, ){Wik - zi} > 0 and minkEL, 1,,,2(,)Wik - r-zij) ez+t(w' Computing the expression for f(t), we get 0. Using this zjI - wl !5 E 1 i:Mj>0-. M11 max kE44jjJ2(Vi) 47 {Wik - Zik}, and min zj 2 - Wj 2 :mi>O Mi kELi,',, {Wik - Zik}. 2 (v) By using the above bounds in (2.14) we get max EO kEL,,,j Mi min {Wik - Zjk} - E(x, y) < 2 (v) kiJ1,2 i {Wm - Zik} Ii E(x, yi) i i:Mi>O E(x, yi). (2.15) The proof of Lemma 2.5 is now complete. EJ < m max i:TN>O Mi) We now use Lemma 2.5 to complete the proof of Theorem 2.3. Let i* achieve the maximum in (2.15), that is, i* = arg maxi Li = Li, 1 , (1 - 2 Eg,, 1 g(x., yi). Let g1 = , i1 = vi. and (x',yl) = (x,., y.). Lemma 2.5 says that A E(x, y) E),l9E(x', y"). Note that the graph list pair (a1, V1) satisfies Assumption 3.1. We can then apply apply Lemma 2.5 to vi, 9', L' to obtain v 2 , g 27 L2 such that )E(x2 y2) A If we let d = d(v, i9l), then applying 1 1)E(x,y) Lemma 2.5 successively d times we obtain 1 AgMv) E(x, y) (1 - (),d) E(xdyd) Agd(Vd) 2log ( (1- (118) E)d. (1 where the second inequality follows from Lemma 2.3. This gives for any j E L(v), log max log - E(x, y) 5 2A(Iog q - A log(l - 1/,a))(1 48 - e)d. Let F = 2A(log q - AIog(1 - 1/#8)). The quantity F depends only on the quantities defined in Assumption 3.1 and does not depend on the vertex v. Let do be large enough such that exp(F(1 - e)do) < 1+ 2F(1 - E)D. Then for d > do, we have . 
2.5 Conclusion

In this chapter, we proved that strong spatial mixing for the list coloring problem holds for a general triangle-free graph when, for each vertex $v$ of the graph, the size of its list is at least $\alpha\Delta(v) + \beta$, where $\alpha > \alpha^* \approx 1.763$ and $\beta$ is a sufficiently large constant depending only on $\alpha$. This extends the previous results for strong spatial mixing of colorings on regular trees [20] and on amenable triangle-free graphs [23]. An interesting next venture would be to use this long-range independence property to produce efficient approximation algorithms for counting colorings, similar to [18]. The main obstruction we face here is that in order to prove contraction of the recursion for $\alpha^* \approx 1.763$, we need to use the bounds on the probabilities stated in Lemma 2.3. This restricts our result to correlation decay with respect to distance in the graph-theoretic sense, instead of correlation decay in the computation tree. This means that while our result can be used to produce an approximation scheme for computing marginals, it does not directly lead to an FPTAS for counting the number of valid colorings as in [18]. It would also be interesting to establish this result for smaller $\alpha$; here again Lemma 2.3, which necessitates $\alpha > \alpha^*$, proves to be the bottleneck. One possible way to address this would be to tighten the contraction argument so that a weaker version of inequality (2.9) suffices. It has been conjectured that $\alpha = 1$ and $\beta = 2$ suffice, but at the moment we are quite far from this result. It also remains open whether strong spatial mixing is necessary or sufficient (or both) for the existence of an FPTAS for computing partition functions.

Chapter 3

Giant Component in Random Multipartite Graphs with Given Degree Sequences

3.1 Introduction

The problem of the existence of a giant component in random graphs was first studied by Erdős and Rényi. In their classical paper [16], they considered a random graph model on $n$ vertices and $m$ edges in which each such possible graph is equally likely. They showed that if $m/n \ge (1+\epsilon)/2$ for some $\epsilon > 0$, then with high probability as $n \to \infty$ there exists a component of size linear in $n$ in the random graph, and the size of this component as a fraction of $n$ converges to a given constant. The degree distribution of the classical Erdős-Rényi random graph has Poisson tails. However, in many applications the degree distribution associated with the underlying graph does not satisfy this; for example, many so-called "scale-free" networks exhibit power law degree distributions. This motivated the study of random graphs generated according to a given degree sequence. The giant component problem on a random graph generated according to a given degree sequence was considered by Molloy and Reed [43], who provided conditions on the degree distribution under which a giant component exists with high probability. Further, in [44] they showed that the size of the giant component as a fraction of the number of vertices converges in probability to a given positive constant. They used an exploration process to analyze the components of the random graph and prove their results.
Similar results were established by Janson and Łuczak in [29] using different techniques, based on the convergence of empirical distributions of independent random variables. Several papers have since proved similar results under similar but distinct assumptions and with tighter error bounds [25], [8], [48]. Results for the critical phase for random graphs with given degree sequences were derived by Kang and Seierstad in [34]. All of these results consider a random graph on $n$ vertices with a given degree sequence, where the distribution is uniform among all feasible graphs with that degree sequence. The degree sequence is assumed to converge to a probability distribution, and the results provide conditions on this probability distribution under which a giant component exists with high probability.

In this chapter, we consider random multipartite graphs with $p$ parts with given degree distributions, where $p$ is a fixed positive integer. Each vertex is associated with a degree vector $d$, each of whose components $d_i$, $i \in [p]$, dictates the number of neighbors of the vertex in the corresponding part $i$ of the graph. As in previous papers, we assume that the empirical distribution associated with the number of vertices of degree vector $d$ converges to a probability distribution. We then pose the problem of finding conditions under which there exists a giant component in the random graph with high probability. Our approach is based on an analysis of the Molloy and Reed exploration process. The major bottleneck is that the exploration process is a multidimensional process, and the technique of Molloy and Reed of directly underestimating the exploration process by a one-dimensional random walk does not apply to our case. In order to overcome this difficulty, we construct a linear Lyapunov function based on the Perron-Frobenius theorem, a technique often used in the study of multidimensional branching processes. We then carefully couple the exploration process with an underestimating process to prove our results. The coupling construction is also more involved due to the multidimensionality of the process: in contrast to the unipartite case, there are multiple types of clones (or half-edges) involved in the exploration process, corresponding to which pair of parts of the multipartite graph they belong to. At every step of the exploration process, revealing the neighbor of such a clone leads to the addition of clones of several types to the component currently being explored, and the numbers and types of these newly added clones depend on the type of clone whose neighbor was revealed. The underestimating process therefore needs to be constructed so that it simultaneously underestimates the exploration process for every possible type of clone. We do this by choosing the parameters of the underestimating process so that for each type of clone, the vector of additional clones added by revealing its neighbor is always componentwise smaller than the corresponding vector for the exploration process.

All results regarding giant components typically use a configuration model corresponding to the given degree distribution, obtained by splitting vertices into clones and performing a uniform matching of the clones. In the standard unipartite case, at every step of the exploration process all available clones can be treated the same. In the multipartite case this is no longer true.
For example, the neighbor of a vertex in part $i$ of the graph with degree vector $d$ can lie in part $j$ only if $d_j > 0$; further, this neighbor must itself have a degree vector $\tilde d$ with $\tilde d_i > 0$. This poses the issue of the graph breaking apart, with some of the $p$ parts of the graph becoming disconnected from the others. To get past this we make a certain irreducibility assumption, which we state carefully later. This assumption not only addresses the above problem, but also enables us to construct linear Lyapunov functions by using the Perron-Frobenius theorem for irreducible non-negative matrices. We also prove that under the irreducibility assumption, the giant component, when it exists, is unique and has linearly many vertices in each of the $p$ parts of the graph. In [8], Bollobás and Riordan show that the existence and the size of the giant component in the unipartite case are closely associated with an edge-biased branching process. In this chapter, we construct an analogous edge-biased branching process, which is now a multitype branching process, and prove similar results.

Several real-world networks naturally exhibit a multipartite structure. The author-paper network, the actor-movie network, the network of company ownership, financial contagion models, heterogeneous social networks, and others are all multipartite [46], [9], [27]. Examples of biological networks with multipartite structure include drug-target networks, protein-protein interaction networks, and human disease networks [22], [56], [45]. In many cases, evidence suggests that explicitly modeling the multipartite structure results in more accurate models and predictions. Random bipartite graphs ($p = 2$) with given degree distributions were considered by Newman et al. in [47], who used generating function heuristics to identify the critical point in the bipartite case but did not provide rigorous proofs. Our result establishes a rigorous proof of this result, and we show that in the special case $p = 2$ the conditions we derive are equivalent to theirs.

The rest of the chapter is structured as follows. In Section 3.2, we introduce the basic definitions and the notion of a degree distribution for multipartite graphs. In Section 3.3, we formally state our main results. Section 3.4 is devoted to the description of the configuration model. In Section 3.5, we describe the exploration process of Molloy and Reed and the distributions that govern its evolution. In Section 3.6 and Section 3.7, we prove our main results for the supercritical case, namely when a giant component exists with high probability. In Section 3.8, we prove a sublinear upper bound on the size of the largest component in the subcritical case. We conclude in Section 3.9 with some future directions.

3.2 Definitions and preliminary concepts

We consider a finite simple undirected graph $\mathcal{G} = (V, \mathcal{E})$, where $V$ is the set of vertices and $\mathcal{E}$ is the set of edges. We use the words "vertices" and "nodes" interchangeably. A path between two vertices $v_1$ and $v_2$ in $V$ is a collection of vertices $u_1 = v_1, u_2, \ldots, u_l = v_2$ in $V$ such that for each $i = 1, 2, \ldots, l - 1$ we have $(u_i, u_{i+1}) \in \mathcal{E}$. A component, or more specifically a connected component, of a graph $\mathcal{G}$ is a subgraph $C \subseteq \mathcal{G}$ such that there is a path between any two vertices in $C$. A family of random graphs $\{\mathcal{G}_n\}$ on $n$ vertices is said to have a giant component if there exists a positive constant $\epsilon > 0$ such that
$$P\left(\text{there exists a component } C \subseteq \mathcal{G}_n \text{ for which } \frac{|C|}{n} \ge \epsilon\right) \to 1.$$
Subsequently, when a property holds with probability converging to one as $n \to \infty$, we will say that the property holds with high probability, or w.h.p. for short. For any integer $p$, we use $[p]$ to denote the set $\{1, 2, \ldots, p\}$. For any matrix $M \in \mathbb{R}^{n \times n}$, we denote by $\|M\| = \max_{i,j}|M_{ij}|$ the largest element of the matrix $M$ in absolute value; it is easy to check that $\|\cdot\|$ is a valid matrix norm. We use $\delta_{ij}$ to denote the Kronecker delta, defined as $1$ if $i = j$ and $0$ otherwise. We denote by $\mathbf{1}$ the all-ones vector, whose dimension will be clear from context.

The notion of an asymptotic degree distribution was introduced by Molloy and Reed [43]. In the standard unipartite case, a degree distribution dictates the fraction of vertices of a given degree. In this section we introduce an analogous notion of an asymptotic degree distribution for random multipartite graphs. We consider a random multipartite graph $\mathcal{G}$ on $n$ vertices with $p$ parts denoted $G_1, \ldots, G_p$. For any $i \in [p]$, a vertex $v \in G_i$ is associated with a "type" $d \in \mathbb{Z}_+^p$: the node with type $d$ has $d(j)$ (written $d_j$) neighbors in $G_j$ for each $j \in [p]$. A degree distribution describes the fraction of vertices of type $d$ in $G_i$, $i \in [p]$. We now define an asymptotic degree distribution as a sequence of degree distributions which prescribe the number of vertices of each type in a multipartite graph on $n$ vertices. For fixed $n$, let $\mathcal{D}(n) = (n_d^i(n),\ i \in [p],\ d \in \{0, 1, \ldots, n\}^p)$, where $n_d^i(n)$ denotes the number of vertices in $G_i$ of type $d$. Associated with each $\mathcal{D}(n)$ is a probability distribution $p(n) = (p_d^i(n),\ i \in [p],\ d \in \{0, 1, \ldots, n\}^p)$ which records the fraction of vertices of each type in each part; accordingly, $p_d^i(n) = n_d^i(n)/n$. For any degree vector $d$, the quantity $\mathbf{1}'d$ is simply the total degree of the vertex. We define the quantity
$$\omega(n) \triangleq \max\{\mathbf{1}'d : n_d^i(n) > 0 \text{ for some } i \in [p]\}, \qquad (3.1)$$
which is the maximum degree associated with the degree distribution $\mathcal{D}(n)$. To prove our main results, we need additional assumptions on the degree sequence.

Assumption 3.1. The degree sequence $\{\mathcal{D}(n)\}_{n \in \mathbb{N}}$ satisfies the following conditions:
(a) For each $n \in \mathbb{N}$ there exists a simple graph with the degree distribution prescribed by $\mathcal{D}(n)$, i.e., the degree sequence is a feasible degree sequence.
(b) There exists a probability distribution $p = (p_d^i,\ i \in [p],\ d \in \mathbb{Z}_+^p)$ such that the sequence of probability distributions $p(n)$ associated with $\mathcal{D}(n)$ converges to the distribution $p$.
(c) For each $i \in [p]$, $\sum_d \mathbf{1}'d\ p_d^i(n) \to \sum_d \mathbf{1}'d\ p_d^i$.
(d) For each $i, j \in [p]$ such that $\lambda_{ij} \triangleq \sum_d d_j\, p_d^i = 0$, the corresponding quantity $\lambda_{ij}(n) \triangleq \sum_d d_j\, p_d^i(n) = 0$ for all $n$.
(e) The second moment of the degree distribution, $\sum_d (\mathbf{1}'d)^2 p_d^i$, exists (is finite), and $\sum_d (\mathbf{1}'d)^2 p_d^i(n) \to \sum_d (\mathbf{1}'d)^2 p_d^i$.

Note that the quantity $\sum_d \mathbf{1}'d\ p_d^i(n)$ in condition (c) is simply the number of edge endpoints in $G_i$ divided by $n$, so this condition implies that the total number of edges is $O(n)$, i.e., the graph is sparse. In condition (e), the quantity $\sum_d (\mathbf{1}'d)^2 p_d^i(n)$ is the sum of the squares of the degrees in $G_i$ divided by $n$, so this condition says that the sum of the squares of the degrees is $O(n)$. It follows from condition (c) that $\lambda_{ij} < \infty$ and that $\lambda_{ij}(n) \to \lambda_{ij}$. The quantity $\lambda_{ij}$ is asymptotically the fraction of outgoing edges from $G_i$ to $G_j$. For $p$ to be a valid degree distribution of a multipartite graph, we must have $\lambda_{ij} = \lambda_{ji}$ for each $1 \le i < j \le p$, and for every $n$ we must have $\lambda_{ij}(n) = \lambda_{ji}(n)$. We have not included this in the above conditions because it follows from condition (a). Condition (d) excludes the case where there is a sublinear number of edges between $G_i$ and $G_j$.
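The quantities $\lambda_{ij}(n)$ are straightforward to compute from the vertex counts. The following minimal Python sketch (the encoding of $\mathcal{D}(n)$ as a dict is hypothetical) computes them and illustrates the symmetry forced by feasibility:

from collections import defaultdict

def edge_densities(n_counts, n):
    # n_counts: dict mapping (i, d) -> number of vertices in part i of
    # type d, where d is a tuple of length p. Returns lambda_ij(n).
    lam = defaultdict(float)
    for (i, d), count in n_counts.items():
        for j, dj in enumerate(d):
            lam[(i, j)] += dj * count / n
    return lam

# Feasibility (condition (a)) forces lam[(i, j)] == lam[(j, i)]: every
# edge between G_i and G_j contributes one endpoint to each of the sums.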
There is an alternative way to represent some parts of Assumption 3.1. For any probability distribution $p$ on $\mathbb{Z}_+^p$, let $D_p$ denote a random variable distributed as $p$. Then (b), (c) and (e) are equivalent to the following:
(b') $D_{p(n)} \to D_p$ in distribution.
(c') $E[\mathbf{1}'D_{p(n)}] \to E[\mathbf{1}'D_p]$.
(e') $E[(\mathbf{1}'D_{p(n)})^2] \to E[(\mathbf{1}'D_p)^2]$.
The following preliminary lemmas follow immediately.

Lemma 3.1. Conditions (b'), (c') and (e') together imply that the random variables $\{\mathbf{1}'D_{p(n)}\}_n$ and $\{(\mathbf{1}'D_{p(n)})^2\}_n$ are uniformly integrable.

Using Lemma 3.1, we prove the following statement.

Lemma 3.2. The maximum degree satisfies $\omega(n) = o(\sqrt{n})$.

Proof. For any $\epsilon > 0$, by Lemma 3.1 there exists $q \in \mathbb{Z}_+$ such that $E[(\mathbf{1}'D_{p(n)})^2\,\mathbf{1}\{\mathbf{1}'D_{p(n)} > q\}] \le \epsilon$ for all $n$. Observe that whenever $\omega(n) > q$, the vertex achieving the maximum degree contributes at least $\omega(n)^2/n$ to this truncated second moment, so $\omega(n)^2/n \le \epsilon$ for all large enough $n$. Since $\epsilon$ is arbitrary, the proof is complete. □

Let $S = \{(i,j) : \lambda_{ij} > 0\}$ and let $N \triangleq |S|$. For each $i \in [p]$, let $S_i \triangleq \{j \in [p] : (i,j) \in S\}$. Note that by condition (a), the set of feasible graphs with the given degree distribution is non-empty. The random multipartite graph $\mathcal{G}$ we consider in this chapter is drawn uniformly at random among all simple graphs with degree distribution given by $\mathcal{D}(n)$. The asymptotic behavior of $\mathcal{D}(n)$ is captured by the quantities $p_d^i$, and the existence of a giant component in $\mathcal{G}$ as $n \to \infty$ is determined by the distribution $p$.

3.3 Statements of the main results

The neighborhood of a vertex in a random graph with a given degree distribution closely resembles a special branching process associated with that degree distribution, called the edge-biased branching process. A detailed discussion of this phenomenon, and results with strong guarantees for the giant component problem in random unipartite graphs, can be found in [8] and [48]. The edge-biased branching process is defined via the edge-biased degree distribution associated with the given degree distribution. Intuitively, the edge-biased degree distribution can be thought of as the degree distribution of the vertex reached at the end point of an edge. Its importance will become clear when we describe the exploration process in the sections that follow. We say that an edge is of type $(i,j)$ if it connects a vertex in $G_i$ with a vertex in $G_j$. Then, as we will see, the type of the vertex in $G_j$ reached by following a random edge of type $(i,j)$ is $d$ with probability $d_i\,p_d^j/\lambda_{ji}$.

We now introduce the edge-biased branching process, which we denote by $\mathcal{T}$. Here $\mathcal{T}$ is a multitype branching process. The vertices of $\mathcal{T}$ other than the root are associated with types $(i,j) \in S$, so other than the root, $\mathcal{T}$ has $N \le p^2$ types of vertices. The root is assumed to be of a special type, which will become clear from the description below. The process starts with a root vertex $v$. With probability $p_d^i$, the root $v$ gives rise to $d_j$ children of type $(i,j)$ for each $j \in [p]$. To describe the subsequent levels of $\mathcal{T}$, consider any vertex of type $(i,j)$. With probability $d_i\,p_d^j/\lambda_{ji}$, this vertex gives rise to $(d_m - \delta_{mi})$ children of type $(j,m)$ for each $m \in [p]$. The numbers of children generated by the vertices of $\mathcal{T}$ are independent across vertices. For each $n$, we define an edge-biased branching process $\mathcal{T}_n$ in the same way as $\mathcal{T}$, using the distribution $\mathcal{D}(n)$ instead of $\mathcal{D}$. We will also use the notations $\mathcal{T}(v)$ and $\mathcal{T}_n(v)$ whenever the type of the root node $v$ is specified. We denote the expected number of children of type $(j,m)$ generated by a vertex of type $(i,j)$ by
$$\mu_{ij,jm} \triangleq \frac{1}{\lambda_{ij}}\sum_d (d_m - \delta_{mi})\, d_i\, p_d^j. \qquad (3.2)$$
It is easy to see that $\mu_{ij,jm} \ge 0$.
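The offspring step of $\mathcal{T}$ is easy to simulate. The following minimal Python sketch (the dict encodings of $p$ and $\lambda$ are hypothetical conveniences, not from the thesis) samples the children of a type-$(i,j)$ vertex:

import random

def sample_children(p_dist, lam, i, j):
    # Offspring of a type-(i, j) vertex of T: a vertex in G_j reached
    # from G_i. p_dist maps (part, d) -> p_d^part with d a tuple;
    # lam maps (j, i) -> lambda_ji = sum_d d_i * p_d^j.
    types, weights = [], []
    for (part, d), prob in p_dist.items():
        if part == j and d[i] > 0:
            types.append(d)
            weights.append(d[i] * prob / lam[(j, i)])
    d = random.choices(types, weights=weights)[0]
    # One of the d_i edges into G_i was used to reach this vertex,
    # hence the subtracted Kronecker delta below.
    return {(j, m): d[m] - (1 if m == i else 0) for m in range(len(d))}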
Assumption 3.1(e) guarantees that $\mu_{ij,jm}$ is finite. Note that a vertex of type $(i,j)$ cannot have children of type $(l,m)$ if $j \neq l$; for convenience we also set $\mu_{ij,lm} = 0$ when $j \neq l$. By means of a remark, we note that it is also possible to conduct the analysis when the second moments are allowed to be infinite (see for example [43], [8]), but for simplicity we do not pursue this route in this chapter.

Introduce a matrix $M \in \mathbb{R}^{N \times N}$ defined as follows. Index the rows and columns of the matrix with the pairs $(i,j) \in S$; there are $N$ such pairs. The entry of $M$ corresponding to row index $(i,j)$ and column index $(l,m)$ is set to $\mu_{ij,lm}$.

Definition 3.1. Let $A \in \mathbb{R}^{N \times N}$ be a matrix. Define a graph $\mathcal{H}$ on $N$ nodes where, for each pair of nodes $i$ and $j$, the directed edge $(i,j)$ exists if and only if $A_{ij} > 0$. The matrix $A$ is said to be irreducible if the graph $\mathcal{H}$ is strongly connected, i.e., there exists a directed path in $\mathcal{H}$ between any two nodes of $\mathcal{H}$.

We now state the well-known Perron-Frobenius theorem for non-negative irreducible matrices. This theorem has extensive applications in the study of multidimensional branching processes (see for example [38]).

Theorem 3.1 (Perron-Frobenius Theorem). Let $A$ be a non-negative irreducible matrix. Then:
(a) $A$ has a positive eigenvalue $\gamma > 0$ such that any other eigenvalue of $A$ is strictly smaller than $\gamma$ in absolute value.
(b) There exists a left eigenvector $x$ of $A$, unique up to scalar multiplication, associated with the eigenvalue $\gamma$, such that all entries of $x$ are positive.

We introduce the following additional assumption before stating our main results.

Assumption 3.2. The degree sequence $\{\mathcal{D}(n)\}_{n \in \mathbb{N}}$ satisfies the following conditions:
(a) The matrix $M$ associated with the degree distribution $p$ is irreducible.
(b) For each $i \in [p]$, $S_i \neq \emptyset$.

Assumption 3.2 eliminates several degenerate cases. For example, consider a degree distribution with $p = 4$, i.e., a 4-partite random graph. Suppose for $i = 1, 2$ that $p_d^i$ is non-zero only when $d_3 = d_4 = 0$, and for $i = 3, 4$ that $p_d^i$ is non-zero only when $d_1 = d_2 = 0$. In essence, this distribution is associated with a random graph which is simply the union of two disjoint bipartite graphs; in particular, such a graph may contain more than one giant component. This is ruled out under our assumption. Further, our assumption allows us to show that the giant component has linearly many vertices in each of the $p$ parts of the multipartite graph. Let
$$\zeta \triangleq 1 - \sum_{i=1}^{\infty} P(|\mathcal{T}| = i) = P(|\mathcal{T}| = \infty). \qquad (3.3)$$
Namely, $\zeta$ is the survival probability of the branching process $\mathcal{T}$. We now state our main results.

Theorem 3.2. Suppose that the Perron-Frobenius eigenvalue of $M$ satisfies $\gamma > 1$. Then the following statements hold:
(a) The random graph $\mathcal{G}$ has a giant component $C \subseteq \mathcal{G}$ w.h.p. Further, the size of this component $C$ satisfies
$$\lim_{n \to \infty} P\left(\zeta - \epsilon \le \frac{|C|}{n} \le \zeta + \epsilon\right) = 1 \qquad (3.4)$$
for any $\epsilon > 0$.
(b) All components of $\mathcal{G}$ other than $C$ are of size $O(\log n)$ w.h.p.

Theorem 3.3. Suppose that the Perron-Frobenius eigenvalue of $M$ satisfies $\gamma < 1$. Then all components of the random graph $\mathcal{G}$ are of size $O(\omega(n)^2 \log n)$ w.h.p.

The regime of Theorem 3.2, where a giant component exists, is generally referred to in the literature as the supercritical case, and that of Theorem 3.3, marked by the absence of a giant component, is referred to as the subcritical case.
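Since the theorems are stated in terms of the Perron-Frobenius eigenvalue of $M$, the criticality check is a one-line computation once the means $\mu_{ij,lm}$ are known. A minimal sketch (the toy values of the matrix entries are hypothetical):

import numpy as np

def perron_frobenius_eigenvalue(M):
    # For a non-negative irreducible M, the Perron-Frobenius eigenvalue
    # of Theorem 3.1 is the spectral radius of M.
    return max(abs(np.linalg.eigvals(M)))

# Toy bipartite example (p = 2, S = {(1,2), (2,1)}), with hypothetical
# mean offspring numbers mu_1221 and mu_2112; the spectral radius is
# sqrt(mu_1221 * mu_2112), matching the computation that follows.
M = np.array([[0.0, 1.5],
              [1.2, 0.0]])
print(perron_frobenius_eigenvalue(M) > 1)  # supercritical iff True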
The conditions under which a giant component exists in random bipartite graphs were derived in [47] using generating function heuristics. We now consider the special case of a bipartite graph and show that the conditions implied by Theorem 3.2 and Theorem 3.3 reduce to those in [47]. In this case $p = 2$ and $N = 2$. The types of all vertices in $G_1$ are of the form $d = (0, j)$ and those in $G_2$ are of the form $d = (k, 0)$. To match the notation in [47], we let $p_d = p_j$ when $d = (0, j)$ and $p_d = q_k$ when $d = (k, 0)$. So $\lambda_{12} = \sum_j j\,p_j$ and $\lambda_{21} = \sum_k k\,q_k$. Using the definition of $\mu$ from equation (3.2), we get
$$\mu_{12,21} = \frac{1}{\lambda_{12}}\sum_k k(k-1)\,q_k,$$
and similarly $\mu_{21,12} = \frac{1}{\lambda_{21}}\sum_j j(j-1)\,p_j$. From the definition of $M$,
$$M = \begin{pmatrix} 0 & \mu_{12,21} \\ \mu_{21,12} & 0 \end{pmatrix}.$$
The Perron-Frobenius eigenvalue of $M$ is its spectral radius, given by $\sqrt{\mu_{12,21}\,\mu_{21,12}}$. So the condition for the existence of a giant component according to Theorem 3.2 is $\mu_{12,21}\,\mu_{21,12} - 1 > 0$, which after some algebra reduces to
$$\sum_{j,k} jk(jk - j - k)\, p_j\, q_k > 0.$$
This is identical to the condition mentioned in [47]. The rest of the chapter is devoted to the proofs of Theorem 3.2 and Theorem 3.3.

3.4 Configuration Model

The configuration model [54], [7], [4] is a convenient tool for studying random graphs with given degree distributions. It provides a method to generate a multigraph from the given degree distribution; when conditioned on the event that the resulting graph is simple, the distribution is uniform among all simple graphs with the given degree distribution. We describe below the way to generate a configuration model from a given multipartite degree distribution (a code sketch appears after the list):
1. For each of the $n_d^i(n)$ vertices in $G_i$ of type $d$, introduce $d_j$ clones of type $(i,j)$ for each $j \in [p]$. The ordered pair $(i,j)$ associated with a clone designates that the clone belongs to $G_i$ and has a neighbor in $G_j$. From the discussion following Assumption 3.1, the number of clones of type $(i,j)$ equals the number of clones of type $(j,i)$.
2. For each pair $(i,j)$, perform a uniform random matching of the clones of type $(i,j)$ with the clones of type $(j,i)$.
3. Collapse all the clones associated with a given vertex back into a single vertex. This means all the edges attached to the clones of a vertex are now considered to be attached to the vertex itself.
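A minimal Python sketch of the three steps above, assuming a hypothetical encoding of the input as a list of (part, type) pairs; it returns the multigraph's edge list (steps 2 and 3 combined, since edges between clones become edges between their vertices):

import random
from collections import defaultdict

def multipartite_configuration_model(vertices):
    # vertices: list of (i, d) pairs, one per vertex, with d a tuple
    # giving the number of neighbors in each part.
    clones = defaultdict(list)          # (i, j) -> owning vertex indices
    for v, (i, d) in enumerate(vertices):
        for j, dj in enumerate(d):
            clones[(i, j)].extend([v] * dj)   # step 1
    edges = []
    for (i, j) in list(clones):
        if i > j:
            continue                     # handle each unordered pair once
        if i == j:
            side = clones[(i, i)]        # matched within itself; assumes
            random.shuffle(side)         # an even number of (i, i) clones
            pairs = zip(side[::2], side[1::2])
        else:
            a, b = clones[(i, j)], clones[(j, i)]  # equal lengths by
            random.shuffle(a)                      # feasibility
            pairs = zip(a, b)            # step 2: uniform matching
        edges.extend(pairs)              # step 3: clones collapse to vertices
    return edges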
The following useful lemma allows us to transfer results related to the configuration model to uniformly drawn simple random graphs.

Lemma 3.3. If the degree sequence $\{\mathcal{D}(n)\}_{n \in \mathbb{N}}$ satisfies Assumption 3.1, then the probability that the configuration model results in a simple graph is bounded away from zero as $n \to \infty$.

As a consequence of the above lemma, any statement that holds with high probability for the random configuration model is also true with high probability for the simple random graph model, so we only need to prove Theorem 3.2 and Theorem 3.3 for the configuration model. The proof of Lemma 3.3 can be obtained easily from a similar result on directed random graphs proved in [12]. The specifics of the proof follow.

Proof of Lemma 3.3. In the configuration model for multipartite graphs that we described, we can classify all clones into two categories: clones of type $(i,i) \in S$, and clones of type $(i,j) \in S$ with $i \neq j$. Since the outcomes of the matchings associated with the two categories are independent, we can treat them separately. For the first category, the problem is equivalent to the configuration model for standard unipartite graphs. More precisely, for fixed $i$ we can construct a standard degree distribution from $\mathcal{D}(n)$ by taking the $i$th component of the corresponding vector degrees. Using Assumption 3.1, our claim then follows from previous results for the unipartite case. For the second category, fix $(i,j)$ with $i \neq j$. Construct a degree distribution $\mathcal{D}_1(n) = (n_k(n), k \in [n])$, where $n_k(n)$ denotes the number of vertices of degree $k$, by letting $n_k(n) = \sum_d \mathbf{1}\{d(j) = k\}\, n_d^i$. Construct $\mathcal{D}_2(n)$ similarly to $\mathcal{D}_1(n)$ by interchanging $i$ and $j$. We consider a bipartite graph where the degree distribution of the vertices in part $l$ is given by $\mathcal{D}_l(n)$ for $l = 1, 2$. We form the corresponding configuration model and perform the usual uniform matching between the clones generated from $\mathcal{D}_1(n)$ and the clones generated from $\mathcal{D}_2(n)$. This exactly mimics the outcome of the matching that occurs in our original multipartite configuration model between clones of type $(i,j)$ and $(j,i)$. With this formulation, the problem of controlling the number of double edges is closely related to a similar problem for the configuration model of directed random graphs, which was studied in [12]. To precisely match their setting, add "dummy" vertices of degree zero to both $\mathcal{D}_1(n)$ and $\mathcal{D}_2(n)$ so that each has exactly $n$ vertices, and arbitrarily enumerate the vertices in each with indices from $[n]$. From Assumption 3.1 it is easily verified that the degree distributions $\mathcal{D}_1(n)$ and $\mathcal{D}_2(n)$ satisfy Condition 4.2 in [12] (to switch between our notation and theirs, map $\mathcal{D}_1(n)$ and $\mathcal{D}_2(n)$ to the two degree sequences in their notation). Theorem 4.3 in [12] then says that the probability of having no self loops and no double edges is bounded away from zero. In particular, observing that self loops are irrelevant in our case, we conclude that $\liminf_{n\to\infty} P(\text{no double edges}) > 0$. Since the number of pairs $(i,j)$ is at most $p(p-1)$, a constant with respect to $n$, the proof is now complete. □

3.5 Exploration Process

In this section we describe the exploration process, introduced by Molloy and Reed in [43], which reveals the component containing a given vertex of the random graph. We say a clone is of type $(i,j)$ if it belongs to a vertex in $G_i$ and has its neighbor in $G_j$. We say a vertex is of type $(i,d)$ if it belongs to $G_i$ and has degree type $d$. We start at time $k = 0$. At any point in time $k$ in the exploration process, there are three kinds of clones: 'sleeping' clones, 'active' clones and 'dead' clones. For each $(i,j) \in S$, the number of active clones of type $(i,j)$ at time $k$ is denoted by $A_{ij}(k)$, and the total number of active clones at time $k$ is $A(k) = \sum_{(i,j)\in S} A_{ij}(k)$. Two clones are said to be "siblings" if they belong to the same vertex. Sleeping and active clones are collectively called 'living' clones. We denote by $L_i(k)$ the number of living clones in $G_i$ and by $L_{ij}(k)$ the number of living clones of type $(i,j)$ at time $k$; it follows that $\sum_j L_{ij}(k) = L_i(k)$. If all clones of a vertex are sleeping, then the vertex is said to be a sleeping vertex; if all its clones are dead, the vertex is considered dead; otherwise it is considered active. At the beginning of the exploration process all clones (and hence all vertices) are sleeping. We denote the number of sleeping vertices in $G_i$ of type $d$ at time $k$ by $N_d^i(k)$, and let $N_S(k) = \sum_{i,d} N_d^i(k)$; thus $N_d^i(0) = n_d^i(n)$ and $N_S(0) = n$. We now describe the exploration process used to reveal the components of the configuration model.

Exploration Process.
1. Initialization: Pick a vertex uniformly at random from the set of all sleeping vertices and set the status of all its clones to active.
2. Repeat the following two steps as long as there are active clones:
(a) Pick a clone uniformly at random from the set of active clones and kill it.
(b) Reveal the neighbor of the clone by picking uniformly at random one of its candidate neighbors. Kill the neighboring clone and make its siblings active.
3. If there are living clones left, restart the process by picking a living clone uniformly at random, setting it and its siblings to active, and go back to step 2. If there are no living clones, the exploration process is complete.

Note that in step 2(b), the candidate neighbors of a clone of type $(i,j)$ are the living clones of type $(j,i)$. The exploration process enables us to conveniently track the evolution in time of the number of active clones of various types. We denote the change in $A_{ij}(k)$ by writing
$$A_{ij}(k+1) = A_{ij}(k) + Z_{ij}(k+1), \qquad (i,j)\in S,$$
and define $Z(k) \triangleq (Z_{ij}(k),\ (i,j)\in S)$ to be the vector of changes in the number of active clones of all types. To describe the probability distribution of the changes $Z(k+1)$, we consider the following two cases.

Case 1: $A(k) > 0$. Let $E_{ij}$ denote the event that in step 2(a) of the exploration process the active clone picked was of type $(i,j)$. The probability of this event is $A_{ij}(k)/A(k)$. In that case we kill the clone that we chose, and the number of active clones of type $(i,j)$ decreases by one. We then proceed to reveal its neighbor, which is of type $(j,i)$. One of the following events happens:

(i) $E_a$: the neighbor revealed is an active clone. The probability of the joint event is
$$P(E_{ij}\cap E_a) = \frac{A_{ij}(k)}{A(k)}\cdot\frac{A_{ji}(k)}{L_{ji}(k)} \ \text{ if } i \neq j, \qquad P(E_{ii}\cap E_a) = \frac{A_{ii}(k)}{A(k)}\cdot\frac{A_{ii}(k)-1}{L_{ii}(k)-1}.$$
Such an edge is referred to as a back-edge in [43]. The change in active clones of different types in this joint event is as follows:
- If $i \neq j$: $Z_{ij}(k+1) = Z_{ji}(k+1) = -1$, and $Z_{lm}(k+1) = 0$ otherwise.
- If $i = j$: $Z_{ii}(k+1) = -2$, and $Z_{lm}(k+1) = 0$ otherwise.

(ii) $E_d$: the neighbor revealed is a sleeping clone belonging to a vertex of type $d$. The probability of this joint event is
$$P(E_{ij}\cap E_d) = \frac{A_{ij}(k)}{A(k)}\cdot\frac{d_i\,N_d^j(k)}{L_{ji}(k) - \delta_{ij}}.$$
The sleeping vertex to which the neighbor clone belongs now becomes active, and the change in the number of active clones of different types is governed by the type $d$ of this newly active vertex:
- If $i = j$: $Z_{ii}(k+1) = -2 + d_i$, $Z_{im}(k+1) = d_m$ for $m \neq i$, and $Z_{lm}(k+1) = 0$ otherwise.
- If $i \neq j$: $Z_{ij}(k+1) = -1$, $Z_{jm}(k+1) = d_m - \delta_{mi}$ for $m \in [p]$, and $Z_{lm}(k+1) = 0$ otherwise.

Note that the above events are exhaustive, i.e.,
$$\sum_{(i,j)\in S} P(E_{ij}\cap E_a) + \sum_{(i,j)\in S}\sum_d P(E_{ij}\cap E_d) = 1.$$

Case 2: $A(k) = 0$. In this case, we choose a sleeping clone at random and make it and all its siblings active. Let $E_{ij}$ be the event that the sleeping clone chosen was of type $(i,j)$, and let $E_d$ be the event that this clone belongs to a vertex of type $(i,d)$. Then we have
$$P(E_{ij}\cap E_d) = \frac{L_{ij}(k)}{L(k)}\cdot\frac{d_j\,N_d^i(k)}{L_{ij}(k)} = \frac{d_j\,N_d^i(k)}{L(k)}.$$
In this case the change in the number of active clones of different types is given by
$$Z_{im}(k+1) = d_m \ \text{ for } m \in S_i, \qquad Z_{lm}(k+1) = 0 \ \text{ otherwise}.$$

We emphasize here that there are two ways in which the evolution of the exploration process deviates from that of the edge-biased branching process. First, a back-edge can occur in the exploration process when the neighbor of an active clone is revealed to be another active clone. Second, the degree distribution of the exploration process is time dependent. However, close to the beginning of the process, these two events do not have a significant impact. We exploit this fact in the following sections to prove Theorems 3.2 and 3.3.
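The Case 1 transition can be stated compactly as code. The following minimal sketch (hypothetical bookkeeping; it samples only the change vector $Z(k+1)$ from the pre-step counts, not the full matching) mirrors the probabilities above:

import random
from collections import defaultdict

def sample_change(A, L, N):
    # A, L: dicts mapping clone types (i, j) to active / living counts;
    # N: dict mapping (j, d) to the number of sleeping vertices of type d
    # in G_j, with d a tuple. Assumes A(k) > 0 (Case 1).
    Z = defaultdict(int)
    # Step 2(a): the killed clone has type (i, j) w.p. A_ij(k)/A(k).
    (i, j), = random.choices(list(A), weights=list(A.values()))
    Z[(i, j)] -= 1
    # Step 2(b): the neighbor is a uniform living clone of type (j, i);
    # it is active (a back-edge) w.p. (A_ji - d_ij)/(L_ji - d_ij).
    excl = 1 if i == j else 0
    if random.random() < (A[(j, i)] - excl) / (L[(j, i)] - excl):
        Z[(j, i)] -= 1
    else:
        # Sleeping neighbor: a type-d vertex in G_j is hit with
        # probability proportional to d_i * N_d^j(k); it wakes up.
        types = [d for (part, d) in N if part == j and d[i] > 0]
        wts = [d[i] * N[(j, d)] for d in types]
        d = random.choices(types, weights=wts)[0]
        for m, dm in enumerate(d):
            Z[(j, m)] += dm - (1 if m == i else 0)
    return Z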
3.6 Supercritical Case

In this section we prove the first part of Theorem 3.2. To do this, we show that the number of active clones in the exploration process grows to a linear size with high probability; using this fact, we then prove the existence of a giant component. The idea behind the proof is as follows. We start the exploration process described in the previous section at an arbitrary vertex $v \in \mathcal{G}$. At the beginning of the exploration process, i.e., at $k = 0$, we have $N_d^i(0) = n\,p_d^i(n)$ and $L_{ij}(0) = n\,\lambda_{ij}(n)$. So, close to the beginning of the exploration, a clone of type $(i,j)$ gives rise to $d_m - \delta_{mi}$ clones of type $(j,m)$ with probability close to $d_i\,p_d^j(n)/\lambda_{ij}(n)$, which in turn is close to $d_i\,p_d^j/\lambda_{ij}$ for large enough $n$. If we consider the exploration process on a very small linear time scale, i.e., for $k \le \epsilon n$ for small enough $\epsilon$, then the quantities $d_i\,N_d^j(k)/(L_{ji}(k) - \delta_{ij})$ remain close to $d_i\,p_d^j/\lambda_{ij}$, and the effects of back-edges are negligible. We use this observation to construct a process which underestimates the exploration process in an appropriate sense, but whose parameters are time invariant and "close" to the initial degree distribution. We then use this somewhat easier-to-analyze process to prove our result.

We now get into the specific details of the proof. We define a stochastic process $B_{ij}(k)$, which we will couple with $A_{ij}(k)$ in such a way that $B_{ij}(k)$ underestimates $A_{ij}(k)$ with probability one. We denote the evolution in time of $B_{ij}(k)$ by
$$B_{ij}(k+1) = B_{ij}(k) + \tilde Z_{ij}(k+1), \qquad (i,j)\in S.$$
To define $\tilde Z_{ij}(k+1)$, we choose quantities $\pi_d^{ij}$ satisfying
$$\pi_d^{ij} \le \frac{d_i\,p_d^j}{\lambda_{ij}}, \qquad (3.5)$$
$$\sum_d \pi_d^{ij} = 1 - \gamma_0 > 0, \qquad (3.6)$$
for some $0 < \gamma_0 < 1$ to be chosen later. We now show that in a small time frame, the parameters associated with the exploration process do not change significantly from their initial values. This is made precise in Lemma 3.4 and Lemma 3.5 below. Before that, we first introduce some useful notation to describe these parameters for a given $n$ and at a given step $k$ of the exploration process. Let $M(n)$ denote the matrix of means defined analogously to $M$ by replacing $p_d^j$ with $p_d^j(n)$. Also, for fixed $n$, define $M_k(n)$ similarly by replacing $d_i\,p_d^j(n)/\lambda_{ij}(n)$ with $d_i\,N_d^j(k)/(L_{ji}(k)-\delta_{ij})$. Note that $M_0(n) = M(n)$, and from Assumption 3.1 it follows that $M(n) \to M$.

Lemma 3.4. Given $\delta > 0$, there exist $\epsilon > 0$ and an integer $\hat n$ such that for all $n > \hat n$ and all time steps $k \le \epsilon n$ of the exploration process we have
$$\sum_d \left|\frac{d_i\,N_d^j(k)}{L_{ji}(k)-\delta_{ij}} - \frac{d_i\,p_d^j}{\lambda_{ij}}\right| \le \delta.$$

Proof. Fix $\epsilon_1 > 0$. From Lemma 3.1 we have that the random variables $\mathbf{1}'D_{p(n)}$ are uniformly integrable. Then there exists $q \in \mathbb{Z}_+$ such that for all $n$ we have $\sum_d d_i\,p_d^j(n)\,\mathbf{1}\{\mathbf{1}'d > q\} \le \epsilon_1$. Since $N_d^j(k) \le n\,p_d^j(n)$, for each time step $k \le \epsilon n$ of the exploration process we also have $\sum_d \mathbf{1}\{\mathbf{1}'d > q\}\, d_i\,N_d^j(k)/n \le \epsilon_1$. Additionally, $L_{ji}(k)$ can change by at most two at each step while $L_{ji}(0) = n\,\lambda_{ji}(n) = \Theta(n)$, and each $N_d^j(k)$ changes by at most one per step. Since $q$ is a constant, by choosing $\epsilon$ and $\epsilon_1$ small enough we can therefore ensure that
$$\sum_d \left|\frac{d_i\,N_d^j(k)}{L_{ji}(k)-\delta_{ij}} - \frac{d_i\,p_d^j(n)}{\lambda_{ij}(n)}\right| \le \delta/2 \qquad (3.7)$$
for all $k \le \epsilon n$. Additionally, from Assumption 3.1, for large enough $n$ we have $\sum_d |d_i\,p_d^j(n)/\lambda_{ij}(n) - d_i\,p_d^j/\lambda_{ij}| \le \delta/2$. The lemma follows by combining the above inequalities. □
Lemma 3.5. Given $\delta > 0$, there exist $\epsilon > 0$ and an integer $\hat n$ such that for all $n > \hat n$ and all time steps $k \le \epsilon n$ of the exploration process we have $\|M_k(n) - M\| \le \delta$.

Proof. The argument is very similar to the proof of Lemma 3.4. Fix $\epsilon_1 > 0$. From Lemma 3.1 we know that the random variables $(\mathbf{1}'D_{p(n)})^2$ are uniformly integrable. It follows that there exists $q \in \mathbb{Z}_+$ such that for all $n$ we have $E[(\mathbf{1}'D_{p(n)})^2\,\mathbf{1}\{\mathbf{1}'D_{p(n)} > q\}] \le \epsilon_1$. From this we can conclude that for all $i, j, m$ we have $\sum_d (d_m - \delta_{mi})\,d_i\,p_d^j(n)\,\mathbf{1}\{\mathbf{1}'d > q\} \le \epsilon_1$. Since $N_d^j(k) \le n\,p_d^j(n)$, we have
$$\sum_d (d_m - \delta_{mi})\,\frac{d_i\,N_d^j(k)}{n}\,\mathbf{1}\{\mathbf{1}'d > q\} \le \epsilon_1. \qquad (3.8)$$
Also, $L_{ji}(k)$ can change by at most $2$ at each step, so by at most $2\epsilon n$ by time $\epsilon n$. For small enough $\epsilon$ and $\epsilon_1$, by an argument similar to the proof of Lemma 3.4, we can prove, analogously to (3.7), that
$$\sum_d \mathbf{1}\{\mathbf{1}'d > q\}\,(d_m - \delta_{mi})\left|\frac{d_i\,N_d^j(k)}{L_{ji}(k)-\delta_{ij}} - \frac{d_i\,p_d^j(n)}{\lambda_{ij}(n)}\right| \le \frac{\delta}{3}. \qquad (3.9)$$
By choosing $\epsilon$ small enough, we can also ensure
$$\sum_d \mathbf{1}\{\mathbf{1}'d \le q\}\,(d_m - \delta_{mi})\left|\frac{d_i\,N_d^j(k)}{L_{ji}(k)-\delta_{ij}} - \frac{d_i\,p_d^j(n)}{\lambda_{ij}(n)}\right| \le \frac{\delta}{3}. \qquad (3.10)$$
Since $M(n)$ converges to $M$, we can choose $\hat n$ such that $\|M(n) - M\| \le \delta/3$. By combining the last three inequalities, the proof is complete. □

Lemma 3.6. Given any $0 < \gamma_0 < 1$, there exist $\epsilon > 0$, an integer $\hat n \in \mathbb{Z}$ and quantities $\pi_d^{ij}$ satisfying (3.5) and (3.6) together with the following conditions for all $n > \hat n$:
(a) For each time step $k \le \epsilon n$,
$$\pi_d^{ij} \le \frac{d_i\,N_d^j(k)}{L_{ji}(k) - \delta_{ij}} \qquad (3.11)$$
for each $(i,j) \in S$.
(b) The matrix $\tilde M$, defined analogously to $M$ in (3.2) with $d_i\,p_d^j/\lambda_{ij}$ replaced by $\pi_d^{ij}$, satisfies
$$\|\tilde M - M\| \le \mathrm{err}(\gamma_0), \qquad (3.12)$$
where $\mathrm{err}(\gamma_0)$ is a term satisfying $\lim_{\gamma_0 \to 0} \mathrm{err}(\gamma_0) = 0$.

Proof. Choose $q = q(\gamma_0) \in \mathbb{Z}_+$ such that $\sum_d \mathbf{1}\{\mathbf{1}'d > q\}\,d_i\,p_d^j/\lambda_{ij} \le \gamma_0/2$. Now choose $\pi_d^{ij}$ satisfying (3.5) and (3.6) such that $\pi_d^{ij} = 0$ whenever $\mathbf{1}'d > q$. Using Lemma 3.4, we can choose $\hat n$ and $\epsilon$ such that for every $(i,j) \in S$ and every $d$ with $\mathbf{1}'d \le q$, (3.11) is satisfied for all $n > \hat n$ and all $k \le \epsilon n$. The condition in part (a) is thus satisfied by this choice of $\pi_d^{ij}$.

For any given $\gamma_0$, denote the choice of $\pi_d^{ij}$ made above by $\pi_d^{ij}(\gamma_0)$. By construction, whenever $M_{ij,jm} = 0$ we also have $\tilde M_{ij,jm} = 0$. Suppose $M_{ij,jm} = \frac{1}{\lambda_{ij}}\sum_d (d_m - \delta_{mi})\,d_i\,p_d^j > 0$. By construction we have $0 \le \pi_d^{ij}(\gamma_0) \le d_i\,p_d^j/\lambda_{ij}$ and $\pi_d^{ij}(\gamma_0) \to d_i\,p_d^j/\lambda_{ij}$ as $\gamma_0 \to 0$. Let $X_{\gamma_0}$ be the random variable that takes the value $(d_m - \delta_{mi})$ with probability $\pi_d^{ij}(\gamma_0)$ and $0$ with probability $\gamma_0$. Similarly, let $X$ be the random variable that takes the value $(d_m - \delta_{mi})$ with probability $d_i\,p_d^j/\lambda_{ij}$. Then, from the above argument, $X_{\gamma_0} \to X$ as $\gamma_0 \to 0$, and the random variable $X$ dominates the random variable $X_{\gamma_0}$ for all $\gamma_0 > 0$. Note that $X$ is integrable. The proof of part (b) is now complete by the Dominated Convergence Theorem. □

Assume that the quantities $\epsilon$ and $\pi_d^{ij}$ have been chosen to satisfy the inequalities (3.11) and (3.12). We now consider each of the events that can occur at each step of the exploration process until time $\epsilon n$ and describe the coupling between $Z_{ij}(k+1)$ and $\tilde Z_{ij}(k+1)$ in each case.

Case 1: $A(k) > 0$. Suppose the event $E_{ij}$ happens. We describe the coupling in case of each of the following two events.

(i) $E_a$: the neighbor revealed is an active clone. In this case we simply mimic the evolution of the number of active clones in the original exploration process; namely, $\tilde Z_{lm}(k+1) = Z_{lm}(k+1)$ for all $l, m$.

(ii) $E_d$: the neighbor revealed is a sleeping clone belonging to a vertex of type $d$. In this case, we split the event further into two events $E_d^0$ and $E_d^1$, that is, $E_d^0 \cup E_d^1 = E_d$ and $E_d^0 \cap E_d^1 = \emptyset$. In particular,
$$P(E_d^0 \mid E_{ij}\cap E_d) = \pi_d^{ij}\,\frac{L_{ji}(k)-\delta_{ij}}{d_i\,N_d^j(k)}, \qquad P(E_d^1 \mid E_{ij}\cap E_d) = 1 - P(E_d^0 \mid E_{ij}\cap E_d).$$
For the above to make sense we must have $\pi_d^{ij} \le d_i\,N_d^j(k)/(L_{ji}(k)-\delta_{ij})$, which is guaranteed by our choice of $\pi_d^{ij}$. We describe the evolution of $B_{ij}(k)$ in each of the two cases.
(a) $E_d^0$: in this case set $\tilde Z_{lm}(k+1) = Z_{lm}(k+1)$ for all $l, m$.
(b) $E_d^1$: in this case, we mimic the evolution of the active clones under the event $E_a$ instead of $E_d$. More specifically:
- If $i = j$: $\tilde Z_{ii}(k+1) = -2$, and $\tilde Z_{lm}(k+1) = 0$ otherwise.
- If $i \neq j$: $\tilde Z_{ij}(k+1) = \tilde Z_{ji}(k+1) = -1$, and $\tilde Z_{lm}(k+1) = 0$ otherwise.

Case 2: $A(k) = 0$. Suppose that the event $E_{ij}\cap E_d$ happens. In this case we split $E_d$ into two disjoint events $E_d^0$ and $E_d^1$ such that
$$P(E_d^0 \mid E_{ij}\cap E_d) = \pi_d^{ji}\,\frac{L_{ij}(k)-\delta_{ij}}{d_j\,N_d^i(k)}, \qquad P(E_d^1 \mid E_{ij}\cap E_d) = 1 - P(E_d^0 \mid E_{ij}\cap E_d).$$
Again, the probabilities above are guaranteed to be at most one for time $k \le \epsilon n$ because of the choice of $\pi_d^{ij}$. The change in $B_{ij}(k+1)$ in case of each of the above events is defined as follows.

(a) $E_d^0$:
- If $i = j$: $\tilde Z_{ii}(k+1) = -2 + d_i$, $\tilde Z_{im}(k+1) = d_m$ for $m \neq i$, and $\tilde Z_{lm}(k+1) = 0$ for $l \neq i$.
- If $i \neq j$: $\tilde Z_{ji}(k+1) = -1$, $\tilde Z_{im}(k+1) = d_m - \delta_{mj}$ for $m \in [p]$, and $\tilde Z_{lm}(k+1) = 0$ otherwise.

(b) $E_d^1$:
- If $i = j$: $\tilde Z_{ii}(k+1) = -2$, and $\tilde Z_{lm}(k+1) = 0$ otherwise.
- If $i \neq j$: $\tilde Z_{ij}(k+1) = \tilde Z_{ji}(k+1) = -1$, and $\tilde Z_{lm}(k+1) = 0$ otherwise.

This completes the description of the probability distribution of the joint evolution of the processes $A_{ij}(k)$ and $B_{ij}(k)$. Intuitively, we are decreasing the probability of the events that actually help the growth of the component and compensating by increasing the probability of the event which hampers the growth of the component (back-edges). From the description of the coupling between $Z_{ij}(k+1)$ and $\tilde Z_{ij}(k+1)$, it can be seen that for time $k \le \epsilon n$, with probability one we have $B_{ij}(k) \le A_{ij}(k)$. Our next goal is to show that for some $(i,j) \in S$ the quantity $B_{ij}(k)$ grows to a linear size by time $\epsilon n$. Let $H(k) = \sigma(\{A_{ij}(r), B_{ij}(r),\ (i,j)\in S,\ 1 \le r \le k\})$ denote the filtration of the joint exploration process up to time $k$. Then the expected conditional change in $B_{ij}(k)$ can be computed by considering the two cases above. First suppose that at time step $k$ we have $A(k) > 0$, i.e., we are in Case 1. The only events that affect $\tilde Z_{ij}(k+1)$ are $E_{ij}$ and $E_{mi}$ for $m \in [p]$. Then
$$E[\tilde Z_{ij}(k+1)\mid H(k)] = P(E_{ij}\mid H(k))\,E[\tilde Z_{ij}(k+1)\mid H(k), E_{ij}] + \sum_m P(E_{mi}\cap E_a\mid H(k))\,E[\tilde Z_{ij}(k+1)\mid H(k), E_{mi}\cap E_a] + \sum_{m,d} P(E_{mi}\cap E_d^0\mid H(k))\,E[\tilde Z_{ij}(k+1)\mid H(k), E_{mi}\cap E_d^0] + \sum_{m,d} P(E_{mi}\cap E_d^1\mid H(k))\,E[\tilde Z_{ij}(k+1)\mid H(k), E_{mi}\cap E_d^1]. \qquad (3.13)$$
The event $E_{mi}\cap E_a$ affects $\tilde Z_{ij}(k+1)$ only when $m = j$, in which case $\tilde Z_{ij}(k+1) = -1$; the same is true of the event $E_{mi}\cap E_d^1$. Under the event $E_{mi}\cap E_d^0$ we have $\tilde Z_{ij}(k+1) = d_j - \delta_{jm}$. Using this, and summing the probabilities of the two events contributing $-1$ (whose total conditional probability, by (3.6), is $\gamma_0$ given $E_{ji}$), the above expression equals
$$E[\tilde Z_{ij}(k+1)\mid H(k)] = -\frac{A_{ij}(k)}{A(k)} - \gamma_0\,\frac{A_{ji}(k)}{A(k)} + \sum_m \frac{A_{mi}(k)}{A(k)}\sum_d \pi_d^{mi}\,(d_j - \delta_{jm}),$$
where the last equality follows from (3.6). An identical computation shows that the same expression holds for $i = j$ (where the first two terms coincide on the diagonal entry), and for Case 2, $A(k) = 0$, the same expression holds with the weights $L_{ij}(k)/L(k)$ in place of $A_{ij}(k)/A(k)$.
Define the vector of expected changes $E[\tilde Z(k+1)\mid H(k)] \triangleq (E[\tilde Z_{ij}(k+1)\mid H(k)],\ (i,j)\in S)$. Also define $\Lambda(k) = (A_{ij}(k)/A(k),\ (i,j)\in S)$ if $A(k) > 0$ and $\Lambda(k) = (L_{ij}(k)/L(k),\ (i,j)\in S)$ if $A(k) = 0$. Let $Q \in \mathbb{R}^{N\times N}$ be given by
$$Q_{ij,ji} = 1 \ \text{ for } (i,j)\in S, \qquad Q_{ij,lm} = 0 \ \text{ otherwise}.$$
Then we can write the expected change of $B_{ij}(k)$ compactly as
$$E[\tilde Z(k+1)\mid H(k)] = (\tilde M - \gamma_0 Q - I)\,\Lambda(k). \qquad (3.14)$$

Fix $\delta > 0$. Let $\gamma_0$ be small enough that the function $\mathrm{err}(\gamma_0)$ in (3.12) satisfies $\mathrm{err}(\gamma_0) \le \delta$. Using Lemma 3.6 we can choose $\epsilon$ and $\pi_d^{ij}$ satisfying (3.11) and (3.12); in particular, we have $\|\tilde M - M\| \le \delta$. For small enough $\delta$, both $M$ and $\tilde M$ have strictly positive entries in exactly the same locations. Since $M$ is irreducible, it follows that $\tilde M$ is irreducible. The Perron-Frobenius eigenvalue of a matrix, which is its spectral radius, is a continuous function of its entries; so for small enough $\delta$, the Perron-Frobenius eigenvalue of $\tilde M$ is bigger than $1$, say $1 + 2C$ for some $C > 0$. Let $z$ be the corresponding left eigenvector with all positive entries, and let $z_m = \min_{(i,j)\in S} z_{ij}$ and $z_M \triangleq \max_{(i,j)\in S} z_{ij}$. Define the random process
$$W(k) \triangleq \sum_{(i,j)\in S} z_{ij}\,B_{ij}(k). \qquad (3.15)$$
Then, setting $\Delta W(k+1) = W(k+1) - W(k)$, from (3.14) we have
$$E[\Delta W(k+1)\mid H(k)] = z'\,E[\tilde Z(k+1)\mid H(k)] = z'(\tilde M - I - \gamma_0 Q)\,\Lambda(k) = 2C\,z'\Lambda(k) - \gamma_0\,z'Q\Lambda(k).$$
The first term satisfies $2C\,z'\Lambda(k) \ge 2C z_m$. This is because $\mathbf{1}'\Lambda(k) = 1$, and hence $z'\Lambda(k)$ is a convex combination of the entries of $z$. By choosing $\gamma_0$ small enough, we can ensure $\gamma_0\,z'Q\Lambda(k) \le C z_m$. Let $\kappa = C z_m > 0$. Then we have
$$E[\Delta W(k+1)\mid H(k)] \ge \kappa. \qquad (3.16)$$
We now use a one-sided Hoeffding bound argument to show that with high probability the quantity $W(k)$ grows to a linear size by time $\epsilon n$; a toy simulation of this drift argument appears below.
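The content of the drift bound can be previewed numerically. The following toy simulation (all numeric values illustrative; the real increments are $\Delta W(k+1)$, bounded in magnitude by a multiple of $\omega(n)$) checks that a bounded-increment process with conditional drift at least kappa rarely ends below kappa*steps/2:

import random

kappa, steps, trials = 0.5, 10_000, 100   # illustrative constants only
failures = 0
for _ in range(trials):
    W = 0.0
    for _ in range(steps):
        # Bounded increments with mean kappa, mimicking (3.16).
        W += random.uniform(kappa - 2.0, kappa + 2.0)
    if W < kappa * steps / 2:
        failures += 1
print(failures / trials)   # tends to 0 as steps grows, as Hoeffding predicts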
Let $X(k+1) = \kappa - \Delta W(k+1)$. Then
$$E[X(k+1)\mid H(k)] \le 0. \qquad (3.17)$$
Also note that $|X(k+1)| \le c\,\omega(n)$ almost surely, for some constant $c > 0$. For any $B > 0$ and any $-B \le x \le B$, it can be verified that
$$e^{tx} \le \frac{e^{tB} + e^{-tB}}{2} + \frac{e^{tB} - e^{-tB}}{2B}\,x \le e^{\frac{t^2B^2}{2}} + \frac{e^{tB} - e^{-tB}}{2B}\,x.$$
Using the above, we get for any $t > 0$,
$$E[e^{tX(k+1)}\mid H(k)] \le e^{\frac{t^2c^2\omega(n)^2}{2}},$$
where the last statement follows from (3.17). We can now compute
$$E\left[e^{t\sum_{k=0}^{\epsilon n - 1} X(k+1)}\right] = E\left[e^{t\sum_{k=0}^{\epsilon n - 2} X(k+1)}\,E[e^{tX(\epsilon n)}\mid H(\epsilon n - 1)]\right] \le \cdots \le e^{\frac{t^2c^2\omega(n)^2\epsilon n}{2}}.$$
So, by the Markov inequality,
$$P\left(\sum_{k=0}^{\epsilon n - 1} X(k+1) \ge \kappa\epsilon n/2\right) \le e^{-t\kappa\epsilon n/2 + \frac{t^2c^2\omega(n)^2\epsilon n}{2}}.$$
Optimizing over $t$, we get
$$P\left(\sum_{k=0}^{\epsilon n - 1} X(k+1) \ge \kappa\epsilon n/2\right) \le e^{-\frac{\kappa^2\epsilon n}{8c^2\omega(n)^2}} = o(1),$$
which follows by using Lemma 3.2. Substituting the definition of $X(k+1)$,
$$P\left(W(\epsilon n) \le \kappa\epsilon n/2\right) = o(1). \qquad (3.18)$$
Recall that $W(k) = \sum_{(i,j)\in S} z_{ij}\,B_{ij}(k) \le N z_M \max_{(i,j)\in S} B_{ij}(k) \le N z_M \max_{(i,j)\in S} A_{ij}(k)$. Define $\eta \triangleq \frac{\kappa\epsilon}{2Nz_M}$. Then it follows from (3.18) that there exists a pair $(i',j')$ such that $A_{i'j'}(\epsilon n) \ge \eta n$ with probability $1 - o(1)$.

Using the fact that the number of active clones grows to a linear size, we now show that the corresponding component is of linear size. To do this, we continue the exploration process in a modified fashion from time $\epsilon n$ onwards. By this we mean that instead of choosing active clones uniformly at random in step 2(a) of the exploration process, we now follow a more specific order in which we choose the active clones and then reveal their neighbors. This is still a valid way of continuing the exploration process. The main technical result required for this purpose is Lemma 3.7 below.

Lemma 3.7. Suppose that after $\epsilon n$ steps of the exploration process we have $A_{i'j'}(\epsilon n) \ge \eta n$ for some pair $(i',j')$. Then there exist $\epsilon_1 > \epsilon$ and $\delta_1 > 0$ for which we can continue the exploration process in a modified way, by altering the order in which active clones are chosen in step 2(a) of the exploration process, such that at time $\epsilon_1 n$, w.h.p. for all $(i,j)\in S$, we have $A_{ij}(\epsilon_1 n) \ge \delta_1 n$.

The above lemma says that we can get to a point in the exploration process at which there are linearly many active clones of every type. An immediate consequence of this is Corollary 3.1 below. We remark here that Corollary 3.1 is merely one of the consequences of Lemma 3.7 and can be proved in a much simpler way; but as we will see later, we need the full power of Lemma 3.7 to prove Theorem 3.2(b).

Corollary 3.1. Suppose that after $\epsilon n$ steps of the exploration process we have $A_{i'j'}(\epsilon n) \ge \eta n$ for some pair $(i',j')$. Then there exists $\delta_2 > 0$ such that w.h.p. the neighbors of the $A_{i'j'}$ clones include at least $\delta_2 n$ vertices in $G_{j'}$.

Before proving Lemma 3.7, we state a well-known result. The proof can be obtained by standard large deviation techniques; we omit it.

Lemma 3.8. Fix $m$. Suppose there are $n$ objects consisting of $a_i n$ objects of type $i$ for $1 \le i \le m$. Let $\beta > 0$ be a constant that satisfies $\beta < \max_i a_i$. Suppose we pick $\beta n$ objects at random from these $n$ objects without replacement. Then for given $\epsilon' > 0$ there exists $\zeta_0 = \zeta_0(\epsilon', m) > 0$ such that
$$P\left(\left|\frac{\#\text{objects chosen of type } i}{n} - \beta a_i\right| \ge \epsilon'\right) \le e^{-\zeta_0 n}.$$

Proof of Lemma 3.7. The proof relies on the fact that the matrix $M$ is irreducible. If we denote the underlying directed graph associated with $M$ by $\mathcal{H}$, then $\mathcal{H}$ is strongly connected. We consider the subgraph $\mathcal{H}'$ of $\mathcal{H}$ which is the shortest-path tree of $\mathcal{H}$ rooted at the node $(i',j')$, and traverse $\mathcal{H}'$ breadth first. Let $\bar d$ be the depth of $\mathcal{H}'$. We continue the exploration process from this point in $\bar d$ stages $1, 2, \ldots, \bar d$. Stage 1 begins right after time $\epsilon n$; denote the time at which stage $l$ ends by $\epsilon_l n$. For convenience, we introduce a base stage 0, which includes all events until time $\epsilon n$. For $1 \le l \le \bar d$, let $\mathcal{I}_l$ be the set of nodes $(i,j)$ at depth $l$ in $\mathcal{H}'$, and let $\mathcal{I}_0 = \{(i',j')\}$. We will prove by induction that for $l = 0, 1, \ldots, \bar d$ there exists $\delta^{(l)} > 0$ such that at the end of stage $l$ we have, w.h.p., $A_{ij} \ge \delta^{(l)} n$ for each $(i,j) \in \bigcup_{r=0}^{l}\mathcal{I}_r$. Note that at the end of stage 0 we have w.h.p. $A_{i'j'} \ge \eta n$, so we can choose $\delta^{(0)} = \eta$ to satisfy the base case of the induction.

Suppose $|\mathcal{I}_l| = r$. Stage $l+1$ consists of $r$ substages, namely $(l+1,1), (l+1,2), \ldots, (l+1,r)$, where each substage addresses exactly one $(i,j) \in \mathcal{I}_l$. We start stage $(l+1,1)$ by considering any $(i,j) \in \mathcal{I}_l$. We reveal the neighbors of $a\,\delta^{(l)} n$ clones among the $A_{ij} \ge \delta^{(l)} n$ active clones one by one, where $0 < a < 1$ is a constant that we describe shortly. The evolution of active clones in each of these $a\,\delta^{(l)} n$ steps is identical to that under the event $E_{ij}$ in Case 1 of the original exploration process. Fix any child $(j,m)$ of $(i,j)$ in $\mathcal{H}'$; note that $\mu_{ij,jm} > 0$ by construction of $\mathcal{H}'$. So, by making $\epsilon$ and $\epsilon_1, \ldots, \epsilon_l$ smaller if necessary and choosing $a$ small enough, we can conclude using Lemma 3.5 that for all time steps $k \le \epsilon_l n + a\,\delta^{(l)} n$ we have $\|M_k(n) - M\| \le \bar\delta$ for any $\bar\delta > 0$. Similarly, by using Lemma 3.4, we get
$$\sum_d \left|\frac{d_i\,N_d^j(k)}{L_{ji}(k)-\delta_{ij}} - \frac{d_i\,p_d^j}{\lambda_{ij}}\right| \le \bar\delta. \qquad (3.19)$$
By referring to the description of the exploration process for the event $E_{ij}$ in Case 1, the expected change in $Z_{jm}(k+1)$ during stage $(l+1,1)$ can be computed, similarly to (3.13), as
$$E[Z_{jm}(k+1)\mid H(k)] \overset{(a)}{\ge} (M_k(n))_{ij,jm} - \bar\delta \overset{(b)}{\ge} \mu_{ij,jm} - 2\bar\delta \ge \bar\delta,$$
where $(a)$ follows from (3.19) and $(b)$ can be guaranteed by choosing $\bar\delta$ small enough. The above argument can be repeated for each $(j,m) \in \mathcal{I}_{l+1}$. We now have all the ingredients we need to repeat the one-sided Hoeffding inequality argument from earlier in this section. We can then conclude that there exists $\delta_1^{(l+1)} > 0$ such that w.h.p. we have at least $\delta_1^{(l+1)} n$ active clones of type $(j,m)$ by the end of stage $(l+1,1)$. By the same argument, this is also true for all children of $(i,j)$ in $\mathcal{H}'$. Before starting the next substage we set $\delta^{(l+1)} = \min\{(1-a)\,\delta^{(l)},\ \delta_1^{(l+1)}\}$. This makes sure that at every substage of stage $l+1$ we have at least $\delta^{(l+1)} n$ clones of each kind that has been considered before, which enables us to use the same argument for all substages of stage $l+1$. By continuing in this fashion, we can conclude that at the end of stage $l+1$ we have $\delta^{(l+1)} n$ clones of each type $(i,j) \in \bigcup_{r=0}^{l+1}\mathcal{I}_r$, for an appropriately defined $\delta^{(l+1)}$. The proof is now complete by induction. □

Proof of Corollary 3.1. Consider any $j \in [p]$. We will prove that the giant component has linearly many vertices in $G_j$ with high probability. Let $d$ be such that $p_d^j > 0$ and $d_i > 0$ for some $i \in [p]$. This means that in the configuration model, each of these type-$d$ vertices has at least one clone of type $(j,i)$. Continue the exploration process as in Lemma 3.7. For small enough $\epsilon_1$, a constant fraction of the clones of type $(j,i)$ is still unused at time $\epsilon_1 n$. From Lemma 3.7, with high probability we have at least $\delta_1 n$ active clones of type $(i,j)$ at this point. Proceed by simply revealing the neighbors of each of these. From Lemma 3.8, it follows that with high probability we will cover at least a constant fraction of the unused $(j,i)$ clones, which corresponds to a linear number of vertices covered. Each of these vertices is in the giant component, and the proof is now complete. □

We now prove part (b) of Theorem 3.2; part (a) will be proved in the next section. We use the argument of Molloy and Reed, except that in the multipartite case we need the help of Lemma 3.7 to complete the argument.

Proof of Theorem 3.2(b). Consider two vertices $u, v \in \mathcal{G}$. We will upper bound the probability that $u$ lies in the component $C$ being explored at time $\epsilon_1 n$ while $v$ lies in a component other than $C$ of size bigger than $\beta\log n$. To do so, start the exploration process at $u$ and proceed to the time step $\epsilon_1 n$ of the statement of Lemma 3.7. At this time we are in the midst of revealing the component $C$. This may not be the component of $u$, because we may have restarted the exploration process at the "Initialization" step at some time between $0$ and $\epsilon_1 n$; if it is not the component of $u$, then $u$ does not lie in $C$. So let us assume that we are indeed exploring the component of $u$. At this point, continue the exploration process in a different way by switching to revealing the component of $v$. For $v$ to lie in a component of size greater than $\beta\log n$, the number of active clones in the exploration process associated with the component of $v$ must remain positive for each of the first $\beta\log n$ steps. At each step, choices of neighbors are made uniformly at random; also, from Lemma 3.7, $C$ has at least $\delta_1 n$ active clones of each type.
For the component of $v$ to be distinct from the component of $u$, each of these choices must avoid all of the active clones of the component of $u$. It follows that the probability of this event is bounded above by $(1-\delta_1)^{\beta\log n}$. For large enough $\beta$, this gives
$$P\big(C(u) = C,\ C(v) \neq C,\ |C(v)| \ge \beta\log n\big) = o(n^{-2}).$$
Using a union bound over all pairs of vertices $u$ and $v$ completes the proof. □

3.7 Size of the Giant Component

In this section we complete the proof of Theorem 3.2(a) regarding the size of the giant component. For the unipartite case, the first result regarding the size of the giant component was obtained by Molloy and Reed [44] using Wormald's results [55] on differential equation approximations for random processes. As with previous results for the unipartite case, we show that the size of the giant component as a fraction of $n$ is concentrated around the survival probability of the edge-biased branching process. We do this in two steps. First, we show that the probability that a given vertex $v$ lies in the giant component is approximately equal to the probability that the edge-biased branching process with $v$ as its root grows to infinity; linearity of expectation then shows that the expected fraction of vertices in the giant component equals this probability. Second, we prove a concentration result around this expected value to complete the proof of Theorem 3.2. These statements are proved formally in Lemma 3.10. Before we go into the details of the proof, we first prove a lemma which is a very widely used application of Azuma's inequality.

Lemma 3.9. Let $X = (X_1, X_2, \ldots, X_t)$ be a vector-valued random variable and let $f(X)$ be a function defined on $X$. Let $\mathcal{F}_k \triangleq \sigma(X_1, \ldots, X_k)$. Assume that
$$\big|E[f(X)\mid\mathcal{F}_k] - E[f(X)\mid\mathcal{F}_{k+1}]\big| \le c$$
almost surely. Then
$$P\big(|f(X) - E[f(X)]| \ge s\big) \le 2\,e^{-\frac{s^2}{2tc^2}}.$$

Proof. The proof of this lemma is a standard martingale argument; we include it here for completeness. Define the random variables $Y_0, \ldots, Y_t$ by $Y_k = E[f(X)\mid\mathcal{F}_k]$. The sequence $\{Y_k\}$ is a martingale and $|Y_k - Y_{k+1}| \le c$ almost surely. Also, $Y_t = f(X)$ and $Y_0 = E[f(X)]$. The lemma then follows by applying Azuma's inequality to the martingale sequence $\{Y_k\}$. □
(k) -ij +di Nj (k +1) + (k) -6j dt Njd(k +1) L4 (k+1) -6 From the explanation above, the first term is O(w(n)/n) and the second term is . From this we can conclude by using a telescopic 0(1/n). Recall th- sum and triangle inequality that for time index k < 8 log n, di N'd (k) _ pd ch(n) = (kw(n)/n) = O(w(n) log n/n). , L3dL(k) - 6j Ai(n) So the total variational distance between the distribution of the exploration process and the branching process at each of the first 8 log n steps is 0(w(n) log n/n). We now describe the coupling between the branching process and the exploration process. For the first time step, note that the root of Tn has type (i, d) with probability pd. We can couple this with the exploration process by letting the vertex awakened in the "Initialization step" of the exploration process to be of type (i, d). Since the two probabilities- are the same, this step of the coupling succeeds with probability one. Suppose that we have defined the coupling until time k < 8log n. To describe the coupling at time step k + 1 we need to consider the case of two events. The first is the event when the coupling has succeeded until time k, i.e., the two processes are identical. In this case, since the total variational distance between the parameters of the two processes is O(w(n) log n/n) we perform a maximal coupling, i.e., a coupling which fails with probability equal to the total variational distance. For our purposes, we do not need to describe the coupling at time 90 k + 1 in the event that the coupling has failed at some previous time step. The probability that the coupling succeeds at each of the first # log n steps is at least (1 - O(w(n)logn/n))'O'og = 1- O(w(n)(logn)2 /n) = 1-o(1). We have shown that the coupling succeeds till time 8 log n with high probability. Assume that it indeed succeeds. In that case the component explored thus far is a tree. Therefore, at every step of the exploration process a sleeping vertex is awakened because otherwise landing on an active clone will result in a cycle. This means if the branching process has survived up until this point, the corresponding exploration process has also survived until this time and the component revealed has at least 6 log n vertices. Hence, P(IC(v)I > 8#log n) = P(ITrI > 3 log n) + o(1). But Theorem 3.2 (b) states that with high probability, there is only one component of size greater than #log n, which is the giant component, i.e., P(v E C) = P(IC(v)I > 8#log n) +o(1) = P(I TI > 83logn)+ o(1). So, for large enough n, we have IP(v E C) - P(7}j > fllogn) e/2. The survival probability of the branching process T is given by P(ITI = Choose K large enough such that n)-+ 0)=1 IP(IT| - P(T = i). > K) - P(|T| = oo)j < e/4. Also, since for all i, j, d, from the theory of branching processes, for large enough 91 jP(TnhI K) - P(1TI 1P(jTj = 00) - P(TJ K)l = Since for large enougn n, we have P(TnI = oo) e/4, oo)I < E/2. P(T-I > ,logn) 5 P(IT. K), E the proof follows by combining the above statements. Now what is left is to show that the size of the giant component concentrates around its expected value. Proof of Theorem 3.2 (a) - (size of the giant component). From the first two parts of Theorem 3.2, with high probability we can categorize all the vertices of g into two parts, those which lie in the giant component, and those which lie in a component of size smaller than 03logn, i.e., in small components. The expected value of the fraction of vertices in small components is 1- ' + o(1). 
We will now show that the fraction of vertices in small components concentrates around this mean. Recall that the total number of edges in the configuration model is $cn$, where the constant $c$ is determined by the degree distribution. Consider the random process in which the edges of the configuration model are revealed one by one; each edge corresponds to a matching between clones. Let $E_i$, $1\le i\le cn$, denote the (random) edges, and let $N_S$ denote the number of vertices in small components, i.e., in components of size smaller than $\beta\log n$. We wish to apply Lemma 3.9 to obtain the desired concentration result, for which we need to bound $|E[N_S\mid E_1,\dots,E_k] - E[N_S\mid E_1,\dots,E_{k+1}]|$.

In the term $E[N_S\mid E_1,\dots,E_{k+1}]$, let $E_{k+1}$ be the edge $(x,y)$; the expectation is taken over all possible outcomes of the rest of the edges with $E_{k+1}$ fixed to be $(x,y)$. In the first term $E[N_S\mid E_1,\dots,E_k]$, after $E_1,\dots,E_k$ are revealed the expectation is taken over the rest of the edges, which are chosen uniformly at random among all possible edges; all outcomes are equally likely. We construct a mapping from each possible outcome to an outcome that has $E_{k+1} = (x,y)$. In particular, if the outcome contains the edge $(x,y)$, we map it to the corresponding outcome with $E_{k+1} = (x,y)$ by simply cross-switching the position of $(x,y)$ with the edge that occurred at step $k+1$. This does not change the value of $N_S$, because $N_S$ does not depend on the order in which the matching is revealed. On the other hand, if the outcome does not contain $(x,y)$, we map it to one of the outcomes with $E_{k+1} = (x,y)$ by switching the two edges connected to the vertices $x$ and $y$.

We claim that switching two edges in the configuration model can change $N_S$ by at most $4\beta\log n$. To see why, observe that we can split the process of cross-switching two edges into four steps: in the first two steps we delete the two edges one by one, and in the next two steps we put them back one by one in the switched positions. Deleting an edge can increase $N_S$ by at most $2\beta\log n$ and can never reduce $N_S$; adding an edge can decrease $N_S$ by at most $2\beta\log n$ and can never increase it. So cross-switching can change $N_S$ by at most $4\beta\log n$. Using this we conclude
$$\big|E[N_S\mid E_1,\dots,E_k] - E[N_S\mid E_1,\dots,E_{k+1}]\big| \le 4\beta\log n.$$
We now apply Lemma 3.9, with $t = cn$ steps and increment bound $4\beta\log n$, to obtain
$$P\Big(\Big|\frac{N_S}{n} - (1-q)\Big| > \epsilon\Big) \le 2e^{-\frac{\epsilon^2 n}{32c\beta^2\log^2 n}} = o(1).$$
Since with high probability the number of vertices in the giant component is $n - N_S$, the above concentration result completes the proof. $\square$
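The concentration arguments above operate on the configuration model's edge-revelation process. For concreteness, here is a minimal Python sketch of that sampler: a configuration-model multigraph is obtained by pairing clones (half-edges) uniformly at random, and the matching naturally yields the edges $E_1, E_2, \dots$ in a random order. The function name and the degree sequence are illustrative; the thesis's multipartite version additionally tracks clone types.

```python
import random

def configuration_model(degrees, seed=0):
    """Sample a multigraph with the given degree sequence by pairing
    clones (half-edges) uniformly at random; returns the edges in the
    (random) order in which the matching reveals them."""
    rng = random.Random(seed)
    clones = [v for v, d in enumerate(degrees) for _ in range(d)]
    assert len(clones) % 2 == 0, "total degree must be even"
    rng.shuffle(clones)  # a uniform perfect matching on the clones
    return [(clones[2 * i], clones[2 * i + 1]) for i in range(len(clones) // 2)]

edges = configuration_model([3, 3, 2, 2, 1, 1])
print(edges)  # may contain self-loops and multi-edges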
3.8 Subcritical Case

In this section we prove Theorem 3.3. The idea of the proof is quite similar to that of the supercritical case, and the strategy is similar to that used in [43]. More specifically, we consider the event $E_v$ that a fixed vertex $v$ lies in a component of size greater than $C\omega(n)^2\log n$ for some $C > 0$, and show that $P(E_v) = o(n^{-1})$. Theorem 3.3 then follows by taking a union bound over $v\in\mathcal{G}$.

Assume that we start the exploration process at the vertex $v$. For $v$ to lie in a component of size greater than $C\omega(n)^2\log n$, the exploration process must remain positive for at least $C\omega(n)^2\log n$ time steps, since at each step of the exploration process at most one new vertex is added to the component being revealed. This means that at time $C\omega(n)^2\log n$ we must have $A(C\omega(n)^2\log n) > 0$, where, recall, $A(k)$ denotes the total number of active clones at time $k$ of the exploration process. Let $H(k) = \sigma(\{A_j^i(r),\ (i,j)\in S,\ 1\le r\le k\})$ denote the filtration of the exploration process up to time $k$.

We will assume that $A(k) > 0$ for $0\le k\le C\omega(n)^2\log n$ and upper bound $P(A(C\omega(n)^2\log n) > 0)$. We first compute the expected conditional change in the number of active clones at time $k$, for $0\le k\le C\omega(n)^2\log n$, by splitting the outcomes into the several possible cases that affect $Z_j^i(k+1)$, as in (3.13):
$$E[Z_j^i(k+1)\mid H(k)] = \sum_{E} P(E\mid H(k))\, E[Z_j^i(k+1)\mid H(k), E],$$
where the sum runs over the events $E$ enumerated in (3.13), and each conditional expectation is a ratio of clone counts of the form $d_i N_j^d(k)/L_j^i(k)$, exactly as in the supercritical case.

We proceed with the proof in a similar fashion to the supercritical case. Let $E[Z(k+1)\mid H(k)] = \big(E[Z_j^i(k+1)\mid H(k)],\ (i,j)\in S\big)$ and define the vector quantity $A(k) = \big(A_j^i(k),\ (i,j)\in S\big)$. Also define the matrix $Q(k)\in\mathbb{R}^{N\times N}$, whose rows and columns are indexed by double indices, by
$$Q_{ij,ji}(k) = -\frac{A_j^i(k)}{L_j^i(k)},\qquad Q_{ij,lm}(k) = 0\ \ \text{for } (l,m)\neq(j,i).$$
Then the expected change in the number of active clones of the various types can be written compactly as
$$E[Z(k+1)\mid H(k)] = \big(M(k) - I + Q(k)\big)A(k).$$
As the exploration process proceeds, the matrix $M(k)$ changes over time. However, for large enough $n$ it follows from Lemma 3.5 that the difference between $M(k)$ and $M$ is small for $0\le k\le C\omega(n)^2\log n$: given any $\epsilon > 0$, for large enough $n$ we have $\|M(k) - M\| < \epsilon$. From Lemma 3.4 we also have $\|Q(k)\| < \epsilon$.

Let $z$ be the Perron–Frobenius eigenvector of $M$. By the assumption in Theorem 3.3 we have $z'M = (1-\delta)z'$ for some $0 < \delta < 1$, where $1-\delta = \gamma$ is the Perron–Frobenius eigenvalue of $M$. Also let $z_m = \min_i z_i$ and $z_M \triangleq \max_i z_i$. Define the random process $W(k) \triangleq z'A(k)$. Then the expected conditional change in $W(k)$ is given by
$$E(\Delta W(k+1)\mid H(k)) = z'E[Z(k+1)\mid H(k)] = z'(M(k) - I + Q(k))A(k) = z'(M-I)A(k) + z'(M(k) - M + Q(k))A(k) = -\delta z'A(k) + z'(M(k) - M + Q(k))A(k).$$
We can choose $\epsilon$ small enough that $z'(M(k) - M + Q(k)) < \frac{\delta}{2}z'$, where the inequality is elementwise. Thus, whenever $A(k) > 0$,
$$E(\Delta W(k)\mid H(k)) \le -\frac{\delta}{2}z'A(k) \le -\frac{\delta}{2}z_m \triangleq -\kappa.$$
We can now repeat the one-sided Hoeffding bound argument following equation (3.16) in the supercritical case and obtain the inequality
$$P\big(W(a) + \kappa a > \delta'\big) \le 2e^{-\frac{\delta'^2}{2a\,z_M^2\,\omega(n)^2}}.$$
Setting $a = C\omega^2(n)\log n$ and $\delta' = \kappa a$, we get
$$P\big(W(C\omega^2(n)\log n) > 0\big) \le 2e^{-\frac{\kappa^2 C\log n}{2z_M^2}} = o(n^{-1})$$
for large enough $C$. We conclude
$$P\big(\mathcal{G}\text{ has a component bigger than } C\omega^2(n)\log n\big) \le \sum_{v\in\mathcal{G}} P\big(|C(v)| > C\omega^2(n)\log n\big) = o(1).$$
This completes the proof of the theorem. $\square$

3.9 Future Work

Our results address the supercritical (Theorem 3.2) and subcritical (Theorem 3.3) cases, but leave the critical case unresolved. The critical case has been studied in detail for random graphs with given degree sequences in the unipartite setting [34], and it may be possible to extend those results to the multipartite case. It may also be possible to strengthen the concentration bounds in our results by establishing exponential decay as in [8], via a purely branching process based analysis.

Chapter 4

MAX-CUT on Bounded Degree Graphs with Random Edge Deletions

4.1 Introduction

The problem of finding a cut of maximum size in an arbitrary graph $\mathcal{G} = (V,\mathcal{E})$ is a well-known NP-hard problem in algorithmic complexity theory. In fact, even when restricted to the set of 3-regular graphs, MAX-CUT is NP-hard with a constant factor approximability gap [5]: it is NP-hard to find a cut whose size is at least $(1-\epsilon_{crit})$ times the size of the maximum cut, where $\epsilon_{crit} = 0.003$.
The weighted MAX-CUT problem is a generalization of MAX-CUT in which each edge $e\in\mathcal{E}$ is associated with a weight $w_e$, and one is required to find a cut such that the sum of the weights of the edges in the cut is maximum. We study a randomized version of the weighted MAX-CUT problem in which each weight $w_e$ is a $\text{Ber}(p)$ random variable, for some $0\le p\le 1$, and the weights associated with distinct edges are independent. Since the weights are binary, weighted MAX-CUT on the randomly weighted graph is equivalent to MAX-CUT on a thinned random graph in which the edges with zero weight have been deleted. We call this problem the thinned MAX-CUT; the variable $p$ controls the amount of thinning.

Thinned MAX-CUT is particularly simple to analyze when $p$ takes one of the extreme values $p = 1$ or $p = 0$. When $p = 1$, all edges are retained, the thinned graph is the same as the original graph, and finding the maximum cut remains computationally hard. On the other hand, when $p = 0$, the thinned graph has no edges and the MAX-CUT problem is trivial. This leads to the natural question of whether there is a hardness phase transition at some value $0 < p_c < 1$.

In this chapter we identify the threshold for the hardness phase transition in the thinned MAX-CUT problem on graphs with maximum degree $d = O(1)$. We show that on the set of all graphs with degree at most $d$, the phase transition occurs at $p_c = \frac{1}{d-1}$. This phase transition coincides with a phase transition in the connectivity properties of the random thinned graph. For $p < p_c$ we show that the random graph resulting from the edge deletions undergoes percolation, i.e., it disintegrates into disjoint connected components of size $O(\log n)$. The existence of a polynomial time algorithm to compute MAX-CUT then follows easily, because brute force computation can be employed within each component. On the other hand, for $p > p_c$ we establish NP-hardness by constructing a reduction from $\epsilon_{crit}$-optimal MAX-CUT on 3-regular graphs.

Our reduction uses a gadget $\mathcal{H}$ based on random bipartite graphs, similar to [50], where such a gadget was used to establish hardness of computation of the partition function of the hardcore model. Given a 3-regular graph $\mathcal{G}$ on $n$ vertices, the gadget $\mathcal{H}$ is $d$-regular and is constructed by first replacing each vertex $v$ of $\mathcal{G}$ by a random bipartite graph $J_v$ of size $2n^2$ consisting of two parts $R_v$ and $S_v$, and then adding connector edges between each pair of these bipartite graphs whenever there is an edge between the corresponding vertices in $\mathcal{G}$. The key result in our reduction argument is that the gadget satisfies a certain polarization property which carries information about the MAX-CUT of the 3-regular graph $\mathcal{G}$ used to construct it. Namely, we show that any maximal cut of the thinned graph $\mathcal{H}_p$ obtained from $\mathcal{H}$ must fully polarize each bipartite graph and include all of its edges in the cut, i.e., either assign the value 1 to almost all vertices of $R_v$ and 0 to almost all vertices of $S_v$, or vice versa. We then show that the cut $C_{\mathcal{G}}$ of $\mathcal{G}$ obtained by assigning binary values to its vertices $v$ based on the polarity of $J_v$ must be at least $(1-\epsilon_{crit})$-optimal. To establish this polarization property we draw heavily on the results and proof techniques from Chapter 3 about the giant component in random multipartite graphs.

Our study of the thinned MAX-CUT problem is motivated by the goal of establishing a correspondence between phase transitions in hardness of computation and decay of correlations. Our formulation of the thinned MAX-CUT problem was inspired by a similar random weighted maximum independent set problem studied by Gamarnik et al. in [19].
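As an illustration of the percolation picture just described (not a substitute for the proofs below), the following Python sketch thins a random $d$-regular graph with Bernoulli$(p)$ edge weights and reports the largest surviving component on either side of $p_c = \frac{1}{d-1}$. It assumes the networkx library, and all parameter values are illustrative.

```python
import random
import networkx as nx  # assumed available for graph utilities

def thin_and_measure(d=3, n=2000, p=0.3, seed=0):
    """Delete each edge of a random d-regular graph independently with
    probability 1 - p (i.e., keep exactly the edges with weight 1) and
    return the size of the largest surviving connected component."""
    rng = random.Random(seed)
    g = nx.random_regular_graph(d, n, seed=seed)
    kept = [(u, v) for u, v in g.edges() if rng.random() < p]
    h = nx.Graph()
    h.add_nodes_from(g)
    h.add_edges_from(kept)
    return max(len(c) for c in nx.connected_components(h))

for p in (0.3, 0.7):  # p_c = 1/(d-1) = 0.5 for d = 3
    print(p, thin_and_measure(p=p))
```

Below $p_c$ the largest component stays logarithmic in $n$, while above $p_c$ it occupies a constant fraction of the vertices, in line with the phase transition described above.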
The rest of the chapter is organized as follows. In Section 4.2 we state our main theorems, which establish the hardness phase transition threshold for thinned MAX-CUT. In Section 4.3 we prove that below the critical threshold thinned MAX-CUT can be solved in polynomial time, and in Section 4.4 we construct a reduction from $\epsilon_{crit}$-optimal MAX-CUT to show NP-hardness of thinned MAX-CUT above the critical threshold. Finally, in Section 4.5 we end with some concluding remarks and open problems.

4.2 Main Results

In this section we state our main theorems regarding the hardness phase transition in thinned MAX-CUT on bounded degree graphs. Given an integer $d\ge 3$, define the critical probability
$$p_c(d) = \frac{1}{d-1}. \tag{4.1}$$

Theorem 4.1. Suppose $d$ is fixed and let $p < p_c(d)$. There exists a polynomial time algorithm $\mathcal{A}$ such that, given any graph $\mathcal{G}$ on $n$ vertices with maximum degree $d$, with high probability $\mathcal{A}$ solves the thinned MAX-CUT problem on $\mathcal{G}$, i.e., produces a maximal cut of the thinned random graph $\mathcal{G}_p$.

Theorem 4.2. Suppose $d$ is fixed and let $p > p_c(d)$. There exists a pair of polynomial time algorithms $\mathcal{A}_1$ and $\mathcal{A}_2$ such that, given an arbitrary 3-regular graph $\mathcal{G}$ on $n$ vertices:
- $\mathcal{A}_1$ constructs from $\mathcal{G}$ a random $d$-regular gadget graph $\mathcal{H}$ (described in Section 4.4.1), in which each vertex of $\mathcal{G}$ is replaced by a bipartite graph on $2n^2$ vertices.
- Let $C_{\mathcal{H}_p}$ be a solution of thinned MAX-CUT on $\mathcal{H}$, i.e., a maximal cut of $\mathcal{H}_p$. Then $\mathcal{A}_2$ uses $C_{\mathcal{H}_p}$ to produce a cut $C_{\mathcal{G}}$ of $\mathcal{G}$ such that, with high probability, $C_{\mathcal{G}}$ is an $\epsilon_{crit}$-optimal cut of $\mathcal{G}$.

The following result from [5] justifies the reduction in Theorem 4.2. A cut $C_{\mathcal{G}}$ of a graph $\mathcal{G}$ is said to be $\epsilon$-optimal if its size is at least $(1-\epsilon)$ times the size of the maximum cut of $\mathcal{G}$.

Theorem 4.3 ([5]). Let $\epsilon_{crit} = 0.003$. The problem of finding an $\epsilon_{crit}$-optimal cut on the set of 3-regular graphs is NP-hard.

Theorem 4.2 in conjunction with Theorem 4.3 shows that if $p > p_c(d)$, the thinned MAX-CUT problem is NP-hard, by providing a reduction from $\epsilon_{crit}$-optimal MAX-CUT on 3-regular graphs. Along with Theorem 4.1, this shows that the thinned MAX-CUT problem undergoes a computational hardness phase transition at $p = p_c$.

4.3 Proof of Theorem 4.1

The proof of Theorem 4.1 is closely related to percolation on random graphs. Indeed, we can show that the following holds.

Proposition 4.1. With high probability, all connected components of $\mathcal{G}_p$ are of size $O(\log n)$.

We first show how this proposition can be used to prove Theorem 4.1.

Proof of Theorem 4.1. Since $\mathcal{G}_p$ is a disjoint union of connected components, finding the maximum cut of $\mathcal{G}_p$ is equivalent to finding the maximum cut of each of the disjoint components individually. Since by Proposition 4.1 each component is of size $O(\log n)$ w.h.p., we can use brute force evaluation to find the maximum cut of each component in time $2^{O(\log n)} = n^{O(1)}$. Since there are at most $n$ such components, the proof is complete. $\square$

What is left is to prove Proposition 4.1; a sketch of the resulting component-by-component algorithm appears below.
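The following Python sketch illustrates the brute force step in the proof above: exact MAX-CUT on one small component by exhaustive enumeration, to be applied to each component of $\mathcal{G}_p$ separately. The helper name is hypothetical.

```python
from itertools import product

def max_cut_component(vertices, edges):
    """Exact MAX-CUT on one small component by brute force over all
    2^{|vertices|} binary assignments; feasible only for components of
    size O(log n), as guaranteed by Proposition 4.1."""
    verts = list(vertices)
    index = {v: i for i, v in enumerate(verts)}
    best_size, best = -1, None
    for bits in product((0, 1), repeat=len(verts)):
        size = sum(1 for u, v in edges if bits[index[u]] != bits[index[v]])
        if size > best_size:
            best_size, best = size, dict(zip(verts, bits))
    return best_size, best

# Usage: solve each component separately; the union of the per-component
# cuts is a maximum cut of the whole thinned graph.
size, assignment = max_cut_component({0, 1, 2, 3}, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
print(size, assignment)
```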
Proof of Proposition 4.1. Set $p(d-1) = 1-\epsilon$ with $\epsilon > 0$. Let $v\in V$ be any vertex. Let $C_v = (V_v, E_v)$ be the connected component of $\mathcal{G}$ containing $v$, and let $C_{v,p} = (V_{v,p}, E_{v,p})$ be the connected component of $\mathcal{G}_p$ containing $v$. We reveal the component $C_{v,p}$ sequentially via the following exploration process.

(a) Initialization. At time $t = 0$ set:
- the set of explored vertices $V_e = \emptyset$;
- the set of unexplored vertices $V_{u,0} = \{v\}$;
- the initial edge set $E_e = E$.

(b) Repeat until $V_{u,t} = \emptyset$ or time $t = n$:
1. Pick any $u\in V_{u,t}$.
2. Let $u_1,\dots,u_M$ be the neighbors of $u$ in the graph $(V_v, E_e)$ with edges $(u,u_i)\in E_e$. For each $1\le i\le M$, delete the edge $(u,u_i)$ independently with probability $1-p$, and remove the deleted edges from $E_e$. Let $(u,u_{i_1}),\dots,(u,u_{i_{N(t)}})$ be the undeleted edges.
3. Set $V_e \leftarrow V_e\cup\{u\}$ and $V_{u,t+1} \leftarrow (V_{u,t}\cup\{u_{i_1},\dots,u_{i_{N(t)}}\})\setminus\{u\}$.

The above exploration process terminates in at most $n$ steps, having revealed exactly $C_{v,p}$; the way the process proceeds ensures that every edge of $E_v$ is examined for deletion exactly once. In each iteration of step (b), the number of vertices added to the unexplored set is $N(t)$, a random variable distributed as $N(t)\sim\text{Bin}(M,p)$. Since $\mathcal{G}$ has maximum degree $d$, we must have $M\le d-1$ (except at the first step, when $M\le d$). In addition, the random variables $N(t)$ are independent for different $t$. If $|V_{v,p}| \ge k\log n$, then we must have $|V_{u,t}| > 0$ for $1\le t\le k\log n$. Note that $|V_{u,t}| = 1 + \sum_{\tau=1}^{t}N(\tau) - t$. Let $B_1, B_2,\dots, B_n$ be i.i.d. $\text{Bin}(d-1,p)$. From the preceding discussion, the random variables $B_t$ stochastically dominate the random variables $N(t)$, and hence $|V_{u,t}|$ is stochastically dominated by $1 + \sum_{\tau=1}^t B_\tau - t$. Therefore,
$$P\big(|V_{u,k\log n}| > 0\big) = P\Big(\sum_{\tau=1}^{k\log n}N(\tau) > k\log n - 1\Big) \tag{4.2}$$
$$\le P\Big(\sum_{\tau=1}^{k\log n}B_\tau > k\log n - 1\Big) \tag{4.3}$$
$$= P\Big(\sum_{\tau=1}^{k\log n}\big(B_\tau - (1-\epsilon)\big) > \epsilon k\log n - 1\Big) \tag{4.4}$$
$$= o(1/n)\quad\text{for large enough } k. \tag{4.5}$$
Therefore $P(|C_{v,p}| > k\log n) = o(1/n)$, and taking a union bound over all vertices of $\mathcal{G}$ completes the proof of the proposition. $\square$

4.4 Proof of Theorem 4.2

4.4.1 Construction of the gadget for the reduction

In this section we describe the steps in the construction of the graph $\mathcal{H}$ of Theorem 4.2 from any given 3-regular graph $\mathcal{G}$; these steps together form the description of the algorithm $\mathcal{A}_1$. A similar construction was used by Sly in [50] to prove hardness of approximation of the partition function of the hardcore model.

Let $\mathcal{G} = (V,E)$ with $|V| = n$. For each $v\in V$ we first construct a random bipartite graph $J_v$ on $2n^2$ vertices. Here $0 < \gamma < 1$ is a fixed constant that we will choose later. Denote by $R_v$ and $S_v$ the two parts of the bipartite graph $J_v$, each of size $n^2$. To construct the random graph $J_v$, we first fix a degree sequence consisting of a $(1-\gamma)$ fraction of degree-$d$ vertices and a $\gamma$ fraction of degree-$(d-1)$ vertices in each of $R_v$ and $S_v$. We then construct a random bipartite graph using the configuration model described in Chapter 3 with the degree distribution described above. Note that the number of edges of the bipartite graph is $cn^2$, where
$$c = d(1-\gamma) + (d-1)\gamma. \tag{4.6}$$
We call the $\gamma n^2$ vertices of degree $d-1$ in $R_v$ and $S_v$ the connector vertices and denote them by $C_v = C_{R_v}\cup C_{S_v}$. Divide $C_{R_v}$ and $C_{S_v}$ into three equal parts, each of size $\frac{\gamma}{3}n^2$, denoted $C_{R_v,v_i}$ and $C_{S_v,v_i}$ for $1\le i\le 3$, where $v_1, v_2, v_3$ are the neighbors of $v$ in $\mathcal{G}$.

We construct $\frac{\gamma}{3}n^b$ regular trees with out-degree $d-1$, with the depth chosen so that each tree has $n^{2-b}$ leaves; thus there are $\frac{\gamma}{3}n^2$ leaves in total. Identify these leaves with the $\frac{\gamma}{3}n^2$ connector vertices $C_{S_v,v_1}$, and refer to the roots of the corresponding trees as $T_{S_v,v_1}$. Repeat the same process for the remaining groups of connector vertices $C_{R_v,v_i}$ and $C_{S_v,v_i}$, obtaining root sets $T_{R_v,v_i}$ and $T_{S_v,v_i}$. For each $(i,j)\in E$, add edges between $J_i$ and $J_j$ by forming a matching between the $\frac{\gamma}{3}n^b$ roots in $T_{R_i,j}$ and the $\frac{\gamma}{3}n^b$ roots in $T_{R_j,i}$, and similarly for the $S$ sides. At the end of the construction, all of the roots have been matched. This concludes the construction of $\mathcal{H}$, and hence the steps of the algorithm $\mathcal{A}_1$. From the above description it is clear that $\mathcal{A}_1$ runs in polynomial time.
Note that each vertex of $\mathcal{H}$ has degree exactly equal to $d$. The graph $\mathcal{H}_p$ is obtained from $\mathcal{H}$ by deleting each edge independently with probability $1-p$. This particular way of constructing $\mathcal{H}$ allows us to relate the MAX-CUT of $\mathcal{H}_p$ to the MAX-CUT of $\mathcal{G}$. In the following sections we prove properties of $\mathcal{H}_p$ and of the MAX-CUT on it that will lead us to the proof of Theorem 4.2.

[Figure 4-1: Illustration of the bipartite graphs $J_u$ and $J_v$ in $\mathcal{H}$ associated with an edge $(u,v)\in E$.]

4.4.2 Properties of $\mathcal{H}_p$

In this section we prove a series of properties of the thinned gadget $\mathcal{H}_p$ that help us determine the key properties of any maximum cut of $\mathcal{H}_p$. First, we consider the properties of each of the bipartite graphs $J_v$ individually. Performing the thinning process on the random bipartite graph $J_v$ leads to a random bipartite graph with a specific degree distribution. To analyze this degree distribution, it is useful to think in terms of the configuration model used to generate $J_v$. Since the edge deletion process is performed independently of the random graph construction, we can think of the deleted edges directly in terms of the clones in the configuration model: each clone in $R_v$ and $S_v$ is associated with an edge, so deleting an edge in the configuration model corresponds to deleting a clone in $R_v$ and its neighboring clone in $S_v$. Because of this independence, the process of random matching in the configuration model followed by deleting each edge with probability $1-p$ is equivalent to:
- sampling $N\sim\text{Bin}(cn^2, p)$;
- retaining $N$ clones out of the $cn^2$ clones in each of $R_v$ and $S_v$, uniformly at random, and deleting the rest;
- performing a random matching between the remaining $N$ clones in $R_v$ and $S_v$.

Let $N_{d,r}$ denote the number of non-connector vertices of $R_v$ that have degree $r$ after the thinning, and let $N_{d-1,r}$ denote the number of connector vertices of $R_v$ that have degree $r$ after the thinning. Let $b_p(m,r)$ denote the binomial probability $b_p(m,r) = \binom{m}{r}p^r(1-p)^{m-r}$. The following concentration inequalities can be obtained by standard Chernoff bounds: there exists $\epsilon_1 > 0$ such that
$$P\big(\big|N/cn^2 - p\big| > \epsilon\big) \le e^{-\epsilon_1 n^2}, \tag{4.7}$$
$$P\Big(\Big|\frac{N_{d,r}}{(1-\gamma)n^2} - b_p(d,r)\Big| > \epsilon\Big) \le e^{-\epsilon_1 n^2}, \tag{4.8}$$
$$P\Big(\Big|\frac{N_{d-1,r}}{\gamma n^2} - b_p(d-1,r)\Big| > \epsilon\Big) \le e^{-\epsilon_1 n^2}. \tag{4.9}$$
The degree distribution of $J_v$ after the random edge deletions can be inferred from (4.7)–(4.9). Denoting by $p_j$ the fraction of vertices of degree $j$ in $R_v$ and $S_v$, we get
$$p_j = \begin{cases}(1-\gamma)\,b_p(d,j) + \gamma\, b_p(d-1,j), & 0\le j\le d-1,\\ (1-\gamma)\,b_p(d,d), & j = d.\end{cases} \tag{4.10}$$
In the discussion after Theorem 3.2 in Chapter 3, the condition for the existence of a giant component is stated for the special case of bipartite graphs as
$$\sum_{j,k} jk(jk - j - k)\,p_jq_k > 0. \tag{4.11}$$
When $p_j = q_j$, the above condition reduces to
$$\sum_{j,k} jk(jk - j - k)\,p_jp_k > 0, \tag{4.12}$$
i.e.,
$$\Big(\sum_j j^2p_j\Big)^2 - 2\Big(\sum_j j^2p_j\Big)\Big(\sum_j jp_j\Big) > 0, \tag{4.13}$$
which is equivalent to
$$\sum_j (j^2 - 2j)\,p_j > 0. \tag{4.14}$$
Plugging the values of $p_j$ from (4.10) into (4.14), we get
$$\sum_j (j^2 - 2j)\,p_j = \sum_{j=1}^{d-1}(j^2 - 2j)\big[(1-\gamma)b_p(d,j) + \gamma b_p(d-1,j)\big] + (1-\gamma)(d^2 - 2d)\,b_p(d,d) \tag{4.15}$$
$$= (1-\gamma)\sum_{j=1}^{d}(j^2 - 2j)\,b_p(d,j) + \gamma\sum_{j=1}^{d-1}(j^2 - 2j)\,b_p(d-1,j) \tag{4.16, 4.17}$$
$$\stackrel{(*)}{=} (1-\gamma)\,dp\big[(d-1)p - 1\big] + \gamma(d-1)p\big[(d-2)p - 1\big] > \delta_\gamma, \tag{4.18}$$
where $\delta_\gamma > 0$ is a constant, and the final inequality $(*)$ follows by choosing $\gamma$ small enough and using the fact that $p(d-1) > 1$.

We can use the results of Chapter 3 to conclude the existence of a giant component in $J_v$ with high probability. We however need sharper concentration inequalities than those in Chapter 3, so instead we follow the proofs and exploit the fact that the degree distribution of $J_v$ has maximum degree $d = O(1)$ to obtain the following lemmas.
Lemma 4.1. With probability at least $1 - e^{-\epsilon_2 n^2}$, for every $v\in V$ the random bipartite graph $J_v$ has a giant component.

Before proving the above lemma, we first prove the following variation of Lemma 3.7 with a tighter concentration inequality.

Lemma 4.2. Suppose that the maximum degree satisfies $\omega(n) = d = O(1)$. There exists a constant $t > 0$ such that after time $tn^2$ in the exploration process, the number of active clones in each part, $A_1^2(tn^2)$ and $A_2^1(tn^2)$, is greater than $\delta n^2$ with probability at least $1 - e^{-\epsilon_1 n^2}$, for some positive constants $\delta$ and $\epsilon_1$.

Proof. The proof of the lemma is along the lines of the proof of Theorem 3.3. Following that proof and taking into account the fact that the maximum degree is $\omega(n) = d$ in (3.1), we can strengthen the inequality in Eq. (3.18) so that the failure probability becomes $e^{-\epsilon_1 n^2}$ for a constant $\epsilon_1 > 0$ depending only on $d$ and the constants of the degree distribution. We can then proceed by using this strengthened bound in the proof of Lemma 3.7 (making $\epsilon_1$ smaller if necessary) to complete the proof. $\square$

Proof of Lemma 4.1. For any $v\in V$, by Lemma 4.2 the number of active clones in the exploration process associated with the random bipartite graph $J_v$ reaches size greater than $\delta n^2$ with probability at least $1 - e^{-\epsilon_1 n^2}$. Taking a union bound over all $v\in V$, we can extend the previous statement to all $J_v$. The proof is then complete by noticing that the component corresponding to these active clones must be a giant component. $\square$

We would now like to show that the trees $T_{R_v,v_i}$ and $T_{S_v,v_i}$ have a linear fraction of their leaves in the giant component. The next lemma formalizes this statement.

Lemma 4.3. The following statements hold:
(a) There exists $\delta_c > 0$ such that, for large enough $n$, with probability greater than $1 - n^{-98}$, for every $v\in V$, $J_v$ has a giant component and there are at least $\delta_c n^{2-b}$ connector leaves of each tree in $T_v$ in the giant component of $J_v$.
(b) Let $N_T$ be the number of leaves of a tree $T\in T_v$ in the giant component of $J_v$. Then, conditioned on $N_T$, these $N_T$ leaves are distributed uniformly at random among all $n^{2-b}$ leaves of $T$.

Proof. We prove the lemma by a line of argument similar to the concentration inequalities on the size of the giant component in Section 3.7, in the course of the proof of Theorem 3.2(a). We rewrite the modified proof here for completeness. Recall that the probability that every component other than the giant component is of size less than $\beta\log n$ is at least $1 - n^{-100}$ for some constant $\beta > 0$ and large enough $n$. Taking a union bound over all vertices $v\in V$, we can ensure that the preceding statement holds for all $J_v$ with probability at least $1 - n^{-99}$, so the failure probability is $o(1)$. In the subsequent parts of the proof we assume that this event holds.

The proof then proceeds in two parts. In the first part, we show that there exists a constant $q_c > 0$ such that the probability that a connector vertex is in the giant component is at least $q_c$. Let $\hat T$ be the random Galton–Watson branching process which is the edge-biased branching process associated with a connector vertex: in $\hat T$, the root has offspring distribution $\text{Bin}(d-1,p)$, and each offspring of the root acts as the root of the standard edge-biased branching process associated with the degree distribution in (4.10) (see Section 3.3 for details). Recall from Chapter 3 that the condition for the existence of a giant component is equivalent to the condition that the edge-biased branching process has a positive survival probability.
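The criterion (4.14), with $p_j$ given by (4.10), is easy to check numerically. The following Python sketch evaluates $\sum_j(j^2-2j)p_j$ directly and compares it against the closed form in (4.18); the parameter values are illustrative.

```python
from math import comb

def b(m, r, p):
    """Binomial probability b_p(m, r)."""
    return comb(m, r) * p**r * (1 - p)**(m - r)

def giant_criterion(d, p, gamma):
    """Evaluate sum_j (j^2 - 2j) p_j for the thinned degree distribution
    (4.10); a positive value indicates a giant component in J_v."""
    total = 0.0
    for j in range(d + 1):
        pj = (1 - gamma) * b(d, j, p)
        if j <= d - 1:
            pj += gamma * b(d - 1, j, p)
        total += (j * j - 2 * j) * pj
    return total

# Closed form from (4.18): (1-g) d p [(d-1)p - 1] + g (d-1) p [(d-2)p - 1].
d, p, gamma = 4, 0.5, 0.1
closed = (1 - gamma) * d * p * ((d - 1) * p - 1) + gamma * (d - 1) * p * ((d - 2) * p - 1)
print(giant_criterion(d, p, gamma), closed)  # the two values agree
```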
Hence, denoting by $q_{surv}$ the survival probability of the standard edge-biased branching process, we have that $p(d-1) > 1$ implies $q_{surv} > 0$. Define $\hat q = P(|\hat T| = \infty)$ as the survival probability of the branching process $\hat T$. Then $\hat q = \sum_{i=1}^{d-1} b_p(d-1,i)\big(1 - (1-q_{surv})^i\big) > 0$.

Fix one of the connector trees $T$. Recall that $T$ has $n^{2-b}$ leaves, which are identified with $n^{2-b}$ of the connector vertices. Denote the set of leaves of $T$ by $L_T$ and let $l\in L_T$ be any leaf. Then, using the coupling argument in the proof of Lemma 3.10, we can show that for large enough $n$,
$$P\big(l\in\text{giant component of } J_v\big) \ge 0.9\,\hat q \triangleq q_c. \tag{4.20}$$

Now we show concentration of the number of vertices of $L_T$ in the giant component of $J_v$ around its mean. Let $N_S$ denote the number of vertices of $L_T$ that are in small components, i.e., components of size less than $\beta\log n$. Following the proof of Theorem 3.2(a) and denoting by $E_i$ the random edges of the configuration model, we get as before
$$\big|E[N_S\mid E_1,\dots,E_k] - E[N_S\mid E_1,\dots,E_{k+1}]\big| \le 4\beta\log n. \tag{4.21}$$
Note that $E[N_S]\le n^{2-b}(1-q_c)$. So using Lemma 3.9 we get
$$P\Big(\Big|\frac{N_S}{n^{2-b}} - (1-q_c)\Big| > \delta\Big) \le 2e^{-\frac{\delta^2 n^{2-2b}}{8c\beta^2\log^2 n}}. \tag{4.22}$$
The proof then follows by choosing $\delta < q_c$, defining $\delta_c = q_c - \delta$, and taking a union bound over all the $\gamma n^{b+1}$ trees in $\mathcal{H}_p$. $\square$

Equipped with the above lemmas regarding the giant component of each $J_v$, we now proceed in the following section to analyze the structure of any maximum cut of $\mathcal{H}_p$.

4.4.3 Properties of MAX-CUT on $\mathcal{H}_p$

In this section we analyze the properties of a maximum cut $C_{\mathcal{H}_p}$ of $\mathcal{H}_p$. Before that, we first state some well-known results regarding Galton–Watson branching processes relevant to our case; these results apply directly to the connector trees attached to the bipartite graphs $J_v$. Let $T$ be a Galton–Watson branching process with offspring distribution $X\sim\text{Bin}(d-1,p)$. Denote $m = E[X] = p(d-1) > 1$ and let $Z_N$ be the number of vertices at depth $N$ of $T$. It is known (see, e.g., [2]) that there exists a non-negative random variable $W$ such that
$$\lim_{N\to\infty} m^{-N}Z_N = W\quad\text{w.p. } 1. \tag{4.23}$$
We now state two known lemmas about the branching process $T$ and the distribution of $W$.

Lemma 4.4. Let $1 < y < m$ be a constant. Then
$$P(Z_N = 0) = p_{ext} + o_N(1), \tag{4.24}$$
where $p_{ext}$ is the extinction probability of the branching process, and
$$P\big(0 < Z_N < y^N\big) = P\big(0 < W < (y/m)^N\big)\big(1 + o_N(1)\big). \tag{4.25}$$
Here $o_N(1)$ denotes terms that converge to 0 as $N\to\infty$.

Proof. Both statements can be proved directly by using $K_N = y^N$ in Corollary 5 of [17] and observing that $E[Z_1\log Z_1] < \infty$. $\square$

Let $f_p(s) \triangleq \sum_{i=0}^{d-1} b_p(d-1,i)s^i$ be the probability generating function of the offspring distribution of $T$, and define $a \triangleq -\log_m f_p'(p_{ext}) > 0$.

Lemma 4.5. There exists $C > 0$ such that for any $0 < x < 1$ we have
$$P(0 < W < x) \le Cx^a. \tag{4.26}$$

Proof. From [6], [14], there exists a constant $C > 0$ such that the density function $w(x)$ of $W$ exists and satisfies $w(x)\le Cx^{a-1}$ for all $0 < x < 1$. The proof then follows by integrating both sides of this inequality. $\square$

We will assume that the quantity $b > 0$ is chosen so that it satisfies the inequality
$$b < \min\left\{\frac{1.9\log\frac{1+m}{2}}{9\log(d-1)},\ 0.01,\ a\mu\log\frac{2m}{1+m}\right\}, \tag{4.27}$$
where $\mu \triangleq \frac{2-b}{\log(d-1)}$ is the constant for which the connector trees of Section 4.4.1 have depth $\mu\log n$. As we will prove shortly, the branching process $T$ truncated at depth $\mu\log n$ is either already extinct or has $\Theta(m^{\mu\log n})$ leaves w.h.p. as $n\to\infty$. This motivates the definition of the following property for the bipartite graphs $J_v$.
Property 4.1. All trees associated with $J_v$ have either no leaves or at least $n^{\delta_L}$ leaves, where $\delta_L > 0$ is the constant defined by
$$\delta_L \triangleq \frac{1.9\log\frac{1+m}{2}}{\log(d-1)} > 9b. \tag{4.28}$$
The inequality follows from (4.27).

The following lemma characterizes when Property 4.1 holds.

Lemma 4.6. The following statements hold for the trees in $\mathcal{H}_p$:
(a) There exists $\rho_L > 0$ such that for any $v\in V$, Property 4.1 is satisfied by $J_v$ with probability at least $1 - n^{-\rho_L}$ for large enough $n$.
(b) Let $N_L$ be the number of vertices $v\in V$ for which $J_v$ satisfies Property 4.1. Then there exists $\epsilon_3 > 0$ such that for large enough $n$,
$$P(N_L \ge 0.9999n) \ge 1 - e^{-n^{\epsilon_3}}.$$

Proof. Let $T$ be any tree connected to $J_v$. Then $T$ is a branching process with offspring distribution $\text{Bin}(d-1,p)$, truncated at depth $N = \mu\log n$. Let $y \triangleq \frac{1+m}{2}$, so that $1 < y < m$. Using these values of $N$ and $y$ in Lemma 4.4 and Lemma 4.5, we get
$$P\big(0 < Z_{\mu\log n} < y^{\mu\log n}\big) \le C(y/m)^{a\mu\log n}\big(1 + o(1)\big) = O(n^{-\rho}),$$
with $0 < \rho \triangleq a\mu\log\frac{2m}{1+m}$, where we use the fact that $y^{\mu\log n} = n^{\mu\log y} \ge n^{\delta_L}$. Taking a union bound over the $O(n^b)$ trees attached to $J_v$, we get that $J_v$ satisfies Property 4.1 with probability at least $1 - n^{-(\rho - b)}$ for large enough $n$. By defining $\rho_L \triangleq \rho - b$ (note that (4.27) says precisely that $\rho - b > 0$), the proof of part (a) is complete. Since the gadgets $J_v$ are independent, part (b) follows by standard large deviation inequalities. $\square$

A cut of $\mathcal{H}_p$ is equivalent to an assignment of binary values to the vertices of $\mathcal{H}_p$, where the size of the cut is the number of edges whose endpoints have opposite assigned values. We can now prove the following property that must be satisfied by any maximum cut of $\mathcal{H}_p$. We call this property the full polarization property.

Proposition 4.2. With high probability, every maximum cut $C_{\mathcal{H}_p}$ satisfies the full polarization property, i.e., for every $v\in V$ one of the following is true:
Option 1: all but $O(n^{\delta_L - 2b})$ vertices of $R_v$ that are in the giant component of $J_v$ are assigned the value 1, and all but $O(n^{\delta_L - 2b})$ vertices of $S_v$ that are in the giant component of $J_v$ are assigned the value 0.
Option 2: all but $O(n^{\delta_L - 2b})$ vertices of $R_v$ that are in the giant component of $J_v$ are assigned the value 0, and all but $O(n^{\delta_L - 2b})$ vertices of $S_v$ that are in the giant component of $J_v$ are assigned the value 1.

To prove that each $J_v$ satisfies the full polarization property under any maximal cut $C_{\mathcal{H}_p}$, we need to prove a property of the giant component of $J_v$ that we call the strong connectivity property. This is stated in the following lemma.

Lemma 4.7 (Strong connectivity of the giant component). The following statement is true with high probability: for every partition of the giant component of $J_v$ into two parts $A_v$ and $B_v$, each of size at least $n^{\delta_L - 2b}$, there exist at least $n^b$ edges between $A_v$ and $B_v$.

Proof. Fix any $v\in V$. Let $E$ be the event that the statement of the lemma is false, i.e., there exists a partition $A_v, B_v$ of the giant component with $|A_v|, |B_v| \ge n^{\delta_L - 2b}$ but fewer than $n^b$ edges between $A_v$ and $B_v$. Deleting these edges disconnects $B_v$ from $A_v$. Suppose $B_{v,1}, B_{v,2},\dots, B_{v,M}$ are the components of $B_v$ after being disconnected from $A_v$. Since each of these components was originally part of one single connected component, each $B_{v,i}$ must have had at least one edge connecting it to $A_v$. We claim that, conditioned on the event $E$, there exists $1\le i\le M$ such that $|B_{v,i}| \ge n^{\delta_L - 3b}$. To prove this, suppose instead that $|B_{v,i}| < n^{\delta_L - 3b}$ for all $i$. Then, since $|B_v|\ge n^{\delta_L - 2b}$, we must have $M \ge n^{\delta_L - 2b}/n^{\delta_L - 3b} = n^b$.
Since each of the components has at least one edge to $A_v$, there would then be at least $n^b$ edges between $A_v$ and $B_v$, contradicting the assumption that the event $E$ holds. Since $|A_v|\ge n^{\delta_L - 2b}$ as well, the above claim holds for $A_v$ too. In other words, denoting by $A_{v,*}$ and $B_{v,*}$ the largest components of $A_v$ and $B_v$ after removing the connecting edges, we have that, conditioned on the event $E$, both $A_{v,*}$ and $B_{v,*}$ have size at least $n^{\delta_L - 3b}$.

Suppose that we introduce further randomness into the random graph $\mathcal{H}_p$. In particular, we delete each edge of $\mathcal{H}_p$ independently with probability $f(n) = e^{-n^b}$ and obtain the random graph $\tilde{\mathcal{H}}_p$. Note that $\tilde{\mathcal{H}}_p$ can also be obtained directly from $\mathcal{H}$ by retaining each edge of $\mathcal{H}$ with probability $p(1 - e^{-n^b})$. Denote by $\tilde J_v$ the bipartite gadget corresponding to $J_v$ in $\tilde{\mathcal{H}}_p$, and let $\tilde A_{v,*}$ and $\tilde B_{v,*}$ be the two largest connected components of $\tilde J_v$. Let $F$ be the event that in the random graph $\tilde J_v$ we have $|\tilde A_{v,*}|, |\tilde B_{v,*}| \ge n^{\delta_L - 3b}$. We will now find upper and lower bounds on $P(F)$.

First, we think of $\tilde{\mathcal{H}}_p$ as obtained by first generating $\mathcal{H}_p$ and then deleting each edge of $\mathcal{H}_p$ with probability $f(n)$. Conditioned on the occurrence of the event $E$ in $\mathcal{H}_p$, the probability that all of the (fewer than $n^b$) edges between $A_v$ and $B_v$ are deleted and none of the other edges are deleted is at least $(e^{-n^b})^{n^b}(1 - e^{-n^b})^{cn^2}$. This results in at least two components $\tilde A_{v,*}$ and $\tilde B_{v,*}$, each of size at least $n^{\delta_L - 3b}$. This leads to the following fact.

Fact 4.1. $P(F\mid E) \ge (e^{-n^b})^{n^b}(1 - e^{-n^b})^{cn^2}$.

Second, we think of obtaining $\tilde{\mathcal{H}}_p$ directly from $\mathcal{H}$ by retaining each edge with probability $p(1-e^{-n^b})$. For large enough $n$ the difference between the degree distributions of $J_v$ and $\tilde J_v$ is insignificant; more precisely, all of the results (4.7)–(4.9) about the degree distribution still hold for $\tilde J_v$. As a consequence, the statement of Lemma 4.2 holds as well: with probability at least $1 - e^{-\epsilon_1 n^2}$ there exists a time $tn^2$ in the exploration process of $\tilde J_v$ when both $A_1^2(tn^2)$ and $A_2^1(tn^2)$ are at least $\delta n^2$. In this case, the component corresponding to these active clones is the giant component and has size $\Theta(n^2)$. Now, occurrence of the event $F$ requires that there are two components of $\tilde J_v$ of size at least $n^{\delta_L - 3b}$; this means that for $F$ to occur, there must be a component other than the giant component of size at least $n^{\delta_L - 3b}$. Let $w$ be any vertex of $\tilde J_v$ unexplored until time $tn^2$. The probability that the component of $w$ has size at least $n^{\delta_L - 3b}$ and is not the giant component is at most $(1-\delta)^{n^{\delta_L - 3b}} = e^{\log(1-\delta)\,n^{\delta_L - 3b}}$. Taking a union bound over all vertices of $\tilde J_v$, we have
$$P(F) \le 2n^2 e^{\log(1-\delta)\,n^{\delta_L - 3b}} + e^{-\epsilon_1 n^2}, \tag{4.30}$$
where the second term corresponds to the probability of the complement of the event in Lemma 4.2. Combining with Fact 4.1, we have
$$P(E) \le \frac{P(F)}{P(F\mid E)} \le \frac{2n^2 e^{\log(1-\delta)\,n^{\delta_L-3b}} + e^{-\epsilon_1 n^2}}{(e^{-n^b})^{n^b}(1 - e^{-n^b})^{cn^2}}.$$
From (4.27) we have $b < \delta_L/9$, so $n^{2b} = o(n^{\delta_L - 3b})$ while the denominator is $e^{-n^{2b}(1+o(1))}$. Hence we get
$$P(E) \le e^{-\Theta(n^{b})}.$$
The proof then follows by taking a union bound over all $v\in V$. $\square$

Proof of Proposition 4.2. Let $R_{v,1}\subset R_v$ be the part of the giant component of $J_v$ that is assigned the value 1 and $R_{v,0}\subset R_v$ the part assigned the value 0 in a maximal cut $C_{\mathcal{H}_p}$; define $S_{v,1}$ and $S_{v,0}$ similarly. Define $A_v = R_{v,1}\cup S_{v,0}$ and $B_v = R_{v,0}\cup S_{v,1}$, and assume without loss of generality that $|B_v|\le|A_v|$. Suppose that $|B_v| > n^{\delta_L - 2b}$. Then, by Lemma 4.7, with high probability there are at least $n^b$ edges between $A_v$ and $B_v$. None of these edges belongs to the cut $C_{\mathcal{H}_p}$, since each of them joins two vertices carrying equal values.
Consider the cut $\tilde C_{\mathcal{H}_p}$ obtained from $C_{\mathcal{H}_p}$ by modifying only the binary assignments in $J_v$ and the trees $T_v$, as follows:
- assign all of $R_v$ the value 1 and all of $S_v$ the value 0;
- assign all even levels of the trees $T_{R_v}$ the value 1 and all odd levels the value 0;
- assign all even levels of the trees $T_{S_v}$ the value 0 and all odd levels the value 1.

Let $Z_v$ be the set of all edges of $\mathcal{H}_p$ with at least one endpoint in $J_v\cup T_v$. The numbers of edges of the cuts $\tilde C_{\mathcal{H}_p}$ and $C_{\mathcal{H}_p}$ outside the set $Z_v$ are equal. Within the set $Z_v$, the cut $\tilde C_{\mathcal{H}_p}$ has at least $|Z_v| - O(n^b)$ edges in it, i.e., all edges of $Z_v$ with the possible exclusion of the connector edges, where the constant in the $O(n^b)$ term is proportional to $\gamma$. On the other hand, the number of edges of $Z_v$ in the cut $C_{\mathcal{H}_p}$ is at most $|Z_v| - n^b$. Hence, for $\gamma$ chosen small enough, the cut $C_{\mathcal{H}_p}$ cannot be a maximal cut of $\mathcal{H}_p$, and we arrive at a contradiction. $\square$

We have established that all of the giant components are (almost) fully polarized in any maximum cut. However, there are two different choices for the polarity of each giant component, obtained by switching 0 and 1. We now show that the choice of polarity affects the size of the cut. In other words, the number of connector edges between two gadgets $J_u$ and $J_v$ that lie in a cut depends on whether the giant components of $J_u$ and $J_v$ have the same or different polarity. For this to happen, the trees forming the connection should have some of their leaves in the giant component. This is summarized in the lemma below.

Lemma 4.8. With high probability, for every $v\in V$, every tree associated with $J_v$ that has more than $n^{\delta_L}$ leaves has at least $\frac{\delta_c}{2}n^{\delta_L}$ leaves in the giant component of $J_v$, where $\delta_c$ is the constant in Lemma 4.3.

Proof. Let $E$ be the event that for every $v\in V$, every tree in $T_v$ has at least $\delta_c n^{2-b}$ of its original leaves (before the edge deletions in the trees) in the giant component of $J_v$. From Lemma 4.3(a) we have $P(E)\ge 1 - n^{-98}$. Let $T$ be any tree in $J_v$ with at least $n^{\delta_L}$ leaves after the edge deletions. Let $N_T$ be the original number of leaves of $T$ in the giant component of $J_v$; conditioned on $E$ we have $N_T \ge \delta_c n^{2-b}$. By Lemma 4.3(b), the expected number of leaves of $T$ in the giant component of $J_v$ after the edge deletions is then at least $\delta_c n^{\delta_L}$. The proof then follows by standard large deviation arguments and a union bound over all trees in $\mathcal{H}_p$. $\square$

Definition 4.1. We call two trees adjacent (or neighboring) if their roots were matched during the construction of $\mathcal{H}$.

Note that if a tree has zero leaves, or its neighboring tree has zero leaves, then it is inconsequential in affecting the polarity of the maximum cut $C_{\mathcal{H}_p}$. This motivates the following definition.

Definition 4.2. We call a pair of adjacent trees $(T,\tilde T)$, with $T\in T_{R_v,u}$ (or $T_{S_v,u}$) and $\tilde T\in T_{R_u,v}$ (or $T_{S_u,v}$), useful if
(a) $T$ has a non-zero number of leaves in the connector vertices of $J_v$;
(b) $\tilde T$ has a non-zero number of leaves in the connector vertices of $J_u$;
(c) the edge connecting the roots of $T$ and $\tilde T$ was not deleted in the edge deletion process.
We call a tree $T$ useful if it belongs to a useful pair.

Let $\epsilon_L > 0$ be a fixed constant that satisfies
$$\frac{1+\epsilon_L}{1-\epsilon_L} < \frac{1.5007}{1.5}. \tag{4.31}$$
We now prove the following fact about the number of useful trees associated with any edge $e = (u,v)\in E$.

Lemma 4.9. For any $e = (u,v)\in E$, let $N_{R,e}$ be the number of useful pairs of trees in $T_{R_u,v}$ and $T_{R_v,u}$, and let $N_{S,e}$ be the number of useful pairs of trees in $T_{S_u,v}$ and $T_{S_v,u}$. There exists a constant $\delta_{useful} > 0$ such that, with high probability, for every $(u,v)\in E$ both of the following conditions are satisfied:
$$(1-\epsilon_L)\,\delta_{useful}\,n^b \le N_{R,e} \le (1+\epsilon_L)\,\delta_{useful}\,n^b, \tag{4.32}$$
$$(1-\epsilon_L)\,\delta_{useful}\,n^b \le N_{S,e} \le (1+\epsilon_L)\,\delta_{useful}\,n^b. \tag{4.33}$$
Proof. For any $e = (u,v)\in E$, there are $\frac{\gamma}{3}n^b$ pairs of adjacent trees associated with $(R_u, R_v)$ and $\frac{\gamma}{3}n^b$ pairs associated with $(S_u, S_v)$. Let $p_{ext}$ be the extinction probability of a Galton–Watson branching process with offspring distribution $\text{Bin}(d-1,p)$. Then the probability that any given tree $T$ has zero leaves is $p_{ext} + o(1)$. Let $T\in T_{R_v,u}$ and $\tilde T\in T_{R_u,v}$ be a pair of adjacent trees associated with $e$. The pair $(T,\tilde T)$ is useful iff neither of the trees is extinct and the edge connecting their roots is not deleted. Using independence,
$$P\big((T,\tilde T)\text{ is useful}\big) = \big(1 - p_{ext} + o(1)\big)^2 p. \tag{4.34}$$
Note that a pair of adjacent trees being useful is independent across pairs. Let $\delta_{useful} \triangleq \frac{\gamma}{3}(1-p_{ext})^2p$. The proof then follows by standard large deviation inequalities and a union bound over all pairs of adjacent trees in $\mathcal{H}_p$. $\square$

Recall that $N_L$ denotes the number of vertices $v\in V$ for which $J_v$ satisfies Property 4.1. Define $V_L$ to be the set of these vertices. Also, let $E_L\subset E$ be the set of all edges $(u,v)\in E$ such that both $u,v\in V_L$. We now prove the following fact about the useful trees associated with any edge $(u,v)\in E_L$.

Lemma 4.10. With high probability, for every $e = (u,v)\in E_L$, every useful tree $T\in T_{R_v,u}$ (or $T_{S_v,u}$) associated with $e$ has at least $\frac{\delta_c}{2}n^{\delta_L}$ of its leaves in the giant component of $J_v$.

Proof. This follows immediately from Lemma 4.8 and the definition of a useful tree: a useful tree has a non-zero number of leaves, so by Property 4.1 it has at least $n^{\delta_L}$ leaves. $\square$

We are now ready to prove our main proposition, which relates a maximum cut of $\mathcal{H}_p$ to an $\epsilon$-optimal cut $C_{\mathcal{G}}$ of $\mathcal{G}$. First we give the following definition.

Definition 4.3. Let $C$ be any cut of $\mathcal{H}_p$ in which all the giant components satisfy the full polarization property (see Proposition 4.2). Call the polarity of $J_v$ equal to 1 if Option 1 of Proposition 4.2 holds, and 0 otherwise. Let $C_{\mathcal{G}}$ be a cut of $\mathcal{G}$. Then $C$ is said to have polarity according to $C_{\mathcal{G}}$ if one of the following holds:
1. for all $v\in V$, the polarity of $J_v$ in $C$ is the same as the binary assignment of $v$ in $C_{\mathcal{G}}$;
2. for all $v\in V$, the polarity of $J_v$ in $C$ is the opposite of the binary assignment of $v$ in $C_{\mathcal{G}}$.

Proposition 4.3. Let $C_{\mathcal{G}}^*$ be a maximal cut of $\mathcal{G}$, and let $C_{\mathcal{G}}$ be any cut of $\mathcal{G}$ such that $|C_{\mathcal{G}}| < |C_{\mathcal{G}}^*| - 0.001n$. Then any cut $C_{\mathcal{H}_p}$ of $\mathcal{H}_p$ with polarity according to $C_{\mathcal{G}}$ is not a maximal cut of $\mathcal{H}_p$.

Proof. Recall that $E_L$ is the set of edges of $E$ both of whose endpoints are in $V_L$. From Lemma 4.6, with high probability the number $N_L$ of vertices in $V_L$ is at least $0.9999n$; assume this is the case. Then, since $\mathcal{G}$ is 3-regular, we must have $|E_L| \ge 1.5n - 0.0003n$. Further assume that the statements of Lemma 4.9 and Lemma 4.10 hold. Since $|C_{\mathcal{G}}| < |C_{\mathcal{G}}^*| - 0.001n$, the cut $C_{\mathcal{G}}$ has at most $|C_{\mathcal{G}}^*| - 0.001n$ edges in $E_L$. Hence there are at least $(1.5n - 0.0003n) - (|C_{\mathcal{G}}^*| - 0.001n)$ edges of $E_L$ that are not in $C_{\mathcal{G}}$.

Let $(u,v)\in E_L$ be such that $(u,v)\notin C_{\mathcal{G}}$. Then we know that there are at least $\delta_{useful}(1-\epsilon_L)n^b$ useful pairs of trees associated with $(u,v)$, and each tree of such a pair has at least $\frac{\delta_c}{2}n^{\delta_L}$ leaves in the corresponding giant component. Let $T$ and $\tilde T$ be such a useful tree pair. Since $C_{\mathcal{H}_p}$ satisfies the full polarization property, in each $J_v$ all but $O(n^{\delta_L - 2b})$ vertices of the giant component of $J_v$ obey the polarity of $J_v$; let the polarity of $J_v$ be $b_v\in\{0,1\}$. Since $T$ has at least $\frac{\delta_c}{2}n^{\delta_L}\gg n^{\delta_L - 2b}$ leaves in the giant component of $J_v$, there must exist at least one leaf $L$ of $T$ in the giant component of $J_v$ such that $L$ is assigned the binary value $b_v$. Similarly, there must exist at least one leaf $\tilde L$ of $\tilde T$ in the giant component of $J_u$ such that $\tilde L$ is assigned the binary value $b_u$.
Since $(u,v)\notin C_{\mathcal{G}}$, we must have $b_u = b_v$. Observe that the path from $L$ to $\tilde L$ passing through $T$ and $\tilde T$ has an odd number of hops. Hence there is at least one edge on this path that is not in $C_{\mathcal{H}_p}$. This means that for each $(u,v)\in E_L$ with $(u,v)\notin C_{\mathcal{G}}$, there are at least $\delta_{useful}(1-\epsilon_L)n^b$ edges not in $C_{\mathcal{H}_p}$. The number of edges of the cut $C_{\mathcal{H}_p}$ is therefore bounded above by
$$|C_{\mathcal{H}_p}| \le |\mathcal{E}_{\mathcal{H}_p}| - \big(1.5007n - |C_{\mathcal{G}}^*|\big)\,\delta_{useful}(1-\epsilon_L)\,n^b. \tag{4.35}$$

Now consider a cut $C_{\mathcal{H}_p}^*$ of $\mathcal{H}_p$ constructed as follows:
(a) For each $v\in V$, if $v$ is assigned the value 1 in the cut $C_{\mathcal{G}}^*$, assign all vertices of $R_v$ the value 1 and all vertices of $S_v$ the value 0. Perform the analogous assignment if $v$ is assigned 0 in $C_{\mathcal{G}}^*$.
(b) Suppose $T$ is a useful tree. Then assign all even levels of $T$, including the root, the value assigned to the leaves of $T$.
(c) Suppose $T$ is not a useful tree. Then either $T$ has no leaves, or its adjacent tree $\tilde T$ has no leaves, or the edge connecting the roots of $T$ and $\tilde T$ has been deleted. In each case, perform an assignment such that all of the edges of $T$ are in the cut $C_{\mathcal{H}_p}^*$.

From the above description, the only edges of $\mathcal{E}_{\mathcal{H}_p}$ that are not in $C_{\mathcal{H}_p}^*$ are the connector edges between roots of useful pairs of trees $T\in T_u$ and $\tilde T\in T_v$ such that $u$ and $v$ have the same binary assignment in $C_{\mathcal{G}}^*$, i.e., $(u,v)\notin C_{\mathcal{G}}^*$. The number of such edges is upper bounded by $(1.5n - |C_{\mathcal{G}}^*|)\,\delta_{useful}(1+\epsilon_L)\,n^b$. So the number of edges of the cut $C_{\mathcal{H}_p}^*$ is lower bounded as
$$|C_{\mathcal{H}_p}^*| \ge |\mathcal{E}_{\mathcal{H}_p}| - \big(1.5n - |C_{\mathcal{G}}^*|\big)\,\delta_{useful}(1+\epsilon_L)\,n^b. \tag{4.36}$$
Combining (4.35) and (4.36), we have
$$|C_{\mathcal{H}_p}^*| - |C_{\mathcal{H}_p}| \ge \big(1.5007n - |C_{\mathcal{G}}^*|\big)\delta_{useful}(1-\epsilon_L)n^b - \big(1.5n - |C_{\mathcal{G}}^*|\big)\delta_{useful}(1+\epsilon_L)n^b$$
$$\ge \big[1.5007(1-\epsilon_L) - 1.5(1+\epsilon_L)\big]\,\delta_{useful}\,n^{b+1} > 0,$$
where the last line follows from (4.31). Hence $C_{\mathcal{H}_p}$ cannot be a maximal cut of $\mathcal{H}_p$, and the proof is complete. $\square$

With the properties of $C_{\mathcal{H}_p}$ proved in the preceding sections, we are now ready to complete the proof of Theorem 4.2.

Proof of Theorem 4.2. We are now in a position to provide the details of the algorithm $\mathcal{A}_2$. Given a maximal cut $C_{\mathcal{H}_p}$ of $\mathcal{H}_p$:
- Find the largest connected component $C_v$ of every bipartite graph $J_v$.
- Find the polarity of $C_v$: declare the polarity to be 1 if the number of vertices of $C_v\cap R_v$ assigned the value 1 is at least the number of vertices of $C_v\cap S_v$ assigned the value 1, and declare the polarity to be 0 otherwise.
- Produce a cut $C_{\mathcal{G}}$ of $\mathcal{G}$ by assigning each vertex $v$ the value of the polarity of $J_v$.

From Proposition 4.3 it follows that, with high probability, the cut $C_{\mathcal{G}}$ must satisfy
$$|C_{\mathcal{G}}| \ge |C_{\mathcal{G}}^*| - 0.001n \ge (1-\epsilon_{crit})\,|C_{\mathcal{G}}^*|, \tag{4.37}$$
where the second inequality follows from the fact that $|C_{\mathcal{G}}^*| \ge n - 1$. $\square$

4.5 Conclusion

We showed that the thinned MAX-CUT problem on graphs with maximum degree $d$ undergoes a computational hardness phase transition at $p_c = \frac{1}{d-1}$. However, it is unclear whether or not the problem is computationally tractable when $p = p_c$; we conjecture that the problem remains NP-hard in this case.

Chapter 5

Algorithms for Low Rank Matrix Completion

5.1 Introduction

Matrix completion refers to the problem of recovering a low rank matrix from an incomplete subset of its entries. This problem arises in a vast number of applications that involve collaborative filtering, where one attempts to predict the unknown preferences of a certain user based on the collective known preferences of a large number of users. It has attracted a lot of attention in recent times due to its application in recommendation systems and the well-known Netflix Prize.

5.1.1 Formulation

Let $M = \alpha\beta'$ be a rank $r$ matrix, where $\alpha\in\mathbb{R}^{m\times r}$ and $\beta\in\mathbb{R}^{n\times r}$, and suppose that $r\ll m,n$. Let $\mathcal{E}\subset[m]\times[n]$ be a subset of the indices and let $M_{\mathcal{E}}$ denote the entries of $M$ on the subset $\mathcal{E}$. The two major questions that arise in matrix completion are:
Let E c [m] x [n} be a subset of the indices and let Me denote the entries of M on the subset E. The two major questions that arise in matrix completion are: 127 (a) Given Me, is it possible to reconstructM? and (b) Are there efficient algorithms to perform this reconstruction? Without any further assumptions, matrix completion is NP-hard [41]. However under certain conditions, the problem has been shown to be tractable. The most common assumption adopted is that the matrix M is "incoherent" and the subset C is chosen uniformly at random. The incoherence condition was introduced in [10], [11], where it was shown that convex relaxation resulting in nuclear norm minimization succeeds with further assumptions on the size of S. In [35] and [36}, the authors use an algorithm consisting of a truncated singular value projection followed by a local minimization subroutine on the Grassmann manifold and show that it succeeds when JEj = Q(nrlog n). In [28], it was shown that the local minimization in [351 can be successfully replaced by Alternating Minimization. The use of Belief Propagation for matrix factorization has also been studied by physicists in [33]. For the rest of the chapter we will assume that m = n for simplicity of notation and the results easily extend to the more general m = O(n). Let ( be a bipartite graph on the vertex set V = VR U Vs corresponding to the rows and columns of M and with edge set E. Let VR = {ri, i E [n]} and Vs = {si, i E [n]}. Denote by A = A(n) the maximum degree of g. The graph g represents the structure of the revealed entries of M. We denote the ith row of a by ai and the jth row of 6 by 8j. Note that M1 , = a'i#. Matrix completion can be recast as the following optimization problem over !: 3 - Mij 12, min Xx'sy X,YERnxr (jE (5.1) where xi and y3 are associated with vertices ri E VR and s3 E Vs and denote the ith row of X and jth row of Y respectively. The above optimization problem is 128 non-convex and as expected, it is in general NP-hard to solve. 5.1.2 Algorithms We provide three algorithms for matrix completion that operate on the graph g: Information Propagation (IP), Vertex Least Squares (VLS) and Edge Least Squares (ELS). Information Propagation (IP) is a simple sequential decoding algorithm on the graph g. In Section 5.2, we provide sufficient conditions when IP successfully recovers M from Me, and show that under these conditions it takes only O(n) computation time when r = 0(1). In Section 5.3, we cover VLS and ELS, which are iterative decentralized algorithms. The VLS is identical to the Alternating Minimization algorithm in [28], where it was used as a local optimization subroutine following a singular value projection, i.e., with a warm start. This was motivated by the fact that VLS was a major component in the award winning algorithms for the Netflix challenge [401, [39]. In this chapter, we also study the use of VLS with a cold start, where it is used as the sole procedure for matrix completion. For the special case when M is rank one, we provide sufficient conditions when VLS converges to the right solution. The ELS is a new algorithm which is obtained as a message-passing variation of the VLS. We provide experimental results, where we observe that ELS significantly outperforms VLS both in terms of sample complexity and convergence speed. This also suggests that replacing VLS with warm start by ELS with warm start in existing algorithms might significantly improve their performance. 
5.2 Information Propagation Algorithm

Information Propagation (IP) is a simple sequential decoding algorithm on $\mathcal{G}$ that works when $\mathcal{G}$ has certain "strong connectivity" properties. We first formally state the steps of the algorithm.

INFORMATION PROPAGATION (IP)
1. Initialization:
- Initialize the set of decoded vertices $D_0 = \{r_1,\dots,r_r\}$ and $X = Y = [0]$.
- Initialize the set of potential vertices $P_0 = \{v\in V\setminus D_0 : v\text{ has at least } r\text{ neighbors in } D_0\}$.
- Set $x_i = e_i$ for $i = 1,\dots,r$, where $e_i$ is the $i$th standard unit vector.
2. Repeat until $P_t = \emptyset$:
- Pick $v\in P_t$. Assume w.l.o.g. that $v\in V_S$, i.e., $v = s_j$ for some $j$.
- Let $r_{i_1},\dots,r_{i_r}\in D_t$ be neighbors of $s_j$. Set $y_j$ to be a solution of the system of linear equations $x_{i_k}'y = M_{i_kj}$, $1\le k\le r$.
- Set $D_{t+1} = D_t\cup\{s_j\}$ and update $P_{t+1} = \{v\in V\setminus D_{t+1} : v\text{ has at least } r\text{ neighbors in } D_{t+1}\}$.
3. Declare $\hat M = XY'$.

Since $|D_{t+1}| = |D_t| + 1$, the algorithm terminates after at most $2n - r$ steps. This algorithm has similarities to the rank one recursive completion algorithm in [37]. We present the following theorem regarding the performance of IP, which admits a simple proof.

Theorem 5.1. Assume that $M = \alpha\beta'$ with $\alpha,\beta\in\mathbb{R}^{n\times r}$. Also assume that every collection of $r$ rows of either $\alpha$ or $\beta$ is linearly independent. Suppose that the algorithm IP starting from some initial $D_0$ terminates with $D_T = V$. Then $\hat M = M$.

Proof. Let $B$ be the submatrix of $\alpha$ consisting of its first $r$ rows. Since the first $r$ rows $\alpha_1,\dots,\alpha_r$ of $\alpha$ are linearly independent, $B$ is full rank, and we can replace $\alpha\leftarrow\alpha B^{-1}$ and $\beta\leftarrow\beta B'$, which leaves $M = \alpha\beta'$ unchanged. So w.l.o.g. assume that $\alpha_i = e_i$ for $1\le i\le r$. By the initialization step, $X_{D_0} = \alpha_{D_0}$. Assume that after $t$ steps of the algorithm we have $X_{D_t\cap V_R} = \alpha_{D_t\cap V_R}$ and $Y_{D_t\cap V_S} = \beta_{D_t\cap V_S}$. Let $s_j\in P_t$ be the vertex picked at time $t+1$ in step 2 of the algorithm and let $r_{i_1},\dots,r_{i_r}\in D_t$ be neighbors of $s_j$. Then $y_j$ must satisfy $\alpha_{i_k}'y_j = M_{i_kj} = \alpha_{i_k}'\beta_j$ for all $k$. But since $\alpha_{i_1},\dots,\alpha_{i_r}$ are linearly independent, the update at step $t+1$ indeed sets $y_j = \beta_j$. The theorem then follows by induction. $\square$

The main requirement for the success of IP is that it does not terminate prematurely, i.e., that we do not encounter $P_t = \emptyset$ for some $t < 2n - r$. This requires $\mathcal{G}$ to satisfy a certain "strong connectivity" property. For the special case where $M$ is rank one, assuming that all entries of $\alpha$ and $\beta$ are non-zero, the only requirement on $\mathcal{G}$ is that it is connected. More generally, for $r > 1$, the required "strong connectivity" property can be defined via the steps of IP itself. The following condition is sufficient, but not necessary.

Assumption 5.1. For each $B\subset V$ such that $r\le|B| < 2n$ and $|B\cap V_R|\ge r$, there exists a vertex $v\in V\setminus B$ with at least $r$ neighbors in $B$.

When the above assumption holds, then unless $|D_t| = 2n$, i.e., unless all vertices have been decoded, the corresponding set of potential vertices $P_t$ is non-empty. Furthermore, this sufficient condition can be verified easily for some graphs, for example random Erdős–Rényi graphs. Specifically, we have the following result.

Theorem 5.2. Let $A$ be a large enough number such that $A^re^{-A+r} < 1/4$, and let $c$ be a fixed constant that satisfies $(c/2)e^{-2A} > r+1$. Let $\mathcal{G}$ be a random bipartite graph on $2n$ vertices, with $n$ vertices in each part, where every edge is present independently with probability $p = \big(\frac{c\log n}{n}\big)^{1/r}$. Then, if $r = O(1)$, the random graph $\mathcal{G}$ satisfies Assumption 5.1 w.h.p. as $n\to\infty$.

Theorem 5.2 says that $|\mathcal{E}| = O\big(n^{2-1/r}(\log n)^{1/r}\big)$ revealed entries are sufficient for IP to succeed. This requires an extra $n^{1/r}$ factor of revealed entries compared with $|\mathcal{E}| = O(nr\log n)$ in [35].
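The following Python sketch is one possible implementation of IP as described above; the helper names are hypothetical, and the potential set is maintained by re-scanning rather than incrementally. It uses the same seeding $x_i = e_i$ and returns None if the process stalls before decoding every vertex.

```python
import numpy as np

def information_propagation(M_obs, n, r):
    """Minimal sketch of IP. M_obs: dict mapping a revealed entry (i, j)
    to M[i, j]. Returns (X, Y) with M_hat = X @ Y.T, or None if the
    process stalls (the potential set empties before full decoding)."""
    X, Y = np.zeros((n, r)), np.zeros((n, r))
    X[:r] = np.eye(r)                        # x_i = e_i for i < r
    dec_R, dec_S = set(range(r)), set()      # decoded rows / columns
    rows_of, cols_of = {}, {}
    for (i, j) in M_obs:                     # adjacency of revealed entries
        rows_of.setdefault(j, []).append(i)
        cols_of.setdefault(i, []).append(j)
    changed = True
    while changed:
        changed = False
        for j in range(n):                   # decode columns with >= r decoded row neighbors
            if j in dec_S:
                continue
            nb = [i for i in rows_of.get(j, []) if i in dec_R]
            if len(nb) >= r:
                A = X[nb[:r]]                # r decoded neighbor rows (invertible by assumption)
                Y[j] = np.linalg.solve(A, [M_obs[(i, j)] for i in nb[:r]])
                dec_S.add(j); changed = True
        for i in range(n):                   # decode rows with >= r decoded column neighbors
            if i in dec_R:
                continue
            nb = [j for j in cols_of.get(i, []) if j in dec_S]
            if len(nb) >= r:
                A = Y[nb[:r]]
                X[i] = np.linalg.solve(A, [M_obs[(i, j)] for j in nb[:r]])
                dec_R.add(i); changed = True
    return (X, Y) if len(dec_R) == n and len(dec_S) == n else None
```

Each vertex is decoded by one $r\times r$ linear solve, which is where the $O(n)$ total running time for $r = O(1)$ comes from.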
The benefit is that, with these extra entries, we can now perform matrix completion using IP in only $O(n)$ time. For the case of rank one matrices ($r = 1$) we only need connectivity of the graph, and Theorem 5.2 reduces to the well-known $\frac{\log n}{n}$ threshold for connectivity of Erdős–Rényi random graphs.

Proof of Theorem 5.2. For $r = 1$ the theorem is the same as the well-known result about connectivity of Erdős–Rényi random graphs, so we assume for the rest of the proof that $r\ge 2$. We will show that for every partition of $\mathcal{G} = (V_R\cup V_S,\mathcal{E})$ into two parts $B$ and $B^c$ with $|B|\ge r$ and $|B\cap V_R|\ge r$, there exists at least one vertex of $B^c$ with at least $r$ neighbors in $B$. Assume w.l.o.g. that $|B\cap V_R| \ge |B\cap V_S|$. Let $v\in B^c\cap V_S$. Then the number of neighbors of $v$ in $B\cap V_R$, which we denote by $D_v$, is distributed as $D_v\sim\text{Bin}(|B\cap V_R|, p)$. Hence the probability that none of the vertices of $B^c\cap V_S$ has at least $r$ neighbors in $B\cap V_R$ is given by
$$\big[P\big(\text{Bin}(|B\cap V_R|,p) < r\big)\big]^{|B^c\cap V_S|} \le \big[P\big(\text{Bin}(|B\cap V_R|,p) < r\big)\big]^{\max\{n - |B\cap V_R|,\,1\}}, \tag{5.2}$$
where the inequality follows from the assumption $|B\cap V_R|\ge|B\cap V_S|$ and from $|B^c|\ge 1$.

Let $E$ be the event that Assumption 5.1 is not satisfied. Then, by a union bound over all possible sizes $b$ of $|B\cap V_R|$,
$$P(E) \le \sum_{b=r}^{n-1}\binom{n}{b}\big[P(\text{Bin}(b,p) < r)\big]^{n-b} + P\big(\text{Bin}(n,p) < r\big).$$
We will show that for every $r\le b\le n-1$ the quantity $\binom{n}{b}[P(\text{Bin}(b,p) < r)]^{n-b}$ is $o(n^{-1})$, and that the same is true of $P(\text{Bin}(n,p) < r)$. We divide the proof into three cases depending on the size of $b$.

Case 1: $b\ge n/2$. In this case we have
$$P\big(\text{Bin}(b,p) < r\big) \le P\big(\text{Bin}(n/2,p) < r\big) \le r\binom{n/2}{r}p^r(1-p)^{n/2-r} \le rn^r\,\frac{c\log n}{n}\,e^{-\Theta(pn)} = e^{-\Theta(\sqrt n)},$$
since $pn = c^{1/r}n^{1-1/r}(\log n)^{1/r} = \Omega(\sqrt n)$ for $r\ge 2$. Therefore, using $\binom{n}{b}\le n^{n-b}$,
$$\binom{n}{b}\big[P(\text{Bin}(b,p) < r)\big]^{n-b} \le \big[n\,e^{-\Theta(\sqrt n)}\big]^{n-b} = o(n^{-1}),$$
and similarly $P(\text{Bin}(n,p) < r) = e^{-\Theta(\sqrt n)} = o(n^{-1})$.

Case 2: $A/p\le b < n/2$. Here we have
$$P\big(\text{Bin}(b,p) < r\big) \le r\binom{b}{r}p^r(1-p)^{b-r} \le (bp)^r\,e^{-p(b-r)} \le A^re^{-A+r} < 1/4,$$
and it follows that $\binom{n}{b}[P(\text{Bin}(b,p) < r)]^{n-b} = o(n^{-1})$ for this range of $b$ as well.

Case 3: $r\le b < A/p$. We get
$$P\big(\text{Bin}(b,p) < r\big) \le 1 - P\big(\text{Bin}(b,p) = r\big) \le 1 - \Big(\frac{b}{r}\Big)^rp^r(1-p)^{b-r} \le 1 - \frac{c}{2}e^{-2A}\Big(\frac{b}{r}\Big)^r\frac{\log n}{n}$$
for large enough $n$, using $p^r = \frac{c\log n}{n}$, $\binom{b}{r}\ge(b/r)^r$, and $(1-p)^{b-r}\ge(1-p)^{A/p}\ge\frac12e^{-2A}$. Hence, since $b = o(n)$ in this case,
$$\binom{n}{b}\big[P(\text{Bin}(b,p) < r)\big]^{n-b} \le n^b\,e^{-\frac{c}{2}e^{-2A}(\frac{b}{r})^r\frac{(n-b)\log n}{n}} \le n^b\,n^{-(r+1)(\frac{b}{r})^r(1-o(1))} = o(n^{-1}),$$
where the final bound uses $(c/2)e^{-2A} > r+1$ together with $(r+1)(b/r)^r \ge b+1$ for $b\ge r$. $\square$

5.3 Vertex Least Squares and Edge Least Squares Algorithms

In this section we provide two iterative decentralized algorithms that attempt to solve the non-convex least squares problem in (5.1). The first is what we call the Vertex Least Squares (VLS) algorithm.

VERTEX LEAST SQUARES (VLS)
1. Initialization:
- $x_{i,0},\,y_{j,0}\in\mathbb{R}^r$, $i,j\in[n]$;
- maximum number of iterations $T$.
Recall that the Frobenius norm of a matrix A is denote by IIAIIF and is given by IIAIIF = Theorem 5.3. Let M = a#' with a,/8 E R and suppose there exists 0 < b < 1 such that for all i, j E [n], we have b < ai, I3 < 1/b. 136 Suppose that the graph g is connected and has diameter d = clog n for some fixed constant c and maximum degree A = 0(1). Suppose that VLS is initialized at b < xj,O 1/b for i E [n]. Then, there exists a constant a1 > 0 such that given any e > 0, there exists a second constant a2 > 0 for which after T = a2 n 1 log n iterations of VLS, we have IIIXTY - MIIF < Before proceeding to the proof of Theorem 5.3, we remark here that in [28], the success of VLS was established by showing that the VLS updates resemble a power method with bounded error. In our proof we also show that VLS updates are like time varying power method updates, but without any error term. In [28], the warm start VLS required that principal angle distance between the left and right singular vector subspaces of the actual matrix M and the initial iterates are at most 0.5. With the conditions given in Theorem 5.3, this may not always be the case. From [28], the subspace distance between two vectors u and v (rank 1 case) is given by d(u,v)=- -(5.6) Suppose that b, 1 < i < n/2, 1/b, n/2+1ti Cii Then d(xo, a) = 1 - ,iO= n. 1/b, 1 < i < n/2 b, n/2+1 < i < n. (5.7) is greater than 1/2 when b is a small constant. In fact the subspace distance can be very close to one. Nevertheless, according to Theorem 5.3 VLS converges to the correct solution. 137 Proof of Theorem 5.3. From the update rules for VLS in Eq. (5.4)-(5.5), we can write tiy't yj+1 and 2 t (5.8) xi,t+i = 2 :sg~ ,Ei:s ~-ri I4,+1 1, Let A be the partial adjacency matrix of g, i.e., A:, = 0, if ri ~- s i otherwise Define U,t =x~t and Vjt= Yt. With the chosen initial conditions in the theorem, we have that b2 < ui,o, vj,o 1/b 2 . Using (5.8), the updates for ut,t and v,,t can be written as, =1,t+= and iU.t.. (5.9) The convex combination update rules in (5.9) imply that all future iterates satisfy b2 uit, ,,t 1/b 2 and bP < xi,t, yt 5 1/b. Combining the two updates in (5.9), we see that u,,t+, can be expressed as a convex combination of ui,t, i.e., there exists a stochastic matrix Pt such that ut+1 = Ptut, where Ut = (ui,t, i E [n]) expressed f as a column vector. It is apparent that the support of Pt is same as the support EE where Aij matrix of 1 i.e., Pt is the transition of AA', probability 0, otherwise a random walk on (VR, ER), where (i 1, i2 ) E ER if and only if i2 is a distance two neighbor of i 1 in g. Although Pt depends on t, we can prove some useful properties satisfied by Pt that hold for all times t as stated in the following lemma. Lemma 5.1. There exists a constant 0 < -/ < 1 that depends on b and A such that the non-zero entries of Pt satisfy y : P ,ij for all t. 138 i' k0t, A 04,$ A ji E2 k:sjvrjk > 2 Y3,t Ek:ak~ri > A1-2A.j AOL k't and R( ) - P7wf. Notice that Pt = Rt(')Rt(2) 7 where R(1) t'ij A j. Hence the non-zero entries of Pjj must be at least -y A 520 03 Given that the the diameter of g is d, define the sequence of matrices {Qk}Mi_ as kd Qk = 1 (5.10) Pt. t=(k-1)d+1 Then for any k, Qk satisfies Qiy> -Id = yclog* A &, where a1 = -clogy > 0. Let wt,i = udt,i, i E [n]. Then, maxwt+,s max wt,j(1 - n~") + min wt,in", min wt+l,t > min wt, (1 - n~") + max wt,in". Combining the above gives (max wt+,1 - min wt+i,j) < (1 - 2n~"')(maxw,i - min wt,). Si i (5.11) Let 6 > 0 be a small enough constant such that + < 1+ e < ---. 
Proof of Theorem 5.3. From the update rules for VLS in (5.4)–(5.5), in the rank one case the least squares solutions have the closed form

$$x_{i,t+1} = \frac{\sum_{j:\, s_j \sim r_i} M_{ij}\, y_{j,t}}{\sum_{j:\, s_j \sim r_i} y_{j,t}^2}, \qquad y_{j,t+1} = \frac{\sum_{i:\, r_i \sim s_j} M_{ij}\, x_{i,t+1}}{\sum_{i:\, r_i \sim s_j} x_{i,t+1}^2}. \qquad (5.8)$$

Let A be the partial adjacency matrix of g, i.e.,

$$A_{ij} = \begin{cases} 1, & \text{if } r_i \sim s_j, \\ 0, & \text{otherwise.} \end{cases}$$

Define $u_{i,t} = x_{i,t}/\alpha_i$ and $v_{j,t} = y_{j,t}/\beta_j$. With the chosen initial conditions in the theorem, we have $b^2 \le u_{i,0}, v_{j,0} \le 1/b^2$. Using (5.8) and $M_{ij} = \alpha_i \beta_j$, the updates for $u_{i,t}$ and $v_{j,t}$ can be written as

$$u_{i,t+1} = \frac{\sum_{j:\, s_j \sim r_i} y_{j,t}^2\, v_{j,t}^{-1}}{\sum_{j:\, s_j \sim r_i} y_{j,t}^2}, \qquad v_{j,t+1} = \frac{\sum_{i:\, r_i \sim s_j} x_{i,t+1}^2\, u_{i,t+1}^{-1}}{\sum_{i:\, r_i \sim s_j} x_{i,t+1}^2}. \qquad (5.9)$$

The convex combination update rules in (5.9) imply that all future iterates satisfy $b^2 \le u_{i,t}, v_{j,t} \le 1/b^2$ and $b^3 \le x_{i,t}, y_{j,t} \le 1/b^3$. Combining the two updates in (5.9), we see that $u_{i,t+1}$ can be expressed as a convex combination of the $u_{i,t}$: each $v_{j,t}^{-1}$ is a weighted harmonic mean of the $u_{i,t}$ over the neighbors of $s_j$, and hence lies between their minimum and maximum. That is, there exists a stochastic matrix $P_t$ such that $u_{t+1} = P_t u_t$, where $u_t = (u_{i,t},\, i \in [n])$ is written as a column vector. The support of $P_t$ is the same as the support of $AA'$; i.e., $P_t$ is the transition probability matrix of a (time-varying) random walk on $(V_R, E_R)$, where $(i_1, i_2) \in E_R$ if and only if $i_2$ is a distance-two neighbor of $i_1$ in g. Although $P_t$ depends on t, we can prove some useful properties of $P_t$ that hold for all times t, as stated in the following lemma.

Lemma 5.1. There exists a constant $0 < \gamma < 1$ that depends only on b and $\Delta$ such that the non-zero entries of $P_t$ satisfy $P_{t,ij} \ge \gamma$ for all t.

Indeed, $P_t$ is built from the two averaging steps $R_t^{(1)}$ and $R_t^{(2)}$ in (5.9), with weights $R^{(1)}_{t,ij} = y_{j,t}^2 A_{ij} / \sum_{k: s_k \sim r_i} y_{k,t}^2$ and the analogous weights $R_t^{(2)}$ built from the $x_{i,t+1}^2$. Since all iterates lie in $[b^3, 1/b^3]$ and all degrees are at most $\Delta$, every non-zero averaging weight is bounded below by a constant depending only on b and $\Delta$, and the non-zero entries of $P_t$ inherit such a lower bound $\gamma$.

Given that the diameter of g is d, define the sequence of matrices $\{Q_k\}_{k \ge 1}$ as

$$Q_k = \prod_{t=(k-1)d+1}^{kd} P_t. \qquad (5.10)$$

Then for any k, $Q_k$ satisfies $Q_{k,ij} \ge \gamma^d = \gamma^{c \log n} = n^{-a_1}$, where $a_1 = -c \log \gamma > 0$. Let $w_{t,i} = u_{td,i}$, $i \in [n]$. Then

$$\max_i w_{t+1,i} \le \big(1 - n^{-a_1}\big) \max_i w_{t,i} + n^{-a_1} \min_i w_{t,i},$$
$$\min_i w_{t+1,i} \ge \big(1 - n^{-a_1}\big) \min_i w_{t,i} + n^{-a_1} \max_i w_{t,i}.$$

Combining the above gives

$$\max_i w_{t+1,i} - \min_i w_{t+1,i} \le \big(1 - 2 n^{-a_1}\big) \Big(\max_i w_{t,i} - \min_i w_{t,i}\Big). \qquad (5.11)$$

Let $\delta > 0$ be a small enough constant such that

$$1 - \epsilon \le \frac{b^2 - \delta}{b^2 + \delta} \quad \text{and} \quad \frac{b^2 + \delta}{b^2 - \delta} \le 1 + \epsilon;$$

the same bounds then hold with $b^2$ replaced by any $B \ge b^2$. Choose $a_2$ to be a large enough constant such that $e^{-2 a_2} (1/b^2 - b^2) \le \delta$. Then, for $T' = a_2 n^{a_1}$, applying (5.11) recursively we get

$$\max_i w_{T',i} - \min_i w_{T',i} \le \big(1 - 2 n^{-a_1}\big)^{a_2 n^{a_1}} \Big(\max_i w_{0,i} - \min_i w_{0,i}\Big) \le e^{-2 a_2} \big(1/b^2 - b^2\big) \le \delta. \qquad (5.12)$$

Substituting the definition of $w_{t,i}$, we get

$$\max_i u_{T,i} - \min_i u_{T,i} \le \delta, \qquad (5.13)$$

where $T = a_2 c\, n^{a_1} \log n$. This means there exists a constant $b^2 \le B \le 1/b^2$ such that $u_{i,T} \in (B - \delta, B + \delta)$ for all $i \in [n]$. From (5.9), we get that $v_{j,T} \in \big(\frac{1}{B + \delta}, \frac{1}{B - \delta}\big)$ for all $j \in [n]$. This gives $u_{i,T} v_{j,T} \in \big(\frac{B - \delta}{B + \delta}, \frac{B + \delta}{B - \delta}\big) \subset (1 - \epsilon, 1 + \epsilon)$. Hence,

$$\big|\alpha_i \beta_j - x_{i,T}\, y_{j,T}\big| = \alpha_i \beta_j\, \big|1 - u_{i,T} v_{j,T}\big| \le \frac{\epsilon}{b^2}, \qquad (5.14)$$

so that $\frac{1}{n}\|X_T Y_T' - M\|_F \le \epsilon/b^2$. Since $\epsilon > 0$ was arbitrary, this completes the proof.

We now proceed to describe the Edge Least Squares (ELS) algorithm, which is a message-passing version of the VLS algorithm.

EDGE LEAST SQUARES (ELS)

1. Initialization:
* $x_{i \to j,0},\, y_{j \to i,0} \in \mathbb{R}^r$, $(i,j) \in E$
* Maximum Iterations = T

2. For t = 1, ..., T:
For each $r_i \in R$ and $j: r_i \sim s_j$, set
$$x_{i \to j,t+1} = \arg\min_{x \in \mathbb{R}^r} \sum_{k:\, s_k \sim r_i,\, k \ne j} \big(x' y_{k \to i,t} - M_{ik}\big)^2. \qquad (5.15)$$
For each $s_j \in S$ and $i: r_i \sim s_j$, set
$$y_{j \to i,t+1} = \arg\min_{y \in \mathbb{R}^r} \sum_{k:\, r_k \sim s_j,\, k \ne i} \big(y' x_{k \to j,t+1} - M_{kj}\big)^2. \qquad (5.16)$$

3. Compute $x_{i,T} = \frac{1}{\deg(r_i)} \sum_{j: r_i \sim s_j} x_{i \to j,T}$ and $y_{j,T} = \frac{1}{\deg(s_j)} \sum_{i: r_i \sim s_j} y_{j \to i,T}$.

4. Declare $\hat{M} = X_T Y_T'$.

Each iteration of ELS consists of solving $2|E|$ least squares problems, so the total computation time per iteration is $O(r^2 \Delta |E|)$. A minimal sketch of the ELS updates is given at the end of this section.

For the special case of rank one matrices, it is possible to conduct an analysis of the ELS iterations along the lines of the proof of Theorem 5.3 for VLS. Let H be the dual graph on the directed edges of g. Here $H = (V_H, E_H)$, where $V_H = V_{H,R} \cup V_{H,S}$; $V_{H,R}$ consists of all directed edges $(r_i, s_j)$ of g and $V_{H,S}$ consists of all directed edges $(s_j, r_i)$ of g. Additionally, $(r_i, s_j) \in V_{H,R}$ and $(s_k, r_l) \in V_{H,S}$ are neighbors if and only if $j = k$ and $l \ne i$. Similar to (5.8), we can write the corresponding rank one update rules for ELS as

$$x_{i \to j,t+1} = \frac{\sum_{k:\, s_k \sim r_i,\, k \ne j} M_{ik}\, y_{k \to i,t}}{\sum_{k:\, s_k \sim r_i,\, k \ne j} y_{k \to i,t}^2}, \qquad y_{j \to i,t+1} = \frac{\sum_{k:\, r_k \sim s_j,\, k \ne i} M_{kj}\, x_{k \to j,t+1}}{\sum_{k:\, r_k \sim s_j,\, k \ne i} x_{k \to j,t+1}^2}. \qquad (5.17)$$

Define $u_{i \to j,t} = x_{i \to j,t}/\alpha_i$ and $v_{j \to i,t} = y_{j \to i,t}/\beta_j$. Then, similar to (5.9), we can write the corresponding update rules

$$u_{i \to j,t+1} = \frac{\sum_{k:\, s_k \sim r_i,\, k \ne j} y_{k \to i,t}^2\, v_{k \to i,t}^{-1}}{\sum_{k:\, s_k \sim r_i,\, k \ne j} y_{k \to i,t}^2}, \qquad v_{j \to i,t+1} = \frac{\sum_{k:\, r_k \sim s_j,\, k \ne i} x_{k \to j,t+1}^2\, u_{k \to j,t+1}^{-1}}{\sum_{k:\, r_k \sim s_j,\, k \ne i} x_{k \to j,t+1}^2}. \qquad (5.18)$$

Again, as before, defining $u_t = (u_{i \to j,t},\, r_i \sim s_j)$ we can write $u_{t+1} = P_t u_t$ for some stochastic matrix $P_t$. The support of $P_t$ is the graph on $V_{H,R}$ in which two vertices are neighbors if and only if they are distance-two neighbors in H. From the above equations, it is apparent that a result similar to Theorem 5.3 can be proved for ELS. We state the result below and omit the proof, as it is identical to the proof of Theorem 5.3.

Theorem 5.4. Let $M = \alpha \beta'$ with $\alpha, \beta \in \mathbb{R}^n$ and suppose there exists $0 < b < 1$ such that for all $i, j \in [n]$ we have $b \le \alpha_i, \beta_j \le 1/b$. Suppose that the graph H is connected, has diameter $d = c \log n$ for some fixed constant c, and has maximum degree $\Delta = O(1)$. Suppose that ELS is initialized at $b \le x_{i \to j,0} \le 1/b$ for $(i,j) \in E$. Then there exists a constant $a_1 > 0$ such that, given any $\epsilon > 0$, there exists a second constant $a_2 > 0$ for which after $T = a_2 n^{a_1} \log n$ iterations of ELS we have

$$\frac{1}{n}\big\|X_T Y_T' - M\big\|_F \le \epsilon.$$
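For concreteness, here is a minimal Python sketch of the ELS updates (5.15)–(5.16) together with the averaging step; it is written for this discussion rather than taken from our experimental code, messages are stored per directed edge, and all names are illustrative. We assume every vertex has degree at least r + 1 (as in the experiments below), so that each least squares problem has a unique solution.

```python
import numpy as np

def els(M_obs, omega, r, T, rng=None):
    """Edge Least Squares sketch: one message per directed edge of g."""
    rng = np.random.default_rng() if rng is None else rng
    n = M_obs.shape[0]
    nbr_R = {i: list(np.flatnonzero(omega[i])) for i in range(n)}     # {j : s_j ~ r_i}
    nbr_S = {j: list(np.flatnonzero(omega[:, j])) for j in range(n)}  # {i : r_i ~ s_j}
    edges = [(i, j) for i in range(n) for j in nbr_R[i]]
    x_msg = {e: rng.uniform(0.5, 1.5, r) for e in edges}  # x_{i->j}
    y_msg = {e: rng.uniform(0.5, 1.5, r) for e in edges}  # y_{j->i}, keyed by (i, j)
    for _ in range(T):
        for (i, j) in edges:  # x-update (5.15): the target edge j is excluded
            ks = [k for k in nbr_R[i] if k != j]
            A = np.array([y_msg[(i, k)] for k in ks])
            x_msg[(i, j)], *_ = np.linalg.lstsq(A, M_obs[i, ks], rcond=None)
        for (i, j) in edges:  # y-update (5.16): the target edge i is excluded
            ks = [k for k in nbr_S[j] if k != i]
            A = np.array([x_msg[(k, j)] for k in ks])
            y_msg[(i, j)], *_ = np.linalg.lstsq(A, M_obs[ks, j], rcond=None)
    # step 3: average the outgoing messages at each vertex, then declare X Y'
    X = np.array([np.mean([x_msg[(i, j)] for j in nbr_R[i]], axis=0) for i in range(n)])
    Y = np.array([np.mean([y_msg[(i, j)] for i in nbr_S[j]], axis=0) for j in range(n)])
    return X @ Y.T
```

Relative to VLS, the only change is that the update for the message on a directed edge excludes that edge's own observation, the standard message-passing modification; this is also why the per-iteration cost is $O(r^2 \Delta |E|)$ rather than $O(r^2 \Delta n)$.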
5.4 Experiments

In this section we provide simulation results for the VLS and ELS algorithms, with particular emphasis on:

(a) showing the faster convergence of ELS as compared to VLS on several randomly generated instances;

(b) success rates of the VLS and ELS algorithms on randomly generated instances with random initializations.

In view of Theorem 5.3, we generate $\alpha, \beta \in \mathbb{R}^n$ at random from U[0.01, 0.99]. We then compare the decay with the number of iterations of the root mean square (RMS) error, defined below in (5.19), for the two algorithms. We perform this experiment on a random 3-regular bipartite graph $G = (V, E)$ on 2n vertices with n vertices on each side, which we keep fixed for the experiment, and run VLS and ELS on $M_E$. Random regular graphs are known to be connected with high probability, and we did not find significant variation in the results when changing the graph. Since ELS requires about a factor $\Delta$ more computation per iteration, we plot the decay of RMS against the normalized iteration index, defined as (iteration number)/(total iterations) for VLS and $\Delta \cdot$(iteration number)/(total iterations) for ELS. The RMS error after T iterations of either VLS or ELS is defined as

$$\mathrm{RMS} = \frac{1}{n}\big\|M - X_T Y_T'\big\|_F. \qquad (5.19)$$

Figure 5-1: RMS vs iteration number (normalized) for VLS and ELS

The comparison in Figure 5-1 (computed for n = 100) demonstrates that ELS converges faster than VLS.

For the case of rank r > 1, we choose $\alpha, \beta \in \mathbb{R}^{n \times r}$ randomly by generating each entry of the two matrices independently and uniformly from the interval [-1, 1]. We then construct the random revelation graph g in two steps. First, we generate a random (r+1)-regular bipartite graph $g = (V, E_0)$. Second, we generate another edge set $E_1$, where each edge exists independently with probability c/n. We then superimpose the two sets of edges and set $E = E_0 \cup E_1$. The first step ensures that g has minimum degree r + 1, which in turn ensures that the least squares optimization problems (5.4), (5.5) and (5.15), (5.16) involved in the update rules of VLS and ELS have unique solutions. The second step allows us to control the number of revealed entries via the quantity c. In particular, the above random graph model results in g having an average degree of approximately r + 1 + c, which corresponds to approximately (r + 1 + c)n revealed entries.

We plot the empirical fraction of failures obtained from 50 random trials as a function of c; a sketch of this setup appears at the end of the section. A failure is recorded when the algorithm (VLS or ELS) fails to bring the RMS below a small fixed threshold. Figure 5-2 and Figure 5-3 show the results for ELS when r = 2 and r = 3, respectively. This provides evidence for the success of ELS even with a cold start. Figure 5-4 does the same for VLS with a cold start for r = 2, showing that it does not always succeed.

Figure 5-2: ELS: failure fraction vs c for r = 2 with planted 3-regular graph (n = 100)

Figure 5-3: ELS: failure fraction vs c for r = 3 with planted 4-regular graph (n = 100)

Figure 5-4: VLS: failure fraction vs c for r = 2 with planted 3-regular graph (n = 100)
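A hypothetical harness along the following lines reproduces the shape of these experiments; it reuses the `els` sketch from Section 5.3, and the planted-regular construction (a union of random perfect matchings) and the failure tolerance are illustrative assumptions rather than the exact setup behind the figures.

```python
import numpy as np

def planted_mask(n, deg, c, rng):
    """Approximately deg-regular bipartite mask (a union of deg random
    matchings, which may occasionally overlap), superimposed with
    Erdos-Renyi edges of probability c/n."""
    omega = np.zeros((n, n), dtype=bool)
    for _ in range(deg):
        omega[np.arange(n), rng.permutation(n)] = True
    omega |= rng.random((n, n)) < c / n
    return omega

def failure_fraction(n, r, c, trials, T, rng):
    fails = 0
    for _ in range(trials):
        A = rng.uniform(-1, 1, (n, r))
        B = rng.uniform(-1, 1, (n, r))
        M = A @ B.T
        omega = planted_mask(n, r + 1, c, rng)
        M_hat = els(M, omega, r, T, rng)     # ELS sketch from Section 5.3
        rms = np.linalg.norm(M - M_hat) / n  # RMS as in (5.19)
        fails += rms > 1e-3                  # illustrative tolerance
    return fails / trials

rng = np.random.default_rng(0)
for c in range(0, 11):
    print(c, failure_fraction(n=100, r=2, c=c, trials=50, T=50, rng=rng))
```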
5.5 Discussion and Future Directions

The performance of ELS in our experiments suggests that cold-start ELS has good sample complexity and is quite usable on its own. However, we currently have very limited understanding of the reasons for its success, especially when the rank of M is greater than one. It would be interesting in future work to find a theoretical framework for the analysis of ELS.

The faster convergence and better sample complexity of ELS also suggest that a warm-start version of ELS (perhaps following an SVD projection step as in [35], [28]) could be very successful and offer significant improvements over algorithms that use other warm-start subroutines such as VLS or manifold gradient descent. From a theoretical standpoint, it may be possible to extend the analysis of warm-start VLS in [28] to prove convergence of warm-start ELS.

Bibliography

[1] S. Arora. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM, 45(6), 1998.

[2] S. Asmussen and H. Hering. Branching Processes, volume 3 of Progress in Probability and Statistics. Birkhäuser Boston Inc., Boston, MA, 1983.

[3] A. Bandyopadhyay and D. Gamarnik. Counting without sampling: Asymptotics of the log-partition function for certain statistical physics models. Random Structures and Algorithms, 33(4):452-479, 2008.

[4] E. A. Bender and E. R. Canfield. The asymptotic number of labelled graphs with given degree sequences. Journal of Combinatorial Theory, 24:296-307, 1978.

[5] P. Berman and M. Karpinski. On some tighter inapproximability results, further improvements. ECCC, TR98-065, 1998.

[6] J. D. Biggins and N. H. Bingham. Large deviations in the supercritical branching process. Adv. Appl. Probab., 23(4):757-772, 1993.

[7] B. Bollobás. Random Graphs. Academic Press, 1985.

[8] B. Bollobás and O. Riordan. An old approach to the giant component problem. 2012.

[9] M. Boss, H. Elsinger, M. Summer, and S. Thurner. Network topology of the interbank market. Quantitative Finance, 4(6), 2004.

[10] E. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.

[11] E. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080, 2009.

[12] N. Chen and M. Olvera-Cravioto. Directed random graphs with given degree distributions. arXiv:1207.2475, 2013.

[13] R. Dobrushin. Prescribing a system of random variables by the help of conditional distributions. Theory Prob. and its Appl., 15:469-497, 1970.

[14] S. Dubuc. La densité de la loi-limite d'un processus en cascade expansif. Z. Wahrsch. Verw. Gebiete, 19:281-290, 1971.

[15] M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM, 38(1):1-17, 1991.

[16] P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5:17-61, 1960.

[17] K. Fleischmann and V. Wachtel. Lower deviation probabilities for supercritical Galton-Watson processes. Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 43(2):233-255, 2007.

[18] D. Gamarnik and D. Katz. Correlation decay and deterministic FPTAS for counting list-colorings of a graph. Journal of Discrete Algorithms, 12:29-47, 2012.

[19] D. Gamarnik, T. Nowicki, and G. Swirszcz. Maximum weight independent sets and matchings in sparse random graphs: exact results using the local weak convergence method. Random Structures and Algorithms, 28(1), 2006.

[20] Q. Ge and D. Štefankovič. Strong spatial mixing of q-colorings on Bethe lattices. arXiv:1102.2886v3, November 2011.

[21] H. O. Georgii. Gibbs Measures and Phase Transitions. Walter de Gruyter and Co., Berlin, 1988.

[22] K. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A. Barabási. The human disease network. PNAS, 104(21), 2007.

[23] L. A. Goldberg, R. Martin, and M. Paterson. Strong spatial mixing with fewer colours for lattice graphs. SIAM Journal on Computing, 35(2):486-517, 2005.

[24] G. R. Grimmett. A theorem about random fields. Bulletin of the London Mathematical Society, 5(1):81-84, 1973.
[25] H. Hatami and M. Molloy. The scaling window for a random graph with a given degree sequence. Random Structures and Algorithms, 41:99-123, 2012.

[26] T. P. Hayes and E. Vigoda. Coupling with the stationary distribution and improved sampling for colorings and independent sets. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 971-979, 2005.

[27] M. O. Jackson. Social and Economic Networks. Princeton University Press, 2008.

[28] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. arXiv:1212.0467v1, 2012.

[29] S. Janson and M. Luczak. A new approach to the giant component problem. Random Structures and Algorithms, 34(2):197-216, 2009.

[30] M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries. Journal of the ACM, 51(4):671-697, 2004.

[31] M. R. Jerrum. A very simple algorithm for estimating the number of k-colourings of a low-degree graph. Random Structures and Algorithms, 7(2):157-165, 1995.

[32] J. Jonasson. Uniqueness of uniform random colorings on regular trees. Statistics and Probability Letters, 57:243-248, 2002.

[33] Y. Kabashima, F. Krzakala, M. Mézard, A. Sakata, and L. Zdeborová. Phase transitions and sample complexity in Bayes-optimal matrix factorization. arXiv:1402.1298, 2014.

[34] M. Kang and T. G. Seierstad. The critical phase for random graphs with a given degree sequence. Combinatorics, Probability and Computing, 17:67-86, 2008.

[35] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 2010.

[36] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057-2078, 2010.

[37] R. H. Keshavan, S. Oh, and A. Montanari. Learning low rank matrices from O(n) entries. Proceedings of the Allerton Conference on Communication, Control and Computing, September 2008.

[38] H. Kesten and B. P. Stigum. A limit theorem for multidimensional Galton-Watson processes. The Annals of Mathematical Statistics, 37(5):1211-1223, 1966.

[39] Y. Koren. The BellKor solution to the Netflix Grand Prize. 2009.

[40] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.

[41] R. Meka, P. Jain, C. Caramanis, and I. S. Dhillon. Rank minimization via online learning. ICML, pages 656-663, 2008.

[42] M. Molloy. The Glauber dynamics on colorings of a graph with high girth and maximum degree. SIAM Journal on Computing, 33(3):712-734, 2004.

[43] M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms, 6:161-180, 1995.

[44] M. Molloy and B. Reed. The size of the largest component of a random graph on a fixed degree sequence. Combinatorics, Probability and Computing, 7:295-306, 1998.

[45] J. L. Morrison, R. Breitling, D. J. Higham, and D. R. Gilbert. A lock-and-key model for protein-protein interactions. Bioinformatics, 22(16), 2006.

[46] M. E. J. Newman. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA, 98, 2001.

[47] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64:026118, 2001.

[48] O. Riordan. The phase transition in the configuration model. Combinatorics, Probability and Computing, 21:265-299, 2012.

[49] B. Simon. The Statistical Mechanics of Lattice Gases. Princeton Series in Physics, Princeton University Press, Princeton, NJ, 1, 1993.
[50] A. Sly. Computational transition at the uniqueness threshold. In Foundations of Computer Science (FOCS), pages 287-296. IEEE, 2010.

[51] E. Vigoda. Improved bounds for sampling colorings. Journal of Mathematical Physics, 41(3):1555-1569, 2000.

[52] D. Weitz. Mixing in Time and Space for Discrete Spin Systems. PhD thesis, University of California, Berkeley, May 2004.

[53] D. Weitz. Counting independent sets up to the tree threshold. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (STOC), pages 140-149, 2006.

[54] N. C. Wormald. Some Problems in the Enumeration of Labelled Graphs. PhD thesis, Newcastle University, 1978.

[55] N. C. Wormald. Differential equations for random processes and random graphs. Annals of Applied Probability, 5:1217-1235, 1995.

[56] M. Yildirim, K. Goh, M. E. Cusick, A. Barabási, and M. Vidal. Drug-target network. Nat Biotechnol, 25, 2007.