Decay of Correlations and Inference in Graphical Models

by

Sidhant Misra

B.Tech., Electrical Engineering, Indian Institute of Technology, Kanpur (2008)
S.M., Electrical Engineering and Computer Science, MIT (2011)

Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2014

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, August 28, 2014
Certified by: David Gamarnik, Professor, Thesis Supervisor
Accepted by: Leslie A. Kolodziejski, Chairman, Department Committee on Graduate Theses
Decay of Correlations and Inference in Graphical Models
by
Sidhant Misra
Submitted to the Department of Electrical Engineering and Computer Science
on August 28, 2014, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Abstract
We study the decay of correlations property in graphical models and its implications
for efficient algorithms for inference in these models. We consider three specific
problems: 1) The List Coloring problem on graphs, 2) The MAX-CUT problem on
graphs with random edge deletions, and 3) Low Rank Matrix Completion from an
incomplete subset of its entries. For each problem, we analyze the conditions under
which either spatial or temporal decay of correlations exists and provide approximate
inference algorithms in the presence of these conditions. In the course of our study
of 2), we also investigate the problem of existence of a giant component in random
multipartite graphs with a given degree sequence.
The list coloring problem on a graph $\mathcal{G}$ is a generalization of the classical graph
coloring problem where each vertex is provided with a list of colors. We prove the
Strong Spatial Mixing (SSM) property for this model, for arbitrary bounded degree
triangle-free graphs. Our results hold for any $\alpha > \alpha^*$ whenever the size of the list
of each vertex $v$ is at least $\alpha\Delta(v) + \beta$, where $\Delta(v)$ is the degree of vertex $v$ and
$\beta$ is a constant that only depends on $\alpha$. The result is obtained by proving the
decay of correlations of marginal probabilities associated with the vertices of the
graph, measured using a suitably chosen error function. The SSM property allows
us to efficiently compute approximate marginal probabilities and the approximate
log-partition function in this model.
Finding a cut of maximum size in a graph is a well-known canonical NP-hard
problem in algorithmic complexity theory. We consider a version of random weighted
MAX-CUT on graphs with bounded degree $d$, which we refer to as the thinned
MAX-CUT problem, where the weights assigned to the edges are i.i.d. Bernoulli
random variables with mean $p$. We show that thinned MAX-CUT undergoes a
computational hardness phase transition at $p = p_c = \frac{1}{d-1}$: the thinned MAX-CUT
problem is efficiently solvable when $p < p_c$ and is NP-hard when $p > p_c$. We show
that the computational hardness is closely related to the presence of large connected
components in the underlying graph.
We consider the problem of reconstructing a low rank matrix $M$ from a subset
of its entries $M_E$. We describe three algorithms: Information Propagation (IP), a
sequential decoding algorithm, and two iterative decentralized algorithms, namely
Vertex Least Squares (VLS), which is the same as Alternating Minimization, and
Edge Least Squares (ELS), which is a message-passing variation of VLS. We provide
sufficient conditions on the structure of the revelation graph for IP to succeed and
show that when $M$ has rank $r = O(1)$, this property is satisfied by Erdős–Rényi
graphs with edge probability $\Omega\big((\frac{\log n}{n})^{1/r}\big)$. For VLS, we provide sufficient conditions
in the special case of positive rank one matrices. For ELS, we provide simulation
results which show that it performs better than VLS both in terms of convergence
speed and sample complexity.
Thesis Supervisor: David Gamarnik
Title: Professor
Acknowledgments
First and foremost, I am extremely grateful to my advisor, Professor David Gamarnik,
for his guidance and support. His patient and systematic approach were critical in
helping me navigate my way through the early stages of research. He was very generous with his time; our discussions often lasted more than two hours, where we
delved into the finer technical details. I learnt tremendously from our meetings and
they left me with renewed enthusiasm. David has also been an extremely supportive
mentor, and has offered a lot of encouragement and guidance in my career decisions.
I would like to thank my committee members Professor Patrick Jaillet and Professor Devavrat Shah for offering their time and support. Patrick brought fruitful
research opportunities my way, and also gave me the wonderful opportunity to be a
part of the Network Science course as a TA with him. Devavrat was kind enough
to mentor me in my early days at MIT and introduced me to David. His continued
encouragement and his valuable research insights have been incredibly helpful.
My time at MIT would not have been as enjoyable without the many friends and
colleagues I met at LIDS. I would like to thank Ying Liu for all his help, the fun chats
over lunch, and the topology and algebra study sessions. I would also like to thank
my officemates James Saunderson and Matt Johnson for their enjoyable company
and many stimulating white board discussions.
This thesis would not have been possible without the support of my parents. Their
unconditional love and encouragement have helped me overcome the many challenges
I have encountered throughout the course of my PhD, and for that I am eternally
grateful. Finally, I am very grateful to Aliaa for being my comfort and home, and
for making my journey at MIT a wonderful experience.
Contents

1 Introduction
  1.1 Graphical Models and Inference
  1.2 Decay of Correlations
  1.3 Organization of the thesis and contributions

2 Strong Spatial Mixing for List Coloring of Graphs
  2.1 Introduction
  2.2 Definitions and Main Result
  2.3 Preliminary technical results
  2.4 Proof of Theorem 2.3
  2.5 Conclusion

3 Giant Component in Random Multipartite Graphs with Given Degree Sequences
  3.1 Introduction
  3.2 Definitions and preliminary concepts
  3.3 Statements of the main results
  3.4 Configuration Model
  3.5 Exploration Process
  3.6 Supercritical Case
  3.7 Size of the Giant Component
  3.8 Subcritical Case
  3.9 Future Work

4 MAX-CUT on Bounded Degree Graphs with Random Edge Deletions
  4.1 Introduction
  4.2 Main Results
  4.3 Proof of Theorem 4.1
  4.4 Proof of Theorem 4.2
    4.4.1 Construction of gadget for reduction
    4.4.2 Properties of the gadget
    4.4.3 Properties of MAX-CUT on the gadget
  4.5 Conclusion

5 Algorithms for Low Rank Matrix Completion
  5.1 Introduction
    5.1.1 Formulation
    5.1.2 Algorithms
  5.2 Information Propagation Algorithm
  5.3 Vertex Least Squares and Edge Least Squares Algorithms
  5.4 Experiments
  5.5 Discussion and Future Directions
List of Figures

4-1 Illustration of the bipartite graphs $J_u$ and $J_v$ in $W$ associated with an edge $(u, v) \in E$
5-1 RMS vs. number of iterations (normalized) for VLS and ELS
5-2 ELS: failure fraction vs. $c$ for $r = 2$ with planted 3-regular graph ($n = 100$)
5-3 ELS: failure fraction vs. $c$ for $r = 3$ with planted 4-regular graph ($n = 100$)
5-4 VLS: failure fraction vs. $c$ for $r = 2$ with planted 3-regular graph ($n = 100$)
Chapter 1
Introduction
1.1 Graphical Models and Inference
Graphical models represent joint probability distributions with the help of directed or
undirected graphs. These models aim at capturing the structure present in the joint
distribution, such as conditional independencies, via the structure of the representing
graph. By exploiting the structure it is often possible to produce efficient algorithms
for performing inference tasks on these models. This has led to graphical models
being useful in several applications, e.g., image processing, speech and language
processing, error correcting codes, etc.
In this thesis we will be primarily concerned with undirected graphical models,
where the joint probability distribution of a set of random variables $X_1, \ldots, X_n$ is
represented on an undirected graph $\mathcal{G} = (V, E)$. The random variables $X_i$ are associated with the vertices in $V$. Undirected graphical models represent independencies
via the so-called graph separation property. In particular, let $A, B, C \subseteq V$ be disjoint subsets of vertices. Let $X_A$, $X_B$ and $X_C$ denote the sub-vectors of $X$ corresponding to
the indices in $A$, $B$ and $C$ respectively. Then $X_A$ and $X_B$ are independent conditioned
on $X_C$ whenever there is no path from $A$ to $B$ that does not pass through $C$, i.e., $C$
separates $A$ and $B$.

The Hammersley–Clifford theorem [24] says that if the joint distribution on $\mathcal{G}$ satisfies $P(x) > 0$, then it can be factorized as
$$P(x) = \frac{1}{Z}\prod_{c \in \mathcal{C}} \phi_c(x_c), \qquad (1.1)$$
where $\mathcal{C}$ is the set of all maximal cliques of $\mathcal{G}$ and $Z = \sum_x \prod_{c \in \mathcal{C}} \phi_c(x_c)$ is the
normalizing constant, known as the partition function of the distribution.
Broadly speaking, inference in graphical models refers to the following two problems: a) computation of marginal or conditional marginal probabilities, and b) computation of the Maximum A Posteriori (MAP) assignment. The first problem involves an integration or counting operation of the form
$$P(x_i) = \frac{1}{Z}\sum_{x_j,\; j \neq i}\; \prod_{c \in \mathcal{C}} \phi_c(x_c). \qquad (1.2)$$
The second problem of computing the MAP assignment is an optimization problem
of the form
$$x_{\mathrm{MAP}} = \arg\max_x \prod_{c \in \mathcal{C}} \phi_c(x_c). \qquad (1.3)$$
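To make (1.2) and (1.3) concrete, here is a minimal brute-force sketch (not from the thesis; the graph, potentials, and values are illustrative) that computes a single-variable marginal and the MAP assignment of a toy pairwise model. It is exactly this enumeration that becomes intractable at scale.

```python
import itertools

# Toy pairwise model on 3 binary variables with edges (0,1) and (1,2);
# phi[e] is the clique potential for edge e (all values illustrative).
edges = [(0, 1), (1, 2)]
phi = {e: {(a, b): 2.0 if a != b else 1.0 for a in (0, 1) for b in (0, 1)}
       for e in edges}

def weight(x):
    """Unnormalized probability: the product of clique potentials in (1.1)."""
    w = 1.0
    for (u, v) in edges:
        w *= phi[(u, v)][(x[u], x[v])]
    return w

states = list(itertools.product((0, 1), repeat=3))
Z = sum(weight(x) for x in states)            # partition function

# Marginal P(x_1 = 1) as in (1.2): sum out all the other variables.
marginal = sum(weight(x) for x in states if x[1] == 1) / Z

# MAP assignment as in (1.3).
x_map = max(states, key=weight)
print(marginal, x_map)
```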
Both of these problems are typically computationally difficult. In fact, in the general
setting, computing MAP assignments is known to be an NP-complete problem and
computing marginals is known to be a #P-complete problem. In such cases, it is
often desirable to produce approximate solutions in polynomial time by constructing
Polynomial Time Approximation Schemes (PTAS). More precisely, if we denote the
quantity of interest, i.e., the marginal probability or MAP assignment associated with
each vertex of a graph on $n$ vertices, by $\phi(v)$, then for a fixed $\epsilon > 0$, a PTAS
computes an approximate solution $\hat{\phi}(v)$ which is at most a multiplicative factor
of $(1 \pm \epsilon)$ away from $\phi(v)$ in time polynomial in $n$. If the algorithm also takes
time polynomial in $1/\epsilon$, then it is called a Fully Polynomial Time Approximation
Scheme (FPTAS). Approximation algorithms can be either randomized (in which
case the abbreviations PRAS and FPRAS are commonly used) or deterministic in nature.
For some graphical models, it may not be possible to find a PTAS, unless P = NP.
There is a substantial body of literature [30], [15], [1] exploring the existence of
PTAS for NP-hard problems.
1.2 Decay of Correlations

The decay of correlations property is a property associated with either the graphical
model itself (spatial correlation decay) or an iterative algorithm $\mathcal{A}$ that computes
the quantity of interest $\phi(v)$ (temporal or computation tree correlation decay). It
describes a long range independence phenomenon, where the dependence on boundary/initial conditions fades as one moves away from the boundary/initial time.
(a) Spatial correlation decay: The impact on $\phi(v)$ resulting from conditioning on
values of vertices that are at a distance greater than $d$ from $v$ decays as $d$
increases. More precisely, for any vertex $v$ let $B(v, d)$ denote the set of all
vertices of $\mathcal{G}$ at distance at most $d$ from $v$. Let $x_{\partial,1}$ and $x_{\partial,2}$ be two assignments
to the variables outside $B(v, d)$. Then
$$1 - \epsilon(d) \le \frac{\phi(v;\, x_{\partial,1})}{\phi(v;\, x_{\partial,2})} \le 1 + \epsilon(d), \qquad (1.4)$$
where $\epsilon(d)$ is some decaying function of $d$ that dictates the rate of correlation
decay.

(b) Temporal correlation decay: The impact of initial conditions on the quantity
$\phi(v)$ computed by $d$ iterations of the algorithm $\mathcal{A}$ decays as $d$ increases. Similar
to spatial correlation decay, if we denote by $x^{(0)}_1$ and $x^{(0)}_2$ two initial conditions
provided to the algorithm $\mathcal{A}$, then temporal decay of correlations refers to the
statement for $\phi(v)$ analogous to (1.4), i.e.,
$$1 - \epsilon(d) \le \frac{\phi(v;\, x^{(0)}_1)}{\phi(v;\, x^{(0)}_2)} \le 1 + \epsilon(d). \qquad (1.5)$$
The decay of correlations property has its origins in statistical physics, specifically in the study of interacting spin systems. In this context, the spatial decay of
correlations property is referred to as spatial mixing, where it is used to describe the
long range decay of correlations between spins with respect to the Gibbs measure
associated with the system. Spatial mixing has implications for the uniqueness of
the so-called infinite volume Gibbs measure [13], [21], [49]. The onset of spatial mixing is associated with a phase transition from multiple Gibbs measures to a unique
Gibbs measure.

On the other hand, temporal decay of correlations is referred to as temporal mixing in statistical physics, where it is most often used to describe the mixing time
of the so-called heat bath Glauber dynamics. The Glauber dynamics is a Markov
chain on the state space of the spin system whose steady state distribution is the
Gibbs measure. Temporal mixing has attracted a lot of attention in theoretical computer
science, because it can be used to produce approximation schemes based on Markov
Chain Monte Carlo (MCMC) methods; a sketch of the Glauber dynamics for colorings appears at the end of this section. Spatial mixing as well has been used to
produce approximation schemes which are deterministic in nature and are based on
directly exploiting the spatial long range independence. In the context of theoretical
computer science, this provides insight into the hardness of algorithmic approximations in several problems. There is a general conjecture that the onset of spatial
mixing (phase transition to the unique Gibbs measure regime) coincides with the
phase transition in the hardness of approximation. Specifically, the approximation
problem is tractable (solvable in polynomial time) if and only if the corresponding
Gibbs measure exhibits the decay of correlations property.
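As an illustration of the heat bath Glauber dynamics discussed above, the following minimal sketch (not from the thesis; the graph and parameters are illustrative) runs the single-site chain for proper $q$-colorings: repeatedly pick a uniform vertex and resample its color uniformly from the colors not used by its neighbors. The stationary distribution is uniform over valid colorings.

```python
import random

def glauber_coloring(adj, q, steps, seed=0):
    """Heat-bath Glauber dynamics for proper q-colorings of a graph.

    adj: adjacency list {v: set of neighbors}. A greedy initial proper
    coloring exists whenever q >= max degree + 1.
    """
    rng = random.Random(seed)
    color = {}
    for v in adj:                                    # greedy initial coloring
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(q) if c not in used)
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)                     # pick a uniform vertex
        blocked = {color[u] for u in adj[v]}         # colors used by neighbors
        allowed = [c for c in range(q) if c not in blocked]
        color[v] = rng.choice(allowed)               # resample uniformly
    return color

# 5-cycle with q = 4 colors (here q >= Delta + 2).
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(glauber_coloring(adj, q=4, steps=1000))
```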
1.3 Organization of the thesis and contributions
In this section, we describe the problems we investigate in the thesis, along with a
brief survey of the related literature. We state our main results and describe briefly
the technical tools we used to establish them. We also state a few relevant open
problems.
STRONG SPATIAL MIXING FOR THE LIST COLORING PROBLEM,
CHAPTER 2
Formulation
The list coloring problem is a generalization of the classical graph coloring problem.
In the list coloring problem on a graph $\mathcal{G} = (V, E)$ with maximum degree $\Delta$, each
vertex $v$ is provided with a list of colors $L(v)$ it can choose from, where $L(v) \subseteq \{1, 2, \ldots, q\}$ and $\{1, 2, \ldots, q\}$ is the superset of all colors. In a valid coloring, each
vertex is assigned a color from its own list such that no two neighbors in $\mathcal{G}$ are assigned
the same color. It is well known that in this general setting, deciding whether a valid
coloring exists is NP-hard and counting the number of valid colorings is #P-hard.

One can associate a natural graphical model with the list coloring problem which
represents the uniform distribution on the space of all valid colorings. The partition
function of this graphical model is the total number of valid colorings. As mentioned
earlier, computing the partition function, and hence performing the inference task of
computing the marginal probabilities exactly, is computationally intractable. There
is a large body of literature describing randomized approximation schemes to compute marginal probabilities and the number of valid colorings based on Markov
Chain Monte Carlo (MCMC) methods. Most results are based on establishing fast
mixing of the underlying Markov chain known as the Glauber dynamics [31], [26],
[51], [42]. Deterministic approximation schemes based on decay of correlations in the
computation tree have also been studied in the literature [18]. These results are established under an assumption on the relation between the number of colors available
and the degree, i.e., $|L| \ge \alpha\Delta + \beta$. Over time, considerable research effort has been
directed towards relaxing the assumption by decreasing the value of $\alpha$ required in
the aforementioned condition. In fact, it is known for the special case of $\Delta$-regular
trees that spatial decay of correlations exists as soon as $\alpha = 1$ and $\beta = 2$, which
also marks the boundary of the phase transition into the unique Gibbs measure regime.
It is conjectured that $\alpha = 1$ and $\beta = 2$ is sufficient for the existence of spatial decay
of correlations and of approximation algorithms in the list coloring problem.
Contributions

In this thesis, we establish the existence of a strong version of spatial decay of correlations called Strong Spatial Mixing (SSM) for the list coloring problem whenever
$\alpha > \alpha^* \approx 1.763$. These are the most general conditions under which strong spatial mixing has been shown to exist in the literature, and they are a step towards establishing SSM
for $\alpha = 1$ as conjectured. As a corollary of our result, we also establish the uniqueness
of the Gibbs measure for the list coloring problem and the existence of approximation
schemes for computing marginal probabilities in this regime.

Our proof technique has two main components. The first component is a recursion we derive that relates the marginal probabilities at a vertex to the marginal
probabilities of its neighbors. This recursion is a variation of the one derived in [18],
where our recursion deals with the ratio of marginals rather than directly with the
marginals themselves. The second component is the construction of an error function which is chosen suitably such that it interacts well with our recursion. To prove
our result, we then show that the distance between two marginals induced by two
different boundary conditions satisfies a certain contraction property with respect to
the error function.

Open Problems

There are several problems still to be addressed. One is to tighten our result and
establish SSM for $\alpha = 1$ and $\beta = 2$ as conjectured. Second, the SSM result we establish allows the construction of a PTAS for computing marginal probabilities, and
for computing the exponent of the partition function up to a constant factor. However,
it does not directly lead to an FPTAS for the partition function. It is still unresolved
whether SSM by itself is sufficient for constructing an FPTAS.
GIANT COMPONENT IN RANDOM MULTIPARTITE GRAPHS, CHAPTER 3
Formulation
The problem of the existence of a giant component in random graphs was first studied by Erdős and Rényi. In their classical paper [16], they considered a random
graph model on $n$ vertices and $m$ edges where each such possible graph is equally likely.
They showed that if $m/n > 1/2 + \epsilon$, then with high probability as $n \to \infty$ there
exists a component of size linear in $n$ in the random graph, and that the size of this
component as a fraction of $n$ converges to a fixed constant. The classical Erdős–Rényi
random graph has an asymptotically Poisson degree distribution. However, in many
applications the degree distribution associated with an
underlying graph does not satisfy this property. For example, many so-called "scale-free" networks exhibit a power law distribution of degrees. This motivated the study
of random graphs generated according to a given degree sequence. The giant component problem on a random graph generated according to a given degree sequence
was considered by Molloy and Reed [43]. They showed that if a random graph has
asymptotic degree sequence given by $\{p_j\}_{j \ge 1}$, then with high probability it has a
giant component whenever the degree sequence satisfies $\sum_{j \ge 1} j(j-2)p_j > 0$, along
with some additional regularity conditions including a bounded second moment
condition. A key concept in their proof was the construction of the so-called exploration process, which reveals the random component of a vertex sequentially; a small
illustrative sketch follows this paragraph. They
showed that whenever the degree sequence satisfies the aforementioned condition, the
exploration process has a strictly positive initial drift, which when combined with
the regularity conditions is sufficient to prove that at least one of the components
must be of linear size. They also show that there is no giant component whenever
$\sum_{j \ge 1} j(j-2)p_j < 0$, and in [44] they also characterize the size of the giant component. Since
then, there have been several papers strengthening the results of Molloy and Reed,
and several beautiful analysis tools have been invented along the way. We defer a
more detailed literature review to Chapter 3. We mention here a relatively recent
paper by Bollobás and Riordan [8], where they use a branching process based analysis that offers several advantages, including tighter probability bounds and relaxing
the finite second moment assumption.
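The promised sketch: the following illustrative code (not from the thesis) reveals the component of a starting vertex one vertex at a time. The trajectory of the number of active (discovered but unexplored) vertices is the quantity whose drift a Molloy–Reed style analysis controls.

```python
from collections import deque

def explore_component(adj, start):
    """Reveal the connected component of `start` one vertex at a time.

    Returns the component size and the trajectory of the number of
    'active' (discovered but unexplored) vertices.
    """
    active = deque([start])
    seen = {start}
    trajectory = []
    size = 0
    while active:
        trajectory.append(len(active))
        v = active.popleft()     # explore one vertex per step
        size += 1
        for u in adj[v]:         # reveal its neighbors
            if u not in seen:
                seen.add(u)
                active.append(u)
    return size, trajectory

# Toy graph: a path 0-1-2 plus an isolated vertex 3.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(explore_component(adj, 0))  # (3, [1, 1, 1])
```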
In this thesis, we study random multipartite graphs with $p$ parts with given degree distributions. Here $p$ is a fixed positive integer. Each vertex is associated with
a degree vector $\mathbf{d}$, where each of its components $d_i$, $i \in [p]$, dictates the number of
neighbors of the vertex in the corresponding part $i$ of the graph. Several real world
networks naturally demonstrate a multipartite nature. The author-paper network,
the actor-movie network, the network of company ownership, the financial contagion
model, heterogeneous social networks, etc., are all multipartite [46], [9], [27] in nature. Examples of biological networks which exhibit multipartite structure include
drug target networks, protein-protein interaction networks and human disease networks [22], [56], [45]. In many cases evidence suggests that explicitly modeling the
multipartite structure results in more accurate models and predictions.
We initiated our study of the giant component problem in multipartite graphs
because it was needed as a part of the proof of our hardness result in Chapter 4 for
the MAX-CUT with edge deletions problem. However, the giant component problem is also of independent interest and we devote a whole chapter to it.

Contributions

We provide exact conditions on the degree distribution for the existence of a giant
component in random multipartite graphs with high probability. In the case where
a giant component exists, we also provide a characterization of its size in terms of
parameters of the degree distribution.

Our proofs involve a blend of techniques from Molloy and Reed [43] and Bollobás
and Riordan [8], along with results from the theory of multidimensional branching
processes. We show that whenever the matrix of means $M$ of the so-called edge-biased
degree distribution satisfies the strong connectivity property, the existence of a
giant component is governed by the Perron–Frobenius eigenvalue $\lambda_M$ of $M$. Whenever
$\lambda_M > 1$, we show that a certain Lyapunov function associated with the exploration
process of Molloy and Reed has a strictly positive drift. The Lyapunov function
we use is one often used in the analysis of multitype branching processes: a
weighted norm constructed from the Perron–Frobenius eigenvector of $M$. To establish results regarding the size of the giant component, we
use a coupling argument from Bollobás and Riordan [8], relating the exploration process to the multitype branching process associated with the edge-biased
degree distribution.
Open Problems

Our result only addresses the supercritical and subcritical cases, but leaves the
critical case unresolved. The critical case has been studied in detail for random graphs with
given degree distributions in the unipartite case [34]. It may be possible to extend these
results to the multipartite case as well.
MAX-CUT ON GRAPHS WITH RANDOM EDGE DELETIONS, CHAPTER 4
Formulation
The problem of finding a cut of maximum size in an arbitrary graph $\mathcal{G} = (V, E)$ is a
well known NP-hard problem in algorithmic complexity theory. In fact, even when
restricted to the set of 3-regular graphs, MAX-CUT is NP-hard with a constant
factor approximability gap [5], i.e., it is NP-hard to find a cut whose size is at least
$(1 - \epsilon_{\mathrm{crit}})$ times the size of the maximum cut, where $1 - \epsilon_{\mathrm{crit}} = 0.997$. Weighted
MAX-CUT is a generalization of the MAX-CUT problem, where each edge $e \in E$ is
associated with a weight $w_e$, and one is required to find a cut such that the sum of
the weights of the edges in the cut is maximum.

We study a randomized version of the weighted MAX-CUT problem, where each
weight $w_e$ is a Bernoulli($p$) random variable, for some $0 \le p \le 1$, and the weights associated with distinct edges are independent. Since the weights are binary, weighted
MAX-CUT on the randomly weighted graph is equivalent to MAX-CUT on a thinned
random graph where the edges associated with zero weights have been deleted. We
call this problem the thinned MAX-CUT problem. The variable $p$ controls the amount of
thinning. It is particularly simple to analyze thinned MAX-CUT when $p$ takes one
of the extreme values, i.e., $p = 1$ or $p = 0$. When $p = 1$, all edges are retained,
the thinned graph is the same as the original graph, and finding the maximum cut
remains computationally hard. On the other hand, when $p = 0$, the thinned graph
has no edges, and the MAX-CUT problem is trivial. This leads to a natural question
of whether there is a hardness phase transition at some value $0 < p = p_c < 1$.

Our study of the thinned MAX-CUT problem is motivated from the point of view
of establishing a correspondence between phase transitions in hardness of computation
and decay of correlations. Our formulation of the thinned MAX-CUT problem was
inspired by a similar random weighted maximum independent set problem studied
by Gamarnik et al. in [19].
Contributions

We identify a threshold for the hardness phase transition in the thinned MAX-CUT
problem. We show that on the set of all graphs with degree at most $d$, the phase
transition occurs at $p_c = \frac{1}{d-1}$. We show that this phase transition coincides with a
phase transition in the decay of correlations resulting from the connectivity properties
of the random thinned graph. This result is a step towards showing the equivalence
of spatial mixing and hardness of computation.

For $p < p_c$ we show that the random graph resulting from edge deletions undergoes percolation, i.e., it disintegrates into disjoint connected components of size
$O(\log n)$. The existence of a polynomial time algorithm to compute MAX-CUT then
follows easily; an illustrative sketch appears at the end of this subsection. For $p > p_c$, we show NP-hardness by constructing a reduction from
approximate MAX-CUT on 3-regular graphs. Our reduction proof uses a random
bipartite graph based gadget $W$ similar to [50], where it was used to establish
hardness of computation of the partition function of the hardcore model. Given a
3-regular graph $\mathcal{G}$, the gadget $W$ is $d$-regular and is constructed by first replacing
each vertex $v$ of $\mathcal{G}$ by a random bipartite graph $J_v$ of size $2n'$ consisting of two parts
$R_v$ and $S_v$, and then adding connector edges between each pair of these bipartite
graphs whenever there is an edge between the corresponding vertices in $\mathcal{G}$. The key
result in our reduction argument is that the gadget $W$ satisfies a certain polarization
property which carries information about the MAX-CUT of the 3-regular graph $\mathcal{G}$
used to construct it. Namely, we show that any maximal cut of the thinned graph
obtained from $W$ must fully polarize each bipartite graph and include all its edges
in the cut, i.e., either assign the value 1 to almost all vertices in $R_v$ and 0 to almost
all vertices in $S_v$, or vice versa. Then, we show that the cut $C_{\mathcal{G}}$ of $\mathcal{G}$ obtained by
assigning binary values to its vertices $v$ based on the polarity of $J_v$ must be at least
$(1 - \epsilon_{\mathrm{crit}})$-optimal. To establish this polarization property we draw heavily on the
results and proof techniques from Chapter 3 about the giant component in random
multipartite graphs.
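The promised sketch: an illustrative (and deliberately naive) rendering of the $p < p_c$ regime, not the thesis's algorithm verbatim. Assuming the thinned components are logarithmically small, we delete each edge independently with probability $1 - p$ and solve MAX-CUT exactly within each surviving component by enumeration; all identifiers are ours.

```python
import itertools
import random

def thinned_max_cut(vertices, edges, p, seed=0):
    """MAX-CUT on a Bernoulli(p)-thinned graph, solved exactly per component.

    Feasible when components have size O(log n), as happens w.h.p. for
    p below the threshold 1/(d-1) on degree-d bounded graphs.
    """
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]   # keep each edge w.p. p

    # Connected components of the thinned graph.
    adj = {v: [] for v in vertices}
    for u, v in kept:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        components.append(comp)

    # Enumerate cuts independently within each component.
    total, side = 0, {}
    for comp in components:
        best, best_assign = -1, None
        for bits in itertools.product((0, 1), repeat=len(comp)):
            assign = dict(zip(comp, bits))
            cut = sum(1 for u, v in kept
                      if u in assign and v in assign and assign[u] != assign[v])
            if cut > best:
                best, best_assign = cut, assign
        total += best
        side.update(best_assign)
    return total, side

verts = list(range(6))
edges = [(i, (i + 1) % 6) for i in range(6)]      # 6-cycle, d = 2
print(thinned_max_cut(verts, edges, p=0.5))
```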
Open Problems

While our result finds the exact hardness phase transition threshold for the thinned
MAX-CUT problem, the same remains unresolved for the thinned maximum independent set problem in [19], from which our problem was originally inspired. There, only
half of the result has been established, in that the region where decay of correlations
holds and an approximation algorithm exists has been identified. It remains open whether,
in the absence of decay of correlations, computing the maximum independent set is
computationally hard.
ALGORITHMS FOR LOW RANK MATRIX COMPLETION, CHAPTER 5
Formulation
Matrix completion refers to the problem of recovering a low rank matrix from a subset of its entries. This problem arises in a vast number of applications that involve
collaborative filtering, where one attempts to predict the unknown preferences of a
certain user based on the collective known preferences of a large number of users. It
attracted a lot of attention due to the famous Netflix prize, which involved reconstructing the unknown movie preferences of Netflix users.
In matrix completion, there is an underlying low rank matrix $M \in \mathbb{R}^{n \times n}$ of rank
$r \ll n$, i.e., $M = \alpha\beta^T$ where $\alpha, \beta \in \mathbb{R}^{n \times r}$. The values of the entries of $M$ on a subset
of indices $E \subseteq [n] \times [n]$ are revealed. Denoting by $M_E$ the subset of revealed entries of
$M$, the two major questions of matrix completion are:

(a) Given $M_E$, is it possible to reconstruct $M$?

(b) Can the reconstruction in (a) be done efficiently?

Without any further assumptions, matrix completion is an ill posed problem with
multiple solutions and is in general NP-hard [41]. However, under certain additional
conditions the problem has been shown to be tractable. The most common assumption adopted in the literature is that the matrix $M$ is "incoherent" and the subset
$E$ is chosen uniformly at random. The incoherence condition was introduced in [10],
[11], where it was shown that a convex relaxation resulting in nuclear norm minimization succeeds under further assumptions on the size of $E$. In [35] and [36], the authors
use an algorithm consisting of a truncated singular value projection followed by a
local minimization subroutine on the Grassmann manifold and show that it succeeds
when $|E| = \Omega(nr\log n)$. In [28], it was shown that the local minimization in [35] can
be successfully replaced by Alternating Minimization. The use of Belief Propagation
(BP) for matrix factorization has also been studied heuristically by physicists in [33],
where they perform a mean field analysis of the performance of BP.
Matrix completion can be recast as a bi-convex least squares optimization problem over a graph $\mathcal{G} = (V, E)$ whose edges represent the revealed entries of $M$. On
this graph, Alternating Minimization can be interpreted as a local algorithm, where
the optimization variables are associated with the vertices $V$. In each iteration, the
variable on a vertex is updated using the values of its neighbors. Since in [28] Alternating Minimization is preceded by a Singular Value Projection (SVP), we call it
warm-start Alternating Minimization. In this thesis, we investigate Alternating
Minimization, which we call Vertex Least Squares (VLS), when it is used with a cold
start. We also propose two new matrix completion algorithms.
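To make the VLS iteration concrete, here is a minimal cold-start sketch for the positive rank one case analyzed below (illustrative code, not the thesis's pseudocode): each vertex variable is set to the exact least-squares fit against its neighbors' current values over the revealed entries.

```python
import random

def vls_rank1(M_E, n_rows, n_cols, iters=100, seed=0):
    """Cold-start Vertex Least Squares (alternating minimization), rank 1.

    M_E: dict {(i, j): value} of revealed entries of M = a b^T. Each pass
    solves the one-dimensional least-squares problem at every vertex of
    the bipartite revelation graph, holding the other side fixed.
    """
    rng = random.Random(seed)
    rows = {i: [] for i in range(n_rows)}
    cols = {j: [] for j in range(n_cols)}
    for (i, j), val in M_E.items():
        rows[i].append((j, val))
        cols[j].append((i, val))
    u = [1.0] * n_rows
    v = [rng.uniform(0.5, 1.5) for _ in range(n_cols)]  # random positive start
    for _ in range(iters):
        for i in range(n_rows):   # least-squares update of each row variable
            num = sum(val * v[j] for j, val in rows[i])
            den = sum(v[j] ** 2 for j, _ in rows[i])
            if den > 0:
                u[i] = num / den
        for j in range(n_cols):   # then each column variable
            num = sum(val * u[i] for i, val in cols[j])
            den = sum(u[i] ** 2 for i, _ in cols[j])
            if den > 0:
                v[j] = num / den
    return u, v

# Rank-1 ground truth M = a b^T with a few revealed, connected entries.
M_E = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 4.0, (2, 1): 3.0}
u, v = vls_rank1(M_E, 3, 2)
print([[u[i] * v[j] for j in range(2)] for i in range(3)])
```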
Contributions

We analyze VLS with a cold start and prove that in the special case of positive rank
one matrices, it can successfully reconstruct $M$ from $M_E$. More specifically, we show
that if $M = \alpha\beta^T$ with $\alpha, \beta > 0$ and the graph $\mathcal{G}$ is connected, has bounded degree
and has diameter of size $O(\log n)$, then VLS reconstructs $M$ up to a Root Mean Square
(RMS) error of $\epsilon$ in time polynomial in $n$.

We propose a new matrix completion algorithm called Edge Least Squares (ELS),
which is a message passing variation of VLS. We show through simulations that ELS
performs significantly better than VLS, both in terms of sample complexity and time
to convergence. The superior cold start performance of ELS suggests that ELS with a
warm start can perhaps be very successful, and would be better than VLS with a warm
start.

We also provide a simple direct decoding algorithm, which we call Information
Propagation (IP). We prove that under certain strong connectivity properties of $\mathcal{G}$,
Information Propagation can recover $M$ in linear time. We show that when $r = O(1)$,
the required strong connectivity property is satisfied by a bipartite Erdős–Rényi graph
$\mathcal{G}(n, p)$ with $p = \Omega\big((\frac{\log n}{n})^{1/r}\big)$.
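For intuition about IP, here is an illustrative rank one version (the thesis's IP handles general rank $r$; all identifiers here are ours): once the scale is fixed at a seed row, every revealed entry incident to a decoded vertex decodes its other endpoint, so a single breadth-first sweep of the revelation graph recovers the factors in linear time.

```python
from collections import deque

def ip_rank1(M_E, n_rows, n_cols):
    """Rank-1 Information Propagation sketch: recover M = a b^T (nonzero
    factors) by sweeping the revelation graph once from a seed row."""
    rows = {i: [] for i in range(n_rows)}
    cols = {j: [] for j in range(n_cols)}
    for (i, j) in M_E:
        rows[i].append(j)
        cols[j].append(i)
    a, b = {0: 1.0}, {}           # fix the scale: a_0 = 1
    queue = deque([('row', 0)])
    while queue:
        side, k = queue.popleft()
        if side == 'row':
            for j in rows[k]:     # a_k known => b_j = M_kj / a_k
                if j not in b:
                    b[j] = M_E[(k, j)] / a[k]
                    queue.append(('col', j))
        else:
            for i in cols[k]:     # b_k known => a_i = M_ik / b_k
                if i not in a:
                    a[i] = M_E[(i, k)] / b[k]
                    queue.append(('row', i))
    return a, b

# Ground truth a = (1, 2, 3), b = (2, 1); revealed entries form a
# connected revelation graph, so one sweep decodes everything.
M_E = {(0, 0): 2.0, (1, 0): 4.0, (1, 1): 2.0, (2, 1): 3.0}
print(ip_rank1(M_E, 3, 2))   # factors up to the fixed scale a_0 = 1
```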
Open Problems

It remains an open problem to provide a theoretical analysis proving the convergence
of ELS. The full power of cold start VLS is also unresolved, and it may be possible to
extend our proof for the rank one case to the higher rank case. Additionally, it may
be possible to provide a theoretical analysis demonstrating the superior performance
of ELS over VLS.
Chapter 2
Strong Spatial Mixing for List
Coloring of Graphs
2.1 Introduction
In this chapter we study the problem of list colorings of a graph. We explore the
strong spatial mixing property of list colorings on triangle-free graphs, which pertains to exponential decay of boundary effects when the list coloring is generated
uniformly at random from the space of valid list colorings. This means that fixing the
color of vertices far away from a vertex $v$ has negligible impact (exponentially decaying correlations) on the probability of $v$ being colored with a certain color in its list.
Strong spatial mixing is an instance of the spatial decay of correlations property. A
related but weaker notion is weak spatial mixing. Strong spatial mixing is stronger
than weak spatial mixing because it requires exponential decay of boundary effects
even when some of the vertices near $v$ are conditioned to have fixed colors. Because
of this added condition, strong spatial mixing is particularly useful in computing
conditional marginal probabilities.

Jonasson in [32] established weak spatial mixing on Kelly (regular) trees of any
degree $\Delta$ whenever the number of colors $q$ is greater than or equal to $\Delta + 1$. However, the weakest conditions under which strong spatial mixing on Kelly trees has been
established thus far are by Ge and Štefankovič [20], who proved strong spatial mixing
when $q \ge \alpha^*\Delta + 1$, where $\alpha^* = 1.763..$ is the unique solution to $xe^{-1/x} = 1$. For lattice graphs (or more generally triangle-free amenable graphs), strong spatial mixing
for coloring was established by Goldberg, Martin and Paterson in [23] for the case
$q \ge \alpha^*\Delta + \beta$ for a fixed constant $\beta$. In fact their approach can be extended to the
case of the list coloring problem, but the graph amenability restriction still applies.
In this chapter we generalize these results under an only mildly stronger condition.
We establish the strong spatial mixing of list colorings on arbitrary bounded degree
triangle-free graphs whenever the size of the list of each vertex $v$ is at least $\alpha\Delta(v) + \beta$,
where $\Delta(v)$ is the degree of $v$, $\alpha$ satisfies $\alpha > \alpha^*$ and $\beta$ is a constant that only depends
on $\alpha$.

The spatial mixing property is closely related to uniqueness of the infinite volume
Gibbs measure on the spin system defined by the list coloring problem. In fact weak
spatial mixing is a sufficient condition for the Gibbs measure to be unique. In
its turn, strong spatial mixing is closely related to the problem of approximately
counting the number of valid colorings of a graph, namely the partition function of
the Gibbs measure. In particular, for amenable graphs strong spatial mixing implies
rapid mixing of the Glauber dynamics, which leads to efficient randomized approximation
algorithms for computing the partition function, e.g., in [31], [26], [51], [42], etc.
The decay of correlations property similar to strong spatial mixing has also been
shown to lead to deterministic approximation algorithms for computing the partition
function. This technique was introduced by Bandyopadhyay and Gamarnik [3]
and Weitz [53] and has been subsequently employed by Gamarnik and Katz [18]
for the list coloring problem. Since decay of correlations implies the uniqueness of
the Gibbs measure on regular trees, and regular trees represent the maximal growth of the
size of the neighborhood for a given degree, it is a general conjecture that efficient
approximability of the counting problem coincides with the uniqueness of the Gibbs
measure on regular trees. More precisely, the conjecture states that there exists a
polynomial time approximation algorithm for counting colorings of any arbitrary
graph whenever $q \ge \Delta + 2$. We are still very far from proving this conjecture or
even establishing strong spatial mixing under this condition.

The formulation in this chapter is similar to [18]. It was shown there that the
logarithm of the ratio of the marginal probabilities at a given node induced by
two different boundary conditions contracts in $\ell_\infty$ norm as the distance between the node
and the boundary becomes large, whenever $|L(v)| \ge \alpha\Delta(v) + \beta$, with $\alpha > \alpha^{**} \approx 2.84$
and $\beta$ a constant that only depends on $\alpha$. In this chapter we measure the distance
with respect to a conveniently chosen error function which allows us to tighten the
contraction argument and relax the required condition to $\alpha > \alpha^* \approx 1.76$. This
also means that the Gibbs measure on such graphs is unique. Unlike [18], the result
presented in this chapter unfortunately does not immediately lead to an algorithm for
computing the partition function. It does, however, allow us to compute the marginal
probabilities approximately in polynomial time. It also allows us to estimate the
exponent of the number of valid colorings, namely to approximate the log-partition
function in polynomial time.

The rest of the chapter is organized as follows. In Section 2.2 we introduce the
notation, basic definitions and preliminary concepts. Also in this section we provide
the statement of our main result and discuss in detail its implications and connections
to previous results. In Section 2.3 we establish some preliminary technical results.
In Section 2.4 we prove the main result of this chapter. We conclude in Section 2.5
with some final remarks.
2.2 Definitions and Main Result

We denote by $\mathcal{G} = (V, E)$ an infinite graph with the set of vertices and edges given
by $V$ and $E$. For a fixed vertex $v \in V$ we denote by $\Delta(v)$ the degree of $v$ and by
$\Delta$ the maximum degree of the graph, i.e., $\Delta = \max_{v \in V}\Delta(v) < \infty$. The distance
between two vertices $v_1$ and $v_2$ in $V$ is denoted by $d(v_1, v_2)$, which might be infinite if
$v_1$ and $v_2$ belong to two different connected components of $\mathcal{G}$. For two finite subsets
of vertices $\Gamma_1 \subseteq V$ and $\Gamma_2 \subseteq V$, the distance between them is defined as
$d(\Gamma_1, \Gamma_2) = \min\{d(v_1, v_2) : v_1 \in \Gamma_1,\ v_2 \in \Gamma_2\}$. We assume $\{1, 2, \ldots, q\}$ to be the set of all colors.
Each vertex $v \in V$ is associated with a finite list of colors $L(v) \subseteq \{1, 2, \ldots, q\}$ and
$\mathcal{L} = (L(v) : v \in V)$ is the sequence of lists. The total variational distance between
two discrete measures $\mu_1$ and $\mu_2$ on a finite or countable sample space $\Omega$ is denoted by
$\|\mu_1 - \mu_2\|$ and is defined as $\|\mu_1 - \mu_2\| = \frac{1}{2}\sum_{\omega \in \Omega}|\mu_1(\omega) - \mu_2(\omega)|$.

A valid list coloring $C$ of $\mathcal{G}$ is an assignment to each vertex $v \in V$ of a color
$c(v) \in L(v)$ such that no two adjacent vertices have the same color. A measure $\mu$
on the set of all valid colorings of an infinite graph $\mathcal{G}$ is called an infinite volume
Gibbs measure with the uniform specification if, for any finite region $\Gamma \subseteq \mathcal{G}$, the
distribution induced on $\Gamma$ by $\mu$ conditioned on any coloring $C$ of the vertices $V\setminus\Gamma$ is
the uniform conditional distribution on the set of all valid colorings of $\Gamma$. We denote
this distribution by $\mu_\Gamma^C$. For any finite subset $\Gamma \subseteq \mathcal{G}$, let $\partial\Gamma$ denote the boundary of
$\Gamma$, i.e., the set of vertices which are adjacent to some vertex in $\Gamma$ but are not a part
of $\Gamma$.
Definition 2.1. The infinite volume Gibbs measure $\mu$ on $\mathcal{G}$ is said to have strong
spatial mixing (with exponentially decaying correlations) if there exist positive constants $A$ and $\theta$ such that for any finite region $\Gamma \subseteq \mathcal{G}$, any two partial colorings $C_1, C_2$
(i.e., vertices to which no color has been assigned are also allowed) of $V\setminus\Gamma$ which
differ only on a subset $W \subseteq \partial\Gamma$, and any subset $\Lambda \subseteq \Gamma$,
$$\|\mu_\Gamma^{C_1} - \mu_\Gamma^{C_2}\|_\Lambda \le A|\Lambda| e^{-\theta d(\Lambda, W)}. \qquad (2.1)$$
Here $\|\mu_\Gamma^{C_1} - \mu_\Gamma^{C_2}\|_\Lambda$ denotes the total variational distance between the two distributions $\mu_\Gamma^{C_1}$
and $\mu_\Gamma^{C_2}$ restricted to the set $\Lambda$.

We have used the definition of strong spatial mixing from Weitz's PhD thesis
[52]. As mentioned in [52], this definition of strong spatial mixing is appropriate for
general graphs. A similar definition is used in [23], where the set $W$ of disagreement
was restricted to be a single vertex. That definition is more relevant in the context
of lattice graphs (or more generally amenable graphs), where the neighborhood of a
vertex grows slowly with distance from the vertex. In that context, the definition
involving one vertex of disagreement and the one we have adopted are essentially the
same.
Let $\alpha^* = 1.76..$ be the unique root of the equation
$$xe^{-1/x} = 1.$$
For our purposes we will assume that the graph list pair $(\mathcal{G}, \mathcal{L})$ satisfies the following.

Assumption 2.1. The graph $\mathcal{G}$ is triangle-free. The size of the list of each vertex $v$
satisfies
$$|L(v)| \ge \alpha\Delta(v) + \beta \qquad (2.2)$$
for some constant $\alpha > \alpha^*$, where $\beta = \beta(\alpha)$ is such that
$$\left(1 - \frac{1}{\beta}\right)\alpha\, e^{-\frac{1}{\alpha}\left(1 + \frac{1}{\beta}\right)} > 1.$$

Using the above assumption we now state our main result.

Theorem 2.1. Suppose Assumption 2.1 holds for the graph list pair $(\mathcal{G}, \mathcal{L})$. Then
the Gibbs measure with the uniform specification on $(\mathcal{G}, \mathcal{L})$ satisfies strong spatial
mixing with exponentially decaying correlations.

We establish some useful technical results in the next section before presenting
the details of the proof in Section 2.4.
2.3 Preliminary technical results

The following theorem establishes strong spatial mixing for the special case when $\Lambda$
consists of a single vertex.

Theorem 2.2. Let $q$, $\Delta$, $\alpha$ and $\beta$ be given. There exist positive constants $B$ and
$\gamma$ depending only on the preceding parameters such that the following holds for any
graph list pair $(\mathcal{G}, \mathcal{L})$ satisfying Assumption 2.1. Given any finite region $\Gamma \subseteq \mathcal{G}$, any
two colorings $C_1, C_2$ of $V\setminus\Gamma$ which differ only on a subset $W \subseteq \partial\Gamma$, and any vertex
$v \in \Gamma$ and color $j \in L(v)$, we have
$$1 - \epsilon \le \frac{P(c(v) = j \mid C_1)}{P(c(v) = j \mid C_2)} \le 1 + \epsilon, \qquad (2.3)$$
where $\epsilon = Be^{-\gamma d(v, W)}$.
We will now show that Theorem 2.1 follows from Theorem 2.2.

Proof of Theorem 2.1. To prove this, we use induction on the size of the subset $\Lambda$.
The base case with $|\Lambda| = 1$ is equivalent to the statement of Theorem 2.2. Assume
that the statement of Theorem 2.1 is true whenever $|\Lambda| \le t$ for some integer $t \ge 1$.
We will use this to prove that the statement holds when $|\Lambda| = t + 1$. Let the
vertices in $\Lambda$ be $v_1, v_2, \ldots, v_{t+1}$. Let $v^k = (v_1, \ldots, v_k)$ and $J_k = (j_1, \ldots, j_k)$, where
$j_i \in L(v_i)$, $1 \le i \le k$. Also let $c(v^k) = (c(v_1), c(v_2), \ldots, c(v_k))$ denote the coloring of
the vertices $v_1, v_2, \ldots, v_k$. Writing $\epsilon = Be^{-\gamma d(\Lambda, W)}$, we have
$$P(c(v^{t+1}) = J_{t+1} \mid C_1) = P(c(v^t) = J_t \mid C_1)\, P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_1)$$
$$\le (1 + \epsilon)\, P(c(v^t) = J_t \mid C_1)\, P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2).$$
The inequality in the last statement follows from Theorem 2.2. This gives
$$P(c(v^{t+1}) = J_{t+1} \mid C_1) - P(c(v^{t+1}) = J_{t+1} \mid C_2)$$
$$\le \epsilon\, P(c(v^t) = J_t \mid C_1)\, P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2)$$
$$\quad + P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2)\,\big\{P(c(v^t) = J_t \mid C_1) - P(c(v^t) = J_t \mid C_2)\big\}.$$
Similarly, using the lower bound in Theorem 2.2,
$$P(c(v^{t+1}) = J_{t+1} \mid C_1) - P(c(v^{t+1}) = J_{t+1} \mid C_2)$$
$$\ge -\epsilon\, P(c(v^t) = J_t \mid C_1)\, P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2)$$
$$\quad + P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2)\,\big\{P(c(v^t) = J_t \mid C_1) - P(c(v^t) = J_t \mid C_2)\big\}.$$
Combining the above, we get
$$\big|P(c(v^{t+1}) = J_{t+1} \mid C_1) - P(c(v^{t+1}) = J_{t+1} \mid C_2)\big| \le \epsilon\, P(c(v^t) = J_t \mid C_1)\, P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2)$$
$$\quad + P(c(v_{t+1}) = j_{t+1} \mid c(v^t) = J_t,\, C_2)\,\big|P(c(v^t) = J_t \mid C_1) - P(c(v^t) = J_t \mid C_2)\big|.$$
We can now bound the total variational distance $\|\mu^{C_1} - \mu^{C_2}\|_\Lambda$ by summing the above
over all $j_i \in L(v_i)$, $1 \le i \le t+1$. The first term sums to $\epsilon$, since the probabilities
$P(c(v^t) = J_t \mid C_1)$ and the conditional probabilities of $c(v_{t+1})$ each sum to one. In the
second term, summing the conditional probability over $j_{t+1}$ gives one, and what remains
is at most $t\epsilon$ by the induction hypothesis. Altogether the sum is at most $(t+1)\epsilon$, which
completes the induction argument. □
So, in order to establish Theorem 2.1, it is enough to show that Theorem 2.2 is
true. We claim that Theorem 2.2 follows from the theorem below, which establishes
weak spatial mixing whenever Assumption 2.1 holds. In other words, under Assumption 2.1, strong spatial mixing of list colorings for marginals of a single vertex holds
whenever weak spatial mixing holds. In fact $\mathcal{G}$ need not be triangle-free for this
implication to be true, as will be clear from the proof below.

Theorem 2.3. Let $q$, $\Delta$, $\alpha$ and $\beta$ be given. There exist positive constants $B$ and
$\gamma$ depending only on the preceding parameters such that the following holds for any
graph list pair $(\mathcal{G}, \mathcal{L})$ satisfying Assumption 2.1. Given any finite region $\Gamma \subseteq \mathcal{G}$ and any
two colorings $C_1, C_2$ of $V\setminus\Gamma$, we have
$$1 - \epsilon \le \frac{P(c(v) = j \mid C_1)}{P(c(v) = j \mid C_2)} \le 1 + \epsilon, \qquad (2.4)$$
where $\epsilon = Be^{-\gamma d(v, \partial\Gamma)}$.

We first show how Theorem 2.2 follows from Theorem 2.3.
Proof of Theorem 2.2. Consider two colorings $C_1, C_2$ of the boundary $\partial\Gamma$ of $\Gamma$ which
differ only on a subset $W \subseteq \partial\Gamma$, as in the statement of Theorem 2.2. Let $d = d(v, W)$.
We first construct a new graph list pair $(\mathcal{G}', \mathcal{L}')$ from $(\mathcal{G}, \mathcal{L})$. Here $\mathcal{G}'$ is obtained from
$\mathcal{G}$ by deleting all vertices in $\partial\Gamma$ which are at a distance less than $d$ from $v$. Notice
that $C_1$ and $C_2$ agree on all such vertices. Whenever a vertex $u$ is deleted from $\mathcal{G}$, we
remove from the lists of the neighbors of $u$ the color $c(u)$, which is the color of $u$ under
both $C_1$ and $C_2$. This defines the new list $\mathcal{L}'$. In this process, whenever a vertex
loses a color from its list it also loses one of its edges. Also, for $\alpha > \alpha^* > 1$, we have
$|L(v)| - 1 \ge \alpha(\Delta(v) - 1) + \beta$ whenever $|L(v)| \ge \alpha\Delta(v) + \beta$. Therefore, the new graph
list pair $(\mathcal{G}', \mathcal{L}')$ also satisfies Assumption 2.1. Define the region $\Gamma' \subseteq \mathcal{G}'$ as the ball
of radius $d - 1$ centered at $v$. Let $D_1$ and $D_2$ be two colorings of $(\Gamma')^c$ which agree
with $C_1$ and $C_2$ respectively. From the way in which $(\mathcal{G}', \mathcal{L}')$ is constructed, we have
$$P_{\mathcal{G},\mathcal{L}}(c(v) = j \mid C_i) = P_{\mathcal{G}',\mathcal{L}'}(c(v) = j \mid D_i), \qquad i = 1, 2, \qquad (2.5)$$
where $P_{\mathcal{G},\mathcal{L}}(E)$ denotes the probability of the event $E$ in the graph list pair $(\mathcal{G}, \mathcal{L})$.
If $V'$ is the set of all vertices of $\mathcal{G}'$, then $D_1$ and $D_2$ assign colors only to vertices in
$V'\setminus\Gamma'$. So we can apply Theorem 2.3 for the region $\Gamma'$ and the proof is complete. □
So it is sufficient to prove Theorem 2.3, the proof of which we defer to Section 2.4. We use
the rest of this section to discuss some implications of our result and connections
between our result and previously established results for strong spatial mixing for
coloring of graphs.

The statement in Theorem 2.3 is what is referred to as weak spatial mixing [52].
In general weak spatial mixing is a weaker condition and does not imply strong
spatial mixing. This is indeed the case when we consider the problem of coloring a
graph $\mathcal{G}$ by $q$ colors, i.e., the case when the lists $L(v)$ are the same for all $v \in V$.
However, interestingly, as the above argument shows, for the case of list coloring
strong spatial mixing follows from weak spatial mixing when the graph list pair
satisfies Assumption 2.1.

We observed that the strong spatial mixing result for amenable graphs in [23]
also extends to the case of list colorings. Indeed, the proof technique only requires a
local condition similar to the one in Assumption 2.1 that we have adopted, as opposed
to a global condition like $q \ge \alpha\Delta + \beta$. Also, in [23] the factor $|\Lambda|$ in the definition
(2.1) was shown to be unnecessary, which makes their statement stronger. We show
that this stronger statement is also implied by our result. In particular, assuming
Theorem 2.2 is true, we prove the following corollary.
Corollary 2.1. Let $q$, $\Delta$, $\alpha$ and $\beta$ be given. There exist positive constants $B$ and
$\gamma$ depending only on the preceding parameters such that the following holds for any
graph list pair $(\mathcal{G}, \mathcal{L})$ satisfying Assumption 2.1. Given any finite region $\Gamma \subseteq \mathcal{G}$, any
two colorings $C_1, C_2$ of the boundary $\partial\Gamma$ of $\Gamma$ which differ at only one vertex $f \in \partial\Gamma$,
and any subset $\Lambda \subseteq \Gamma$,
$$\|\mu^{C_1} - \mu^{C_2}\|_\Lambda \le Be^{-\gamma d(\Lambda, f)}. \qquad (2.6)$$

Proof. Let the color of $f$ be $j_1$ in $C_1$ and $j_2$ in $C_2$. Let $\mathcal{C}(\Lambda)$ be the set of all possible
colorings of the set $\Lambda$. Then
$$\|\mu^{C_1} - \mu^{C_2}\|_\Lambda = \frac{1}{2}\sum_{\sigma \in \mathcal{C}(\Lambda)} \big|P(\sigma \mid c(f) = j_1) - P(\sigma \mid c(f) = j_2)\big|.$$
For any $j \in L(f)$, using Theorem 2.2 we have, with $\epsilon = Be^{-\gamma d(\Lambda, f)}$,
$$\frac{P(c(f) = j \mid \sigma)}{P(c(f) = j)} = \frac{P(c(f) = j \mid \sigma)}{\sum_{\sigma' \in \mathcal{C}(\Lambda)} P(c(f) = j \mid \sigma')P(\sigma')} \le \frac{\sum_{\sigma' \in \mathcal{C}(\Lambda)} (1 + \epsilon)P(c(f) = j \mid \sigma')P(\sigma')}{\sum_{\sigma' \in \mathcal{C}(\Lambda)} P(c(f) = j \mid \sigma')P(\sigma')} = 1 + \epsilon.$$
Similarly we can also prove, for any $j \in L(f)$,
$$\frac{P(c(f) = j \mid \sigma)}{P(c(f) = j)} \ge 1 - \epsilon.$$
Therefore, by Bayes' rule, $P(\sigma \mid c(f) = j) = \frac{P(c(f) = j \mid \sigma)}{P(c(f) = j)}P(\sigma)$ lies in
$[(1 - \epsilon)P(\sigma),\ (1 + \epsilon)P(\sigma)]$ for $j = j_1, j_2$, and hence
$$\|\mu^{C_1} - \mu^{C_2}\|_\Lambda \le \frac{1}{2}\sum_{\sigma \in \mathcal{C}(\Lambda)} \big((1 + \epsilon) - (1 - \epsilon)\big)P(\sigma) = \epsilon,$$
which proves (2.6). □
The notion of strong spatial mixing we have adopted also implies the uniqueness
of the Gibbs measure on the spin system described by the list coloring problem. In fact
the weak spatial mixing described in Theorem 2.3 is sufficient for the uniqueness of
the Gibbs measure (see Theorem 2.2 and the discussion following Definition 2.3 in [52]).
We summarize this in the corollary that follows.

Corollary 2.2. Suppose the graph list pair $(\mathcal{G}, \mathcal{L})$ satisfies Assumption 2.1. Then the
infinite volume Gibbs measure on the list colorings of $\mathcal{G}$ is unique.
2.4 Proof of Theorem 2.3

Let $v \in V$ be a fixed vertex of $\mathcal{G}$. Let $m = \Delta(v)$ denote the degree of $v$ and let
$v_1, v_2, \ldots, v_m$ be the neighbors of $v$. The statement of the theorem is trivial if $m = 0$
($v$ is an isolated vertex). Let $q_v = |L(v)|$ and $q_{v_i} = |L(v_i)|$. Also let $\mathcal{G}_v$ be the graph
obtained from $\mathcal{G}$ by deleting the vertex $v$. We begin by proving two useful recursions
on the marginal probabilities in the following lemmas.
Lemma 2.1. Let $j_1, j_2 \in L(v)$. Let $\mathcal{L}_{i,j_1,j_2}$ denote the list associated with the graph $\mathcal{G}_v$
which is obtained from $\mathcal{L}$ by removing the color $j_1$ from the lists $L(v_k)$ for $k < i$
and removing the color $j_2$ from the lists $L(v_k)$ for $k > i$ (if any of these lists do not
contain the respective color then no change is made to them). Then we have
$$\frac{P_{\mathcal{G},\mathcal{L}}(c(v) = j_1)}{P_{\mathcal{G},\mathcal{L}}(c(v) = j_2)} = \prod_{i=1}^{m} \frac{1 - P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j_1)}{1 - P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j_2)}.$$

Proof. Let $Z_{\mathcal{G},\mathcal{L}}(M)$ denote the number of colorings of a finite graph $\mathcal{G}$ with the
condition $M$ satisfied. For example, $Z_{\mathcal{G},\mathcal{L}}(c(v) = j)$ denotes the number of valid
colorings of $\mathcal{G}$ when the color of $v$ is fixed to be $j \in L(v)$. We use a telescoping
product argument to prove the lemma:
$$\frac{P_{\mathcal{G},\mathcal{L}}(c(v) = j_1)}{P_{\mathcal{G},\mathcal{L}}(c(v) = j_2)} = \frac{Z_{\mathcal{G},\mathcal{L}}(c(v) = j_1)}{Z_{\mathcal{G},\mathcal{L}}(c(v) = j_2)} = \frac{Z_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq j_1,\ 1 \le i \le m)}{Z_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq j_2,\ 1 \le i \le m)} = \frac{P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq j_1,\ 1 \le i \le m)}{P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq j_2,\ 1 \le i \le m)}$$
$$= \prod_{i=1}^{m} \frac{P_{\mathcal{G}_v,\mathcal{L}}(c(v_k) \neq j_1,\ 1 \le k \le i,\ c(v_k) \neq j_2,\ i < k \le m)}{P_{\mathcal{G}_v,\mathcal{L}}(c(v_k) \neq j_1,\ 1 \le k \le i-1,\ c(v_k) \neq j_2,\ i \le k \le m)}$$
$$= \prod_{i=1}^{m} \frac{P_{\mathcal{G}_v,\mathcal{L}}\big(c(v_i) \neq j_1 \mid c(v_k) \neq j_1,\ 1 \le k \le i-1,\ c(v_k) \neq j_2,\ i+1 \le k \le m\big)}{P_{\mathcal{G}_v,\mathcal{L}}\big(c(v_i) \neq j_2 \mid c(v_k) \neq j_1,\ 1 \le k \le i-1,\ c(v_k) \neq j_2,\ i+1 \le k \le m\big)}$$
$$= \prod_{i=1}^{m} \frac{1 - P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j_1)}{1 - P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j_2)}. \qquad \square$$
The following lemma was proved in [18]. We provide the proof here for completeness.

Lemma 2.2. Let $j \in L(v)$. Let $\mathcal{L}_{i,j}$ denote the list associated with the graph $\mathcal{G}_v$
which is obtained from $\mathcal{L}$ by removing the color $j$ (if it exists) from the list $L(v_k)$ for
$k < i$. Then we have
$$P_{\mathcal{G},\mathcal{L}}(c(v) = j) = \frac{\prod_{i=1}^{m}\big(1 - P_{\mathcal{G}_v,\mathcal{L}_{i,j}}(c(v_i) = j)\big)}{\sum_{k \in L(v)} \prod_{i=1}^{m}\big(1 - P_{\mathcal{G}_v,\mathcal{L}_{i,k}}(c(v_i) = k)\big)}.$$

Proof.
$$P_{\mathcal{G},\mathcal{L}}(c(v) = j) = \frac{Z_{\mathcal{G},\mathcal{L}}(c(v) = j)}{\sum_{k \in L(v)} Z_{\mathcal{G},\mathcal{L}}(c(v) = k)} = \frac{Z_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq j,\ 1 \le i \le m)}{\sum_{k \in L(v)} Z_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq k,\ 1 \le i \le m)} = \frac{P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq j,\ 1 \le i \le m)}{\sum_{k \in L(v)} P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq k,\ 1 \le i \le m)}. \qquad (2.7)$$
Now for any $k \in L(v)$,
$$P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) \neq k,\ 1 \le i \le m) = P_{\mathcal{G}_v,\mathcal{L}}(c(v_1) \neq k)\prod_{i=2}^{m} P_{\mathcal{G}_v,\mathcal{L}}\big(c(v_i) \neq k \mid c(v_l) \neq k,\ 1 \le l \le i-1\big) = \prod_{i=1}^{m}\big(1 - P_{\mathcal{G}_v,\mathcal{L}_{i,k}}(c(v_i) = k)\big).$$
Substituting this into (2.7) completes the proof of the lemma. □
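As a quick sanity check on the recursion of Lemma 2.2 (an illustrative verification of ours, not part of the thesis), consider a star graph with center $v$: in $\mathcal{G}_v$ the leaves are isolated, so their marginals are uniform over their own lists, and the recursion can be compared against brute-force enumeration.

```python
import itertools
from fractions import Fraction

# Star graph: center v with three leaves; the lists are illustrative.
L_center = [1, 2, 3]
L_leaves = [[1, 2], [2, 3], [1, 3]]

def brute_force_marginal(j):
    """P(c(v) = j) under the uniform measure on valid colorings of the star."""
    total = match = 0
    for cv in L_center:
        for leaves in itertools.product(*L_leaves):
            if all(cl != cv for cl in leaves):
                total += 1
                match += (cv == j)
    return Fraction(match, total)

def recursion_marginal(j):
    """Lemma 2.2 for a star: in G_v the leaves are isolated, so each
    marginal P(c(v_i) = k) is [k in L(v_i)] / |L(v_i)|."""
    def prod_term(k):
        p = Fraction(1)
        for L in L_leaves:
            t = Fraction(1, len(L)) if k in L else Fraction(0)
            p *= (1 - t)
        return p
    return prod_term(j) / sum(prod_term(k) for k in L_center)

for j in L_center:
    assert brute_force_marginal(j) == recursion_marginal(j)
print([str(recursion_marginal(j)) for j in L_center])
```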
Before proceeding to the proof of Theorem 2.3, we first establish upper and lower
bounds on the marginal probabilities associated with the vertex $v$.

Lemma 2.3. Let $v \in \Gamma$ be such that $d(v, \partial\Gamma) \ge 2$. For every $j \in L(v)$ and for
$l = 1, 2$ the following bounds hold:
$$P(c(v) = j \mid C_l) \le 1/\beta, \qquad (2.8)$$
$$P(c(v) = j \mid C_l) \le \left(\alpha m\, e^{-\frac{1}{\alpha}(1 + 1/\beta)}\right)^{-1}, \qquad (2.9)$$
$$P(c(v) = j \mid C_l) \ge q^{-1}(1 - 1/\beta)^{\Delta}. \qquad (2.10)$$
Here $q$ is the total number of possible colors.

Proof. These bounds were proved in [18] with a different constant, i.e., $\alpha > \alpha^{**} \approx 2.84$. Here we prove the bounds when Assumption 2.1 holds. In this proof we assume
$l = 1$; the case $l = 2$ follows by an identical argument. Let $\mathcal{C}$ denote the set
of all possible colorings of the neighbors $v_1, \ldots, v_m$ of $v$. For any $c \in \mathcal{C}$, at least
$|L(v)| - \Delta(v) \ge \beta$ colors remain available for $v$, so $P(c(v) = j \mid c) \le 1/\beta$, and (2.8) follows.

To prove (2.9) we will show that for every coloring of the neighbors of the neighbors of $v$, the bound is satisfied. So, first fix a coloring $c$ of the vertices at distance
two from $v$. Conditioned on this coloring, define for $j \in L(v)$ the marginal
$$t_{ij} = P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) = j \mid c).$$
Note that by (2.8) we have $t_{ij} \le 1/\beta$. Because $\mathcal{G}$ is triangle-free, there are no edges
between the neighbors of $v$, so once we condition on $c$ the marginals of the neighbors
decouple and are unaffected by the list modifications, i.e.,
$$P_{\mathcal{G}_v,\mathcal{L}_{i,j}}(c(v_i) = j \mid c) = P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) = j \mid c).$$
So we obtain
$$\sum_{j \in L(v)} t_{ij} = \sum_{j \in L(v)} P_{\mathcal{G}_v,\mathcal{L}}(c(v_i) = j \mid c) \le 1. \qquad (2.11)$$
From Lemma 2.2 we have
$$P(c(v) = j \mid C_1) = \frac{\prod_{i=1}^{m}(1 - t_{ij})}{\sum_{k \in L(v)} \prod_{i=1}^{m}(1 - t_{ik})}. \qquad (2.12)$$
Using the Taylor expansion of $\log(1 - x)$, we obtain
$$\prod_{i=1}^{m}(1 - t_{ik}) = \prod_{i=1}^{m} e^{\log(1 - t_{ik})} = \prod_{i=1}^{m} e^{-t_{ik} - \frac{t_{ik}^2}{2(1 - \theta_{ik})^2}},$$
where $0 \le \theta_{ik} \le t_{ik}$. Since $t_{ik} \le 1/\beta$, we have $(1 - \theta_{ik})^2 \ge (1 - 1/\beta)^2 \ge 1/2$ by Assumption
2.1, and therefore $\frac{t_{ik}^2}{2(1 - \theta_{ik})^2} \le t_{ik}^2 \le t_{ik}/\beta$. Thus, we obtain
$$\prod_{i=1}^{m}(1 - t_{ik}) \ge \prod_{i=1}^{m} e^{-(1 + 1/\beta)t_{ik}}.$$
Using the fact that the arithmetic mean is greater than the geometric mean, and using (2.11), we get
$$\sum_{k \in L(v)} \prod_{i=1}^{m}(1 - t_{ik}) \ge q_v \left(\prod_{k \in L(v)} \prod_{i=1}^{m}(1 - t_{ik})\right)^{1/q_v} \ge q_v \exp\left(-q_v^{-1}(1 + 1/\beta)\sum_{i=1}^{m}\sum_{k \in L(v)} t_{ik}\right)$$
$$\ge q_v\, e^{-q_v^{-1}(1 + 1/\beta)m} \ge (\alpha m + \beta)\, e^{-\frac{1}{\alpha}(1 + 1/\beta)} \ge \alpha m\, e^{-\frac{1}{\alpha}(1 + 1/\beta)},$$
where we used $q_v \ge \alpha m + \beta$ and $m/q_v \le 1/\alpha$. Since the numerator of (2.12) is at most one,
combining with (2.12) completes the proof of (2.9).

For (2.10), the numerator of (2.12) satisfies $\prod_{i=1}^{m}(1 - t_{ij}) \ge (1 - 1/\beta)^m \ge (1 - 1/\beta)^{\Delta}$,
and the denominator satisfies $\sum_{k \in L(v)} \prod_{i=1}^{m}(1 - t_{ik}) \le q_v \le q$. So we have
$$P(c(v) = j \mid C_1) \ge q^{-1}(1 - 1/\beta)^{\Delta}. \qquad \square$$
For $j \in L(v)$, define
$$x_j = P_{\mathcal{G},\mathcal{L}}(c(v) = j \mid C_1), \qquad y_j = P_{\mathcal{G},\mathcal{L}}(c(v) = j \mid C_2),$$
and the vectors of marginals
$$x = (x_j : j \in L(v)), \qquad y = (y_j : j \in L(v)).$$
We define a suitably chosen error function which we will use to establish decay
of correlations and prove Theorem 2.3. This error function $E(x, y)$ is defined as
$$E(x, y) = \max_{j \in L(v)} \log\frac{x_j}{y_j} - \min_{j \in L(v)} \log\frac{x_j}{y_j}.$$
By (2.10) of Lemma 2.3 we have $x_j, y_j > 0$ for $j \in L(v)$, so the above expression
is well-defined. Let $j_1 \in L(v)$ ($j_2 \in L(v)$) achieve the maximum (minimum) in the
above expression. Recall that for given $j_1, j_2$, we denote by $\mathcal{L}_{i,j_1,j_2}$ the list associated
with the graph $\mathcal{G}_v$ which is obtained from $\mathcal{L}$ by removing the color $j_1$ from the lists $L(v_k)$
for $k < i$ and removing the color $j_2$ from the lists $L(v_k)$ for $k > i$. Define, for each
$1 \le i \le m$ and $j \in L_{i,j_1,j_2}(v_i)$, the marginals
$$x_{ij} = P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j \mid C_1), \qquad y_{ij} = P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j \mid C_2),$$
and the corresponding vectors of marginals
$$x_i = (x_{ij} : j \in L_{i,j_1,j_2}(v_i)), \qquad y_i = (y_{ij} : j \in L_{i,j_1,j_2}(v_i)).$$
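In code, the error function is simply the spread of the coordinate-wise log-ratios of the two marginal vectors (a two-line illustration of ours, with made-up values):

```python
import math

def error_fn(x, y):
    """E(x, y) = max_j log(x_j / y_j) - min_j log(x_j / y_j)."""
    ratios = [math.log(a / b) for a, b in zip(x, y)]
    return max(ratios) - min(ratios)

# Two marginal vectors induced by two boundary conditions (illustrative).
print(error_fn([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))
```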
First we prove the following useful fact regarding the terms appearing in the
definition of the error function.

Lemma 2.4. With $x_j$ and $y_j$ defined as before, we have
$$\max_{j \in L(v)} \log\frac{x_j}{y_j} \ge 0, \qquad \min_{j \in L(v)} \log\frac{x_j}{y_j} \le 0.$$

Proof. We have $\sum_{j \in L(v)}(x_j - y_j) = \sum_{j \in L(v)} x_j - \sum_{j \in L(v)} y_j = 0$. Since the quantities
$x_j$ and $y_j$ are non-negative for all $j \in L(v)$, there exist indices $j_1 \in L(v)$ and
$j_2 \in L(v)$ such that $x_{j_1} \ge y_{j_1}$ and $x_{j_2} \le y_{j_2}$. The lemma follows by taking the
logarithm of both sides of the preceding inequalities. □
We are now ready to prove the following key result, which shows that the distance
between the marginals induced by the two different boundary conditions, measured
with respect to the metric defined by the error function $E(x, y)$, contracts.

Let $\epsilon \in (0, 1)$ be such that
$$\frac{1}{\left(1 - \frac{1}{\beta}\right)\alpha\, e^{-\frac{1}{\alpha}\left(1 + \frac{1}{\beta}\right)}} = 1 - \epsilon.$$
Assumption 2.1 guarantees that such an $\epsilon$ exists.
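As a numeric illustration (ours, not the thesis's): for $\alpha = 1.8 > \alpha^*$, one can search for the smallest integer $\beta$ satisfying the inequality in Assumption 2.1 and read off the resulting contraction rate $\epsilon$.

```python
import math

# For alpha = 1.8 (> alpha* ~ 1.763), find the smallest integer beta with
# (1 - 1/beta) * alpha * exp(-(1 + 1/beta)/alpha) > 1, then compute epsilon.
alpha = 1.8
beta = next(b for b in range(2, 10**6)
            if (1 - 1/b) * alpha * math.exp(-(1 + 1/b) / alpha) > 1)
eps = 1 - 1 / ((1 - 1/beta) * alpha * math.exp(-(1 + 1/beta) / alpha))
print(beta, eps)   # beta = 49 and a small positive epsilon
```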
Lemma 2.5. Let $m_i = \Delta_{\mathcal{G}_v}(v_i)$. Then
$$\frac{1}{m} E(x, y) \le (1 - \epsilon) \max_{i:\, m_i > 0} \frac{1}{m_i} E(x_i, y_i). \qquad (2.13)$$
The expression on the right hand side of (2.13) is interpreted to be $0$ if $m_i = 0$ for
all $i$.
Proof. If $j_1 = j_2$, then $E(x, y) = 0$. Otherwise,
$$E(x, y) = \log\frac{x_{j_1}}{y_{j_1}} - \log\frac{x_{j_2}}{y_{j_2}}.$$
Introduce the following variables:
$$z_{ij} = \log x_{ij}, \qquad w_{ij} = \log y_{ij}.$$
Using the recursion in Lemma 2.1 we have
$$E(x, y) = \sum_{i=1}^{m}\big[\log(1 - e^{z_{ij_1}}) - \log(1 - e^{w_{ij_1}})\big] - \sum_{i=1}^{m}\big[\log(1 - e^{z_{ij_2}}) - \log(1 - e^{w_{ij_2}})\big].$$
For $j = j_1, j_2$, let
$$\bar z_j = \sum_{i=1}^{m}\log(1 - e^{z_{ij}}), \qquad \bar w_j = \sum_{i=1}^{m}\log(1 - e^{w_{ij}}).$$
Then we can rewrite $E(x, y)$ as
$$E(x, y) = (\bar z_{j_1} - \bar w_{j_1}) - (\bar z_{j_2} - \bar w_{j_2}). \qquad (2.14)$$
Define the continuous function $f : [0, 1] \to \mathbb{R}$ as
$$f(t) = \sum_{i=1}^{m}\log\big(1 - e^{z_{ij} + t(w_{ij} - z_{ij})}\big),$$
so that $f(0) = \bar z_j$ and $f(1) = \bar w_j$. Applying the mean value theorem, there
exists $t \in (0, 1)$ such that
$$\bar z_j - \bar w_j = f(0) - f(1) = -f'(t) = \sum_{i=1}^{m} \frac{e^{z_{ij} + t(w_{ij} - z_{ij})}}{1 - e^{z_{ij} + t(w_{ij} - z_{ij})}}\,(w_{ij} - z_{ij}).$$
Observe that if $j \notin L_{i,j_1,j_2}(v_i)$, then $P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j \mid C_1) = P_{\mathcal{G}_v,\mathcal{L}_{i,j_1,j_2}}(c(v_i) = j \mid C_2) = 0$.
Hence for $j \notin L_{i,j_1,j_2}(v_i)$ we have $w_{ij} = z_{ij}$. Also, if $m_i = 0$ then $v_i$ is an
isolated vertex in $\mathcal{G}_v$, and in this case we also have $w_{ij} = z_{ij}$. Using this fact, we have
$$\bar z_j - \bar w_j = \sum_{i:\, m_i > 0} \frac{e^{z_{ij} + t(w_{ij} - z_{ij})}}{1 - e^{z_{ij} + t(w_{ij} - z_{ij})}}\,(w_{ij} - z_{ij}).$$
From the convexity of $e^x$ and Lemma 2.3 we have
$$0 \le e^{z_{ij} + t(w_{ij} - z_{ij})} \le t e^{w_{ij}} + (1 - t)e^{z_{ij}} \le \frac{1}{\alpha m_i\, e^{-\frac{1}{\alpha}(1 + 1/\beta)}}.$$
Similarly, again using Lemma 2.3, we have
$$0 \le \frac{1}{1 - e^{z_{ij} + t(w_{ij} - z_{ij})}} \le \frac{1}{1 - 1/\beta}.$$
Combining, we have for $j = j_1, j_2$,
$$0 \le \frac{e^{z_{ij} + t(w_{ij} - z_{ij})}}{1 - e^{z_{ij} + t(w_{ij} - z_{ij})}} \le \frac{1}{(1 - 1/\beta)\, m_i\, \alpha\, e^{-\frac{1}{\alpha}(1 + 1/\beta)}} = \frac{1 - \epsilon}{m_i}.$$
From Lemma 2.4, applied to the marginal vectors $x_i, y_i$, we have
$\max_{k \in L_{i,j_1,j_2}(v_i)}\{w_{ik} - z_{ik}\} \ge 0$ and $\min_{k \in L_{i,j_1,j_2}(v_i)}\{w_{ik} - z_{ik}\} \le 0$. Using this,
$$\bar z_{j_1} - \bar w_{j_1} \le \sum_{i:\, m_i > 0} \frac{1 - \epsilon}{m_i} \max_{k \in L_{i,j_1,j_2}(v_i)}\{w_{ik} - z_{ik}\},$$
and
$$\bar z_{j_2} - \bar w_{j_2} \ge \sum_{i:\, m_i > 0} \frac{1 - \epsilon}{m_i} \min_{k \in L_{i,j_1,j_2}(v_i)}\{w_{ik} - z_{ik}\}.$$
By using the above bounds in (2.14) we get
$$E(x, y) \le (1 - \epsilon)\sum_{i:\, m_i > 0} \frac{1}{m_i}\left[\max_{k \in L_{i,j_1,j_2}(v_i)}\{w_{ik} - z_{ik}\} - \min_{k \in L_{i,j_1,j_2}(v_i)}\{w_{ik} - z_{ik}\}\right]$$
$$= (1 - \epsilon)\sum_{i:\, m_i > 0} \frac{1}{m_i} E(x_i, y_i) \le (1 - \epsilon)\, m \max_{i:\, m_i > 0} \frac{1}{m_i} E(x_i, y_i). \qquad (2.15)$$
Dividing both sides by $m$ completes the proof of Lemma 2.5. □
We now use Lemma 2.5 to complete the proof of Theorem 2.3. Let i* achieve
the maximum in (2.15), that is, i* = arg maxi
Li = Li, 1 ,
(1
-
2
Eg,, 1 g(x., yi). Let g1 =
, i1 = vi. and (x',yl) = (x,., y.). Lemma 2.5 says that A
E(x, y)
E),l9E(x', y"). Note that the graph list pair (a1, V1) satisfies Assumption
3.1. We can then apply apply Lemma 2.5 to vi, 9', L' to obtain v 2 , g 27 L2 such that
)E(x2
y2)
A
If we let d = d(v, i9l), then applying
1
1)E(x,y)
Lemma 2.5 successively d times we obtain
1
AgMv)
E(x, y)
(1
-
(),d) E(xdyd)
Agd(Vd)
2log
(
(1-
(118)
E)d.
(1
where the second inequality follows from Lemma 2.3. This gives for any j E L(v),
log
max log
-
E(x, y) 5 2A(Iog q - A log(l - 1/,a))(1
48
-
e)d.
Let F = 2A(log q - AIog(1 - 1/#8)). The quantity F depends only on the quantities
defined in Assumption 3.1 and does not depend on the vertex v. Let do be large
enough such that exp(F(1 - e)do) < 1+ 2F(1 - E)D. Then for d > do, we have
.
P(c(v) = jjC1 ) < 1 + 2Fed
P(c(v) = jjC 2 ) ~
where -y =-log(1 - e). For d < do
P(c(v) = jC 2)
Taking B
=
--,d-
-
P(c(v) = JIG1 ) <1 + eF+
max{eF+,do 2F} we get
P(c(v) = jiCI) < 1 + Be-yd.
)
P(c(v) = jIC2
The lower bound on the ratio of probabilities is obtained in a similar fashion. This
completes the proof of Theorem 2.3.
2.5
Conclusion
In this chapter, we proved that the strong spatial mixing for the list coloring problem
holds for a general triangle free graph when for each vertex of the graph, the size
of its list is at least aA(v) +
f
and a > a* ;
1.763 and 8 is a sufficiently large
constant depending only on a. This extends the previous results for strong spatial
mixing of colorings for regular trees [20} and for amenable triangle free graphs [23].
An interesting next venture would be to use this long range independence property
to produce efficient approximation algorithms for counting colorings similar to [18J.
The main obstruction that we face here is that in order to prove contraction of the
49
recursion for a* - 1.763, we need to use bounds on the probabilities mentioned in
Lemma 2.3. This restricts our result to correlation decay with respect to distance in
the graph theoretic sense instead of correlation decay in the computation tree. This
means that while our result can be used to produce an approximation scheme for
computing marginals, it does not directly lead to an FPTAS for counting the number
of valid colorings like in [18].
It would also be interesting to establish this result for smaller a. Here again
Lemma 2.3 which necessitates a > a* proves to be the bottleneck. One possible way
to address this would be to tighten the contraction argument such that a weaker
version of inequality (2.9) suffices. It has been conjectured that a = 1 and 8 = 2
suffices but at the moment we axe quite far from this result. It also remains open
whether strong spatial mixing is necessary or sufficient or both for the existence of
an FPTAS for computing partition functions.
50
Chapter 3
Giant Component in Random
Multipartite Graphs with Given
Degree Sequences
3.1
Introduction
The problem of the existence of a giant component in random graphs was first studied
by Erd6s and R6nyi. In their classical paper [16], they considered a random graph
model on n and m edges where each such possible graph is equally likely. They
showed that if m/n > 1 +c, with high probability as n -+ oo there exists a component
of size linear in n in the random graph and that the size of this component as a
fraction of n converges to a given constant.
The degree distribution of the classical Erdbs-R6nyi random graph has Poisson
tails. However in many applications the degree distribution associated with an underlying graph does not satisfy this. For example, many so-called "scale-free" networks
51
exhibit power law distribution of degrees. This motivated the study of random graphs
generated according to a given degree sequence. The giant component problem on
a random graph generated according to a given degree sequence was considered by
Molloy and Reed [43]. They provided conditions on the degree distribution under
which a giant component exists with high probability. Further in [44], they also
showed that the size of the giant component as a fraction of the number of vertices
converges in probability to a given positive constant. They used an exploration process to analyze the components of vertices of the random graph to prove their results.
Similar results were established by Janson and Luczak in [29] using different techniques based on the convergence of empirical distributions of independent random
variables. There have been several papers that have proved similar results with similar but different assumptions and tighter error bounds [25], [8], [48]. Results for the
critical phase for random graphs with given degree sequences were derived by Kang
and Seierstad in [34]. All of these results consider a random graph on n vertices with
a given degree sequence where the distribution is uniform among all feasible graphs
with the given degree sequence. The degree sequence is then assumed to converge
to a probability distribution and the results provide conditions on this probability
distribution for which a giant component exists with high probability.
In this chapter, we consider random multipartite graphs with p parts with given
degree distributions. Here p is a fixed positive integer. Each vertex is associated
with a degree vector d, where each of its component di, i E [p] dictates the number
of neighbors of the vertex in the corresponding part i of the graph. As in previous
papers, we assume that the empirical distribution associated with the number of
vertices of degree d converges to a probability distribution. We then pose the problem
of finding conditions under which there exists a giant component in the random
graph with high probability. Our approach is based on the analysis of the Molloy
52
and Reed exploration process. The major bottleneck is that the exploration process
is a multidimensional process and the techniques of Molloy and Reed of directly
underestimating the exploration process by a one dimensional random walk does
not apply to our case. In order to overcome this difficultly, we construct a linear
Lyapunov function based on the Perron-Frobenius theorem, a technique often used
in the study of multidimensional branching processes.
Then we carefully couple
the exploration process with some underestimating process to prove our results The
coupling construction is also more involved due to the multidimensionality of the
process. This is because in contrast to the unipartite case, there are multiple types
of clones (or half-edges) involved in the exploration process, corresponding,to which
pair of parts of the multipartite graph they belong to. At every step of the exploration
process, revealing the neighbor of such a clone leads to the addition of clones of several
types to the component being currently explored. The particular numbers and types
of these newly added clones is also dependent on the kind of clone whose neighbor
was revealed. So, the underestimating process needs to be constructed in a way such
that it simultaneously underestimates the exploration process for each possible type
of clone involved. We do this by choosing the parameters of the underestimating
process such that for each type of clone, the vector of additional clones which are
added by revealing its neighbor is always component wise smaller than the same
vector for the exploration process.
All results regarding giant components typically use a configuration model corresponding to the given degree distribution by splitting vertices in'to clones and performing a uniform matching of the clones. In the standard unipartite case, at every
step of the exploration process all available clones can be treated same. However
in the multipartite case, this is not the case. For example, the neighbor of a vertex
in part 1 of the graph with degree d can lie in part j only if d > 0. Further, this
53
neighbor must also have a degree d such that di > 0. This poses the issue of the
graph breaking down into parts with some of the p parts of the graph getting disconnected from the others. To get past this we make a certain irreducibility assumption
which we will carefully state later. This assumption not only addresses the above
problem, but also enables us to construct linear Lyapunov functions by using the
Perron-Frobenius theorem for irreducible non-negative matrices. We also prove that
with the irreducibility assumption, the giant component when it exists is unique and
has linearly many vertices in each of the p parts of the graph. In
[8],
Bollobas and
Riordan show that the existence and the size of the giant component in the unipartite
case is closely associated with an edge-biased branching process. In this chapter, we
also construct an analogous edge-biased branching process which is now a multi-type
branching process, and prove similar results.
Several real world networks naturally demonstrate a multipartite nature. The
author-paper network, actor-movie network, the network of company ownership, the
financial contagion model, heterogenous social networks, etc. are all multipartite
[46], [91, [27]. Examples of biological networks which exhibit multipartite structure include drug target networks, protein-protein interaction networks and human
disease networks [221, [56], [45]. In many cases evidence suggests that explicitly
modeling the multipartite structure results in more accurate models and predictions.
Random bipartite graphs (p = 2) with given degree distributions were considered
by Newmann et. al in [471. They used generating function heuristics to identify the
critical point in the bipartite case. However, they did not provide rigorous proofs of
the result. Our result establishes a rigorous proof of this result and we show that in
the special case p = 2, the conditions we derive is equivalent to theirs.
The rest of the chapter is structured as follows. In Section 3.2, we start by introducing the basic definitions and the notion of a degree distribution for multipartite
54
graphs. In Section 3.3, we formally state our main results. Section 3.4 is devoted
to the description of the configuration model. In Section 3.5, we describe the exploration process of Molloy and Reed and the associated distributions that govern the
evolution of this process. In Section 3.6 and Section 3.7, we prove our main results
for the supercritical case, namely when a giant component exists with high probability. In Section 3.8 we prove a sublinear upper bound on the size of the largest
component in the subcritical case. We then conclude in Section 3.9 with some future
directions.
3.2
Definitions and preliminary concepts
We consider a finite simple undirected graph g = (V, C) where V is the set of vertices
path between two vertices v, and v 2 in V is a collection of vertices vI
v 2 in V such that for each i
=
= Ul, U2,
- -- ,
-
and E is the set of edges. We use the words "vertices" and "nodes" interchangeably. A
1,2,. .. ,l - 1 we have (u,u +) EL. A component, or
more specifically a connected component of a graph 9 is a subgraph C C
Q such that
there is a path between any two vertices in C. A family of random graphs {9} on n
vertices is said to have a giant component if there exists a positive constant e > 0 such
that P(There exists a component C C g for which L ;
e) -+1. Subsequently,
when a property holds with probability converging to one as n -+ oo, we will say
that the property hold with high probability or w.h.p. for short.
For any integer p, we use [p] to denote the set {1, 2, ... ,p}.
M E R",
we denote by
liMit
= maxj,,
IMji,, the
For any matrix
largest element of the matrix M
in absolute value. It is easy to check that | j| is a valid matrix norm. We use 6,, to
55
denote the Kronecker delta function defined by
1 if i = j,
0, otherwise.
We denote by 1 the all ones vector whose dimension will be clear from context.
The notion of an asymptotic degree distribution was introduced by Molloy and
Reed [431. In the standard unipartite case, a degree distribution dictates the fraction
of vertices of a given degree. In this section we introduce an analogous notion of
an asymptotic degree distribution for random multipartite graphs. We consider a
random multipartite graph
g on n vertices with p parts denoted
by G,..., G,. For
any i E [pl a vertex v E Gi is associated with a "type" d E ZPi which we call the
"type" of v. This means for each i = 1, 2,..., p, the node with type d has d(i) Ad
neighbors in G. A degree distribution describes the fraction of vertices of type d in
Gi, i E [p]. We now define an asymptotic degree distributionas a sequence of degree
distributions which prescribe the number of vertices of type d in a multipartite graph
on n vertices. For a fixed n, let D(n) =
(n4(n),
iE
7p,
d E {0, 1,... , n}P), where
nO(n) denotes the number of vertices in Gi of type d. Associated with each D(n) is a
probability distribution p(n)
=
(%!) ,
i E [p], d E {0, 1, ... , n}P which denotes the
fraction of vertices of each type in each part. Accordingly, we write pO(n)
=
For any vector degree d the quantity i'd is simply the total degree of the vertex.
We define the quantity
w(n) A max{1'd : nr4(n) > 0 for some i E [p]},
(3.1)
which is the maximum degree associated with the degree distribution D(n). To prove
56
our main results, we need additional assumptions on the degree sequence.
Assumption 3.1. The degree sequence {V(n)}nE
satisfies the following conditions:
(a) For each n E N there exists a simple graph with the degree distribution prescribed by D(n), i.e., the degree sequence is a feasible degree sequence.
(b) There exists a probability distribution p = (p', i E [p], d E Z) such that the
sequence of probability distributions p(n) associated with D(n) converges to
the distribution p.
(c) For each i E [p], Zd l'dpi(n) -+
l'dp.
(d) For each i,j E [p} such that A' A Ed dypd = 0, the corresponding quantity
A (n) A Ed dyp?(n) = 0 for all n.
(e) The second moment of the degree distribution given by Zd(1'd)2p? exists (is
finite) and Ed(1'd)2pO(n)
-+
E(1'd)2d
Note that the quantity Ed 1'dp? (n) in condition (c) is simply
. So
this condition implies that the total number of edges is O(n) , i.e., the graph is
sparse. In condition (e) the quantity Zd(1'd)2p?(n) is same as
.
so
this condition says that sum of the squares of the degrees is O(n). It follows from
condition (c) that Aj < oo and that A (n) -+ Aj. The quantity A? is asymptotically
the fraction of outgoing edges from G, to G,. For p to be a valid degree distribution
of a multipartite graph, we must have for each 1 <i < j < p, A= Aj and for every
n, we must have Aj(n) = A(n). We have not included this in the above conditions
because it follows from condition (a). Condition (d) excludes the case where there
are sublinear number of edges between G, and G,.
57
There is an alternative way to represent some parts of Assumption 3.1. For any
probability distribution p on ZPi, let D. denote the random variable distributed as
p. Then (b), (c) and (e) are equivalent to the following.
(b') DP()
-
D, in distribution.
(c') E[1'Dp(n)] -+ E[1'D,].
2
(e') E[(1'D,(n)) 2 ] -+ E[(1'Dp) I.
The following preliminary lemmas follow immediately.
Lemma 3.1. The conditions (b'), (c') and (e') together imply that the random variables {1'Dp(n) }EI and
{ (1'Dp(n) )2}
are uniformly integrable.
Then using Lemma 3.1, we prove the following statement.
Lemma 3.2. The maximum degree satisfies w(n) = o(Vri).
Proof. For any e > 0, by Lemma 3.1, there exists q E Z such that E[(1'Dp(n))21{1ID>q}I
E. Observe that for large enough n, we have max{j
n), Z}
E[(1'D(n) )2 11'Dq}]
0
c. Since c is arbitrary, the proof is complete.
Let S = {(i, j) I Al > 0} and let N
IjSj. For each i E [p], let S, A {j E
[p] I (ij) E S}.
Note that by condition (a), the set of feasible graphs with the degree distribution
is non-empty. The random multipartite graph g we consider in this chapter is drawn
uniformly at random among all simple graphs with degree distribution given by D(n).
The asymptotic behavior of D(n) is captured by the quantities p?. The existence of
a giant component in g as n -+ oo is determined by the distribution p.
58
<
3.3
Statements of the main results
The neighborhood of a vertex in a random graph with given degree distribution resembles closely a special branching process associated with that degree distribution
called the edge-biased branching process. A detailed discussion of this phenomenon
and results with strong guarantees for the giant component problem in random unipartite graphs can be found in [81 and [48]. The edge biased branching process is
defined via the edge biased degree distribution that is associated with the given degree distribution. Intuitively the edge-biased degree distribution can be thought of
as the degree distribution of vertices reached at the end point of an edge. Its importance will become clear when we will describe the exploration process in the sections
that follow. We say that an edge is of type (i, j) if it connects a vertex in Gi with a
vertex in Gj. Then, as we will see, the type of the vertex in Gj reached by following
a random edge of type (i, j) is d with probability
"s'.
We now introduce the edge-biased branchingprocess which we denote by T. Here
T is a multidimensional branching process. The vertices of T except the root are
associated with types (ij)
E S. So other than the root, T has N < p2 types of
vertices. The root is assumed to be of a special type which will become clear from
the description below. The process starts off with a root vertex v. With probability
pi, the root v gives rise to dj children of type (i, j) for each j E [p]. To describe the
subsequent levels of T let us consider any vertex with type (i,j). With probability
this vertex gives rise to (d4- Smi) children of type (j, m) for each m E
[p].
The
number of children generated by the vertices of T is independent for all vertices. For
each n, we define an edge-biased branching process T- which we define in the same
way as T by using the distribution D(n) instead of V. We will also use the notations
T(v) and T(v) whenever the type of the root node v is specified.
59
We denote the expected number of children of type (j, m) generated by a vertex
of type (i, j) by
/ijjm A Z(dm
-
dtp
(3.2)
Aij
d
It is easy to see that IAijjm > 0. Assumption 1(e) guarantees that piLjjm is finite.
Note that a vertex of type (i, j) cannot have children of type (1, m) if j =, 1. But
for convenience we also introduce
tsjgn
= 0 when j 0 1. By means of a remark we
should note that it is also possible to conduct the analysis when we allow the second
moments to be infinite (see for example [43], [81), but for simplicity, we do not pursue
this route in this chapter.
Introduce a matrix M E RN defined as follows. Index the rows and columns
of the matrix with double indices (i, j) E S. There are N such pairs denoting the
N rows and columns of M. The entry of M corresponding to row index (i, j) and
column index (1, m) is set to be ijim.
Definition 3.1. Let A E RNxN be a matrix. Define a graph 'H on N nodes where
for each pair of nodes i and j, the directed edge (ij) exists if and only if A 3 > 0.
Then the matrix A is said to be irreducible if the graph W is strongly connected,
i.e., there exists a directed path in W between any two nodes in X.
We now state the well known Perron-Frobenius Theorem for non-negative irreducible matrices. This theorem has extensive applications ii the study of multidimensional branching processes (see for example [38]).
Theorem 3.1 (Perron-Frobenius Theorem). Let A be a non-negative irreducible
matrix. Then
60
(a). A has a positive eigenvalue y > 0 such that any other eigenvalue of A is strictly
smaller than - in absolute value.
(b). There exists a left eigenvectorx of A that is unique up to scalarmultiplication
associated with the eigenvalue y such that all entries of x are positive.
We introduce the following additional assumption before we state our main results.
Assumption 3.2. The degree sequence {V(n)}En
satisfies the following conditions.
(a). The matrix M associated with the degree distribution p is irreducible.
(b). For each i E [p, Si5
0.
Assumption 3.2 eliminates several degenerate cases. For example consider a degree distribution with p = 4, i.e., a 4-partite random graph. Suppose for i = 1,2,
we have p1 is non-zero only when d3 = d4 = 0, and for i = 3,4, pd is non-zero only
when d, = d2 = 0. In essence this distribution is associated with a random graph
which is simply the union of two disjoint bipartite graphs. In particular such a graph
may contain more than one giant component. However this is ruled out under our
assumption. Further, our assumption allows us to show that the giant component
has linearly many vertices in each of the p parts of the multipartite graph.
Let
00
A 1-
(T =i) = P(1T1 = 00)
(.3
i1
Namely, - is the survival probability of the branching process T. We now state our
main results.
61
Theorem 3.2. Suppose that the Perron robenius eigenvalue of M satisfies y > 1.
Then the following statements hold.
(a) The random graph G has a giant component C C G w.h.p. Further, the size of
this component C satisfies
lim P
E < I< < 7 + 6=,
(3.4)
for any E > 0.
(b) All components of g other than C are of size O(log n) w.h.p.
Theorem 3.3. Suppose that the Perron Frobenius eigenvalue of M satisfies -y < 1.
Then all components of the random graph g are of size O(w(n)2 log n) w.h.p.
The conditions of Theorem 3.2 where a giant component exists is generally referred to in the literature as the supercritical case and that of Theorem 3.3 marked
by the absence of a giant component is referred to as the subcritical case. The conditions under which giant component exists in random bipartite graphs was derived
in [47] using generating function heuristics. We now consider the special case of a
bipartite graph and show that the conditions implied by Theorem 3.2 and Theorem
3.3 reduce to that in [47]. In this case p = 2 and N = 2. The type of all vertices d in
G1 are of the form d = (0, j) and those in G2 are of the form d = (k, 0). To match
the notation in [47], we let pd= p when d = (0, j) and pd = qk when d = (k,0). So
' kqk. Using the definition of p221
= Ai=
from equationd2pd =
g jpA =
(3.2), we get
/-11221 =
Z(di
- J1)d
A2
62
=
E
k(k 1)qk
A1
-
Similarly we can compute p2112
- From the definition of M,
M =A1221
The Perron-Frobenius norm of M is its spectral radius and is given by
(/p121)(/p2112).
So the condition for the existence of a giant component according to Theorem 3.2 is
given by
(p'l2i)(A2112) -
1> 0
which after some algebra reduces to
Ejk(jk- j -k)p,k > 0.
j~k
This is identical to the condition mentioned in [471. The rest of the chapter is devoted
to the proof of Theorem 3.2 and Theorem 3.3.
3.4
Configuration Model
The configuration model [54], [7], [4} is a convenient tool to study random graphs
with given degree distributions. It provides a method to generate a multigraph from
the given degree distribution. When conditioned on the event that the graph is
simple, the resulting distribution is uniform among all simple graphs with the given
degree distribution. We describe below the way to generate a configuration model
from a given multipartite degree distribution.
1. For each of the n?(n) vertices in G of type d introduce d, clones of type (i, j).
An ordered pair (i,j) associated with a clone designates that the clones belongs
to Gf and has a neighbor in G. From the discussion following Assumption 3.1,
the number of clones of type (i,j) is same as the number of clones of type (j,i).
63
2. For each pair (i, j), perform a uniform random matching of the clones of type
(i, j) with the clones of type (j, i).
3. Collapse all the clones associated with a certain vertex back into a single vertex.
This means all the edges attached with the clones of a vertex are now considered
to be attached with the vertex itself.
The following useful lemma allows us to transfer results related to the configuration model to uniformly drawn simple random graphs.
Lemma 3.3. If the degree sequence {D(n)}nE
satisfies Assumption 3.1, then the
probability that the configuration model results in a simple graph is bounded away
from zero as n -+ oo.
As a consequence of the above lemma, any statement that holds with high probability for the random configuration model is also true with high probability for the
simple random graph modeL So we only need to prove Theorem 3.2 and Theorem
3.3 for the configuration model.
The proof of Lemma 3.3 can be obtained easily by using a similar result on
directed random graphs proved in [12]. The specifics of the proof follow.
Proof of Lemma 3.3. In the configuration model for multipartite graphs that we described, we can classify all clones into two categories. First, the clones of the kind,
(i, i) E S and the clones of the kind (i, j) E S, i $ j. Since the outcome of the
matching associated with each of the cases is independent, we can treat them separately for this proof. For the first category, the problem is equivalent to the case
of configuration model for standard unipartite graphs. More precisely, for a fixed i,
we can construct a standard degree distribution D(n) from V(n) by taking the ith
64
component of the corresponding vector degrees of the latter. By using Assumptions
3.1, our proof then follows from previous results for unipartite case.
For the second category, first let us fix (i,j) with i : j. Construct a degree
distribution D1 (n) = (nlk(n), k E [n]) where nk(n) denotes the number of vertices of
degree k by letting nk(n) = Ed 1{d(j) = kni. Construct D2 (n) similar to 1) 1(n)
by interchanging i and j. We consider a bipartite graph where degree distribution
of the vertices in part i is given by Di(n) for i = 1, 2. We form the corresponding
configuration model and perform the usual uniform matching between the clones
generated from Vi(n) with the clones generated from D2 (n). This exactly mimics
the outcome of matching that occurs in our original multipartite configuration model
between clones of type (i, j) and (j, i). With this formulation, the problem of controlling number of double edges is very closely related to a similar problem concerning
the configuration model for directed random graphs which was studied in [12]. To
precisely match their setting, add "dummy" vertices with zero degree to both A (n)
and D2 (n) so that they have exactly n vertices each and then arbitrarily enumerate
the vertices in each with indices from [n]. From Assumption 3.1 it can be easily verified that the degree distributions VI(n) and D2 (n) satisfy Condition 4.2 in [12]. To
switch between our notation and theirs, use D1 (n) -+ M"] and D2 (n) -+ DIn]. Then
Theorem 4.3 in [12] says that the probability of having no self loops and double edges
is bounded away from zero. In particular, observing that self loops are irrelevant in
our case, we conclude that lim.,. 0 P(No double edges) > 0. Since the number of
pairs (i, j) is less than or equal to p(p - 1) which is a constant with respect to n, the
proof is now complete.
[
Exploration Process
3.5
In this section we describe the exploration process which was introduced by Molioy and Reed in [43] to reveal the component associated with a given vertex in the
random graph. We say a clone is of type (i, j) if it belongs to a vertex in Gi and
has its neighbor in G3 . We say a vertex is of type (i, d) if it belongs to Gi and has
degree type d. We start at time k = 0. At any point in time k in the exploration
process, there are three kinds of clones - 'sleeping' clones , 'active' clones and 'dead'
clones.
For each (i, j) E S, the number of active clones of type (i, j) at time k
are denoted by A;(k) and the total number of active clones at time k is given by
A(k) =
Z(ij)ES
Aj (k). Two clones are said to be "siblings" if they belong to the
same vertex. The set of sleeping and awake clones are collectively called 'living'
clones. We denote by Li(k) the number of living clones in Gi and Li(k) to be the
number of living clones of type (i, j) at time k. It follows that E)j L (k) = L (k).
If all clones of a vertex are sleeping then the vertex is said to be a sleeping vertex, if
all its clones are dead, then the vertex is considered dead, otherwise it is considered
to be active. At the beginning of the exploration process all clones (vertices) are
sleeping. We denote the number of sleeping vertices in G of type d at time k by
N5(k) and let Ns(k) = Ei,d Nid(k). Thus Nid(0) = nO(n) and Ns(O) = n. We now
describe the exploration process used to reveal the components of the configuration
model.
Exploration Process.
1. Initialization: Pick a vertex uniformly at random from the set of all sleeping
vertices and and set the status of all its clones to active.
66
2. Repeat the following two steps as long as there are active clones:
(a). Pick a clone uniformly at random from the set of active clones and kill it.
(b). Reveal the neighbor of the clone by picking uniformly at random one of
its candidate neighbors. Kill the neighboring clone and make its siblings
active.
3. If there are alive clones left, restart the process by picking an alive clone uniformly at random and setting all its siblings to active, and go back to step 2.
If there are no alive clones, the exploration process is complete.
Note that in step 2(b), the candidate neighbors of a clones of type (i, j) are the
set of alive clones of type (j, i).
The exploration process enables us to conveniently track the evolution in time
of the number of active clones of various types. We denote the change in A (k) by
writing
-A? (k + 1) = A '(k) + Z-(k + 1),
(i, j) E S.
Define Z(k) A (Zj(k), (i, j) E S) to be the vector of changes in the number of active
clones of all types. To describe the probability distribution of the changes Z (k +1),
we consider the following two cases.
Case 1: A(k) > 0.
Let E denote the event that in step 2-(a) of the exploration process, the active
clone picked was of type (i,j). The probability of this event is
A
. () that
case we kill the clone that we chose and the number of active clones of type
67
(i, j) reduces by one. Then we proceed to reveal its neighbor which of type
(j,i). One of the following events happen:
(i). E.: the neighbor revealed is an active clone. The probability of the joint
event is given by
P(E ln E.)
A(k) LI(k)
A (k)(k)-1
A(k) Li(k)-1
if i
Such an edge is referred to as a back-edge in [43]. The change in active
clones of different types in this joint event is as follows.
-
Ifi
Z (k +1) = Zz(k +1) = -1,
-
otherwise
.
Zrm(k + 1) = 0,
Ifij
Z (k +1) = -2
otherwise
.
Z7"(k +1) =0,
(ii). Ed: The neighbor revealed is a sleeping clone of type d. The probability
of this joint event is given by
A?(k) diN'(k)
P(E nch) = n
A (k) L3 (k) - J
The sleeping vertex to which the neighbor clone belongs is now active.
68
The change in the number of active clones of different types is governed
by the type d of this new active vertex. The change in active clones of
different types in this event are as follows.
- Ifi=j,
Z (k +1) = -1,
Z,"(k +1) = dm - 6im,
Zj"(k +1) = 0,
If i
-
otherwise.
j,
Zs (k +1)=
-2 +d,
Z"(k +1)= d,
for m 0 i,
Zi (k + 1)= 0, otherwise.
Note that the above events are exhaustive, i.e.,
P(EnE )+ Z P(Ej Ea)
1.
sjES
ijES d
Case 2: A(k) = 0.
In this case, we choose a sleeping clone at random and make it and all its
siblings active. Let E, be the event that the sleeping clone chosen was of type
(i, j). Further let Ed be the event that this clone belongs to a vertex of type
69
(i, d). Then we have
P(E n E
= L3 (k) dN(k)
_ d Nid(k)
L{(k)
L(k)
L(k)
In this case the change in the number of active clones of different types is given
by
Zi"(k +1) = dm, for m E Sj,
Z "'(k +1) = 0, otherwise.
We emphasize here that there are two ways in which the evolution of the exploration
process deviates from that of the edge-biased branching process. First, a back-edge
can occur in the exploration process when neighbor of an active clone is revealed to
be another active clone. Second, the degree distribution of the exploration process is
time dependent. However, close to the beginning of the process, these two events do
not have a significant impact. We exploit this fact in the following sections to prove
Theorem 3.2 and 3.3.
3.6
Supercritical Case
In this section we prove the first part of Theorem 3.2. To do this we show that the
number of active clones in the exploration process grows to a linear size with high
probability. Using this fact, we then prove the existence of a giant component. The
idea behind the proof is as follows. We start the exploration process described in the
previous section at an arbitrary vertex v E
g.
At the beginning of the exploration
process, i.e. at k = 0 , we have N'(0) = np'(n) and Lj(0) = nAj(n). So, close to
70
the beginning of the exploration, a clone of type (i, j) gives rise to d,, - 5,,, clones
of type (j,m) with probability close to
A,;(n)
which in turn is close to
A',
for large
enough n. If we consider the exploration process in a very small linear time scale, i.e.
for k < en for small enough e, then the quantities
the quantities
dt 1d1(k)
diP?
remain close to a and
are negligible. We use this observation to construct a process
which underestimates the exploration process in some appropriate sense but whose
parameters are time invariant and "close" to the initial degree distribution. We then
use this somewhat easier to analyze process to prove our result.
We now get into the specific details of the proof. We define a stochastic process
B-(k) which we will couple with A (k) such that B (k) underestimates A (k) with
probability one. We denote the evolution in time of B, (k) by
B (k +1)
To define
Zj(k +
=
B (k)+ Z(k +1),
(ij) E S.
1), we choose quantities 7rd satisfying
<,
i=1 -
> 0,
(3.5)
,(3.6)
d
for some 0 < -y < 1 to be chosen later.
We now show that in a small time frame, the parameters associated with the
exploration process do not change significantly from their initial values. This is
made precise in Lemma 3.4 and Lemma 3.5 below. Before that we first introduce
some useful notation to describe these parameters for a given n and at a given step k
in the exploration process. Let M(n) denote the matrix of means defined analogous
71
I
to M by replacing
by
. Also for a fixed n, define Mk(n) similarly by
djNO(k)
__4
replacing !,
. Note that Mo(n) = M(n). Also from Assumption 3.1 it
by
follows that
and that M(n) -+ M.
-d
M, (n)
Al
Lemma 3.4. Given 6 > 0, there exists c > 0 and some integernh such that for all n >
and for all time steps k < en in the explorationprocess we have L.d
L-j (k) - bi
'1<
Ai|
'5.
Proof. Fix el > 0. From Lemma 3.1 we have that that random variables 1'D(n)
are uniformly integrable.
N( N )
p1(n), we have
Id
Then there exists q E Z such that for all n we have
dtpjn)1{1'd>q} <e1 . Since 0
d p1(n) - d
Ed {1'd>q}
ration process we have
j
k)
.< For each time step k < en in the exploc.
(-
Ed 1{1'd1q} d Nk) - dipj(n)I <
- <q . Now we can boundwe have
Additionally, L (k) can change by at most two
n
L{'dk)q}
111>q}
d
1.
So for small enough e, we can make
~ AM(n) I < 2e. So for small enough e, for every (i, j) E S
at each step. So
(
-
L'().
A()
d N (k)
d1p2(n)
L' (k) - 6ij
Ail (n)
d
}
(3.7)
(f1q
di Nd(k)
Lj(k) - 6j
d Nd (k)
nA (n)
dpl(n)
di N (k)
nA (n)
Aj(n)
di jN'(k)
d
n
A,(n).
S6/4,
where the last inequality can be obtained by choosing small enough e 1 . Since q is a
constant, by choosing
small enough e we can ensure that Ed 111,dq}
constantLj(k)-6,y
72
I
(k)-6
dA n~n)
(n)
<(k)
d
,
-
6/4. Additionally from Assumption 3.1, for large enough n we have
6/2. The lemma follows by combining the above inequalities.
0
Lemma 3.5. Given 6 > 0, there exists e > 0 and some integer n^ such that for all n >
n and for all time steps k < en in the explorationprocess we have I IMk(n) - MI I
6.
Proof. The argument is very similar to the proof of Lemma 3.4. Fix el > 0. From
Lemma 3.1 we know that the random variables (1'Dp(n)) 2 are uniformly integrable. It
follows that there exists q E Z such that for all n, we have E[(1'D(n))1{(LjV(n))>q1 <
el. From this we can conclude that for all i, j, m we have E>(d.El. Since
6 m)dpd'(n)1{1'd>q)
(k)C < pd (n), we have
n(
(dm - 6im)dip'(n)11'd>q} d
Z(d
d
-
Jim)dsN(n) 1{1'd>q}I
E.
(3.8)
Also L (k) can change by at most 2en. So, for small enough e, by an argument
similar to the proof of Lemma 3.4, we can prove analogous to (3.7) that
- 6im)
d
-
)
V
-
Z 1{1d>ql}(dm d
m
<-.
-
ti(n)
(3.9)
(
Z 1{1y'd>q(d
By choosing E small enough, we can also ensure
o.)(3.10)
d
1(1
d
. -
djIN'(k)
)L (k) - 6m
d pd(n)
6
<'()
Since M(n) converges to M we can choose n such that IM(n) - M1
combining the last two inequalities, the proof is complete.
. By
0
Lemma 3.6. Given any 0 < ^I < 1, there exists E > 0, an integer n E Z and
quantities j satisfying (3.5) and (3.6) and the following conditionsfor all n > nh:
73
(a) For each time step k < en,
d <
d NO(k)
/(3.11)
Lt (k) - Jj
for each (i, j) E S.
by 7r
(b) The matrix M defined analogous to M by replacing
IIM - MI I
in (3.2) satisfies
(3.12)
err(y),
where err(y) is a term that satisfies lim,_ 0 err(y) = 0.
Proof. Choose q = q(-y) E Z such that
7/2. Now choose ir4
'L{1'd>q}
Zd
satisfying (3.5) and (3.6) such that 7rd = 0 whenever 1'd > q. Using Lemma 3.4, we
can now choose n and e such that for every (i, j) E S and d such that 1'd
q, (3.11)
is satisfied for all n > n and all k < en. The condition in part (a) is thus satisfied
by this choice of
4.
For any 7, let us denote the choice of 7rd made above by irj('y). By construction,
whenever Mijim =0, we also have Mijim = 0. Suppose Mijim = Ed(dm - 5im)
<
0. Also, by construction we have 0 < ir (Q) <
dt
and that
I
-+
as
drj()
>
7 -+0.
Let X, be the random variable that takes the value (dn - 3im) with probability
7rj(y) and 0 with probability 7. Similarly, let X be the random variable that takes
the value (din - Ji.) with probability
.
Then, from the above argument have
Xy -+ X as - -+ 0 and that the random variable X dominates the random variable
X, for all -y > 0. Note that X is integrable. The proof of part (b) is now complete
by using the Dominated Convergence Theorem.
0
74
Assume that the quantities Eand ig have been chosen to satisfy the inequalities
(3.11) and (3.12). We now consider each of the events that can occur at each step of
the exploration process until time en and describe the coupling between Z (k + 1)
and
Z(k +1)
in each case.
Case 1: A(k) > 0.
Suppose the event E' happens. We describe the coupling in case of each of the
following two events.
(i). E.: the neighbor revealed is an active clone. In this case we simply mimic
the evolution of the number of active clones in the original exploration
process. Namely, ZI(k + 1) = Zr(k + 1) for all 1, m.
(ii). Ed: The neighbor revealed is a sleeping clone of type d. In this case, we
split the event further into two events E 0 and Ef, that is Eg'0 UE?1 = E
and Edo nE 1 = .In particular,
(k) - 8ij
d) =irfgi(L.
(Ead
P(E PdIE3nEcI)=
ds Njd(k)
P(E 1 E nE) = 1 - P(E 01E
For the above to make sense we must have rji
l
E?).
which is guar-
anteed by our choice of ir . We describe the evolution of B (k) in each of
the two cases.
(a). Edo: in this case set Z7(k + 1) = Zm(k +1)
for all 1, m.
(b). Ed,: In this case, we mimic the evolution of the active clones of event
E. instead of Ed. More specifically,
75
- If i = j,
2Z (k +1)= Zjk+)
Z "(k +1)
-
0,
-17
otherwise.
If ij,
Z (k+1)
-2,
Zr"(k +1) =0,
otherwise.
Case 2: A(k) =0.
Suppose that event E' nEd happens. In this case we split Ed into two disjoint
events Ed and Ed such that
P (Eod IEi
irg (L4(k)
nEd) = 7ij(.,()-bj
--
6i,)
dj Nd (k)
1 - P(Ef|E flEd).
P(E'E jnEd)
Again, the probabilities above are guaranteed to be less than one for time
k < En because of the choice of r. The change in B (k + 1) in case of each of
the above events is defined as follows.
(a) Ed.
76
S(k+ 1) = -1,
Z"(k+1) =dm -im
Zm(k+1)=0,
-
for lj.
If i
Z (k +1) = -2+ ds,
Zi"(k + 1) = d,,, for m #i,
Z"(k+1)=0, for I # i.
(b) El.
-
ifi~j
Z(k + 1)
=Z %(k+ 1) =-
Zm7(k +1)= 0,
otherwise.
- If i
Z(k + 1) = -2,
Zj(k +1) = 0,
otherwise.
This completes the description of the probability distribution of the joint evolution
of the processes A (k) and B3 (k).
Intuitively, we are trying to decrease the probability of the cases that actually
77
help in the growth of the component and compensate by increasing the probability
of the event which hampers the growth of the component (back-edges). From the
description of the the coupling between Z'(k +1)
and
for time k < en, with probability one we have B (k)
Z(k +1)
it can be seen that
A(k).
Our next goal is to show that for some (i, j) E S the quantity Bi (k) grows to
a linear size by time en. Let H(k) = o-({A (r),B (r),
(ij)
E S, 1 < r < k})
denote the filtration of the joint exploration process till time k. Then the expected
conditional change in Bj(k) can be computed by considering the two cases above.
First suppose that at time step k we have A(k) > 0, i.e., we are in Case 1. We first
assume that i
#
j. Note that the only events that affect
Z' (k +
1) are El and Em
E[Z(k + 1)1H(k)] = P(Eii|H(k)) E[Zji(k + 1)|H(k), E }
(
for m E [p]. Then,
3.13)
+ E P(Em n EaIH(k)) E[Z4(k + 1)IH(k), Em n Ea]
P(Em n EjojH(k)) E[Zj(k + 1)jH(k), Em n Ed']
+
m,d
P(E, n EdIH(k)) E[Zj(k + 1)IH(k), Em nEi}.
+
m,d
The event En flEa affects
-1.
Z
(k +1) only when m = j, and in this case, 4j(k + 1) =
The same is true for the event Ed flEd. In the event Et f Ef, we have
78
Z4(k +1)
= dj - S(jm. Using this, the above expression is
7rk)
A
(k) A j(k)(-1) +
dj
(-1) + AA(k)
A(k)im~
L(k)
rnd
A
-A(k)
+E A(k)
-
8jmn)
dL((k)-d
d
+r A.2 (k)
d A(k)
dj Nid (k)
+E
d
Ai (k)
A(k)
-A()
7r) (-1).
-
(k)
+ A(k)
A(k)
( Lil (k)
dzir) (-1)
7 (dj
-
+m1
djNid(k)
L (k)
d m(dj - 6jm))
dA(k)
)
A
4s (k)
L ( )+
(-1)+
+ A
(-)
A(k)
+E"%Amk
m
Ak
M(d-
6m)),
where the last equality follows from (3.6). Now suppose that at time k we have
A(k) = 0, i.e., we are in Case 2. In this case, we can similarly compute
E[Z (k + 1)IH(k)] = P(E|IH(k)) E[Zj(k + 1)IH(k), E ]
+ E P(Ei' flEd f EoIH(k)) E[Zj(k + 1)IH(k), Eim fEd n Eo]
md
+ ZP(Ei nEd n EljH(k)) E[Zj(k + 1)fH(k), Em n Ed n E'].
md
79
Using the description of the coupling in Case 2, the above expression is
= L.(k)
L(k)
L.(k)
L(k)
+E
L(k) Zrd
(d, -6jm)
M
d
+ L(k)
Y +E
+
(
7rjiL'(k)
djNid(k)
z Lj(k)
-Il(k) 1dIN' (k)
L(k)
d
L(k)
E-m
m
(dj - Jjm).
d
For the case i = j, a similar computation will reveal that we obtain very similar expressions to the case i &j. We give the expressions below and omit the computation.
For Case 1, A(k) > 0,
As'(k)
A(k) (-
1At(k)
A(k)
+E
A*m(k)
A(k)
(zid
-
E[Zi(k + 1)IH(k)] =
-
6iim)).
and for Case 2, A(k) = 0,
E[Zl(k + 1)IH(k)
=
L(k) -1) + L
(-) +
Erd,
(d, - 6im)
Define the vector of expected change E[Z(k+1) |H(k)] A (E[Z (k + 1) IH(k)], (ij) E S).
Also define A(k) = (i,
(ij) E S if A(k) > 0 and A(k)
(,)
ifA) 0ank)k
L~k)'7 (i'j E S) if
A(k) =0. Let Q E RNXN be given by
Qjjj = 1, for (i, j) E S,
.
Qij m = 0, otherwise
Then we can write the expected change of Bj(k) compactly as
E[Z(k + 1)|H(k)} = (M - 7Q - I) A(k).
80
(3.14)
Fix 5 > 0. Let ^ be small enough such that the function err(-y) in (3.12) satisfies
err(-y) < 6. Using Lemma 3.6 we can choose Eand 7rd satisfying (3.11) and (3.12). In
particular, we have IM - Mj 15 6. For small enough 6, both M and M have strictly
positive entries in. the exact same locations. Since M is irreducible, it follows that
M is irreducible. The Perron-Frobenius eigenvalue of a matrix which is the spectral
norm of the matrix is a continuous function of its entries. For small enough 6, the
Perron-Frobenius eigenvalue of M is bigger than 1, say 1+2C for some C > 0. Let z be
the corresponding left eigenvector with all positive entries and let Zm = min(id)ES zi
and zM A max(ij)ES z . Define the random process
W(k) A
z B (k).
(3.15)
(ij)ES
Then setting AW(k +1) = W(k + 1) - W(k), from (3.14) we have
E[AW(k + 1)|H(k)] = z'EZ(k + 1)
= z' (
- IyQ) A(k)
= 2(z'A(k) - 7 z'QA(k).
The first term satisfies 2(z,,
2(z'A(k) 5 2 (zm. This is because 1'A(k) = 1 and
hence z'A(k) is a convex combination of the entries of z. By choosing 7 small enough,
we can ensure yz'QA(k) 5 Czm. Let %= Czm > 0. Then, we have
E[AW(k + 1)IH(k)} >
K.
(3.16)
We now use a one-sided Hoeffding bound argument to show that with high probability the quantity W(k) grows to a linear size by time en.
81
Let X(k + 1) =
n -- AW(k + 1). Then
E[X(k + 1)IH(k)]
0.
(3.17)
cw(n) almost surely, for some constant c > 0.
Also note that IX(k + 1)1
For any B > 0 and for any -B < x < B, it can be verified that
2
B
1 eB - e-B
e-"
+
eB
<1
ex
x<e2~
2
+i
- 2
2
I
2
e-B
B-
2
x.
Using the above, we get for any t > 0,
E
1 etew(n) - e-taa(n) E[X(k
2
2
2 2 2
e_
t c
2 2
+
k
[6tX(k+l)|H
E~ex~~lIH(k)J
:!
2
1)IH(k)] < e
where the last statement follows from (3.17). We can now compute
en-1x
E[etEko X(k+1)]
-
J
en I
t22Cj2n
en
E[etX(k+l)IH(k)] < e
2
k=O
So,
X(k + 1) > Ern/2
p
=
P(etX"-' X(k+1)-tEn/2> 1) _ e-tn+*,
(k=O
Optimizing over t, we get
en-1
<2e=
X(k + 1) > ern/2) < e~~iP->- = o(1),
(
P
k=O
82
2
t c w (n)
2
2
which follows by using Lemma 3.2. Substituting the definition of X(k + 1),
P (W(en) <
o(1).
Recall that W(k) =Z(j)ES z B (k) ; Nzm max(ij)es B (k)
Define i A
(3.18)
NzM max(ij)Es A (k).
. Then it follows from (3.18) that there exists a pair (i',j') such
that
A, (en) > n,
w.p
1 - o(1).
Using the fact that the number of active clones grows to a linear size we now show
that the corresponding component is of linear size. To do this, we continue the
exploration process in a modified fashion from time en onwards. By this we mean,
instead of choosing active clones uniformly at random in step 2(a) of the exploration
process, we now follow a more specific order in which we choose the active clones and
then reveal their neighbors. This is still a valid way of continuing the exploration
process. The main technical result required for this purpose is Lemma 3.7 below.
Lemma 3.7. Suppose that afteren steps of the explorationprocess, we have A' (en) >
/in for some pair (i', j'). Then, there exists e 1 > e and 61 > 0 for which we can continue the exploration process in a modified way by altering the order in which active
clones are chosen in step 2(a) of the exploration proces such that at time e1n, w.h.p.
for all (i, j) E S, we have Aj(ein) > 61 n.
The above lemma says that we can get to a point in the exploration process where
there are linearly many active clones of every type. An immediate consequence of
this is the Corollary 3.1 below. We remark here that Corollary 3.1 is merely one of
the consequences of Lemma 3.7 an can be proved in a much simpler way. But as we
83
will see later, we need the full power of Lemma 3.7 to prove Theorem 3.2-(b).
Corollary 3.1. Suppose that after en steps of the exploration process, we have
A,(En) >
mn for some pair (i',j'). Then there exists
62
> 0 such that w.h.p.,
the neighbors of the A~f' clones include at least 62n vertices in Gj.
Before proving Lemma 3.7, we state a well known result. The proof can be
obtained by standard large deviation techniques. We omit the proof.
Lemma 3.8. Fix m. Suppose there are there are n objects consisting of ain objects
of type i for I < i < m. Let 6 > 0 be a constant that satisfies / < maxi a . Suppose
we pick 8n objects at random from these n objects without replacement. Then for
given e' > 0 there exists z = z(e', m) such that,
#ojects chosen of type i _<
n
I
Proof of Lemma 3.7. The proof relies on the fact that the matrix M is irreducible.
If we denote the underlying graph associated with M by W, then It is strongly
connected. We consider the subgraph '
of W which is the shortest path tree in W
rooted at the node (i',j'). We traverse '7f' breadth first. Let d be the depth of 7?'.
We continue the exploration process from this point in d stages 1,2, ... , d. Stage
I begins right after time en. Denote the time at which stage 1 ends by eCn. For
convenience, we will assume a base stage 0, which includes all events until time en.
For 1 < 1 < d, let E, be the set of nodes (i, j) at depth I in 7.
We let Eo = {(i', j')}.
We will prove by induction that for I = 0, 1,... , d, there exists 6) > 0 such that
at the end of stage 1, we have w.h.p., Af > SVn for each (i,j) e
U' =01.
Note
that at the end of stage 0 we have w.h.p. Ai,' > pn. So we can choose 5(O) =
to satisfy the base case of the induction. Suppose I2,I = r. Stage I + 1 consists of
84
r substages, namely (I+ 1, 1), (1+1, 2),..., (1+1, r) where each substage addresses
exactly one (i, j) E I. We start stage (I+1,1) by considering any (i, j) E Z . We
reveal the neighbors of aJ(3n clones among the A > (l)n clones one by one. Here
0 < a < 1 is a constant that will describe shortly. The evolution of active clones in
each of these ao3Un steps is identical to that in the event E, in Case 1 of the original
exploration process. Fix any (j, m) E 24k. Note that Miim > 0 by construction of
7V. So by making e and el, ... , El smaller if necessary and choosing a small enough,
we can conclude using Lemma 3.5 that for all time steps k < ein + a(Irn we have
IIMk(n)
- MII < 3 for any 3 > 0. Similarly, by using Lemma 3.4, we get
d
d(Njd(k)
3+
j (kdi
dp
A?{k)
d\
-d.Nd(k)
i(k
i
dzpd
-
.
<
Li'(k) - Jij
. < J. (3.19)
Aj
By referring to the description of the exploration process for the event E in Case 1,
the expected change in Z,7(k+ 1) during stage (I + 1,1) can be computed similar to
(3.13) as
AI (k) - 53(
+
L7 (k) - J
(dmkin) E[Z((k
(Mk(n))i
(a)
d, N'(k)
d
L-j k) - 4i
A.7(k) - J
-
(
-
1)H(k)) +(-im)+
L (k) - J
(b)
Mijim - 23> 3,
where (a) follows from (3.19) and (b) can be guaranteed by choosing small enough
8. The above argument can be repeated for each (j, m) E Ii.
We now have all the
ingredients we need to repeat the one-sided Hoeffding inequality argument earlier in
this section. We can then conclude that there exists
85
3,"
> 0 such that w.h.p. we
have at least bjn active clones of type (j, m) by the end of stage (I + 1, 1). By the
same argument, this is also true for all children of (i, j) in re. Before starting stage
S1we
set ()
=
min{(1 - a)5( 1 , 5)}. This makes sure that at every substage of
stage I we have at least V(In clones of each kind that has been considered before.
This enables us to use the same argument for all substages of stage 1. By continuing
in this fashion, we can conclude that at the end of stage 1+ 1 we have 60(+,)n clones
of each type (i, j) for each (i,j) E
U1+1
for appropriately defined J('+'). The proof
is now complete by induction.
Proof of Corollary 3.1. Consider any j E [p]. We will prove that the giant component
has linearly many vertices in G with high probability.
Let d be such that pd > 0 and let di > 0 for some i E [p]. This means in the
configuration model, each of these type d vertices have at least one clones of type
(j, i). Continue the exploration process as in Lemma 3.7. For small enough 1Ethere
are at least n( j-
ei) of type (j, i) clones still unused at time c1 n. From Lemma 3.7,
with high probability we have at least S 1n clones of type (i, j) at this point. Proceed
by simply revealing the neighbors of each of these. Form Lemma 3.8, it follows that
with high probability, we will cover at least a constant fraction of these clones which
correspond to a linear number of vertices covered. Each of these vertices are in the
giant component and the proof is now complete.
0
We now prove part(b) of Theorem 3.2. Part (a) will be proved in the next section.
We use the argument by Molloy and Reed, except for the multipartite case, we will
need the help of Lemma 3.7 to complete the argument.
Proof of Theorem 3.2 (b). Consider two vertices u, v E g. We will upper bound the
probability that u lies in the component C, which is the component being explored at
86
time en and v lies in a component of size bigger than /log n other than C. To do so
start the exploration process at u and proceed till the time step ein in the statement
of Lemma 3.7. At this time we are in the midst of revealing the component C. But
this may not be the component of u because we may have restarted the exploration
process using the "Initialization step" at some time between 0 and Ein. If it is not
the component of u, then u does not lie in C. So, let us assume that indeed we
are exploring the component of u. At this point continue the exploration process
in a different way by switching to revealing the component of v. For v to lie in a
#log
component of size greater than
n, the number of active clones in the exploration
process associated with the component of v must remain positive for each of the first
Slog n steps.
At each step choices of neighbors are made uniformly at random. Also,
from Lemma 3.7, C has at least 6in active clones of each type. For the component
of v to be distinct from the component of u this choice must be different from any of
these active clones of the component of u. So it follows that the probability of this
event is bounded above by (1
-
3j)"*s. For large enough
/,
this gives
P(C(u) = C, C(v) # C, IC(v)I > Plogn) = o(n- 2).
Using a union bound over all pairs of vertices u and v completes the proof.
3.7
0
Size of the Giant Component
In this section we complete the proof of Theorem 3.2-(a) regarding the size of the
giant component. For the unipartite case, the first result regarding the size of the
giant component was obtained by Molloy and Reed [44] by using Wormald's results
[551 on using differential equations for random processes. As with previous results for
87
the unipartite case, we show that the size of the giant component as a fraction of n
is concentrated around the survival probability of the edge-biased branching process.
We do this in two steps. First we show that the probability that a certain vertex v
lies in the giant component is approximately equal to the probability that the edgebiased branching process with v as its root grows to infinity. Linearity of expectation
then shows that the expected fraction of vertices in the giant component is equal to
this probability. We then prove a concentration result around this expected value to
complete the proof of Theorem 3.2. These statements are proved formally in Lemma
3.10.
Before we go into the details of the proof, we first prove a lemma which is a very
widely used application of Azuma's inequality.
Lemma 3.9. Let X = (X1 , X 2 ,..., Xt) be a vector valued random variable and let
f(X) be a function defined on X. Let .Fk A a(X1,..., Xk). Assume that
IE(f(X) Fk) - E(f(X)IFk+l)I
c.
almost surely. Then
P(If(X) - E[f(X)]j > s) < 2e~2c2
Proof. The proof of this lemma is a standard martingale argument. We include it
here for completeness. Define the random variables Yo,... , Y as
Ye
E(f (X) k).
The sequence {Y} is a martingale and IY - Yk+l1I 5 c almost surely. Also Y =
88
f
(X)
and Y = E[f(X)]. The lemma then follows by applying Azuma's inequality to the
martingale sequence {Y}.
0
Lemma 3.10. Let e > 0 be given. Let v E g be chosen uniformly at random. Then
for large enough n, we have
P(v E C) - P(1TI= oo)I < E.
Proof. We use a coupling argument similar to that used by Bollobas and Riordan [8]
where it was used to prove a similar result for "local" properties of random graphs.
We couple the exploration process starting at v with the branching process Tn(v) by
trying to replicate the event in the branching process as closely as often as possible.
We describe the details below.
The parameters of the distribution associated with $\mathcal{T}_n$ are given by the ratios $\frac{d_i N_j^d}{L_j}$. In the exploration process, at time step k the corresponding parameters are given by $\frac{d_i N_j^d(k)}{L_j(k)}$ (see Section 3.5). We first show that for each of the first $\beta \log n$ steps of
the exploration process, these two quantities are close to each other. The quantity
$d_i N_j^d(k)$ is $d_i$ times the total number of sleeping clones at time k of type (j,i) in $\mathcal{G}_j$ that belong to a vertex of type d. At each step of the exploration process the total number of sleeping clones can change by at most $\omega(n)$. Also, $L_j(k)$ is the total number of living clones of type (j,i) in $\mathcal{G}_j$ and can change by at most two in each step.
Initially, for all (i,j) we have $L_j(0) = \Theta(n)$, and until time $\beta \log n$ it remains $\Theta(n)$. Therefore,
$$\left|\frac{d_i N_j^d(k+1)}{L_j(k+1) - \delta_{ij}} - \frac{d_i N_j^d(k)}{L_j(k) - \delta_{ij}}\right| \le \frac{\left|d_i N_j^d(k+1) - d_i N_j^d(k)\right|}{L_j(k+1) - \delta_{ij}} + d_i N_j^d(k)\left|\frac{1}{L_j(k+1) - \delta_{ij}} - \frac{1}{L_j(k) - \delta_{ij}}\right|.$$
From the explanation above, the first term is $O(\omega(n)/n)$ and the second term is $O(1/n)$. Recall that at time 0 these ratios coincide with the parameters of $\mathcal{T}_n$. From this we can conclude, using a telescoping sum and the triangle inequality, that for time index $k \le \beta\log n$,
$$\left|\frac{d_i N_j^d(k)}{L_j(k) - \delta_{ij}} - \frac{d_i N_j^d(0)}{L_j(0)}\right| = O(k\,\omega(n)/n) = O(\omega(n)\log n/n).$$
So the total variation distance between the distribution of the exploration process and that of the branching process at each of the first $\beta \log n$ steps is $O(\omega(n)\log n/n)$.
We now describe the coupling between the branching process and the exploration process. For the first time step, note that the root of $\mathcal{T}_n$ has type (i,d) with probability $p_i^d$. We can couple this with the exploration process by letting the vertex awakened in the "Initialization step" of the exploration process be of type (i,d). Since the two probabilities are the same, this step of the coupling succeeds with probability one. Suppose that we have defined the coupling until time $k < \beta \log n$. To describe the coupling at time step k+1 we need to consider two events. The first is the event that the coupling has succeeded until time k, i.e., the two processes are identical. In this case, since the total variation distance between the parameters of the two processes is $O(\omega(n)\log n/n)$, we perform a maximal coupling, i.e., a coupling which fails with probability equal to the total variation distance. For our purposes, we do not need to describe the coupling at time k+1 in the event that the coupling has failed at some previous time step. The probability that the coupling succeeds at each of the first $\beta \log n$ steps is at least
$$(1 - O(\omega(n)\log n/n))^{\beta\log n} = 1 - O(\omega(n)(\log n)^2/n) = 1 - o(1).$$
We have shown that the coupling succeeds till time $\beta\log n$ with high probability. Assume that it indeed succeeds. In that case the component explored thus far is a tree: at every step of the exploration process a sleeping vertex is awakened, because landing on an active clone would create a cycle. This means that if the branching process has survived up to this point, the corresponding exploration process has also survived until this time and the component revealed has at least $\beta\log n$ vertices. Hence,
$$P(|C(v)| \ge \beta\log n) = P(|\mathcal{T}_n| \ge \beta\log n) + o(1).$$
But Theorem 3.2 (b) states that with high probability there is only one component of size greater than $\beta\log n$, namely the giant component, i.e.,
$$P(v \in \mathcal{C}) = P(|C(v)| \ge \beta\log n) + o(1) = P(|\mathcal{T}_n| \ge \beta\log n) + o(1).$$
So, for large enough n, we have $|P(v \in \mathcal{C}) - P(|\mathcal{T}_n| \ge \beta\log n)| \le \epsilon/2$.
The survival probability of the branching process $\mathcal{T}$ is given by
$$P(|\mathcal{T}| = \infty) = 1 - \sum_{i=1}^{\infty} P(|\mathcal{T}| = i).$$
Choose K large enough such that $|P(|\mathcal{T}| \ge K) - P(|\mathcal{T}| = \infty)| < \epsilon/4$. Also, since the parameters of $\mathcal{T}_n$ converge to those of $\mathcal{T}$ for all i, j, d, from the theory of branching processes, for large enough n,
$$|P(|\mathcal{T}_n| \ge K) - P(|\mathcal{T}| \ge K)| < \epsilon/4, \quad \text{so that} \quad |P(|\mathcal{T}| = \infty) - P(|\mathcal{T}_n| \ge K)| < \epsilon/2.$$
Since for large enough n we have
$$P(|\mathcal{T}_n| \ge \beta\log n) \le P(|\mathcal{T}_n| \ge K),$$
the proof follows by combining the above statements. □
Now what is left is to show that the size of the giant component concentrates
around its expected value.
Proof of Theorem 3.2 (a) (size of the giant component). From the first two parts of Theorem 3.2, with high probability we can categorize all the vertices of $\mathcal{G}$ into two parts: those which lie in the giant component, and those which lie in a component of size smaller than $\beta\log n$, i.e., in small components. The expected value of the fraction of vertices in small components is $1 - q + o(1)$, where $q = P(|\mathcal{T}| = \infty)$ is the survival probability of the edge-biased branching process. We will now show that the fraction of vertices in small components concentrates around this mean.
Recall that $cn = n\sum_{i\in[p],\,d\in\mathcal{D}} d\,p_i^d$ is the number of edges in the configuration model. Let us consider the random process where the edges of the configuration
model. Let us consider the random process where the edges of the configuration
model are revealed one by one. Each edge corresponds to a matching between clones.
Let $E_i$, $1 \le i \le cn$, denote the (random) edges. Let $N_S$ denote the number of vertices in small components, i.e., in components of size smaller than $\beta\log n$. We wish to apply Lemma 3.9 to obtain the desired concentration result, for which we need to bound $|E[N_S \mid E_1, \dots, E_k] - E[N_S \mid E_1, \dots, E_{k+1}]|$. In the term $E[N_S \mid E_1, \dots, E_{k+1}]$, let
Ek+1 be the edge (x, y). The expectation is taken over all possible outcomes of the rest
of the edges with $E_{k+1}$ fixed to be the edge (x,y). In the first term $E[N_S \mid E_1, \dots, E_k]$, after $E_1, \dots, E_k$ are revealed, the expectation is taken over the rest of the edges,
which are chosen uniformly at random among all possible edges. All outcomes are
equally likely. We construct a mapping from each possible outcome to an outcome
that has Ek+1 = (x, y). In particular, if the outcome contains the edge (x, y) we can
map it to the corresponding outcome with $E_{k+1} = (x,y)$ by simply cross-switching the positions of (x,y) with the edge that occurred at position k+1. This does not change
the value of Ns because it does not depend on the order in which the matching is
revealed. On the other hand, if the outcome does not contain (x, y), then we map it
to one of the outcomes with Ek+1 = (x, y) by switching the two edges connected to
the vertices x and y. We claim that switching two edges in the configuration model
can change $N_S$ by at most $4\beta\log n$. To see why, observe that we can split the process of cross-switching two edges into four steps. In the first two steps we delete each of the two edges one by one, and in the next two steps we put them back one by one in the switched positions. Deleting an edge can increase $N_S$ by at most $2\beta\log n$ and can never reduce $N_S$. Adding an edge can decrease $N_S$ by at most $2\beta\log n$ and can never increase $N_S$. So cross-switching can either increase or decrease $N_S$ by at most $4\beta\log n$. Using this we conclude
$$|E[N_S \mid E_1, \dots, E_k] - E[N_S \mid E_1, \dots, E_{k+1}]| \le 4\beta\log n.$$
We now apply Lemma 3.9 to obtain
$$P\left(\left|\frac{N_S}{n} - (1-q)\right| > \epsilon\right) \le 2e^{-\Theta\left(\frac{n}{\log^2 n}\right)} = o(1).$$
Since with high probability the number of vertices in the giant component is $n - N_S$, the above concentration result completes the proof. □
3.8 Subcritical Case
In this section we prove Theorem 3.3. The idea of the proof is quite similar to that of the supercritical case, and the strategy is similar to that used in [43]. More specifically, we consider the event $E_v$ that a fixed vertex v lies in a component of size greater than $C\omega(n)^2\log n$ for some C > 0. We will show that $P(E_v) = o(n^{-1})$. Theorem 3.3 then follows by taking a union bound over $v \in \mathcal{G}$.
Assume that we start the exploration process at the vertex v. For v to lie in a component of size greater than $C\omega(n)^2\log n$, the exploration process must remain active for at least $C\omega(n)^2\log n$ time steps, since at each step of the exploration process at most one new vertex is added to the component being revealed. This means that at time $C\omega(n)^2\log n$ we must have $A(C\omega(n)^2\log n) > 0$, where recall that A(k) denotes the total number of active clones at time k of the exploration process.

Let $\mathcal{H}(k) = \sigma(\{A_j^i(r),\ (i,j) \in S,\ 1 \le r \le k\})$ denote the filtration of the exploration process till time k. We will assume that A(k) > 0 for $0 \le k \le C\omega(n)^2\log n$ and upper bound $P(A(C\omega(n)^2\log n) > 0)$. We first compute the expected conditional change in the number of active clones at time k for $0 \le k \le C\omega(n)^2\log n$ by splitting the outcomes into the several possible cases that affect $Z_j^i(k+1)$, as in (3.13):
$$E[Z_j^i(k+1) \mid \mathcal{H}(k)] = P(E_r \mid \mathcal{H}(k))\,E[Z_j^i(k+1) \mid \mathcal{H}(k), E_r] + \sum_{m,d} P(E_r^c \cap E_a \mid \mathcal{H}(k))\,E[Z_j^i(k+1) \mid \mathcal{H}(k), E_r^c \cap E_a] + \sum_{m,d} P(E_r^c \cap E_d \mid \mathcal{H}(k))\,E[Z_j^i(k+1) \mid \mathcal{H}(k), E_r^c \cap E_d].$$
Evaluating each conditional term through the ratios $A_j^i(k)/A(k)$ and $d_i N_j^d(k)/L_j(k)$, exactly as in the supercritical case, expresses the right-hand side in terms of the matrix M(k) from the supercritical analysis and the matrix Q(k) defined below.
We proceed with the proof in a similar fashion to the proof of the supercritical case. Let $E[Z(k+1) \mid \mathcal{H}(k)] = (E[Z_j^i(k+1) \mid \mathcal{H}(k)],\ (i,j) \in S)$ and define the vector quantity $A(k) = (A_j^i(k),\ (i,j) \in S)$. Also define the matrix $Q(k) \in \mathbb{R}^{N\times N}$, whose rows and columns are indexed by double indices, by
$$Q_{ij,ji}(k) = -\frac{A_j^i(k)}{A(k)}, \qquad Q_{ij,lm}(k) = 0 \text{ for } (l,m) \neq (j,i),$$
for each $(i,j) \in S$. Then the expected change in the number of active clones of the various types can be compactly written as
$$E[Z(k+1) \mid \mathcal{H}(k)] = (M(k) - I + Q(k))\,A(k).$$
As the exploration process proceeds, the matrix M(k) changes over time. However, for large enough n, it follows from Lemma 3.5 that the difference between M(k) and M is small for $0 \le k \le C\omega(n)^2\log n$. In particular, given any $\epsilon > 0$, for large enough n we have $\|M(k) - M\| < \epsilon$. From Lemma 3.4 we also have $\|Q(k)\| < \epsilon$. Let z be the Perron-Frobenius eigenvector of M. By the assumption in Theorem 3.3, we have
$$z'M = (1-\delta)z'$$
for some $0 < \delta < 1$, where $(1-\delta) = \gamma$ is the Perron-Frobenius eigenvalue of M.
Also let $z_m = \min_i z_i$ and $z_M \triangleq \max_i z_i$. Define the random process
$$W(k) \triangleq z'A(k).$$
Then the expected conditional change in W(k) is given by
$$E(\Delta W(k+1) \mid \mathcal{H}(k)) = z'E[Z(k+1) \mid \mathcal{H}(k)] = z'(M(k) - I + Q(k))A(k) = z'(M-I)A(k) + z'(M(k) - M + Q(k))A(k) = -\delta z'A(k) + z'(M(k) - M + Q(k))A(k).$$
We can choose $\epsilon$ small enough such that $z'(M(k) - M + Q(k)) \le \frac{\delta}{2}z'$, where the inequality is elementwise. Thus
$$E(\Delta W(k) \mid \mathcal{H}(k)) \le -\frac{\delta}{2}z'A(k) \le -\frac{\delta}{2}z_m \triangleq -\kappa.$$
We can now repeat the one-sided Hoeffding bound argument following equation (3.16) in the supercritical case and obtain the following inequality:
$$P(W(a) + \kappa a \ge s) \le 2e^{-\frac{s^2}{2a\,\omega(n)^2}}.$$
Setting $a = C\omega(n)^2\log n$ and $s = \kappa a$, we get
$$P(W(C\omega(n)^2\log n) > 0) \le 2e^{-\frac{\kappa^2 C\log n}{2}} = o(n^{-1}),$$
for large enough C. We conclude
$$P(\mathcal{G} \text{ has a component bigger than } C\omega(n)^2\log n) \le \sum_{v\in\mathcal{G}} P(|C(v)| > C\omega(n)^2\log n) = o(1).$$
This completes the proof of the theorem. □
3.9 Future Work
Our results address the supercritical (Theorem 3.2) and subcritical (Theorem 3.3) cases, but leave the critical case unresolved. The critical case has been studied in detail for random graphs with given degree distributions in the unipartite setting [34]. It may be possible to extend these results to the multipartite case. It may also be possible to strengthen the concentration bounds in our results by establishing exponential decay as in [8], via a purely branching process based analysis.
Chapter 4

MAX-CUT on Bounded Degree Graphs with Random Edge Deletions
4.1 Introduction
The problem of finding a cut of maximum size in an arbitrary graph $\mathcal{G} = (V, E)$ is a well-known NP-hard problem in algorithmic complexity theory. In fact, even when restricted to the set of 3-regular graphs, MAX-CUT is NP-hard with a constant factor approximability gap [5], i.e., it is NP-hard to find a cut whose size is at least $(1 - \epsilon_{\text{crit}})$ times the size of the maximum cut, where $\epsilon_{\text{crit}} = 0.003$. Weighted MAX-CUT is a generalization of the MAX-CUT problem, where each edge $e \in E$ is associated with a weight $w_e$, and one is required to find a cut such that the sum of the weights of the edges in the cut is maximum.
We study a randomized version of the weighted MAX-CUT problem, where each weight $w_e$ is a Ber(p) random variable, for some $0 < p < 1$, and the weights associated with distinct edges are independent. Since the weights are binary, weighted MAX-CUT on the randomly weighted graph is equivalent to MAX-CUT on a thinned random graph where the edges associated with zero weights have been deleted. We call this problem the thinned MAX-CUT. The variable p controls the amount of thinning. It is particularly simple to analyze thinned MAX-CUT when p takes one of the extreme values, i.e., p = 1 or p = 0. When p = 1, all edges are retained, the thinned graph is the same as the original graph, and finding the maximum cut remains computationally hard. On the other hand, when p = 0, the thinned graph has no edges, and the MAX-CUT problem is trivial. This leads to a natural question of whether there is a hardness phase transition at some value $0 < p_c < 1$.
In this chapter, we identify a threshold for the hardness phase transition in the thinned MAX-CUT problem on graphs with maximum degree d = O(1). We show that on the set of all graphs with degree at most d, the phase transition occurs at $p_c = \frac{1}{d-1}$. This phase transition coincides with a phase transition in the connectivity properties of the random thinned graph. For $p < p_c$ we show that the random graph resulting from edge deletions undergoes percolation, i.e., it disintegrates into disjoint connected components of size $O(\log n)$. The existence of a polynomial time algorithm to compute MAX-CUT then follows easily, because it is possible to simply employ brute force computation in this case. On the other hand, for $p > p_c$, we show NP-hardness by constructing a reduction from $\epsilon_{\text{crit}}$-optimal MAX-CUT on 3-regular graphs. Our reduction proof uses a random bipartite graph based gadget $\mathcal{H}$ similar to [50], where such a gadget was used to establish hardness of computation of the partition function of the hardcore model. Given a 3-regular graph $\mathcal{G}$ on n vertices, the gadget $\mathcal{H}$ is d-regular and is constructed by first replacing each vertex v of $\mathcal{G}$ by a random bipartite graph $J_v$ of size $2n^2$ consisting of two parts $R_v$ and $S_v$, and then
adding connector edges between each pair of these bipartite graphs whenever there
is an edge between the corresponding vertices in $\mathcal{G}$. The key result in our reduction argument is that the gadget $\mathcal{H}$ satisfies a certain polarization property which carries information about the MAX-CUT of the 3-regular graph $\mathcal{G}$ used to construct it. Namely, we show that any maximal cut of the thinned graph $\mathcal{H}_p$ obtained from $\mathcal{H}$ must fully polarize each bipartite graph and include all its edges in the cut, i.e., either assign the value 1 to almost all vertices in $R_v$ and 0 to almost all vertices in $S_v$, or vice versa. Then, we show that the cut $C_\mathcal{G}$ of $\mathcal{G}$ obtained by assigning binary values to its vertices v based on the polarity of $J_v$ must be at least $(1-\epsilon_{\text{crit}})$-optimal. To establish this polarization property we draw heavily on the results and proof techniques from Chapter 3 about the giant component in random multipartite graphs.
Our study of the thinned MAX-CUT problem is motivated from the point of view of establishing a correspondence between phase transitions in hardness of computation and decay of correlations. Our formulation of the thinned MAX-CUT problem was inspired by a similar random weighted maximum independent set problem studied by Gamarnik et al. in [19].
The rest of the chapter is organized as follows. In Section 4.2, we state our main theorems that establish the hardness phase transition threshold for thinned MAX-CUT. In Section 4.3 we prove that below the critical threshold thinned MAX-CUT can be solved in polynomial time, and in Section 4.4 we construct a reduction from $\epsilon_{\text{crit}}$-optimal MAX-CUT to show NP-hardness of thinned MAX-CUT above the critical threshold. Finally, in Section 4.5, we end with some concluding remarks and open problems.
4.2 Main Results

In this section, we state our main theorems regarding the hardness phase transition in thinned MAX-CUT on bounded degree graphs. Given an integer $d \ge 3$, define the critical probability $p_c$ as
$$p_c(d) = \frac{1}{d-1}. \qquad (4.1)$$
Theorem 4.1. Suppose d is fixed and let $p < p_c(d)$. There exists a polynomial time algorithm $\mathcal{A}$ such that, given any graph $\mathcal{G}$ on n vertices with maximum degree d, with high probability $\mathcal{A}$ solves the thinned MAX-CUT problem on $\mathcal{G}$, i.e., produces a maximal cut of the thinned random graph $\mathcal{G}_p$.

Theorem 4.2. Suppose d is fixed and let $p > p_c(d)$. There exists a pair of polynomial time algorithms $\mathcal{A}_1$ and $\mathcal{A}_2$ such that, given an arbitrary 3-regular graph $\mathcal{G}$ on n vertices:

• $\mathcal{A}_1$ constructs a random d-regular graph $\mathcal{H}$ from $\mathcal{G}$ on $2n^2$ vertices.

• Let $C_{\mathcal{H}_p}$ be a solution to thinned MAX-CUT on $\mathcal{H}$, i.e., a maximal cut of $\mathcal{H}_p$. Then $\mathcal{A}_2$ uses $C_{\mathcal{H}_p}$ to produce a cut $C_\mathcal{G}$ of $\mathcal{G}$ such that, with high probability, the cut $C_\mathcal{G}$ is an $\epsilon_{\text{crit}}$-optimal cut of $\mathcal{G}$.
The following result from [5] justifies the reduction in Theorem 4.2. A cut $C_\mathcal{G}$ of a graph $\mathcal{G}$ is said to be $\epsilon$-optimal if its size is at least $(1-\epsilon)$ times the size of the maximum cut of $\mathcal{G}$.

Theorem 4.3 ([5]). Let $\epsilon_{\text{crit}} = 0.003$. The problem of finding an $\epsilon_{\text{crit}}$-optimal cut on the set of 3-regular graphs is NP-hard.
Theorem 4.2 in conjunction with Theorem 4.3 shows that if $p > p_c(d)$, the thinned MAX-CUT problem is NP-hard, by providing a reduction from $\epsilon_{\text{crit}}$-optimal MAX-CUT on 3-regular graphs. Along with Theorem 4.1, this shows that the thinned MAX-CUT problem undergoes a computational hardness phase transition at $p = p_c$.
4.3 Proof of Theorem 4.1

The proof of Theorem 4.1 is closely related to percolation on random graphs. Indeed, we can show that the following holds.

Proposition 4.1. With high probability, all connected components of $\mathcal{G}_p$ are of size $O(\log n)$.
We first show how this proposition can be used to prove Theorem 4.1.

Proof of Theorem 4.1. Since $\mathcal{G}_p$ is a disjoint union of connected components, finding the maximum cut of $\mathcal{G}_p$ is equivalent to finding the maximum cut of each of the disjoint components individually. Since by Proposition 4.1 each component is of size $O(\log n)$ w.h.p., we can use brute force evaluation to find the maximum cut of each component in time $2^{O(\log n)} = n^{O(1)}$. Since there are at most n such components, the proof is complete. □
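To make the brute force step concrete, the following is a minimal Python sketch of this procedure (our own illustration, not the thesis's implementation; a component is assumed to be given as a vertex collection plus an edge list of the thinned graph):

    import itertools

    def max_cut_small_component(vertices, edges):
        # Exhaustive MAX-CUT on one component; feasible when |vertices| = O(log n).
        vs = list(vertices)
        best, best_side = 0, {vs[0]: 0}
        # Fixing the side of the first vertex halves the search space.
        for bits in itertools.product([0, 1], repeat=len(vs) - 1):
            side = {vs[0]: 0}
            side.update(zip(vs[1:], bits))
            cut = sum(1 for u, w in edges if side[u] != side[w])
            if cut > best:
                best, best_side = cut, side
        return best, best_side

Running this on each of the at most n components of size $O(\log n)$ costs $n \cdot 2^{O(\log n)} = n^{O(1)}$ in total, matching the bound in the proof.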
What is left is to prove Proposition 4.1.
Proof of Proposition 4.1. Set $p(d-1) = 1 - \epsilon$ with $\epsilon > 0$. Let $v \in V$ be any vertex. Let $C_v = (V_v, E_v)$ be the connected component of $\mathcal{G}$ containing v. Similarly, let $C_{v,p} = (V_{v,p}, E_{v,p})$ be the connected component of $\mathcal{G}_p$ containing v. We reveal the component $C_{v,p}$ sequentially via the following exploration process.
(a) Initialization. Set for time t = 0:

- The set of explored vertices $V_e = \emptyset$
- The set of unexplored vertices $V_{u,0} = \{v\}$
- The set of edges $E_e = E_v$

(b) Repeat, for $1 \le t \le n$, until $V_{u,t} = \emptyset$:

1. Pick any $u \in V_{u,t}$.
2. Let $u_1, \dots, u_M$ be the neighbors of u in the graph $(V_v, E_e)$, i.e., with edges $(u, u_i) \in E_e$. For each $1 \le i \le M$, delete the edge $(u, u_i)$ independently with probability $1-p$, and remove the deleted edges from the set $E_e$. Let $(u, u_{i_1}), \dots, (u, u_{i_{N(t)}})$ be the undeleted edges.
3. Set $V_e \leftarrow V_e \cup \{u\}$ and $V_{u,t+1} \leftarrow (V_{u,t} \cup \{u_{i_1}, \dots, u_{i_{N(t)}}\}) \setminus \{u\}$.
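For illustration, the exploration process above can be simulated directly. The following Python sketch is our own rendering (with `adj` an assumed adjacency-list dictionary for $\mathcal{G}$); it reveals the component of v in the thinned graph while examining each edge exactly once:

    import random

    def explore_thinned_component(adj, v, p):
        # Reveal the component of v in G_p, deleting each edge of G
        # independently with probability 1 - p when it is first examined.
        explored, unexplored = set(), {v}
        surviving_edges = []
        while unexplored:
            u = unexplored.pop()
            explored.add(u)
            for w in adj[u]:
                if w in explored:
                    continue  # this edge was already examined from the other side
                if random.random() < p:  # the edge survives the thinning
                    surviving_edges.append((u, w))
                    unexplored.add(w)
        return explored, surviving_edges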
The above exploration process terminates in at most n steps with $V_e = V_{v,p}$ and $E_e = E_{v,p}$. The way the exploration process proceeds ensures that every edge in $E_v$ is examined for deletion exactly once.

In each iterate of step (b), the number of new vertices added to the component being revealed is N(t), a random variable distributed as $N(t) \sim \text{Bin}(M, p)$. Since $\mathcal{G}$ has maximum degree d, we must have $M \le d-1$ (other than the first step, when $M \le d$). In addition, the random variables N(t) are independent for different t. If $|V_{v,p}| \ge k\log n$, then we must have $|V_{u,t}| > 0$ for $1 \le t \le k\log n$. Note that $|V_{u,t}| = 1 + \sum_{\tau=1}^{t} N(\tau) - t$. Let $B_1, B_2, \dots, B_n$ be i.i.d. Bin(d-1, p). Then from the preceding discussion, the random variables $B_t$ stochastically dominate the random variables N(t), and hence
$|V_{u,t}|$ is stochastically dominated by $1 + \sum_{\tau=1}^{t} B_\tau - t$. Therefore,
$$P(|V_{u,k\log n}| > 0) = P\left(\sum_{\tau=1}^{k\log n} N(\tau) \ge k\log n - 1\right) \qquad (4.2)$$
$$\le P\left(\sum_{\tau=1}^{k\log n} B_\tau \ge k\log n - 1\right) \qquad (4.3)$$
$$= P\left(\sum_{\tau=1}^{k\log n} \left[B_\tau - p(d-1)\right] \ge \epsilon k\log n - 1\right) \qquad (4.4)$$
$$= o(1/n) \qquad (4.5)$$
for large enough k, where the last step follows from Hoeffding's inequality. Therefore, $P(|C_{v,p}| > k\log n) = o(1/n)$. Taking a union bound over all vertices in $\mathcal{G}$ completes the proof of the proposition. □
4.4 Proof of Theorem 4.2

4.4.1 Construction of gadget for reduction
In this section, we describe the steps in the construction of the graph $\mathcal{H}$ in Theorem 4.2 from any given 3-regular graph $\mathcal{G}$. These steps together form the description of the algorithm $\mathcal{A}_1$. A similar construction was used by Sly in [50] to prove hardness of approximation of the partition function in the hardcore model.
Let $\mathcal{G} = (V, E)$ with $|V| = n$. For each $v \in V$, we first construct a random bipartite graph $J_v$ on $2n^2$ vertices. Here $0 < \gamma < 1$ is a fixed constant that we will choose later. Let us denote by $R_v$ and $S_v$ the two parts of the bipartite graph $J_v$, each of size $n^2$. To construct the random graph $J_v$, we first fix a degree sequence consisting of a $(1-\gamma)$ fraction of degree-d vertices and a $\gamma$ fraction of degree-(d-1) vertices in each of $R_v$ and $S_v$. We then construct a random bipartite graph using the configuration model described in Chapter 3 with the degree distribution described above. Note that the number of edges in the bipartite graph is given by $cn^2$, where
$$c = d(1-\gamma) + (d-1)\gamma. \qquad (4.6)$$

We call the $\gamma n^2$ vertices with degree d-1 in $R_v$ and $S_v$ the connector vertices and denote them by $C_v = C_{R_v} \cup C_{S_v}$. Divide $C_{R_v}$ and $C_{S_v}$ into three equal parts, each of size $\frac{\gamma}{3}n^2$, and denote them by $C_{R_v,v_i}$ and $C_{S_v,v_i}$ for $1 \le i \le 3$, where $v_1, v_2, v_3$ are the neighbors of v in $\mathcal{G}$. We construct $\frac{\gamma}{3}n^b$ regular trees with out-degree d-1 and depth $2\lfloor 0.5\mu\log n\rfloor$, where $\mu = \frac{2-b}{\log(d-1)}$ and b > 0 is a constant that will be fixed in (4.27). Note that each tree has $n^{2-b}$ leaves and there are $\frac{\gamma}{3}n^2$ leaves in total. Identify these leaves with the $\frac{\gamma}{3}n^2$ connector vertices $C_{R_v,v_1}$. Repeat the same process for the rest of the connector vertices in $C_{R_v,v_i}$ and $C_{S_v,v_i}$. We refer to the roots of the corresponding trees as $T_{R_v,v_i}$ and $T_{S_v,v_i}$.

For each $(i,j) \in E$, add edges between $J_i$ and $J_j$ by forming a matching between the $\frac{\gamma}{3}n^b$ roots in $T_{R_i,j}$ and the $\frac{\gamma}{3}n^b$ roots in $T_{R_j,i}$, and similarly between the roots in $T_{S_i,j}$ and $T_{S_j,i}$. At the end of the construction all of the roots have been matched. This concludes the construction of $\mathcal{H}$, and hence the steps in the algorithm $\mathcal{A}_1$. From the above description, it is clear that $\mathcal{A}_1$ runs in polynomial time.
Note that each vertex in $\mathcal{H}$ has degree exactly equal to d. The graph $\mathcal{H}_p$ is obtained from $\mathcal{H}$ by deleting each edge independently with probability 1-p. This particular way of constructing $\mathcal{H}_p$ allows us to relate the MAX-CUT of $\mathcal{H}_p$ to the MAX-CUT of $\mathcal{G}$. In the following sections, we prove properties of $\mathcal{H}_p$ and of the MAX-CUT of $\mathcal{H}_p$ that will lead us to the proof of Theorem 4.2.
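To illustrate the first stage of $\mathcal{A}_1$, here is a minimal Python sketch (our own, hypothetical naming; `n2` stands for $n^2$, and the attached trees and connector matchings are omitted) of sampling one gadget $J_v$ via the configuration model:

    import random

    def sample_bipartite_gadget(n2, d, gamma):
        # Two parts of size n2; a gamma fraction of degree-(d-1) connector
        # vertices and a (1-gamma) fraction of degree-d vertices in each part.
        k = int(gamma * n2)
        degrees = [d - 1] * k + [d] * (n2 - k)
        # One clone per half-edge, labelled by the vertex it belongs to.
        left = [v for v, deg in enumerate(degrees) for _ in range(deg)]
        right = [v for v, deg in enumerate(degrees) for _ in range(deg)]
        random.shuffle(right)
        # A uniform matching of left clones to right clones yields J_v.
        return list(zip(left, right))  # edges as (vertex in R_v, vertex in S_v)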
Figure 4-1: Illustration of the bipartite graphs $J_u$ and $J_v$ in $\mathcal{H}$ associated with an edge $(u,v) \in E$.
4.4.2 Properties of $\mathcal{H}_p$

In this section, we prove a series of properties of the thinned gadget $\mathcal{H}_p$ that help us in determining the key properties of any maximum cut of $\mathcal{H}_p$.
First, we consider the properties of each of the bipartite graphs $J_v$ individually. Performing the thinning process on the random bipartite graph $J_v$ leads to a random bipartite graph with a specific degree distribution. To analyze this degree distribution, it is useful to think in terms of the configuration model used for generating $J_v$. Since the edge deletion process is performed independently of the random graph construction, we can think of the deleted edges directly in terms of the clones in the configuration model. Each clone in $R_v$ and $S_v$ is associated with an edge, so deleting an edge in the configuration model corresponds to deleting a clone in $R_v$ and its neighboring clone in $S_v$. Because of this independence, performing a random matching in the configuration model and then deleting each edge with probability 1-p is equivalent to:

• Sampling $N \sim \text{Bin}(cn^2, p)$.
• Retaining N clones from the $cn^2$ clones in each of $R_v$ and $S_v$ uniformly at random and deleting the rest.
• Performing a uniformly random matching between the remaining N clones in $R_v$ and $S_v$.
Let $N_{d,r}$ denote the number of non-connector vertices in $R_v$ that have degree r after the thinning, and let $N_{d-1,r}$ denote the number of connector vertices in $R_v$ that have degree r after the thinning. Let $b_p(m,r)$ denote the binomial probability $b_p(m,r) = \binom{m}{r}p^r(1-p)^{m-r}$.

The following concentration inequalities can be obtained by standard Chernoff bounds: there exists $\epsilon_1 > 0$ such that
$$P\left(\left|N/cn^2 - p\right| > \epsilon\right) \le e^{-\epsilon_1 n^2}, \qquad (4.7)$$
$$P\left(\left|\frac{N_{d,r}}{(1-\gamma)n^2} - b_p(d,r)\right| > \epsilon\right) \le e^{-\epsilon_1 n^2}, \qquad (4.8)$$
$$P\left(\left|\frac{N_{d-1,r}}{\gamma n^2} - b_p(d-1,r)\right| > \epsilon\right) \le e^{-\epsilon_1 n^2}. \qquad (4.9)$$

The degree distribution of $J_v$ after random edge deletions can be inferred from (4.7)-(4.9). Denoting by $p_j$ the fraction of vertices of degree j in $R_v$ and $S_v$, we get that
$$p_j = \begin{cases} (1-\gamma)b_p(d,j) + \gamma b_p(d-1,j), & \text{if } 0 \le j \le d-1, \\ (1-\gamma)b_p(d,d), & \text{if } j = d. \end{cases} \qquad (4.10)$$
In the discussion after Theorem 3.2 in Chapter 3, the condition for the existence of a giant component is stated for the special case of bipartite graphs as
$$\sum_{j,k} jk(jk - j - k)\,p_j q_k > 0. \qquad (4.11)$$
When $p_j = q_j$, the above condition reduces to
$$\sum_{j,k} jk(jk - j - k)\,p_j p_k > 0 \qquad (4.12)$$
$$\iff \left(\sum_j j^2 p_j\right)^2 - 2\left(\sum_j j p_j\right)\left(\sum_j j^2 p_j\right) > 0 \qquad (4.13)$$
$$\iff \sum_j (j^2 - 2j)\,p_j > 0. \qquad (4.14)$$
Plugging the values of $p_j$ into (4.14), we get
$$\sum_j (j^2 - 2j)p_j = \sum_{j=1}^{d-1}(j^2-2j)\left[(1-\gamma)b_p(d,j) + \gamma b_p(d-1,j)\right] + (1-\gamma)(d^2-2d)b_p(d,d) \qquad (4.15)$$
$$= (1-\gamma)\sum_{j=1}^{d}(j^2-2j)b_p(d,j) + \gamma\sum_{j=1}^{d-1}(j^2-2j)b_p(d-1,j) \qquad (4.16)$$
$$= (1-\gamma)\,dp\,[(d-1)p - 1] + \gamma\,(d-1)p\,[(d-2)p - 1] \qquad (4.17)$$
$$\stackrel{(\star)}{>} \delta_p, \qquad (4.18)$$
where $\delta_p > 0$ is a constant and $(\star)$ follows by choosing $\gamma$ small enough and using the fact that $p(d-1) > 1$.
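The closed form in (4.17)-(4.18) is easy to sanity-check numerically; the short Python script below (our own illustration) evaluates $\sum_j (j^2-2j)p_j$ directly from (4.10) and compares it to the closed form:

    from math import comb

    def b(p, m, r):
        # binomial probability b_p(m, r)
        return comb(m, r) * p**r * (1 - p)**(m - r)

    def direct(d, p, gamma):
        # sum_j (j^2 - 2j) p_j with p_j taken from (4.10)
        pj = [(1 - gamma) * b(p, d, j) + gamma * b(p, d - 1, j) for j in range(d)]
        pj.append((1 - gamma) * b(p, d, d))
        return sum((j * j - 2 * j) * pj[j] for j in range(d + 1))

    def closed_form(d, p, gamma):
        # right-hand side of (4.17)
        return ((1 - gamma) * d * p * ((d - 1) * p - 1)
                + gamma * (d - 1) * p * ((d - 2) * p - 1))

    d, p, gamma = 4, 0.5, 0.05          # here p(d-1) = 1.5 > 1
    assert abs(direct(d, p, gamma) - closed_form(d, p, gamma)) < 1e-12
    print(direct(d, p, gamma))          # positive, consistent with (4.18)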
We can use the results of Chapter 3 to conclude the existence of a giant component in $J_v$ with high probability. We however need sharper concentration inequalities than those in Chapter 3. So instead we follow the proofs and exploit the fact that the degree distribution of $J_v$ has maximum degree d = O(1) to obtain the following lemmas.
Lemma 4.1. With probability at least $1 - e^{-\epsilon_2 n^2}$, for every $v \in V$, the random bipartite graph $J_v$ has a giant component.

Before proving the above lemma, we first prove the following variation of Lemma 3.7 with a tighter concentration inequality.

Lemma 4.2. Suppose that the maximum degree $\omega(n) = d = O(1)$. There exists a constant t > 0 such that after time $tn^2$ in the exploration process, the number of active clones in each part, $A_1(tn^2)$ and $A_2(tn^2)$, is greater than $\delta n^2$ with probability at least $1 - e^{-\epsilon_1 n^2}$ for some positive constants $\delta$ and $\epsilon_1$.
Proof. The proof of the lemma is along the lines of the proof of Theorem 3.3. Following the proof of Theorem 3.3 and taking into account the fact that the maximum degree $\omega(n) = d$ in (3.1), we can strengthen the inequality in Eq. (3.18) to an exponential bound with rate $\epsilon_1 = \frac{\delta^2}{8c'd}$, where W(·) is defined in (3.15). We can then proceed by using the above argument in the proof of Lemma 3.7 (making $\epsilon_1$ smaller if necessary) to complete the proof. □
Proof of Lemma 4.1. For any $v \in V$, by Lemma 4.2, the number of active clones in the exploration process associated with the random bipartite graph $J_v$ reaches size greater than $\delta n^2$ with probability at least $1 - e^{-\epsilon_1 n^2}$. Taking a union bound over all $v \in V$, we can extend the previous statement to all $J_v$. The proof is then complete by noticing that the component corresponding to these active clones must be a giant component. □
We would now like to show that the trees $T_{R_v,v_i}$ and $T_{S_v,v_i}$ have a linear fraction of their leaves in the giant component. The next lemma formalizes this statement.

Lemma 4.3. The following statements hold:

(a) There exists $\delta_c > 0$ such that, for large enough n, with probability greater than $1 - n^{-98}$, for every $v \in V$, $J_v$ has a giant component and there are at least $\delta_c n^{2-b}$ connector leaves of each tree in $\mathcal{T}_v$ in the giant component of $J_v$.

(b) Let $N_T$ be the number of leaves of a tree $T \in \mathcal{T}_v$ in the giant component of $J_v$. Then, conditioned on $N_T$, these $N_T$ leaves are distributed uniformly at random among all $n^{2-b}$ leaves of T.
Proof. We prove the lemma by a line of argument similar to the one used for the concentration inequalities on the size of the giant component in Section 3.7, in the course of the proof of Theorem 3.2 (a). We rewrite the modified proof here for completeness. Recall that the probability that every component other than the giant component is of size less than $\beta\log n$ is at least $1 - n^{-100}$ for some constant $\beta > 0$ and large enough n. Taking a union bound over all vertices $v \in V$, we can ensure that the preceding statement holds for all $J_v$ with probability at least $1 - n^{-99}$. In the subsequent parts of the proof, we will assume that this is the case.

The proof then proceeds in two parts. In the first part, we show that there exists a constant $q_c > 0$ such that the probability that a connector vertex is in the giant component is at least $q_c$. Let $\mathcal{T}$ be a random Galton-Watson branching process which is the edge-biased branching process specifically associated with a connector vertex. In $\mathcal{T}$, the root has offspring distribution Bin(d-1, p), and each offspring of the root acts as a root of the standard edge-biased branching process associated with the degree distribution in (4.10) (see Section 3.3 for details). Recall from Chapter 3 that the condition for existence of a giant component is equivalent to the condition that the edge-biased branching process has a positive survival probability. Hence, denoting by $q_{\text{survive}}$ the survival probability of the standard edge-biased branching process, we have that $p(d-1) > 1$ implies $q_{\text{survive}} > 0$. Define $q = P(|\mathcal{T}| = \infty)$ as the survival probability of the branching process $\mathcal{T}$. Then $q = 1 - \sum_{i=0}^{d-1} b_p(d-1,i)(1-q_{\text{survive}})^i > 0$.

Fix one of the connector trees T. Recall that T has $n^{2-b}$ leaves that are identified with $n^{2-b}$ of the connector vertices. Denote the set of leaves of T by $L_T$ and let $l \in L_T$ be any leaf. Then, using the coupling argument in the proof of Lemma 3.10, we can show that for large enough n,
$$P(l \in \text{Giant component of } J_v) \ge 0.9q \triangleq q_c. \qquad (4.20)$$

Now we show concentration of the number of vertices in $L_T$ in the giant component of $J_v$ around its mean value. Let $N_S$ denote the number of vertices in $L_T$ that are in small components, i.e., components of size less than $\beta\log n$. Following the proof of Theorem 3.2 (a) and denoting by $E_i$ the random edges of the configuration model, we get as before
$$|E[N_S \mid E_1, \dots, E_k] - E[N_S \mid E_1, \dots, E_{k+1}]| \le 4\beta\log n. \qquad (4.21)$$
Note that $E[N_S] \le n^{2-b}(1-q_c)$. So using Lemma 3.9 we get
$$P\left(\left|\frac{N_S}{n^{2-b}} - (1-q_c)\right| > \delta\right) \le 2e^{-\frac{\delta^2 n^{2-2b}}{8c\beta^2\log^2 n}}. \qquad (4.22)$$
The proof then follows by choosing $\delta < q_c$, defining $\delta_c = q_c - \delta$, and taking a union bound over all the $\gamma n^{b+1}$ trees in $\mathcal{H}_p$. □
Equipped with the above lemmas regarding the giant component of each $J_v$, we now proceed in the following section to analyze the structure of any maximum cut of $\mathcal{H}_p$.

4.4.3 Properties of MAX-CUT on $\mathcal{H}_p$

In this section, we analyze the properties of a maximum cut $C_{\mathcal{H}_p}$ of $\mathcal{H}_p$. Before that, we first state some well-known results regarding Galton-Watson branching processes relevant to our case. These results apply directly to the connector trees attached to the bipartite graphs $J_v$.
Let T be a Galton-Watson branching process with offspring distribution $X \sim \text{Bin}(d-1, p)$. Denote $m = E[X] = p(d-1) > 1$ and let $Z_N$ be the number of vertices at depth N in T. It is known (see, e.g., [2]) that there exists a non-negative random variable W such that
$$\lim_{N\to\infty} m^{-N}Z_N = W \quad \text{w.p. } 1. \qquad (4.23)$$
We now state a couple of known lemmas about the branching process T and the distribution of W.

Lemma 4.4. Let $1 < y < m$ be a constant. Then,
$$P(Z_N = 0) = p_{\text{ext}} + o_N(1), \qquad (4.24)$$
where $p_{\text{ext}}$ is the extinction probability of the branching process, and
$$P(0 < Z_N < y^N) = P(0 < W < (y/m)^N)(1 + o_N(1)). \qquad (4.25)$$
Here $o_N(1)$ denotes terms that converge to 0 as $N \to \infty$.

Proof. Both statements can be proved directly by using $K_N = y^N$ in Corollary 5 in [17] and observing that $E[Z_1\log Z_1] < \infty$. □

Let $f_p(s) \triangleq \sum_{i=0}^{d-1} b_p(d-1,i)s^i$ be the probability generating function of the offspring distribution of T. Define $\alpha \triangleq -\log_m f_p'(p_{\text{ext}}) > 0$.
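Both $p_{\text{ext}}$ and $\alpha$ are straightforward to compute numerically. The sketch below is our own illustration (it assumes the definition of $\alpha$ given above and uses the closed form $f_p(s) = (1-p+ps)^{d-1}$ for the Bin(d-1, p) generating function); it finds $p_{\text{ext}}$ as the smallest fixed point of $f_p$ by simple iteration:

    from math import log

    def f(s, d, p):
        # PGF of Bin(d-1, p): f_p(s) = (1 - p + p s)^(d-1)
        return (1 - p + p * s) ** (d - 1)

    def f_prime(s, d, p):
        return (d - 1) * p * (1 - p + p * s) ** (d - 2)

    def extinction_prob(d, p, iters=10_000):
        # Iterating s <- f_p(s) from s = 0 converges to the smallest fixed point.
        s = 0.0
        for _ in range(iters):
            s = f(s, d, p)
        return s

    d, p = 4, 0.5                       # m = p(d-1) = 1.5 > 1: supercritical
    p_ext = extinction_prob(d, p)
    m = p * (d - 1)
    alpha = -log(f_prime(p_ext, d, p)) / log(m)
    print(p_ext, alpha)                 # roughly 0.236 and 1.37 here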
Lemma 4.5. There exists C > 0 such that for any $0 < x \le 1$, we have
$$P(0 < W < x) \le Cx^{\alpha}. \qquad (4.26)$$

Proof. From [6], [14], we have that there exists a constant C > 0 such that the density function w(x) of W exists and satisfies $w(x) \le Cx^{\alpha-1}$ for all 0 < x < 1. The proof then follows by integrating both sides of the above inequality. □
We will assume that the quantity b > 0 is chosen such that it satisfies the following inequality:
$$b < \min\left\{\frac{1.9\log\left(\frac{1+m}{2}\right)}{9\log(d-1)},\ 0.01,\ \alpha\mu\log\left(\frac{2m}{1+m}\right)\right\}. \qquad (4.27)$$

As we will prove shortly, the branching process T truncated at depth $\mu\log n$ is either already extinct or has $\Theta(m^{\mu\log n})$ leaves w.h.p. as $n \to \infty$. This motivates the definition of the following property for the bipartite graphs $J_v$.

Property 4.1. All trees associated with $J_v$ have either no leaves or at least $n^{\delta_L}$ leaves, where $\delta_L > 0$ is a constant defined by
$$\delta_L \triangleq \frac{1.9\log\left(\frac{1+m}{2}\right)}{\log(d-1)} \le \frac{(2-b)\log\left(\frac{1+m}{2}\right)}{\log(d-1)} = \mu\log\left(\frac{1+m}{2}\right). \qquad (4.28)$$
The inequality follows from (4.27).
The following lemma characterizes when Property 4.1 holds.

Lemma 4.6. The following statements hold for the trees in $\mathcal{H}$.

(a) There exists $\rho_L > 0$ such that for any $v \in V$, Property 4.1 is satisfied by $J_v$ with probability at least $1 - n^{-\rho_L}$ for large enough n.

(b) Let $N_L$ be the number of vertices $v \in V$ for which $J_v$ satisfies Property 4.1. Then there exists $\delta_3 > 0$ such that for large enough n,
$$P(N_L \ge 0.9999n) \ge 1 - e^{-n^{\delta_3}}.$$

Proof. Let T be any tree connected to $J_v$. Then T is a branching process with offspring distribution Bin(d-1, p) and depth $N = \mu\log n$, where recall that $\mu = \frac{2-b}{\log(d-1)}$. Let $1 < y \triangleq \frac{1+m}{2} < m$. Using this value of N and y in Lemma 4.4 and Lemma 4.5, we get
$$P(0 < Z_{\mu\log n} < y^{\mu\log n}) \le C(y/m)^{\alpha\mu\log n} = Cn^{-\rho}$$
$$\implies P(0 < Z_{\mu\log n} < n^{\delta_L}) \le Cn^{-\rho}, \qquad (4.29)$$
where the second inequality follows from the fact that $y^{\mu\log n} = n^{\mu\log y} \ge n^{\delta_L}$, and $0 < \rho \triangleq \alpha\mu\log\left(\frac{2m}{1+m}\right)$. Taking a union bound over all $O(n^b)$ trees associated with $J_v$, we get that $J_v$ satisfies Property 4.1 with probability at least $1 - n^{-(\rho-b)}$ (note that (4.27) says that $\rho - b > 0$). Defining $\rho_L \triangleq \rho - b > 0$, the proof of part (a) is complete.

Since the gadgets $J_v$ are independent, part (b) follows by using standard large deviation inequalities. □
A cut of $\mathcal{H}_p$ is equivalent to an assignment of binary values to the vertices of $\mathcal{H}_p$, where the size of the cut is the number of edges whose endpoints have opposite assigned values. We can now prove the following property, which must be satisfied by any maximum cut of $\mathcal{H}_p$. We call this property the full polarization property.
Proposition 4.2. With high probability, every maximum cut $C_{\mathcal{H}_p}$ satisfies the full polarization property, i.e., for every $v \in V$, one of the following is true:

Option 1: All but $O(n^{\delta_L-2b})$ vertices in $R_v$ that are in the giant component of $J_v$ are assigned the value 1, and all but $O(n^{\delta_L-2b})$ vertices in $S_v$ that are in the giant component of $J_v$ are assigned the value 0.

Option 2: All but $O(n^{\delta_L-2b})$ vertices in $R_v$ that are in the giant component of $J_v$ are assigned the value 0, and all but $O(n^{\delta_L-2b})$ vertices in $S_v$ that are in the giant component of $J_v$ are assigned the value 1.
To prove that each $J_v$ satisfies the full polarization property under any maximal cut $C_{\mathcal{H}_p}$, we need a property of the giant component of $J_v$ that we call the strong connectivity property. This is stated in the following lemma.

Lemma 4.7 (Strong connectivity of the giant component). The following statement is true with high probability: for every partition of the giant component of $J_v$ into two parts $A_v$ and $B_v$, each of size at least $n^{\delta_L-2b}$, there exist at least $n^{2b}$ edges between $A_v$ and $B_v$.
Proof. Fix any $v \in V$. Let E be the event that the statement given in the lemma is false, i.e., there exists a partition $A_v, B_v$ of the giant component such that $|A_v| \ge |B_v| \ge n^{\delta_L-2b}$, but there are fewer than $n^{2b}$ edges between $A_v$ and $B_v$. Then deleting these edges leaves $B_v$ disconnected from $A_v$. Suppose $B_{v,1}, B_{v,2}, \dots, B_{v,M}$ are the components of $B_v$ after being disconnected from $A_v$. Since each of these components was originally a part of one single connected component, each $B_{v,i}$ must have had at least one edge connecting it to $A_v$.

We claim that, conditioned on the event E, there exists an $1 \le i \le M$ such that $|B_{v,i}| \ge n^{\delta_L-5b}$. To prove this, suppose that for all i, $|B_{v,i}| < n^{\delta_L-5b}$. Since each of the components has at least one edge to $A_v$ and there are fewer than $n^{2b}$ such edges, we have $M < n^{2b}$, and therefore $|B_v| < n^{2b}\,n^{\delta_L-5b} = n^{\delta_L-3b} < n^{\delta_L-2b}$. This contradicts the assumption that event E holds. Since $|A_v| \ge |B_v|$, the above claim holds for $A_v$ as well. In other words, denoting by $A_{v,*}$ and $B_{v,*}$ the largest components in $A_v$ and $B_v$ after removing the connecting edges, we have that, conditioned on the event E, both $A_{v,*}$ and $B_{v,*}$ are of size at least $n^{\delta_L-5b}$.
Suppose that we introduce further randomness into the random graph $\mathcal{H}_p$. In particular, we delete each edge of $\mathcal{H}_p$ independently with probability $f(n) = e^{-n^b}$ and obtain the random graph $\tilde{\mathcal{H}}_p$. Note that $\tilde{\mathcal{H}}_p$ can also be obtained directly from $\mathcal{H}$ by retaining each edge of $\mathcal{H}$ with probability $p(1 - e^{-n^b})$. Denote the bipartite gadget corresponding to $J_v$ in $\tilde{\mathcal{H}}_p$ by $\tilde{J}_v$. Let $\tilde{A}_{v,*}$ and $\tilde{B}_{v,*}$ be the two largest connected components of $\tilde{J}_v$. Let F be the event that in the random graph $\tilde{J}_v$ we have $|\tilde{A}_{v,*}|, |\tilde{B}_{v,*}| \ge n^{\delta_L-5b}$. We will now find upper and lower bounds on P(F).

First, we think of $\tilde{\mathcal{H}}_p$ as obtained by first generating $\mathcal{H}_p$ and then deleting each edge of $\mathcal{H}_p$ with probability f(n). Then, conditioned on the occurrence of the event E in $\mathcal{H}_p$, the probability that all of the edges between $A_v$ and $B_v$ are deleted and none of the other edges are deleted is at least $(e^{-n^b})^{n^{2b}}(1 - e^{-n^b})^{cn^2}$. This results in at least two components $\tilde{A}_{v,*}$ and $\tilde{B}_{v,*}$, each of size at least $n^{\delta_L-5b}$. This leads to the following fact:

Fact 4.1. $P(F \mid E) \ge (e^{-n^b})^{n^{2b}}(1 - e^{-n^b})^{cn^2}$.

Second, we think of obtaining $\tilde{\mathcal{H}}_p$ directly from $\mathcal{H}$ by retaining each edge with probability $p(1 - e^{-n^b})$. For large enough n, the difference between the degree distributions of $\tilde{J}_v$ and $J_v$ is insignificant. More precisely, all of the results in (4.7)-(4.9) about the degree distribution still hold for $\tilde{J}_v$. As a consequence, the statement in Lemma 4.2 holds as well, i.e., with probability at least $1 - e^{-\epsilon_1 n^2}$ there exists a time $tn^2$ in the exploration process of $\tilde{J}_v$ when both $A_1(tn^2)$ and $A_2(tn^2)$ are at least $\delta n^2$. In this case, the component corresponding to these active clones is the giant component and is of size $\Theta(n^2)$. Now, occurrence of the event F requires that there are two components in $\tilde{J}_v$ of size at least $n^{\delta_L-5b}$. This means that for F to occur, there must be a component other than the giant component in $\tilde{J}_v$ with size at least $n^{\delta_L-5b}$. Let w be any vertex of $\tilde{J}_v$ unexplored till time $tn^2$. Then the probability that the component associated with w is of size at least $n^{\delta_L-5b}$ and is not the giant component is at most $(1-\delta)^{n^{\delta_L-5b}} = e^{\log(1-\delta)\,n^{\delta_L-5b}}$. Taking a union bound over all vertices in $\tilde{J}_v$, we have that
$$P(F) \le 2n^2 e^{\log(1-\delta)\,n^{\delta_L-5b}} + e^{-\epsilon_1 n^2}, \qquad (4.30)$$
where the second term corresponds to the probability of the converse event in Lemma 4.2. Combining with Fact 4.1, we have that
$$P(E) \le \frac{P(F)}{P(F \mid E)} \le \frac{2n^2 e^{\log(1-\delta)\,n^{\delta_L-5b}} + e^{-\epsilon_1 n^2}}{(e^{-n^b})^{n^{2b}}(1 - e^{-n^b})^{cn^2}} \le 4n^2 e^{\log(1-\delta)\,n^{\delta_L-5b}}\,e^{n^{3b}} + e^{-\frac{\epsilon_1}{2}n^2}.$$
From (4.27) we have that $b < \delta_L/9$. Hence, we get
$$P(E) \le e^{-\Theta(n^b)}.$$
The proof then follows by taking a union bound over all $v \in V$. □
Proof of Proposition 4.2. Let $R_{v,1} \subseteq R_v$ be the part of the giant component of $J_v$ that is assigned the value 1, and $R_{v,0} \subseteq R_v$ the part of the giant component of $J_v$ that is assigned the value 0, in a maximal cut $C_{\mathcal{H}_p}$. Similarly define $S_{v,1}$ and $S_{v,0}$. Define $A_v = R_{v,1} \cup S_{v,0}$ and $B_v = R_{v,0} \cup S_{v,1}$. Assume without loss of generality that $|B_v| \le |A_v|$. Suppose that $|B_v| > n^{\delta_L-2b}$. Then, using Lemma 4.7, with high probability there must be at least $n^{2b}$ edges between $A_v$ and $B_v$. These edges are not a part of the cut $C_{\mathcal{H}_p}$.

Consider the cut $\tilde{C}_{\mathcal{H}_p}$ obtained from $C_{\mathcal{H}_p}$ by modifying only the binary assignments in $J_v$ and the trees $\mathcal{T}_v$ as follows:

• Assign all of $R_v$ the value 1 and all of $S_v$ the value 0.
• Assign all even levels of the trees $T_{R_v}$ the value 1 and all odd levels the value 0.
• Assign all even levels of the trees $T_{S_v}$ the value 0 and all odd levels the value 1.

Let $Z_v$ be the set of all edges of $\mathcal{H}_p$ with at least one end in $J_v \cup \mathcal{T}_v$. Then the numbers of edges in the cuts $C_{\mathcal{H}_p}$ and $\tilde{C}_{\mathcal{H}_p}$ outside of the set $Z_v$ are equal. Within the set $Z_v$, the cut $\tilde{C}_{\mathcal{H}_p}$ has at least $|Z_v| - O(n^b)$ edges in it, i.e., all edges of $Z_v$ with the possible exclusion of the connector edges. On the other hand, the number of edges of $Z_v$ in the cut $C_{\mathcal{H}_p}$ is at most $|Z_v| - n^{2b}$. Hence, the cut $C_{\mathcal{H}_p}$ cannot be a maximal cut of $\mathcal{H}_p$, and we arrive at a contradiction. □
We have established that all of the giant components are (almost) fully polarized in any maximum cut. However, there are two different choices for the polarity of each giant component, obtained by switching 0 and 1. We now show that the choice of polarity affects the size of the cut. In other words, the number of connector edges between two gadgets $J_u$ and $J_v$ that are in a cut depends on whether the giant components in $J_u$ and $J_v$ have the same or different polarity. For this to happen, the trees forming the connection should have some of their leaves in the giant component. This is summarized in the lemma below.

Lemma 4.8. With high probability, for every $v \in V$, every tree associated with $J_v$ that has more than $n^{\delta_L}$ leaves has at least $\frac{\delta_c}{2} n^{\delta_L}$ leaves in the giant component of $J_v$, where $\delta_c$ is the constant in Lemma 4.3.
Proof. Let E be the event that for every $v \in V$, every tree in $\mathcal{T}_v$ has at least $\delta_c n^{2-b}$ of its original leaves (before edge deletions in the trees) in the giant component of $J_v$. From Lemma 4.3 (a), we have that $P(E) \ge 1 - n^{-98}$.

Let T be any tree in $\mathcal{T}_v$ with at least $n^{\delta_L}$ leaves after edge deletions. Let $N_T$ be the original number of leaves of T in the giant component of $J_v$. Then, conditioned on E, we have $N_T \ge \delta_c n^{2-b}$, so by Lemma 4.3 (b) the expected number of leaves of T in the giant component of $J_v$ after edge deletions is at least $\delta_c n^{\delta_L}$. The proof then follows by standard large deviation arguments and taking a union bound over all trees in $\mathcal{H}_p$. □
Definition 4.1. We call two trees adjacent, or neighboring, if their roots were matched during the construction of $\mathcal{H}$.

Note that if a tree has zero leaves, or its neighboring tree has zero leaves, then it is inconsequential in affecting the polarity of the maximum cut $C_{\mathcal{H}_p}$. This motivates the following definition.

Definition 4.2. We call a pair of adjacent trees $(T, \tilde{T})$ with $T \in T_{R_u,v}$ (or $T_{S_u,v}$) and $\tilde{T} \in T_{R_v,u}$ (or $T_{S_v,u}$) useful if

(a) T has a non-zero number of leaves in the connector vertices of $J_u$,
(b) $\tilde{T}$ has a non-zero number of leaves in the connector vertices of $J_v$,
(c) the edge connecting the roots of T and $\tilde{T}$ was not deleted in the edge deletion process.

We call a tree T useful if it belongs to a useful pair.

Let $\epsilon_L > 0$ be a fixed constant that satisfies
$$\frac{1+\epsilon_L}{1-\epsilon_L} < \frac{1.5007}{1.5}. \qquad (4.31)$$
We now prove the following fact about the number of useful trees associated with any edge $e = (u,v) \in E$.

Lemma 4.9. For any $e = (u,v) \in E$, let $N_{R,e}$ be the number of useful pairs of trees in $T_{R_u,v}$ and $T_{R_v,u}$, and let $N_{S,e}$ be the number of useful pairs of trees in $T_{S_u,v}$ and $T_{S_v,u}$. There exists a constant $\delta_{\text{useful}} > 0$ such that, with high probability, for every $(u,v) \in E$ both of the following conditions are satisfied:
$$(1-\epsilon_L)\,\delta_{\text{useful}}\,n^b \le N_{R,e} \le (1+\epsilon_L)\,\delta_{\text{useful}}\,n^b, \qquad (4.32)$$
$$(1-\epsilon_L)\,\delta_{\text{useful}}\,n^b \le N_{S,e} \le (1+\epsilon_L)\,\delta_{\text{useful}}\,n^b. \qquad (4.33)$$

Proof. For any $e = (u,v) \in E$, there are $\frac{\gamma}{3}n^b$ pairs of adjacent trees in $R_u, R_v$ and $\frac{\gamma}{3}n^b$ pairs of adjacent trees in $S_u, S_v$ associated with it. Let $p_{\text{ext}}$ be the extinction probability of a Galton-Watson branching process with offspring distribution Bin(d-1, p). Then the probability that any given tree T has zero leaves is $p_{\text{ext}} + o(1)$. Let $T \in T_{R_u,v}$ and $\tilde{T} \in T_{R_v,u}$ be a pair of adjacent trees associated with e. Then the pair $(T, \tilde{T})$ is useful iff neither of them is extinct and the edge connecting their roots is not deleted. Using independence,
$$P((T, \tilde{T}) \text{ is useful}) = (1 - p_{\text{ext}} + o(1))^2\, p. \qquad (4.34)$$
Note that a pair of adjacent trees $(T, \tilde{T})$ being useful is independent of all other pairs. Let $\delta_{\text{useful}} = \frac{\gamma}{3}(1 - p_{\text{ext}})^2 p$. The proof then follows by standard large deviation inequalities and taking a union bound over all pairs of adjacent trees in $\mathcal{H}_p$. □
Recall that $N_L$ denotes the number of vertices in V for which $J_v$ satisfies Property 4.1. Define $V_L$ to be the set of these vertices. Also, let $E_L \subseteq E$ be the set of all edges $(u,v) \in E$ such that both $u, v \in V_L$. We now prove the following fact about the useful trees associated with any edge $(u,v) \in E_L$.

Lemma 4.10. With high probability, for every $e = (u,v) \in E_L$, every useful tree $T \in T_{R_v,u}$ (or $T_{S_v,u}$) associated with e has at least $\frac{\delta_c}{2}n^{\delta_L}$ of its leaves in the giant component of $J_v$.

Proof. Follows immediately by combining Lemma 4.8 with the definition of a useful tree: since $u, v \in V_L$, a useful tree has a non-zero number of leaves and hence, by Property 4.1, at least $n^{\delta_L}$ leaves. □
We are now ready to prove our main proposition, which relates a maximum cut of $\mathcal{H}_p$ to an $\epsilon$-optimal cut $C_\mathcal{G}$ of $\mathcal{G}$. First we give the following definition.

Definition 4.3. Let $C_{\mathcal{H}_p}$ be any cut of $\mathcal{H}_p$ in which all the giant components satisfy the full polarization property (see Proposition 4.2). Call the polarity of $J_v$ 1 if Option 1 in Proposition 4.2 holds, and 0 otherwise. Let $C_\mathcal{G}$ be a cut of $\mathcal{G}$. Then $C_{\mathcal{H}_p}$ is said to have polarity according to $C_\mathcal{G}$ if one of the following holds:

1. For all $v \in V$ the polarity of $J_v$ in $C_{\mathcal{H}_p}$ is the same as the binary assignment of v in $C_\mathcal{G}$.
2. For all $v \in V$ the polarity of $J_v$ in $C_{\mathcal{H}_p}$ is opposite to the binary assignment of v in $C_\mathcal{G}$.

Proposition 4.3. Let $C^*_\mathcal{G}$ be a maximal cut of $\mathcal{G}$. Let $C_\mathcal{G}$ be any cut of $\mathcal{G}$ such that $|C_\mathcal{G}| \le |C^*_\mathcal{G}| - 0.001n$. Then any cut $C_{\mathcal{H}_p}$ of $\mathcal{H}_p$ with polarity according to $C_\mathcal{G}$ is not a maximal cut of $\mathcal{H}_p$.
Proof. Recall that $E_L$ is the set of edges in E both of whose endpoints are in $V_L$. From Lemma 4.6, with high probability the number $N_L$ of vertices in $V_L$ is at least 0.9999n. Assume that this is true. Then, since $\mathcal{G}$ is 3-regular, we must have $|E_L| \ge 1.5n - 0.0003n$. Further, assume also that the statements of Lemma 4.9 and Lemma 4.10 hold.

Since $|C_\mathcal{G}| \le |C^*_\mathcal{G}| - 0.001n$, the cut $C_\mathcal{G}$ has at most $|C^*_\mathcal{G}| - 0.001n$ edges in $E_L$. Hence there are at least $(1.5n - 0.0003n) - (|C^*_\mathcal{G}| - 0.001n)$ edges in $E_L$ that are not in $C_\mathcal{G}$. Let $(u,v) \in E_L$ be such that $(u,v) \notin C_\mathcal{G}$. Then we know that there are at least $\delta_{\text{useful}}(1-\epsilon_L)n^b$ useful pairs of trees associated with (u,v), and each of these trees has at least $\frac{\delta_c}{2}n^{\delta_L}$ leaves in the corresponding giant component. Let T and $\tilde{T}$ be such a useful tree pair. Since $C_{\mathcal{H}_p}$ satisfies the full polarization property, in each $J_v$ all but $O(n^{\delta_L-2b})$ vertices in the giant component of $J_v$ obey the polarity of $J_v$. Let the polarity of $J_v$ be $b_v \in \{0,1\}$. Since T has at least $\frac{\delta_c}{2}n^{\delta_L}$ leaves in the giant component of $J_v$, there must exist at least one leaf L of T in the giant component of $J_v$ such that L is assigned the binary value $b_v$. Similarly, there must exist at least one leaf $\tilde{L}$ of $\tilde{T}$ in the giant component of $J_u$ such that $\tilde{L}$ is assigned the binary value $b_u$.

Since $(u,v) \notin C_\mathcal{G}$, we must have $b_u = b_v$. Observe that the path from L to $\tilde{L}$ passing through T and $\tilde{T}$ has an odd number of hops. Hence, there is at least one edge in this path that is not in $C_{\mathcal{H}_p}$. This means that for each $(u,v) \in E_L$ such that $(u,v) \notin C_\mathcal{G}$, there are at least $\delta_{\text{useful}}(1-\epsilon_L)n^b$ edges not in $C_{\mathcal{H}_p}$. Then the number of edges in the cut $C_{\mathcal{H}_p}$ is bounded above by
$$|C_{\mathcal{H}_p}| \le |E_{\mathcal{H}_p}| - (1.5007n - |C^*_\mathcal{G}|)\,\delta_{\text{useful}}(1-\epsilon_L)\,n^b. \qquad (4.35)$$
Now consider a cut $C^*_{\mathcal{H}_p}$ of $\mathcal{H}_p$ constructed as follows:

(a) For each $v \in V$, if v is assigned the value 1 in the cut $C^*_\mathcal{G}$, then assign all vertices in $R_v$ the value 1 and all vertices of $S_v$ the value 0. Perform the analogous assignment if v is assigned 0 in $C^*_\mathcal{G}$.

(b) Suppose T is a useful tree. Then assign to all even levels of T, including the root, the value assigned to the leaves of T, and to all odd levels the opposite value.

(c) Suppose T is not a useful tree. Then either T has no leaves, or its adjacent tree $\tilde{T}$ has no leaves, or the edge connecting the roots of T and $\tilde{T}$ has been deleted. In each case, perform an assignment such that all of the edges of T are in the cut $C^*_{\mathcal{H}_p}$.

From the above description, the only edges of $\mathcal{H}_p$ that are NOT in $C^*_{\mathcal{H}_p}$ are the connector edges between roots of useful pairs of trees $T \in \mathcal{T}_u$ and $\tilde{T} \in \mathcal{T}_v$ such that u and v have the same binary assignment in $C^*_\mathcal{G}$, i.e., $(u,v) \notin C^*_\mathcal{G}$. The number of such edges is upper bounded by $(1.5n - |C^*_\mathcal{G}|)\,\delta_{\text{useful}}(1+\epsilon_L)\,n^b$. So the number of edges in the cut $C^*_{\mathcal{H}_p}$ is lower bounded as
$$|C^*_{\mathcal{H}_p}| \ge |E_{\mathcal{H}_p}| - (1.5n - |C^*_\mathcal{G}|)\,\delta_{\text{useful}}(1+\epsilon_L)\,n^b. \qquad (4.36)$$
Combining (4.35) and (4.36) we have
$$|C^*_{\mathcal{H}_p}| - |C_{\mathcal{H}_p}| \ge (1.5007n - |C^*_\mathcal{G}|)\,\delta_{\text{useful}}(1-\epsilon_L)\,n^b - (1.5n - |C^*_\mathcal{G}|)\,\delta_{\text{useful}}(1+\epsilon_L)\,n^b \ge \left[1.5007(1-\epsilon_L) - 1.5(1+\epsilon_L)\right]\delta_{\text{useful}}\,n^{b+1} > 0,$$
where the last inequality follows from (4.31). Hence $C_{\mathcal{H}_p}$ cannot be a maximal cut of $\mathcal{H}_p$ and the proof is complete. □
With the properties of $C_{\mathcal{H}_p}$ proved in the preceding sections, we are now ready to complete the proof of Theorem 4.2.

Proof of Theorem 4.2. We are now in a position to provide the details of the algorithm $\mathcal{A}_2$. Given a maximal cut $C_{\mathcal{H}_p}$ of $\mathcal{H}_p$:

• Find the largest connected component $C_v$ of every bipartite graph $J_v$.
• Find the polarity of $C_v$, i.e., declare the polarity to be 1 if the number of vertices in $C_v \cap R_v$ assigned the value 1 is at least as large as the number of vertices in $C_v \cap S_v$ assigned the value 1, and declare the polarity to be 0 otherwise.
• Produce a cut $C_\mathcal{G}$ of $\mathcal{G}$ by assigning each vertex v the value of the polarity of $J_v$.
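The polarity-extraction step of $\mathcal{A}_2$ amounts to a simple majority vote. The following Python sketch is our own illustration with hypothetical identifiers (`assignment` maps vertices of $\mathcal{H}_p$ to $\{0,1\}$, `components[v]` is the largest component of $J_v$, and `R[v]`, `S[v]` are the two parts of $J_v$):

    def extract_cut(assignment, components, R, S):
        # Recover a cut of G from a maximal cut of H_p via gadget polarities.
        cut_G = {}
        for v, comp in components.items():
            ones_in_R = sum(assignment[w] for w in comp if w in R[v])
            ones_in_S = sum(assignment[w] for w in comp if w in S[v])
            # Polarity 1 corresponds to Option 1 of Proposition 4.2
            # (R_v mostly assigned 1, S_v mostly assigned 0).
            cut_G[v] = 1 if ones_in_R >= ones_in_S else 0
        return cut_G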
From Proposition 4.3 it follows that the cut $C_\mathcal{G}$ must satisfy
$$|C_\mathcal{G}| \ge |C^*_\mathcal{G}| - 0.001n \ge (1 - \epsilon_{\text{crit}})|C^*_\mathcal{G}| \qquad (4.37)$$
with high probability, where the second inequality follows from the fact that $|C^*_\mathcal{G}| \ge n - 1$. □
4.5 Conclusion

We showed that the thinned MAX-CUT problem on graphs with maximum degree d undergoes a computational hardness phase transition at $p_c = \frac{1}{d-1}$. However, it is unclear whether or not the problem is computationally tractable when $p = p_c$. We conjecture that the problem remains NP-hard in this case.
Chapter 5

Algorithms for Low Rank Matrix Completion

5.1 Introduction
Matrix completion refers to the problem of recovering a low rank matrix from an incomplete subset of its entries. This problem arises in a vast number of applications that involve collaborative filtering, where one attempts to predict the unknown preferences of a certain user based on the collective known preferences of a large number of users. It has attracted a lot of attention in recent times due to its application in recommendation systems and the well-known Netflix Prize.
5.1.1 Formulation

Let $M = \alpha\beta'$ be a rank r matrix, where $\alpha \in \mathbb{R}^{m\times r}$, $\beta \in \mathbb{R}^{n\times r}$, and suppose that $r \ll m, n$. Let $\mathcal{E} \subseteq [m]\times[n]$ be a subset of the indices and let $M_\mathcal{E}$ denote the entries of M on the subset $\mathcal{E}$. The two major questions that arise in matrix completion are:
(a) Given $M_\mathcal{E}$, is it possible to reconstruct M? and (b) Are there efficient algorithms to perform this reconstruction?

Without any further assumptions, matrix completion is NP-hard [41]. However, under certain conditions the problem has been shown to be tractable. The most common assumptions adopted are that the matrix M is "incoherent" and that the subset $\mathcal{E}$ is chosen uniformly at random. The incoherence condition was introduced in [10], [11], where it was shown that a convex relaxation resulting in nuclear norm minimization succeeds under further assumptions on the size of $\mathcal{E}$. In [35] and [36], the authors use an algorithm consisting of a truncated singular value projection followed by a local minimization subroutine on the Grassmann manifold and show that it succeeds when $|\mathcal{E}| = \Omega(nr\log n)$. In [28], it was shown that the local minimization in [35] can be successfully replaced by Alternating Minimization. The use of Belief Propagation for matrix factorization has also been studied by physicists in [33].
For the rest of the chapter we will assume that m = n for simplicity of notation; the results easily extend to the more general case m = O(n). Let $\mathcal{G}$ be a bipartite graph on the vertex set $V = V_R \cup V_S$ corresponding to the rows and columns of M and with edge set $\mathcal{E}$. Let $V_R = \{r_i, i \in [n]\}$ and $V_S = \{s_i, i \in [n]\}$. Denote by $\Delta = \Delta(n)$ the maximum degree of $\mathcal{G}$. The graph $\mathcal{G}$ represents the structure of the revealed entries of M. We denote the ith row of $\alpha$ by $\alpha_i$ and the jth row of $\beta$ by $\beta_j$. Note that $M_{ij} = \alpha_i'\beta_j$. Matrix completion can be recast as the following optimization problem over $\mathcal{G}$:
$$\min_{X,Y\in\mathbb{R}^{n\times r}} \sum_{(i,j)\in\mathcal{E}} |x_i'y_j - M_{ij}|^2, \qquad (5.1)$$
where $x_i$ and $y_j$ are associated with the vertices $r_i \in V_R$ and $s_j \in V_S$ and denote the ith row of X and the jth row of Y, respectively. The above optimization problem is non-convex and, as expected, it is in general NP-hard to solve.
5.1.2 Algorithms

We provide three algorithms for matrix completion that operate on the graph $\mathcal{G}$: Information Propagation (IP), Vertex Least Squares (VLS), and Edge Least Squares (ELS).

Information Propagation (IP) is a simple sequential decoding algorithm on the graph $\mathcal{G}$. In Section 5.2, we provide sufficient conditions under which IP successfully recovers M from $M_\mathcal{E}$, and show that under these conditions it takes only O(n) computation time when r = O(1).

In Section 5.3, we cover VLS and ELS, which are iterative decentralized algorithms. VLS is identical to the Alternating Minimization algorithm in [28], where it was used as a local optimization subroutine following a singular value projection, i.e., with a warm start. This was motivated by the fact that VLS was a major component in the award winning algorithms for the Netflix challenge [40], [39]. In this chapter, we also study the use of VLS with a cold start, where it is used as the sole procedure for matrix completion. For the special case when M is rank one, we provide sufficient conditions under which VLS converges to the right solution.

ELS is a new algorithm obtained as a message-passing variation of VLS. We provide experimental results where we observe that ELS significantly outperforms VLS both in terms of sample complexity and convergence speed. This also suggests that replacing VLS with warm start by ELS with warm start in existing algorithms might significantly improve their performance.
5.2 Information Propagation Algorithm

Information Propagation (IP) is a simple sequential decoding algorithm on $\mathcal{G}$ that works when $\mathcal{G}$ has certain "strong connectivity" properties. We first formally state the steps of the algorithm:
INFORMATION PROPAGATION (IP)

1. Initialization:
   • Initialize the set of decoded vertices $D_0 = \{r_1, \dots, r_r\}$ and X = Y = [0].
   • Initialize the set of potential vertices $P_0 = \{v \in V : v$ has at least r neighbors in $D_0\}$.
   • Set $x_i = e_i$ for $i = 1, \dots, r$, where $e_i$ is the ith standard unit vector.

2. Repeat until $P_t = \emptyset$:
   • Pick $v \in P_t$. Assume w.l.o.g. that $v \in V_S$, i.e., $v = s_j$ for some j.
   • Let $r_{i_1}, \dots, r_{i_r} \in D_t$ be neighbors of $s_j$. Set $y_j$ to be a solution to the set of linear equations $x_{i_k}'y_j = M_{i_k j}$, $1 \le k \le r$.
   • Set $D_{t+1} = D_t \cup \{s_j\}$ and update $P_{t+1} = \{v \in V \setminus D_{t+1} : v$ has at least r neighbors in $D_{t+1}\}$.

3. Declare $\hat{M} = XY'$.
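A compact Python rendering of IP is given below. This is a sketch under our own data-structure assumptions (vertices 0..n-1 are $V_R$, vertices n..2n-1 are $V_S$, `adj` is the adjacency list of $\mathcal{G}$, and `M_obs[(i, j)]` holds the revealed entry $M_{ij}$), not the thesis's implementation:

    import numpy as np

    def information_propagation(adj, M_obs, n, r):
        # Sequential decoding: each new vertex is recovered by solving an
        # r x r linear system built from already-decoded neighbors.
        X = np.zeros((2 * n, r))
        X[:r] = np.eye(r)                     # x_i = e_i for the seed rows
        decoded = set(range(r))               # D_0 = {r_1, ..., r_r}
        while True:
            candidates = [v for v in range(2 * n) if v not in decoded
                          and sum(u in decoded for u in adj[v]) >= r]
            if not candidates:
                break
            v = candidates[0]
            nbrs = [u for u in adj[v] if u in decoded][:r]
            A = X[nbrs]                       # rows x_{i_1}', ..., x_{i_r}'
            rhs = np.array([M_obs[(min(u, v), max(u, v) - n)] for u in nbrs])
            X[v] = np.linalg.solve(A, rhs)    # enforce x_u' y_v = M_uv
            decoded.add(v)
        return X[:n], X[n:]                   # estimates of alpha and beta

The naive candidate scan above is quadratic overall; maintaining decoded-neighbor counts incrementally would recover the linear running time discussed below.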
Since $|D_{t+1}| = |D_t| + 1$, the algorithm terminates after at most 2n - r steps. This algorithm has similarities to the rank one recursive completion algorithm in [37]. We present the following theorem regarding the performance of IP, which admits a simple proof.
Theorem 5.1. Assume that $M = \alpha\beta'$ with $\alpha, \beta \in \mathbb{R}^{n\times r}$. Also assume that every collection of r rows of either $\alpha$ or $\beta$ is linearly independent. Suppose that the algorithm IP starting from some initial $D_0$ terminates with $D_T = V$. Then $\hat{M} = M$.

Proof. Let B be the submatrix of $\alpha$ consisting of the first r rows of $\alpha$. Since the first r rows of $\alpha$, given by $\alpha_1, \dots, \alpha_r$, are linearly independent, B is full rank, and we can set $\alpha \leftarrow \alpha B^{-1}$ and $\beta \leftarrow \beta B'$. So w.l.o.g. assume that $\alpha_i = e_i$, $1 \le i \le r$. By the initialization step, $X_{D_0} = \alpha_{D_0}$. Assume that after t steps of the algorithm we have $X_{D_t\cap V_R} = \alpha_{D_t\cap V_R}$ and $Y_{D_t\cap V_S} = \beta_{D_t\cap V_S}$. Let $s_j \in P_t$ be the vertex picked at time t+1 in step 2 of the algorithm and let $r_{i_1}, \dots, r_{i_r} \in D_t$ be neighbors of $s_j$. Then we must have $\alpha_{i_k}'\beta_j = M_{i_k j}$. But since $\alpha_{i_1}, \dots, \alpha_{i_r}$ are linearly independent, the update at step t+1 indeed sets $y_j = \beta_j$. The theorem then follows by induction. □
The main requirement for the success of IP is that it does not terminate prematurely, i.e., that we do not encounter $P_t = \emptyset$ for some t < 2n - r. This requires that $\mathcal{G}$ satisfies a certain "strong connectivity" property. For the special case when M is rank one, assuming that all entries of $\alpha$ and $\beta$ are non-zero, the only requirement on $\mathcal{G}$ is that it is connected. More generally, for r > 1, the required "strong connectivity" property can be defined via the steps of IP itself. The following condition is sufficient, but not necessary.

Assumption 5.1. For each $B \subseteq V$ such that $r \le |B| < 2n$ and $|B \cap V_R| \ge r$, there exists a vertex $v \in V \setminus B$ with at least r neighbors in B.

When the above assumption holds, then unless $|D_t| = 2n$, i.e., unless all vertices have been decoded, the corresponding set of potential vertices $P_t$ that can be decoded is non-empty. Furthermore, this sufficient condition can be verified easily for some graphs, for example random Erdős-Rényi graphs. Specifically, we have the following result.
Theorem 5.2. Let A be a large enough constant such that $A^r e^{-A+r} < 1/4$. Let c be a fixed constant that satisfies $(c/2)e^{-2A} > r + 1$. Let $\mathcal{G}$ be a random bipartite graph on 2n vertices with n vertices in each part, where every edge is present with probability $p = \left(\frac{c\log n}{n}\right)^{1/r}$. Then if r = O(1), the random graph $\mathcal{G}$ satisfies Assumption 5.1 w.h.p. as $n \to \infty$.

Theorem 5.2 says that $|\mathcal{E}| = O(n^{2-1/r}(\log n)^{1/r})$ is sufficient for IP to succeed. This requires an extra $n^{1-1/r}$ factor of revealed entries as compared to $|\mathcal{E}| = O(n\log n)$ in [35]. The benefit is that with these extra entries we can now perform matrix completion using IP in only O(n) time. For the case of rank one matrices, r = 1, we only need connectivity of the graph, and Theorem 5.2 reduces to the well-known $\frac{\log n}{n}$ threshold for connectivity of Erdős-Rényi random graphs.
Proof of Theorem 5.2. For $r = 1$, the theorem is the same as the well-known result about connectivity of Erdős-Rényi random graphs. We will assume for the rest of the proof that $r \ge 2$. We will show that for every partition of $\mathcal{G} = (R \cup S, E)$ into two parts $B$ and $B^c$ with $|B| \ge r$, there exists at least one vertex in $B^c$ that has at least $r$ neighbors in $B$. Assume w.l.o.g. that $|B \cap R| \ge |B \cap S|$. Recall that $|B \cap R| \ge r$. Let $v \in B^c \cap S$. Then the number of neighbors of $v$ in $B \cap R$, which we denote by $D_v$, is distributed as $D_v \sim \mathrm{Bin}(|B \cap R|, p)$. Hence, the probability that none of the vertices in $B^c \cap S$ has at least $r$ neighbors in $B \cap R$ is given by

$$\left[\mathbb{P}\left(\mathrm{Bin}(|B \cap R|, p) < r\right)\right]^{|B^c \cap S|} \le \left[\mathbb{P}\left(\mathrm{Bin}(|B \cap R|, p) < r\right)\right]^{\max\{n - |B \cap R|,\, 1\}}, \tag{5.2}$$

where the inequality follows from the assumption $|B \cap R| \ge |B \cap S|$ and the fact that $|B^c| \ge 1$.
Let $\mathcal{E}$ be the event that Assumption 5.1 is not satisfied. Then by the union bound over all possible sizes of $|B \cap R|$ we get

$$\mathbb{P}(\mathcal{E}) \le \sum_{b=r}^{n-1} \binom{n}{b}\left[\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right)\right]^{n-b} + \mathbb{P}\left(\mathrm{Bin}(n, p) < r\right).$$

We will show that for every $r \le b \le n-1$ the quantity $\binom{n}{b}\left[\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right)\right]^{n-b}$ is $o(n^{-1})$ and that the same is also true for $\mathbb{P}\left(\mathrm{Bin}(n, p) < r\right)$.
We divide the proof into three cases depending on the size of $b$.

Case 1: $b \ge n/2$. In this case we have

$$\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right) \le \mathbb{P}\left(\mathrm{Bin}(n/2, p) < r\right) \le r\binom{n/2}{r} p^r (1-p)^{n/2 - r} \le r\, n^r\, \frac{c\log n}{n}\, e^{-p(n/2 - r)} = e^{-\Omega(\sqrt{n})},$$

where the last step uses $p = (c\log n / n)^{1/r} \ge n^{-1/2}$ (since $r \ge 2$). Therefore,

$$\binom{n}{b}\left[\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right)\right]^{n-b} \le \left(n\, e^{-\Omega(\sqrt{n})}\right)^{n-b} = o(n^{-1}). \tag{5.3}$$

Case 2: $A/p \le b \le n/2$.
Again we have

$$\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right) \le r\binom{b}{r} p^r (1-p)^{b-r} \le (bp)^r e^{-p(b-r)} \le A^r e^{-A+r} < 1/4,$$

where we used $bp \ge A$ and the choice of $A$. Hence,

$$\binom{n}{b}\left[\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right)\right]^{n-b} \le \binom{n}{b}\, 4^{-(n-b)} \le n^{-2} = o(n^{-1}).$$
Case 3: $r \le b \le A/p$. We get

$$\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right) \le 1 - \mathbb{P}\left(\mathrm{Bin}(b, p) = r\right) \le 1 - \binom{b}{r} p^r (1-p)^{b-r} \le 1 - b\,\frac{c\log n}{n}\, e^{-2A} \quad \text{for large enough } n,$$

using $p^r = \frac{c\log n}{n}$ and $(1-p)^{b-r} \ge e^{-2pb} \ge e^{-2A}$ for large enough $n$. Hence,

$$\binom{n}{b}\left[\mathbb{P}\left(\mathrm{Bin}(b, p) < r\right)\right]^{n-b} \le n^b \left(1 - b\,\frac{c\log n}{n}\, e^{-2A}\right)^{n-b} \le n^{-b\left[(c/2)e^{-2A} - 1\right]} = o(n^{-1}),$$

using $n - b \ge n/2$ and the choice $(c/2)e^{-2A} > r + 1$. $\square$
5.3 Vertex Least Squares and Edge Least Squares Algorithms
In this section we provide two iterative decentralized algorithms that attempt to
solve the non-convex least squares problem in (5.1). The first is what we call the
Vertex Least Squares (VLS) algorithm.
VERTEX LEAST SQUARES (VLS)

1. Initialization:
   * $x_{i,0},\, y_{j,0} \in \mathbb{R}^r$, $i, j \in [n]$
   * Maximum iterations $= T$
2. For $t = 1$ to $T$: For each $r_i \in R$, set
   $$x_{i,t+1} = \arg\min_{x \in \mathbb{R}^r} \sum_{j: s_j \sim r_i} \left(x' y_{j,t} - M_{ij}\right)^2. \tag{5.4}$$
   For each $s_j \in S$, set
   $$y_{j,t+1} = \arg\min_{y \in \mathbb{R}^r} \sum_{i: r_i \sim s_j} \left(y' x_{i,t+1} - M_{ij}\right)^2. \tag{5.5}$$
3. Declare $\hat{M} = X_T Y_T'$.
Each iteration of VLS consists of solving $2n$ least squares problems, so the total computation time per iteration is $O(r^2 \Delta n)$.
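As a concrete illustration, the following Python sketch runs the VLS updates (5.4)-(5.5) over a dense boolean mask of revealed entries; the initialization range and helper names are our own choices, not prescribed by the thesis.

```python
import numpy as np

def vls(M, mask, r, T, seed=None):
    """T iterations of Vertex Least Squares. M is n x n (only entries with
    mask == True are used); returns factors X, Y with estimate X @ Y.T."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X = rng.uniform(0.1, 1.0, size=(n, r))
    Y = rng.uniform(0.1, 1.0, size=(n, r))
    for _ in range(T):
        for i in range(n):                 # row updates, Eq. (5.4)
            J = np.flatnonzero(mask[i])
            X[i], *_ = np.linalg.lstsq(Y[J], M[i, J], rcond=None)
        for j in range(n):                 # column updates, Eq. (5.5)
            I = np.flatnonzero(mask[:, j])
            Y[j], *_ = np.linalg.lstsq(X[I], M[I, j], rcond=None)
    return X, Y
```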
The VLS algorithm in the above form is identical to Alternating Minimization
[28] and exploits the biconvex structure of the objective in (5.1). We prefer to write
the iterations of this algorithm in the above form to highlight the local decentralized
nature of the updates at each vertex. In [28], this algorithm was used as a local
minimization subroutine with a warm start provided by an SVD projection step
prior to it. However, we observe in our experiments that in many cases, VLS by
itself can succeed in reconstructing M.
To support our observations, we present the following theorem regarding the convergence of VLS in the special case of positive matrices with rank one. Recall that the Frobenius norm of a matrix $A$ is denoted by $\|A\|_F$ and is given by $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$.

Theorem 5.3. Let $M = \alpha\beta'$ with $\alpha, \beta \in \mathbb{R}^n$, and suppose there exists $0 < b < 1$ such that for all $i, j \in [n]$ we have $b \le \alpha_i, \beta_j \le 1/b$. Suppose that the graph $\mathcal{G}$ is connected, has diameter $d = c\log n$ for some fixed constant $c$, and has maximum degree $\Delta = O(1)$. Suppose that VLS is initialized at $b \le x_{i,0}, y_{j,0} \le 1/b$ for $i, j \in [n]$. Then, there exists a constant $a_1 > 0$ such that given any $\epsilon > 0$, there exists a second constant $a_2 > 0$ for which after $T = a_2 n^{a_1}\log n$ iterations of VLS, we have $\frac{1}{n}\|X_T Y_T' - M\|_F \le \epsilon$.
Before proceeding to the proof of Theorem 5.3, we remark here that in [28], the success of VLS was established by showing that the VLS updates resemble a power method with bounded error. In our proof we also show that the VLS updates are like time-varying power method updates, but without any error term. In [28], warm start VLS required that the principal angle distance between the left and right singular vector subspaces of the actual matrix $M$ and the initial iterates be at most $0.5$. With the conditions given in Theorem 5.3, this may not always be the case. From [28], the subspace distance between two vectors $u$ and $v$ (the rank-one case) is given by

$$d(u, v) = \sqrt{1 - \frac{(u'v)^2}{\|u\|^2\,\|v\|^2}}. \tag{5.6}$$

Suppose that

$$\alpha_i = \begin{cases} b, & 1 \le i \le n/2, \\ 1/b, & n/2 + 1 \le i \le n, \end{cases} \qquad x_{i,0} = \begin{cases} 1/b, & 1 \le i \le n/2, \\ b, & n/2 + 1 \le i \le n. \end{cases} \tag{5.7}$$

Then $d(x_0, \alpha) = \sqrt{1 - \left(\frac{2}{b^2 + b^{-2}}\right)^2}$, which is greater than $1/2$ when $b$ is a small constant. In fact, the subspace distance can be very close to one. Nevertheless, according to Theorem 5.3, VLS converges to the correct solution.
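A quick numeric check of this point (our own illustration, not from the thesis): for the initialization (5.7) with a small $b$, the distance (5.6) is nearly one, far beyond the $0.5$ required by the warm start analysis of [28].

```python
import numpy as np

def subspace_distance(u, v):
    # principal angle distance (5.6) between the lines spanned by u and v
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.sqrt(1.0 - c**2)

n, b = 1000, 0.1
alpha = np.r_[np.full(n // 2, b), np.full(n // 2, 1.0 / b)]
x0 = np.r_[np.full(n // 2, 1.0 / b), np.full(n // 2, b)]
print(subspace_distance(x0, alpha))   # about 0.9998
```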
Proof of Theorem 5.3. From the update rules for VLS in Eqs. (5.4)-(5.5), we can write

$$x_{i,t+1} = \frac{\sum_{j: s_j \sim r_i} M_{ij}\, y_{j,t}}{\sum_{j: s_j \sim r_i} y_{j,t}^2} \quad \text{and} \quad y_{j,t+1} = \frac{\sum_{i: r_i \sim s_j} M_{ij}\, x_{i,t+1}}{\sum_{i: r_i \sim s_j} x_{i,t+1}^2}. \tag{5.8}$$

Let $A$ be the partial adjacency matrix of $\mathcal{G}$, i.e.,

$$A_{ij} = \begin{cases} 1, & \text{if } r_i \sim s_j, \\ 0, & \text{otherwise.} \end{cases}$$

Define $u_{i,t} = x_{i,t}/\alpha_i$ and $v_{j,t} = y_{j,t}/\beta_j$. With the chosen initial conditions in the theorem, we have that $b^2 \le u_{i,0}, v_{j,0} \le 1/b^2$. Using (5.8), the updates for $u_{i,t}$ and $v_{j,t}$ can be written as

$$u_{i,t+1} = \frac{\sum_{j: s_j \sim r_i} \beta_j^2\, v_{j,t}}{\sum_{j: s_j \sim r_i} \beta_j^2\, v_{j,t}^2} \quad \text{and} \quad v_{j,t+1} = \frac{\sum_{i: r_i \sim s_j} \alpha_i^2\, u_{i,t+1}}{\sum_{i: r_i \sim s_j} \alpha_i^2\, u_{i,t+1}^2}. \tag{5.9}$$

Each update in (5.9) expresses the new iterate as a convex combination of the reciprocals of the previous iterates ($u_{i,t+1}$ of the $1/v_{j,t}$, and $1/v_{j,t+1}$ of the $u_{i,t+1}$), which implies that all future iterates satisfy $b^2 \le u_{i,t}, v_{j,t} \le 1/b^2$ and $b \le x_{i,t}, y_{j,t} \le 1/b$. Combining the two updates in (5.9), we see that $u_{i,t+1}$ can be expressed as a convex combination of the $u_{i,t}$, i.e., there exists a stochastic matrix $P_t$ such that $u_{t+1} = P_t u_t$, where $u_t = (u_{i,t}, i \in [n])$ is expressed as a column vector. The support of $P_t$ is the same as the support of $AA'$, i.e., $P_t$ is the transition probability matrix of a random walk on $(V_R, E_R)$, where $(i_1, i_2) \in E_R$ if and only if $i_2$ is a distance-two neighbor of $i_1$ in $\mathcal{G}$. Although $P_t$ depends on $t$, we can prove some useful properties satisfied by $P_t$ that hold for all times $t$, as stated in the following lemma.

Lemma 5.1. There exists a constant $0 < \gamma < 1$ that depends on $b$ and $\Delta$ such that the non-zero entries of $P_t$ satisfy $P_{t,i_1 i_2} \ge \gamma$ for all $t$.
Proof. Notice that $P_t = R_t^{(1)} R_t^{(2)}$, where

$$R^{(1)}_{t,ij} = \frac{A_{ij}\,\beta_j^2 v_{j,t}^2}{\sum_k A_{ik}\,\beta_k^2 v_{k,t}^2} \quad \text{and} \quad R^{(2)}_{t,ji} = \frac{A_{ij}\,\alpha_i^2 u_{i,t}}{\sum_k A_{kj}\,\alpha_k^2 u_{k,t}}.$$

Since $b \le \alpha_i, \beta_j \le 1/b$ and $b^2 \le u_{i,t}, v_{j,t} \le 1/b^2$, each non-zero entry of $R_t^{(1)}$ and $R_t^{(2)}$ is bounded below by a constant that depends only on $b$ and $\Delta$. Hence the non-zero entries of $P_t$ must be at least some $\gamma = \gamma(b, \Delta) > 0$. $\square$
Given that the diameter of $\mathcal{G}$ is $d$, define the sequence of matrices $\{Q_k\}_{k \ge 1}$ as

$$Q_k = \prod_{t=(k-1)d+1}^{kd} P_t. \tag{5.10}$$

Then for any $k$, $Q_k$ satisfies $Q_{k,i_1 i_2} \ge \gamma^d = \gamma^{c\log n} = n^{-a_1}$, where $a_1 = -c\log\gamma > 0$. Let $w_{t,i} = u_{td,i}$, $i \in [n]$. Then,

$$\max_i w_{t+1,i} \le \max_i w_{t,i}\,(1 - n^{-a_1}) + \min_i w_{t,i}\, n^{-a_1},$$
$$\min_i w_{t+1,i} \ge \min_i w_{t,i}\,(1 - n^{-a_1}) + \max_i w_{t,i}\, n^{-a_1}.$$

Combining the above gives

$$\left(\max_i w_{t+1,i} - \min_i w_{t+1,i}\right) \le \left(1 - 2n^{-a_1}\right)\left(\max_i w_{t,i} - \min_i w_{t,i}\right). \tag{5.11}$$

Let $\delta > 0$ be a small enough constant such that $1 - \epsilon < \frac{b-\delta}{b+\delta}$ and $\frac{b+\delta}{b-\delta} < 1 + \epsilon$. Choose $a_2$ to be a large enough constant such that $e^{-2a_2} < \delta$. Then, applying (5.11) recursively for $a_2 n^{a_1}$ steps, we get

$$\left(\max_i w_{a_2 n^{a_1},i} - \min_i w_{a_2 n^{a_1},i}\right) \le \left(1 - 2n^{-a_1}\right)^{a_2 n^{a_1}}\left(\max_i w_{0,i} - \min_i w_{0,i}\right) \tag{5.12}$$
$$\le \delta. \tag{5.13}$$

Substituting the definition of $w_{t,i}$, we get $\left(\max_i u_{T,i} - \min_i u_{T,i}\right) \le \delta$, where $T = a_2 c\, n^{a_1}\log n$. This means there exists a constant $b < B < 1/b$ such that $u_{T,i} \in (B - \delta, B + \delta)$ for all $i \in [n]$. From (5.9), we get that $1/v_{T,j} \in (B - \delta, B + \delta)$ for all $j \in [n]$. This gives

$$u_{i,T}\, v_{j,T} \in \left(\frac{B-\delta}{B+\delta}, \frac{B+\delta}{B-\delta}\right) \subset (1-\epsilon, 1+\epsilon).$$

Hence,

$$\left|M_{ij} - x_{i,T}\, y_{j,T}\right| = \left|\alpha_i\beta_j\left(1 - u_{i,T}\, v_{j,T}\right)\right| \le \frac{\epsilon}{b^2}, \tag{5.14}$$

which completes the proof. $\square$
We now proceed to describe the Edge Least Squares (ELS) algorithm, which is a message-passing version of the VLS algorithm.
EDGE LEAST SQUARES (ELS)

1. Initialization:
   * $x_{i \to j,0},\, y_{j \to i,0} \in \mathbb{R}^r$, $(i, j) \in E$
   * Maximum iterations $= T$
2. For $t = 1, \ldots, T$: For each $r_i \in R$ and $j: r_i \sim s_j$ set
   $$x_{i \to j,t+1} = \arg\min_{x \in \mathbb{R}^r} \sum_{k: s_k \sim r_i,\, k \ne j} \left(x' y_{k \to i,t} - M_{ik}\right)^2. \tag{5.15}$$
   For each $s_j \in S$ and $i: r_i \sim s_j$ set
   $$y_{j \to i,t+1} = \arg\min_{y \in \mathbb{R}^r} \sum_{k: r_k \sim s_j,\, k \ne i} \left(y' x_{k \to j,t+1} - M_{kj}\right)^2. \tag{5.16}$$
3. Compute $x_{i,T} = \frac{1}{\deg(r_i)} \sum_{j: s_j \sim r_i} x_{i \to j,T}$ and $y_{j,T} = \frac{1}{\deg(s_j)} \sum_{i: r_i \sim s_j} y_{j \to i,T}$.
4. Declare $\hat{M} = X_T Y_T'$.
Each iteration of ELS consists of solving $2|E|$ least squares problems, so the total computation time per iteration is $O(r^2 \Delta |E|)$.
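The following Python sketch performs one ELS sweep of the message updates (5.15)-(5.16), storing one $\mathbb{R}^r$ vector per directed edge; it assumes every vertex has degree at least $r + 1$ so that each reduced least squares problem remains determined, and the names are illustrative.

```python
import numpy as np

def els_sweep(M, row_nbrs, col_nbrs, x_msg, y_msg):
    """One ELS iteration. x_msg[(i, j)] and y_msg[(j, i)] are the R^r
    messages on the directed edges of the revelation graph."""
    for i, js in row_nbrs.items():              # Eq. (5.15)
        for j in js:
            A = np.array([y_msg[(k, i)] for k in js if k != j])
            b = np.array([M[i, k] for k in js if k != j])
            x_msg[(i, j)], *_ = np.linalg.lstsq(A, b, rcond=None)
    for j, is_ in col_nbrs.items():             # Eq. (5.16)
        for i in is_:
            A = np.array([x_msg[(k, j)] for k in is_ if k != i])
            b = np.array([M[k, j] for k in is_ if k != i])
            y_msg[(j, i)], *_ = np.linalg.lstsq(A, b, rcond=None)
    return x_msg, y_msg
```

Averaging the outgoing messages at each vertex, as in step 3 above, then yields the factor estimates $X_T$ and $Y_T$.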
For the special case of rank-one matrices, it is possible to conduct an analysis of the ELS iterations along the lines of the proof of Theorem 5.3 for VLS. Let $\mathcal{H}$ be the dual graph on the directed edges of $\mathcal{G}$. Here $\mathcal{H} = (V_H, E_H)$, where $V_H = V_{H,R} \cup V_{H,S}$, the set $V_{H,R}$ consists of all directed edges $(r_i, s_j)$ of $\mathcal{G}$, and the set $V_{H,S}$ consists of all directed edges $(s_j, r_i)$ of $\mathcal{G}$. Additionally, $(r_i, s_j) \in V_{H,R}$ and $(s_k, r_l) \in V_{H,S}$ are neighbors if and only if $j = k$ and $l \ne i$. Similar to (5.8), we can write the corresponding update rules for ELS as follows:

$$x_{i \to j,t+1} = \frac{\sum_{k: s_k \sim r_i,\, k \ne j} M_{ik}\, y_{k \to i,t}}{\sum_{k: s_k \sim r_i,\, k \ne j} y_{k \to i,t}^2} \quad \text{and} \quad y_{j \to i,t+1} = \frac{\sum_{k: r_k \sim s_j,\, k \ne i} M_{kj}\, x_{k \to j,t+1}}{\sum_{k: r_k \sim s_j,\, k \ne i} x_{k \to j,t+1}^2}. \tag{5.17}$$

Define $u_{i \to j,t} = x_{i \to j,t}/\alpha_i$ and $v_{j \to i,t} = y_{j \to i,t}/\beta_j$. Then, similar to (5.9), we can write the corresponding update rules for ELS as follows:

$$u_{i \to j,t+1} = \frac{\sum_{k: s_k \sim r_i,\, k \ne j} \beta_k^2\, v_{k \to i,t}}{\sum_{k: s_k \sim r_i,\, k \ne j} \beta_k^2\, v_{k \to i,t}^2} \quad \text{and} \quad v_{j \to i,t+1} = \frac{\sum_{k: r_k \sim s_j,\, k \ne i} \alpha_k^2\, u_{k \to j,t+1}}{\sum_{k: r_k \sim s_j,\, k \ne i} \alpha_k^2\, u_{k \to j,t+1}^2}. \tag{5.18}$$
Again, as before, defining $u_t = (u_{i \to j,t},\, r_i \sim s_j)$ we can write $u_{t+1} = P_t u_t$ for some stochastic matrix $P_t$. The support of $P_t$ is the graph on $V_{H,R}$ in which two vertices are adjacent if and only if they are distance-two neighbors in $\mathcal{H}$. From the above equations, it is apparent that it is possible to prove a result similar to Theorem 5.3 for ELS. We state the result below and omit the proof, as it is identical to the proof of Theorem 5.3.
Theorem 5.4. Let $M = \alpha\beta'$ with $\alpha, \beta \in \mathbb{R}^n$ and suppose there exists $0 < b < 1$ such that for all $i, j \in [n]$ we have $b \le \alpha_i, \beta_j \le 1/b$. Suppose that the graph $\mathcal{H}$ is connected, has diameter $d = c\log n$ for some fixed constant $c$, and has maximum degree $\Delta = O(1)$. Suppose that ELS is initialized at $b \le x_{i \to j,0}, y_{j \to i,0} \le 1/b$. Then, there exists a constant $a_1 > 0$ such that given any $\epsilon > 0$, there exists a second constant $a_2 > 0$ for which after $T = a_2 n^{a_1}\log n$ iterations of ELS, we have $\frac{1}{n}\|X_T Y_T' - M\|_F \le \epsilon$.
5.4 Experiments

In this section, we provide simulation results for the VLS and ELS algorithms, with particular emphasis on

(a) Showing the faster convergence of ELS as compared to VLS in several randomly generated instances.

(b) Success rates for the VLS and ELS algorithms for randomly generated instances and random initializations.
In view of Theorem 5.3, we generate $\alpha, \beta \in \mathbb{R}^n$ at random from $U[0.01, 0.99]$. We then compare the decay of the root mean square (RMS) error, defined below in (5.19), with the number of iterations for the two algorithms. We perform this experiment on a random 3-regular bipartite graph $G = (V, E)$ on $2n$ vertices with $n$ vertices on each side, and keep the graph fixed for the experiment. We then run VLS and ELS on the revealed entries. Random regular graphs are known to be connected with high probability, and we did not find significant variation in results by changing the graph. Since ELS requires about a factor $\Delta$ more computation per iteration, we plot the decay of RMS versus a normalized iteration index, defined as $\frac{\text{iteration number}}{\text{total iterations}}$ for VLS and $\Delta \cdot \frac{\text{iteration number}}{\text{total iterations}}$ for ELS.
The root mean square (RMS) error after $T$ iterations of either VLS or ELS is defined as

$$\mathrm{RMS} = \frac{1}{n}\left\|M - X_T Y_T'\right\|_F. \tag{5.19}$$
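A small helper (ours) computing (5.19) from the current factors:

```python
import numpy as np

def rms(M, X, Y):
    # RMS error (5.19) of the factorization X @ Y.T against M
    return np.linalg.norm(M - X @ Y.T) / M.shape[0]
```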
Figure 5-1: RMS vs. number of iterations (normalized) for VLS and ELS
The comparison in Figure 5-1 (computed for n = 100) demonstrates that ELS
converges faster than VLS.
For the case of rank $r > 1$, we choose $\alpha, \beta \in \mathbb{R}^{n \times r}$ randomly by generating each entry of the two matrices independently and uniformly from the interval $[-1, 1]$. We then construct the random revelation graph $\mathcal{G}$ in two steps. First, we generate a random $(r+1)$-regular bipartite graph $\mathcal{G} = (V, E_0)$. Second, we generate another edge set $E_1$, where each edge exists independently with probability $c/n$. We then superimpose the two sets of edges and set $E = E_0 \cup E_1$. The first step ensures that $\mathcal{G}$ has minimum degree $r + 1$, which in turn ensures that the least squares optimization problems (5.4), (5.5) and (5.15), (5.16) involved in the update rules of VLS and ELS have unique solutions. The second step allows us to control the number of revealed entries via the quantity $c$. In particular, the above random graph model results in $\mathcal{G}$ having an average degree of approximately $r + 1 + c$, which corresponds to approximately $(r + 1 + c)n$ revealed entries.
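The sketch below reproduces this two-step construction; sampling the regular part as a superposition of $r + 1$ random perfect matchings is our own simplification (the rare parallel edges get merged, so a few degrees may fall slightly below $r + 1$).

```python
import numpy as np

def revelation_mask(n, r, c, seed=None):
    """Revelation graph for the rank-r experiments: (r+1) random perfect
    matchings superimposed, plus Erdos-Renyi edges with probability c/n."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for _ in range(r + 1):                     # the (r+1)-regular part
        mask[np.arange(n), rng.permutation(n)] = True
    mask |= rng.random((n, n)) < c / n         # extra edges, about c*n of them
    return mask
```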
We plot the empirical fraction of failures obtained from 50 random trials as a function of $c$. A failure is declared when the algorithm (VLS or ELS) fails to achieve an RMS below a small fixed threshold. Figure 5-2 and Figure 5-3 show the results for ELS for $r = 2$ and $r = 3$, respectively. This provides evidence for the success of ELS even with a cold start. Figure 5-4 does the same for VLS with a cold start for $r = 2$, showing that it does not always succeed.
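Tying the earlier sketches together, a hypothetical harness for estimating this failure fraction for VLS at a given $c$ (the tolerance below stands in for the threshold used in our experiments):

```python
import numpy as np

def failure_fraction(n, r, c, T=200, trials=50, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    fails = 0
    for _ in range(trials):
        alpha = rng.uniform(-1.0, 1.0, size=(n, r))
        beta = rng.uniform(-1.0, 1.0, size=(n, r))
        M = alpha @ beta.T                     # planted rank-r matrix
        mask = revelation_mask(n, r, c, rng)   # revealed entries
        X, Y = vls(M, mask, r, T, rng)         # cold start VLS
        fails += rms(M, X, Y) > tol            # failure if RMS stays large
    return fails / trials
```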
5.5 Discussion and Future Directions
The performance of ELS in our experiments suggests that cold start ELS has good sample complexity and is quite usable on its own. However, we currently have very limited understanding of the reasons for its success, especially when the rank of $M$ is greater than one. It would be interesting in future work to find a theoretical framework for the analysis of ELS.
Figure 5-2: ELS: Failure fraction vs c for r = 2 with planted 3-regular graph (n = 100)
The faster convergence and better sample complexity of ELS also suggest that a warm start version of ELS (perhaps following an SVD projection step as in [35], [28]) can be very successful and can offer significant improvements over algorithms that use other warm start subroutines such as VLS or manifold gradient descent. From a theoretical standpoint, it may be possible to extend the analysis of warm start VLS in [28] to prove convergence of warm start ELS.
Figure 5-3: ELS: Failure fraction vs c for r = 3 with planted 4-regular graph (n = 100)
Figure 5-4: VLS: Failure fraction vs c for r = 2 with planted 3-regular graph (n = 100)
Bibliography
[1] S. Arora. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM, 45(6), 1998.
[2] S. Asmussen and H. Hering. Branching Processes, volume 3 of Progress in Probability and Statistics. Birkhäuser Boston Inc., Boston, MA, 1983.
[3] A. Bandyopadhyay and D. Gamarnik. Counting without sampling: Asymptotics of the log-partition function for certain statistical physics models. Random Structures and Algorithms, 33(4):452-479, 2008.
[4] E. A. Bender and E. R. Canfield. The asymptotic number of labelled graphs with given degree sequences. Journal of Combinatorial Theory, 24:296-307, 1978.
[5] P. Berman and M. Karpinski. On some tighter inapproximability results, further improvements. ECCC, TR98-065, 1998.
[6] J. D. Biggins and N. H. Bingham. Large deviations in the supercritical branching process. Adv. Appl. Probab., 23(4):757-772, 1993.
[7] B. Bollobás. Random Graphs. Academic Press, 1985.
[8] B. Bollobás and O. Riordan. An old approach to the giant component problem. 2012.
[9] M. Boss, H. Elsinger, M. Summer, and S. Thurner. Network topology of the interbank market. Quantitative Finance, 4(6), 2004.
[10] E. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
[11] E. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080, 2009.
[12] N. Chen and M. Olvera-Cravioto. Directed random graphs with given degree distributions. arXiv:1207.2475, 2013.
[13] R. Dobrushin. Prescribing a system of random variables by the help of conditional distributions. Theory of Probability and its Applications, 15:469-497, 1970.
[14] S. Dubuc. La densité de la loi-limite d'un processus en cascade expansif. Z. Wahrsch. Verw. Gebiete, 19:281-290, 1971.
[15] M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM, 38(1):1-7, 1991.
[16] P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5:17-61, 1960.
[17] K. Fleischmann and V. Wachtel. Lower deviation probabilities for supercritical Galton-Watson processes. Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 43(2):233-255, 2007.
[18] D. Gamarnik and D. Katz. Correlation decay and deterministic FPTAS for counting list-colorings of a graph. Journal of Discrete Algorithms, 12:29-47, 2012.
[19] D. Gamarnik, T. Nowicki, and G. Swirszcz. Maximum weight independent sets and matchings in sparse random graphs: Exact results using the local weak convergence method. Random Structures and Algorithms, 28(1), 2006.
[20] Q. Ge and D. Stefankovic. Strong spatial mixing of q-colorings on Bethe lattices. arXiv:1102.2886v3, November 2011.
[21] H. O. Georgii. Gibbs Measures and Phase Transitions. Walter de Gruyter and Co., Berlin, 1988.
[22] K. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A. Barabási. The human disease network. PNAS, 104(21), 2007.
[23] L. A. Goldberg, R. Martin, and M. Paterson. Strong spatial mixing with fewer colours for lattice graphs. SICOMP, 35(2):486-517, 2005.
[24] G. R. Grimmett. A theorem about random fields. Bulletin of the London Mathematical Society, 5(1):81-84, 1973.
[25] H. Hatami and M. Molloy. The scaling window for a random graph with a given degree sequence. Random Structures and Algorithms, 41:99-123, 2012.
[26] T. P. Hayes and E. Vigoda. Coupling with the stationary distribution and improved sampling for colorings and independent sets. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 971-979, 2005.
[27] M. O. Jackson. Social and Economic Networks. Princeton University Press, 2008.
[28] P. Jain, P. Netrapalli, and S. Sanghavi. Low rank matrix completion using alternating minimization. arXiv:1212.0467v1, 2012.
[29] S. Janson and M. Luczak. A new approach to the giant component problem. Random Structures and Algorithms, 37(2):197-216, 2008.
[30] M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial time approximation algorithm for the permanent of a matrix with non-negative entries. Journal of the ACM, 51(4):671-697, 2004.
[31] M. R. Jerrum. A very simple algorithm for counting the number of k-colourings of a low-degree graph. Random Structures and Algorithms, 7(2):157-165, 1995.
[32] J. Jonasson. Uniqueness of uniform random colorings on regular trees. Statistics and Probability Letters, 57:243-248, 2002.
[33] Y. Kabashima, F. Krzakala, M. Mézard, A. Sakata, and L. Zdeborová. Phase transitions and sample complexity in Bayes-optimal matrix factorization. arXiv:1402.1298, 2014.
[34] M. Kang and T. G. Seierstad. The critical phase for random graphs with a given degree sequence. Combinatorics, Probability and Computing, 17:67-86, 2008.
[35] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 2010.
[36] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057-2078, 2010.
[37] R. H. Keshavan, S. Oh, and A. Montanari. Learning low rank matrices from O(n) entries. Proceedings of the Allerton Conference on Communication, Control and Computing, September 2008.
[38] H. Kesten and B. P. Stigum. A limit theorem for multidimensional Galton-Watson processes. The Annals of Mathematical Statistics, 37(5):1211-1223, 1966.
[39] Y. Koren. The BellKor solution to the Netflix Grand Prize. 2009.
[40] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.
[41] R. Meka, P. Jain, C. Caramanis, and I. S. Dhillon. Rank minimization via online learning. ICML, pages 656-663, 2008.
[42] M. Molloy. The Glauber dynamics on colorings of a graph with high girth and maximum degree. SIAM Journal on Computing, 33(3):712-734, 2004.
[43] M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms, 6:161-180, 1995.
[44] M. Molloy and B. Reed. The size of the largest component of a random graph on a fixed degree sequence. Combinatorics, Probability and Computing, 7:295-306, 1998.
[45] J. L. Morrison, R. Breitling, D. J. Higham, and D. R. Gilbert. A lock-and-key model for protein-protein interactions. Bioinformatics, 22(16), 2006.
[46] M. E. J. Newman. The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. USA, 98, 2001.
[47] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64(026118), 2001.
[48] O. Riordan. The phase transition in the configuration model. Combinatorics, Probability and Computing, 21:265-299, 2012.
[49] B. Simon. The Statistical Mechanics of Lattice Gases. Princeton Series in Physics, Princeton University Press, Princeton, NJ, 1, 1993.
[50] A. Sly. Computational transition at the uniqueness threshold. In Foundations of Computer Science (FOCS), pages 287-296. IEEE, 2010.
[51] E. Vigoda. Improved bounds for sampling colorings. Journal of Mathematical Physics, 41(3):1555-1569, 2000.
[52] D. Weitz. Mixing in Time and Space for Discrete Spin Systems. PhD thesis, University of California Berkeley, May 2004.
[53] D. Weitz. Counting independent sets up to the tree threshold. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (STOC), pages 140-149, 2006.
[54] N. C. Wormald. Some Problems in the Enumeration of Labelled Graphs. PhD thesis, Newcastle University, 1978.
[55] N. C. Wormald. Differential equations for random processes and random graphs. Annals of Applied Probability, 5:1217-1235, 1995.
[56] M. Yildirim, K. Goh, M. E. Cusick, A. Barabási, and M. Vidal. Drug-target network. Nat Biotechnol, 25, 2007.