MAT 585: Exact Recovery of the Semidefinite Relaxation for Stochastic Block Model
Afonso S. Bandeira
April 7, 2015
Today we consider a semidefinite programming relaxation algorithm for the stochastic block model (SBM) and derive conditions for exact recovery. The main ingredient of the proof will be duality theory.
1 Quick Recap

• n nodes, two clusters of equal size n/2; p and q are the probabilities of intra-cluster and inter-cluster edges, respectively.
• MLE approach:
$$\max_x \sum_{i,j} A_{ij} x_i x_j \quad \text{s.t.} \quad x_i = \pm 1 \ \forall i, \quad \sum_j x_j = 0, \tag{1}$$
where A is the adjacency matrix of the graph G,
$$A_{ij} = \begin{cases} 1 & \text{if } (i,j) \in E(G) \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
• MLE recovery condition:
$$\frac{\alpha + \beta}{2} - \sqrt{\alpha\beta} > 1. \tag{3}$$
2 The algorithm
Note that if we remove the constraint that $\sum_j x_j = 0$ in (1) then the optimal solution becomes $x = 1$. Let us define $B = 2A - (11^T - I)$, meaning that
$$B_{ij} = \begin{cases} 0 & \text{if } i = j \\ 1 & \text{if } (i,j) \in E(G) \\ -1 & \text{otherwise.} \end{cases} \tag{4}$$
It is clear that the problem
$$\max_x \sum_{i,j} B_{ij} x_i x_j \quad \text{s.t.} \quad x_i = \pm 1 \ \forall i, \quad \sum_j x_j = 0 \tag{5}$$
has the same solution as (1). However, when the constraint is dropped,
$$\max_x \sum_{i,j} B_{ij} x_i x_j \quad \text{s.t.} \quad x_i = \pm 1 \ \forall i, \tag{6}$$
x = 1 is no longer an optimal solution. Intuitively, there is enough “−1”
contribution to discourage unbalanced partitions. In fact, (6) is the problem
we’ll set ourselves to solve.
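To make (6) concrete, here is a minimal brute-force sketch in Python (our illustration, not part of the original notes); it enumerates all $2^n$ sign vectors, so it is only usable for very small n, consistent with the hardness discussed next.

```python
import itertools

import numpy as np

def build_B(A):
    """B = 2A - (11^T - I), as in (4)."""
    n = A.shape[0]
    return 2 * A - (np.ones((n, n)) - np.eye(n))

def brute_force_partition(B):
    """Exhaustively maximize x^T B x over x in {-1, +1}^n (problem (6))."""
    n = B.shape[0]
    best_val, best_x = -np.inf, None
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs)
        val = x @ B @ x
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x
```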
Unfortunately (6) is in general NP-hard (one can encode, for example,
Max-Cut by picking an appropriate B). We will relax it to an easier problem
by the same technique used to approximate the Max-Cut problem in the
previous section (this technique is often known as matrix lifting). If we
write $X = xx^T$ then we can formulate the objective of (6) as
$$\sum_{i,j} B_{ij} x_i x_j = x^T B x = \operatorname{Tr}(x^T B x) = \operatorname{Tr}(B x x^T) = \operatorname{Tr}(BX).$$
Also, the condition $x_i = \pm 1$ implies $X_{ii} = x_i^2 = 1$. This means that (6) is equivalent to
$$\max \ \operatorname{Tr}(BX) \quad \text{s.t.} \quad X_{ii} = 1 \ \forall i, \quad X = xx^T \ \text{for some} \ x \in \mathbb{R}^n. \tag{7}$$
The fact that $X = xx^T$ for some $x \in \mathbb{R}^n$ is equivalent to $\operatorname{rank}(X) = 1$ and $X \succeq 0$. This means that (6) is equivalent to
$$\max \ \operatorname{Tr}(BX) \quad \text{s.t.} \quad X_{ii} = 1 \ \forall i, \quad X \succeq 0, \quad \operatorname{rank}(X) = 1. \tag{8}$$
We now relax the problem and, as before, remove the non-convex rank
constraint
$$\max \ \operatorname{Tr}(BX) \quad \text{s.t.} \quad X_{ii} = 1 \ \forall i, \quad X \succeq 0. \tag{9}$$
This is an SDP that can be solved (up to arbitrary precision) in polynomial
time [2].
Since we removed the rank constraint, the solution to (9) is no longer guaranteed to be rank 1. We will take a different approach from the one we used before to obtain an approximation ratio for Max-Cut, which was a worst-case approximation guarantee. What we will show is that, for some values of α and β, with high probability, the solution to (9) not only satisfies the rank constraint but coincides with $X = gg^T$, where g corresponds to the true partition. After X is computed, g is simply obtained as its leading eigenvector.
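As an illustration, here is a sketch of how one might solve (9) and read off the partition using the CVXPY modeling library (CVXPY and the helper name are our choices; the notes do not prescribe a particular solver):

```python
import cvxpy as cp
import numpy as np

def sdp_partition(A):
    """Solve the SDP relaxation (9) and return a +/-1 labeling.

    When the optimal X has rank 1, its leading eigenvector is +/- g,
    so the returned signs recover the true partition.
    """
    n = A.shape[0]
    B = 2 * A - (np.ones((n, n)) - np.eye(n))
    X = cp.Variable((n, n), symmetric=True)
    problem = cp.Problem(cp.Maximize(cp.trace(B @ X)),
                         [cp.diag(X) == 1, X >> 0])
    problem.solve()
    # Leading eigenvector of the (numerically) optimal X.
    _, eigvecs = np.linalg.eigh(X.value)
    return np.sign(eigvecs[:, -1])
```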
3 The analysis
Without loss of generality, we can assume that $g = (1, \ldots, 1, -1, \ldots, -1)^T$, meaning that the true partition corresponds to the first $n/2$ nodes on one side and the remaining $n/2$ on the other.
3.1 Some preliminary definitions
Recall that the degree matrix D of a graph G is a diagonal matrix where each diagonal coefficient $D_{ii}$ corresponds to the number of neighbours of vertex i, and that $\lambda_2(M)$ denotes the second smallest eigenvalue of a symmetric matrix M.
Definition 1 Let $G^+$ (resp. $G^-$) be the subgraph of G that includes the edges that link two nodes in the same community (resp. in different communities) and A the adjacency matrix of G. We denote by $D_{G^+}$ (resp. $D_{G^-}$) the degree matrix of $G^+$ (resp. $G^-$) and define the Stochastic Block Model Laplacian to be
$$L_{SBM} = D_{G^+} - D_{G^-} - A.$$
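For concreteness, here is one way to compute $L_{SBM}$ with NumPy when the true labels $g \in \{\pm 1\}^n$ are known (a hypothetical helper; it uses the fact that i and j are in the same community exactly when $g_i g_j = 1$):

```python
import numpy as np

def sbm_laplacian(A, g):
    """L_SBM = D_{G+} - D_{G-} - A, for adjacency A and labels g in {-1, +1}^n."""
    same = np.outer(g, g) > 0                # True iff i and j share a community
    A_plus = A * same                        # adjacency of G+ (intra-community edges)
    A_minus = A * ~same                      # adjacency of G- (inter-community edges)
    D_plus = np.diag(A_plus.sum(axis=1))     # degree matrix of G+
    D_minus = np.diag(A_minus.sum(axis=1))   # degree matrix of G-
    return D_plus - D_minus - A
```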
3.2 Convex Duality
A standard technique to show that a candidate solution is the optimal one
for a convex problem is to use convex duality.
We will describe duality with a game theoretical intuition in mind. The
idea will be to rewrite (9) without imposing constraints on X but rather have
the constraints be implicitly enforced. Consider the following optimization
problem.
$$\max_X \ \min_{\substack{Z,\,Q \\ Z \text{ diagonal} \\ Q \succeq 0}} \ \operatorname{Tr}(BX) + \operatorname{Tr}(QX) + \operatorname{Tr}\left(Z(I_{n\times n} - X)\right) \tag{10}$$
Let us give it a game theoretical interpretation. Suppose there is a primal player (picking X) whose objective is to maximize the objective and a dual player, picking Z and Q after seeing X, trying to make the objective as small as possible. If the primal player does not pick X satisfying the constraints of (9) then we claim that the dual player is capable of driving the objective to $-\infty$. Indeed, if there is an i for which $X_{ii} \neq 1$ then the dual player can simply pick $Z_{ii} = -c\,\frac{1}{1 - X_{ii}}$ and make the objective as small as desired by taking c large enough. Similarly, if X is not positive semidefinite, then the dual player can take $Q = c\,vv^T$ where v is such that $v^T X v < 0$. If, on the other hand, X satisfies the constraints of (9) then
$$\operatorname{Tr}(BX) \leq \min_{\substack{Z,\,Q \\ Z \text{ diagonal} \\ Q \succeq 0}} \ \operatorname{Tr}(BX) + \operatorname{Tr}(QX) + \operatorname{Tr}\left(Z(I_{n\times n} - X)\right),$$
since equality can be achieved if, for example, the dual player picks $Q = 0_{n\times n}$. It is then clear that the values of (9) and (10) are the same:
$$\max_{\substack{X \\ X_{ii} = 1 \ \forall i \\ X \succeq 0}} \operatorname{Tr}(BX) \;=\; \max_X \ \min_{\substack{Z,\,Q \\ Z \text{ diagonal} \\ Q \succeq 0}} \ \operatorname{Tr}(BX) + \operatorname{Tr}(QX) + \operatorname{Tr}\left(Z(I_{n\times n} - X)\right).$$
With this game theoretical intuition in mind, if we change the "rules of the game" and have the dual player decide her variables before the primal player (meaning that the primal player can pick X knowing the values of Z and Q), then it is clear that the objective can only increase, which means that:
$$\max_{\substack{X \\ X_{ii} = 1 \ \forall i \\ X \succeq 0}} \operatorname{Tr}(BX) \;\leq\; \min_{\substack{Z,\,Q \\ Z \text{ diagonal} \\ Q \succeq 0}} \ \max_X \ \operatorname{Tr}(BX) + \operatorname{Tr}(QX) + \operatorname{Tr}\left(Z(I_{n\times n} - X)\right).$$
Note that we can rewrite
$$\operatorname{Tr}(BX) + \operatorname{Tr}(QX) + \operatorname{Tr}\left(Z(I_{n\times n} - X)\right) = \operatorname{Tr}\left((B + Q - Z)X\right) + \operatorname{Tr}(Z).$$
When playing
$$\min_{\substack{Z,\,Q \\ Z \text{ diagonal} \\ Q \succeq 0}} \ \max_X \ \operatorname{Tr}\left((B + Q - Z)X\right) + \operatorname{Tr}(Z),$$
if the dual player does not set $B + Q - Z = 0_{n\times n}$ then the primal player can drive the objective value to $+\infty$. This means that the dual player is forced to choose $Q = Z - B$, and so we can write
$$\min_{\substack{Z,\,Q \\ Z \text{ diagonal} \\ Q \succeq 0}} \ \max_X \ \operatorname{Tr}\left((B + Q - Z)X\right) + \operatorname{Tr}(Z) \;=\; \min_{\substack{Z \\ Z \text{ diagonal} \\ Z - B \succeq 0}} \ \max_X \ \operatorname{Tr}(Z),$$
which clearly does not depend on the choices of the primal player. This
means that
$$\max_{\substack{X \\ X_{ii} = 1 \ \forall i \\ X \succeq 0}} \operatorname{Tr}(BX) \;\leq\; \min_{\substack{Z \\ Z \text{ diagonal} \\ Z - B \succeq 0}} \operatorname{Tr}(Z).$$
This is known as weak duality (strong duality says that, under some conditions, the two optimal values actually match; see, for example, [2]). Also, the problem
$$\min \ \operatorname{Tr}(Z) \quad \text{s.t.} \quad Z \ \text{is diagonal}, \quad Z - B \succeq 0 \tag{11}$$
is called the dual problem of (9).
The derivation above explains why the objective value of the dual is always greater than or equal to that of the primal. Nevertheless, there is a much simpler proof (although not as enlightening): let X and Z be feasible points of (9) and (11), respectively. Since Z is diagonal and $X_{ii} = 1$, we have $\operatorname{Tr}(ZX) = \operatorname{Tr}(Z)$. Also, $Z - B \succeq 0$ and $X \succeq 0$, therefore $\operatorname{Tr}[(Z - B)X] \geq 0$. Altogether,
$$\operatorname{Tr}(Z) - \operatorname{Tr}(BX) = \operatorname{Tr}[(Z - B)X] \geq 0,$$
as stated.
Recall that we want to show that $gg^T$ is the optimal solution of (9). If we find Z diagonal such that $Z - B \succeq 0$ and
$$\operatorname{Tr}[(Z - B)gg^T] = 0$$
(this condition is known as complementary slackness), then $X = gg^T$ must be an optimal solution of (9): indeed, complementary slackness gives $\operatorname{Tr}(B\,gg^T) = \operatorname{Tr}(Z)$, and by weak duality no feasible X can achieve a larger value. To ensure that $gg^T$ is the unique solution we just have to ensure that the nullspace of $Z - B$ has dimension 1 (which corresponds to multiples of g). Essentially, if this is the case, then no other feasible solution X can satisfy complementary slackness.
This means that if we can find Z with the following properties:
1. Z is diagonal,
2. $\operatorname{Tr}[(Z - B)gg^T] = 0$,
3. $Z - B \succeq 0$,
4. $\lambda_2(Z - B) > 0$,
then $gg^T$ is the unique optimum of (9), and so recovery of the true partition is possible (with an efficient algorithm). Z is known as the dual certificate, or dual witness.
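Numerically, properties (1)-(4) are straightforward to test for a candidate Z; a minimal sketch (the function name and the tolerance are ours, the latter to absorb floating-point error):

```python
import numpy as np

def is_dual_certificate(Z, B, g, tol=1e-9):
    """Check properties (1)-(4) for a candidate dual certificate Z."""
    M = Z - B
    eigvals = np.linalg.eigvalsh(M)                    # sorted in ascending order
    is_diagonal = np.allclose(Z, np.diag(np.diag(Z)))  # (1) Z is diagonal
    comp_slack = abs(g @ M @ g) < tol                  # (2) Tr[(Z-B)gg^T] = g^T (Z-B) g = 0
    psd = eigvals[0] > -tol                            # (3) Z - B >= 0
    strict = eigvals[1] > tol                          # (4) lambda_2(Z - B) > 0
    return is_diagonal and comp_slack and psd and strict
```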
3.3 Building the dual certificate
The idea is to construct Z so that it satisfies properties (1) and (2), and then show that it satisfies (3) and (4) using concentration.

If indeed $Z - B \succeq 0$ then (2) becomes equivalent to $(Z - B)g = 0$. This means that we need to construct Z such that $Z_{ii} = \frac{1}{g_i} B[i,:]\,g$. Since $B = 2A - (11^T - I)$ we have
$$Z_{ii} = \frac{1}{g_i}\left(2A - (11^T - I)\right)[i,:]\,g = 2\,\frac{1}{g_i}(Ag)_i + 1,$$
where we used $1^T g = 0$,
meaning that (since $\frac{1}{g_i}(Ag)_i = \deg_{G^+}(i) - \deg_{G^-}(i)$)
$$Z = 2\left(D_{G^+} - D_{G^-}\right) + I$$
is our guess for the dual witness. As a result,
$$Z - B = 2\left(D_{G^+} - D_{G^-}\right) + I - \left(2A - (11^T - I)\right) = 2L_{SBM} + 11^T.$$
It trivially follows (by construction) that
$$(Z - B)g = 0.$$
Therefore, we have the following lemma.

Lemma 2 If
$$\lambda_2\left(2L_{SBM} + 11^T\right) > 0, \tag{12}$$
then the relaxation recovers the true partition.

Note that $2L_{SBM} + 11^T$ is a random matrix, and so this boils down to "an exercise" in random matrix theory.
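For intuition, one can test the condition of Lemma 2 on sampled graphs; a sketch reusing the hypothetical sbm_laplacian helper from above:

```python
import numpy as np

def lemma2_holds(n, alpha, beta, seed=0):
    """Sample one SBM graph and test whether lambda_2(2 L_SBM + 11^T) > 0."""
    rng = np.random.default_rng(seed)
    g = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
    p = alpha * np.log(n) / n                          # intra-community edge probability
    q = beta * np.log(n) / n                           # inter-community edge probability
    probs = np.where(np.outer(g, g) > 0, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample each pair once, no self-loops
    A = (upper | upper.T).astype(float)
    M = 2 * sbm_laplacian(A, g) + np.ones((n, n))
    return np.linalg.eigvalsh(M)[1] > 0                # second smallest eigenvalue positive?
```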
3.4 Matrix Concentration
Clearly,
$$\mathbb{E}\left[2L_{SBM} + 11^T\right] = 2\,\mathbb{E}L_{SBM} + 11^T = 2\,\mathbb{E}D_{G^+} - 2\,\mathbb{E}D_{G^-} - 2\,\mathbb{E}A + 11^T,$$
and
$$\mathbb{E}D_{G^+} = \frac{n}{2}\,\frac{\alpha \log n}{n}\, I, \qquad \mathbb{E}D_{G^-} = \frac{n}{2}\,\frac{\beta \log n}{n}\, I,$$
and $\mathbb{E}A$ is a matrix with four $\frac{n}{2} \times \frac{n}{2}$ blocks, where the diagonal blocks have entries $\frac{\alpha \log n}{n}$ and the off-diagonal blocks have entries $\frac{\beta \log n}{n}$. We can write this as
$$\mathbb{E}A = \frac{1}{2}\left(\frac{\alpha \log n}{n} + \frac{\beta \log n}{n}\right)11^T + \frac{1}{2}\left(\frac{\alpha \log n}{n} - \frac{\beta \log n}{n}\right)gg^T.$$
This means that
$$\mathbb{E}\left[2L_{SBM} + 11^T\right] = \left((\alpha - \beta)\log n\right)I + \left(1 - (\alpha + \beta)\frac{\log n}{n}\right)11^T - (\alpha - \beta)\frac{\log n}{n}\,gg^T.$$
Since $2L_{SBM}\,g = 0$ we can ignore what happens in the span of g, and it is not hard to see that
$$\lambda_2\left(\left((\alpha - \beta)\log n\right)I + \left(1 - (\alpha + \beta)\frac{\log n}{n}\right)11^T - (\alpha - \beta)\frac{\log n}{n}\,gg^T\right) = (\alpha - \beta)\log n.$$
This means that it is enough to show that
$$\left\| L_{SBM} - \mathbb{E}\left[L_{SBM}\right] \right\| < \frac{\alpha - \beta}{2}\log n, \tag{13}$$
which is a large deviations inequality ($\|\cdot\|$ denotes the operator norm).
We will not go into the details here, but the idea is to write $L_{SBM} - \mathbb{E}\left[L_{SBM}\right]$ as a sum of independent random matrices and use the Matrix Bernstein inequality (as in [1]). In fact, using this strategy one can show that, with high probability, (13) holds as long as
$$(\alpha - \beta)^2 > 8(\alpha + \beta) + \frac{8}{3}(\alpha - \beta). \tag{14}$$

3.5 Comparison with phase transition
To compare (14) with (3) we note that the latter can be rewritten as
$$(\alpha - \beta)^2 > 4(\alpha + \beta) - 4 \quad \text{and} \quad \alpha + \beta > 2,$$
and so the relaxation achieves exact recovery almost at the threshold, essentially suboptimal only by a factor of 2.
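For a concrete feel for the gap, the two conditions are easy to compare numerically (a sketch; the function names are ours):

```python
def sdp_guarantee(alpha, beta):
    """Condition (14): the SDP relaxation provably achieves exact recovery."""
    return (alpha - beta) ** 2 > 8 * (alpha + beta) + (8 / 3) * (alpha - beta)

def mle_threshold(alpha, beta):
    """Condition (3), rewritten: exact recovery is information-theoretically possible."""
    return (alpha - beta) ** 2 > 4 * (alpha + beta) - 4 and alpha + beta > 2

# For example, alpha = 30, beta = 5 satisfies both conditions, while
# alpha = 12, beta = 2 satisfies only the information-theoretic one.
```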
References

[1] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–424, 2012.

[2] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38:49–95, 1996.