Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič
(University of Rochester)
Santosh Vempala
Eric Vigoda
(Georgia Tech)
Counting
independent sets
spanning trees
matchings
perfect matchings
k-colorings
(approx) counting ⇔ sampling
Valleau, Card’72 (physical chemistry),
Babai’79 (for matchings and colorings),
Jerrum, Valiant, V.Vazirani’86
the outcome of the JVV reduction:
random variables $X_1, X_2, \dots, X_t$ such that
1) $E[X_1 X_2 \cdots X_t]$ = “WANTED”
2) the $X_i$ are easy to estimate: the squared coefficient of variation (SCV) satisfies
$\frac{V[X_i]}{E[X_i]^2} = O(1)$
Theorem (Dyer-Frieze’91)
$O(t^2/\varepsilon^2)$ samples ($O(t/\varepsilon^2)$ from each $X_i$) give a
$1 \pm \varepsilon$ estimator of “WANTED” with probability ≥ 3/4
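JVV for independent sets
GOAL: given a graph G, estimate the number of independent sets of G
# independent sets = $\frac{1}{P(\emptyset)}$, where $P(\emptyset)$ is the probability that a uniformly random independent set of G is the empty set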
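JVV for independent sets
[figure: the probability of the empty independent set written as a product of conditional probabilities, with the factors labeled $X_1, X_2, X_3, X_4$]
P(A∩B) = P(A)P(B|A)
$X_i \in [0,1]$ and $E[X_i] \ge ½$ ⇒ $\frac{V[X_i]}{E[X_i]^2} = O(1)$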
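The product of conditional probabilities can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampler oracle is replaced by a brute-force uniform sampler over enumerated independent sets (feasible only for tiny graphs), and the names `independent_sets` and `jvv_estimate` are hypothetical.

```python
import random

def independent_sets(n, edges):
    """Enumerate all independent sets of a graph on vertices 0..n-1 as bitmasks."""
    return [mask for mask in range(1 << n)
            if not any(((mask >> u) & 1) and ((mask >> v) & 1) for u, v in edges)]

def jvv_estimate(n, edges, samples_per_factor, rng):
    """Estimate the number of independent sets as 1 / (X_1 * ... * X_n).

    X_i estimates P(vertex i unoccupied | vertices 0..i-1 unoccupied)
    under the uniform distribution on independent sets.
    """
    all_sets = independent_sets(n, edges)
    prob_empty = 1.0
    for i in range(n):
        # Stand-in for the sampler oracle (assumption): uniform over the
        # independent sets that avoid vertices 0..i-1.
        below_i = (1 << i) - 1
        pool = [s for s in all_sets if (s & below_i) == 0]
        hits = sum(1 for _ in range(samples_per_factor)
                   if ((rng.choice(pool) >> i) & 1) == 0)
        prob_empty *= hits / samples_per_factor
    return 1.0 / prob_empty

# Toy usage: the 4-cycle, whose independent sets can be counted by hand.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
estimate = jvv_estimate(4, cycle4, 20000, random.Random(0))
print(estimate, len(independent_sets(4, cycle4)))
```

Each factor has expectation at least ½ because removing vertex i from an independent set gives another independent set, which is what keeps the per-factor variance bounded.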
JVV: If we have a sampler oracle:
graph G → SAMPLER ORACLE → random independent set of G
then FPRAS using $O(n^2)$ samples.
ŠVV: If we have a sampler oracle:
β, graph G → SAMPLER ORACLE → set from gas-model Gibbs at β
then FPRAS using $O^*(n)$ samples.
Application – independent sets
$O^*(|V|)$ samples suffice for counting
Cost per sample (Vigoda’01, Dyer-Greenhill’01):
time = $O^*(|V|)$ for graphs of degree ≤ 4.
Total running time: $O^*(|V|^2)$.
Other applications (total running time)
matchings: $O^*(n^2 m)$ (using Jerrum, Sinclair’89)
spin systems: Ising model: $O^*(n^2)$ for $\beta < \beta_C$ (using Marinelli, Olivieri’95)
k-colorings: $O^*(n^2)$ for $k > 2\Delta$ (using Jerrum’95)
easy = hot
hard = cold
[figure: levels of the Hamiltonian, labeled 4, 2, 1, 0]
Big set = Ω
Hamiltonian $H : \Omega \to \{0,\dots,n\}$
Goal: estimate $|H^{-1}(0)|$
$|H^{-1}(0)| = E[X_1] \cdots E[X_t]$
Distributions between hot and cold
β = inverse temperature
β = 0 ⇒ hot ⇒ uniform on Ω
β = ∞ ⇒ cold ⇒ uniform on $H^{-1}(0)$
μβ (x) ∝ exp(-H(x)β)
(Gibbs distributions)
Distributions between hot and cold
$\mu_\beta(x) = \frac{\exp(-H(x)\beta)}{Z(\beta)}$
Normalizing factor = partition function
$Z(\beta) = \sum_{x\in\Omega} \exp(-H(x)\beta)$
Partition function
$Z(\beta) = \sum_{x\in\Omega} \exp(-H(x)\beta)$
have: $Z(0) = |\Omega|$
want: $Z(\infty) = |H^{-1}(0)|$
Assumption:
we have a sampler oracle for $\mu_\beta(x) = \frac{\exp(-H(x)\beta)}{Z(\beta)}$
graph G, β → SAMPLER ORACLE → subset of V from $\mu_\beta$
Given a sample $W \sim \mu_\beta$ from the oracle, set
$X = \exp(H(W)(\beta - \alpha))$
can obtain the following ratio:
$E[X] = \sum_{s\in\Omega} \mu_\beta(s) X(s) = \frac{Z(\alpha)}{Z(\beta)}$
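The ratio identity can be checked numerically. The sketch below is only an illustration: the oracle is replaced by a stand-in that samples the value H(W) directly from a toy list `counts`, where `counts[k]` is the number of configurations with H = k (a hypothetical input, as are all function names).

```python
import math
import random

def gibbs_weights(counts, beta):
    """Unnormalized probability of each level k under mu_beta."""
    return [a * math.exp(-beta * k) for k, a in enumerate(counts)]

def Z(counts, beta):
    """Partition function of the toy system."""
    return sum(gibbs_weights(counts, beta))

def sample_H(counts, beta, rng):
    """Stand-in oracle (assumption): returns H(W) for W ~ mu_beta."""
    levels = list(range(len(counts)))
    return rng.choices(levels, weights=gibbs_weights(counts, beta))[0]

def estimate_ratio(counts, beta, alpha, num_samples, rng):
    """Average X = exp(H(W)(beta - alpha)) over samples W ~ mu_beta."""
    total = 0.0
    for _ in range(num_samples):
        total += math.exp(sample_H(counts, beta, rng) * (beta - alpha))
    return total / num_samples

counts = [1, 4, 6, 4, 1]            # toy level sizes
beta, alpha = 0.5, 0.8
estimate = estimate_ratio(counts, beta, alpha, 50000, random.Random(1))
exact = Z(counts, alpha) / Z(counts, beta)
print(estimate, exact)
```

The estimate and the exact ratio should agree up to sampling error, since the expectation of X under $\mu_\beta$ is exactly $Z(\alpha)/Z(\beta)$.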
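Our goal restated
Partition function: $Z(\beta) = \sum_{x\in\Omega} \exp(-H(x)\beta)$
Goal: estimate $Z(\infty) = |H^{-1}(0)|$
$Z(\infty) = \frac{Z(\beta_1)}{Z(\beta_0)} \cdot \frac{Z(\beta_2)}{Z(\beta_1)} \cdots \frac{Z(\beta_t)}{Z(\beta_{t-1})} \cdot Z(0)$
$\beta_0 = 0 < \beta_1 < \beta_2 < \dots < \beta_t = \infty$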
Cooling schedule:
$\beta_0 = 0 < \beta_1 < \beta_2 < \dots < \beta_t = \infty$
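A minimal sketch of the telescoping product estimator for a given cooling schedule follows. It assumes a toy system described by `counts` (number of configurations at each value of H), a stand-in oracle that samples H(W) directly, and that Z(0) is known; the final step to β = ∞ uses the indicator of H(W) = 0, which is the limit of exp(H(W)(β − α)) as α → ∞. All names are hypothetical.

```python
import math
import random

def gibbs_weights(counts, beta):
    """Unnormalized probability of each level k under mu_beta."""
    return [a * math.exp(-beta * k) for k, a in enumerate(counts)]

def estimate_Z_infinity(counts, schedule, samples_per_step, rng):
    """Product estimator: Z(inf) ~ Z(0) * prod_i mean(X_i)."""
    levels = list(range(len(counts)))
    estimate = float(sum(counts))          # Z(0) = |Omega|, assumed known
    for beta, alpha in zip(schedule, schedule[1:]):
        # Stand-in oracle (assumption): sample H(W) for W ~ mu_beta.
        hs = rng.choices(levels, weights=gibbs_weights(counts, beta),
                         k=samples_per_step)
        if math.isinf(alpha):
            xs = [1.0 if h == 0 else 0.0 for h in hs]   # limit of exp(h(beta - alpha))
        else:
            xs = [math.exp(h * (beta - alpha)) for h in hs]
        estimate *= sum(xs) / len(xs)
    return estimate

counts = [3, 5, 10, 20]                    # toy level sizes, so Z(inf) = 3
schedule = [0.0, 0.5, 1.0, 2.0, 4.0, math.inf]
print(estimate_Z_infinity(counts, schedule, 20000, random.Random(0)))
```

The schedule here is hand-picked for the toy system; choosing it well in general is the subject of the rest of the talk.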
How to choose the cooling schedule?
minimize length, while satisfying
$\frac{V[X_i]}{E[X_i]^2} = O(1)$, where $E[X_i] = \frac{Z(\beta_i)}{Z(\beta_{i-1})}$
Parameters: A and n
$Z(\beta) = \sum_{x\in\Omega} \exp(-H(x)\beta)$
$Z(0) = A$
$H : \Omega \to \{0,\dots,n\}$
$Z(\beta) = \sum_{k=0}^{n} a_k e^{-\beta k}$, where $a_k = |H^{-1}(k)|$
Parameters
$Z(0) = A$, $H : \Omega \to \{0,\dots,n\}$

problem             A        n
independent sets    2^V      E
matchings           ≈ V!     V
perfect matchings   V!       V
k-colorings         k^V      E
Previous cooling schedules
$Z(0) = A$, $H : \Omega \to \{0,\dots,n\}$
$\beta_0 = 0 < \beta_1 < \beta_2 < \dots < \beta_t = \infty$
“Safe steps”
$\beta \to \beta + 1/n$
$\beta \to \beta(1 + 1/\ln A)$ (Bezáková, Štefankovič, Vigoda, V.Vazirani’06)
$\ln A \to \infty$
Cooling schedules of length
$O(n \ln A)$
$O((\ln n)(\ln A))$ (Bezáková, Štefankovič, Vigoda, V.Vazirani’06)
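One way the three safe steps could be combined into a fixed schedule is sketched below. The combination rule (take the larger of the additive and multiplicative step, and jump to ∞ once β ≥ ln A) is an assumption made for illustration and is not spelled out on the slide; `safe_schedule` is a hypothetical name.

```python
import math

def safe_schedule(n, A):
    """Non-adaptive cooling schedule built from the three safe steps."""
    ln_A = math.log(A)
    schedule = [0.0]
    beta = 0.0
    while beta < ln_A:
        # Larger of: beta -> beta + 1/n  and  beta -> beta * (1 + 1/ln A).
        beta = max(beta + 1.0 / n, beta * (1.0 + 1.0 / ln_A))
        schedule.append(beta)
    schedule.append(math.inf)     # safe step ln A -> infinity
    return schedule

schedule = safe_schedule(100, 2.0 ** 100)
print(len(schedule), 100 * math.log(2.0 ** 100))
```

Under this rule the additive step is used only while β is below roughly (ln A)/n, and the multiplicative step covers the remaining range up to ln A, which is where the (ln n)(ln A) behaviour comes from.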
No better fixed schedule possible
$Z(0) = A$, $H : \Omega \to \{0,\dots,n\}$
A schedule that works for all
$Z_a(\beta) = \frac{A}{1+a}\,(1 + a e^{-\beta n})$ (with $a \in [0, A-1]$)
has LENGTH ≥ $\Omega((\ln n)(\ln A))$
Parameters
$Z(0) = A$, $H : \Omega \to \{0,\dots,n\}$
Our main result:
can get adaptive schedule of length $O^*((\ln A)^{1/2})$
Previously:
non-adaptive schedules of length $\Omega^*(\ln A)$
Existential part
Lemma:
for every partition function there exists
a cooling schedule of length $O^*((\ln A)^{1/2})$
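Express SCV using partition function
(going from β to α)
$W \sim \mu_\beta$, $X = \exp(H(W)(\beta - \alpha))$
$E[X] = \frac{Z(\alpha)}{Z(\beta)}$
$\frac{E[X^2]}{E[X]^2} = \frac{Z(2\alpha-\beta)\,Z(\beta)}{Z(\alpha)^2} \le C$
[figure: the points β, α, 2α-β on the inverse-temperature axis]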
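The identity for $E[X^2]/E[X]^2$ can be verified by direct summation on a toy system. In this sketch `counts[k]` stands for the number of configurations with H = k; the list and both function names are hypothetical.

```python
import math

def Z(counts, beta):
    """Partition function of the toy system."""
    return sum(a * math.exp(-beta * k) for k, a in enumerate(counts))

def second_moment_ratio_direct(counts, beta, alpha):
    """E[X^2] / E[X]^2 for X = exp(H(W)(beta - alpha)), W ~ mu_beta."""
    z_beta = Z(counts, beta)
    probs = [a * math.exp(-beta * k) / z_beta for k, a in enumerate(counts)]
    ex = sum(p * math.exp(k * (beta - alpha)) for k, p in enumerate(probs))
    ex2 = sum(p * math.exp(2 * k * (beta - alpha)) for k, p in enumerate(probs))
    return ex2 / ex ** 2

def second_moment_ratio_formula(counts, beta, alpha):
    """The same quantity written as Z(2 alpha - beta) Z(beta) / Z(alpha)^2."""
    return Z(counts, 2 * alpha - beta) * Z(counts, beta) / Z(counts, alpha) ** 2

counts = [1, 4, 6, 4, 1]
print(second_moment_ratio_direct(counts, 0.3, 0.7),
      second_moment_ratio_formula(counts, 0.3, 0.7))
```

The two printed values should coincide up to floating-point rounding; the appearance of $Z(2\alpha-\beta)$ is why the schedule has to "see" beyond α.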
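$f(\gamma) = \ln Z(\gamma)$
Proof: in terms of f, the condition on the step from β to α becomes
$\frac{f(2\alpha-\beta) + f(\beta)}{2} - f(\alpha) \le C' = (\ln C)/2$
f is decreasing
f is convex
$f'(0) \ge -n$
$f(0) \le \ln A$
either f or f’ changes a lot:
Let $K := \Delta f$
$\Delta(\ln |f'|) \ge \frac{1}{K}$
$f : [a,b] \to \mathbb{R}$, convex, decreasing
can be “approximated” using
$\sqrt{(f(a)-f(b)) \, \ln\frac{f'(a)}{f'(b)}}$
segments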
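To get a feel for the existential claim, the sketch below builds a schedule greedily for a toy partition function that is known exactly. This is an illustration only and not the algorithm of the talk, which has access to a sampler but not to Z. From each β it bisects for the farthest α with f(2α−β) + f(β) − 2f(α) ≤ ln C (the left side is nondecreasing in α because f is convex), and it jumps to ∞ once Z(β)/Z(∞) ≤ C. The names and the choice C = e² are assumptions.

```python
import math

def log_Z(counts, beta):
    """f(beta) = ln Z(beta) for Z(beta) = sum_k counts[k] * exp(-beta * k)."""
    return math.log(sum(a * math.exp(-beta * k) for k, a in enumerate(counts)))

def greedy_schedule(counts, C=math.e ** 2):
    """Greedy cooling schedule using exact knowledge of Z (illustration only)."""
    ln_C = math.log(C)
    f_inf = math.log(counts[0])          # ln Z(infinity); needs counts[0] >= 1

    def gap(beta, alpha):
        return (log_Z(counts, 2 * alpha - beta) + log_Z(counts, beta)
                - 2 * log_Z(counts, alpha))

    schedule, beta = [0.0], 0.0
    while log_Z(counts, beta) - f_inf > ln_C:    # cannot jump to infinity yet
        lo, hi = beta, beta + 1.0
        while gap(beta, hi) <= ln_C:             # grow until the bound fails
            lo, hi = hi, beta + 2 * (hi - beta)
        for _ in range(60):                      # bisection on alpha
            mid = (lo + hi) / 2
            if gap(beta, mid) <= ln_C:
                lo = mid
            else:
                hi = mid
        beta = lo
        schedule.append(beta)
    schedule.append(math.inf)
    return schedule

counts = [math.comb(40, k) for k in range(41)]   # toy system with A = 2^40, n = 40
schedule = greedy_schedule(counts)
print(len(schedule), math.sqrt(math.log(sum(counts))), math.log(sum(counts)))
```

On this toy input the greedy schedule is much shorter than ln A, in line with the lemma, though a single example is of course not evidence for the general bound.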
Technicality: getting to 2α-β
Proof: [figure: schedule points $\beta_i, \beta_{i+1}, \beta_{i+2}, \beta_{i+3}$ on the inverse-temperature axis, together with β, α, and 2α-β]
ln ln A extra steps
Existential → Algorithmic
“there exists” → “can get”: an adaptive schedule of length $O^*((\ln A)^{1/2})$
Algorithmic construction
Our main result:
using a sampler oracle for $\mu_\beta(x) = \frac{\exp(-H(x)\beta)}{Z(\beta)}$
we can construct a cooling schedule of length
≤ $38\,(\ln A)^{1/2} (\ln \ln A)(\ln n)$
Total number of oracle calls
≤ $10^7 (\ln A)(\ln \ln A + \ln n)^7 \ln(1/\delta)$
Algorithmic construction
current inverse temperature β
ideally move to α such that
$B_1 \le \frac{E[X^2]}{E[X]^2} \le B_2$, where $E[X] = \frac{Z(\alpha)}{Z(\beta)}$
upper bound $B_2$: X is “easy to estimate”
lower bound $B_1$: we make progress (assuming $B_1 > 1$)
need to construct a “feeler” for this
$\frac{E[X^2]}{E[X]^2} = \frac{Z(\beta)}{Z(\alpha)} \cdot \frac{Z(2\alpha-\beta)}{Z(\alpha)}$
need to construct a “feeler” for this
bad “feeler”: $\frac{Z(\beta)}{Z(\alpha)} \cdot \frac{Z(2\alpha-\beta)}{Z(\alpha)}$
Rough estimator for $\frac{Z(\beta)}{Z(\alpha)}$
$Z(\beta) = \sum_{k=0}^{n} a_k e^{-\beta k}$
For $W \sim \mu_\beta$ we have $P(H(W)=k) = \frac{a_k e^{-\beta k}}{Z(\beta)}$
For $U \sim \mu_\alpha$ we have $P(H(U)=k) = \frac{a_k e^{-\alpha k}}{Z(\alpha)}$
If H = k is likely at both α and β ⇒ rough estimator:
$\frac{P(H(U)=k)}{P(H(W)=k)} \, e^{k(\alpha-\beta)} = \frac{Z(\beta)}{Z(\alpha)}$
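Rough estimator for $\frac{Z(\beta)}{Z(\alpha)}$ (intervals)
$Z(\beta) = \sum_{k=0}^{n} a_k e^{-\beta k}$
For $W \sim \mu_\beta$ we have $P(H(W)\in[c,d]) = \frac{\sum_{k=c}^{d} a_k e^{-\beta k}}{Z(\beta)}$
If $|\alpha-\beta| \cdot |d-c| \le 1$ then, with $U \sim \mu_\alpha$,
$\frac{1}{e} \frac{Z(\beta)}{Z(\alpha)} \le \frac{P(H(U)\in[c,d])}{P(H(W)\in[c,d])} \, e^{c(\alpha-\beta)} \le e \, \frac{Z(\beta)}{Z(\alpha)}$
We also need $P(H(U)\in[c,d])$ and $P(H(W)\in[c,d])$ to be large.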
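The interval bound can be checked on a toy system. The sketch below computes the interval probabilities exactly from a hypothetical list `counts` (number of configurations at each value of H); in the actual setting they would be estimated from oracle samples, which is why both probabilities need to be large.

```python
import math

def Z(counts, beta):
    """Partition function of the toy system."""
    return sum(a * math.exp(-beta * k) for k, a in enumerate(counts))

def interval_prob(counts, beta, c, d):
    """P(H(W) in [c, d]) for W ~ mu_beta."""
    mass = sum(counts[k] * math.exp(-beta * k) for k in range(c, d + 1))
    return mass / Z(counts, beta)

def rough_ratio(counts, beta, alpha, c, d):
    """Rough estimator of Z(beta)/Z(alpha) from the interval [c, d]."""
    return (interval_prob(counts, alpha, c, d)
            / interval_prob(counts, beta, c, d)
            * math.exp(c * (alpha - beta)))

counts = [math.comb(8, k) for k in range(9)]
beta, alpha, c, d = 1.0, 1.2, 2, 6            # |alpha - beta| * |d - c| = 0.8 <= 1
true_ratio = Z(counts, beta) / Z(counts, alpha)
print(rough_ratio(counts, beta, alpha, c, d), true_ratio)
```

With a single-point interval (c = d) the estimator is exact, which recovers the identity for a single value k.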
Split {0,1,...,n} into $h \le 4(\ln n)\ln A$ intervals
[0], [1], [2], ..., [c, c(1+1/ln A)], ...
for any inverse temperature β there exists an interval I with $P(H(W)\in I) \ge \frac{1}{8h}$
We say that I is HEAVY for β
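A sketch of the interval splitting and of finding a heavy interval on a toy system follows. The rounding of interval endpoints to integers is an assumption (the slide only gives the pattern [c, c(1+1/ln A)]), the probabilities are computed exactly rather than estimated from samples, and the function names are hypothetical.

```python
import math

def make_intervals(n, A):
    """Split {0, ..., n} into intervals [c, floor(c * (1 + 1/ln A))]."""
    ln_A = math.log(A)
    intervals, c = [], 0
    while c <= n:
        d = min(n, math.floor(c * (1 + 1 / ln_A)))   # singletons while c < ln A
        intervals.append((c, d))
        c = d + 1
    return intervals

def heaviest_interval(counts, beta, intervals):
    """Return the interval with the largest mass under mu_beta, and that mass."""
    weights = [a * math.exp(-beta * k) for k, a in enumerate(counts)]
    z = sum(weights)
    masses = [sum(weights[c:d + 1]) / z for c, d in intervals]
    best = max(range(len(intervals)), key=masses.__getitem__)
    return intervals[best], masses[best]

n = 50
counts = [math.comb(n, k) for k in range(n + 1)]     # toy system, A = 2^50
intervals = make_intervals(n, sum(counts))
h = len(intervals)
print(h, heaviest_interval(counts, 0.5, intervals), 1 / (8 * h))
```

Since the h intervals cover every value of H, the heaviest one always has mass at least 1/h; the weaker threshold 1/(8h) leaves room for error when the masses are estimated from samples.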
Algorithm
repeat
  find an interval I which is heavy for the current inverse temperature β
  see how far I is heavy (until some β*)
  use the interval I for the feeler of $\frac{Z(\beta)}{Z(\alpha)} \cdot \frac{Z(2\alpha-\beta)}{Z(\alpha)}$
  either
    * make progress, or
    * eliminate the interval I, or
    * make a “long move”
if we have sampler oracles for $\mu_\beta$
then we can get an adaptive schedule
of length $t = O^*((\ln A)^{1/2})$
independent sets: $O^*(n^2)$ (using Vigoda’01, Dyer-Greenhill’01)
matchings: $O^*(n^2 m)$ (using Jerrum, Sinclair’89)
spin systems: Ising model: $O^*(n^2)$ for $\beta < \beta_C$ (using Marinelli, Olivieri’95)
k-colorings: $O^*(n^2)$ for $k > 2\Delta$ (using Jerrum’95)