Adaptive annealing: a near-optimal connection between sampling and counting

Daniel Štefankovič (University of Rochester)
Santosh Vempala, Eric Vigoda (Georgia Tech)

Advertisement: if you want to count using MCMC, then statistical physics is useful.
Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities
(the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More…
Counting
independent sets
spanning trees
matchings
perfect matchings
k-colorings
Compute the number of spanning trees

Kirchhoff's Matrix Tree Theorem: the number of spanning trees of G is det(D − A)vv,
the determinant of D − A with the row and column of any one vertex v deleted
(D = degree matrix, A = adjacency matrix).

Example (the 4-cycle):

D = diag(2, 2, 2, 2)

A =
0 1 0 1
1 0 1 0
0 1 0 1
1 0 1 0

det(D − A)vv = det
 2 -1  0
-1  2 -1
 0 -1  2
= 4
Compute the number of spanning trees:
G → polynomial-time algorithm → number of spanning trees of G
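A quick sanity check of the example above (a sketch, not from the slides; assumes NumPy is available):

```python
import numpy as np

# Kirchhoff's Matrix Tree Theorem on the 4-cycle from the example:
# number of spanning trees = det of (D - A) with one row and column removed.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])          # adjacency matrix of the 4-cycle
D = np.diag(A.sum(axis=1))            # degree matrix (all degrees are 2)
L = D - A                             # graph Laplacian
minor = np.delete(np.delete(L, 0, axis=0), 0, axis=1)  # drop row/column of vertex v
print(round(np.linalg.det(minor)))    # -> 4 spanning trees
```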
Counting
independent sets
spanning trees
matchings
perfect matchings
k-colorings
?
Compute the number of independent sets (hard-core gas model)

independent set of a graph = subset S of the vertices such that no two vertices in S are neighbors

(example graph in figure)  # independent sets = 7

For the graphs G1, G2, G3, ..., Gn-2, Gn-1, Gn in the figure:
# independent sets = 2, 3, 5, ..., Fn-1, Fn, Fn+1   (Fibonacci numbers)

(larger example in figure)  # independent sets = 5598861
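A brute-force check of the Fibonacci pattern (my own illustration using path graphs for G1, G2, ...; not from the slides):

```python
from itertools import combinations

def count_independent_sets(n_vertices, edges):
    """Count subsets S of vertices with no edge inside S (brute force, small graphs only)."""
    count = 0
    for r in range(n_vertices + 1):
        for S in combinations(range(n_vertices), r):
            s = set(S)
            if all(not (u in s and v in s) for u, v in edges):
                count += 1
    return count

# Paths on 1, 2, 3, ... vertices: counts 2, 3, 5, 8, 13, ... (Fibonacci numbers)
for n in range(1, 7):
    path_edges = [(i, i + 1) for i in range(n - 1)]
    print(n, count_independent_sets(n, path_edges))
```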
Compute the number of independent sets:
G → polynomial-time algorithm → number of independent sets of G?   (unlikely!)
graph G → # independent sets in G is #P-complete
(figure: complexity classes P, NP, FP, #P)
#P-complete even for 3-regular graphs (Dyer, Greenhill, 1997)
graph G → # independent sets in G?
What helps: approximation? randomization? Which is more important?

My world-view:
(true) randomness is important conceptually but NOT computationally (i.e., I believe P = BPP).
Approximation makes problems easier (i.e., I believe #P ≠ BPP).
We would like to know Q.
Goal: a random variable Y such that
P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ
("Y gives a (1±ε)-estimate")

FPRAS (fully polynomial randomized approximation scheme):
G, ε, δ → polynomial-time algorithm → Y
Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities
(the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More...
We would like to know Q.
1. Get an unbiased estimator X, i.e., E[X] = Q.
2. "Boost the quality" of X:
Y = (X1 + X2 + ... + Xn) / n
The Bienaymé-Chebyshev inequality:
P( Y gives a (1±ε)-estimate ) ≥ 1 − (V[Y]/E[Y]²) · (1/ε²)
(V[Y]/E[Y]² is the squared coefficient of variation, SCV)

Y = (X1 + X2 + ... + Xn) / n   ⇒   V[Y]/E[Y]² = (1/n) · V[X]/E[X]²
The Bienaymé-Chebyshev inequality:
Let X1, ..., Xn, X be independent, identically distributed random variables, Q = E[X]. Let
Y = (X1 + X2 + ... + Xn) / n.
Then
P( Y gives a (1±ε)-estimate of Q ) ≥ 1 − (V[X]/E[X]²) · 1/(n ε²).
Chernoff's bound:
Let X1, ..., Xn, X be independent, identically distributed random variables, 0 ≤ X ≤ 1, Q = E[X]. Let
Y = (X1 + X2 + ... + Xn) / n.
Then
P( Y gives a (1±ε)-estimate of Q ) ≥ 1 − e^(−ε² · n · E[X] / 3).
Number of samples to achieve precision ε with confidence δ:

Chebyshev:              n = (V[X]/E[X]²) · (1/ε²) · (1/δ)
Chernoff (0 ≤ X ≤ 1):   n = (1/E[X]) · (3/ε²) · ln(1/δ)

The ln(1/δ) dependence (Chernoff) is GOOD; the 1/δ factor (Chebyshev) and the 1/E[X] factor (Chernoff) are BAD.
Median "boosting trick":
n = (1/E[X]) · (4/ε²),   Y = (X1 + X2 + ... + Xn) / n
BY BIENAYMÉ-CHEBYSHEV:  P( Y ∈ [(1-ε)Q, (1+ε)Q] ) ≥ 3/4

Median trick – repeat 2T times:
BY BIENAYMÉ-CHEBYSHEV:  P( Y ∈ [(1-ε)Q, (1+ε)Q] ) ≥ 3/4
⇒ BY CHERNOFF:  P( more than T of the 2T estimates land in [(1-ε)Q, (1+ε)Q] ) ≥ 1 − e^(−T/4)
⇒ P( the median lands in [(1-ε)Q, (1+ε)Q] ) ≥ 1 − e^(−T/4)
Chebyshev + median trick:   n = (V[X]/E[X]²) · (32/ε²) · ln(1/δ)
Chernoff (0 ≤ X ≤ 1):       n = (1/E[X]) · (3/ε²) · ln(1/δ)     (the 1/E[X] factor is BAD)
Creating an "approximator" from X (ε = precision, δ = confidence):
n = Θ( (V[X]/E[X]²) · (1/ε²) · ln(1/δ) )
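A minimal sketch of such an approximator via the median-of-means construction above (not from the slides; the constants 4 and 8 are illustrative choices, and `sample_x`, `scv_bound` are hypothetical names):

```python
import math
import random
import statistics

def approximator(sample_x, scv_bound, eps, delta):
    """Median of means: a (1 +/- eps)-estimate of E[X] with probability >= 1 - delta.

    sample_x: function returning one independent sample of X
    scv_bound: upper bound on V[X]/E[X]^2
    """
    n = math.ceil(4 * scv_bound / eps**2)          # Chebyshev: each mean is good w.p. >= 3/4
    reps = 2 * math.ceil(8 * math.log(1 / delta))  # Chernoff: median of 2T means is good
    means = [sum(sample_x() for _ in range(n)) / n for _ in range(reps)]
    return statistics.median(means)

# Example: X uniform on [0, 2], so E[X] = 1 and V[X]/E[X]^2 = 1/3
print(approximator(lambda: random.uniform(0, 2), 1/3, eps=0.05, delta=0.01))
```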
Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities
(the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More...
(approx) counting ⇔ sampling
Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86

The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that
1) E[X1 X2 ... Xt] = "WANTED"
2) the Xi are easy to estimate:
   V[Xi]/E[Xi]² = O(1)     (squared coefficient of variation, SCV)

Theorem (Dyer-Frieze '91):
O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with prob ≥ 3/4.
JVV for independent sets
GOAL: given a graph G, estimate the number of independent sets of G.

# independent sets = 1 / P( S = ∅ ),  where S is a uniformly random independent set of G.

By the chain rule P(A∩B) = P(A) P(B|A), this probability factors into conditional
probabilities (one per vertex, as in the figure), estimated by X1, X2, X3, X4, ...

Each Xi ∈ [0,1] and E[Xi] ≥ 1/2   ⇒   V[Xi]/E[Xi]² = O(1).
Self-reducibility for independent sets (example with 7 independent sets):
P( a random independent set of G avoids the marked vertices ) = 5/7
⇒  # independent sets of G = (7/5) · # independent sets of the reduced graph.

Iterating on the smaller graphs gives the ratios 7/5, 5/3, 3/2, and the last graph has 2 independent sets:
# independent sets = (7/5) · (5/3) · (3/2) · 2 = 7
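A toy sketch of the JVV self-reducibility estimator (not from the slides; the brute-force `independent_sets` routine stands in for the sampler oracle, which in practice would be an MCMC sampler):

```python
import random
from itertools import combinations

def independent_sets(vertices, edges):
    """All independent sets of the graph (brute force; stands in for a sampler oracle)."""
    sets, vs = [], sorted(vertices)
    for r in range(len(vs) + 1):
        for S in combinations(vs, r):
            s = set(S)
            if all(not (u in s and v in s) for u, v in edges):
                sets.append(s)
    return sets

def estimate_count(vertices, edges, samples=2000):
    """JVV-style product estimator: remove vertices one by one; each factor is
    1 / P(v not in a random independent set of the current graph)."""
    estimate = 1.0
    vertices, edges = set(vertices), list(edges)
    for v in sorted(vertices, reverse=True):
        all_sets = independent_sets(vertices, edges)   # sampler oracle stand-in
        hits = sum(v not in random.choice(all_sets) for _ in range(samples))
        estimate *= samples / hits                     # ~ #IS(G) / #IS(G - v)
        vertices.discard(v)                            # self-reducibility: shrink the graph
        edges = [(a, b) for a, b in edges if v not in (a, b)]
    return estimate

# 4-cycle: the exact count is 7
print(estimate_count({0, 1, 2, 3}, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```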
JVV: If we have a sampler oracle:
graph G → SAMPLER ORACLE → random independent set of G
then FPRAS using O(n²) samples.

ŠVV: If we have a sampler oracle:
β, graph G → SAMPLER ORACLE → set from the hard-core gas-model Gibbs distribution at β
then FPRAS using O*(n) samples.
Application – independent sets:
O*( |V| ) samples suffice for counting.
Cost per sample (Vigoda '01, Dyer-Greenhill '01): time = O*( |V| ) for graphs of degree ≤ 4.
Total running time: O*( |V|² ).

Other applications (total running time):
matchings: O*(n²m)   (using Jerrum, Sinclair '89)
spin systems, Ising model: O*(n²) for β < βC   (using Marinelli, Olivieri '95)
k-colorings: O*(n²) for k > 2Δ   (using Jerrum '95)
Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities
(the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More…
easy = hot,  hard = cold

Hamiltonian:
Big set = Ω
H : Ω → {0, ..., n}
(figure: the elements of Ω grouped by their H-values 4, 2, 1, 0)

Goal: estimate |H⁻¹(0)|.
|H⁻¹(0)| = E[X1] · ... · E[Xt]
Distributions between hot and cold:
β = inverse temperature
β = 0  ⇒ hot  ⇒ uniform on Ω
β = ∞ ⇒ cold ⇒ uniform on H⁻¹(0)

μβ(x) ∝ exp(−H(x)β)     (Gibbs distributions)
μβ(x) = exp(−H(x)β) / Z(β)
Normalizing factor = partition function:
Z(β) = ∑_{x∈Ω} exp(−H(x)β)
Partition function:
Z(β) = ∑_{x∈Ω} exp(−H(x)β)
have:  Z(0) = |Ω|
want:  Z(∞) = |H⁻¹(0)|

Example (the H-values 4, 2, 1, 0 from the figure):
Z(β) = 1·e^(−4β) + 4·e^(−2β) + 4·e^(−β) + 7·e^(−0·β)
Z(0) = 16,   Z(∞) = 7
Assumption: we have a sampler oracle for μβ:
μβ(x) = exp(−H(x)β) / Z(β)
graph G, β → SAMPLER ORACLE → subset of V from μβ

W ∼ μβ
X = exp(H(W)(β − α))

We can obtain the following ratio:
E[X] = ∑_{s∈Ω} μβ(s) X(s) = Z(α)/Z(β)
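A numeric sketch of this identity on the toy Hamiltonian from the earlier example (counts a_0..a_4 = 7, 4, 4, 0, 1; not from the slides); the exact sampler below stands in for the sampler oracle:

```python
import math
import random

a = [7, 4, 4, 0, 1]   # a_k = |H^{-1}(k)| from the earlier example (Z(0)=16, Z(inf)=7)

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def sample_H(beta):
    """Exact sample of H(W) for W ~ mu_beta (stand-in for the sampler oracle)."""
    weights = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=weights)[0]

def estimate_ratio(beta, alpha, samples=100_000):
    """Estimate Z(alpha)/Z(beta) via E[exp(H(W)(beta - alpha))], W ~ mu_beta."""
    return sum(math.exp(sample_H(beta) * (beta - alpha)) for _ in range(samples)) / samples

beta, alpha = 0.5, 1.0
print(estimate_ratio(beta, alpha), Z(alpha) / Z(beta))   # the two numbers should be close
```

Chaining such ratios along a cooling schedule β0 = 0 < β1 < ... < βt = ∞ recovers Z(∞) = Z(0) · ∏ Z(βi)/Z(βi−1), i.e., the product method above.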
Our goal restated:
Partition function Z(β) = ∑_{x∈Ω} exp(−H(x)β).
Goal: estimate Z(∞) = |H⁻¹(0)|:
Z(∞) = [Z(β1)/Z(β0)] · [Z(β2)/Z(β1)] · ... · [Z(βt)/Z(βt−1)] · Z(0)
with β0 = 0 < β1 < β2 < ... < βt = ∞.
Our goal restated:
Z(∞) = [Z(β1)/Z(β0)] · [Z(β2)/Z(β1)] · ... · [Z(βt)/Z(βt−1)] · Z(0)
Cooling schedule:  β0 = 0 < β1 < β2 < ... < βt = ∞

How to choose the cooling schedule?
Minimize the length t while satisfying
V[Xi]/E[Xi]² = O(1),  where  E[Xi] = Z(βi)/Z(βi−1).
Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities
(the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More...
Parameters: A and n
Z(β) = ∑_{x∈Ω} exp(−H(x)β)
Z(0) = A,   H : Ω → {0, ..., n}
Z(β) = ∑_{k=0..n} a_k e^(−βk),   where a_k = |H⁻¹(k)|.
Parameters:  Z(0) = A,   H : Ω → {0, ..., n}

                      A        n
independent sets      2^V      E
matchings             ≈ V!     V
perfect matchings     V!       V
k-colorings           k^V      E

(perfect matchings = # ways of marrying them so that there is no unhappy couple;
here Ω marries everyone ignoring "compatibility", and the hamiltonian = number of unhappy couples)
Previous cooling schedules:
Z(0) = A,   H : Ω → {0, ..., n}
β0 = 0 < β1 < β2 < ... < βt = ∞

"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):
β → β + 1/n
β → β (1 + 1/ln A)
ln A → ∞

Cooling schedules of length
O( n ln A )
O( (ln n)(ln A) )      (Bezáková, Štefankovič, Vigoda, V. Vazirani '06)
"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):
β → β + 1/n,   β → β (1 + 1/ln A),   ln A → ∞

W ∼ μβ,   X = exp(H(W)(β − α)),   Z(β) = ∑_{k=0..n} a_k e^(−βk)

For the step β → β + 1/n:   1/e ≤ X ≤ 1,  so  V[X]/E[X]² ≤ 1/E[X] ≤ e.
For the step ln A → ∞:   Z(∞) = a_0 ≥ 1 and Z(ln A) ≤ a_0 + 1,  so  E[X] ≥ 1/2.
For the step β → β (1 + 1/ln A):   E[X] ≥ 1/(2e).
Previous cooling schedules:
1/n, 2/n, 3/n, ..., (ln A)/n, ..., ln A

"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):
β → β + 1/n,   β → β (1 + 1/ln A),   ln A → ∞

Cooling schedules of length O( n ln A ) and O( (ln n)(ln A) ) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).
No better fixed schedule possible:
Z(0) = A,   H : Ω → {0, ..., n}
THEOREM: A schedule that works for all
Za(β) = (A/(1+a)) · (1 + a e^(−βn))      (with a ∈ [0, A−1])
has LENGTH ≥ Ω( (ln n)(ln A) ).
Parameters:  Z(0) = A,   H : Ω → {0, ..., n}
Our main result: can get an adaptive schedule of length O*( (ln A)^(1/2) ).
Previously: non-adaptive schedules of length Ω*( ln A ).
Related work (we can get an adaptive schedule of length O*( (ln A)^(1/2) )):
Lovász-Vempala: volume of convex bodies in O*(n⁴), schedule of length O(n^(1/2))
(non-adaptive cooling schedule, using specific properties of the "volume" partition functions)
Existential part
Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^(1/2)).
(first "there exists"; later we make it algorithmic: "can get an adaptive schedule of length O*((ln A)^(1/2))")
Cooling schedule (definition refresh):
Z(∞) = [Z(β1)/Z(β0)] · [Z(β2)/Z(β1)] · ... · [Z(βt)/Z(βt−1)] · Z(0)
Cooling schedule:  β0 = 0 < β1 < β2 < ... < βt = ∞
How to choose the cooling schedule? Minimize the length t while satisfying
V[Xi]/E[Xi]² = O(1),  where  E[Xi] = Z(βi)/Z(βi−1).
Express the SCV using the partition function (going from β to α):
W ∼ μβ,   X = exp(H(W)(β − α)),   E[X] = Z(α)/Z(β)
E[X²]/E[X]² = V[X]/E[X]² + 1 = Z(2α−β) Z(β) / Z(α)² ≤ C

With f(γ) = ln Z(γ), the condition Z(2α−β) Z(β) / Z(α)² ≤ C becomes
( f(2α−β) + f(β) ) / 2 ≤ C' + f(α),   where C' = (ln C)/2.
(figure: graph of f at the points β, α, 2α−β)
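A quick numeric check of the identity E[X²]/E[X]² = Z(2α−β)Z(β)/Z(α)² on the same toy counts (exact sums instead of sampling; a sketch, not from the slides):

```python
import math

a = [7, 4, 4, 0, 1]   # toy counts a_k from the earlier example

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def moment(beta, alpha, power):
    """E[X^power] for X = exp(H(W)(beta - alpha)), W ~ mu_beta, computed exactly."""
    return sum(ak * math.exp(-beta * k) * math.exp(power * k * (beta - alpha))
               for k, ak in enumerate(a)) / Z(beta)

beta, alpha = 0.5, 1.0
lhs = moment(beta, alpha, 2) / moment(beta, alpha, 1) ** 2
rhs = Z(2 * alpha - beta) * Z(beta) / Z(alpha) ** 2
print(lhs, rhs)   # identical up to floating point
```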
Properties of partition functions:
f(γ) = ln Z(γ)
f is decreasing
f is convex
f'(0) ≥ −n
f(0) ≤ ln A

f(β) = ln ∑_{k=0..n} a_k e^(−βk)
Since (ln g)' = g'/g:
f'(β) = − ( ∑_{k=0..n} a_k k e^(−βk) ) / ( ∑_{k=0..n} a_k e^(−βk) )
GOAL: proving the Lemma — for every partition function there exists a cooling schedule of length O*((ln A)^(1/2)).

Proof. f(γ) = ln Z(γ) is decreasing and convex, f'(0) ≥ −n, f(0) ≤ ln A.
Key step: either f or f' changes a lot.
Let K := Δf. Then Δ(ln |f'|) ≥ 1/K.

c := (a+b)/2,  Δ := b − a;  suppose f(c) = (f(a)+f(b))/2 − 1.
f is convex:
(f(a) − f(c)) / Δ ≤ |f'(a)|
(f(c) − f(b)) / Δ ≥ |f'(b)|
⇒   |f'(b)| / |f'(a)| ≤ 1 − 1/Δf ≤ e^(−1/Δf)

So a convex decreasing f : [a,b] → R can be "approximated" using segments; the number of segments needed is governed by f(a) − f(b) and f'(a)/f'(b).
Technicality: getting to 2α−β
(figure: the points β, α, 2α−β on the inverse-temperature axis, together with schedule points βi, βi+1, βi+2, βi+3)
This costs ln ln A extra steps.
Existential → Algorithmic
"there exists a cooling schedule of length O*((ln A)^(1/2))"
→ "can get an adaptive schedule of length O*((ln A)^(1/2))"
Algorithmic construction (our main result):
Using a sampler oracle for μβ,  μβ(x) = exp(−H(x)β)/Z(β),
we can construct a cooling schedule of length ≤ 38 (ln A)^(1/2) (ln ln A)(ln n).
Total number of oracle calls ≤ 10⁷ (ln A)(ln ln A + ln n)⁷ ln(1/δ).
Algorithmic construction:
Current inverse temperature β; ideally move to α such that
B1 ≤ E[X²]/E[X]² ≤ B2,   where E[X] = Z(α)/Z(β).
– upper bound B2: X is "easy to estimate"
– lower bound B1 > 1: we make progress

E[X²]/E[X]² = [Z(2α−β)/Z(α)] · [Z(β)/Z(α)]
We need to construct a "feeler" for this quantity (a bad "feeler" would ruin the step selection).
Estimator for Z(β)/Z(α):
Z(β) = ∑_{k=0..n} a_k e^(−βk)
For W ∼ μβ we have  P(H(W)=k) = a_k e^(−βk) / Z(β).
For U ∼ μα we have  P(H(U)=k) = a_k e^(−αk) / Z(α).

If H(·)=k is likely at both α and β, this gives an estimator:
[ P(H(U)=k) / P(H(W)=k) ] · e^(k(α−β)) = Z(β)/Z(α)

PROBLEM: P(H(W)=k) can be too small.
Rough estimator for Z(β)/Z(α) — use an interval instead of a single value:
Z(β) = ∑_{k=0..n} a_k e^(−βk)
For W ∼ μβ:  P(H(W)∈[c,d]) = ( ∑_{k=c..d} a_k e^(−βk) ) / Z(β)
For U ∼ μα:  P(H(U)∈[c,d]) = ( ∑_{k=c..d} a_k e^(−αk) ) / Z(α)
Rough estimator for Z(β)/Z(α):
If |α−β| · |d−c| ≤ 1, then
(1/e) · Z(β)/Z(α)  ≤  [ P(H(U)∈[c,d]) / P(H(W)∈[c,d]) ] · e^(c(α−β))  ≤  e · Z(β)/Z(α).
We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large.

(Why: ( ∑_{k=c..d} a_k e^(−αk) ) / ( ∑_{k=c..d} a_k e^(−βk) ) = e^(−c(α−β)) · ( ∑_{k=c..d} a_k e^(−α(k−c)) ) / ( ∑_{k=c..d} a_k e^(−β(k−c)) ), and the last ratio lies in [1/e, e] when |α−β|·|d−c| ≤ 1.)
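A sketch of this interval "feeler" on the toy counts used earlier (not from the slides), with an exact sampler standing in for the oracle; the factor-e guarantee needs |α−β|·|d−c| ≤ 1 and both interval probabilities to be non-negligible:

```python
import math
import random

a = [7, 4, 4, 0, 1]   # toy counts a_k

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def sample_H(beta):
    weights = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=weights)[0]

def rough_ratio(beta, alpha, c, d, samples=100_000):
    """Estimate Z(beta)/Z(alpha) within a factor e using the interval I = [c, d]:
    [P(H(U) in I) / P(H(W) in I)] * exp(c*(alpha - beta)), U ~ mu_alpha, W ~ mu_beta."""
    p_alpha = sum(c <= sample_H(alpha) <= d for _ in range(samples)) / samples
    p_beta = sum(c <= sample_H(beta) <= d for _ in range(samples)) / samples
    return (p_alpha / p_beta) * math.exp(c * (alpha - beta))

beta, alpha, c, d = 0.5, 1.0, 0, 1        # |alpha - beta| * |d - c| = 0.5 <= 1
print(rough_ratio(beta, alpha, c, d), Z(beta) / Z(alpha))   # agree within a factor e
```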
We will split {0, 1, ..., n} into h ≤ 4 (ln n)(ln A) intervals:
[0], [1], [2], ..., [c, c(1+1/ln A)], ...
For any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h), where W ∼ μβ.
We say that I is HEAVY for β.
Algorithm:
repeat
  find an interval I which is heavy for the current inverse temperature β
  see how far I stays heavy (until some β*)
  use the interval I for the feeler  [Z(2α−β)/Z(α)] · [Z(β)/Z(α)]
ANALYSIS: in each iteration we either
  * make progress, or
  * eliminate the interval I, or
  * make a "long move"
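A greatly simplified sketch of the adaptive idea (not the algorithm above): from each β, binary-search the largest α whose SCV feeler stays below a constant, here using exact Z for the toy counts in place of the sampled heavy-interval feeler:

```python
import math

a = [7, 4, 4, 0, 1]   # toy counts a_k; in reality only sample access to mu_beta is available

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def scv_plus_one(beta, alpha):
    """E[X^2]/E[X]^2 = Z(2*alpha - beta) * Z(beta) / Z(alpha)^2 for the step beta -> alpha."""
    return Z(2 * alpha - beta) * Z(beta) / Z(alpha) ** 2

def adaptive_schedule(beta_max=30.0, B=2.0, tol=1e-6):
    """Simplified adaptive cooling: from each beta, take the largest alpha <= beta_max
    with feeler <= B (beta_max plays the role of beta_t = infinity)."""
    schedule = [0.0]
    while schedule[-1] < beta_max:
        beta = schedule[-1]
        if scv_plus_one(beta, beta_max) <= B:
            schedule.append(beta_max)
            break
        lo, hi = beta, beta_max           # lo satisfies the bound, hi violates it
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if scv_plus_one(beta, mid) <= B:
                lo = mid
            else:
                hi = mid
        schedule.append(max(lo, beta + tol))   # guarantee progress
    return schedule

print(adaptive_schedule())   # a short schedule, e.g. [0.0, ~2.3, 30.0] for these counts
```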
(figure: distribution of H(X) for X ∼ μβ, with I = a heavy interval at β)
At a larger inverse temperature γ the interval I may no longer be heavy; at an intermediate γ' it is still heavy.
I = [a,b] is heavy on [β, β*] and NOT heavy at β* + 1/(2n).
Use binary search to find β*.
α = min{ 1/(b−a), ln A }
How do you know that you can use binary search?
(figure: could the set of β at which I is heavy be disconnected — NOT heavy / heavy / NOT heavy / heavy?)
Lemma: the set of temperatures for which I is h-heavy is an interval.
I is h-heavy at β  ⇔  P(H(X) ∈ I) ≥ 1/(8h) for X ∼ μβ,  i.e.,
∑_{k∈I} a_k e^(−βk)  ≥  (1/(8h)) ∑_{k=0..n} a_k e^(−βk).
How do you know that you can use binary search?
∑_{k∈I} a_k e^(−βk)  ≥  (1/(8h)) ∑_{k=0..n} a_k e^(−βk)
Substitute x = e^(−β).

Descartes' rule of signs: for c0 x⁰ + c1 x¹ + c2 x² + ... + cn xⁿ,
number of positive roots ≤ number of sign changes.

The difference ∑_{k∈I} a_k x^k − (1/(8h)) ∑_{k=0..n} a_k x^k has negative coefficients outside I and positive coefficients inside I, so its coefficient sequence has at most 2 sign changes; hence at most 2 positive roots, and the set of x = e^(−β) (equivalently, of β) where it is ≥ 0 is an interval.
I = [a,b];  I is heavy on [β, β*], NOT heavy at β* + 1/(2n).
We can roughly compute the ratio Z(α)/Z(α') for α, α' ∈ [β, β*] whenever |α − α'| · |b − a| ≤ 1.

Find the largest α such that  [Z(2α−β)/Z(α)] · [Z(β)/Z(α)] ≤ C.  Outcomes:
1. success (we make progress)
2. eliminate the interval I
3. long move
If we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*( (ln A)^(1/2) ).

independent sets: O*(n²)   (using Vigoda '01, Dyer-Greenhill '01)
matchings: O*(n²m)   (using Jerrum, Sinclair '89)
spin systems, Ising model: O*(n²) for β < βC   (using Marinelli, Olivieri '95)
k-colorings: O*(n²) for k > 2Δ   (using Jerrum '95)
Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities
(the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More...
Outline
6. More…
a) proof of Dyer-Frieze
b) independent sets revisited
c) warm starts
Appendix – proof of:
1) E[X1 X2 ... Xt] = "WANTED"
2) the Xi are easy to estimate:  V[Xi]/E[Xi]² = O(1)

Theorem (Dyer-Frieze '91):
O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with prob ≥ 3/4.
How precise do the Xi have to be?
First attempt – term by term.
Main idea:  (1 ± ε/t)(1 ± ε/t)(1 ± ε/t) ... (1 ± ε/t) ≈ 1 ± ε
n = Θ( (V[X]/E[X]²) · (1/ε²) · ln(1/δ) )
each term needs Ω(t²) samples  ⇒  Ω(t³) total.
How precise do the Xi have to be?
Analyzing the SCV is better (Dyer-Frieze '91):
X = X1 X2 ... Xt
GOAL: SCV(X) ≤ ε²/4     (SCV = squared coefficient of variation)
P( X gives a (1±ε)-estimate ) ≥ 1 − (V[X]/E[X]²) · (1/ε²)
Main idea:  SCV(Xi) ≤ ε²/(4t)  ⇒  SCV(X) ≲ ε²/4
proof:  SCV(X) = (1 + SCV(X1)) · ... · (1 + SCV(Xt)) − 1,
where  SCV(X) = V[X]/E[X]² = E[X²]/E[X]² − 1.
X1, X2 independent  ⇒  E[X1 X2] = E[X1] E[X2]
X1, X2 independent  ⇒  X1², X2² independent
X1, X2 independent  ⇒  SCV(X1 X2) = (1 + SCV(X1))(1 + SCV(X2)) − 1
For X = X1 X2 ... Xt:
SCV(Xi) ≤ ε²/(4t)  ⇒  SCV(X) ≲ ε²/4
each term needs O(t/ε²) samples  ⇒  O(t²/ε²) total.
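A quick numeric illustration (my own, not from the slides) of the multiplicativity of 1 + SCV for independent factors:

```python
import random

def scv(samples):
    m = sum(samples) / len(samples)
    return sum((x - m) ** 2 for x in samples) / len(samples) / m ** 2

N = 200_000
x1 = [random.uniform(0.5, 1.5) for _ in range(N)]
x2 = [random.expovariate(1.0) + 0.5 for _ in range(N)]
prod = [a * b for a, b in zip(x1, x2)]

# For independent factors: 1 + SCV(X1*X2) = (1 + SCV(X1)) * (1 + SCV(X2));
# the two printed numbers agree up to sampling error.
print(1 + scv(prod), (1 + scv(x1)) * (1 + scv(x2)))
```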
Outline
6. More…
a) proof of Dyer-Frieze
b) independent sets revisited
c) warm starts
Hamiltonian – many possibilities
(figure: the hard-core lattice gas model Hamiltonian with values 4, 2, 1, 0, and an alternative with values 2, 1, 0)
What would be a natural hamiltonian for planar graphs?
H(G) = number of edges

Natural Markov chain: pick u, v uniformly at random (n(n−1)/2 pairs);
with probability 1/(1+λ) try G − {u,v},
with probability λ/(1+λ) try G + {u,v}.
(figure: G and G' differing in the edge {u,v}, with transition probabilities (1/(1+λ)) / (n(n−1)/2) and (λ/(1+λ)) / (n(n−1)/2))

π(G) ∝ λ^(number of edges)  satisfies the detailed balance condition
π(G) P(G,G') = π(G') P(G',G)      (λ = exp(−β))
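A sketch of this chain on the edge set of the complete graph on n vertices (ignoring any planarity restriction; λ plays the role of exp(−β)):

```python
import random

def step(edges, n, lam):
    """One step: pick a pair {u,v} u.a.r.; with prob 1/(1+lam) try removing it,
    with prob lam/(1+lam) try adding it (no-op if not applicable).
    Stationary distribution: pi(G) proportional to lam^(#edges)."""
    u, v = random.sample(range(n), 2)
    e = (min(u, v), max(u, v))
    if random.random() < 1 / (1 + lam):
        edges.discard(e)          # try G - {u,v}
    else:
        edges.add(e)              # try G + {u,v}

# quick check on n = 4, lambda = 0.5: the average number of edges should approach
# 6 * lam/(1+lam) = 2 (each of the 6 pairs is present independently w.p. lam/(1+lam))
random.seed(0)
n, lam, edges, total = 4, 0.5, set(), 0
for t in range(200_000):
    step(edges, n, lam)
    total += len(edges)
print(total / 200_000)
```

When a pair is picked, its new state does not depend on its current state, so in stationarity every edge is present with probability λ/(1+λ), consistent with π(G) ∝ λ^(#edges).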
Outline
6. More…
a) proof of Dyer-Frieze
b) independent sets revisited
c) warm starts
(figure: example with n = 3)
Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e      (Θ(n ln n) in the example)
Relaxation time: τrel = 1/(1 − λ2)      (Θ(n) in the example)
τrel ≤ τmix ≤ τrel · ln(1/πmin)
(the discrepancy may be substantially bigger for, e.g., matchings)
Estimating π(S):
Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e
Relaxation time: τrel = 1/(1 − λ2)

METHOD 1:  X ∼ π,  Y = 1 if X ∈ S and 0 otherwise;  E[Y] = π(S).
(figure: samples X1, X2, X3, ..., Xs)
METHOD 2 (Gillman '98, Kahale '96, ...): average over the states X1, X2, X3, ..., Xs along the chain's trajectory.
Further speed-up:
|μt − π|TV ≤ exp(−t/τrel) · Varπ(μ0/π),   where Varπ(μ0/π) = ( ∑ π(x) (μ0(x)/π(x) − 1)² )^(1/2)
If this quantity is small, the starting distribution μ0 is called a warm start.

A sample at β can be used as a warm start for β'  ⇔  the cooling schedule can step from β' to β.
(combine with METHOD 2 of Gillman '98, Kahale '96, ...)
m = O( (ln n)(ln A) ):  β0, β1, β2, β3, ..., βm    (• = "well mixed" states)
Run our cooling-schedule algorithm with METHOD 2, using the "well mixed" states as starting points.

k = O*( (ln A)^(1/2) )
Output of our algorithm: β0, β1, ..., βk
Small augmentation of the schedule (so that a sample from the current β can be used as a warm start at the next), still O*( (ln A)^(1/2) ).
Use an analogue of Dyer-Frieze for independent samples of vector variables with slightly dependent coordinates.
If we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*( (ln A)^(1/2) ).

independent sets: O*(n²)   (using Vigoda '01, Dyer-Greenhill '01)
matchings: O*(n²m)   (using Jerrum, Sinclair '89)
spin systems, Ising model: O*(n²) for β < βC   (using Marinelli, Olivieri '95)
k-colorings: O*(n²) for k > 2Δ   (using Jerrum '95)