Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

Counting
Typical counting problems: independent sets, spanning trees, matchings, perfect matchings, k-colorings.

Compute the number of independent sets (hard-core gas model)
An independent set of a graph is a subset S of vertices such that no two vertices in S are neighbors.
[Example slides: a small graph with 7 independent sets; a larger grid graph with 5598861 independent sets.]

Computing the number of independent sets in a graph G is #P-complete, and it remains #P-complete even for 3-regular graphs (Dyer, Greenhill 1997). So we settle for approximation and randomization.

We would like to know a quantity Q. Goal: a random variable Y such that
  P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ.
We say "Y gives a (1±ε)-estimate".

(Approximate) counting ⇔ sampling
Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86.

The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that
1) E[X1 X2 ... Xt] = "WANTED",
2) the Xi are easy to estimate:
   V[Xi] / E[Xi]² = O(1)   (squared coefficient of variation, SCV).

Theorem (Dyer-Frieze '91)
O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

JVV for independent sets
GOAL: given a graph G, estimate the number of independent sets of G.
  1 / (# independent sets) = P( a uniformly random independent set is the empty set ).
By the chain rule P(A∩B) = P(A) P(B|A), this probability factors into conditional probabilities X1 X2 X3 X4 ..., one per vertex of the example graph. Each Xi ∈ [0,1] and E[Xi] ≥ 1/2, hence V[Xi]/E[Xi]² = O(1).

Self-reducibility for independent sets
[Figure slides: the count for the example graph is written as a telescoping product of ratios of independent-set counts of successively smaller subgraphs, e.g. 7 = (7/5)·5 and 5 = (5/3)·3, so 7 = (7/5)(5/3)·3; each ratio P(...) is exactly the quantity estimated by sampling.]

JVV: If we have a sampler oracle
  graph G → SAMPLER ORACLE → random independent set of G,
then there is an FPRAS using O(n²) samples.

ŠVV: If we have a sampler oracle
  β, graph G → SAMPLER ORACLE → set drawn from the hard-core (gas-model) Gibbs distribution at β,
then there is an FPRAS using O*(n) samples.

Application – independent sets
O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|²).

Other applications (total running time)
- matchings: O*(n²m) (using Jerrum, Sinclair '89)
- spin systems, Ising model: O*(n²) for β < βC (using Marinelli, Olivieri '95)
- k-colorings: O*(n²) for k > 2Δ (using Jerrum '95)

Easy = hot, hard = cold
Big set = Ω. Hamiltonian H : Ω → {0,...,n}.
Goal: estimate |H⁻¹(0)|. As restated below, |H⁻¹(0)| = Z(∞) is the known quantity Z(0) = |Ω| times a product of expectations E[X1] ⋯ E[Xt], each of which is estimated from samples (a generic product-estimator sketch follows).
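The scheme above (the JVV product of random variables combined with the Dyer-Frieze sample counts) can be summarized in a short sketch. This is a generic illustration, not the algorithm of the talk: sample_Xi stands in for whatever sampler produces the i-th factor, and the per-factor sample count follows the O(t/ε²) rule quoted above, with constants chosen arbitrarily.

```python
import random

def product_estimator(samplers, eps):
    """Estimate E[X1]*...*E[Xt]: average each Xi separately from ~t/eps^2
    samples and multiply the averages (about t^2/eps^2 samples in total)."""
    t = len(samplers)
    m = max(1, round(t / eps**2))          # samples per factor, up to constants
    estimate = 1.0
    for sample_Xi in samplers:             # sample_Xi() draws one copy of Xi
        estimate *= sum(sample_Xi() for _ in range(m)) / m
    return estimate

# Toy usage: each Xi is Bernoulli(1/2) (so its SCV is 1), and the product is 2^-t.
if __name__ == "__main__":
    t = 10
    samplers = [lambda: float(random.random() < 0.5) for _ in range(t)]
    print(product_estimator(samplers, eps=0.1), "vs exact", 0.5 ** t)
```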
Distributions between hot and cold
β = inverse temperature. β = 0 ⇒ hot ⇒ uniform on Ω; β = ∞ ⇒ cold ⇒ uniform on H⁻¹(0).
  μβ(x) ∝ exp(-H(x)β)   (Gibbs distributions),
  μβ(x) = exp(-H(x)β) / Z(β),
where the normalizing factor is the partition function
  Z(β) = ∑_{x∈Ω} exp(-H(x)β).

Partition function
Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.

Assumption: we have a sampler oracle for μβ
  β, graph G → SAMPLER ORACLE → subset of V drawn from μβ.
For W ∼ μβ, let X = exp(H(W)(β - α)). Then we can obtain the following ratio:
  E[X] = ∑_{s∈Ω} μβ(s) X(s) = Z(α)/Z(β).

Our goal restated
  Z(∞) = Z(0) · Z(β1)/Z(β0) · Z(β2)/Z(β1) · ... · Z(βt)/Z(βt-1).
Cooling schedule: β0 = 0 < β1 < β2 < ... < βt = ∞.
How to choose the cooling schedule? Minimize its length t while satisfying V[Xi]/E[Xi]² = O(1), where E[Xi] = Z(βi)/Z(βi-1).

Parameters: A and n
  Z(0) = A,   H : Ω → {0,...,n},   Z(β) = ∑_{k=0}^{n} a_k e^{-βk},   a_k = |H⁻¹(k)|.

                        A       n
  independent sets      2^V     E
  matchings             ≈ V!    V
  perfect matchings     V!      V
  k-colorings           k^V     E

Previous cooling schedules
β0 = 0 < β1 < β2 < ... < βt = ∞. "Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):
  β → β + 1/n,
  β → β (1 + 1/ln A),
  ln A → ∞   (the final jump to β = ∞),
giving cooling schedules of length O(n ln A) and of length O((ln n)(ln A)).

No better fixed schedule is possible: a schedule that works for all of the partition functions
  Za(β) = (A/(1+a)) (1 + a e^{-βn})   (with a ∈ [0, A-1])
has length ≥ Ω((ln n)(ln A)).

Our main result: we can get an adaptive schedule of length O*((ln A)^{1/2}). Previously: non-adaptive schedules of length Ω*(ln A).

Related work: Lovász-Vempala, volume of convex bodies in O*(n⁴), with a schedule of length O(n^{1/2}) (a non-adaptive cooling schedule).

Existential part
Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).

Express the SCV using the partition function (going from β to α): for W ∼ μβ and X = exp(H(W)(β - α)),
  E[X] = Z(α)/Z(β),   E[X²]/E[X]² = Z(2α-β) Z(β) / Z(α)² ≤ C.
In terms of f(γ) = ln Z(γ) this reads f(2α-β) + f(β) - 2f(α) ≤ ln C, i.e. the gap between f(α) and the average of f(β) and f(2α-β) is at most C' = (ln C)/2.
Proof idea: f is decreasing, f is convex, f'(0) ≥ -n, and f(0) ≤ ln A; hence along the schedule either f or f' changes a lot. Setting K := Δf, one gets Δ(ln |f'|) ≥ 1/K. More precisely, a convex decreasing f : [a,b] → R can be "approximated" by a number of segments governed by f(a) - f(b) and f'(a)/f'(b), of the order ((f(a)-f(b)) · ln(f'(a)/f'(b)))^{1/2}.

Technicality: getting to 2α-β. The SCV condition involves the point 2α-β beyond α, so the schedule also has to pass through such points (βi, βi+1, βi+2, βi+3, ...); this costs about ln ln A extra steps.

Existential → Algorithmic: not just "there exists" a schedule of length O*((ln A)^{1/2}) — it can actually be constructed.

Algorithmic construction (our main result)
Using a sampler oracle for μβ, μβ(x) = exp(-H(x)β)/Z(β), we can construct a cooling schedule of length
  ≤ 38 (ln A)^{1/2} (ln ln A)(ln n),
with total number of oracle calls
  ≤ 10⁷ (ln A)(ln ln A + ln n)⁷ ln(1/δ).
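The two identities used above (E[X] = Z(α)/Z(β) and E[X²]/E[X]² = Z(2α-β)Z(β)/Z(α)²) are easy to check numerically. The sketch below is a toy illustration, not part of the construction: the level sizes a_k are made up, and the exact sampler for μβ exists only because these a_k are known explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, 10, 45, 120, 200, 120, 45, 10, 1])   # toy level sizes a_k = |H^{-1}(k)|
k = np.arange(len(a))

def Z(beta):
    """Partition function Z(beta) = sum_k a_k * exp(-beta*k)."""
    return float(np.sum(a * np.exp(-beta * k)))

def sample_H(beta, size):
    """Draw H(W) for W ~ mu_beta: P(H(W) = k) = a_k exp(-beta*k) / Z(beta)."""
    p = a * np.exp(-beta * k) / Z(beta)
    return rng.choice(k, size=size, p=p)

beta, alpha = 0.3, 0.5                     # current and proposed inverse temperatures
H = sample_H(beta, 200_000)
X = np.exp(H * (beta - alpha))             # X = exp(H(W)(beta - alpha))

print("E[X]          empirical:", X.mean(), " exact:", Z(alpha) / Z(beta))
print("E[X^2]/E[X]^2 empirical:", (X**2).mean() / X.mean()**2,
      " exact:", Z(2 * alpha - beta) * Z(beta) / Z(alpha) ** 2)
```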
Algorithmic construction
Current inverse temperature: β. Ideally we move to an α such that
  B1 ≤ E[X²]/E[X]² ≤ B2,   where E[X] = Z(α)/Z(β).
The upper bound B2 keeps X "easy to estimate"; the lower bound (assuming B1 > 1) guarantees that we make progress. Since
  E[X²]/E[X]² = (Z(β)/Z(α)) · (Z(2α-β)/Z(α)),
we need to construct a "feeler" for such ratios of partition-function values.

Rough estimator for Z(β)/Z(α)
Recall Z(β) = ∑_{k=0}^{n} a_k e^{-βk}.
For W ∼ μβ:  P(H(W)=k) = a_k e^{-βk} / Z(β).
For U ∼ μα:  P(H(U)=k) = a_k e^{-αk} / Z(α).
Therefore
  ( P(H(U)=k) / P(H(W)=k) ) · e^{k(α-β)} = Z(β)/Z(α),
so if the value H = k is likely at both α and β, this gives a rough estimator.

The same works for an interval of values: for W ∼ μβ,
  P(H(W)∈[c,d]) = ( ∑_{k=c}^{d} a_k e^{-βk} ) / Z(β),
and if |α-β|·|d-c| ≤ 1 then
  (1/e) · Z(β)/Z(α)  ≤  ( P(H(U)∈[c,d]) / P(H(W)∈[c,d]) ) · e^{c(α-β)}  ≤  e · Z(β)/Z(α).
(A toy numerical check of this interval feeler is sketched below.)
We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large. Split {0,1,...,n} into h ≤ 4(ln n)(ln A) intervals
  [0], [1], [2], ..., [c, c(1+1/ln A)], ...
For any inverse temperature β there exists an interval I with P(H(W)∈I) ≥ 1/(8h). We say that I is HEAVY for β.

Algorithm
repeat:
  find an interval I which is heavy for the current inverse temperature β
  see how far I stays heavy (up to some β*)
  use the interval I as the feeler for Z(β)/Z(α) and Z(2α-β)/Z(α)
  then either
    * make progress, or
    * eliminate the interval I, or
    * make a "long move"

If we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}), giving:
- independent sets: O*(n²) (using Vigoda '01, Dyer-Greenhill '01)
- matchings: O*(n²m) (using Jerrum, Sinclair '89)
- spin systems, Ising model: O*(n²) for β < βC (using Marinelli, Olivieri '95)
- k-colorings: O*(n²) for k > 2Δ (using Jerrum '95)
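Here is a minimal numerical check of the interval "feeler" above. It uses the same kind of toy level sizes a_k (made up for illustration) and an exact sampler for μβ that a real application would not have; the point is only that the empirical probability ratio times e^{c(α-β)} lands within the promised factor e of Z(β)/Z(α) when |α-β|·|d-c| ≤ 1.

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([1.0, 10, 45, 120, 200, 120, 45, 10, 1])   # toy a_k = |H^{-1}(k)|
k = np.arange(len(a))

def Z(b):
    return float(np.sum(a * np.exp(-b * k)))

def sample_H(b, size):
    p = a * np.exp(-b * k) / Z(b)
    return rng.choice(k, size=size, p=p)

beta, alpha = 0.3, 0.5
c, d = 2, 5                                   # interval with |alpha-beta|*|d-c| = 0.6 <= 1
W = sample_H(beta, 200_000)                   # H-values of samples from mu_beta
U = sample_H(alpha, 200_000)                  # H-values of samples from mu_alpha

p_beta = np.mean((W >= c) & (W <= d))         # empirical P(H(W) in [c,d])
p_alpha = np.mean((U >= c) & (U <= d))        # empirical P(H(U) in [c,d])

feeler = (p_alpha / p_beta) * np.exp(c * (alpha - beta))
print("feeler:", feeler, " true Z(beta)/Z(alpha):", Z(beta) / Z(alpha))
```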
Appendix – proof of the Dyer-Frieze theorem
Setting: 1) E[X1 X2 ... Xt] = "WANTED"; 2) the Xi are easy to estimate, V[Xi]/E[Xi]² = O(1).
Theorem (Dyer-Frieze '91): O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

The Bienaymé-Chebyshev inequality
For Y = (X1 + X2 + ... + Xn)/n,
  P( Y gives a (1±ε)-estimate ) ≥ 1 - (1/ε²) · V[Y]/E[Y]²,
and V[Y]/E[Y]² = (1/n) · V[X]/E[X]²   (the squared coefficient of variation, SCV, divided by n).
Precisely: let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then
  P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (1/(ε² n)) · V[X]/E[X]².

Chernoff's bound
Let X1,...,Xn,X be independent, identically distributed random variables, 0 ≤ X ≤ 1, Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then
  P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^{-ε² · n · E[X] / 3}.

Sample sizes for confidence 1-δ:
  Bienaymé-Chebyshev:    n = (V[X]/E[X]²) · (1/ε²) · (1/δ)   (for 0 ≤ X ≤ 1 this is at most (1/E[X]) · (1/ε²) · (1/δ))
  Chernoff (0 ≤ X ≤ 1):  n = (1/E[X]) · (3/ε²) · ln(1/δ)

Median "boosting trick"
With n = (1/E[X]) · (4/ε²) samples, Y = (X1 + ... + Xn)/n satisfies
  P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.
Median trick: repeat 2T times; then more than T of the 2T runs land in the interval with probability ≥ 1 - e^{-T/4}, hence
  P( the median is in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^{-T/4}.
With the median trick (sketched in code below), the Chebyshev-style bound becomes, for 0 ≤ X ≤ 1,
  n = (1/E[X]) · (32/ε²) · ln(1/δ),
and in general
  n = (V[X]/E[X]²) · (32/ε²) · ln(1/δ),
comparable to Chernoff's n = (1/E[X]) · (3/ε²) · ln(1/δ) but requiring only a bound on the SCV.

How precise do the Xi have to be?
First attempt – Chernoff's bound. Main idea:
  (1 ± ε/t)(1 ± ε/t) ⋯ (1 ± ε/t) ≈ 1 ± ε,
so each factor is estimated to within ε/t, needing n = Θ((1/E[X]) (1/ε²) ln(1/δ)) with ε replaced by ε/t, i.e. Ω(t²) samples per factor ⇒ Ω(t³) in total.

Bienaymé-Chebyshev is better (Dyer-Frieze '91). For X = X1 X2 ... Xt, the GOAL is SCV(X) ≤ ε²/4, since
  P( X gives a (1±ε)-estimate ) ≥ 1 - (1/ε²) · V[X]/E[X]².
Main idea: for independent factors,
  SCV(X) = (1 + SCV(X1)) ⋯ (1 + SCV(Xt)) - 1,   where SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1,
so SCV(Xi) ≤ (ε²/4)/t for each factor gives SCV(X) ≲ ε²/4. Each factor needs O(t/ε²) samples ⇒ O(t²/ε²) in total.

Summary
If we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}).
- independent sets: O*(n²) (using Vigoda '01, Dyer-Greenhill '01)
- matchings: O*(n²m) (using Jerrum, Sinclair '89)
- spin systems, Ising model: O*(n²) for β < βC (using Marinelli, Olivieri '95)
- k-colorings: O*(n²) for k > 2Δ (using Jerrum '95)
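As a closing illustration, here is the median "boosting trick" from the appendix in code. It is a generic sketch: estimate_once stands for any routine assumed to return a (1±ε)-estimate with probability at least 3/4, and repeating it 2T times with T ≈ 4 ln(1/δ) and taking the median pushes the failure probability down to about e^{-T/4} ≤ δ.

```python
import math
import random
import statistics

def median_boost(estimate_once, delta):
    """Take the median of 2T runs of a weak estimator (each correct with
    probability >= 3/4); with T ~ 4 ln(1/delta) the median fails with
    probability roughly exp(-T/4) <= delta."""
    T = max(1, math.ceil(4 * math.log(1 / delta)))
    return statistics.median(estimate_once() for _ in range(2 * T))

# Toy usage: an estimator of Q = 1 that is within 10% only about 75% of the time.
if __name__ == "__main__":
    def weak_estimate():
        return (1.0 + random.uniform(-0.1, 0.1)) if random.random() < 0.75 else 5.0
    print(median_boost(weak_estimate, delta=1e-6))
```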