Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

Counting
independent sets, spanning trees, matchings, perfect matchings, k-colorings

(approx) counting ⇔ sampling
Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86.
The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that
1) E[X1 X2 ⋯ Xt] = "WANTED"
2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1) (squared coefficient of variation, SCV).

Theorem (Dyer, Frieze '91): O(t²/ε²) samples in total (O(t/ε²) from each Xi) give a (1 ± ε)-estimator of "WANTED" with probability ≥ 3/4.

JVV for independent sets
GOAL: given a graph G, estimate the number of independent sets of G.
For S a uniformly random independent set of G: 1/(# independent sets) = P(S = ∅).
By P(A∩B) = P(A) P(B|A), this probability telescopes as P(S = ∅) = E[X1] E[X2] E[X3] E[X4] ⋯, where Xi is the indicator that vertex vi is excluded from a random independent set of the graph with v1, ..., vi−1 already deleted. Each Xi ∈ [0,1] and E[Xi] ≥ 1/2, hence V[Xi]/E[Xi]² = O(1).

JVV: if we have a sampler oracle (graph G → SAMPLER ORACLE → random independent set of G), then an FPRAS using O(n²) samples.

ŠVV: if we have a sampler oracle (β, graph G → SAMPLER ORACLE → set from the gas-model Gibbs distribution at β), then an FPRAS using O*(n) samples.

Application: independent sets
O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer, Greenhill '01): time O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|²).

Other applications (total running time)
matchings: O*(n²m) (using Jerrum, Sinclair '89)
spin systems, Ising model: O*(n²) for β < βc (using Marinelli, Olivieri '95)
k-colorings: O*(n²) for k > 2Δ (using Jerrum '95)

easy = hot, hard = cold
Hamiltonian H : Ω → {0, ..., n} on a big set Ω.
Goal: estimate |H⁻¹(0)|, expressed as a product |H⁻¹(0)| = E[X1] ⋯ E[Xt].
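The JVV telescoping can be made concrete on a hypothetical toy instance (a 4-cycle, not from the slides). Below, each factor E[Xi] = P(vi ∉ S | v1, ..., vi−1 ∉ S) is computed exactly by brute-force enumeration; in the actual reduction each factor would instead be estimated from samples. A minimal sketch:

```python
import itertools

def independent_sets(adj, vertices):
    """Enumerate all independent sets of the subgraph induced on `vertices`."""
    result = []
    vs = sorted(vertices)
    for r in range(len(vs) + 1):
        for combo in itertools.combinations(vs, r):
            s = set(combo)
            if all(v not in s for u in s for v in adj[u]):
                result.append(s)
    return result

# Hypothetical toy instance: the 4-cycle 0-1-2-3-0.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
order = [0, 1, 2, 3]

# Exact factor E[Xi] = P(vi not in S), with S uniform over independent sets
# of the graph with v1, ..., v_{i-1} already deleted.
prod, remaining = 1.0, set(order)
for v in order:
    iss = independent_sets(adj, remaining)
    p = sum(1 for s in iss if v not in s) / len(iss)   # always >= 1/2 here
    prod *= p
    remaining.discard(v)

true_count = len(independent_sets(adj, order))
print(true_count, 1 / prod)   # the telescoping product inverts to the count
```

Since every factor is at least 1/2, each Xi has SCV O(1), which is exactly the property the Dyer-Frieze theorem needs.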
Distributions between hot and cold
β = inverse temperature.
β = 0 ⇒ hot ⇒ uniform on Ω; β = ∞ ⇒ cold ⇒ uniform on H⁻¹(0).
μβ(x) ∝ exp(−H(x)β) (Gibbs distributions), i.e.,
μβ(x) = exp(−H(x)β) / Z(β),
where the normalizing factor is the partition function Z(β) = Σ_{x∈Ω} exp(−H(x)β).

Partition function
Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.

Assumption: we have a sampler oracle for μβ (β, graph G → SAMPLER ORACLE → subset of V from μβ).
Take W ∼ μβ and X = exp(H(W)(β − α)). Then we can obtain the ratio
E[X] = Σ_{s∈Ω} μβ(s) X(s) = Z(α) / Z(β).

Our goal restated
Goal: estimate Z(∞) = |H⁻¹(0)| via the telescoping product
Z(∞) = Z(0) · (Z(β1)/Z(β0)) (Z(β2)/Z(β1)) ⋯ (Z(βt)/Z(βt−1)).
Cooling schedule: β0 = 0 < β1 < β2 < ⋯ < βt = ∞.

How to choose the cooling schedule?
Minimize the length t while satisfying V[Xi]/E[Xi]² = O(1), where E[Xi] = Z(βi)/Z(βi−1).

Parameters: A and n
Z(0) = A, H : Ω → {0, ..., n}, so Z(β) = Σ_{k=0}^{n} ak e^{−βk} with ak = |H⁻¹(k)|.

                     A      n
independent sets     2^V    E
matchings            ≈ V!   V
perfect matchings    V!     V
k-colorings          k^V    E

Previous cooling schedules (Z(0) = A, H : Ω → {0, ..., n}):
β0 = 0 < β1 < β2 < ⋯ < βt = ∞.
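The ratio identity E[X] = Z(α)/Z(β) for X = exp(H(W)(β − α)), and the telescoping product along a cooling schedule, can both be checked by exact summation on a small made-up Hamiltonian (the counts ak below are hypothetical):

```python
import math

# Hypothetical toy Hamiltonian: a[k] = |H^{-1}(k)| for k = 0..n.
a = [2, 5, 9, 4, 1]                       # so A = Z(0) = 21, n = 4

def Z(beta):
    """Partition function Z(beta) = sum_k a_k e^{-beta k}."""
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def mu(beta):
    """Gibbs distribution over energy levels: P(H(W)=k) = a_k e^{-beta k} / Z(beta)."""
    z = Z(beta)
    return [ak * math.exp(-beta * k) / z for k, ak in enumerate(a)]

# E[X] with W ~ mu_beta and X = exp(H(W)(beta - alpha)) equals Z(alpha)/Z(beta):
alpha, beta = 1.0, 0.4
EX = sum(p * math.exp(k * (beta - alpha)) for k, p in enumerate(mu(beta)))
print(EX, Z(alpha) / Z(beta))             # the two agree

# The telescoping product along a cooling schedule recovers Z(beta_t) from Z(0):
schedule = [0.0, 0.5, 1.0, 2.0, 4.0]
prod = Z(0)
for b_prev, b in zip(schedule, schedule[1:]):
    prod *= Z(b) / Z(b_prev)
print(prod, Z(4.0))
```

In the algorithm each ratio Z(βi)/Z(βi−1) is estimated from oracle samples rather than computed exactly; the exact sums here only verify the identities.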
"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):
β → β + 1/n
β → β (1 + 1/ln A)
ln A → ∞
These yield cooling schedules of length O(n ln A) and O((ln n)(ln A)).

No better fixed schedule is possible: with Z(0) = A and H : Ω → {0, ..., n}, a schedule that works for all
Za(β) = (A/(1+a)) (1 + a e^{−βn})   (with a ∈ [0, A−1])
has length ≥ Ω((ln n)(ln A)).

Our main result: an adaptive schedule of length O*((ln A)^{1/2}) can be obtained. Previously: non-adaptive schedules of length Ω*(ln A).

Existential part
Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).

Express the SCV using the partition function (going from β to α): for W ∼ μβ and X = exp(H(W)(β − α)),
E[X] = Z(α)/Z(β),
E[X²]/E[X]² = Z(2α−β) Z(β) / Z(α)² ≤ C,
where β, α, 2α−β are evenly spaced.

Proof sketch: let f(γ) = ln Z(γ). The condition becomes (f(β) + f(2α−β))/2 − f(α) ≤ C′ = (ln C)/2.
Properties: f is decreasing, f is convex, f′(0) ≥ −n, f(0) ≤ ln A. In each step either f or f′ changes a lot: letting K := Δf, we get Δ(ln |f′|) ≥ 1/K. A convex decreasing f : [a,b] → R can be "approximated" in this sense using about ((f(a) − f(b)) ln(f′(a)/f′(b)))^{1/2} segments.

Technicality: getting to 2α−β. The point 2α−β need not land among the schedule points βi, βi+1, βi+2, ...; handling this costs ln ln A extra steps.

Existential → algorithmic: the existence proof can be made constructive.

Algorithmic construction
Our main result: using a sampler oracle for μβ (with μβ(x) = exp(−H(x)β)/Z(β)), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n). Total number of oracle calls: ≤ 10⁷ (ln A)(ln ln A + ln n)⁷ ln(1/δ).

At the current inverse temperature β, we would ideally move to α such that B1 ≤ E[X²]/E[X]² ≤ B2.
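The SCV identity E[X²]/E[X]² = Z(2α−β) Z(β) / Z(α)², and the monotonicity and convexity of f(γ) = ln Z(γ), can be verified numerically. A sketch with made-up counts ak:

```python
import math

a = [2, 5, 9, 4, 1]   # hypothetical counts a_k = |H^{-1}(k)|

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

beta, alpha = 0.3, 0.8   # one annealing step beta -> alpha

# SCV of X = exp(H(W)(beta - alpha)) for W ~ mu_beta, computed two ways:
z = Z(beta)
EX  = sum(ak * math.exp(-beta * k) / z * math.exp(k * (beta - alpha))
          for k, ak in enumerate(a))
EX2 = sum(ak * math.exp(-beta * k) / z * math.exp(2 * k * (beta - alpha))
          for k, ak in enumerate(a))
scv_direct  = EX2 / EX ** 2
scv_formula = Z(2 * alpha - beta) * Z(beta) / Z(alpha) ** 2
print(scv_direct, scv_formula)   # identical up to rounding

# f(gamma) = ln Z(gamma) is decreasing and convex (checked on a grid):
f = lambda g: math.log(Z(g))
grid = [i * 0.1 for i in range(50)]
assert all(f(x) > f(y) for x, y in zip(grid, grid[1:]))       # decreasing
assert all(f(x) - 2 * f(y) + f(z_) >= 0                        # convex
           for x, y, z_ in zip(grid, grid[1:], grid[2:]))
```

The existential schedule is read off from exactly these two properties of f: a long piecewise-linear approximation of f would force either f or ln |f′| to change a lot in some step.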
Here E[X] = Z(α)/Z(β). The upper bound B2 keeps X "easy to estimate"; assuming B1 > 1, the lower bound guarantees that we make progress. Since
E[X²]/E[X]² = Z(2α−β) Z(β) / Z(α)²,
we need to construct a "feeler" for this ratio (and must cope with a bad "feeler").

Rough estimator for Z(β)/Z(α)
Recall Z(β) = Σ_{k=0}^{n} ak e^{−βk}.
For W ∼ μβ: P(H(W) = k) = ak e^{−βk} / Z(β).
For U ∼ μα: P(H(U) = k) = ak e^{−αk} / Z(α).
If H(X) = k is likely at both α and β, this gives a rough estimator:
(P(H(U) = k) / P(H(W) = k)) · e^{k(α−β)} = Z(β)/Z(α).

Rough estimator for Z(β)/Z(α), interval version
For W ∼ μβ: P(H(W) ∈ [c,d]) = Σ_{k=c}^{d} ak e^{−βk} / Z(β).
If |α − β| · |d − c| ≤ 1, then
(1/e) · Z(β)/Z(α) ≤ e^{c(α−β)} · P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d]) ≤ e · Z(β)/Z(α).
We also need P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large. Split {0, 1, ..., n} into h ≤ 4(ln n)(ln A) intervals [0], [1], [2], ..., [c, c(1 + 1/ln A)], ...
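The factor-e guarantee of the interval estimator can be checked exactly on toy counts. The sketch below uses hypothetical ak and picks α, β, [c,d] satisfying |α − β| · |d − c| ≤ 1:

```python
import math

a = [2, 5, 9, 4, 1]   # hypothetical counts a_k

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def p_interval(beta, c, d):
    """P(H(W) in [c,d]) for W ~ mu_beta."""
    return sum(a[k] * math.exp(-beta * k) for k in range(c, d + 1)) / Z(beta)

beta, alpha = 0.5, 0.9
c, d = 1, 3
assert abs(alpha - beta) * (d - c) <= 1        # the condition from the slide

est   = math.exp(c * (alpha - beta)) * p_interval(alpha, c, d) / p_interval(beta, c, d)
ratio = Z(beta) / Z(alpha)
# The estimator is within a factor e of the true ratio:
assert ratio / math.e <= est <= math.e * ratio
print(est, ratio)
```

The two interval probabilities in the estimator are exactly what the sampler oracle can estimate well, provided the interval is heavy at both temperatures; that is what the splitting into h intervals arranges.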
For any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.

Algorithm
repeat:
  find an interval I which is heavy for the current inverse temperature β
  see how far I stays heavy (until some β*)
  use the interval I as the feeler for Z(2α−β) Z(β) / Z(α)²
  then either
    * make progress, or
    * eliminate the interval I, or
    * make a "long move"

Summary: if we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Applications (total running time):
independent sets: O*(n²) (using Vigoda '01, Dyer, Greenhill '01)
matchings: O*(n²m) (using Jerrum, Sinclair '89)
spin systems, Ising model: O*(n²) for β < βc (using Marinelli, Olivieri '95)
k-colorings: O*(n²) for k > 2Δ (using Jerrum '95)
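To illustrate the overall shape of an adaptive schedule (not the paper's actual algorithm, which only has sample access), here is a sketch that greedily builds a schedule by searching, at each β, for the largest step whose SCV stays below B2; the exact Z of a hypothetical toy Hamiltonian stands in for the sampling-based feeler:

```python
import math

a = [2, 5, 9, 4, 1]          # hypothetical counts; Z(inf) = a[0] = 2, Z(0) = A = 21

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def scv(beta, alpha):
    """E[X^2]/E[X]^2 for the step beta -> alpha (exact Z replaces the feeler)."""
    return Z(2 * alpha - beta) * Z(beta) / Z(alpha) ** 2

B2 = 2.0                      # upper bound on the SCV per step
schedule = [0.0]
beta = 0.0
while Z(beta) > a[0] * (1 + 1e-9):
    # Largest admissible step, found by doubling then binary search.
    step = 1e-3
    while scv(beta, beta + 2 * step) <= B2 and step < 1e6:
        step *= 2
    lo, hi = step, 2 * step
    for _ in range(60):
        mid = (lo + hi) / 2
        if scv(beta, beta + mid) <= B2:
            lo = mid
        else:
            hi = mid
    beta += lo
    schedule.append(beta)
    if beta > 50:             # beyond this Z(beta) ~ a[0]: finish with beta = infinity
        break
print(len(schedule), schedule[:5])
```

The real construction replaces the exact SCV with the interval feeler, which is why the heavy-interval bookkeeping (eliminate I, or make a "long move") is needed; step sizes still grow as f = ln Z flattens, which is where the O*((ln A)^{1/2}) length comes from.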