Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

If you want to count using MCMC, then statistical physics is useful.

Outline
1. Counting problems
2. Basic tools: Chernoff, Chebyshev
3. Dealing with large quantities (the product method)
4. Statistical physics
5. Cooling schedules (our work)
6. More...

1. Counting problems

Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.

Compute the number of spanning trees.
Kirchhoff's Matrix Tree Theorem: the number of spanning trees equals det(D - A)_vv, the determinant of D - A with the row and column of any one vertex v deleted (D = diagonal matrix of degrees, A = adjacency matrix).
Example (the 4-cycle):
  D = diag(2, 2, 2, 2),   A = [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0],
  det(D - A)_vv = det [2 -1 0; -1 2 -1; 0 -1 2] = 4.
So: graph G → polynomial-time algorithm → number of spanning trees of G.

What about the other counting problems (independent sets, matchings, perfect matchings, k-colorings)?

Compute the number of independent sets (hard-core gas model).
Independent set = a subset S of the vertices such that no two vertices in S are neighbors.
For the small example graph pictured, # independent sets = 7.
For the path graphs G1, G2, G3, ..., Gn the counts are 2, 3, 5, ..., i.e., # independent sets of Gn = F_{n+1}, the Fibonacci numbers; for the larger example graph pictured, # independent sets = 5598861.

Compute the number of independent sets:
graph G → polynomial-time algorithm → number of independent sets of G?   (unlikely!)
Computing the number of independent sets of a graph G is #P-complete (the slide shows how P, FP, NP, and #P relate), and it remains #P-complete even for 3-regular graphs (Dyer, Greenhill, 1997).

graph G → # independent sets in G: what can we relax, approximation or randomization? Which is more important?

My world-view: (true) randomness is important conceptually but NOT computationally (i.e., I believe P = BPP); approximation is what makes the problems easier (i.e., I believe #P ≠ BPP).

We would like to know Q.
Goal: a random variable Y such that P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ; we say "Y gives a (1±ε)-estimate".
FPRAS (fully polynomial randomized approximation scheme): G, ε, δ → polynomial-time algorithm → Y.

2. Basic tools: Chernoff, Chebyshev

We would like to know Q.
1. Get an unbiased estimator X, i.e., E[X] = Q.
2. "Boost the quality" of X:  Y = (X1 + X2 + ... + Xn)/n.

The Bienaymé-Chebyshev inequality:
  P( Y gives a (1±ε)-estimate ) ≥ 1 - (V[Y]/E[Y]^2)·(1/ε^2),
where V[Y]/E[Y]^2 is the squared coefficient of variation (SCV).
For Y = (X1 + X2 + ... + Xn)/n we have V[Y]/E[Y]^2 = (1/n)·V[X]/E[X]^2.

The Bienaymé-Chebyshev inequality: let X1,...,Xn,X be independent, identically distributed random variables with Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then
  P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (1/(ε^2 n))·(V[X]/E[X]^2).
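To make the averaging step concrete, here is a minimal Python sketch (mine, not from the talk) of boosting an unbiased estimator by averaging, with the number of samples chosen from the Chebyshev bound above; the estimator draw_x and the Bernoulli toy target are illustrative assumptions.

```python
import random

def boosted_estimate(draw_x, scv_bound, eps, delta_inverse=4):
    """Average n i.i.d. copies of an unbiased estimator X of Q.

    Chebyshev: P(Y is a (1±eps)-estimate) >= 1 - SCV(X)/(n*eps^2), so taking
    n = ceil(scv_bound * delta_inverse / eps^2) makes the failure probability
    at most 1/delta_inverse.
    """
    n = max(1, int(scv_bound * delta_inverse / eps**2) + 1)
    return sum(draw_x() for _ in range(n)) / n

if __name__ == "__main__":
    # Toy example (hypothetical): X is Bernoulli(0.3), so Q = 0.3 and
    # SCV(X) = (1 - p)/p.
    p = 0.3
    draw_x = lambda: 1.0 if random.random() < p else 0.0
    y = boosted_estimate(draw_x, scv_bound=(1 - p) / p, eps=0.1)
    print("estimate:", y, "truth:", p)
```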
Chernoff's bound: let X1,...,Xn,X be independent, identically distributed random variables with 0 ≤ X ≤ 1 and Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then
  P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^{-ε^2 · n · E[X] / 3}.

Number of samples to achieve precision ε with confidence δ:
  Chebyshev:  n = (V[X]/E[X]^2) · (1/ε^2) · (1/δ)   (the 1/δ dependence is BAD);
  Chernoff (0 ≤ X ≤ 1):  n = (1/E[X]) · (3/ε^2) · ln(1/δ)   (the ln(1/δ) dependence is GOOD).

Median "boosting trick": take n = (V[X]/E[X]^2) · (4/ε^2) and Y = (X1 + ... + Xn)/n.
By Bienaymé-Chebyshev:  P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.
Median trick: repeat 2T times. By Chernoff, P( more than T of the 2T estimates land in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^{-T/4}, hence P( the median lands in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^{-T/4}.
With the median trick:  n = (V[X]/E[X]^2) · (32/ε^2) · ln(1/δ),
compared with Chernoff's n = (1/E[X]) · (3/ε^2) · ln(1/δ), which requires 0 ≤ X ≤ 1 and where 1/E[X] can be much larger than V[X]/E[X]^2.

Creating an "approximator" from X (ε = precision, δ = confidence):
  n = Θ( (V[X]/E[X]^2) · (1/ε^2) · ln(1/δ) ).

3. Dealing with large quantities (the product method)

(Approximate) counting ⇔ sampling:
Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86.
The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that
  1) E[X1 X2 ... Xt] = "WANTED";
  2) the Xi are easy to estimate:  V[Xi]/E[Xi]^2 = O(1)  (squared coefficient of variation, SCV).

Theorem (Dyer-Frieze '91): O(t^2/ε^2) samples (O(t/ε^2) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

JVV for independent sets.
GOAL: given a graph G, estimate the number of independent sets of G.
  1 / (# independent sets) = P( a uniformly random independent set is the empty set ).
By the chain rule P(A∩B) = P(A) P(B|A), this probability factors as X1 · X2 · X3 · X4, where Xi is the conditional probability that the i-th vertex is excluded given that the previous vertices are excluded. Each Xi ∈ [0,1] and E[Xi] ≥ 1/2, so V[Xi]/E[Xi]^2 = O(1).

Self-reducibility for independent sets: each conditional probability is a ratio of independent-set counts of smaller graphs. In the example, the graph has 7 independent sets, and conditioning on excluding a vertex leaves graphs with 5, 3, and 2 independent sets, so
  7 = (7/5) · (5/3) · (3/2) · 2.
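Here is a small, self-contained Python sketch (not the authors' code) of the JVV product estimator driven by self-reducibility; the brute-force sampler sample_independent_set is a stand-in for a genuine sampler oracle and only works for tiny graphs.

```python
import itertools, random

def independent_sets(graph):
    """Enumerate all independent sets of a tiny graph given as {v: set_of_neighbors}."""
    verts = list(graph)
    for r in range(len(verts) + 1):
        for subset in itertools.combinations(verts, r):
            s = set(subset)
            if all(graph[u].isdisjoint(s) for u in s):
                yield s

def sample_independent_set(graph):
    """Uniform sample by enumeration; a stand-in for a fast sampler oracle."""
    return random.choice(list(independent_sets(graph)))

def jvv_estimate(graph, samples_per_factor=2000):
    """Estimate #independent sets as a product of ratios (self-reducibility):
    1/#IS = prod_v P(v excluded | earlier vertices excluded)."""
    graph = {v: set(nbrs) for v, nbrs in graph.items()}   # local copy
    estimate = 1.0
    for v in sorted(graph):
        # Estimate P(v is absent from a uniformly random independent set).
        hits = sum(v not in sample_independent_set(graph)
                   for _ in range(samples_per_factor))
        estimate /= max(hits, 1) / samples_per_factor
        # Condition on "v absent": delete v and continue on the smaller graph.
        del graph[v]
        for nbrs in graph.values():
            nbrs.discard(v)
    return estimate          # the empty graph has exactly one independent set

if __name__ == "__main__":
    path3 = {0: {1}, 1: {0, 2}, 2: {1}}   # path on 3 vertices: 5 independent sets
    print(jvv_estimate(path3))
```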
JVV: if we have a sampler oracle
  graph G → SAMPLER ORACLE → random independent set of G,
then there is an FPRAS using O(n^2) samples.

ŠVV: if we have a sampler oracle
  β, graph G → SAMPLER ORACLE → set drawn from the hard-core (gas-model) Gibbs distribution at β,
then there is an FPRAS using O*(n) samples.

Application (independent sets): O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|^2).

Other applications (total running time):
  matchings: O*(n^2 m)  (using Jerrum, Sinclair '89)
  spin systems, Ising model: O*(n^2) for β < βc  (using Marinelli, Olivieri '95)
  k-colorings: O*(n^2) for k > 2Δ  (using Jerrum '95)

4. Statistical physics

easy = hot, hard = cold.
Big set = Ω; Hamiltonian H : Ω → {0,...,n} (in the picture, the levels are 4, 2, 1, 0).
Goal: estimate |H^{-1}(0)|, written as a product  |H^{-1}(0)| = E[X1] ··· E[Xt].

Distributions between hot and cold.
β = inverse temperature; β = 0 ⇒ hot ⇒ uniform on Ω; β = ∞ ⇒ cold ⇒ uniform on H^{-1}(0).
μβ(x) ∝ exp(-H(x)β)  (Gibbs distributions), i.e.,
  μβ(x) = exp(-H(x)β) / Z(β),
where the normalizing factor is the partition function
  Z(β) = ∑_{x∈Ω} exp(-H(x)β).

Partition function: we have Z(0) = |Ω|; we want Z(∞) = |H^{-1}(0)|.
Example: Z(β) = 1·e^{-4β} + 4·e^{-2β} + 4·e^{-β} + 7·e^{-0·β}, so Z(0) = 16 and Z(∞) = 7.

Assumption: we have a sampler oracle for μβ:
  graph G, β → SAMPLER ORACLE → subset of V drawn from μβ.
For W ∼ μβ let X = exp(H(W)(β - α)). Then we can obtain the following ratio:
  E[X] = ∑_{s∈Ω} μβ(s) X(s) = Z(α)/Z(β).

Our goal restated:
  Z(∞) = (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) ··· (Z(βt)/Z(βt-1)) · Z(0),
with a cooling schedule β0 = 0 < β1 < β2 < ... < βt = ∞.
How to choose the cooling schedule? Minimize its length while satisfying
  V[Xi]/E[Xi]^2 = O(1),  where  E[Xi] = Z(βi)/Z(βi-1).

5. Cooling schedules (our work)

Parameters: A and n, where Z(0) = A and H : Ω → {0,...,n}.
  Z(β) = ∑_{k=0}^{n} a_k e^{-βk},   a_k = |H^{-1}(k)|.

Parameters for the standard examples:
  independent sets:   A = 2^V,   n = E
  matchings:          A ≈ V!,    n = V
  perfect matchings:  A = V!,    n = V
  k-colorings:        A = k^V,   n = E
(For perfect matchings: Ω = all ways of marrying everyone, ignoring "compatibility"; the Hamiltonian is the number of unhappy couples, and H^{-1}(0) = the ways of marrying with no unhappy couple, i.e., the perfect matchings of G.)
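A hedged sketch of the product method over a cooling schedule, using the toy partition function from the example above; sample_H draws H(W) for W ∼ μβ directly from the known a_k and stands in for a real sampler oracle.

```python
import math, random

# Toy example from the slides: a_k = |H^{-1}(k)|, so Z(0) = 16 and Z(inf) = a_0 = 7.
A_K = {0: 7, 1: 4, 2: 4, 4: 1}

def Z(beta):
    return sum(a * math.exp(-beta * k) for k, a in A_K.items())

def sample_H(beta):
    """Sample H(W) for W ~ mu_beta; stands in for a real sampler oracle."""
    weights = [(k, a * math.exp(-beta * k)) for k, a in A_K.items()]
    r = random.uniform(0.0, sum(w for _, w in weights))
    for k, w in weights:
        r -= w
        if r <= 0:
            return k
    return weights[-1][0]

def product_estimate(schedule, samples_per_ratio=5000):
    """Estimate Z(inf) = Z(0) * prod_i Z(beta_{i+1})/Z(beta_i) using the ratio
    estimator X = exp(-H(W)*(beta_{i+1} - beta_i)) with W ~ mu_{beta_i}."""
    estimate = Z(0.0)                       # Z(0) = sum of the a_k is known
    for b, b_next in zip(schedule, schedule[1:]):
        xs = []
        for _ in range(samples_per_ratio):
            h = sample_H(b)
            if b_next == math.inf:
                xs.append(1.0 if h == 0 else 0.0)   # exp(-h*inf) = [h == 0]
            else:
                xs.append(math.exp(-h * (b_next - b)))
        estimate *= sum(xs) / len(xs)
    return estimate

if __name__ == "__main__":
    schedule = [0.0, 0.5, 1.0, 2.0, math.inf]
    print("estimate of Z(inf):", product_estimate(schedule), "(truth: 7)")
```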
Previous cooling schedules (Z(0) = A, H : Ω → {0,...,n}, β0 = 0 < β1 < β2 < ... < βt = ∞).
"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):
  β → β + 1/n
  β → β (1 + 1/ln A)
  ln A → ∞
These give cooling schedules of length O(n ln A) and O((ln n)(ln A)) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).

Why the steps are safe (W ∼ μβ, X = exp(H(W)(β - α)), Z(β) = ∑_{k=0}^{n} a_k e^{-βk}):
  For the step α = β + 1/n:  1/e ≤ X ≤ 1, hence V[X]/E[X]^2 ≤ 1/E[X] ≤ e.
  For the step ln A → ∞:  Z(∞) = a0 ≥ 1 and Z(ln A) ≤ a0 + 1, so E[X] ≥ 1/2.
  For the step α = β (1 + 1/ln A):  E[X] ≥ 1/(2e).

The resulting non-adaptive schedule looks like
  1/n, 2/n, 3/n, ..., (ln A)/n, ..., ln A, ∞,
additive steps of 1/n first and multiplicative steps of (1 + 1/ln A) afterwards; its length is O((ln n)(ln A)) (the purely additive schedule has length O(n ln A)).
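A small sketch (my reconstruction, not the paper's pseudocode) of how such a fixed schedule can be generated from A and n by combining the additive and multiplicative safe steps.

```python
import math

def fixed_schedule(A, n):
    """Non-adaptive cooling schedule built from the "safe steps":
    additive steps beta -> beta + 1/n up to (ln A)/n, then multiplicative
    steps beta -> beta*(1 + 1/ln A) up to ln A, then the jump to infinity.
    Length is O((ln n)(ln A))."""
    lnA = math.log(A)
    schedule, beta = [0.0], 0.0
    while beta < lnA / n:                 # additive phase
        beta += 1.0 / n
        schedule.append(beta)
    while beta < lnA:                     # multiplicative phase
        beta *= 1.0 + 1.0 / lnA
        schedule.append(beta)
    schedule.append(math.inf)             # final safe step: ln A -> infinity
    return schedule

if __name__ == "__main__":
    # Hypothetical instance: A = 2^20 (e.g., independent sets, 20 vertices), n = 30 edges.
    s = fixed_schedule(A=2**20, n=30)
    print(len(s), "steps, last finite beta =", s[-2])
```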
No better fixed schedule is possible.
THEOREM: A schedule that works for all partition functions of the form
  Z_a(β) = (1 + a e^{-βn}) · A/(1+a)   (with a ∈ [0, A-1])
has length ≥ Ω( (ln n)(ln A) ).

Our main result: one can get an adaptive schedule of length O*( (ln A)^{1/2} ).
Previously: non-adaptive schedules of length Ω*( ln A ).

Related work (Lovász-Vempala): volume of convex bodies in O*(n^4), using a schedule of length O*(n^{1/2}) (a non-adaptive cooling schedule, relying on specific properties of the "volume" partition functions).

Existential part.
Lemma: for every partition function there EXISTS a cooling schedule of length O*((ln A)^{1/2}).

Cooling schedule (definition refresh):
  Z(∞) = (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) ··· (Z(βt)/Z(βt-1)) · Z(0),
  β0 = 0 < β1 < β2 < ... < βt = ∞.
How to choose the cooling schedule? Minimize its length while satisfying V[Xi]/E[Xi]^2 = O(1), where E[Xi] = Z(βi)/Z(βi-1).

Express the SCV using the partition function (going from β to α): for W ∼ μβ and X = exp(H(W)(β - α)),
  E[X] = Z(α)/Z(β),
  E[X^2]/E[X]^2 = V[X]/E[X]^2 + 1 = Z(2α-β) Z(β) / Z(α)^2,
and we want this to be ≤ C. In terms of f(γ) = ln Z(γ) (with β < α < 2α-β on the axis), the condition reads
  ( f(2α-β) + f(β) )/2 ≤ f(α) + (ln C)/2,
i.e., the chord of f over [β, 2α-β] lies at most C' = (ln C)/2 above f at the midpoint α.

Properties of partition functions (f(γ) = ln Z(γ)):
  f is decreasing, f is convex, f'(0) ≥ -n, f(0) ≤ ln A.
Indeed, f(β) = ln ∑_{k=0}^{n} a_k e^{-βk} and
  f'(β) = - ( ∑_{k=0}^{n} a_k k e^{-βk} ) / ( ∑_{k=0}^{n} a_k e^{-βk} ).

GOAL: prove the Lemma (for every partition function there exists a cooling schedule of length O*((ln A)^{1/2})).
Proof idea: across any step where the Chebyshev condition is tight, either f or f' changes a lot.
Let K := Δf, the change of f over the step. Then Δ(ln |f'|) ≥ 1/K.
Sketch: let c := (a+b)/2 and Δ := b-a, and suppose f(c) = (f(a)+f(b))/2 - 1.
Since f is convex, |f'(a)| ≥ 2(f(a) - f(c))/Δ and |f'(b)| ≤ 2(f(c) - f(b))/Δ, hence
  |f'(b)| / |f'(a)| ≤ 1 - 1/Δf ≤ e^{-1/Δf}.
Since f decreases by at most ln A in total while ln |f'| varies over a range of size O*(1), the Cauchy-Schwarz inequality bounds the number of steps by ( ∑ K_i · ∑ 1/K_i )^{1/2} = O*((ln A)^{1/2}). (A convex decreasing f on [a,b] can be "approximated" using segments determined by f'(a), f(a) - f(b), and f'(b).)

Technicality: getting to 2α-β. The Chebyshev condition involves the point 2α-β, which lies beyond the next schedule point; handling this (inserting intermediate points βi+1, βi+2, βi+3, ...) costs about ln ln A extra steps.

Existential → algorithmic: such a schedule EXISTS, and we can also FIND one adaptively.

Algorithmic construction (our main result): using a sampler oracle for μβ(x) = exp(-H(x)β)/Z(β), we can construct a cooling schedule of length
  ≤ 38 (ln A)^{1/2} (ln ln A)(ln n),
with total number of oracle calls
  ≤ 10^7 (ln A) (ln ln A + ln n)^7 ln(1/δ).

Algorithmic construction: at the current inverse temperature β, ideally move to α such that
  B1 ≤ E[X^2]/E[X]^2 ≤ B2,   where  E[X] = Z(α)/Z(β).
The upper bound B2 guarantees that X is easy to estimate; the lower bound B1 > 1 guarantees that we make progress. Since
  E[X^2]/E[X]^2 = Z(β) Z(2α-β) / Z(α)^2,
we need to construct a "feeler" for this quantity, i.e., a rough estimator of ratios of the partition function at nearby temperatures.
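To see what the adaptive step selection aims for, the following sketch (mine, not the authors') computes the quantity Z(β)Z(2α-β)/Z(α)^2 exactly for the toy a_k from the earlier example and greedily picks the largest admissible α; it is an idealized stand-in, since the actual algorithm only has sample access and must use the "feeler" described next.

```python
import math

A_K = {0: 7, 1: 4, 2: 4, 4: 1}          # toy a_k again; Z(0) = 16, Z(inf) = 7

def Z(beta):
    return sum(a * math.exp(-beta * k) for k, a in A_K.items())

def scv_plus_one(beta, alpha):
    """E[X^2]/E[X]^2 for X = exp(H(W)(beta - alpha)), W ~ mu_beta."""
    return Z(beta) * Z(2 * alpha - beta) / Z(alpha) ** 2

def next_point(beta, C=1.2, step=1e-3, beta_max=30.0):
    """Largest alpha (on a grid) with Z(beta) Z(2*alpha-beta) / Z(alpha)^2 <= C.
    Idealized: the real algorithm must do this with samples only."""
    alpha = beta
    while alpha + step <= beta_max and scv_plus_one(beta, alpha + step) <= C:
        alpha += step
    return alpha

if __name__ == "__main__":
    beta, schedule = 0.0, [0.0]
    while Z(beta) > Z(30.0) * 1.001:     # beta = 30 is effectively "infinity" here
        beta = next_point(beta)
        schedule.append(beta)
    print("adaptive schedule:", [round(b, 3) for b in schedule])
```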
Estimator for Z(β)/Z(α).
Since Z(β) = ∑_{k=0}^{n} a_k e^{-βk}, for W ∼ μβ we have P(H(W) = k) = a_k e^{-βk}/Z(β), and for U ∼ μα we have P(H(U) = k) = a_k e^{-αk}/Z(α).
If the value H = k is likely at both α and β, this gives an estimator:
  P(H(U) = k) / P(H(W) = k) = e^{k(α-β)} · Z(β)/Z(α).
PROBLEM: P(H(W) = k) can be too small.

Rough estimator for Z(β)/Z(α): use an interval [c,d] of values of H instead of a single value k.
For W ∼ μβ:  P(H(W) ∈ [c,d]) = ( ∑_{k=c}^{d} a_k e^{-βk} ) / Z(β);
for U ∼ μα:  P(H(U) ∈ [c,d]) = ( ∑_{k=c}^{d} a_k e^{-αk} ) / Z(α).
Writing
  P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d]) = e^{-c(α-β)} · ( ∑_{k=c}^{d} a_k e^{-α(k-c)} ) / ( ∑_{k=c}^{d} a_k e^{-β(k-c)} ) · Z(β)/Z(α),
we get: if |α-β|·|d-c| ≤ 1 then
  (1/e) · Z(β)/Z(α)  ≤  e^{c(α-β)} · P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d])  ≤  e · Z(β)/Z(α).
We also need both P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large.

We will split {0,1,...,n} into h ≤ 4 (ln n)(ln A) intervals
  [0], [1], [2], ..., [c, c(1 + 1/ln A)], ...
so that for any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We then say that I is HEAVY for β.

Algorithm:
repeat
  find an interval I which is heavy for the current inverse temperature β;
  see how far I stays heavy (until some β*);
  use the interval I for the feeler.
ANALYSIS: in each iteration we either
  * make progress, or
  * eliminate the interval I, or
  * make a "long move".

Pictures on the slides: the distribution of H(X) for X ∼ μβ, with I a heavy interval at β; at a larger γ the interval I may no longer be heavy, while at an intermediate γ' it still is. Let β* be the largest inverse temperature at which I is still heavy; we locate β* to within 1/(2n) by binary search (for I = [a,b], the feeler built from I supports comparisons over moves of length up to min{ 1/(b-a), ln A }).

How do you know that you can use binary search?
Lemma: the set of inverse temperatures for which I is h-heavy is an interval.
I is h-heavy at β (i.e., P(H(X) ∈ I) ≥ 1/(8h) for X ∼ μβ) iff
  ∑_{k∈I} a_k e^{-βk} ≥ (1/(8h)) ∑_{k=0}^{n} a_k e^{-βk}.
Substituting x = e^{-β}, this is a polynomial inequality c0 x^0 + c1 x^1 + ... + cn x^n ≥ 0 whose coefficients are negative for k below I, positive for k in I, and negative for k above I, so the coefficient sequence has at most two sign changes. By Descartes' rule of signs, the number of positive roots is at most the number of sign changes, hence at most two, and therefore the set of β where the inequality holds is an interval.
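A sketch of the binary search for β*, again on the toy a_k; p_interval computes the heaviness probability exactly, where the real algorithm would estimate it from sampler-oracle calls, and the parameters h and n are made up for the example.

```python
import math

A_K = {0: 7, 1: 4, 2: 4, 4: 1}                      # toy a_k again

def p_interval(interval, beta):
    """P(H(W) in interval) for W ~ mu_beta, computed exactly from the a_k."""
    lo, hi = interval
    z = sum(a * math.exp(-beta * k) for k, a in A_K.items())
    return sum(a * math.exp(-beta * k) for k, a in A_K.items() if lo <= k <= hi) / z

def find_beta_star(interval, beta, h, n, beta_max=30.0):
    """Binary search for beta*, the largest temperature at which `interval` is
    still heavy (P >= 1/(8h)); valid because the set of heavy temperatures is
    an interval (the Descartes' rule of signs argument above)."""
    threshold = 1.0 / (8 * h)
    if p_interval(interval, beta_max) >= threshold:
        return beta_max
    lo, hi = beta, beta_max                          # heavy at lo, not heavy at hi
    while hi - lo > 1.0 / (2 * n):
        mid = (lo + hi) / 2
        if p_interval(interval, mid) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    n, h = max(A_K), 4                               # small made-up parameters
    print(find_beta_star(interval=(1, 2), beta=0.0, h=h, n=n))
```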
For I = [a,b] heavy throughout [β, β*], we can roughly compute the ratio Z(α)/Z(α') for any α, α' ∈ [β, β*] with |α - α'|·|b - a| ≤ 1. Each iteration therefore ends in one of three ways:
  1. success: we find the next schedule point, i.e., the largest α such that Z(β) Z(2α-β) / Z(α)^2 ≤ C;
  2. we eliminate the interval I (it is never used again);
  3. we make a "long move" (β advances a lot).

Summary: if we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*( (ln A)^{1/2} ), giving total running times:
  independent sets: O*(n^2)  (using Vigoda '01, Dyer-Greenhill '01)
  matchings: O*(n^2 m)  (using Jerrum, Sinclair '89)
  spin systems, Ising model: O*(n^2) for β < βc  (using Marinelli, Olivieri '95)
  k-colorings: O*(n^2) for k > 2Δ  (using Jerrum '95)

6. More...
  a) proof of Dyer-Frieze
  b) independent sets revisited
  c) warm starts

a) Appendix: proof of Dyer-Frieze.
Recall: 1) E[X1 X2 ... Xt] = "WANTED"; 2) the Xi are easy to estimate, V[Xi]/E[Xi]^2 = O(1).
Theorem (Dyer-Frieze '91): O(t^2/ε^2) samples (O(t/ε^2) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

How precise do the Xi have to be?
First attempt, term by term: (1 ± ε/t)(1 ± ε/t)···(1 ± ε/t) ≈ 1 ± ε, and with n = Θ( (V[X]/E[X]^2)(1/ε^2) ln(1/δ) ) this needs Ω(t^2) samples per term, so Ω(t^3) in total.

Analyzing the SCV is better (Dyer-Frieze '91). Let X = X1 X2 ... Xt; the GOAL is SCV(X) ≤ ε^2/4, since
  P( X gives a (1±ε)-estimate ) ≥ 1 - (V[X]/E[X]^2)·(1/ε^2).
Main idea: SCV(Xi) ≤ (ε^2/4)/t ⇒ SCV(X) ≤ (1 + ε^2/(4t))^t - 1 ≈ ε^2/4.
Proof: for independent X1, X2 we have E[X1 X2] = E[X1] E[X2] and, since X1^2, X2^2 are also independent, E[X1^2 X2^2] = E[X1^2] E[X2^2]; hence SCV(X1 X2) = (1 + SCV(X1))(1 + SCV(X2)) - 1, and by induction
  SCV(X) = (1 + SCV(X1)) ··· (1 + SCV(Xt)) - 1.
So O(t/ε^2) samples per term suffice, i.e., O(t^2/ε^2) in total.

b) Independent sets revisited.
There are many possible choices of the Hamiltonian. For independent sets one can use the hard-core lattice gas model, e.g., H(S) = the number of edges with both endpoints in S (the levels in the picture are 2, 1, 0).
What would be a natural Hamiltonian for planar graphs? Take H(G) = the number of edges.
Natural Markov chain: pick u, v uniformly at random; with probability 1/(1+λ) try to move to G - {u,v}, and with probability λ/(1+λ) try to move to G + {u,v}. The transition probabilities are
  P(G, G') = (1/(1+λ)) / (n(n-1)/2)   for deleting the edge {u,v}, and
  P(G', G) = (λ/(1+λ)) / (n(n-1)/2)   for adding it back.
The distribution π(G) ∝ λ^{number of edges} satisfies the detailed balance condition π(G) P(G,G') = π(G') P(G',G), where λ = exp(-β).
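A compact simulation sketch of this chain on all subgraphs of the complete graph on n vertices (ignoring any planarity constraint), checking that the edge-count distribution matches π(G) ∝ λ^{#edges}; the parameter values are arbitrary.

```python
import math, random

def edge_chain_step(edges, n, lam):
    """One step of the natural chain: pick a potential edge {u,v} uniformly at
    random; with probability 1/(1+lam) try G - {u,v}, with probability
    lam/(1+lam) try G + {u,v}.  Stationary distribution: pi(G) ~ lam^{#edges}
    (detailed balance, as on the slide)."""
    u, v = random.sample(range(n), 2)
    e = frozenset((u, v))
    if random.random() < 1.0 / (1.0 + lam):
        edges.discard(e)          # try G - {u,v}; a no-op if the edge is absent
    else:
        edges.add(e)              # try G + {u,v}; a no-op if the edge is present
    return edges

if __name__ == "__main__":
    n, lam, steps = 5, 0.5, 200_000
    m_max = n * (n - 1) // 2
    edges, counts = set(), [0] * (m_max + 1)
    for _ in range(steps):
        edges = edge_chain_step(edges, n, lam)
        counts[len(edges)] += 1
    # Under pi, each edge is present independently with probability lam/(1+lam),
    # so the number of edges is Binomial(m_max, lam/(1+lam)).
    p = lam / (1 + lam)
    for m in range(m_max + 1):
        print(m, counts[m] / steps, math.comb(m_max, m) * p**m * (1 - p)**(m_max - m))
```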
Xs Xs Further speed-up Mixing time: τmix = smallest t such that | μt - π |TV ≤ 1/e Relaxation time: τrel = 1/(1-λ2) |μt - π |TV ≤ exp(-t/τrel) Varπ(μ0/π) (∑ π(x)(μ0(x)/π(x)-1)2)1/2 small ⇒ called warm start METHOD 2 (Gillman’98, Kahale’96, ...) X1 X2 X3 ... Xs Further speed-up Mixing at time: sample β can be used as a τmix = smallest t such that warm start for β’ | μt - π |TV ≤ 1/e ⇔ Relaxation time: can step cooling schedule = β1/(1-λ2) relto from τβ’ |μt - π |TV ≤ exp(-t/τrel) Varπ(μ0/π) (∑ π(x)(μ0(x)/π(x)-1)2)1/2 small ⇒ called warm start METHOD 2 (Gillman’98, Kahale’96, ...) X1 X2 X3 ... Xs sample at β can be used as a warm start for β’ ⇔ cooling schedule can step from β’ to β m=O( (ln n)(ln A) ) β0 β1 β2 β3 βm .... = “well mixed” states β0 β1 β2 β3 βm .... = “well mixed” states run the our cooling-schedule algorithm with METHOD 2 using “well mixed” states as starting points METHOD 2 X1 Xs X2 X3 ... Xs k=O*( (ln A)1/2 ) Output of our algorithm: β0 β0 β1 βk small augmentation (so that we can use sample from current β as a warm start at next) still O*( (ln A)1/2 ) β3 β2 β1 βm .... Use analogue of Frieze-Dyer for independent samples from vector variables with slightly dependent coordinates. if we have sampler oracles for μβ then we can get adaptive schedule of length t=O* ( (ln A)1/2 ) independent sets O*(n2) (using Vigoda’01, Dyer-Greenhill’01) matchings O*(n2m) (using Jerrum, Sinclair’89) spin systems: Ising model O*(n2) for β<βC (using Marinelli, Olivieri’95) k-colorings O*(n2) for k>2Δ (using Jerrum’95)