The analysis of BP guided decimation algorithm

Federico Ricci-Tersenghi, Physics Department, Sapienza University, Roma
FRT, G. Semerjian, JSTAT (2009) P09001
A. Montanari, FRT and G. Semerjian, Proc. Allerton (2007) 352

Motivations
• Solving algorithms are of primary relevance in combinatorial optimization
  -> they provide lower bounds
  -> their behavior is related to problem hardness
• An analytical description of the dynamics of solving algorithms is difficult
• Can we link it to properties of the solution space?
• Is there a threshold unbeatable by any algorithm? (a kind of first-principles limitation...)

Models and notation
• Random k-XORSAT (k=3)
• Random k-SAT (k=4)
• Notation: N variables, M clauses, clause-to-variable ratio α = M/N

Phase transitions in random CSP: structure of the solution space
[Figure, from "Alternative Solutions to Diluted p-Spin Models and XORSAT Problems" (fig. 11, p. 525): total entropy S(γ) and complexity, i.e. configurational entropy, Σ(γ) for p = 3, plotted for 0.75 ≤ γ ≤ 1. At the clustering transition γ_d ≈ 0.818 the e^{NS} zero-energy (E = 0) solutions spontaneously form clusters, each cluster containing an exponential number of solutions; the complexity Σ vanishes at γ_c = γ_s ≈ 0.918.]
By definition, two solutions at vanishing intensive Hamming distance (i.e., d/N → 0 for N → ∞) are in the same cluster.

Standard picture [Mézard, Parisi, Zecchina]
• energy = number of unsat clauses
• as α grows: easy phase (running times O(N)), then hard-SAT phase, then UNSAT

More phase transitions in random k-SAT (k ≥ 4)
[Krzakala, Montanari, Ricci-Tersenghi, GS, Zdeborová]
[slide credit: Guilhem Semerjian (LPT-ENS Paris), Orsay, 20.12.07]
• α_d,+ < α_d < α_c < α_s (SAT/UNSAT)
• clustering at α_d, then "condensation" at α_c
• an exponential number of relevant clusters exists only in [α_d, α_c]
• BP marginals are correct up to α_c

Performance of algorithms for random 4-SAT
• thresholds: α_d = 9.38, α_c = 9.55, α_s = 9.93
• rigorously solved algorithms: SC up to 2.25, SCB/GUC up to 5.5; BPdec up to 9.05 (analytic but non-rigorous)
• algorithms with no analytic solution: zChaff, MCMC?, FMS, SPdec

Two broad classes of solving algorithms
• Local search: (biased) random walks in the space of configurations
  E.g. Monte Carlo, WalkSAT, FMS, ChainSAT, ...
• Sequential construction: at each step a variable is assigned,
e.g. UCP, GUCP, BP/SP guided decimation.
Sequential algorithms differ in:
• the order of assignment of the variables
• the information used to assign the variables

The oracle guided algorithm (a thought experiment)
• Start with all variables unassigned
• while (there are unassigned variables):
  • choose (randomly) an unassigned variable i
  • ask the oracle for the marginal µ_i(· | τ_{D(t)}) of this variable
  • assign i according to its marginal
• Samples solutions uniformly :-)
• The oracle's job is #P-complete in general :-(

Ensemble of θ-decimated CSP
1. Draw a CSP formula with parameter α
2. Draw a uniform solution τ of this CSP
3. Choose a set D_θ by retaining each variable independently with probability θ
4. Consider the residual formula on the variables outside D_θ, obtained by imposing the allowed configurations to coincide with τ on D_θ

This is not an ensemble of uniformly random formulae conditioned on their degree distributions: step 2 depends on step 1, and the generation of the configuration τ induces non-trivial correlations in the structure of the final formula. We shall see in the following how to adapt the statistical mechanics techniques to compute the typical properties of such generalized ensembles, and in particular to determine the phase transition thresholds in the (α, θ) plane. One characterization of these random ensembles is the quenched average residual entropy:

ω(θ) = lim_{N→∞} (1/N) E_F E_τ E_D [ln Z(τ_D)],   Z(τ_D) = Σ_σ ∏_{a=1}^{M} ψ_a(σ_∂a) I(σ_D = τ_D),

i.e., Z(τ_D) is the number of solutions compatible with the values "exposed" on D_θ, and the three expectations correspond to the three steps of the definition above.
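For a tiny instance the oracle of the thought experiment above can be implemented literally, by exhaustive enumeration; the exponential cost of that enumeration is exactly the #P-completeness bottleneck just mentioned. A toy sketch (our own code and encoding, not from the talk: a clause is a list of signed integers, literal v > 0 meaning variable v takes the value True):

```python
import itertools
import random

def exact_marginal(clauses, n_vars, fixed, i):
    """Oracle: P(x_i = True) under the uniform measure over the solutions
    compatible with the partial assignment `fixed` (a dict var -> bool).
    The enumeration costs 2^(number of free variables): #P-hard in general."""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    count = {True: 0, False: 0}
    for values in itertools.product([True, False], repeat=len(free)):
        a = dict(fixed)
        a.update(zip(free, values))
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            count[a[i]] += 1
    tot = count[True] + count[False]
    return count[True] / tot if tot else None   # None: no compatible solution

def oracle_decimation(clauses, n_vars, seed=0):
    """The oracle guided algorithm: assign variables in random order,
    each drawn from its exact conditional marginal."""
    rng = random.Random(seed)
    order = list(range(1, n_vars + 1))
    rng.shuffle(order)
    fixed = {}
    for i in order:
        p = exact_marginal(clauses, n_vars, fixed, i)
        if p is None:
            return None          # formula was unsatisfiable to begin with
        fixed[i] = rng.random() < p
    return fixed                 # a uniform sample from the solution set
```

Because each variable is drawn from its exact conditional marginal, the output is a uniform sample of the solution set and no contradiction can ever arise on a satisfiable formula, which is precisely what the approximate BP version below can no longer guarantee.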
This quantity is similar to, yet distinct from, the Franz–Parisi quenched potential [26]. The definition of the latter also involves a 'thermalized' reference configuration τ, but is given by the quenched free energy of the measure on the configurations at a given Hamming distance from τ. In other words, the two real replicas σ and τ are coupled uniformly across all variables in the Franz–Parisi quenched potential, whereas in the definition of ω they are coupled infinitely strongly on D_θ, where they are forced to coincide, and not at all outside D_θ. The computations presented in the rest of this paper can, however, be easily adapted to obtain the usual quenched potential.

• Fraction of frozen variables:

φ(θ) = lim_{N→∞} (1/N) E_F E_τ E_{D_θ} |W_θ|,   W_θ = D_θ ∪ {variables implied by D_θ}

The BP estimate of the marginal of variable i is

µ_i(σ_i) = (1/z_i) ∏_{a∈∂i} ν_{a→i}(σ_i),

with z_i again fixed by normalization. General graphs do contain loops, in which case equations (6), (8), (9) are only approximations, at the basis of the so-called belief propagation algorithm discussed in more detail below.

The Bethe approximation can be easily adapted to the case where the configuration is forced to the value τ_D on a subset D of the sites, that is, to the conditional measure µ(·|τ_D). The estimation of the conditioned log-partition function follows from (6):

ln Z(τ_D) = − Σ_{i∉D} Σ_{a∈∂i} ln [ Σ_{σ_i} ν^{τ_D}_{a→i}(σ_i) η^{τ_D}_{i→a}(σ_i) ]
            + Σ_a ln [ Σ_{σ_∂a} ψ_a(σ_∂a) ∏_{i∈∂a} η^{τ_D}_{i→a}(σ_i) ]
            + Σ_{i∉D} ln [ Σ_{σ_i} ∏_{a∈∂i} ν^{τ_D}_{a→i}(σ_i) ]

where the messages {η^{τ_D}_{i→a}, ν^{τ_D}_{a→i}} depend on the imposed partial configuration τ_D; they indeed obey the same standard BP equations (8), complemented with the boundary condition

η^{τ_D}_{i→a}(σ_i) = δ_{σ_i,τ_i}   when i ∈ D.

One can characterize the reduced measure µ(·|τ_D) more precisely by computing other quantities besides ω(θ): the existence of clusters in this measure will be tested by the long-range point-to-set correlations and by the complexity of the typical states.
2.4. Practical approximate implementation of the thought experiment

The ideal sampling algorithm described in section 2.2 cannot be practically implemented, because the computation of the marginals of the probability law µ(σ|τ_D) has a cost exponential in the number of variables. One can, however, mimic this ideal procedure using a faster yet approximate estimation of the marginals of µ(σ|τ_D), by means of the belief propagation algorithm.
This modification of the ideal sampler, which we call BP guided decimation in the following, thus corresponds to (for a given CSP instance):

(1) choose a random order of the variables, i(1), ..., i(N), and call D_t = {i(1), ..., i(t)} (so that D_0 = ∅);
(2) for t = 1, ..., N:
(a) find a fixed point of the BP equations (7) with the boundary condition η_{i→a}(σ_i) = δ_{σ_i,τ_i} when i ∈ D_{t−1};
(b) draw σ_{i(t)} according to the BP estimation of µ(σ_i|τ_{D_{t−1}}) given in (9);
(c) set τ_{i(t)} = σ_{i(t)}.

The belief propagation part of the algorithm corresponds to step (2)(a). It amounts to searching, in an iterative manner, for a stationary point of the Bethe approximation for the log-partition function. The unknowns of the Bethe expression, η_{i→a} (resp. ν_{a→i}), are considered as messages passed from a variable to a neighboring clause (resp. from a clause to a variable). In a random sequential order a message, say η_{i→a}, is updated by recomputing its value from the current messages sent by its neighbors.

When BP guided decimation is expected to work
• At least one solution must exist (α < α_s)
• No contradictions should be generated: check for contradictions at each time step by adding a step (0) in which UCP/WP is run
• It cannot go beyond the condensation transition, as the BP marginals are no longer correct there (α < α_c)
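The whole procedure, steps (1)-(2) plus the contradiction check, can be sketched in code. The following is a minimal, illustrative implementation for small SAT instances (our own naming and encoding, not the optimized code behind the numerics of this talk: a clause is a list of signed integers, literal v > 0 meaning variable v True). The boundary condition η_{i→a}(σ_i) = δ_{σ_i,τ_i} on decimated variables appears explicitly, and a vanishing normalization plays the role of the contradiction check of step (0):

```python
import random

def bp_guided_decimation(clauses, n_vars, sweeps=200, tol=1e-9, seed=0):
    """Illustrative BP guided decimation for small SAT instances.
    Each variable is assumed to appear at most once per clause."""
    rng = random.Random(seed)
    edges = [(a, abs(l)) for a, c in enumerate(clauses) for l in c]
    eta = {e: [0.5, 0.5] for e in edges}   # variable -> clause messages over {0,1}
    nu = {e: [0.5, 0.5] for e in edges}    # clause -> variable messages
    tau = {}                               # decimated variables: var -> 0 or 1

    def lit_sat(l, s):                     # literal l satisfied when |l| takes value s
        return (s == 1) == (l > 0)

    def run_bp():                          # step (a): iterate to a BP fixed point
        for _ in range(sweeps):
            for (a, i) in edges:           # variable -> clause updates
                if i in tau:               # boundary condition on decimated vars
                    eta[(a, i)] = [1 - tau[i], tau[i]]
                    continue
                msg = [1.0, 1.0]
                for (b, j) in edges:
                    if j == i and b != a:
                        msg = [msg[0] * nu[(b, i)][0], msg[1] * nu[(b, i)][1]]
                z = msg[0] + msg[1]
                if z == 0.0:
                    return False           # contradiction detected
                eta[(a, i)] = [msg[0] / z, msg[1] / z]
            diff = 0.0
            for (a, i) in edges:           # clause -> variable updates
                li = next(l for l in clauses[a] if abs(l) == i)
                others = [l for l in clauses[a] if abs(l) != i]
                msg = [0.0, 0.0]
                for mask in range(2 ** len(others)):
                    s = {abs(l): (mask >> j) & 1 for j, l in enumerate(others)}
                    w = 1.0
                    for l in others:
                        w *= eta[(a, abs(l))][s[abs(l)]]
                    sat_by_others = any(lit_sat(l, s[abs(l)]) for l in others)
                    for si in (0, 1):
                        if sat_by_others or lit_sat(li, si):
                            msg[si] += w
                z = msg[0] + msg[1]
                if z == 0.0:
                    return False
                msg = [msg[0] / z, msg[1] / z]
                diff = max(diff, abs(msg[0] - nu[(a, i)][0]))
                nu[(a, i)] = msg
            if diff < tol:
                return True
        return True

    order = list(range(1, n_vars + 1))
    rng.shuffle(order)
    for i in order:
        if not run_bp():
            return None                    # a contradiction was generated
        m = [1.0, 1.0]                     # step (b): BP marginal of variable i
        for (a, j) in edges:
            if j == i:
                m = [m[0] * nu[(a, i)][0], m[1] * nu[(a, i)][1]]
        if m[0] + m[1] == 0.0:
            return None
        tau[i] = 1 if rng.random() < m[1] / (m[0] + m[1]) else 0   # step (c)
    return tau
```

On a tree-like (loop-free) formula the BP marginals are exact, so this sketch reduces to the oracle guided algorithm; on loopy instances the marginals are approximate and the run may end in a contradiction, which is the failure mode analyzed in the rest of the talk.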
Results for random 3-XORSAT
• Full analytic solution (by differential equations); the fraction of frozen variables obeys the fixed-point equation
  φ = θ + (1 − θ)(1 − e^{−kαφ^{k−1}})
• Phase transition for α > α_a = (1/k)((k−1)/(k−2))^{k−2}, i.e. α_a(k=3) = 2/3: jump in φ(θ) and cusp in ω(θ), like UCP
[Figures: φ(θ) and ω(θ) for random 3-XORSAT, below and above α_a.]

Phase diagram for random 3-XORSAT
[Figure: (α, θ) plane for 0.6 ≤ α ≤ 0.95; region A: no clusters; region B: clusters, Σ > 0; region C: clusters, Σ = 0; the region boundaries involve α_d and α_c, a line starting from α_a marks where φ(θ) jumps, and the formula is UNSAT at larger α.]

Numerics for random k-SAT
• k = 4; N = 1e3, 3e3, 1e4, 3e4; α_d = 9.38, α_c = 9.55, α_s = 9.93
• Run WP: integer variables, no approximation
• Run BP:
  • much care in dealing with quasi-frozen variables
  • slow convergence (damping and restarting trick)
  • maximum number of iterations set to 1000, much larger than the diameter of the factor graph

Results for random 4-SAT
[FIG. 6: φ(θ) for random 4-SAT; left: α = 7.0, right: α = 8.4.]
[Figure: location of the jump in φ(θ) in the (α, θ) plane for 8 ≤ α ≤ 10, together with α_d, α_c, α_s. An algorithmic phase transition??]

Probability of finding a solution
[Figure: probability that BP guided decimation finds a solution versus α, for the sizes above, compared with the line where φ(θ) jumps. An algorithmic phase transition??]
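The "damping and restarting trick" mentioned in the numerics slide admits a one-line generic sketch: rather than overwriting a message with its freshly recomputed value, one mixes the two and re-normalizes (illustrative code with our naming; the default damping value is a tuning choice, not a number from the talk):

```python
def damped_update(old_msg, new_msg, damping=0.5):
    """Mix the freshly computed BP message with the previous one and
    re-normalize. damping = 0 reproduces plain BP; values closer to 1
    slow the dynamics but often stabilize oscillating messages."""
    mixed = [(1 - damping) * n + damping * o for n, o in zip(new_msg, old_msg)]
    z = sum(mixed)
    return [m / z for m in mixed]
```

Restarting then simply means reinitializing all messages (e.g. to uniform, or to random values) and iterating again when convergence is not reached within the allowed number of sweeps.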
Results for random 4-SAT
[Figures at α = 8.4 and α = 8.8: mean BP running times (up to the cap of 1000) and fraction of BP runs exceeding 1000 iterations, as functions of θ.]
[Figure: line in the (α, θ) plane where the maximum of the BP running time is located, close to the jump in φ(θ), together with α_d, α_c, α_s.]

Large k limit
• α_d ≃ (2^k/k) ln k,   α_c ≃ α_s ≃ 2^k ln 2
• Previously analyzed algorithms (Table 1: algorithms for random k-SAT):

  Algorithm                    Density m/n < ···                              Success prob.  Ref.
  Pure Literal ("PL")          o(1) as k → ∞                                  w.h.p.         [19]
  Walksat, rigorous            (1/6)·2^k/k^2                                  w.h.p.         [9]
  Walksat, non-rigorous        2^k/k                                          w.h.p.         [22]
  Unit Clause ("UC")           (1/2)((k−1)/(k−2))^{k−2}·2^k/k                 Ω(1)           [7]
  Shortest Clause ("SC")       (1/8)((k−1)/(k−3))^{k−3}((k−1)/(k−2))·2^k/k    w.h.p.         [8]
  SC + backtracking ("SCB")    ∼ 1.817·2^k/k                                  w.h.p.         [15]
  BP + decimation ("BPdec")    e·2^k/k (non-rigorous)                         w.h.p.         [21]

• Our prediction for BP guided decimation: α_a ≃ (e/k)·2^k
• The algorithm Fix by A. Coja-Oghlan works up to (2^k/k) ln k

1.3 Related work. Quite a few papers deal with efficient algorithms for random k-SAT, contributing either rigorous results, non-rigorous evidence based on physics arguments, or experimental evidence. Table 1 summarizes the part of this work that is most relevant to us.
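The densities in Table 1 are leading-order large-k scalings, not finite-k thresholds (for k = 4 the BPdec prediction quoted elsewhere in this talk is α_a ≈ 9.05, while e·2^k/k evaluates to ≈ 10.9). A quick numerical comparison of the scalings (function names are ours):

```python
import math

def uc_scaling(k):     # Unit Clause: (1/2)((k-1)/(k-2))^(k-2) * 2^k/k -> (e/2) 2^k/k
    return 0.5 * ((k - 1) / (k - 2)) ** (k - 2) * 2 ** k / k

def scb_scaling(k):    # SC + backtracking: ~1.817 * 2^k/k
    return 1.817 * 2 ** k / k

def bpdec_scaling(k):  # BP guided decimation prediction: e * 2^k/k
    return math.e * 2 ** k / k

def fix_scaling(k):    # Coja-Oghlan's Fix: (2^k/k) * ln k
    return 2 ** k / k * math.log(k)

for k in (4, 10, 20, 40):
    print(k, [round(f(k), 1) for f in (uc_scaling, scb_scaling,
                                       bpdec_scaling, fix_scaling)])
```

In these units Fix overtakes the BPdec prediction only when ln k > e, i.e. for k ≥ 16, which is consistent with the statement that Fix improves on BP guided decimation in the large-k limit while the finite-k picture can differ.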
The best rigorous result (prior to this work) is the SCB/GUC bound ∼ 1.817·2^k/k.

Large k limit (pros and cons)
• Allows for rigorous proofs :-)
  • the phase transition in the decimation process has been proved rigorously by A. Coja-Oghlan and A. Pachon-Pinzon
• May lead to assertions that are not always true :-( (especially for small k values)
  • e.g. "clustering threshold = rigidity threshold" holds only at large k

Performance of algorithms for random 4-SAT
• thresholds: α_d = 9.38, α_c = 9.55, α_s = 9.93
• rigorously solved algorithms: SC up to 2.25, SCB/GUC up to 5.5; BPdec up to 9.05
• algorithms with no analytic solution: zChaff, MCMC?, FMS, SPdec

In summary...
• We have solved the oracle guided decimation algorithm -> ensemble of decimated CSPs
• BP guided decimation follows this solution closely
  • Conjecture: in the large N limit, for α < α_a, BP guided decimation = oracle guided decimation
  • To do: bound the error on the BP marginals
• We improve previous algorithmic thresholds α_a:
  • from 5.56 (GUC) to 9.05 for k = 4
  • from 9.77 (GUC) to 16.8 for k = 5

A conjecture for the ultimate algorithmic threshold
• Hypothesis 1: no polynomial time algorithm can find solutions in a cluster having a finite fraction of frozen variables (a frozen cluster)
• Hypothesis 2: smart polynomial time algorithms can find solutions in unfrozen clusters, even when these clusters are not the majority
[Figure, from Dall'Asta, Ramezanpour, Zecchina, PRE 77 (2008) 031118 ("Entropy landscape and non-Gibbs solutions in ..."): complexity Σ versus internal entropy s for NAE-SAT on (6,121) random regular graphs, comparing the clusters attractive for the BPR algorithm, s(BPR) ∼ 0.021, with the typical, s(m=0) ∼ 0.007, and thermodynamically relevant, s(m=1) ∼ 0.016, ones.]

A conjecture for the ultimate algorithmic threshold
• The smartest polynomial time algorithm can work as long as there exists at least one unfrozen cluster
• Conjecture: no polynomial time algorithm can find solutions when all clusters are frozen
• This is a stronger condition than the rigidity transition
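The frozen clusters invoked in these conjectures can be probed in practice with the standard "whitening" procedure, which is not described in these slides but is a common heuristic proxy: starting from a solution, variables are iteratively replaced by jokers whenever no clause relies on them as its unique satisfier; variables that never whiten are candidates for being frozen in the cluster of that solution. A sketch (our code; a clause is a list of signed integers, literal v > 0 meaning variable v True):

```python
def whiten(clauses, assign):
    """Whitening of a SAT solution `assign` (dict var -> bool).
    Returns the set of variables that resist whitening, i.e. the
    candidates for being frozen in the cluster of this solution."""
    star = set()                 # whitened ('joker') variables
    changed = True
    while changed:
        changed = False
        for v in list(assign):
            if v in star:
                continue
            pinned = False
            for c in clauses:
                if v not in map(abs, c):
                    continue
                # is clause c satisfied by someone else, or by a joker?
                others_ok = any(
                    abs(l) != v and (abs(l) in star or assign[abs(l)] == (l > 0))
                    for l in c)
                sat_by_v = any(abs(l) == v and assign[v] == (l > 0) for l in c)
                if sat_by_v and not others_ok:
                    pinned = True    # v is the unique satisfier of clause c
                    break
            if not pinned:
                star.add(v)          # v can be whitened
                changed = True
    return {v for v in assign if v not in star}
```

A solution whose whitening leaves no variable pinned ("all white") belongs, in this proxy sense, to an unfrozen cluster; in the conjectured picture these are the solutions that smart polynomial time algorithms can still reach.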