
3. Optimization Methods for Molecular Modeling

*by Barak Raveh*

Outline

• **Introduction**
• Local Minimization Methods (derivative-based)
  – Gradient (first order) methods
  – Newton (second order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local-minima

**I. The energy function:**

The *in-silico* energy function should correlate with the (intractable) physical free energy. In particular, they should **share the same global energy minimum**.

*II. The sampling strategy:*

Our sampling strategy should efficiently scan the (enormous) space of protein conformations.

Rough = has a multitude of local minima at a multitude of scales.

\* Adapted from slides by Chen Kaeasar, Ben-Gurion University

The landscape is rough because both small pits and the Sea of Galilee are local minima.


• A protein conformation is defined by the set of Cartesian atom coordinates (x,y,z) or by Internal coordinates (φ/ψ/χ torsion angles ; bond angles ; bond lengths)
• The conformation space of a protein with 100 residues has ≈ 3000 dimensions
• The X-ray structure of a protein is a point in this space.
• A 3000-dimensional space cannot be systematically sampled, visualized or comprehended.


[Figure: energy plotted over the space of conformations. Is the landscape smooth or rugged? Images by Ken Dill]

Outline

• Introduction
• **Local Minimization Methods (derivative-based)**
  – Gradient (first order) methods
  – Newton (second order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local-minima

Example: removing clashes from X-ray models


What kind of minima do we want?

The path to the closest local minimum = local **minimization**


Gradients and Hessians generalize the first and second derivatives (respectively) of multi-variate scalar functions ( = functions from vectors to scalars)

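$$E = f(x_1, y_1, z_1, \ldots, x_n, y_n, z_n)$$

*Gradient:*

$$\nabla_i E = \frac{\partial E}{\partial \mathbf{r}_i} = \left( \frac{\partial E}{\partial x_i},\ \frac{\partial E}{\partial y_i},\ \frac{\partial E}{\partial z_i} \right)$$

*Hessian:*

$$\mathbf{h}_{ij} = \frac{\partial^2 E}{\partial \mathbf{r}_i\, \partial \mathbf{r}_j} =
\begin{pmatrix}
\frac{\partial^2 E}{\partial x_i \partial x_j} & \frac{\partial^2 E}{\partial y_i \partial x_j} & \frac{\partial^2 E}{\partial z_i \partial x_j} \\
\frac{\partial^2 E}{\partial x_i \partial y_j} & \frac{\partial^2 E}{\partial y_i \partial y_j} & \frac{\partial^2 E}{\partial z_i \partial y_j} \\
\frac{\partial^2 E}{\partial x_i \partial z_j} & \frac{\partial^2 E}{\partial y_i \partial z_j} & \frac{\partial^2 E}{\partial z_i \partial z_j}
\end{pmatrix}$$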

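**Analytical Energy Gradient**

(i) Cartesian Coordinates

$$E = f(x_1, y_1, z_1, \ldots, x_n, y_n, z_n)$$

$$\nabla E = \left( \frac{\partial E}{\partial x_1},\ \frac{\partial E}{\partial y_1},\ \frac{\partial E}{\partial z_1},\ \ldots,\ \frac{\partial E}{\partial x_n},\ \frac{\partial E}{\partial y_n},\ \frac{\partial E}{\partial z_n} \right)$$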

**Example:** Van der-Waals energy between pairs of atoms – O(n²) pairs:

$$E_{VdW} = \sum_{i,j} \left( \frac{A}{R_{ij}^{12}} - \frac{B}{R_{ij}^{6}} \right)$$

$$\frac{\partial E_{VdW}}{\partial R_{ij}} = -\frac{12A}{R_{ij}^{13}} + \frac{6B}{R_{ij}^{7}}$$

$$R_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$
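The analytical Van der-Waals gradient can be sketched in a few lines of Python. This is a minimal illustration, not a production force field: the function name `vdw_energy_and_gradient`, the default values A = B = 1, and the choice to count each unordered pair of atoms once are all assumptions made for the example. The Cartesian gradient follows from the chain rule, using ∂R_ij/∂x_i = (x_i − x_j)/R_ij.

```python
import math

def vdw_energy_and_gradient(coords, A=1.0, B=1.0):
    """coords: list of (x, y, z) tuples.
    Returns (energy, gradient), gradient[i] = [dE/dx_i, dE/dy_i, dE/dz_i].
    A and B are illustrative constants (assumed, not from a real force field)."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once: O(n^2) pairs
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            energy += A / r**12 - B / r**6
            dE_dr = -12.0 * A / r**13 + 6.0 * B / r**7
            for k in range(3):
                g = dE_dr * d[k] / r  # chain rule: dR_ij/dx_i = (x_i - x_j)/R_ij
                grad[i][k] += g
                grad[j][k] -= g       # equal and opposite contribution on atom j
    return energy, grad
```

For two atoms at the distance R = (2A/B)^(1/6), the gradient vanishes, which is the minimum of the pair term.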

**Energy, work and force:** recall that *energy ( = work) is defined as force integrated over distance*

The negative of the energy gradient in Cartesian coordinates = vector of *forces* that act upon atoms

(but this is not exactly so for statistical energy functions, which aim at the free energy ΔG)

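**Analytical Energy Gradient**

(ii) Internal Coordinates (torsions, etc.)

**Note:** For simplicity, bond lengths and bond angles are often ignored

$$E = f(\phi_1, \psi_1, \omega_1, \chi_{11}, \chi_{12}, \ldots)$$

$$\nabla E = \left( \frac{\partial E}{\partial \phi_1},\ \frac{\partial E}{\partial \psi_1},\ \frac{\partial E}{\partial \omega_1},\ \frac{\partial E}{\partial \chi_{11}},\ \frac{\partial E}{\partial \chi_{12}},\ \ldots \right)$$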

*Enrichment:* Transforming a gradient between Cartesian and Internal coordinates

(see *Abe, Braun, Noguti and Gō, 1984*; *Wedemeyer and Baker, 2003*)

Consider an infinitesimal rotation of a vector **r** around a unit vector **n**. From physical mechanics, it can be shown that:

$$\frac{\partial \mathbf{r}}{\partial \theta} = \mathbf{n} \times \mathbf{r}$$

where θ is the rotation angle.

[Figure: cross product, right hand rule. Adapted from image by Sunil Singh, http://cnx.org/content/m14014/1.9/]

Using the *fold-tree* (previous lesson), we can recursively propagate changes in internal coordinates to the whole structure (see *Wedemeyer and Baker, 2003*)
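The rotation rule can be checked numerically. The sketch below rotates a vector r around a unit axis n using Rodrigues' rotation formula and compares a finite-difference derivative at a tiny angle with the cross product n × r. The helper names `cross` and `rotate`, and the name `theta` for the rotation angle, are assumptions made for illustration.

```python
import math

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate(r, n, theta):
    """Rotate r by angle theta around the unit vector n (Rodrigues' formula)."""
    c, s = math.cos(theta), math.sin(theta)
    n_dot_r = sum(ni * ri for ni, ri in zip(n, r))
    n_x_r = cross(n, r)
    return tuple(r[k] * c + n_x_r[k] * s + n[k] * n_dot_r * (1.0 - c)
                 for k in range(3))

# Finite-difference derivative of the rotated vector at theta = 0
n, r, dtheta = (0.0, 0.0, 1.0), (1.0, 2.0, 3.0), 1e-6
rotated = rotate(r, n, dtheta)
derivative = tuple((rotated[k] - r[k]) / dtheta for k in range(3))
```

Here `derivative` should agree with `cross(n, r)` up to an error of the order of `dtheta`.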

**Gradient Calculations – Cartesian vs. Internal Coordinates**

For some terms, gradient computation is simpler and more natural with Cartesian coordinates, but harder for others:

• **Distance / Cartesian dependent:** Van der-Waals term ; Electrostatics ; Solvation
• **Internal-coordinates dependent:** Bond length and angle ; Ramachandran and Dunbrack terms (in Rosetta)
• **Combination:** Hydrogen-bonds (in some force-fields)

**Reminder:** Internal coordinates provide a natural distinction between soft constraints (flexibility of φ/ψ torsion angles) and hard constraints with steep gradient (fixed length of covalent bonds).

*Energy landscape of Cartesian coordinates is more rugged.*

• Analytical solutions require a closed-form algebraic formulation of the energy score
• Numerical solutions try to approximate the gradient (or Hessian)
  – Simple example: **f’(x) ≈ [f(x+h) – f(x)] / h** for a small step h (with h = 1 this is f(x+1) – f(x))
  – Another example: the Secant method (soon)
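A numerical gradient can be sketched as follows. Note that this example swaps in the central difference [f(x+h) − f(x−h)] / 2h rather than the one-sided difference, because it is usually more accurate for the same h. The function name `numerical_gradient` and the default h are assumptions for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at point x (a list of floats)
    by central differences, one coordinate at a time."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad
```

Each gradient costs 2n energy evaluations for n coordinates, which is why analytical gradients are preferred when a closed form exists.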

Outline

• Introduction
• **Local Minimization Methods (derivative-based)**
  – **Gradient (first order) methods**
  – Newton (second order) methods
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local-minima

*Sliding down an energy gradient*

[Figure: an energy landscape with a good ( = global) minimum and a local minimum. Image by Ken Dill]

Gradient Descent – System Description

1. Coordinates vector (Cartesian or Internal coordinates): $X = (x_1, x_2, \ldots, x_n)$
2. Differentiable energy function: $E(X)$
3. Gradient vector: $\nabla E(X) = \left( \frac{\partial E}{\partial x_1},\ \frac{\partial E}{\partial x_2},\ \ldots,\ \frac{\partial E}{\partial x_n} \right)$


• **Parameters:** λ = step size ; ε = convergence threshold
• While |∇E(x)| > ε:
  – Compute ∇E(x)
  – x_new = x − λ∇E(x)
• *Line search:* find the best step size λ in order to minimize E(x_new) **(discussion later)**

**Note on convergence condition:** in local minima, the gradient must be zero (but not always the other way around)
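The loop above can be sketched in Python with a fixed step size (no line search). The function name `gradient_descent`, the default parameter values and the `max_iter` safety cap are assumptions for illustration.

```python
import math

def gradient_descent(grad, x0, step=0.1, eps=1e-6, max_iter=10000):
    """Fixed-step gradient descent.
    grad: function returning the gradient (list) at a point (list).
    Stops when the gradient norm drops to eps or after max_iter steps."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(c * c for c in g)) <= eps:
            break  # convergence: the gradient is (almost) zero
        x = [xi - step * gi for xi, gi in zip(x, g)]  # move against the gradient
    return x

# Example: E(x, y) = (x - 1)^2 + 2 (y + 2)^2, with its minimum at (1, -2)
minimum = gradient_descent(lambda v: [2.0 * (v[0] - 1.0), 4.0 * (v[1] + 2.0)],
                           [0.0, 0.0])
```

With a fixed step the method converges only if the step is small relative to the curvature of the energy; a line search removes the need to guess it.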

**Finding the λ that minimizes E[x − λ∇E(x)]:**

(1) This is also an optimization problem, but in one dimension…
(2) Inexact solutions are probably sufficient

• **Interval bracketing** (e.g., golden section, parabolic interpolation, Brent’s search):
  – Bracketing the local minimum by intervals of decreasing length
  – Always finds a local minimum
• **Backtracking** (e.g., with Armijo / Wolfe conditions):
  – Multiply step-size λ by *c<1*, until some condition is met
  – Variations: λ can also increase
• **1-D Newton and Secant methods:** We will talk about this soon…
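Backtracking can be sketched as below, using the Armijo (sufficient decrease) condition along the steepest-descent direction. The function name `backtracking_step`, the parameter names and their default values are assumptions for illustration; the Wolfe curvature condition is not checked here.

```python
def backtracking_step(f, grad, x, step0=1.0, c=0.5, armijo=1e-4):
    """One steepest-descent step with a backtracking line search.
    Shrinks the step by the factor c < 1 until the Armijo condition
    f(x_new) <= f(x) - armijo * step * |grad|^2 holds.
    Returns (x_new, step)."""
    g = grad(x)
    g_sq = sum(gi * gi for gi in g)
    fx = f(x)
    step = step0
    while step > 1e-12:
        x_new = [xi - step * gi for xi, gi in zip(x, g)]
        if f(x_new) <= fx - armijo * step * g_sq:
            return x_new, step
        step *= c
    return list(x), 0.0  # no acceptable step found

# Example: f(x) = x^2 starting at x = 1. The full step overshoots to x = -1
# and is rejected; the halved step lands on the minimum.
x_new, used_step = backtracking_step(lambda v: v[0] ** 2,
                                     lambda v: [2.0 * v[0]], [1.0])
```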

*2-D Rosenbrock’s Function:* a Banana Shaped Valley

Pathologically Slow Convergence for Gradient Descent

[Figure panels: gradient descent path after 0, 10, 100 and 1000 iterations]

*The (very common) problem: a narrow, winding “valley” in the energy landscape*

*The narrow valley results in minuscule, zigzag steps*

• Use a (smart) linear combination of gradients from previous iterations to prevent zigzag motion
• **Parameters:** λ = step size ; ε = convergence threshold
• x_0 = random starting point
• *Λ_0* = −∇E(x_0)
• While |*Λ_i*| > ε:
  – x_{i+1} = x_i + λ ∙ *Λ_i*
  – *Λ_{i+1}* = −∇E(x_{i+1}) + β_i ∙ *Λ_i* (the *choice of* β_i is important)
• *Line search:* adjust step size λ to minimize E(x_{i+1})

[Figure: conjugate gradient descent vs. gradient descent]

• The new search direction is “A-orthogonal” to all previous search directions, for exact line search
• Works best when the surface is approximately quadratic near the minimum (convergence in N iterations); otherwise we need to reset the search every N steps (N = dimension of space)
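For an exactly quadratic energy E(x) = ½ xᵀAx − bᵀx the conjugate gradient scheme can be written with an exact line search in closed form. The sketch below assumes a symmetric positive-definite matrix A and uses the Fletcher-Reeves formula for β, which is one common choice among several. The names `matvec`, `dot` and `conjugate_gradient_quadratic` are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product for A given as a list of rows."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient_quadratic(A, b, x0, eps=1e-10):
    """Minimize E(x) = 1/2 x^T A x - b^T x (A symmetric positive-definite).
    Returns (x, number_of_iterations)."""
    x = list(x0)
    neg_grad = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # -grad E = b - A x
    direction = list(neg_grad)
    iterations = 0
    while dot(neg_grad, neg_grad) > eps ** 2 and iterations < 10 * len(b):
        A_dir = matvec(A, direction)
        step = dot(neg_grad, neg_grad) / dot(direction, A_dir)  # exact line search
        x = [xi + step * di for xi, di in zip(x, direction)]
        new_neg_grad = [ri - step * adi for ri, adi in zip(neg_grad, A_dir)]
        beta = dot(new_neg_grad, new_neg_grad) / dot(neg_grad, neg_grad)  # Fletcher-Reeves
        direction = [ri + beta * di for ri, di in zip(new_neg_grad, direction)]
        neg_grad = new_neg_grad
        iterations += 1
    return x, iterations

# 2-D example: the minimum solves A x = b, i.e. x = (1/11, 7/11)
solution, n_iter = conjugate_gradient_quadratic([[4.0, 1.0], [1.0, 3.0]],
                                                [1.0, 2.0], [2.0, 1.0])
```

On this 2-D quadratic the search is expected to finish in about N = 2 iterations, in line with the convergence statement above.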

Outline

• Introduction
• **Local Minimization Methods (derivative-based)**
  – Gradient (first order) methods
  – **Newton (second order) methods**
• Monte-Carlo Sampling (MC)
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – Escaping local-minima

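Taylor’s Series

First order approximation:

$$f(x) \approx f(a) + \frac{f'(a)}{1!}(x-a)$$

Second order approximation:

$$f(x) \approx f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2$$

The full series:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n$$

Example *(a=0)*:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$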
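As a small worked illustration, the truncated Taylor series of e^x around a = 0 can be evaluated directly, which shows how each extra order tightens the approximation. The function name `taylor_exp` is an assumption for this sketch.

```python
import math

def taylor_exp(x, order):
    """Taylor approximation of e^x around a = 0, truncated at the given order:
    sum of x^n / n! for n = 0..order."""
    return sum(x ** n / math.factorial(n) for n in range(order + 1))

# Approximations of e^1 with increasing order
first_order = taylor_exp(1.0, 1)    # 1 + x
second_order = taylor_exp(1.0, 2)   # 1 + x + x^2/2
tenth_order = taylor_exp(1.0, 10)
```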

Taylor’s Approximation: f(x) = e^x [figure]

[Figure sequence: f(x) = sin(x)^(2x) together with its 1st-, 2nd- and 3rd-order Taylor approximations at x = 1.5]

First order approximation:

$$f(x) \approx f(a) + \frac{f'(a)}{1!}(x-a)$$

Root finding by Taylor’s approximation:

$$0 \approx f(a) + \frac{f'(a)}{1!}(x-a) \quad\Rightarrow\quad x \approx a - \frac{f(a)}{f'(a)}$$

1. Start from a random x_0
2. While not converged, update x with Taylor’s series:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

**THEOREM:** Let x_root be a “nice” root of f(x). There exists a “neighborhood” of some size Δ around x_root, in which Newton method will converge towards x_root quadratically **( = the error decreases quadratically in each round)**

*Image from http://www.codecogs.com/d-ox/maths/rootfinding/newton.php*
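The Newton-Raphson root-finding update x ← x − f(x)/f'(x) can be sketched as follows. The function name `newton_raphson`, the tolerance and the iteration cap are assumptions for illustration; no safeguard against f'(x) = 0 is included.

```python
def newton_raphson(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Find a root of f by repeating x <- x - f(x) / f'(x)
    until the update becomes smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: the positive root of f(x) = x^2 - 2 is sqrt(2)
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Starting close enough to the root, the number of correct digits roughly doubles in each round, which is the quadratic convergence of the theorem.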

• Just like Newton-Raphson, but approximate the derivative by drawing a secant line between two previous points:

$$f'(x) \approx \frac{f(x_1) - f(x_0)}{x_1 - x_0}$$

**Secant algorithm:**

1. Start from two random points: x_0, x_1
2. While not converged:

$$x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$$

• **Theoretical convergence rate:** golden-ratio (~1.62)
• **Often faster in practice:** no gradient calculations
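A minimal sketch of the secant method, which needs only function values and no derivative. The function name `secant`, the tolerance and the iteration cap are assumptions for illustration.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Find a root of f from two starting points, replacing f'(x)
    in Newton-Raphson by the slope of the secant line."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break  # flat secant line: no further progress possible
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, f(x2)  # one new function call per round
        if abs(x1 - x0) < tol:
            break
    return x1

# Example: the positive root of f(x) = x^2 - 2
root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```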

Second order approximation of f(x):

$$f(x) \approx f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2$$

Take the derivative (by x). The minimum is reached when the derivative of the approximation is zero:

$$0 = f'(a) + f''(a)(x-a) \quad\Rightarrow\quad x = a - \frac{f'(a)}{f''(a)}$$

• **So… this is just root finding over the derivative** (which makes sense since in local minima, the gradient is zero)

1. Start from a random point x = x_0
2. While not converged, update x with Taylor’s series:

$$x_{new} = x - \frac{f'(x)}{f''(x)}$$

**Notes:**

• If f’(x) = 0 and f’’(x) > 0, then x is surely a local minimum point
• We can choose a different step size than one
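In one dimension, Newton minimization is Newton-Raphson applied to the derivative. The sketch below uses f(x) = e^x − 2x, whose minimum is at x = ln 2. The function name `newton_minimize_1d` and its defaults are assumptions for illustration.

```python
import math

def newton_minimize_1d(f_prime, f_second, x0, tol=1e-10, max_iter=50):
    """Find a stationary point of f by repeating x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_second(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = e^x - 2x  ->  f'(x) = e^x - 2,  f''(x) = e^x > 0 everywhere
x_min = newton_minimize_1d(lambda x: math.exp(x) - 2.0,
                           lambda x: math.exp(x), 0.0)
```

Because f'' is positive here, the stationary point found is a minimum; in general the sign of f'' should be checked at the result.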

1. Start from a random vector x = x_0
2. While not converged, update x with Taylor’s series:

$$x_{new} = x - H^{-1}(x)\,\nabla f(x)$$

**Notes:**

• H is the Hessian matrix (generalization of second derivative to high dimensions)
• We can choose a different step size using Line Search (see previous slides)
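A single multi-dimensional Newton step, x_new = x − H⁻¹∇f(x), can be sketched for the 2-D case, where the 2×2 linear system is solved in closed form. The function name `newton_step_2d` is hypothetical, and a real implementation would use a linear solver for larger systems instead of an explicit inverse.

```python
def newton_step_2d(grad, hess, x):
    """One Newton step in two dimensions.
    grad(x) returns [g0, g1]; hess(x) returns [[a, b], [c, d]]."""
    g = grad(x)
    (a, b), (c, d) = hess(x)
    det = a * d - b * c
    # Solve H * delta = g using the closed-form inverse of a 2x2 matrix
    dx = (d * g[0] - b * g[1]) / det
    dy = (-c * g[0] + a * g[1]) / det
    return [x[0] - dx, x[1] - dy]

# Quadratic example: f(x, y) = (x - 1)^2 + 2 (y + 2)^2 + x y
grad = lambda v: [2.0 * (v[0] - 1.0) + v[1], 4.0 * (v[1] + 2.0) + v[0]]
hess = lambda v: [[2.0, 1.0], [1.0, 4.0]]
x_new = newton_step_2d(grad, hess, [0.0, 0.0])
```

Since the example function is exactly quadratic, one step from any starting point lands on the minimum (16/7, −18/7).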

Quasi-Newton methods approximate the Hessian (2nd derivative) instead of computing it exactly:

– DFP (Davidon – Fletcher – Powell)
– BFGS (Broyden – Fletcher – Goldfarb – Shanno)
– Combinations

**Timeline:**

• Newton-Raphson (17th century)
• Secant method
• Broyden Method for roots (1965)
• DFP (1959, 1963)
• BFGS (1970)

• Conjugate Gradient Descent: http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
• Quasi-Newton Methods: http://www.srl.gatech.edu/education/ME6103/Quasi-Newton.ppt
• HUJI course on non-linear optimization by Benjamin Yakir: http://pluto.huji.ac.il/~msby/opt-files/optimization.html
• Line search:
  – http://pluto.huji.ac.il/~msby/opt-files/opt04.pdf
  – http://www.physiol.ox.ac.uk/Computing/Online_Documentation/Matlab/toolbox/nnet/backpr59.html
• Wikipedia…

Outline

• Introduction
• **Local Minimization Methods (derivative-based)**
  – Gradient (first order) methods
  – Newton (second order) methods
• **Monte-Carlo Sampling (MC)**
  – **Introduction to MC methods**
  – Markov-chain MC methods (MCMC)
  – Escaping local-minima

Harder Goal: Move from an Arbitrary Model to a Correct One

Example: predict protein structure from its AA sequence.

[Animation frames: arbitrary starting point, then iterations 10, 100, 200, 400, 800, 1000, 1200, 1400, 1600, 1800, 2000, 4000 and 7000]

This time the search succeeded; in many cases it does not.

What kind of paths do we want?

The path to the global minimum


*(a.k.a. MC simulations, MC sampling or MC search)*

– Samples can be *dependent* or *independent*
– MC physical simulations are most famous for their role in the *Manhattan Project*

(The uncle of Polish mathematician *Stanisław Marcin Ulam* was said to be a heavy gambler)

Suppose we throw darts randomly (and uniformly) at the square:

**Algorithm:**

For i = [1..ntrials]
  x = (random# in [0..r])
  y = (random# in [0..r])
  distance = sqrt(x^2 + y^2)
  if distance ≤ r
    hits++
End

Output: 4 × hits / ntrials (an estimate of π)

http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html

*Adapted from course slides by Craig Douglas*
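The dart-throwing algorithm translates directly to Python. This sketch takes r = 1 and uses a seeded generator so that runs are repeatable; the function name `estimate_pi` and the `seed` parameter are assumptions for illustration.

```python
import random

def estimate_pi(ntrials, seed=0):
    """Estimate pi by throwing random darts at the unit square and counting
    those that land inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:  # same test as distance <= r, without the sqrt
            hits += 1
    return 4.0 * hits / ntrials   # hits/ntrials approximates the area ratio pi/4

pi_estimate = estimate_pi(200000)
```

The statistical error shrinks only as 1/sqrt(ntrials), so each additional digit of accuracy costs about 100 times more samples.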

Outline

• Introduction
• **Local Minimization Methods (derivative-based)**
  – Gradient (first order) methods
  – Newton (second order) methods
• **Monte-Carlo Sampling (MC)**
  – Introduction to MC methods
  – **Markov-chain MC methods (MCMC)**
  – Escaping local-minima

http://www.chem.uoa.gr/applets/AppletSailor/Appl_Sailor2.html

[Figure: the sailor’s random walk; every move is labeled with probability 0.25]

**What is the probability that the sailor will leave through each exit?**

• **Markov-Chain:** future state depends only on present state
• **Markov-Chain Monte-Carlo on Graphs:** we randomly walk from node to node with a certain probability that depends only on our current location.

[Figure: two-node graph with transition probabilities 0.5, 0.75, 0.5 and 0.25]

Analysis of a Two-Nodes Walk

[Figure: nodes A and B with transition probabilities A→A 0.75, A→B 0.25, B→A 0.5, B→B 0.5]

**After n rounds, what is the probability of being in node A?**

Assume $Pr_{n+1}(A) \approx Pr_n(A)$ for a large *n*:

$$Pr_{n+1}(A) = Pr_n(A) \times 0.75 + Pr_n(B) \times 0.5$$

$$0.25 \times Pr_n(A) = Pr_n(B) \times 0.5$$

$$Pr_n(A) = 2 \times Pr_n(B)$$

**So:** $Pr_\infty(A) = 2/3$ and $Pr_\infty(B) = 1/3$
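The stationary probabilities can also be found by simply iterating the update rule, which is a quick way to check the algebra. The function name `two_node_walk` is an assumption for this sketch; the transition probabilities are those of the two-node example.

```python
def two_node_walk(n_rounds, p_a=1.0):
    """Probability of being in node A after n_rounds, starting with
    probability p_a of being in A. Transitions: A->A 0.75, B->A 0.5."""
    for _ in range(n_rounds):
        p_b = 1.0 - p_a
        p_a = 0.75 * p_a + 0.5 * p_b
    return p_a

after_one_round = two_node_walk(1)           # starting in A
from_a = two_node_walk(50, p_a=1.0)
from_b = two_node_walk(50, p_a=0.0)
```

Both starting points converge to the same value, illustrating that the long-run distribution does not depend on where the walk began.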

Markov-Chain Monte-Carlo (MCMC) with “proposals”:

1. Perturb Structure to create a “proposal”
2. Accept or reject new conformation with a “certain” probability

Protein image taken from Chemical Biology, 2006

After a long run, we want to find low-energy conformations with high probability.

A (physically) natural\* choice is the **Boltzmann distribution**, proportional to:

$$\frac{e^{-E_i / k_B T}}{Z}$$

• $E_i$ = energy of state i
• $k_B$ = Boltzmann constant
• **T** = temperature
• Z = “Partition Function” constant

But how?

\* In theory, the Boltzmann distribution is a bit problematic in non-gas phase, but never mind that for now…

"**Equation of State Calculations by Fast Computing Machines**" – Metropolis, N. et al. *Journal of Chemical Physics* (**1953**)

Boltzmann Distribution:

$$\frac{e^{-E_i / k_B T}}{Z}$$

• The energy score and temperature are computed (quite) easily
• The “only” problem is calculating Z (the “partition function”) – this requires summing over all states.
• Metropolis showed that MCMC will converge **to the true Boltzmann distribution**, if we accept a new proposal with probability

$$\min\left( e^{-\Delta Energy / k_B T},\ 1 \right)$$

Markov-Chain Monte-Carlo (MCMC) with “proposals”:

1. Perturb Structure to create a “proposal”
2. Accept or reject new conformation by the Metropolis criterion
3. Repeat for many iterations

If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution

Protein image taken from Chemical Biology, 2006
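The perturb / accept / repeat loop can be sketched on a toy system with a handful of discrete states, where the exact Boltzmann distribution is easy to compute for comparison. Everything here is an illustrative assumption: the function names `metropolis_sample` and `boltzmann`, the uniform (symmetric) proposal, and the parameter `kT` standing for k_B·T.

```python
import math
import random

def metropolis_sample(energies, kT, n_steps, seed=0):
    """Metropolis MCMC over discrete states; returns visit frequencies."""
    rng = random.Random(seed)
    state = 0
    counts = [0] * len(energies)
    for _ in range(n_steps):
        proposal = rng.randrange(len(energies))  # symmetric proposal
        delta = energies[proposal] - energies[state]
        # Metropolis criterion: accept with probability min(exp(-delta/kT), 1)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

def boltzmann(energies, kT):
    """Exact Boltzmann probabilities; feasible only because the toy system
    has few states, so the partition function Z can be summed directly."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

sampled = metropolis_sample([0.0, 1.0, 2.0], kT=1.0, n_steps=200000)
exact = boltzmann([0.0, 1.0, 2.0], kT=1.0)
```

Note that the sampler never computes Z: only energy differences between the current state and the proposal are needed.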

But we just want to find the energy minimum.

If we do our perturbations in a smart manner, we can still cover relevant (realistic, low-energy) parts of the search space

Outline

• Introduction
• **Local Minimization Methods (derivative-based)**
  – Gradient (first order) methods
  – Newton (second order) methods
• **Monte-Carlo Sampling (MC)**
  – Introduction to MC methods
  – Markov-chain MC methods (MCMC)
  – **Escaping local-minima**

Getting stuck in a local minimum


**Trick 1:** Simulated Annealing

The Boltzmann distribution depends on the *in-silico* temperature **T**, through the acceptance probability:

$$\min\left( e^{-\Delta Energy / k_B T},\ 1 \right)$$

• In low temperatures, we will get stuck in local minima (the acceptance probability drops to practically zero if the energy rises even slightly)
• In high temperatures, the acceptance probability is practically always 1 (we jump between conformations like nuts).

**In simulated annealing, we gradually decrease (“cool down”) the virtual temperature factor, until we converge to a minimum point**
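A minimal simulated annealing sketch on a 1-D double-well energy, starting inside the shallower (local) well. All names and settings are assumptions for illustration: `double_well`, `simulated_annealing`, the geometric cooling schedule, the Gaussian perturbation, and the variable `t` standing for k_B·T. Keeping track of the best conformation seen is a common practical addition, not part of the basic scheme.

```python
import math
import random

def double_well(x):
    """Toy energy: global minimum near x = -1, local minimum near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def simulated_annealing(energy, x0, t_start=1.0, t_end=1e-3,
                        n_steps=20000, sigma=0.5, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)  # geometric cooling factor
    t = t_start
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, sigma)           # random perturbation
        e_new = energy(x_new)
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = x_new, e_new                     # Metropolis acceptance
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                                # cool down gradually
    return best_x, best_e

best_x, best_e = simulated_annealing(double_well, x0=1.0)
```

Early on, the high temperature lets the walk cross the barrier between the wells; as the temperature drops, uphill moves are rejected and the walk settles into a minimum.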

**Trick 2:** Monte-Carlo with Energy Minimization (MCM)

Scheraga *et al.*, 1987

• Derivative-based methods (Gradient Descent, Newton’s method, DFP) are excellent at finding near-by local minima
• In Rosetta, Monte-Carlo is used for *bigger jumps* between *near-by local minima*
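The idea of Monte-Carlo with minimization can be sketched on a toy 1-D energy: each random perturbation is followed by a local minimization, and the Metropolis criterion is applied to the minimized energies. This is an illustrative sketch under stated assumptions, not Rosetta's implementation; the names `double_well`, `local_minimize` and `mc_with_minimization`, the step-halving descent and all parameter values are hypothetical.

```python
import math
import random

def double_well(x):
    """Toy energy: global minimum near x = -1, local minimum near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def double_well_grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def local_minimize(energy, grad, x, step=0.1, n_iter=200):
    """Gradient descent that halves the step whenever the energy would rise."""
    e = energy(x)
    for _ in range(n_iter):
        x_try = x - step * grad(x)
        e_try = energy(x_try)
        if e_try < e:
            x, e = x_try, e_try
        else:
            step *= 0.5
    return x, e

def mc_with_minimization(energy, grad, x0, kT=0.5, n_steps=200,
                         sigma=1.0, seed=0):
    rng = random.Random(seed)
    x, e = local_minimize(energy, grad, x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Big random jump, then slide down to the nearest local minimum
        x_new, e_new = local_minimize(energy, grad, x + rng.gauss(0.0, sigma))
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = mc_with_minimization(double_well, double_well_grad, x0=1.0)
```

The Monte-Carlo walk effectively moves on a simplified landscape made only of local minima, so barriers between neighboring basins matter much less.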

**Trick 3:** Switching between Low-Resolution (smooth) and High-Resolution (rugged) energy functions

• In Rosetta, the **Centroid energy function** is used to quickly sample large perturbations
• The **Full-Atom energy function** is used for fine tuning

[Figure: energy vs. conformations, from **START** through the **smooth low-res** landscape to the **rugged high-res** landscape]

**Trick 4:** Repulsive Energy Ramping

• The repulsive VdW energy is the main reason for getting stuck
• Start simulations with a lowered repulsive energy term, and gradually ramp it up during the simulation
• Similar rationale to Simulated Annealing

**Trick 5:** Modulating Perturbation Step Size

• Too small a perturbation size can lead to a very slow simulation → we remain stuck in the local minimum
• Too large a perturbation size can lead to clashes and a very high rejection rate → we remain stuck in the same local minimum
• We can increase or decrease the step size until a fixed rejection rate (for example, 50%) is achieved
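Step size modulation can be sketched as a simple feedback loop: run a batch of Metropolis steps, measure the acceptance rate, then enlarge the perturbation if too many moves were accepted and shrink it otherwise. The function name `tune_step_size`, the factor 1.1, the batch sizes and the harmonic toy energy are all assumptions for illustration.

```python
import math
import random

def tune_step_size(energy, x0=0.0, kT=1.0, target_accept=0.5,
                   n_batches=200, batch_size=50, seed=0):
    """Adapt the perturbation size sigma toward a target acceptance rate."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    sigma = 1.0
    for _batch in range(n_batches):
        accepted = 0
        for _step in range(batch_size):
            x_new = x + rng.gauss(0.0, sigma)
            e_new = energy(x_new)
            delta = e_new - e
            if delta <= 0 or rng.random() < math.exp(-delta / kT):
                x, e = x_new, e_new
                accepted += 1
        if accepted / batch_size > target_accept:
            sigma *= 1.1   # too many acceptances: moves are too timid
        else:
            sigma /= 1.1   # too many rejections: moves are too bold
    return sigma

# Harmonic toy energy E(x) = x^2 / 2
tuned_sigma = tune_step_size(lambda x: 0.5 * x * x)
```

Strictly speaking, changing the step size during sampling disturbs the Markov chain, so when the exact distribution matters the tuning is usually confined to an initial warm-up phase.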

Monte-Carlo in Rosetta

• In Rosetta, it is common to use any of the above tricks, MCM in particular
• In general, a single simulation is pretty short (no more than a few minutes), but is repeated *k* independent times – getting *k* sampled “decoys”
  – We use energy scoring to decide which is the best decoy structure – hopefully this is the near-native solution
  – Low-resolution sampling is often used to create a very large number of initial decoys, and only the best ones are moved to high-resolution minimization

Summary

• **Derivative-based methods** can effectively reach *near-by energy minima*
• **Metropolis-Hastings MCMC** can recover the *Boltzmann distribution* in some applications, but for protein folding, we cannot hope to cover the huge conformational space, or recover the Boltzmann distribution.
• Still, useful tricks help us find good *low-energy near-native* conformations (**Simulated Annealing**, **Monte-Carlo with Minimization**, **Centroid mode**, **Ramping**, **Step size modulation**, and other smart sampling steps, etc.).
• We didn’t cover some very popular non-linear optimization methods:
  – **Linear and Convex Programming ; Expectation Maximization algorithm ; Branch and Bound algorithms ; Dead-End Elimination** *(Lesson 4)* **; Mean Field approach ; And more…**