
Nonlinear Programming Introduction: Lecture 1

An Introduction to Nonlinear Programming
Lecture 1
Universiteti Politeknik i Tiranes
Université de Technologie de Compiègne
October 9, 2024
Outline

1. Introduction
   - What is Nonlinear Programming?
   - Some recalls
2. Unconstrained Optimization
   - Optimality Conditions
Introduction
What is Nonlinear Programming?
Nonlinear Programming (NLP) involves optimizing an objective
function where the function or the constraints are nonlinear.
General Form:

    min_{x ∈ Rⁿ} f(x)
    subject to  gᵢ(x) ≤ 0,  i = 1, …, m
                hⱼ(x) = 0,  j = 1, …, p
Applications in engineering, economics, finance, and machine learning.
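As a rough illustration (not part of the original slides), a problem of this general form can be handed to a numerical solver such as SciPy's SLSQP method; the objective, constraints, and starting point below are made up for the example.

```python
# Minimal sketch: solving a small NLP of the general form above with SciPy.
import numpy as np
from scipy.optimize import minimize

# Illustrative objective: f(x) = (x1 - 1)^2 + (x2 - 2)^2
def f(x):
    return (x[0] - 1.0)**2 + (x[1] - 2.0)**2

# One inequality constraint g(x) = x1 + x2 - 2 <= 0 and one equality h(x) = x1 - x2 = 0.
# SciPy expects inequalities as fun(x) >= 0, so g(x) <= 0 is passed as -g(x) >= 0.
constraints = [
    {"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 2.0)},
    {"type": "eq",   "fun": lambda x: x[0] - x[1]},
]

x0 = np.zeros(2)                                            # starting point
res = minimize(f, x0, method="SLSQP", constraints=constraints)
print(res.x, res.fun)                                       # approximately x* = (1, 1), f(x*) = 1
```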
Linear vs. Non-Linear Optimization

Objective Function
  LP:  Linear, f(x) = c₁x₁ + c₂x₂ + ⋯ + cₙxₙ
  NLP: Non-linear, f(x) = x₁² + sin(x₂) + …
Constraints
  LP:  Linear, a₁x₁ + a₂x₂ ≤ b
  NLP: Non-linear, g(x) = x₁² + x₂³ ≤ b
Complexity
  LP:  Efficient and solvable in polynomial time
  NLP: Generally harder to solve; may only find local optima
Global Optimum
  LP:  Always guarantees a global optimum if feasible
  NLP: May only find local optima for non-convex problems
Sensitivity
  LP:  Insensitive to initial conditions
  NLP: Sensitive to starting points; can converge to local optima
Scalability
  LP:  Solvable for large-scale problems
  NLP: More computationally expensive, especially for large or non-convex problems
Classification of non-linear problems

- Unconstrained optimization
  - Classical analytical methods (optimality conditions)
  - Numerical methods
    - Deterministic (gradient descent; see the sketch after this list)
    - Stochastic
- Constrained optimization
  - Equality constraints (Lagrange multipliers)
  - Inequality constraints (KKT conditions)
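A minimal sketch of the deterministic gradient-descent idea named above; the quadratic test function, the fixed step size, and the iteration count are illustrative choices, not prescribed by the slides.

```python
# Gradient descent on f(x) = x1^2 + 4*x2^2 (illustrative test function).
import numpy as np

def grad_f(x):
    # gradient of f(x) = x1^2 + 4*x2^2
    return np.array([2.0 * x[0], 8.0 * x[1]])

x = np.array([3.0, -2.0])      # starting point (the method is sensitive to it)
alpha = 0.1                    # fixed step size (assumed, not tuned)
for _ in range(200):
    x = x - alpha * grad_f(x)  # step in the direction of steepest descent

print(x)                       # converges to the minimizer (0, 0)
```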
Properties of matrices
Let A ∈ Mn,n(R) be a square matrix.

Symmetric Matrix
Matrix A is symmetric if and only if A = Aᵀ, i.e.
(Ax, y) = (x, Aᵀy) = (x, Ay) for all x, y ∈ Rⁿ (scalar product).

Positive Semi-Definite Matrix
Matrix A is positive semi-definite (A ≥ 0) if and only if:
(Ax, x) ≥ 0 for all x ∈ Rⁿ (non-negative eigenvalues).

Positive Definite Matrix
Matrix A is positive definite (A > 0) if and only if:
A ≥ 0 and (Ax, x) > 0 for x ≠ 0 (strictly positive eigenvalues).
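In practice these definiteness properties are often checked through the eigenvalues; a small sketch with NumPy (the example matrix is arbitrary):

```python
# Checking definiteness of a symmetric matrix numerically via its eigenvalues.
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])            # symmetric: A == A.T

eigenvalues = np.linalg.eigvalsh(A)    # eigvalsh is intended for symmetric matrices
print(eigenvalues)                     # [1. 3.]

print(np.all(eigenvalues >= 0))        # True -> positive semi-definite
print(np.all(eigenvalues > 0))         # True -> positive definite
```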
Properties of matrices
Eigenvalues and Eigenvectors
Let λ ∈ R. λ is an eigenvalue of A if and only if there exists x ∈ Rⁿ, x ≠ 0, such that Ax = λx.
Such an x is an eigenvector of A, associated with the eigenvalue λ.

Let A ∈ Mn,n(R).
Inverse Matrix A⁻¹
The inverse A⁻¹ exists if and only if:
- A is square
- A is non-singular (its determinant is non-zero): det(A) ≠ 0
Differentiation
A function f(x) is said to be differentiable at a point a if the following limit exists and is finite:

    lim_{h→0} [f(a + h) − f(a)] / h

If the limit does not exist or is infinite, the function is not differentiable at that point.
First-Order Differentiation

    lim_{t→0} [f(x + th) − f(x)] / t = ⟨∇f(x), h⟩ = ∇f(x) · h

This is the directional derivative of f(x) in the direction h.
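A quick numerical sanity check of this identity; the function f(x) = x₁² + sin(x₂), the point, the direction, and the step t are all illustrative.

```python
# Finite-difference approximation of the directional derivative vs. <grad f(x), h>.
import numpy as np

def f(x):
    return x[0]**2 + np.sin(x[1])

def grad_f(x):
    return np.array([2.0 * x[0], np.cos(x[1])])

x = np.array([1.0, 0.5])
h = np.array([0.3, -0.7])
t = 1e-6

finite_difference = (f(x + t * h) - f(x)) / t   # (f(x + th) - f(x)) / t
exact = grad_f(x) @ h                           # <grad f(x), h>
print(finite_difference, exact)                 # both approximately equal
```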
Gradient of a Function
Definition of the Gradient
The gradient of a differentiable scalar function f : Rn → R is a vector of the
partial derivatives of f with respect to its variables.
It is denoted by ∇f and is defined as:

    ∇f(x) = [ ∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ ]ᵀ

where x = (x₁, x₂, …, xₙ)ᵀ ∈ Rⁿ.
Interpretation of the Gradient
The gradient points in the direction of the steepest ascent of the function f.
In optimization, the negative gradient −∇f(x) is often used to find the
direction of steepest descent.
The magnitude of the gradient ∥∇f(x)∥ represents the rate of change of f in
the direction of the gradient.
Hessian Matrix
Definition
The Hessian matrix H(f) of a twice-differentiable function f : Rⁿ → R is the square matrix of second-order partial derivatives:

    H(f)(x) =
    [ ∂²f/∂x₁²      ∂²f/∂x₁∂x₂   ⋯   ∂²f/∂x₁∂xₙ ]
    [ ∂²f/∂x₂∂x₁    ∂²f/∂x₂²     ⋯   ∂²f/∂x₂∂xₙ ]
    [     ⋮              ⋮        ⋱       ⋮      ]
    [ ∂²f/∂xₙ∂x₁    ∂²f/∂xₙ∂x₂   ⋯   ∂²f/∂xₙ²   ]
Properties of the Hessian Matrix
At a critical point x (i.e. ∇f(x) = 0):
- If H(f)(x) is positive definite at x, then f has a local minimum at x.
- If H(f)(x) is negative definite at x, then f has a local maximum at x.
- If H(f)(x) has both positive and negative eigenvalues, then x is a saddle point.
Example: Gradient and Hessian Calculation
Function:

    f(x, y) = 3x² + 2xy + 4y²

Step 1: Gradient of the Function
The gradient of f is the vector of first-order partial derivatives:

    ∇f(x, y) = [ ∂f/∂x, ∂f/∂y ]ᵀ = [ 6x + 2y, 2x + 8y ]ᵀ
Step 2: Hessian Matrix
The Hessian matrix is the square matrix of second-order partial derivatives:

    H(f)(x, y) = [ ∂²f/∂x²    ∂²f/∂x∂y ]   =   [ 6  2 ]
                 [ ∂²f/∂y∂x   ∂²f/∂y²  ]       [ 2  8 ]
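The same gradient and Hessian can be verified symbolically; a short sketch using SymPy (assuming SymPy is available):

```python
# Symbolic verification of the gradient and Hessian of f(x, y) = 3x^2 + 2xy + 4y^2.
import sympy as sp

x, y = sp.symbols("x y")
f = 3*x**2 + 2*x*y + 4*y**2

grad = [sp.diff(f, x), sp.diff(f, y)]   # [6*x + 2*y, 2*x + 8*y]
H = sp.hessian(f, (x, y))               # Matrix([[6, 2], [2, 8]])

print(grad)
print(H)
```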
Convex sets
Definition: Let K ⊆ Rⁿ, K ≠ ∅.
1. K is a convex set if and only if for all x, y ∈ K and all α ∈ [0, 1]:
   (1 − α)x + αy ∈ K (line segment).
2. Equivalently, the whole segment [x, y] = {(1 − α)x + αy : α ∈ [0, 1]} is contained in K.
Convex Functions
A convex function is a function f : K → R that satisfies:
f((1 − α)x + αy) ≤ (1 − α)f(x) + αf(y)
∀x, y ∈ K, ∀α ∈ [0, 1]
Some properties
- f′ is non-decreasing: (f′(y) − f′(x))(y − x) ≥ 0
- f′′ ≥ 0: the Hessian Hf = ∇²f ≥ 0 is positive semi-definite
Strict Convexity
A strictly convex function is a function f : K → R that satisfies:
    f((1 − α)x + αy) < (1 − α)f(x) + αf(y)   for all x ≠ y and all α ∈ (0, 1)
Quadratic Forms
Definition: A quadratic form is a scalar-valued function defined as:

    f(x) = ½ (Ax, x) − (b, x) = ½ xᵀAx − bᵀx

where A ∈ Rⁿ×ⁿ is a symmetric positive definite matrix and b ∈ Rⁿ is a vector.
Properties:
The gradient of f(x) is:
∇f(x) = Ax − b
The Hessian of f(x) is constant:
H(f) = A
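A small numerical sketch of these properties: since ∇f(x) = Ax − b, the gradient vanishes exactly where Ax = b. The matrix A and vector b below are illustrative.

```python
# The unconstrained minimizer of the quadratic form satisfies A x* = b.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])       # symmetric positive definite (illustrative)
b = np.array([1.0, 4.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x   # the quadratic form above

def grad_f(x):
    return A @ x - b                 # its gradient

x_star = np.linalg.solve(A, b)       # the point where the gradient vanishes
print(grad_f(x_star))                # approximately [0. 0.]
print(f(x_star))                     # the minimum value of f
```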
Unconstrained Optimization
Unconstrained Nonlinear Optimization
General Form of an Unconstrained Optimization Problem:

    min_{x ∈ Rⁿ} f(x)
where:
f : Rn → R is a differentiable (and possibly nonlinear) objective
function.
x = (x1 , x2 , . . . , xn ) is the vector of decision variables.
Objective:
The goal is to find the vector x∗ that minimizes the function f(x).
This problem does not involve any constraints, so the solution must
satisfy the first-order necessary conditions for optimality.
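As an illustration of solving such a problem numerically (the Rosenbrock objective, the starting point, and the BFGS method are example choices, not part of the slides):

```python
# Unconstrained minimization with a quasi-Newton method from SciPy.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Rosenbrock function: smooth, nonlinear, with minimizer x* = (1, 1)
    return (1 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

x0 = np.array([-1.2, 1.0])            # starting point (illustrative)
res = minimize(f, x0, method="BFGS")  # gradient-based quasi-Newton method
print(res.x)                          # approximately [1. 1.]
```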
Optimality Conditions
Theorem 1: First-Order Necessary Condition
If the function f(x) has a local extremum at x = x∗ , and if the gradient of
f(x) exists at x = x∗ , then ∇f(x∗ ) = 0.
Theorem 2: Second-Order Sufficient Condition
Suppose ∇f(x∗) = 0 and let H(x∗) = ∇(∇f(x∗)) = ∇²f(x∗). Then f(x∗) is:
a local minimum value of f(x) if H(x∗) is positive definite;
a local maximum value of f(x) if H(x∗) is negative definite.
Proof of Theorem 2
Taylor Expansion: Consider a twice-differentiable function f(x) and expand it around a point x∗:

    f(x) ≈ f(x∗) + ∇f(x∗)ᵀ(x − x∗) + ½ (x − x∗)ᵀ H(f)(x∗) (x − x∗) + o(∥x − x∗∥²)

Where:
- ∇f(x∗) is the gradient at x∗.
- H(f)(x∗) is the Hessian matrix at x∗.

First-Order Condition:

    ∇f(x∗) = 0

At a local minimum, the first-order term vanishes. Thus, the expansion reduces to:

    f(x) ≈ f(x∗) + ½ (x − x∗)ᵀ H(f)(x∗) (x − x∗) + o(∥x − x∗∥²)
Proof of Theorem 2 (cont’d)
Second-Order Condition:
For x∗ to be a local minimum, the quadratic term must be non-negative:

    ½ (x − x∗)ᵀ H(f)(x∗) (x − x∗) ≥ 0

This implies that the Hessian matrix H(f)(x∗) must be positive semi-definite.
If H(f)(x∗ ) is positive definite, then x∗ is a strict local minimum.
If H(f)(x∗ ) is only positive semi-definite, x∗ may still be a local
minimum, but further analysis is required.
Example: Finding critical points of a function
Given Function:
f(x, y) = x3 − 3x + y2
We will:
Calculate the gradient and find the critical points.
Compute the Hessian matrix at each critical point.
Use the Hessian to classify each critical point.
Step 1: Gradient and Critical Points
The gradient of f(x, y) is:

    ∇f(x, y) = [ ∂f/∂x, ∂f/∂y ]ᵀ = [ 3x² − 3, 2y ]ᵀ
Setting the gradient to zero to find the critical points:

    3x² − 3 = 0  and  2y = 0   ⇒   x = ±1  and  y = 0

Critical points: (1, 0) and (−1, 0).
Step 2: Hessian Matrix
The Hessian matrix is the matrix of second-order partial derivatives:
    H(f)(x, y) = [ ∂²f/∂x²    ∂²f/∂x∂y ]   =   [ 6x  0 ]
                 [ ∂²f/∂y∂x   ∂²f/∂y²  ]       [ 0   2 ]
We will now evaluate the Hessian at each critical point.
Step 3: Classifying the Critical Points (1, 0) and (-1, 0)
For the critical point (1, 0):

    H(f)(1, 0) = [ 6  0 ]
                 [ 0  2 ]

Eigenvalues: λ₁ = 6, λ₂ = 2 ⇒ both eigenvalues are positive, so H(f)(1, 0) > 0 (positive definite).
Hence, (1, 0) is a local minimum.
For the critical point (−1, 0):

    H(f)(−1, 0) = [ −6  0 ]
                  [  0  2 ]

Eigenvalues: λ₁ = −6, λ₂ = 2 ⇒ one negative and one positive eigenvalue, so H(f)(−1, 0) is indefinite.
Hence, (−1, 0) is a saddle point (neither a minimum nor a maximum).
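The classification can be confirmed numerically from the eigenvalues of the two Hessians; a short NumPy check:

```python
# Eigenvalues of the two Hessians computed above.
import numpy as np

H_min = np.array([[6.0, 0.0], [0.0, 2.0]])      # Hessian at (1, 0)
H_saddle = np.array([[-6.0, 0.0], [0.0, 2.0]])  # Hessian at (-1, 0)

print(np.linalg.eigvalsh(H_min))      # [2. 6.]  -> all positive: local minimum
print(np.linalg.eigvalsh(H_saddle))   # [-6. 2.] -> mixed signs: saddle point
```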