An Introduction to Nonlinear Programming
Lecture 1
Universiteti Politeknik i Tiranes / Université de Technologie de Compiègne
October 9, 2024

Outline
1 Introduction
  What is Nonlinear Programming?
  Some recalls
2 Unconstrained Optimization
  Optimality Conditions

Introduction

What is Nonlinear Programming?
Nonlinear Programming (NLP) involves optimizing an objective function where the objective function or the constraints are nonlinear.
General form:
  min f(x), x ∈ Rⁿ
  subject to g_i(x) ≤ 0, i = 1, …, m
             h_j(x) = 0, j = 1, …, p
Applications arise in engineering, economics, finance, and machine learning.

Linear vs. Non-Linear Optimization
Objective function. LP: linear, f(x) = c1 x1 + c2 x2 + ··· + cn xn. NLP: non-linear, e.g. f(x) = x1² + sin(x2) + ···
Constraints. LP: linear, e.g. a1 x1 + a2 x2 ≤ b. NLP: non-linear, e.g. g(x) = x1² + x2³ ≤ b.
Complexity. LP: efficient, solvable in polynomial time. NLP: generally harder to solve; may only find local optima.
Global optimum. LP: always guarantees a global optimum if the problem is feasible. NLP: may only find local optima for non-convex problems.
Sensitivity. LP: insensitive to initial conditions. NLP: sensitive to starting points; can converge to different local optima.
Scalability. LP: solvable for large-scale problems. NLP: more computationally expensive, especially for large or non-convex problems.

Classification of non-linear problems
Unconstrained optimization:
  Classical analytical methods (optimality conditions)
  Numerical methods: deterministic (e.g. gradient descent) or stochastic
Constrained optimization:
  Equality constraints (Lagrange multipliers)
  Inequality constraints (KKT conditions)

Properties of matrices
Let A ∈ Mn(R) be a square real matrix.
Symmetric matrix. A is symmetric if and only if A = Aᵀ. For the scalar product, (Ax, y) = (x, Aᵀy) = (x, Ay) for all x, y ∈ Rⁿ.
Positive semi-definite matrix. A is positive semi-definite (A ≥ 0) if and only if (Ax, x) ≥ 0 for all x ∈ Rⁿ (non-negative eigenvalues).
Positive definite matrix. A is positive definite (A > 0) if and only if A ≥ 0 and (Ax, x) > 0 for all x ≠ 0 (strictly positive eigenvalues).
Eigenvalues and eigenvectors. λ ∈ R is an eigenvalue of A if and only if there exists x ∈ Rⁿ, x ≠ 0, such that Ax = λx; x is an eigenvector of A associated with the eigenvalue λ.
Inverse matrix. For A ∈ Mn,n(R), the inverse A⁻¹ exists if and only if A is non-singular (its determinant is non-zero): det(A) ≠ 0.

Differentiation
A function f is differentiable at a point a if the following limit exists and is finite:
  f′(a) = lim_{h→0} [f(a + h) − f(a)] / h
If the limit does not exist or is infinite, the function is not differentiable at that point.
First-order differentiation (directional derivative):
  lim_{t→0} [f(x + th) − f(x)] / t = ⟨∇f(x), h⟩ = ∇f(x) · h
This is the directional derivative of f at x in the direction h.
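The directional-derivative identity can be checked numerically. The following is a minimal Python/NumPy sketch (the function f, the point x, and the direction h are illustrative choices, not part of the lecture): it compares the difference quotient [f(x + th) − f(x)] / t for shrinking t with the inner product ⟨∇f(x), h⟩.

```python
import numpy as np

# Illustrative function f(x) = x1^2 + 3*x1*x2 and its analytic gradient.
def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])    # point of differentiation
h = np.array([0.5, -1.0])   # direction

for t in [1e-1, 1e-3, 1e-5]:
    quotient = (f(x + t * h) - f(x)) / t   # [f(x + t h) - f(x)] / t
    inner = grad_f(x) @ h                  # <grad f(x), h>
    print(f"t = {t:.0e}   quotient = {quotient:.6f}   <grad f(x), h> = {inner:.6f}")
# As t -> 0, the quotient converges to the directional derivative <grad f(x), h> = 1
# for these particular choices of f, x, and h.
```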
Gradient of a Function
Definition. The gradient of a differentiable scalar function f : Rⁿ → R is the vector of partial derivatives of f with respect to its variables. It is denoted ∇f and defined as:
  ∇f(x) = ( ∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn )ᵀ
where x = (x1, x2, …, xn)ᵀ ∈ Rⁿ.
Interpretation of the gradient:
The gradient points in the direction of steepest ascent of the function f.
In optimization, the negative gradient −∇f(x) is often used as the direction of steepest descent.
The magnitude ∥∇f(x)∥ is the rate of change of f in the direction of the gradient.

Hessian Matrix
Definition. The Hessian matrix H(f) of a twice-differentiable function f : Rⁿ → R is the square matrix of second-order partial derivatives:
  H(f)(x) = [ ∂²f/∂x1²     ∂²f/∂x1∂x2   ···  ∂²f/∂x1∂xn ]
            [ ∂²f/∂x2∂x1   ∂²f/∂x2²     ···  ∂²f/∂x2∂xn ]
            [     ⋮             ⋮        ⋱        ⋮     ]
            [ ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   ···  ∂²f/∂xn²   ]
Properties of the Hessian matrix (at a critical point x, i.e. ∇f(x) = 0):
If H(f)(x) is positive definite at x, then f has a local minimum at x.
If H(f)(x) is negative definite at x, then f has a local maximum at x.
If H(f)(x) has both positive and negative eigenvalues, then x is a saddle point.

Example: Gradient and Hessian Calculation
Function: f(x, y) = 3x² + 2xy + 4y²
Step 1: Gradient. The gradient of f is the vector of first-order partial derivatives:
  ∇f(x, y) = ( ∂f/∂x, ∂f/∂y )ᵀ = ( 6x + 2y, 2x + 8y )ᵀ
Step 2: Hessian. The Hessian matrix is the square matrix of second-order partial derivatives:
  H(f)(x, y) = [ ∂²f/∂x²    ∂²f/∂x∂y ] = [ 6  2 ]
               [ ∂²f/∂y∂x   ∂²f/∂y²  ]   [ 2  8 ]

Convex Sets
Definition. Let K ⊆ Rⁿ, K ≠ ∅.
1. K is a convex set if and only if for all x, y ∈ K and all α ∈ [0, 1], we have (1 − α)x + αy ∈ K (line segment).
2. Equivalently, the set of points of the line segment [x, y] = {(1 − α)x + αy : α ∈ [0, 1]} is contained in K.

Convex Functions
A convex function is a function f : K → R (with K convex) that satisfies:
  f((1 − α)x + αy) ≤ (1 − α)f(x) + αf(y), for all x, y ∈ K, α ∈ [0, 1]
Some properties (for differentiable f):
The derivative is monotone (non-decreasing): (f′(y) − f′(x))(y − x) ≥ 0.
f′′ ≥ 0: the Hessian H_f = ∇²f is positive semi-definite.
Strict convexity. A strictly convex function f : K → R satisfies:
  f((1 − α)x + αy) < (1 − α)f(x) + αf(y), for all x ≠ y, α ∈ (0, 1)

Quadratic Forms
Definition. A quadratic form is a scalar-valued function defined as:
  f(x) = ½ (Ax, x) − (b, x) = ½ xᵀAx − bᵀx
where A ∈ R^{n×n} is a symmetric positive definite matrix and b ∈ Rⁿ is a vector.
Properties:
The gradient of f is ∇f(x) = Ax − b.
The Hessian of f is constant: H(f) = A.
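As a sanity check on these quadratic-form properties, here is a minimal NumPy sketch (the vector b is an arbitrary illustrative choice; A is taken to be the Hessian of the earlier example 3x² + 2xy + 4y²). It compares a central-difference gradient of f(x) = ½ xᵀAx − bᵀx with the closed form Ax − b, and confirms positive definiteness through the eigenvalues of A.

```python
import numpy as np

A = np.array([[6.0, 2.0],
              [2.0, 8.0]])          # symmetric; also the Hessian of 3x^2 + 2xy + 4y^2
b = np.array([1.0, -1.0])           # illustrative right-hand-side vector

def f(x):
    return 0.5 * x @ A @ x - b @ x   # f(x) = 1/2 x^T A x - b^T x

def numerical_gradient(func, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (func(x + e) - func(x - e)) / (2 * eps)   # central difference
    return g

x = np.array([0.3, -0.7])
print("analytic gradient Ax - b :", A @ x - b)
print("numerical gradient       :", numerical_gradient(f, x))

eigvals = np.linalg.eigvalsh(A)      # eigenvalues of the (constant) Hessian
print("eigenvalues of A:", eigvals, "-> positive definite:", bool(np.all(eigvals > 0)))
```

Since ∇f(x) = Ax − b and A > 0, the unique minimizer of this quadratic is the solution of Ax = b, which can be obtained with np.linalg.solve(A, b).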
Unconstrained Optimization

Unconstrained Nonlinear Optimization
General form of an unconstrained optimization problem:
  min f(x), x ∈ Rⁿ
where:
f : Rⁿ → R is a differentiable (and possibly nonlinear) objective function;
x = (x1, x2, …, xn) is the vector of decision variables.
Objective: find the vector x* that minimizes the function f(x). Since the problem involves no constraints, a solution must satisfy the first-order necessary conditions for optimality.

Optimality Conditions
Theorem 1 (First-Order Necessary Condition). If the function f(x) has a local extremum at x = x*, and if the gradient of f(x) exists at x = x*, then ∇f(x*) = 0.
Theorem 2 (Second-Order Sufficient Condition). Let H(x*) = ∇(∇f(x*)) = ∇²f(x*) and suppose ∇f(x*) = 0. Then f(x*) is:
a local minimum of f(x) if H(x*) is positive definite;
a local maximum of f(x) if H(x*) is negative definite.

Proof of Theorem 2
Taylor expansion. Consider a twice-differentiable function f(x) and expand it around a point x*:
  f(x) ≈ f(x*) + ∇f(x*)ᵀ(x − x*) + ½ (x − x*)ᵀ H(f)(x*) (x − x*) + o(∥x − x*∥²)
where ∇f(x*) is the gradient at x* and H(f)(x*) is the Hessian matrix at x*.
First-order condition: ∇f(x*) = 0. At a local minimum the first-order term vanishes, so the expansion reduces to:
  f(x) ≈ f(x*) + ½ (x − x*)ᵀ H(f)(x*) (x − x*) + o(∥x − x*∥²)
Second-order condition. For x* to be a local minimum, the quadratic term must be non-negative:
  ½ (x − x*)ᵀ H(f)(x*) (x − x*) ≥ 0
which means the Hessian matrix H(f)(x*) must be positive semi-definite.
If H(f)(x*) is positive definite, then x* is a strict local minimum.
If H(f)(x*) is only positive semi-definite, x* may still be a local minimum, but further analysis is required.

Example: Finding Critical Points of a Function
Given function: f(x, y) = x³ − 3x + y²
We will:
calculate the gradient and find the critical points;
compute the Hessian matrix at each critical point;
use the Hessian to classify each critical point.

Step 1: Gradient and Critical Points
The gradient of f(x, y) is:
  ∇f(x, y) = ( ∂f/∂x, ∂f/∂y )ᵀ = ( 3x² − 3, 2y )ᵀ
Setting the gradient to zero to find critical points: 3x² − 3 = 0 gives x = ±1, and 2y = 0 gives y = 0.
Critical points: (1, 0) and (−1, 0).

Step 2: Hessian Matrix
The Hessian matrix is the matrix of second-order partial derivatives:
  H(f)(x, y) = [ ∂²f/∂x²    ∂²f/∂x∂y ] = [ 6x  0 ]
               [ ∂²f/∂y∂x   ∂²f/∂y²  ]   [ 0   2 ]
We now evaluate the Hessian at each critical point.

Step 3: Classifying the Critical Points (1, 0) and (−1, 0)
For the critical point (1, 0):
  H(f)(1, 0) = [ 6  0 ]
               [ 0  2 ]
Eigenvalues: λ1 = 6, λ2 = 2. Both eigenvalues are positive, so H(f)(1, 0) > 0 (positive definite). Hence (1, 0) is a local minimum.
For the critical point (−1, 0):
  H(f)(−1, 0) = [ −6  0 ]
                [  0  2 ]
Eigenvalues: λ1 = −6, λ2 = 2. One eigenvalue is negative and one is positive, so H(f)(−1, 0) is indefinite. Hence (−1, 0) is a saddle point (neither a minimum nor a maximum).
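The classification in Step 3 can also be reproduced numerically. Below is a minimal NumPy sketch (the Hessian is entered directly from the formula above) that evaluates the Hessian of f(x, y) = x³ − 3x + y² at each critical point and classifies it by the signs of its eigenvalues.

```python
import numpy as np

def hessian(x, y):
    # Hessian of f(x, y) = x^3 - 3x + y^2, from the second partial derivatives above.
    return np.array([[6.0 * x, 0.0],
                     [0.0,     2.0]])

def classify(point):
    eigvals = np.linalg.eigvalsh(hessian(*point))
    if np.all(eigvals > 0):
        kind = "local minimum"
    elif np.all(eigvals < 0):
        kind = "local maximum"
    elif np.any(eigvals > 0) and np.any(eigvals < 0):
        kind = "saddle point"
    else:
        kind = "inconclusive (semi-definite Hessian)"
    print(f"{point}: eigenvalues {eigvals} -> {kind}")

for point in [(1.0, 0.0), (-1.0, 0.0)]:   # the critical points found in Step 1
    classify(point)
# (1, 0):  eigenvalues [2, 6]  -> local minimum
# (-1, 0): eigenvalues [-6, 2] -> saddle point
```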