Multidisciplinary System Design Optimization (MSDO)
Gradient Calculation and Sensitivity Analysis
Lecture 9
Olivier de Weck and Karen Willcox
© Massachusetts Institute of Technology – Prof. de Weck and Prof. Willcox, Engineering Systems Division and Dept. of Aeronautics and Astronautics

Today's Topics
• Gradient calculation methods
  – Analytical and symbolic
  – Finite difference
  – Complex step
  – Adjoint method
  – Automatic differentiation
• Post-processing sensitivity analysis
  – effect of changing design variables
  – effect of changing parameters
  – effect of changing constraints

Definition of the Gradient
"How does the function value J change locally as we change elements of the design vector x?"
Compute the partial derivatives of J with respect to each x_i:

  ∇J = [ ∂J/∂x_1, ∂J/∂x_2, …, ∂J/∂x_n ]^T

The gradient vector points normal to the tangent hyperplane of the level surface (contour) of J(x).

Geometry of the Gradient Vector (2D)
Example function: J(x_1, x_2) = x_1 + x_2 + 1/(x_1 x_2)

  ∇J = [ ∂J/∂x_1, ∂J/∂x_2 ]^T = [ 1 − 1/(x_1² x_2), 1 − 1/(x_1 x_2²) ]^T

[Contour plot of J over 0 ≤ x_1, x_2 ≤ 2: the gradient is everywhere normal to the contours.]

Geometry of the Gradient Vector (3D)
Example: J = x_1² + x_2² + x_3²

  ∇J = [ 2x_1, 2x_2, 2x_3 ]^T

At x_o = [1 1 1]^T, ∇J(x_o) = [2 2 2]^T. The tangent plane to the level surface J = 3 at x_o is 2x_1 + 2x_2 + 2x_3 − 6 = 0, and the gradient vector points toward larger values of J.

Other Gradient-Related Quantities
• Jacobian: matrix of first derivatives of multiple functions J = [J_1, J_2, …, J_z]^T with respect to the vector of variables x = [x_1, …, x_n]^T:

  ∇J = [ ∂J_j/∂x_i ],  i = 1…n (rows), j = 1…z (columns) — an n × z matrix

• Hessian: matrix of second-order derivatives:

  H = ∇²J = [ ∂²J/∂x_i ∂x_k ],  i, k = 1…n — an n × n matrix

Why Calculate Gradients?
• Required by gradient-based optimization algorithms
  – Normally need the gradient of the objective function and of each constraint with respect to the design variables at each iteration
  – Newton methods require Hessians as well
• Isoperformance / goal programming
• Robust design
• Post-processing sensitivity analysis
  – determine whether the result is optimal
  – sensitivity to parameters and constraint values

Analytical Sensitivities
If the objective function is known in closed form, we can often compute the gradient vector(s) in closed form (analytically).

Example: J(x_1, x_2) = x_1 + x_2 + 1/(x_1 x_2), evaluated at x_1 = x_2 = 1, so J(1,1) = 3.

Analytical gradient:

  ∇J = [ 1 − 1/(x_1² x_2), 1 − 1/(x_1 x_2²) ]^T,   ∇J(1,1) = [0, 0]^T   → (1,1) is the minimum.

For complex systems, analytical gradients are rarely available.
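The closed-form gradient above is easy to check numerically. Here is a minimal MATLAB sketch (added for illustration, not part of the slides) that evaluates J and its analytical gradient at the point (1,1):

```matlab
% Check of the analytical gradient of J = x1 + x2 + 1/(x1*x2) at (1,1)
J     = @(x) x(1) + x(2) + 1/(x(1)*x(2));
gradJ = @(x) [1 - 1/(x(1)^2 * x(2));        % dJ/dx1
              1 - 1/(x(1)   * x(2)^2)];     % dJ/dx2

x0 = [1; 1];
J(x0)       % returns 3
gradJ(x0)   % returns [0; 0], so (1,1) is a stationary point (here, the minimum)
```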
Symbolic Differentiation
• Use a symbolic mathematics program, e.g. MATLAB®, Maple®, Mathematica®
• In MATLAB, construct a symbolic object and apply the differentiation operator diff:

  >> syms x1 x2
  >> J = x1 + x2 + 1/(x1*x2);
  >> dJdx1 = diff(J, x1)
  dJdx1 = 1 - 1/x1^2/x2
  >> dJdx2 = diff(J, x2)
  dJdx2 = 1 - 1/x1/x2^2

Finite Differences (I)
Consider a function of a single variable, f(x).
• First-order finite difference approximation of the gradient (forward difference):

  f'(x_o) ≈ [ f(x_o + Δx) − f(x_o) ] / Δx        truncation error O(Δx)

• Second-order finite difference approximation of the gradient (central difference):

  f'(x_o) ≈ [ f(x_o + Δx) − f(x_o − Δx) ] / (2Δx)        truncation error O(Δx²)

[Sketch: f evaluated at x_o − Δx, x_o, x_o + Δx; the secant slopes approximate the tangent at x_o.]

Finite Differences (II)
The approximations are derived from the Taylor series expansion:

  f(x_o + Δx) = f(x_o) + Δx f'(x_o) + (Δx²/2) f''(x_o) + O(Δx³)

Neglect the second- and higher-order terms and solve for the derivative:

  f'(x_o) = [ f(x_o + Δx) − f(x_o) ] / Δx + O(Δx)        (forward difference)

where the leading truncation error term is O(Δx) = −(Δx/2) f''(x_o).

Finite Differences (III)
Take the Taylor expansion forwards and backwards about x_o:

  f(x_o + Δx) = f(x_o) + Δx f'(x_o) + (Δx²/2) f''(x_o) + O(Δx³)    (1)
  f(x_o − Δx) = f(x_o) − Δx f'(x_o) + (Δx²/2) f''(x_o) + O(Δx³)    (2)

Subtract (2) from (1) and solve again for the derivative:

  f'(x_o) = [ f(x_o + Δx) − f(x_o − Δx) ] / (2Δx) + O(Δx²)        (central difference)

where the leading truncation error term is O(Δx²) = −(Δx²/6) f'''(x_o).

Finite Differences (IV)
Graphically, the finite difference approximation

  ∂J/∂x_1 ≈ ΔJ/Δx_1 = [ J(x_1¹) − J(x_1^o) ] / (x_1¹ − x_1^o)

is the slope of the secant through J(x_1^o) and J(x_1¹), whereas the true analytical sensitivity ∂J/∂x_1 is the slope of the tangent to J(x) at x_1^o.

Finite Differences (V)
• Second-order finite difference approximation of the second derivative:

  f''(x_o) ≈ [ f(x_o + Δx) − 2 f(x_o) + f(x_o − Δx) ] / Δx²

Errors of Finite Differencing
Caution:
- Finite differencing always has errors
- The error depends strongly on the perturbation size

Example: J(x_1, x_2) = x_1 + x_2 + 1/(x_1 x_2) at x_1 = x_2 = 1, J(1,1) = 3, where ∂J/∂x_1 = 0.
[Plot of gradient error vs. perturbation step size Δx_1: rounding errors grow as ~1/Δx as the step shrinks, truncation errors grow as ~Δx as the step grows.]
The choice of Δx is critical.

Perturbation Size Δx: Choice
• Error analysis (Gill et al., 1981): balance truncation error against rounding error, where ε_A is the (absolute) accuracy to which f can be evaluated:
  – Forward difference: Δx ~ ε_A^(1/2)
  – Central difference: Δx ~ ε_A^(1/3)
• Machine precision: relate the step at the k-th iterate x_k to the number of digits q of machine precision carried in the computed real values; a perturbation that only changes digits beyond those q digits is lost to rounding.
• Trial and error: a typical perturbation is ~0.1–1% of the design variable value.
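To see the trade-off between truncation and rounding error, here is a small MATLAB sketch (illustrative, not from the slides) that sweeps the perturbation size for the example J = x_1 + x_2 + 1/(x_1 x_2) at (1,1), where the exact value of ∂J/∂x_1 is 0:

```matlab
% Forward vs. central difference error for dJ/dx1 at (1,1), swept over step size
J     = @(x1, x2) x1 + x2 + 1./(x1.*x2);
exact = 0;                                   % analytical dJ/dx1 at (1,1)

dx      = logspace(-10, -1, 10);             % candidate perturbation sizes
fwd_err = abs( (J(1+dx,1) - J(1,1))./dx        - exact );   % truncation ~ O(dx)
ctr_err = abs( (J(1+dx,1) - J(1-dx,1))./(2*dx) - exact );   % truncation ~ O(dx^2)

loglog(dx, fwd_err, 'o-', dx, ctr_err, 's-');
xlabel('perturbation step size \Deltax_1');  ylabel('gradient error');
legend('forward difference', 'central difference', 'Location', 'best');
% Large dx: truncation error dominates.  Very small dx: subtractive cancellation
% (rounding error ~ machine eps / dx) takes over, so there is an optimal step size.
```

The resulting U-shaped error curve is the behaviour sketched on the "Errors of Finite Differencing" slide.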
Computational Expense of Finite Differences
Let F(J_i) be the cost of a single evaluation of objective function J_i. Then:
• n · F(J_i) = cost of a one-sided finite-difference gradient vector for J_i with a design vector of length n
• z · n · F(J_i) = cost of a finite-difference Jacobian with z objective functions

Example: 6 objectives, 30 design variables, and 1 second per function evaluation give about 3 minutes of CPU time for a single Jacobian estimate — expensive!

Complex Step Derivative
• Similar to finite differences, but uses an imaginary step:

  f'(x_o) ≈ Im[ f(x_o + iΔx) ] / Δx

• Second-order accurate
• Can use very small step sizes, e.g. Δx ≈ 10⁻²⁰
  – No subtractive cancellation (rounding) error, since no subtraction is performed
• Limited application areas
  – The code must be able to handle complex-valued arithmetic

J.R.R.A. Martins, I.M. Kroo and J.J. Alonso, "An Automated Method for Sensitivity Analysis Using Complex Variables," AIAA Paper 2000-0689, Jan. 2000.

Automatic Differentiation
• Mathematical formulae are built from a finite set of basic functions, e.g. addition, sin x, exp x, etc.
• Using the chain rule, differentiate the analysis code: add statements that generate derivatives of the basic functions
• Tracks numerical values of the derivatives; does not track them symbolically as discussed before
• Outputs a modified program = original + derivative capability
• e.g. ADIFOR (FORTRAN), TAPENADE (C, FORTRAN), TOMLAB (MATLAB), and many more
• Resources at http://www.autodiff.org/

Adjoint Methods
Consider the following problem:

  minimize J(x, u)   subject to   R(x, u) = 0

where x are the design variables and u are the state variables. The constraints represent the state equation; e.g. in wing design, x are shape variables, u are flow variables, and R(x, u) = 0 represents the Navier–Stokes equations.

We need to compute the gradient of J with respect to x:

  dJ/dx = ∂J/∂x + ∂J/∂u · du/dx

Typically the dimension of u is very high (thousands to millions).

• To compute du/dx, differentiate the state equation:

  dR/dx = ∂R/∂x + ∂R/∂u · du/dx = 0   ⟹   du/dx = −(∂R/∂u)⁻¹ ∂R/∂x

so that

  dJ/dx = ∂J/∂x − ∂J/∂u (∂R/∂u)⁻¹ ∂R/∂x

• Now define the adjoint vector λ by λ^T = ∂J/∂u (∂R/∂u)⁻¹
• Then, to determine the gradient:
  – First solve the adjoint equation   (∂R/∂u)^T λ = (∂J/∂u)^T
  – Then compute   dJ/dx = ∂J/∂x − λ^T ∂R/∂x

• Solving the adjoint equation costs about the same as solving the forward problem (one function evaluation), regardless of the number of design variables
• Adjoints are widely used in aerodynamic shape optimization, optimal flow control, geophysics applications, etc.
• Some automatic differentiation tools have a 'reverse mode' for computing adjoints
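To make the two-step adjoint recipe concrete, here is a minimal MATLAB sketch on a hypothetical toy problem (my construction, not the lecture's example): a linear state equation R(x, u) = (diag(x) + C)u − f = 0 with objective J = ½ uᵀu. One forward solve plus one adjoint solve yields the full gradient dJ/dx, which is then checked against one-sided finite differences (which need n additional forward solves):

```matlab
% Adjoint-gradient sketch for a hypothetical linear state equation (assumed toy
% problem):  R(x,u) = (diag(x) + C)*u - f = 0,  objective J = 0.5*u'*u.
n = 5;  rng(0);
C = 0.1*ones(n);  f = rand(n,1);  x = 1 + rand(n,1);

A = diag(x) + C;              % dR/du for this state equation
u = A \ f;                    % forward solve (one "function evaluation")
J = 0.5*(u'*u);

lambda = A' \ u;              % adjoint solve: (dR/du)' * lambda = (dJ/du)'

% dJ/dx = dJ/dx|_explicit - lambda' * dR/dx ; here dJ/dx|_explicit = 0, dR/dx = diag(u)
dJdx_adjoint = -(lambda .* u)';

% Check against one-sided finite differences (n additional forward solves)
h = 1e-6;  dJdx_fd = zeros(1,n);
for i = 1:n
    xp = x;  xp(i) = xp(i) + h;
    up = (diag(xp) + C) \ f;
    dJdx_fd(i) = (0.5*(up'*up) - J)/h;
end
disp([dJdx_adjoint; dJdx_fd])  % the two rows should agree to several digits
```

The adjoint route costs one extra linear solve no matter how long x is, which is why it dominates in applications such as aerodynamic shape optimization.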
Post-Processing Sensitivity Analysis
• A sensitivity analysis is an important component of post-processing
• It is key to understanding which design variables, constraints, and parameters are important drivers for the optimum solution
• How sensitive is the "optimal" objective J* to changes or perturbations of the design variables x*?
• How sensitive is the "optimal" solution x* to changes in the constraints g(x), h(x) and the fixed parameters p?

Sensitivity Analysis: Aircraft
Questions for aircraft design — how does my solution change if I
• change the cruise altitude?
• change the cruise speed?
• change the range?
• change material properties?
• relax the constraint on payload?
• ...

Sensitivity Analysis: Spacecraft
Questions for spacecraft design — how does my solution change if I
• change the orbital altitude?
• change the transmission frequency?
• change the specific impulse of the propellant?
• change the launch vehicle?
• change the desired mission lifetime?
• ...

Sensitivity Analysis
"How does the optimal solution change as we change the problem parameters?"
• effect on the design variables
• effect on the objective function
• effect on the constraints
We want to answer this question without having to solve the optimization problem again.

Normalization
To compare sensitivities of different design variables in terms of their relative importance, it may be necessary to normalize.

"Raw" (unnormalized) sensitivity = the partial derivative ∂J/∂x_i evaluated at the point x_o.

Normalized sensitivity:

  ( x_i,o / J(x_o) ) · ∂J/∂x_i |_(x_o)

The normalized sensitivity captures relative sensitivity — roughly the % change in the objective per % change in the design variable — and is important for comparing the effect of different design variables.

Example: Dairy Farm Problem
"Dairy Farm" sample problem: N cows graze a field of length L and end radius R, enclosed by a fence. Evaluate at x_o: L = 100 m, N = 10 cows, R = 50 m.

Parameters: f = 100 $/m (fence), n = 2000 $/cow, m = 2 $/liter (milk).

Model:
  A = 2LR + πR²        (field area)
  F = 2L + 2πR         (fence length)
  M = 100 √(A/N)       (milk production per cow)
  C = f·F + n·N        (cost)
  I = N·M·m            (income)
  P = I − C            (profit, the objective)

With respect to which design variable is the objective most sensitive?

Dairy Farm Sensitivity
• Compute the objective at x_o:  P(x_o) = 13,092
• Then compute the raw sensitivities:
  ∂P/∂L = 36.6,  ∂P/∂N = 2225.4,  ∂P/∂R = 588.4
• Normalize by x_i,o / P(x_o):
  L: 100 · 36.6 / 13092 = 0.28
  N: 10 · 2225.4 / 13092 = 1.70
  R: 50 · 588.4 / 13092 = 2.25
• Show the normalized sensitivities graphically with a tornado chart: the objective is most sensitive to R, then N, then L.
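As a worked check, the MATLAB sketch below evaluates the profit model, estimates the raw sensitivities by forward finite differences, and normalizes them. The profit function is reconstructed from the slide's formulas; the square root in the milk term is an inference, chosen because it reproduces the quoted values P(x_o) ≈ 13,092 and the sensitivities above, so treat the model details as an assumption.

```matlab
% Dairy Farm normalized sensitivities (model reconstructed from the slide; the
% sqrt in the milk term is an assumption that reproduces the quoted numbers)
f = 100;  n_cow = 2000;  m = 2;                 % fence $/m, cow $/cow, milk $/liter
Afield = @(L,R) 2*L*R + pi*R^2;                 % field area
Ffence = @(L,R) 2*L + 2*pi*R;                   % fence length
P = @(x) x(2) * 100*sqrt(Afield(x(1),x(3))/x(2)) * m ...   % income N*M*m
         - f*Ffence(x(1),x(3)) - n_cow*x(2);               % minus cost f*F + n*N

x0 = [100; 10; 50];                             % [L; N; R]
P0 = P(x0);                                     % ~13,092 $

h = 1e-6;  dPdx = zeros(3,1);
for i = 1:3                                     % forward finite differences
    xp = x0;  xp(i) = xp(i) + h;
    dPdx(i) = (P(xp) - P0)/h;                   % ~[36.6; 2225.4; 588.4]
end
normSens = (x0/P0) .* dPdx;                     % ~[0.28; 1.70; 2.25]

names = {'L','N','R'};
[~, idx] = sort(abs(normSens));                 % tornado chart: biggest bar on top
barh(normSens(idx));  set(gca, 'YTickLabel', names(idx));
```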
Realistic Example: Spacecraft
NASA Nexus spacecraft concept. A finite element model and CAD model of the spacecraft (about 14.97 m long) feed an integrated simulation that maps the design vector (the "x"-domain) to performance metrics (the "J"-domain), e.g. J_2 = centroid jitter on the focal plane [RSS LOS].
[Figure: simulated centroid X–Y jitter on the focal plane [μm] over T = 5 s, compared with the 1-pixel requirement J_2,req = 5 μm. Image by MIT OpenCourseWare.]
Which design variables are the "drivers" of system performance?

Graphical Representation
The Jacobian is evaluated at the design x_o and normalized for comparison, then plotted as bar charts of (x_o/J_1,o)·∂J_1/∂x and (x_o/J_2,o)·∂J_2/∂x over all design variables — grouped into disturbance, structural, control, and optics variables — with analytical and finite-difference estimates overlaid. Variables include: Ru, Us, Ud, fc, Qc, Tst, Srg, Sst, Tgs, m_SM, K_yPM, K_rISO, m_bus, K_zpet, t_sp, I_ss, I_propt, zeta, lambda, Ro, QE, Mgs, fca, Kc, Kcf.

J_1 (RMMS WFE) is most sensitive to:
• Ru – upper wheel speed limit [RPM]
• Sst – star tracker noise (1σ) [asec]
• K_rISO – isolator joint stiffness [Nm/rad]
• K_zpet – deployed petal stiffness [N/m]

J_2 (RSS LOS) is most sensitive to:
• Ud – dynamic wheel imbalance [g·cm²]
• K_rISO – isolator joint stiffness [Nm/rad]
• zeta – proportional damping ratio [–]
• Mgs – guide star magnitude [mag]
• Kcf – FSM controller gain [–]

Parameters
Parameters p are the fixed assumptions. How sensitive is the optimal solution x* with respect to the fixed parameters?

Example: "Dairy Farm" sample problem — maximize profit. Optimal solution: x* = [R = 106.1 m, L = 0 m, N = 17 cows]^T. Fixed parameters:
• f = 100 $/m – cost of fence
• n = 2000 $/cow – cost of a single cow
• m = 2 $/liter – market price of milk
How does x* change as the parameters change?

Sensitivity Analysis
The KKT conditions at the optimum x*:

  ∇J(x*) + Σ_{j∈M} λ_j ∇ĝ_j(x*) = 0
  ĝ_j(x*) = 0,  λ_j ≥ 0,  j ∈ M   (M = set of active constraints)

For a small change in a parameter p, we require that the KKT conditions remain valid:

  d(KKT conditions)/dp = 0

Rewrite the first (stationarity) equation componentwise:

  ∂J/∂x_i (x*) + Σ_{j∈M} λ_j ∂ĝ_j/∂x_i (x*) = 0,  i = 1, …, n

Sensitivity Analysis
Recall the chain rule: if Y = Y(p, x(p)), then

  dY/dp = ∂Y/∂p + Σ_{k=1}^{n} (∂Y/∂x_k)(∂x_k/∂p)

Applying it to the stationarity equation of the KKT conditions:

  d/dp [ ∂J/∂x_i + Σ_{j∈M} λ_j ∂ĝ_j/∂x_i ]
    = Σ_{k=1}^{n} ( ∂²J/∂x_i∂x_k + Σ_{j∈M} λ_j ∂²ĝ_j/∂x_i∂x_k ) ∂x_k/∂p
      + Σ_{j∈M} (∂λ_j/∂p) ∂ĝ_j/∂x_i
      + ∂²J/∂x_i∂p + Σ_{j∈M} λ_j ∂²ĝ_j/∂x_i∂p = 0

or, compactly,

  Σ_{k=1}^{n} A_ik ∂x_k/∂p + Σ_{j∈M} B_ij ∂λ_j/∂p + c_i = 0

Sensitivity Analysis
Perform the same procedure on the active-constraint equations ĝ_j(x*, p) = 0:

  ∂ĝ_j/∂p + Σ_{k=1}^{n} (∂ĝ_j/∂x_k)(∂x_k/∂p) = 0,   i.e.   Σ_{k=1}^{n} B_kj ∂x_k/∂p + d_j = 0

Sensitivity Analysis
In matrix form we can write

  [ A    B ] [ ∂x/∂p ]      [ c ]
  [ B^T  0 ] [ ∂λ/∂p ]  = − [ d ]

where, for i, k = 1…n and j ∈ M,

  A_ik = ∂²J/∂x_i∂x_k + Σ_{j∈M} λ_j ∂²ĝ_j/∂x_i∂x_k
  B_ij = ∂ĝ_j/∂x_i
  c_i  = ∂²J/∂x_i∂p + Σ_{j∈M} λ_j ∂²ĝ_j/∂x_i∂p
  d_j  = ∂ĝ_j/∂p

and the unknowns are ∂x/∂p = [∂x_1/∂p, …, ∂x_n/∂p]^T and ∂λ/∂p = [∂λ_j/∂p], j ∈ M.
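A tiny worked instance may help. The MATLAB sketch below (a hypothetical toy problem, not from the lecture) builds the A, B, c, d blocks for minimize J = x_1² + x_2² subject to the single active constraint g = p − x_1 − x_2 ≤ 0, solves the linear system above, and recovers ∂x*/∂p and ∂λ/∂p, which can be checked against the closed-form optimum x*(p) = [p/2, p/2]^T, λ(p) = p:

```matlab
% First-order post-optimality sensitivity on a hypothetical toy problem:
%   minimize J = x1^2 + x2^2   s.t.   g = p - x1 - x2 <= 0   (active at the optimum)
% Closed form: x*(p) = [p/2; p/2], lambda(p) = p, so dx*/dp = [0.5; 0.5], dlambda/dp = 1.
p      = 1;
xstar  = [p/2; p/2];
lambda = p;

A = 2*eye(2);        % d2J/dx2 + lambda*d2g/dx2 (g is linear, so its Hessian is zero)
B = [-1; -1];        % dg/dx
c = [0; 0];          % d2J/dxdp + lambda*d2g/dxdp (p does not appear in these terms)
d = 1;               % dg/dp

sol = -([A, B; B', 0] \ [c; d]);   % solve [A B; B' 0]*[dx/dp; dlambda/dp] = -[c; d]
dxdp      = sol(1:2)               % expect [0.5; 0.5], matching x*(p) = [p/2; p/2]
dlambdadp = sol(3)                 % expect 1, matching lambda(p) = p
```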
Sensitivity Analysis
We solve the system to find ∂x/∂p and ∂λ/∂p; then the sensitivity of the objective function with respect to p follows from the chain rule:

  dJ/dp = ∂J/∂p + (∇J)^T ∂x/∂p,   so that   ΔJ ≈ (dJ/dp)·Δp   (first-order approximation)

To assess the effect of changing a different parameter, we only need to calculate a new right-hand side [c; d] for the matrix system.

Sensitivity Analysis – Constraints
• We also need to assess when an active constraint will become inactive, and vice versa.
• An active constraint becomes inactive when its Lagrange multiplier goes to zero:

  λ_j(p + Δp) ≈ λ_j + (∂λ_j/∂p) Δp = 0

Find the Δp that makes λ_j zero:

  Δp = −λ_j / (∂λ_j/∂p),   j ∈ M

• This is the amount by which we can change p before the j-th constraint becomes inactive (to a first-order approximation).

Sensitivity Analysis – Constraints
An inactive constraint becomes active when g_j(x) goes to zero:

  g_j(x) ≈ g_j(x*) + ∇g_j(x*)^T (∂x/∂p) Δp = 0

Find the Δp that makes g_j zero:

  Δp = −g_j(x*) / ( ∇g_j(x*)^T ∂x/∂p ),   for all j not active at x*

• This is the amount by which we can change p before the j-th constraint becomes active (to a first-order approximation)
• If we want to change p by a larger amount, then the problem must be solved again including the new constraint
• These estimates are only valid close to the optimum

Lagrange Multiplier Interpretation
• Consider the problem: minimize J(x) s.t. h(x) = 0, with optimal solution x*
• What happens if we change constraint k by a small amount ε?

  h_j(x*) = 0 for j ≠ k,   h_k(x*) = ε

• Differentiating with respect to ε:

  ∇h_j^T dx*/dε = 0 for j ≠ k,   ∇h_k^T dx*/dε = 1

Lagrange Multiplier Interpretation
• How does the objective function change?

  dJ/dε = ∇J^T dx*/dε

• Using the KKT conditions, ∇J = −Σ_j λ_j ∇h_j, so

  dJ/dε = −Σ_j λ_j ∇h_j^T dx*/dε = −λ_k

• The Lagrange multiplier is the negative of the sensitivity of the cost function to the constraint value. It is also called the shadow price.

Lecture Summary
• Gradient calculation approaches
  – Analytical and symbolic
  – Finite difference
  – Automatic differentiation
  – Adjoint methods
• Sensitivity analysis
  – Yields important information about the design space, both as the optimization is proceeding and once the "optimal" solution has been reached.

Reading
Papalambros – Section 8.2, Computing Derivatives
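A quick numerical illustration of the shadow-price interpretation, on a hypothetical toy problem (an assumption, not the lecture's example): minimize J = x_1² + x_2² subject to h = x_1 + x_2 − 1 − ε = 0, whose constrained optimum is known in closed form.

```matlab
% Shadow-price sketch:  minimize J = x1^2 + x2^2  s.t.  h = x1 + x2 - 1 - eps = 0
% (hypothetical toy problem; the constrained optimum is available in closed form)
xopt = @(eps) (1 + eps)/2 * [1; 1];      % optimum of this problem for a given eps
Jfun = @(x) x(1)^2 + x(2)^2;

xstar  = xopt(0);                        % nominal optimum x* = [0.5; 0.5]
lambda = -2*xstar(1);                    % KKT: grad J + lambda*grad h = 0  =>  lambda = -1

deps    = 1e-6;                          % perturb the constraint value and re-solve
dJde_fd = (Jfun(xopt(deps)) - Jfun(xstar)) / deps;

fprintf('dJ*/d(eps) by re-solving: %.4f\n', dJde_fd);  % ~ 1.0
fprintf('-lambda (shadow price)  : %.4f\n', -lambda);  % ~ 1.0, as the slide states
```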
MIT OpenCourseWare
http://ocw.mit.edu

ESD.77 / 16.888 Multidisciplinary System Design Optimization
Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.