CHEE801 Module 5: Nonlinear Regression

Notation
Model:  Y_i = f(x_i, \theta) + \varepsilon_i
– \varepsilon_i : random noise component
– x_i : explanatory variables – i-th run conditions
– \theta : p-dimensional vector of parameters
Model specification:
– f(x_i, \theta) is the model equation
– with n experimental runs, we have
    \eta(\theta) = [\, f(x_1, \theta), \; f(x_2, \theta), \; \ldots, \; f(x_n, \theta) \,]^T
– \eta(\theta) defines the expectation surface
– the nonlinear regression model is  Y = \eta(\theta) + \varepsilon

Parameter Estimation – Gauss-Newton Iteration
Least squares estimation – minimize
    S(\theta) = \| Y - \eta(\theta) \|^2 = e^T e
A numerical optimization procedure is required. One possible method:
1. Linearization about the current estimate of the parameters
2. Solution of the linear(ized) regression problem to obtain the next parameter estimate
3. Iteration until a convergence criterion is satisfied

Linearization about a nominal parameter vector
Linearize the expectation function \eta(\theta) in terms of the parameter vector \theta about a nominal vector \theta_0:
    \eta(\theta) \approx \eta(\theta_0) + V_0 (\theta - \theta_0)
V_0 is the sensitivity matrix:
– the Jacobian of the expectation function
– contains first-order sensitivity information
    V_0 = \left. \frac{\partial \eta(\theta)}{\partial \theta^T} \right|_{\theta_0}
        = \begin{bmatrix}
            \partial f(x_1,\theta)/\partial\theta_1 & \cdots & \partial f(x_1,\theta)/\partial\theta_p \\
            \vdots & & \vdots \\
            \partial f(x_n,\theta)/\partial\theta_1 & \cdots & \partial f(x_n,\theta)/\partial\theta_p
          \end{bmatrix}_{\theta_0}

Parameter Estimation – Gauss-Newton Iteration
Iterative procedure consisting of:
1. Linearization about the current estimate of the parameters:
    Y - \eta(\theta^{(i)}) \approx V^{(i)} (\theta - \theta^{(i)}) + e
2. Solution of the linearized regression problem to obtain the next parameter estimate:
    \theta^{(i+1)} = \theta^{(i)} + \left( V^{(i)T} V^{(i)} \right)^{-1} V^{(i)T} \left( y - \eta(\theta^{(i)}) \right)
3. Iteration until the parameter estimates converge

Computational Issues in Gauss-Newton Iteration
The Gauss-Newton iteration can be subject to poor numerical conditioning for some parameter values:
» Conditioning problems arise in the inversion of V^T V
» Solution – use a decomposition technique
  • QR decomposition
  • Singular Value Decomposition (SVD)
» Use a different optimization technique
» Don't try to estimate so many parameters
  • Simplify the model
  • Fix some parameters at reasonable values

Other numerical estimation methods
• Nonlinear least squares is a minimization problem
• Use any good optimization technique to find the parameter estimates that minimize the sum of squares of the residuals
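A minimal sketch of the Gauss-Newton iteration in Python/NumPy, assuming a user-supplied single-response model function `model(theta, x)` and a forward-difference Jacobian; the decay model and data at the end are hypothetical and only for illustration. A production code would add step-size control (e.g., Levenberg-Marquardt damping) on top of this basic update, and here the linearized least-squares step is solved with `lstsq` (an orthogonal decomposition) rather than by forming (V^T V)^{-1} explicitly, in line with the conditioning advice above.

```python
import numpy as np

def jacobian_fd(model, theta, x, h=1e-6):
    """Forward-difference approximation of the n x p sensitivity matrix V."""
    eta0 = model(theta, x)
    V = np.empty((eta0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h * max(1.0, abs(theta[j]))
        V[:, j] = (model(theta + step, x) - eta0) / step[j]
    return V

def gauss_newton(model, theta0, x, y, tol=1e-8, max_iter=50):
    """Gauss-Newton: theta_new = theta + (V'V)^-1 V' (y - eta(theta)), iterated to convergence."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        resid = y - model(theta, x)
        V = jacobian_fd(model, theta, x)
        # Solve the linearized regression problem V * delta ~ resid via lstsq
        delta, *_ = np.linalg.lstsq(V, resid, rcond=None)
        theta = theta + delta
        if np.linalg.norm(delta) <= tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta, V, resid

# Hypothetical example: first-order decay y = theta1 * exp(-theta2 * x)
model = lambda theta, x: theta[0] * np.exp(-theta[1] * x)
x = np.linspace(0.0, 5.0, 20)
rng = np.random.default_rng(0)
y = model(np.array([2.0, 0.7]), x) + 0.05 * rng.standard_normal(x.size)
theta_hat, V_hat, e_hat = gauss_newton(model, [1.0, 1.0], x, y)
```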
Inference – Joint Confidence Regions
• Approximate confidence regions for parameters and predictions can be obtained by using a linearization approach
• Approximate covariance matrix for the parameter estimates:
    \Sigma_{\hat\theta} \approx \left( \hat{V}^T \hat{V} \right)^{-1} \sigma^2
  where \hat{V} is the Jacobian of \eta(\theta) evaluated at the least squares parameter estimates
• This covariance matrix approaches the true covariance matrix of the parameter estimates asymptotically, as the number of data points becomes infinite
• 100(1-\alpha)% joint confidence region for the parameters:
    ( \theta - \hat\theta )^T \, \hat{V}^T \hat{V} \, ( \theta - \hat\theta ) \le p \, s^2 \, F_{p,\, n-p,\, \alpha}
  » Compare to the linear regression case

Inference – Marginal Confidence Intervals
• Marginal confidence intervals
  » Confidence intervals on individual parameters:
      \hat\theta_i \pm t_{\nu,\,\alpha/2} \, s_{\hat\theta_i}
    where s_{\hat\theta_i} is the approximate standard error of the parameter estimate
    – square root of the i-th diagonal element of the approximate covariance matrix \left( \hat{V}^T \hat{V} \right)^{-1} s^2, with the noise variance estimated as in the linear case

Precision of the Predicted Responses – Linear Case
From the linear regression module: the predicted response from an estimated model has uncertainty, because it is a function of the parameter estimates, which themselves have uncertainty.
e.g., Solder Wave Defect Model – first response at the point (-1, -1, -1):
    \hat{y}_1 = \hat\beta_0 + \hat\beta_1 (-1) + \hat\beta_2 (-1) + \hat\beta_3 (-1)
If the parameter estimates were uncorrelated, the variance of the predicted response would be:
    Var( \hat{y}_1 ) = Var( \hat\beta_0 ) + Var( \hat\beta_1 ) + Var( \hat\beta_2 ) + Var( \hat\beta_3 )
Why?

Precision of the Predicted Responses – Linear
In general, both the variances and the covariances of the parameter estimates must be taken into account. For prediction at the k-th data point:
    Var( \hat{y}_k ) = x_k^T ( X^T X )^{-1} x_k \, \sigma^2 = x_k^T \, \Sigma_{\hat\beta} \, x_k
where x_k^T = [\, x_{k1}, \; x_{k2}, \; \ldots, \; x_{kp} \,].

Precision of the Predicted Responses – Nonlinear
Linearize the prediction equation about the least squares estimate:
    \hat{y}_k = f( x_k, \theta ) \approx f( x_k, \hat\theta ) + \left. \frac{\partial f(x_k,\theta)}{\partial \theta^T} \right|_{\hat\theta} ( \theta - \hat\theta ) = f( x_k, \hat\theta ) + \hat{v}_k^T ( \theta - \hat\theta )
For prediction at the k-th data point:
    Var( \hat{y}_k ) \approx \hat{v}_k^T \left( \hat{V}^T \hat{V} \right)^{-1} \hat{v}_k \, \sigma^2 = \hat{v}_k^T \, \Sigma_{\hat\theta} \, \hat{v}_k
where \hat{v}_k^T = [\, \hat{v}_{k1}, \; \hat{v}_{k2}, \; \ldots, \; \hat{v}_{kp} \,] is the k-th row of \hat{V}.

Estimating Precision of Predicted Responses
Use an estimate of the inherent noise variance:
    s^2_{\hat{y}_k} = x_k^T ( X^T X )^{-1} x_k \, s^2        (linear)
    s^2_{\hat{y}_k} = v_k^T ( V^T V )^{-1} v_k \, s^2        (nonlinear)
The degrees of freedom for the estimated variance of the predicted response are those of the estimate of the noise variance:
» replicates
» external estimate
» MSE

Confidence Limits for Predicted Responses
Linear and nonlinear cases: follow an approach similar to that for the parameters. The 100(1-\alpha)% confidence limits for the mean value of a predicted response are:
    \hat{y}_k \pm t_{\nu,\,\alpha/2} \, s_{\hat{y}_k}
» the degrees of freedom are those of the inherent noise variance estimate
If the prediction is for a new data value, the confidence limits are:
    \hat{y}_k \pm t_{\nu,\,\alpha/2} \, \sqrt{ s^2_{\hat{y}_k} + s_e^2 }
Why?
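A sketch of how these linearization-based inference quantities could be computed, continuing the hypothetical decay example from the Gauss-Newton sketch above (so `theta_hat`, `V_hat`, `e_hat`, `x`, and `model` are assumed to exist); the noise variance is estimated here from the mean squared error, with n - p degrees of freedom. The parameter correlation matrix computed at the end reappears later in the diagnostics discussion.

```python
import numpy as np
from scipy import stats

n, p = V_hat.shape
s2 = e_hat @ e_hat / (n - p)                       # MSE estimate of the noise variance
VtV_inv = np.linalg.inv(V_hat.T @ V_hat)
cov_theta = s2 * VtV_inv                           # approximate covariance of theta_hat
se_theta = np.sqrt(np.diag(cov_theta))

# Marginal 95% confidence intervals: theta_i +/- t_{n-p, alpha/2} * s_{theta_i}
t_crit = stats.t.ppf(0.975, n - p)
ci = np.column_stack([theta_hat - t_crit * se_theta,
                      theta_hat + t_crit * se_theta])

# Approximate correlation matrix of the parameter estimates (used in diagnostics)
corr_theta = cov_theta / np.outer(se_theta, se_theta)

# Standard error of the predicted mean response at each x_k:
#   Var(yhat_k) ~ v_k' (V'V)^-1 v_k * sigma^2, with v_k the k-th row of V
se_pred = np.sqrt(s2 * np.einsum('ij,jk,ik->i', V_hat, VtV_inv, V_hat))

# 95% limits for the mean response, and for a new observation (adds s^2)
y_pred = model(theta_hat, x)
mean_lo, mean_hi = y_pred - t_crit * se_pred, y_pred + t_crit * se_pred
new_lo = y_pred - t_crit * np.sqrt(se_pred**2 + s2)
new_hi = y_pred + t_crit * np.sqrt(se_pred**2 + s2)
```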
Properties of LS Parameter Estimates
Key point – parameter estimates are random variables:
» because stochastic variation in the data propagates through the estimation calculations
» parameter estimates have a variability pattern – probability distribution and density functions
Unbiased:  E\{ \hat\theta \} = \theta
» the "average" of repeated data collection / estimation sequences will be the true value of the parameter vector

Properties of Parameter Estimates
Linear regression case – least squares estimates are:
» Unbiased
» Consistent
» Efficient
Nonlinear regression case – least squares estimates are:
» Asymptotically unbiased – as the number of data points becomes infinite
» Consistent
» Efficient

Diagnostics for nonlinear regression
• Similar to the linear case
• Qualitative – residual plots
  – Residuals vs.
    » Factors in the model
    » Sequence (observation) number
    » Factors not in the model (covariates)
    » Predicted responses
  – Things to look for:
    » Remaining trend
    » Non-constant variance
• Qualitative – plots of observed and predicted responses
  – Predicted vs. observed – slope of 1
  – Predicted and observed – as functions of the independent variable(s)

Diagnostics for nonlinear regression
• Quantitative diagnostics – ratio tests:
  » These tests are the same as for the linear case
  » R-squared
    • coarse measure of significant trend
    • squared correlation of the observed and predicted values
  » Adjusted R-squared
    • also adjusts for the number of parameters estimated

Diagnostics for nonlinear regression
• Quantitative diagnostics – parameter confidence intervals:
  » Examine marginal intervals for the parameters
    • Based on linear approximations
    • Can also use hypothesis tests
  » Consider dropping parameters that aren't statistically significant
  » What should we do if parameters are
    • not significantly different from zero?
    • not significantly different from the initial guesses?
  » In nonlinear models, parameters are more likely to be involved in more complex expressions involving factors and other parameters
    • e.g., Arrhenius reaction rate expression
  » If possible, examine joint confidence regions

Diagnostics for nonlinear regression
• Quantitative diagnostics – parameter estimate correlation matrix:
  » Examine the correlation matrix for the parameter estimates
    • Based on the linear approximation
    • Compute the covariance matrix, then normalize using the pairs of standard deviations
  » Note significant correlations and keep these in mind when retaining/deleting parameters using marginal significance tests
  » Significant correlation between some parameter estimates may indicate over-parameterization relative to the data collected
    • Consider dropping some of the parameters whose estimates are highly correlated
• Further discussion – Chapter 3 of Bates and Watts (1988), Chapter 5 of Seber and Wild (1989)

Practical Considerations
– What kind of stopping conditions should be used to determine convergence?
– Problems with local minima?
– Reparameterization to reduce correlation between parameter estimates
• Ensuring physically realistic parameter estimates
  – Common problem – we know that some parameters should be positive, or should be bounded between reasonable values
  – Solutions
    » Constrained optimization algorithm to enforce non-negativity of parameters
    » Reparameterization tricks – estimate \phi instead of \theta:
      • \theta = \exp(\phi)  or  \theta = 10^{\phi}   –  \theta is positive
      • \theta = \dfrac{1}{1 + e^{-\phi}}   –  \theta is bounded between 0 and 1

Practical considerations
• Correlation between parameter estimates
  – Reduce by reparameterization
  – Exponential example:
      \theta_1 \exp( -\theta_2 x ) = \theta_1 \exp( -\theta_2 ( x - x_0 + x_0 ) )
                                   = \left[ \theta_1 \exp( -\theta_2 x_0 ) \right] \exp( -\theta_2 ( x - x_0 ) )
                                   = \phi_1 \exp( -\theta_2 ( x - x_0 ) )

Practical considerations
• Particular example – Arrhenius rate expression (a code sketch follows below):
      k_0 \exp\!\left( \frac{-E}{R T} \right)
        = k_0 \exp\!\left( \frac{-E}{R} \left( \frac{1}{T} - \frac{1}{T_{ref}} \right) - \frac{E}{R\,T_{ref}} \right)
        = \left[ k_0 \exp\!\left( \frac{-E}{R\,T_{ref}} \right) \right] \exp\!\left( \frac{-E}{R} \left( \frac{1}{T} - \frac{1}{T_{ref}} \right) \right)
        = k_{ref} \exp\!\left( \frac{-E}{R} \left( \frac{1}{T} - \frac{1}{T_{ref}} \right) \right)
  – Reduces the correlation between the parameter estimates and improves the conditioning of the estimation problem
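A sketch of the centred Arrhenius form written as Python functions; the reference temperature, parameter values, and function names are hypothetical. The two forms are algebraically identical, so predictions are unchanged, but estimating (k_ref, E) instead of (k_0, E) typically gives much less correlated estimates.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def rate_original(theta, T):
    """r = k0 * exp(-E / (R*T)); theta = [k0, E].
    Estimates of k0 and E tend to be strongly correlated."""
    k0, E = theta
    return k0 * np.exp(-E / (R * T))

def rate_centred(theta, T, T_ref=350.0):
    """Centred form: r = k_ref * exp(-(E/R) * (1/T - 1/T_ref)); theta = [k_ref, E].
    Same model, but the estimates of k_ref and E are far less correlated and the
    estimation problem is better conditioned when T_ref lies inside the data range."""
    k_ref, E = theta
    return k_ref * np.exp(-(E / R) * (1.0 / T - 1.0 / T_ref))

# The two parameterizations are related by k_ref = k0 * exp(-E / (R * T_ref)),
# so an estimate in one form can always be converted to the other afterwards.
```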
Practical considerations
• Scaling – of parameters and responses
• Choices
  – Scale by nominal values
    » Nominal values – design centre point, typical value over the range, average value
  – Scale by standard errors or by initial uncertainty ranges for the parameters
    » Parameters – by an estimate of the standard deviation of the parameter estimate
    » Responses – by the standard deviation of the observations – the noise standard deviation
• Scaling can improve the conditioning of the estimation problem (e.g., scale the sensitivity matrix V), and can facilitate comparison of terms on similar (dimensionless) bases

Practical considerations
• Initial parameter guesses are required
  – From prior scientific knowledge
  – From prior estimation results
  – By simplifying the model equations

Things to learn in CHEE 811
• Estimating parameters in differential equation models:
    \frac{dy}{dt} = f( y, u, t; \theta ), \qquad y(t_0) = y_0
• Estimating parameters in multi-response models
• Deriving model equations based on chemical engineering knowledge and stories about what is happening
• Solving the model equations numerically
• Deciding which parameters to estimate and which to leave at initial guesses when data are limited
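As a preview of the differential-equation case listed above, one common approach is to integrate the model numerically for each trial value of \theta and let a general-purpose least-squares routine adjust \theta. The sketch below uses SciPy for both steps, with a hypothetical first-order decay model, hypothetical data, and no input u.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def rhs(t, y, theta):
    """dy/dt = f(y, t; theta): here a simple first-order decay, dy/dt = -theta1 * y."""
    return -theta[0] * y

def residuals(theta, t_obs, y_obs, y0):
    """Integrate the model for this trial theta and return observed - predicted."""
    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), [y0], t_eval=t_obs, args=(theta,))
    return y_obs - sol.y[0]

# Hypothetical data generated from y = 3 * exp(-0.8 t) plus noise
t_obs = np.linspace(0.0, 4.0, 15)
y_obs = 3.0 * np.exp(-0.8 * t_obs) + 0.05 * np.random.default_rng(1).standard_normal(t_obs.size)

fit = least_squares(residuals, x0=[0.5], args=(t_obs, y_obs, 3.0))
theta_hat = fit.x   # estimated rate constant
```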