Nonlinear Regression KNNL – Chapter 13 Nonlinear Relations wrt X – Linear wrt bs 1) Polynomial Models: E Yi b 0 b1 X i b 2 X i2 E Yi X i E Yi b 0 b 0 b1 X i b 2 X i2 0 b1 2b 2 X i h X i X i E Yi 1 b1 E Yi Xi b 2 Yb 1 h1 X i1 1 X i1 i X i1 E Yi b 0 1 E Yi b1 1 Yi b 0 b1 ln X i1 b 2 X i2 E Y b 2) Transformed Variable Models: E E X i2 b0 None are functions of β b1 b 2 i X i 2 ln X i1 E Yi b 2 1 2 X2 i2 1 X i2 h2 X i 2 b0 None are functions of β b1 b 2 In each case: E Yi f Xi , β X'i β Case 1: X'i 1 X i X i2 Case 2: X'i 1 ln X i1 1 X i2 Nonlinear Regression Models Nonlinear Regression models often use γ as vector of coefficients to distinguish from linear models: Exponential Regression Models (Often used for modeling growth, where rate of growth changes): E Yi 0 exp 1 X i E Yi 0 exp 1 X i E Yi 1 0 X i exp 1 X i functions of γ f Xi , γ 0 exp 1 X i X'i γ More general exponential model (with errors independent and N 0, 2 ) : Yi 0 1 exp 2 X i i Typically, 0 0, 1 0, 2 0 Intercept: E Yi | X i 0 0 1 (1) 0 1 Asymptote: E Yi | X i 0 1 (1) 0 0.693 0.693 "Half-way" Point: E Yi | X i exp 0 1 exp 0.693 0 1 0 1 2 2 2 2 Data Description - Orlistat • 163 Patients assigned to one of the following doses (mg/day) of orlistat: 0, 60,120,150,240,300,480,600,1200 • Response measured was fecal fat excretion (purpose is to inhibit fat absorption, so higher levels of response are considered favorable) • Plot of raw data displays a generally increasing but nonlinear pattern and large amount of variation across subjects Fecal Fat Excretion vs Orlistat Daily Dose 60 50 Percent FFE 40 ffe 30 meanffe 20 10 0 0 200 400 600 800 Dose 1000 1200 1400 Nonlinear Regression Model - Example Y 0 1x 2 x Simple Maximum Effect (Emax) model: 0 ≡ Mean Response at Dose 0 • 1 ≡ Maximal Effect of Orlistat (0 1 Maximum Mean Response) • 2 ≡ Dose providing 50% of maximal effect (ED50) Nonlinear Least Squares f Xi , γ f i γ f ( 0 , 1 , 2 ) 0 f Xi , γ fi γ 1 X i 2 Xi Xi 1 X i Fi γ 1 2 γ ' γ ' X 2 X i 2 i 1 X1 0 2 X1 f1 Y1 Y f γ f n Yn X 0 1 n 2 X n X1 1 X 1 1 2 X X 2 n F1 2 1 Fγ Fn Xn 1 X n 1 2 X X 2 n 2 n F acts like the X matrix in linear regression (but depends on parameters) Nonlinear Least Squares ^ ^ ^ Goal: Choose 0 , 1 , 2 that minimize error sum of squares: 2 1 X i Q SSE γ Yi 0 2 Xi i 1 n Y - f γ Y - f γ ' n 1 X i Q 2 Yi 0 Fi j j 2 X i i 1 set T Q 2 Y - f γ F γ 0 0 0 ' γ j 0,1, 2 Estimated Variance-Covariance Matrix s γ s F F 2 ^ 2 ^' ^ -1 ^ T ^ Y-f Y-f s2 n p ^ s i -1 s F F ( i1,i1) ^' ^ ^ ^ Note: KNNL uses g for γ and D for F Orlistat Example • Reasonable Starting Values: – 0: Mean of 0 Dose Group: 5 – 1: Difference between highest mean and dose 0 mean: 33-5=28 – 2: Dose with mean halfway between 5 and 33: 160 • • • • Create Vectors Y and f (0) Generate matrix F(0) Obtain first “new” estimate of Continue to Convergence Orlistat Example – Iteration History (Tolerance = .0001) iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 g0 5.0000 6.2379 6.1771 6.1507 6.1361 6.1277 6.1227 6.1197 6.1180 6.1169 6.1162 6.1158 6.1156 6.1155 6.1154 6.1153 g1 28.0000 28.5863 28.1281 27.9163 27.7967 27.7272 27.6861 27.6615 27.6467 27.6377 27.6323 27.6291 27.6271 27.6259 27.6251 27.6247 g2 160.0 140.9 133.7 129.9 127.8 126.5 125.8 125.4 125.1 125.0 124.9 124.8 124.8 124.8 124.7 124.7 SSE 13541.6 12945.5 12942.9 12942.2 12942.0 12941.9 12941.9 12941.9 12941.9 12941.9 12941.9 12941.9 12941.9 12941.9 12941.9 12941.9 Delta(g) 365.5745418 52.82513814 14.44158887 4.506448063 1.510150161 0.526393989 0.187692352 0.067822683 0.024703325 0.009040833 0.003318268 0.001220029 0.000449042 0.000165379 6.09317E-05 27.62 X Y 6.12 124.7 X ^ Fitted Equation, Raw Data - FFE vs ODD 60 50 fecal fat excretion 40 ffe 30 f(odd,bhat) 20 10 0 0 200 400 600 800 orlistat daily dose 1000 1200 1400 Variance Estimates/Confidence Intervals 2 Y f i i γ s 2 i 1 80.89 163 3 1.1594 0.7219 15.609 ' ^ -1 ^ ^ 2 2 s γ s F F 0.7219 12.081 130.14 15.609 130.14 2238.76 163 ^ Parameter Estimate Std. Error 95% CI 0 6.12 1.08 (3.96 , 8.28) 1 27.62 3.48 (20.66 , 34.58) 2 124.7 47.31 (30.08 , 219.32) Notes on Nonlinear Least Squares Large-Sample Theory: When i ~ N 0, ^ ^ independent , for large n: γ is approximately normal Approximate E γ γ ^ 2 γ ~ N γ, 2 F'F -1 2 ^ γ ^ ^ estimated by s γ MSE F 'F 2 ^ -1 (Approximately) For small samples: • When errors are normal, independent, with constant variance, we can often use the t-distribution for tests and confidence intervals (software packages do this implicitly) • When the extent of nonlinearity is extreme, or normality assumptions do not hold, should use bootstrap to estimate standard errors of regression coefficients