Nonlinear Models

Nonlinear regression

- In linear models, we assume that $Y_i = X_i^T \beta + \varepsilon_i$ for $i = 1, \dots, n$. This implies that $E(Y_i \mid X_i) = X_i^T \beta$, which is a linear function of $\beta$.
- A generalization of a linear model is a nonlinear model. A nonlinear model assumes that $E(Y_i \mid X_i) = f(X_i; \beta)$, where $f(x; \beta)$ is a smooth function of $\beta$. For example, $f(x; \beta) = (\beta^T x)^2$ or $f(x; \beta) = \exp(\beta^T x)$.

Nonlinear regression

- Assume the following nonlinear regression model,
  $$Y_i = f(X_i; \beta) + \varepsilon_i, \quad i = 1, \dots, n,$$
  where the $\varepsilon_i$'s are random errors and $f(x; \beta)$ is a known function of $x$ up to a $p$-dimensional unknown parameter vector $\beta$.
- The $\varepsilon_i$'s are independent random errors with mean 0 and variance $\sigma^2$.

Ordinary least squares in nonlinear models

Similar to linear regression, the least squares estimator of $\beta$ can be obtained by minimizing the objective function
$$g_n(\beta) = \sum_{i=1}^n \{Y_i - f(X_i; \beta)\}^2.$$
That is, $\hat\beta = \arg\min_\beta g_n(\beta)$.

OLS estimator

- Typically, we do not have an explicit formula for the least squares estimator of $\beta$; we need a numerical algorithm to find the OLS estimator in nonlinear regression.
- A necessary condition for $\hat\beta$ to be a minimizer of $g_n(\beta)$ is that
  $$g'_{nj}(\hat\beta) = \left.\frac{\partial g_n(\beta)}{\partial \beta_j}\right|_{\beta = \hat\beta} = 0 \quad \text{for } j = 1, \dots, p.$$

Estimating equations

- The LS estimator of $\beta$ must be a solution of the following estimating equations,
  $$D_\beta^T \{Y - f(X; \beta)\} = 0,$$
  where $Y = (Y_1, \dots, Y_n)^T$, $D_\beta = \left(\partial f(X_i; \beta)/\partial \beta_j\right)_{ij}$ is an $n \times p$ matrix, and $f(X; \beta) = (f(X_1; \beta), \dots, f(X_n; \beta))^T$ is an $n \times 1$ vector.
- In particular, if $f(X_i; \beta) = X_i^T \beta$ (a linear model), then $D_\beta = X$ and the above estimating equations become $X^T (Y - X\beta) = 0$, which are the normal equations.

Gauss-Newton algorithm

Let $\hat\beta^{(r)}$ be the solution of the estimating equations at the $r$-th iteration ($r = 0, 1, 2, \dots$) and define the corresponding $D_\beta$ as
$$D_\beta^{(r)} = \left(\left.\partial f(X_i; \beta)/\partial \beta_j\right|_{\beta = \hat\beta^{(r)}}\right)_{ij}.$$
The first-order Taylor expansion of $f(X; \beta)$ is
$$f(X; \beta) \approx f(X; \hat\beta^{(r)}) + D_\beta^{(r)} (\beta - \hat\beta^{(r)}).$$
Then the nonlinear regression model may be represented as
$$Y = f(X; \hat\beta^{(r)}) + D_\beta^{(r)} (\beta - \hat\beta^{(r)}) + \varepsilon.$$
This implies that
$$\widehat{\beta - \hat\beta^{(r)}} = (D_\beta^{(r)T} D_\beta^{(r)})^{-1} D_\beta^{(r)T} \{Y - f(X; \hat\beta^{(r)})\}.$$

Gauss-Newton algorithm

Step 1: Choose an initial value $\hat\beta^{(0)}$.
Step 2: Update the value of $\hat\beta^{(r)}$ by
$$\hat\beta^{(r+1)} = \hat\beta^{(r)} + (D_\beta^{(r)T} D_\beta^{(r)})^{-1} D_\beta^{(r)T} \{Y - f(X; \hat\beta^{(r)})\}.$$
Step 3: Repeat Step 2 until $\|\hat\beta^{(r+1)} - \hat\beta^{(r)}\| < \delta$ for a small value of $\delta$.

Example 1

Consider the following nonlinear regression model:
$$Y_i = \exp(\beta_1 + \beta_2 X_i^2) + \varepsilon_i.$$
The least squares estimator of $\beta = (\beta_1, \beta_2)^T$ is the solution of the following estimating equations:
$$D_\beta^T \{Y - \exp(\beta_1 + \beta_2 X^2)\} = 0,$$
where $D_\beta = \left(\exp(\beta_1 + \beta_2 X^2),\ X^2 \exp(\beta_1 + \beta_2 X^2)\right)$ is an $n \times 2$ matrix.

Example 2

Consider the following nonlinear regression model:
$$Y_i = \sin\{2\pi(\beta_1 + \beta_2 X_i^2)\} + \varepsilon_i.$$
Note that $\beta_1$ is not identifiable without any constraints: since $\sin(\cdot)$ has period $2\pi$, shifting $\beta_1$ by any integer leaves the regression function unchanged. We have to put a range restriction on $\beta_1$, e.g. $\beta_1 \in [0, 1)$. A small numerical check of this appears after the remarks below.

Remarks

- The nonlinear regression model should be parametrized such that all the parameters are identifiable.
- If the objective function is nonconvex, finding the global minimum is a challenging problem.
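To make the Gauss-Newton iteration concrete, here is a minimal sketch in Python/NumPy that fits the model of Example 1 on simulated data. This is an illustration under stated assumptions, not code from the slides; the names (`gauss_newton`, `f`, `jac`, the simulation settings) are all hypothetical.

```python
import numpy as np

def gauss_newton(f, jac, y, beta0, tol=1e-8, max_iter=100):
    """Gauss-Newton iteration for nonlinear least squares.

    f(beta)   -> n-vector of fitted values f(X; beta)
    jac(beta) -> n x p matrix D_beta of partial derivatives
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        D = jac(beta)                    # D_beta^(r), n x p
        resid = y - f(beta)              # Y - f(X; beta^(r))
        # Solve min ||D step - resid||^2, i.e. step = (D^T D)^{-1} D^T resid
        step, *_ = np.linalg.lstsq(D, resid, rcond=None)
        beta = beta + step               # beta^(r+1)
        if np.linalg.norm(step) < tol:   # ||beta^(r+1) - beta^(r)|| < delta
            break
    return beta

# Example 1: Y_i = exp(beta_1 + beta_2 X_i^2) + eps_i (simulated data)
rng = np.random.default_rng(0)
n, beta_true = 200, np.array([0.5, -1.0])
X = rng.uniform(-2, 2, size=n)
y = np.exp(beta_true[0] + beta_true[1] * X**2) + rng.normal(0, 0.05, size=n)

f = lambda b: np.exp(b[0] + b[1] * X**2)
# D_beta = (exp(b1 + b2 X^2), X^2 exp(b1 + b2 X^2)), an n x 2 matrix
jac = lambda b: np.column_stack((f(b), X**2 * f(b)))

beta_hat = gauss_newton(f, jac, y, beta0=np.zeros(2))
print(beta_hat)  # should be close to beta_true
```

Solving the linearized subproblem with `lstsq` rather than forming $(D_\beta^{(r)T} D_\beta^{(r)})^{-1}$ explicitly is a standard numerical choice; the update is mathematically the same as the one in Step 2.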
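The identifiability problem in Example 2 can also be verified numerically. The following brief check (with hypothetical parameter values chosen for illustration) confirms that $(\beta_1, \beta_2)$ and $(\beta_1 + 1, \beta_2)$ produce identical fitted values, so $\beta_1$ cannot be recovered from the data without a range restriction.

```python
import numpy as np

x = np.linspace(-2, 2, 5)
f = lambda b1, b2: np.sin(2 * np.pi * (b1 + b2 * x**2))

# Shifting beta_1 by an integer leaves the regression function unchanged.
print(np.allclose(f(0.3, 1.5), f(1.3, 1.5)))  # True
```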
Large sample inference based on likelihood

Under some regularity conditions, we have the following.

- Asymptotic normality of $\hat\beta$:
  $$\{\sigma^2 (D_\beta^T D_\beta)^{-1}\}^{-1/2} (\hat\beta - \beta) \stackrel{d}{\to} N_p(0, I_p).$$
- The MSE of the nonlinear regression is defined as
  $$\text{MSE} = \frac{1}{n - p} \sum_{i=1}^n \{Y_i - f(X_i; \hat\beta)\}^2.$$
  It can be shown that $\text{MSE} \stackrel{p}{\to} \sigma^2$.
- Note that $(D_\beta^T D_\beta)^{-1}$ can be estimated by $(D_{\hat\beta}^T D_{\hat\beta})^{-1}$.
- For any smooth (differentiable) function $h(b): \mathbb{R}^p \to \mathbb{R}^q$,
  $$h(\hat\beta) \sim N_q\left(h(\beta),\ \sigma^2 G (D_\beta^T D_\beta)^{-1} G^T\right)$$
  approximately, where $G = \left(\left.\partial h_i(b)/\partial b_j\right|_{b = \beta}\right)_{ij}$ is a $q \times p$ matrix, which can be estimated by $\hat G = \left(\left.\partial h_i(b)/\partial b_j\right|_{b = \hat\beta}\right)_{ij}$.

Example 1 continued

Consider the following nonlinear regression model:
$$Y_i = \exp(\beta_1 + \beta_2 X_i^2) + \varepsilon_i.$$
The least squares estimator $\hat\beta$ is asymptotically normal,
$$\hat\beta - \beta \sim N(0,\ \sigma^2 (D_\beta^T D_\beta)^{-1})$$
approximately, where $D_\beta = \left(\exp(\beta_1 + \beta_2 X^2),\ X^2 \exp(\beta_1 + \beta_2 X^2)\right)$ is an $n \times 2$ matrix.
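Continuing the illustrative Python sketch from the Gauss-Newton section (same hypothetical names; the code below assumes `beta_hat`, `f`, `jac`, `y`, and `n` from that example), the large-sample results translate into standard errors and a delta-method variance as follows.

```python
import numpy as np

D_hat = jac(beta_hat)                            # D_{beta_hat}, n x p
p = D_hat.shape[1]
mse = np.sum((y - f(beta_hat))**2) / (n - p)     # consistent for sigma^2
cov_hat = mse * np.linalg.inv(D_hat.T @ D_hat)   # est. Cov(beta_hat)
se = np.sqrt(np.diag(cov_hat))

# Approximate 95% confidence intervals for beta_j (z_{0.975} ~= 1.96)
ci = np.column_stack((beta_hat - 1.96 * se, beta_hat + 1.96 * se))

# Delta method for h(beta) = exp(beta_1 + beta_2), the regression
# function at x = 1, a smooth R^p -> R map:
h_hat = np.exp(beta_hat[0] + beta_hat[1])
G = h_hat * np.ones((1, 2))        # dh/db1 = dh/db2 = exp(b1 + b2)
var_h = (G @ cov_hat @ G.T).item() # Var{h(beta_hat)} ~ G cov G^T
print(ci, h_hat, np.sqrt(var_h))
```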