Nonlinear Models

Inference based on likelihood

- Recall the following nonlinear regression model,
  $$Y_i = f(X_i; \beta) + \varepsilon_i, \quad i = 1, \dots, n,$$
  where the $\varepsilon_i$'s are random errors with mean $0$ and variance $\sigma^2$, and $f(x; \beta)$ is a known function of $x$ up to a $p$-dimensional unknown parameter vector $\beta$.
- Based on Wald-type inference, we have, approximately,
  $$\hat\beta - \beta \sim N\bigl(0, \sigma^2 (D_\beta^T D_\beta)^{-1}\bigr),$$
  where $D_\beta = \bigl(\partial f(X_i; \beta)/\partial \beta_j\bigr)_{ij}$ is an $n \times p$ matrix.

Inference for a single coefficient

- Based on the asymptotic normality, we have
  $$\frac{\hat\beta_j - \beta_j}{\sigma \sqrt{\eta_j}} \sim N(0, 1),$$
  where $\eta_j$ is the $j$-th diagonal entry of $(D_\beta^T D_\beta)^{-1}$.
- Using Slutsky's theorem, we have
  $$\frac{\hat\beta_j - \beta_j}{\sqrt{\mathrm{MSE}\,\hat\eta_j}} \sim N(0, 1),$$
  where $\mathrm{MSE} = \frac{1}{n-p} \sum_{i=1}^n \{Y_i - f(X_i; \hat\beta)\}^2$ and $\hat\eta_j$ is the $j$-th diagonal entry of $(D_{\hat\beta}^T D_{\hat\beta})^{-1}$.
- A $1 - \alpha$ confidence interval for $\beta_j$ is
  $$\hat\beta_j \pm z_{\alpha/2} \sqrt{\mathrm{MSE}\,\hat\eta_j}.$$

Inference for a univariate function of $\beta$

- Assume $h(\beta)$ is a function of $\beta$, $h : \mathbb{R}^p \to \mathbb{R}$, and consider inference for $h(\beta)$. For example, $h(\beta) = f(x; \beta)$ for a given $x$.
- According to the delta method, we have
  $$\frac{h(\hat\beta) - h(\beta)}{\sqrt{\mathrm{MSE}\,\hat G (D_{\hat\beta}^T D_{\hat\beta})^{-1} \hat G^T}} \sim N(0, 1),$$
  where $\hat G = \bigl(\partial h(\beta)/\partial \beta_j \big|_{\beta = \hat\beta}\bigr)$ is a $1 \times p$ matrix.
- A $1 - \alpha$ confidence interval for $h(\beta)$ is
  $$h(\hat\beta) \pm z_{\alpha/2} \sqrt{\mathrm{MSE}\,\hat G (D_{\hat\beta}^T D_{\hat\beta})^{-1} \hat G^T}.$$

Example 1 continued

- Assume $h(\beta) = f(x_0; \beta)$ with $x_0 = 0.7$ is of interest. The least squares estimate of $h(\beta)$ is $f(x_0; \hat\beta)$.
- A $1 - \alpha$ confidence interval for $h(\beta)$ is
  $$f(x_0; \hat\beta) \pm z_{\alpha/2} \sqrt{\mathrm{MSE}\,\hat G (D_{\hat\beta}^T D_{\hat\beta})^{-1} \hat G^T},$$
  where $\hat G = \bigl(\exp(\hat\beta_1 + \hat\beta_2 x_0^2),\; x_0^2 \exp(\hat\beta_1 + \hat\beta_2 x_0^2)\bigr)$ and $D_{\hat\beta}$ is the same as that defined in Example 1.

Prediction for a new observation

- Suppose that we would like to predict a new observation $Y^*$, which is normally distributed with mean $h(\beta)$ and variance $\gamma \sigma^2$. Assume that $Y^*$ is independent of $Y$ (the observed data).
- An approximate $1 - \alpha$ prediction interval for $Y^*$ is
  $$h(\hat\beta) \pm z_{\alpha/2} \sqrt{\mathrm{MSE}\,\bigl\{\gamma + \hat G (D_{\hat\beta}^T D_{\hat\beta})^{-1} \hat G^T\bigr\}}.$$

Example 1 continued

- Assume we want to predict, and construct a prediction interval for, $Y^* = f(X_1; \beta) + \varepsilon_1^*$, where $\varepsilon_1^* \sim N(0, \sigma^2)$. The prediction of $Y^*$ is $f(X_1; \hat\beta)$.
- A $1 - \alpha$ prediction interval for $Y^*$ is
  $$f(X_1; \hat\beta) \pm z_{\alpha/2} \sqrt{\mathrm{MSE}\,\bigl\{1 + \hat G (D_{\hat\beta}^T D_{\hat\beta})^{-1} \hat G^T\bigr\}},$$
  where $\hat G = \bigl(\exp(\hat\beta_1 + \hat\beta_2 X_1^2),\; X_1^2 \exp(\hat\beta_1 + \hat\beta_2 X_1^2)\bigr)$ and $D_{\hat\beta}$ is the same as that defined in Example 1.
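To make the Wald-type recipe concrete, here is a minimal Python sketch. It assumes Example 1's model is $f(x; \beta) = \exp(\beta_1 + \beta_2 x^2)$, as the gradient $\hat G$ above suggests; the data `x`, `y`, the generating parameters, and the starting values are hypothetical stand-ins, not Example 1's actual data.

```python
import numpy as np
from scipy import optimize, stats

# Assumed model from Example 1: f(x; beta) = exp(beta_1 + beta_2 * x^2)
def f(x, beta):
    return np.exp(beta[0] + beta[1] * x**2)

def jacobian(x, beta):
    """n x p matrix D with entries d f(X_i; beta) / d beta_j."""
    fx = f(x, beta)
    return np.column_stack([fx, x**2 * fx])

# Hypothetical data (placeholders for Example 1's observations)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
y = f(x, np.array([0.5, 1.0])) + rng.normal(scale=0.1, size=x.size)

# Least squares fit of beta
fit = optimize.least_squares(lambda b: y - f(x, b), x0=np.zeros(2))
beta_hat = fit.x
n, p = x.size, beta_hat.size

D = jacobian(x, beta_hat)                          # D_{beta-hat}, n x p
mse = np.sum((y - f(x, beta_hat))**2) / (n - p)    # MSE = SSE / (n - p)
cov = mse * np.linalg.inv(D.T @ D)                 # MSE * (D^T D)^{-1}
z = stats.norm.ppf(1 - 0.05 / 2)                   # z_{alpha/2}, alpha = 0.05

# Wald CI for each beta_j: beta_hat_j +/- z * sqrt(MSE * eta_hat_j)
se = np.sqrt(np.diag(cov))
ci_beta = np.column_stack([beta_hat - z * se, beta_hat + z * se])

# Delta-method CI for h(beta) = f(x0; beta) at x0 = 0.7
x0 = 0.7
G = jacobian(np.array([x0]), beta_hat)             # G-hat, 1 x p
var_h = (G @ cov @ G.T).item()                     # MSE * G (D^T D)^{-1} G^T
ci_h = f(x0, beta_hat) + np.array([-z, z]) * np.sqrt(var_h)

# Prediction interval for a new Y* at x0 (gamma = 1): variance gains MSE
pi = f(x0, beta_hat) + np.array([-z, z]) * np.sqrt(mse + var_h)
print(ci_beta, ci_h, pi, sep="\n")
```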
Likelihood ratio based inference

- Assuming normally distributed errors, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, the likelihood function for $(\beta, \sigma^2)$ in the nonlinear model is
  $$L(\beta, \sigma^2) = (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\Bigl[-\frac{1}{2\sigma^2} \sum_{i=1}^n \{Y_i - f(X_i; \beta)\}^2\Bigr].$$
- The log-likelihood function of $(\beta, \sigma^2)$ is
  $$\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n \{Y_i - f(X_i; \beta)\}^2.$$

Inference for $\sigma^2$

- Consider the hypothesis testing problem $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$.
- The maximized log-likelihood under $H_0$ is
  $$\ell(\hat\beta, \sigma_0^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma_0^2) - \frac{\mathrm{SSE}}{2\sigma_0^2},$$
  where $\mathrm{SSE} = \sum_{i=1}^n \{Y_i - f(X_i; \hat\beta)\}^2$.
- Under $H_1$, the maximized log-likelihood is
  $$\ell(\hat\beta, \hat\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\mathrm{SSE}/n) - \frac{n}{2},$$
  with $\hat\sigma^2 = \mathrm{SSE}/n$.
- Inverting the likelihood ratio test, a $1 - \alpha$ confidence interval for $\sigma^2$ is
  $$\Bigl\{\sigma^2 : -2\bigl[\ell(\hat\beta, \sigma^2) - \ell(\hat\beta, \hat\sigma^2)\bigr] < \chi^2_{1,\alpha}\Bigr\}.$$
- More explicitly, the confidence interval for $\sigma^2$ is
  $$\Bigl\{\sigma^2 : -\log(\sigma^2) - \frac{\mathrm{SSE}}{n\sigma^2} > -\log(\mathrm{SSE}/n) - 1 - \frac{1}{n}\chi^2_{1,\alpha}\Bigr\}.$$

Inference for $\beta$

- Consider the hypothesis testing problem $H_0: \beta = \beta_0$. The maximized log-likelihood under $H_0$ is
  $$\ell\{\beta_0, \hat\sigma^2(\beta_0)\} = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\{\hat\sigma^2(\beta_0)\} - \frac{n}{2},$$
  where $\hat\sigma^2(\beta_0) = \frac{1}{n}\sum_{i=1}^n \{Y_i - f(X_i; \beta_0)\}^2$.
- Recall that, under the alternative, the maximized log-likelihood is
  $$\ell(\hat\beta, \hat\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\mathrm{SSE}/n) - \frac{n}{2}.$$
- A $1 - \alpha$ confidence region for $\beta$ is therefore
  $$\Bigl\{\beta : -2\bigl[\ell\{\beta, \hat\sigma^2(\beta)\} - \ell(\hat\beta, \hat\sigma^2)\bigr] < \chi^2_{p,\alpha}\Bigr\}
  = \Bigl\{\beta : \log\hat\sigma^2(\beta) < \log(\mathrm{SSE}/n) + \frac{1}{n}\chi^2_{p,\alpha}\Bigr\}
  = \Bigl\{\beta : \sum_{i=1}^n \{Y_i - f(X_i; \beta)\}^2 < \mathrm{SSE}\exp\Bigl(\frac{1}{n}\chi^2_{p,\alpha}\Bigr)\Bigr\}.$$
- Recall that, in the linear model, the exact confidence region for $\beta$ is
  $$\Bigl\{\beta : \sum_{i=1}^n \{Y_i - f(X_i; \beta)\}^2 < \mathrm{SSE}\Bigl(1 + \frac{p}{n-p} F_{p,n-p;\alpha}\Bigr)\Bigr\}.$$

Inference for a single $\beta_j$

- Let $\beta_{-j}$ be the sub-vector of $\beta$ with the $j$-th component deleted, and let $\hat\beta_{-j}(\beta_j)$ be the minimizer of $\sum_{i=1}^n \{Y_i - f(X_i; \beta_j, \beta_{-j})\}^2$, viewed as an objective function of $\beta_{-j}$ for fixed $\beta_j$.
- Similar to the last example, an approximate $1 - \alpha$ confidence interval for $\beta_j$ is
  $$\Bigl\{\beta_j : \sum_{i=1}^n \{Y_i - f(X_i; \beta_j, \hat\beta_{-j}(\beta_j))\}^2 < \mathrm{SSE}\exp\Bigl(\frac{1}{n}\chi^2_{1,\alpha}\Bigr)\Bigr\};$$
  this profile construction is sketched in code below.
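The display above translates directly into a grid-and-profile computation. The following is a minimal sketch of that idea, reusing the hypothetical model `f`, data `x`, `y`, and fit `beta_hat` from the earlier Wald sketch; the grid half-width of $1$ around $\hat\beta_j$ is an arbitrary choice and may need widening until both endpoints fall inside it.

```python
import numpy as np
from scipy import optimize, stats

# Profile confidence interval for a single coefficient, per the last display:
# fix beta_j, minimize the sum of squares over beta_{-j}, and keep the values
# of beta_j whose profiled SSE stays below SSE * exp(chi2_{1,alpha} / n).
# Reuses f, x, y, beta_hat from the earlier (hypothetical) Wald sketch.

def profile_sse(bj, j):
    """SSE minimized over beta_{-j}, with the j-th component fixed at bj."""
    def resid(b_minus):
        beta = np.insert(b_minus, j, bj)   # re-assemble the full vector
        return y - f(x, beta)
    start = np.delete(beta_hat, j)
    return np.sum(optimize.least_squares(resid, x0=start).fun ** 2)

n, alpha, j = x.size, 0.05, 1              # profile the second coefficient
sse = np.sum((y - f(x, beta_hat)) ** 2)
cutoff = sse * np.exp(stats.chi2.ppf(1 - alpha, df=1) / n)

# Scan a grid around beta_hat_j; the CI is the range of retained values.
grid = np.linspace(beta_hat[j] - 1.0, beta_hat[j] + 1.0, 201)
inside = np.array([profile_sse(b, j) for b in grid]) < cutoff
print(grid[inside].min(), grid[inside].max())
```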