
A Review of Some Key Linear Models Results

Copyright © 2016 Dan Nettleton (Iowa State University), Statistics 510

A General Linear Model (GLM)

Suppose $y = X\beta + \varepsilon$, where $y \in \mathbb{R}^n$ is the response vector, $X$ is an $n \times p$ matrix of known constants, $\beta \in \mathbb{R}^p$ is an unknown parameter vector, and $\varepsilon$ is a vector of random "errors" satisfying $E(\varepsilon) = 0$.


This GLM says simply that $y$ is a random vector satisfying $E(y) = X\beta$ for some $\beta \in \mathbb{R}^p$.

The distribution of y is left unspecified.

We know only that $y$ is random and its mean is in the column space of $X$; i.e., $E(y) \in \mathcal{C}(X)$.


Ordinary Least Squares (OLS) Estimation of $E(y)$

Because we know $E(y) \in \mathcal{C}(X)$, a natural estimator of $E(y)$ is the Ordinary Least Squares Estimator (OLSE), the unique point in $\mathcal{C}(X)$ that is closest to $y$ in Euclidean distance.

The OLSE of $E(y)$ is given by $\hat{y} \equiv P_X y$, where $P_X = X(X'X)^- X'$, because $P_X y \in \mathcal{C}(X)$ and
$$\|y - P_X y\|^2 < \|y - z\|^2 \quad \forall\, z \in \mathcal{C}(X) \setminus \{P_X y\}.$$
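As a quick numerical sketch of this definition (the design matrix, response values, and use of `np.linalg.pinv` as one particular generalized inverse are all illustrative assumptions, not part of the slides):

```python
import numpy as np

# Toy design matrix (n = 5, p = 2) and response; values are arbitrary.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# P_X = X (X'X)^- X'; np.linalg.pinv supplies the Moore-Penrose
# generalized inverse, which is one valid choice of (X'X)^-.
P_X = X @ np.linalg.pinv(X.T @ X) @ X.T

y_hat = P_X @ y  # OLSE of E(y)

# Cross-check against NumPy's least-squares solver: X b minimizes ||y - X b||.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(y_hat, X @ b)
print(y_hat)
```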


The Orthogonal Projection Matrix

$P_X = X(X'X)^- X'$ is known as the orthogonal projection matrix (a.k.a. the perpendicular projection operator) onto the column space of $X$ and has the following properties (checked numerically in the sketch after this list):

- $P_X$ is symmetric (i.e., $P_X = P_X'$).
- $P_X$ is idempotent (i.e., $P_X P_X = P_X$).
- $P_X X = X$ and $X' P_X = X'$.
- $\mathrm{rank}(X) = \mathrm{rank}(P_X) = \mathrm{tr}(P_X)$.
- $P_X = X(X'X)^- X'$ is the same matrix for all generalized inverses $(X'X)^-$ of $X'X$.
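A minimal numerical check of these properties, under illustrative assumptions: the design matrix below is made up and deliberately rank-deficient (so the generalized inverse matters), and `np.linalg.pinv` again stands in for one choice of $(X'X)^-$:

```python
import numpy as np

# Rank-deficient toy design: third column = first + second, so rank(X) = 2 < p = 3.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 1.0, 2.0]])

P_X = X @ np.linalg.pinv(X.T @ X) @ X.T

assert np.allclose(P_X, P_X.T)          # symmetric
assert np.allclose(P_X @ P_X, P_X)      # idempotent
assert np.allclose(P_X @ X, X)          # P_X X = X
assert np.allclose(X.T @ P_X, X.T)      # X' P_X = X'
# rank(X) = rank(P_X) = tr(P_X)
assert np.linalg.matrix_rank(X) == np.linalg.matrix_rank(P_X) == round(np.trace(P_X))
```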


The OLSE of Estimable Functions of $\beta$

For any $q \times n$ matrix $A$, the OLSE of $A\,E(y) = AX\beta$ is $A P_X y = AX(X'X)^- X' y$.

If there exists a matrix $A$ such that $C = AX$, then $C\beta = AX\beta$ is estimable, and the OLSE of $C\beta = AX\beta$ is denoted $C\hat{\beta}$, where $\hat{\beta}$ is any solution for $b$ in the Normal Equations $X'Xb = X'y$.


Solutions to the Normal Equations

The Normal Equations $X'Xb = X'y$ have $(X'X)^{-1}X'y$ as the unique solution for $b$ if $\mathrm{rank}(X) = p$.

The Normal Equations have infinitely many solutions for $b$ if $\mathrm{rank}(X) < p$.

$\hat{\beta} = (X'X)^- X' y$ is always a solution to the Normal Equations for any $(X'X)^-$, a generalized inverse of $X'X$.


Uniqueness of the OLSE of an Estimable $C\beta$

If $C\beta$ is estimable, then $C\hat{\beta}$ is the same for all solutions $\hat{\beta}$ to the Normal Equations.

In particular, the unique OLSE of $C\beta$ is
$$C\hat{\beta} = C(X'X)^- X' y = AX(X'X)^- X' y = A P_X y, \quad \text{where } C = AX.$$
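This invariance can be illustrated numerically. In the sketch below (the matrices `X`, `y`, and `A` are made-up assumptions), a second solution to the Normal Equations is built by adding a null-space vector of $X$ to the first, and both solutions give the same $C\hat{\beta}$:

```python
import numpy as np
from scipy.linalg import null_space

# Rank-deficient design (third column = first + second), so the
# Normal Equations have infinitely many solutions.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 1.0, 2.0]])
y = np.array([1.0, 1.2, 2.9, 3.1])

b1 = np.linalg.pinv(X.T @ X) @ X.T @ y   # one solution
b2 = b1 + 3.7 * null_space(X)[:, 0]      # another: add a null-space vector of X

# Both solve the Normal Equations X'Xb = X'y ...
assert np.allclose(X.T @ X @ b1, X.T @ y)
assert np.allclose(X.T @ X @ b2, X.T @ y)

# ... and any estimable C beta (C = AX for some A) has the same OLSE.
A = np.array([[1.0, 0.0, -1.0, 0.0]])    # arbitrary 1 x n matrix
C = A @ X
assert np.allclose(C @ b1, C @ b2)
```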


The Gauss-Markov Model (GMM)

Suppose $y = X\beta + \varepsilon$, where $y \in \mathbb{R}^n$ is the response vector, $X$ is an $n \times p$ matrix of known constants, $\beta \in \mathbb{R}^p$ is an unknown parameter vector, and $\varepsilon$ is a vector of random "errors" satisfying $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I$ for some unknown variance parameter $\sigma^2 \in \mathbb{R}^+$.


The Gauss-Markov Theorem

The GMM is a special case of the GLM presented previously.

We have added the assumption $\mathrm{Var}(\varepsilon) = \sigma^2 I$; i.e., we assume the errors are uncorrelated and have constant variance.

All the results presented for the GLM hold for the GMM.

For the GMM we have an additional result provided by the Gauss-Markov Theorem: the OLSE of an estimable function $C\beta$ is the Best Linear Unbiased Estimator (BLUE) of $C\beta$, in the sense that the OLSE $C\hat{\beta}$ has the smallest variance among all linear unbiased estimators of $C\beta$.
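A small simulation sketch of the BLUE property (all numbers are illustrative assumptions; the competitor is the "endpoint slope" $(y_n - y_1)/(x_n - x_1)$, another linear unbiased estimator of the slope, chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
beta = np.array([1.0, 2.0])   # true intercept and slope

ols, endpoint = [], []
for _ in range(20000):
    y = X @ beta + rng.normal(size=x.size)
    b = np.linalg.pinv(X.T @ X) @ X.T @ y
    ols.append(b[1])                                  # OLSE of the slope
    endpoint.append((y[-1] - y[0]) / (x[-1] - x[0]))  # competing linear unbiased estimator

# Both are unbiased for the slope, but the OLSE has the smaller variance.
print(np.mean(ols), np.mean(endpoint))   # both near 2.0
print(np.var(ols), np.var(endpoint))     # ~0.1 for OLS vs ~0.125 for the endpoint slope
```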


Unbiased Estimation of $\sigma^2$

An unbiased estimator of $\sigma^2$ under the GMM is given by
$$\hat{\sigma}^2 \equiv \frac{y'(I - P_X)y}{n - r}, \quad \text{where } r = \mathrm{rank}(X).$$

Note that
$$y'(I - P_X)y = y'(I - P_X)'(I - P_X)y = \{(I - P_X)y\}'\{(I - P_X)y\} = \|(I - P_X)y\|^2 = \|y - \hat{y}\|^2 = \text{"Sum of Squared Errors" (SSE)}.$$
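A minimal numerical sketch of this estimator (same illustrative toy data as before; `np.linalg.matrix_rank` supplies $r$):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

n, _ = X.shape
r = np.linalg.matrix_rank(X)
P_X = X @ np.linalg.pinv(X.T @ X) @ X.T

resid = (np.eye(n) - P_X) @ y   # (I - P_X) y = y - y_hat
sse = resid @ resid             # ||(I - P_X) y||^2, the SSE
sigma2_hat = sse / (n - r)      # unbiased estimator of sigma^2
print(sigma2_hat)
```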


Gauss-Markov Model with Normal Errors (GMMNE)

Suppose $y = X\beta + \varepsilon$, where $y \in \mathbb{R}^n$ is the response vector, $X$ is an $n \times p$ matrix of known constants, $\beta \in \mathbb{R}^p$ is an unknown parameter vector, and $\varepsilon \sim N(0, \sigma^2 I)$ for some unknown variance parameter $\sigma^2 \in \mathbb{R}^+$.


The GMMNE is a special case of the GMM.

We have added the assumption that $\varepsilon$ is multivariate normal.

The GMMNE implies $y \sim N(X\beta, \sigma^2 I)$.

The GMMNE is useful for drawing statistical inferences regarding estimable $C\beta$.


Throughout the remainder of these slides, we will assume the GMMNE holds, $C$ is a $q \times p$ matrix such that $C\beta$ is estimable, $\mathrm{rank}(C) = q$, and $d$ is a known $q \times 1$ vector.

These assumptions imply that $H_0: C\beta = d$ is a testable hypothesis.


The Distribution of $C\hat{\beta}$ and $\hat{\sigma}^2$

- $C\hat{\beta} \sim N(C\beta,\; \sigma^2 C(X'X)^- C')$.
- $\dfrac{(n-r)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-r} \iff \dfrac{\hat{\sigma}^2}{\sigma^2} \sim \dfrac{\chi^2_{n-r}}{n-r} \iff \hat{\sigma}^2 \sim \dfrac{\sigma^2}{n-r}\,\chi^2_{n-r}$.
- $C\hat{\beta}$ and $\hat{\sigma}^2$ are independent.
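These distributional facts can be checked by simulation. In the sketch below (illustrative assumptions about $X$, $\beta$, and $\sigma^2$), the simulated mean and variance of $(n-r)\hat{\sigma}^2/\sigma^2$ should land near the $\chi^2_{n-r}$ moments $n-r$ and $2(n-r)$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
beta, sigma2 = np.array([1.0, 1.0]), 2.0
n, r = X.shape[0], np.linalg.matrix_rank(X)
P_X = X @ np.linalg.pinv(X.T @ X) @ X.T

draws = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    sigma2_hat = y @ (np.eye(n) - P_X) @ y / (n - r)
    draws.append((n - r) * sigma2_hat / sigma2)

draws = np.array(draws)
print(draws.mean(), draws.var())   # should be close to n - r = 3 and 2(n - r) = 6
```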


The F Test of $H_0: C\beta = d$

The test statistic
$$F \equiv (C\hat{\beta} - d)'\,[\widehat{\mathrm{Var}}(C\hat{\beta})]^{-1}\,(C\hat{\beta} - d)/q = (C\hat{\beta} - d)'\,[\hat{\sigma}^2\, C(X'X)^- C']^{-1}\,(C\hat{\beta} - d)/q = \frac{(C\hat{\beta} - d)'\,[C(X'X)^- C']^{-1}\,(C\hat{\beta} - d)/q}{\hat{\sigma}^2}$$
has a non-central F distribution with non-centrality parameter
$$\frac{(C\beta - d)'\,[C(X'X)^- C']^{-1}\,(C\beta - d)}{2\sigma^2}$$
and degrees of freedom $q$ and $n - r$.


The F Test of $H_0: C\beta = d$ (continued)

The non-negative non-centrality parameter
$$\frac{(C\beta - d)'\,[C(X'X)^- C']^{-1}\,(C\beta - d)}{2\sigma^2}$$
is equal to zero if and only if $H_0: C\beta = d$ is true.

If $H_0: C\beta = d$ is true, the statistic $F$ has a central F distribution with $q$ and $n - r$ degrees of freedom ($F_{q,n-r}$).


The F Test of $H_0: C\beta = d$ (continued)

Thus, to test $H_0: C\beta = d$, we compute the test statistic $F$ and compare the observed value of $F$ to the $F_{q,n-r}$ distribution.

If $F$ is so large that it seems unlikely to have been a draw from the $F_{q,n-r}$ distribution, we reject $H_0$ and conclude $C\beta \neq d$.

The p-value of the test is the probability that a random variable with an $F_{q,n-r}$ distribution would be greater than or equal to the observed value of the test statistic $F$.
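A minimal end-to-end sketch of the F test (the data, $C$, and $d$ are illustrative assumptions; `scipy.stats.f.sf` supplies the upper-tail probability used as the p-value):

```python
import numpy as np
from scipy import stats

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

n, r = X.shape[0], np.linalg.matrix_rank(X)
XtX_ginv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_ginv @ X.T @ y
sigma2_hat = y @ (np.eye(n) - X @ XtX_ginv @ X.T) @ y / (n - r)

# Illustrative testable hypothesis H0: C beta = d (here, slope = 1).
C = np.array([[0.0, 1.0]])
d = np.array([1.0])
q = C.shape[0]

diff = C @ beta_hat - d
F = diff @ np.linalg.inv(C @ XtX_ginv @ C.T) @ diff / q / sigma2_hat
p_value = stats.f.sf(F, q, n - r)   # P(F_{q, n-r} >= observed F)
print(F, p_value)
```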


The t Test of $H_0: c'\beta = d$ for Estimable $c'\beta$

The test statistic
$$t \equiv \frac{c'\hat{\beta} - d}{\sqrt{\widehat{\mathrm{Var}}(c'\hat{\beta})}} = \frac{c'\hat{\beta} - d}{\sqrt{\hat{\sigma}^2\, c'(X'X)^- c}}$$
has a non-central t distribution with non-centrality parameter
$$\frac{c'\beta - d}{\sqrt{\sigma^2\, c'(X'X)^- c}}$$
and $n - r$ degrees of freedom.


The t Test (continued)

The non-centrality parameter
$$\frac{c'\beta - d}{\sqrt{\sigma^2\, c'(X'X)^- c}}$$
is equal to zero if and only if $H_0: c'\beta = d$ is true.

If $H_0: c'\beta = d$ is true, the statistic $t$ has a central t distribution with $n - r$ degrees of freedom ($t_{n-r}$).


The t Test (continued)

Thus, to test $H_0: c'\beta = d$, we compute the test statistic $t$ and compare the observed value of $t$ to the $t_{n-r}$ distribution.

If $t$ is so far from zero that it seems unlikely to have been a draw from the $t_{n-r}$ distribution, we reject $H_0$ and conclude $c'\beta \neq d$.

The p-value of the test is the probability that a random variable with a $t_{n-r}$ distribution would be as far from 0 as, or farther than, the observed value of the t statistic.
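A matching sketch of the t test (again with illustrative data, $c$, and $d$; the two-sided p-value doubles the upper-tail probability from `scipy.stats.t.sf`):

```python
import numpy as np
from scipy import stats

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

n, r = X.shape[0], np.linalg.matrix_rank(X)
XtX_ginv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_ginv @ X.T @ y
sigma2_hat = y @ (np.eye(n) - X @ XtX_ginv @ X.T) @ y / (n - r)

# Illustrative H0: c'beta = d (here, slope = 1).
c = np.array([0.0, 1.0])
d = 1.0

se = np.sqrt(sigma2_hat * c @ XtX_ginv @ c)   # standard error of c'beta_hat
t = (c @ beta_hat - d) / se
p_value = 2 * stats.t.sf(abs(t), n - r)       # two-sided p-value
print(t, p_value)
```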


A $100(1 - \alpha)\%$ Confidence Interval for Estimable $c'\beta$

$$\left[\, c'\hat{\beta} - t_{n-r,\,1-\alpha/2}\sqrt{\hat{\sigma}^2\, c'(X'X)^- c}\;,\;\; c'\hat{\beta} + t_{n-r,\,1-\alpha/2}\sqrt{\hat{\sigma}^2\, c'(X'X)^- c}\,\right]$$

$$c'\hat{\beta} \pm t_{n-r,\,1-\alpha/2}\sqrt{\hat{\sigma}^2\, c'(X'X)^- c}$$

estimate $\pm$ (distribution quantile) $\times$ (standard error of estimate)
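A sketch of the interval on the same illustrative data, with the quantile taken from `scipy.stats.t.ppf`:

```python
import numpy as np
from scipy import stats

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

n, r = X.shape[0], np.linalg.matrix_rank(X)
XtX_ginv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_ginv @ X.T @ y
sigma2_hat = y @ (np.eye(n) - X @ XtX_ginv @ X.T) @ y / (n - r)

c = np.array([0.0, 1.0])      # illustrative estimable c'beta: the slope
alpha = 0.05
quantile = stats.t.ppf(1 - alpha / 2, n - r)
se = np.sqrt(sigma2_hat * c @ XtX_ginv @ c)

estimate = c @ beta_hat
print(estimate - quantile * se, estimate + quantile * se)  # 95% CI endpoints
```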


Form of the t Statistic for Testing $H_0: c'\beta = d$

$$t = \frac{\text{estimate} - d}{\text{standard error of estimate}} = \frac{\text{estimate} - d}{\sqrt{\widehat{\mathrm{Var}}(\text{estimator})}}$$

$$t^2 = \frac{(\text{estimate} - d)^2}{\widehat{\mathrm{Var}}(\text{estimator})} = (\text{estimate} - d)\left[\widehat{\mathrm{Var}}(\text{estimator})\right]^{-1}(\text{estimate} - d)/1$$
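For $q = 1$ the squared t statistic coincides with the F statistic; a minimal sketch confirming this numerically on the illustrative data:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

n, r = X.shape[0], np.linalg.matrix_rank(X)
XtX_ginv = np.linalg.pinv(X.T @ X)
beta_hat = XtX_ginv @ X.T @ y
sigma2_hat = y @ (np.eye(n) - X @ XtX_ginv @ X.T) @ y / (n - r)

c, d = np.array([0.0, 1.0]), 1.0
t = (c @ beta_hat - d) / np.sqrt(sigma2_hat * c @ XtX_ginv @ c)

C, dvec = c.reshape(1, -1), np.array([d])
diff = C @ beta_hat - dvec
F = (diff @ np.linalg.inv(C @ XtX_ginv @ C.T) @ diff) / sigma2_hat  # q = 1

assert np.isclose(t ** 2, F)   # t^2 equals the q = 1 form of F
```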


Revisiting the F Statistic for Testing $H_0: C\beta = d$

$$F = (\text{estimate} - d)'\left[\widehat{\mathrm{Var}}(\text{estimator})\right]^{-1}(\text{estimate} - d)/q$$
$$= (C\hat{\beta} - d)'\,[\widehat{\mathrm{Var}}(C\hat{\beta})]^{-1}\,(C\hat{\beta} - d)/q$$
$$= (C\hat{\beta} - d)'\,[\hat{\sigma}^2\, C(X'X)^- C']^{-1}\,(C\hat{\beta} - d)/q$$
$$= \frac{(C\hat{\beta} - d)'\,[C(X'X)^- C']^{-1}\,(C\hat{\beta} - d)/q}{\hat{\sigma}^2}$$
