MATH 2016 (13177)
Statistical Modelling
Course coordinator: Chris Brien
Course is about designing experiments
and using linear models to analyze data,
both from experiments and surveys.
I. Statistical inference
I.A Expected values and variances
I.B The linear regression model
I.C Model selection
  a) Obtaining parameter estimates
  b) Regression analysis of variance
I.D Summary
I.A Expected values and variances
• Statistical inference is about drawing conclusions
about one or more populations based on samples
from the populations.
• Compute statistics or estimates from samples.
• They are used as estimates of particular population
quantities, these being called parameters.
• It is important to be clear about the distinction: when one is talking about a mean, is it the population mean or the sample mean?
• To aid in making the distinction, convention is to
use Greek letters as symbols for parameters and
ordinary Roman letters as symbols for statistics.
• Fundamental in this course are population
expected value and variance.
Expected value
• Expected value = the mean of the variable Y in a population; it is a population parameter.
Definition I.1: The expected value of a continuous random variable Y whose population distribution is described by f(y) is given by
$$\mu_Y = E[Y] = \int_{-\infty}^{\infty} y\, f(y)\, dy$$
• That is, μ_Y = E[Y] is the mean in a population whose distribution is described by f(y).
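As an illustrative sketch (not part of the notes), this definition can be checked numerically in R for an assumed normal population, using dnorm() for f(y) and integrate() for the integral:

mu <- 3                                           # assumed population mean
sigma <- 2                                        # assumed population standard deviation
f <- function(y) dnorm(y, mean = mu, sd = sigma)  # density f(y)
integrate(function(y) y * f(y), -Inf, Inf)$value  # E[Y]; returns approximately 3 = mu_Y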
Properties of expected values
Theorem I.1: Let Y be a continuous random variable with probability distribution function f(y). The expected value of a function u(Y) of the random variable is
$$E[u(Y)] = \int_{-\infty}^{\infty} u(y)\, f(y)\, dy$$
Proof: not given.
• Note that any function of a random variable is itself a random variable.
• The above theorem is used in proving the next result.
• Theorem I.2: $E[a\,v(Y) + b] = a\,E[v(Y)] + b$
Proof of Theorem I.2
• For a continuous random variable, we have from Theorem I.1
$$\begin{aligned}
E[a\,v(Y) + b] &= \int_{-\infty}^{\infty} \{a\,v(y) + b\}\, f(y)\, dy \\
&= a\int_{-\infty}^{\infty} v(y)\, f(y)\, dy + \int_{-\infty}^{\infty} b\, f(y)\, dy \\
&= a\int_{-\infty}^{\infty} v(y)\, f(y)\, dy + b\int_{-\infty}^{\infty} f(y)\, dy \\
&= a\,E[v(Y)] + b,
\end{aligned}$$
since $\int_{-\infty}^{\infty} f(y)\, dy = 1$.
• In particular, $E[aY + b] = a\,E[Y] + b$.
Variance
Definition I.2: The variance of any random variable Y is defined to be
$$\mathrm{var}[Y] = \sigma_Y^2 = E\!\left[(Y - \mu_Y)^2\right] = E\!\left[(Y - E[Y])^2\right]$$
• That is, the variance is the mean, in the population, of the squares of the deviations of the observed values from the population mean.
• It measures how far on average observations are
from the mean in the population.
• It is also a population parameter.
Variance (cont’d)
Theorem I.3: The variance of a continuous random variable Y whose population distribution is described by f(y) is given by
$$\mathrm{var}[Y] = \sigma_Y^2 = \int_{-\infty}^{\infty} (y - \mu_Y)^2\, f(y)\, dy$$
Proof: This is a straightforward application of Theorem I.1 with $u(Y) = (Y - \mu_Y)^2$.
Normal distribution parameters
and estimators
• The normal distribution is common in this course.
• The distribution function for such a variable involves the parameters μ_Y and σ_Y² as follows:
$$f(y) = \frac{1}{\sqrt{2\pi\sigma_Y^2}}\, \exp\!\left\{-\frac{(y - \mu_Y)^2}{2\sigma_Y^2}\right\}$$
• So we want to estimate μ_Y and we have a sample y1, y2, …, yn.
• Note the lower case y for observed values as opposed to Y for the random variable.
• The obvious estimator of μ (dropping the subscript) is the sample mean
$$\bar{Y} = \sum_{i=1}^{n} Y_i \,\big/\, n$$
Estimators
• Note we call the formula that tells us how to
estimate a parameter an estimator and it is a
function of random variables, Ys.
• The value obtained by substituting the sample
values into the formula is called the estimate
and it is a function of observed values, ys.
• It is common practice to denote the estimator as
the parameter with a caret over it.
– $\hat{\mu} = \bar{Y}$ means that the estimator of μ is $\bar{Y}$.
– $\hat{\mu}$ also stands for the estimate, so that $\hat{\mu} = \bar{y}$ means that the estimate of μ is $\bar{y}$.
I.B The linear regression model
• We consider models of the general form:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
• where
  – Y is a continuous random variable and
  – the x's are quantitative variables that are called the explanatory variables.
• This model is a linear model in the β's.
• Some further examples:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$$
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$
$$Y = \beta_0 + \beta_1 \ln(x_1) + \varepsilon$$
$$Y = e^{\beta_0 x_1} + \varepsilon$$
$$Y = 1\big/\!\left(1 + e^{\beta_0 + \beta_1 x_1}\right) + \varepsilon$$
All but the last two are linear in the β's.
The model for n observations
• Would conduct a study in which n (≥ p+1) observations are taken of Y and the x's.
• This leads to the following system of equations that models the observed responses:
$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_p x_{1p} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_p x_{2p} + \varepsilon_2 \\
&\;\;\vdots \\
Y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_p x_{np} + \varepsilon_n
\end{aligned}$$
• What does the model tell us about our data?
  – We have a response variable, Y, whose values are related to several explanatory variables, the x's (lower case x as they are not random variables).
  – The ε_i's, the random errors, account for differences in values of the response variable for the same combination of values of the explanatory variables.
Usual extra assumptions about the ε_i's
$$E[\varepsilon_i] = 0, \quad \mathrm{var}[\varepsilon_i] = \sigma^2 \quad \text{and} \quad \mathrm{cov}[\varepsilon_i, \varepsilon_j] = 0, \; i \neq j$$
• These mean:
– on average the errors cancel out so that we get the
population value of the response,
– the variability of the errors is independent of the
values of any of the variables
– the error in one observation is unrelated to that of any
other observation.
• The last assumption involves a third quantity
involving expectations: covariance.
Covariance
Definition I.3: The covariance of two random variables, X and Y, is defined to be
$$\mathrm{cov}[X, Y] = E\big[(X - E[X])(Y - E[Y])\big]$$
• The covariance measures the extent to which the values of the two random variables move together.
• In fact, the linear correlation coefficient can be calculated from it as follows:
$$\mathrm{corr}[X, Y] = \frac{\mathrm{cov}[X, Y]}{\sqrt{\mathrm{var}[X]\,\mathrm{var}[Y]}}$$
• That is, the correlation coefficient is just the covariance adjusted, or standardized, for the variances of X and Y.
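As a quick numerical illustration (a sketch in R using invented values, not data from the notes), the covariance standardized by the variances agrees with R's built-in correlation:

x <- c(1.2, 3.4, 2.2, 5.1, 4.0)        # invented values
y <- c(2.0, 3.9, 2.5, 6.2, 4.1)
cov(x, y) / sqrt(var(x) * var(y))      # covariance standardized by the variances
cor(x, y)                              # the same value from the built-in correlation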
Matrix notation for the system of
equations
• Matrices are written as bolded upper case letters and
• vectors as bolded lower case, except that vectors of random variables will be in upper case.
• Thus, in matrix terms, let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_i \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{i1} & x_{i2} & \cdots & x_{ip} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}, \quad
\theta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
• The system of equations can then be written
$$Y = X\theta + \varepsilon \quad \text{with} \quad E[\varepsilon] = 0 \quad \text{and} \quad \mathrm{var}[\varepsilon] = V = \sigma^2 I_n,$$
where I_n is the n × n identity matrix.
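A small simulation sketch in R of the matrix form; the sample size, explanatory variables and parameter values are invented for illustration:

set.seed(1)
n  <- 10
x1 <- runif(n, 0, 10); x2 <- runif(n, 0, 5)   # invented explanatory variables
X  <- cbind(1, x1, x2)                        # n x (p+1) model matrix
theta <- c(2, 0.5, -1)                        # invented (beta0, beta1, beta2)
eps   <- rnorm(n, mean = 0, sd = 1)           # errors: mean 0, variance sigma^2
Y <- X %*% theta + eps                        # Y = X theta + epsilon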
Expectation and variance of a
random vector
Definition I.4: Let Y be a vector of n jointly distributed random variables with
$$E[Y_i] = \psi_i, \quad \mathrm{var}[Y_i] = \sigma_i^2 \quad \text{and} \quad \mathrm{cov}[Y_i, Y_j] = \sigma_{ij} \; (= \sigma_{ji}).$$
• Then the random vector is
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$
• The expectation vector, ψ, giving the expectation of Y, is
$$E[Y] = \begin{bmatrix} E[Y_1] \\ E[Y_2] \\ \vdots \\ E[Y_n] \end{bmatrix}
= \begin{bmatrix} \psi_1 \\ \psi_2 \\ \vdots \\ \psi_n \end{bmatrix} = \psi$$
Expectation and variance of a
random vector (cont’d)
• The variance matrix, V, giving the variance of Y, is
$$V = E\big[(Y - E[Y])(Y - E[Y])'\big] =
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1i} & \cdots & \sigma_{1n} \\
\sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2i} & \cdots & \sigma_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
\sigma_{1i} & \sigma_{2i} & \cdots & \sigma_i^2 & \cdots & \sigma_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
\sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{in} & \cdots & \sigma_n^2
\end{bmatrix}$$
• Note the transpose in the last expression.
Lemma I.1: The transpose of a
matrix (selected properties)
2. The transpose of a column vector is a row vector and vice versa, so that we always write the column vector as untransposed and the row vector as transposed: a is a column vector and a′ is the corresponding row vector.
4. The transpose of a product is the product of the transposes, but with the order of the matrices reversed: (AB)′ = B′A′.
9. A column vector premultiplied by its transpose is the sum of squares of its elements, also a scalar: $a'a = \sum_{i=1}^{n} a_i^2$.
10. A column vector of order n postmultiplied by its transpose is a symmetric matrix of order n × n: from property 7 we have (aa′)′ = aa′.
• In particular, property 10 applies to V in Definition I.4 and tells us that V is an n × n symmetric matrix.
Model for expectation & variance
• Have a model for Y with conditions on ε.
• Find expressions for the elements of E[Y] and var[Y].
• Thus,
$$\begin{aligned}
E[Y_i] &= E\big[\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i\big] \\
&= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + E[\varepsilon_i] \\
&= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}
\end{aligned}$$
$$\begin{aligned}
\mathrm{var}[Y_i] &= E\big[(Y_i - E[Y_i])^2\big] \\
&= E\Big[\big\{(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i) - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})\big\}^2\Big] \\
&= E[\varepsilon_i^2] \\
&= \sigma^2, \quad \text{since } \mathrm{var}[\varepsilon_i] = E\big[(\varepsilon_i - E[\varepsilon_i])^2\big] = E[\varepsilon_i^2]
\end{aligned}$$
Model in terms of expectation
and variance (cont’d)
$$\begin{aligned}
\mathrm{cov}[Y_i, Y_j] &= E\big[(Y_i - E[Y_i])(Y_j - E[Y_j])\big] \\
&= E[\varepsilon_i \varepsilon_j] \\
&= \mathrm{cov}[\varepsilon_i, \varepsilon_j] \\
&= 0
\end{aligned}$$
• In matrix terms, the alternative expression for the model is:
$$E[Y] = X\theta \quad \text{and} \quad \mathrm{var}[Y] = V_Y = \sigma^2 I_n$$
• That is, V is also the variance matrix for Y.
Example I.1 House price
• Suppose it is thought that the price obtained for a house depends primarily on its age and livable area.
• Observe 5 randomly selected houses on the market:

  Price ($'000)   Age (years)   Area ('000 feet²)
      (y)            (x1)            (x2)
      50              1               1
      40              5               1
      52              5               2
      47             10               2
      65             20               3
• In this example, n = 5 and p = 2.
Model proposed for data
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i
\quad \text{with } E[\varepsilon_i] = 0,\; \mathrm{var}[\varepsilon_i] = \sigma^2 \text{ and } \mathrm{cov}[\varepsilon_i, \varepsilon_j] = 0,\; i \neq j$$
• or, equivalently,
$$E[Y_i] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}
\quad \text{with } \mathrm{var}[Y_i] = \sigma^2 \text{ and } \mathrm{cov}[Y_i, Y_j] = 0,\; i \neq j$$
• In matrix terms, the model is:
$$Y = X\theta + \varepsilon \quad \text{with } E[\varepsilon] = 0 \text{ and } \mathrm{var}[\varepsilon] = V = \sigma^2 I_n,$$
• or, equivalently,
$$E[Y] = X\theta \quad \text{and} \quad \mathrm{var}[Y] = V = \sigma^2 I_n$$
Model matrices for example
Y  Xθ  ε with E ε   0 and var ε   V   2In,
Y1 
 1 
1 1 1
Y2 
 2 
0 
1 5 1
Y  Y3  , X  1 5 2 , θ  1  , ε   3 
Y4 
 4 
2 
1 10 2
Y 
 
1 20 3 
5
5
 2 0
0
0
0
 0 2 0
0
0

2
V0
0 
0
0

2
0
0
0

0

 0
0
0
0  2 
50
• We also have the vector, y, of observed
values of Y:
Statistical Modelling Chapter I
 
 40 
y   52 
 47 
 65 
Example I.2 Voter turnout
• In this example a political scientist attempted to
investigate the relationship between campaign
expenditures on televised advertisements and
subsequent voter turnout.
• Aim to predict voter turnout from advertising
expenditure.
Voter Turnout   % Advert Expenditure     Voter Turnout   % Advert Expenditure
    35.4              28.5                    40.8              31.3
    58.2              48.3                    61.9              50.1
    46.1              40.2                    36.5              31.3
    45.5              34.8                    32.7              24.8
    64.8              50.1                    53.8              42.2
    52.0              44.0                    24.6              23.0
    37.9              27.2                    31.2              30.1
    48.2              37.8                    42.6              36.5
    41.8              27.2                    49.6              40.2
    54.0              46.1                    56.6              46.1
Proposed model
• Simple linear regression as there is only 1 explanatory variable.
• Drop the subscript for the independent variable:
$$E[Y_i] = \beta_0 + \beta_1 x_i
\quad \text{with } \mathrm{var}[Y_i] = \sigma^2 \text{ and } \mathrm{cov}[Y_i, Y_j] = 0,\; i \neq j$$
• How should data behave for this model?
[Figure: the population line E[Y] = β0 + β1x plotted against X, with the values β0 + β1x(1) and β0 + β1x(2) marked at x(1) and x(2) and the distribution of Y about the line.]
• E[Yi] specifies the population mean.
• var[Yi] specifies the variability around the population mean.
• cov[Yi, Yj] specifies the relationship between observations.
Scatter diagram for Turnout
versus Expend
[Figure: scatter diagram of Voter turnout (25 to 65) against Advertising expenditure (20 to 55 %).]
• Does it look like the model will describe this
situation?
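A sketch of how this scatter diagram and a fitted line might be produced in R (the vector names turnout and expend are invented; the values are those tabled above):

turnout <- c(35.4, 58.2, 46.1, 45.5, 64.8, 52.0, 37.9, 48.2, 41.8, 54.0,
             40.8, 61.9, 36.5, 32.7, 53.8, 24.6, 31.2, 42.6, 49.6, 56.6)
expend  <- c(28.5, 48.3, 40.2, 34.8, 50.1, 44.0, 27.2, 37.8, 27.2, 46.1,
             31.3, 50.1, 31.3, 24.8, 42.2, 23.0, 30.1, 36.5, 40.2, 46.1)
plot(expend, turnout, xlab = "Advertising expenditure (%)", ylab = "Voter turnout")
abline(lm(turnout ~ expend))    # add the fitted simple linear regression line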
I.C Model selection
• Generally, we want to determine the model
that best describes the data.
• To do this we usually obtain estimates of
our parameters under several alternative
models and use these in deciding which
model to use to describe the data.
• The choice of models is often made using
an analysis of variance (ANOVA).
a) Obtaining parameter estimates
• Estimators of the parameters θ in the expectation model are obtained using the least squares or maximum likelihood criteria; the two are equivalent in the context of linear models.
• Also, an estimator of σ² is obtained from the ANOVA described in the next section.
• Here we will establish the least squares estimators of θ.
Least squares estimators
• Definition I.5: Let Y = Xθ + ε where
  – X is an n × q matrix with n ≥ q,
  – θ is a q × 1 vector of unknown parameters,
  – ε is an n × 1 vector of errors with mean 0 and variance σ²I_n, with q ≤ p + 1 and n ≥ q.
The ordinary least squares (OLS) estimator of θ is the value of θ that minimizes
$$\varepsilon'\varepsilon = \sum_{i=1}^{n} \varepsilon_i^2$$
• Note that
  – ε′ε is of the form described in property 9 of Lemma I.1
  – and is a scalar that is the sum of squares of the elements of ε, or the sum of squares of the "errors".
Least squares estimators of 
• Theorem I.4: Let Y = Xθ + ε where
  – Y is an n × 1 vector of random variables for the observations,
  – X is an n × q matrix of full rank with n ≥ q,
  – θ is a q × 1 vector of unknown parameters,
  – ε is an n × 1 vector of errors with mean 0 and variance σ²I_n, with q ≤ p + 1 and n ≥ q.
The ordinary least squares estimator of θ is given by
$$\hat{\theta} = (X'X)^{-1} X'Y$$
• (The '^' denotes an estimator.)
• Proof: see notes.
Least squares estimates of 
• For a particular example, we will have an observed vector y; substitute this into the estimator to yield the estimate for that example:
$$\hat{\theta} = (X'X)^{-1} X'y$$
• Note the dual use of θ̂ to denote both the estimator and the estimate.
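A sketch of this computation in R for the house-price data of Example I.1 (the object names are arbitrary):

y <- c(50, 40, 52, 47, 65)
X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))   # columns: intercept, age, area
theta.hat <- solve(t(X) %*% X) %*% t(X) %*% y          # (X'X)^{-1} X'y
theta.hat
# the same estimates come from the built-in fitting function:
# coef(lm(y ~ X[, 2] + X[, 3]))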
What does full rank mean?
• Definition I.6: The rank of an n × q matrix A with n ≥ q is the number of linearly independent columns of the matrix.
The matrix is said to be of full rank, or rank q, if none of the columns in the matrix can be written as a linear combination of the other columns.
• Example I.1 House price (continued)
For this example the X matrix is
$$X = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 5 & 2 \\ 1 & 10 & 2 \\ 1 & 20 & 3 \end{bmatrix}$$
It is of rank 3 and is of full rank as no column can be written as a linear combination of the other two.
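As a sketch, the rank can be checked numerically in R with a QR decomposition of this X:

X <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
qr(X)$rank     # returns 3, so X is of full (column) rank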
Another example
• On the other hand, the following two matrices are of rank 2, as their second columns are 5(column 3) and 5(column 3) − 9(column 1), respectively:
$$X = \begin{bmatrix} 1 & 5 & 1 \\ 1 & 5 & 1 \\ 1 & 10 & 2 \\ 1 & 10 & 2 \\ 1 & 15 & 3 \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & -4 & 1 \\ 1 & -4 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \\ 1 & 6 & 3 \end{bmatrix}$$
Fitted values and residuals
• Definition I.7: The estimator of the expected values for the expectation model E[Y] = ψ = Xθ is given by
$$\hat{\psi} = X\hat{\theta}$$
The estimates of Y for a particular observed y are called the fitted values. They are computed by substituting the values of the estimates of θ and the explanatory variables into the fitted equation.
• Definition I.8: The estimator of the errors for the expectation model E[Y] = ψ = Xθ is given by
$$\hat{\varepsilon} = Y - X\hat{\theta} = Y - \hat{\psi}$$
and so the estimates, the residuals, are computed by subtracting the fitted values from the observed values of the response variable.
Recap thus far
• Often want to decide between two models
  – Fit models using least squares
  – Want to use ANOVA to select between alternatives
• For the model Y = Xθ + ε, or E[Y] = ψ = Xθ and V = σ²I, the ordinary least squares estimator of θ is given by
$$\hat{\theta} = (X'X)^{-1} X'Y$$
• The estimator of the expected values is given by
$$\hat{\psi} = X\hat{\theta} = Q_M Y \quad \text{where } Q_M = \,?$$
• and that of the errors is given by
$$\hat{\varepsilon} = Y - X\hat{\theta} = Y - \hat{\psi} = Q_R Y \quad \text{where } Q_R = \,?$$
• Least squares can be viewed as the orthogonal projection of the data vector, in the n-dimensional data space, into both the model and residual subspaces using the Qs.
Error estimator as a linear
combination
• Given the expression for the estimator of the expected values, the estimator of the errors is given by
$$\hat{\varepsilon} = Y - X\hat{\theta} = Y - Q_M Y = (I_n - Q_M) Y = Q_R Y$$
• Hence the fitted values and residuals are given by
$$\hat{\psi} = Q_M y \quad \text{and} \quad \hat{\varepsilon} = y - \hat{\psi} = Q_R y$$
Projection operators — QM
1
ˆ  Xθˆ  QMY where QM  X  XX  X
• Seen that ψ
• QM is a nn projection matrix with the property that
it is symmetric and idempotent.
• Definition I.9: A matrix E is idempotent if E2  E.
• Given that X is an nq matrix,
– then QM = X(XX)-1X
– is the product of nq, qq and qn matrices
– with the result that it is an nn matrix.
• Clearly the product of the nn matrix QM and the
n1 vector Y is an n1 vector.
• So the estimator of the expected values is a linear
combination of the elements of Y.
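A sketch in R, continuing with the house-price X and y, that checks these properties of Q_M:

X  <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
y  <- c(50, 40, 52, 47, 65)
QM <- X %*% solve(t(X) %*% X) %*% t(X)    # Q_M = X (X'X)^{-1} X'
all.equal(QM, t(QM))                      # symmetric
all.equal(QM %*% QM, QM)                  # idempotent: Q_M^2 = Q_M
QM %*% y                                  # the fitted values, psi-hat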
Projection operators — QR
• Theorem I.5: Given that the matrix E is symmetric and idempotent, then R = I − E is also symmetric and idempotent. In addition, RE = ER = 0.
• Application of this theorem to the regression situation leads us to conclude that
  – Q_R is symmetric and idempotent,
  – with Q_R Q_M = Q_M Q_R = 0.
• All of this can be viewed as the orthogonal projection of vectors onto subspaces.
Geometry of least squares
• The observation vector y is viewed as a vector
in n-space and this space is called the data
space.
• Then the X matrix, with q linearly independent
columns, determines a q-dimensional subspace
of the data space — this space is called the
model (sub)space
[Figure: the data vector y in the data space, together with the model subspace and the residual subspace.]
Geometry of least squares
(cont’d)
[Figure: the data vector y decomposed into Q_M y = ψ̂ in the model space and Q_R y = ε̂ in the residual space.]
• Fitted values are the orthogonal projection of the observation vector into the model space.
  – The orthogonal projection is achieved using the idempotent, or projection, matrix Q_M.
• Residuals are the projection of the observation vector into the residual subspace, the subspace of the data space orthogonal to the model space.
  – The matrix that projects onto the residual subspace is Q_R.
• That Q_R Q_M = Q_M Q_R = 0 reflects that the two subspaces are orthogonal.
Projectors properties
[Figure: as before, y is projected to Q_M y = ψ̂ in the model space and Q_R y = ε̂ in the residual space.]
• Is it obvious why Q_M² = Q_M?
  – Once you have projected y into the model subspace and obtained Q_M y, it is in the model subspace.
  – Applying Q_M to the fitted values, that is to Q_M y, will have no effect because they are already in the model subspace;
  – clearly, Q_M² y = Q_M(Q_M y) = Q_M y.
• A similar argument applies to Q_R.
• Also, it should be clear why Q_R Q_M = 0.
Example I.3 Single sample
• Suppose that a single sample of 3 observations has been obtained.
• The linear model we propose for this data is
$$E[Y] = X_G \mu = 1_3 \mu = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \mu
\quad \text{and} \quad \mathrm{var}[Y] = \sigma^2 I_n$$
• or, for an individual observation,
$$Y_i = \mu + \varepsilon_i \quad \text{with } \mathrm{var}[Y_i] = \sigma^2 \text{ and } \mathrm{cov}[Y_i, Y_j] = 0,\; i \neq j$$
• That is, the value for an observation is made up of
  – the population mean
  – plus a particular deviation from the population mean for that observation.
Projection matrix
• In this case Q_M, a 3 × 3 matrix, is rather simple:
$$Q_M = X_G (X_G' X_G)^{-1} X_G'
= 1_3 (1_3' 1_3)^{-1} 1_3'
= 1_3 (3)^{-1} 1_3'
= \tfrac{1}{3}\, 1_3 1_3'
= \tfrac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
= \tfrac{1}{3} J_3$$
• and
$$\hat{\psi} = Q_M Y = \tfrac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} Y
= \begin{bmatrix} \bar{Y} \\ \bar{Y} \\ \bar{Y} \end{bmatrix}$$
Grand mean operator
• That is, in this case, Q_M is the matrix that replaces each observation with the grand mean of all the observations. We will call it Q_G.
• Throughout this course the vector of grand means will be denoted as G.
Hence $G = 1_3 \bar{Y}$ and $g = 1_3 \bar{y}$.
• Note that
  – the estimator of μ in our model is the mean of the elements of Y, that is $\bar{Y}$;
  – the estimate is the mean of the observations, $\bar{y}$.
A simple 3-D example
• Suppose that y′ = (2, 1, 2).
• Then ȳ = 5/3 ≈ 1.67
• and fitting the model E[Y] = 1_n μ results in
  – fitted values ψ̂′ = (1.67, 1.67, 1.67)
  – residuals ε̂′ = (0.33, −0.67, 0.33)
[Figure: 3-D plot with the 1st data point on the axis coming out of the figure, the 2nd on the axis going across and the 3rd on the axis going up; the fitted vector lies in the model subspace and the residual vector in the residual subspace, which is orthogonal to the model subspace.]
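A sketch verifying these numbers in R with the grand-mean projector Q_G = (1/3)J_3:

y  <- c(2, 1, 2)
QG <- matrix(1/3, nrow = 3, ncol = 3)    # (1/3) J_3, the grand-mean projector
QG %*% y                                 # fitted values: 1.67, 1.67, 1.67
y - QG %*% y                             # residuals: 0.33, -0.67, 0.33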
I.C Model selection
• Generally, we want to determine the model
that best describes the data.
– obtain estimates of our parameters under
several alternative models.
– Choose model using an analysis of variance
(ANOVA).
b) Regression analysis of
variance
• An ANOVA is used to compare potential
models.
• In the case of the regression model, it is
common to want to choose between two
expectation models, one which is a subset
of the other.
Testing all expectation
parameters are zero
• The simplest, although not necessarily the most useful, situation is where one compares the expectation models
$$E[Y_i] = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} \quad \text{and} \quad E[Y_i] = 0$$
• So we first state the null and alternative hypotheses for the hypothesis test:
H0: θ = 0 (equivalent to E[Yi] = 0)
H1: θ ≠ 0 (equivalent to E[Yi] = β0 + β1x1i + β2x2i)
Computing test statistic using ANOVA table
Source                     DF       SSq                MSq                       F            p
Model0                     0        0′0
Model1 − Model0 (Model)    q        ψ̂′ψ̂ − 0′0        s²_M = ψ̂′ψ̂ / q         s²_M/s²_R    Pr(F(q, n−q) ≥ F_O)
Residual                   n − q    ε̂′ε̂              s²_R = ε̂′ε̂ / (n − q)
Total                      n        Y′Y

• Generally, an ANOVA comparing two models involves
  – the SSq of the estimators of ψ for the null model and
  – the difference between the SSqs of the estimators of ψ for the two models.
• In this case,
  – the estimators for the null model are all 0, and so the difference in SSqs is equal to the SSq of the estimators of ψ for the alternative model.
  – Model0 could be left out of the table altogether.
• Note the use of s², the symbol for a variance, for the MSqs
  – because MSqs are variances (the ratio of a SSq to its df).
Computing test statistic using ANOVA table
Source                DF       SSq        MSq                      F            p
Model (Regression)    q        ψ̂′ψ̂      s²_M = ψ̂′ψ̂ / q         s²_M/s²_R    Pr(F(q, n−q) ≥ F_O)
Residual              n − q    ε̂′ε̂      s²_R = ε̂′ε̂ / (n − q)
Total                 n        Y′Y

• Two parallel identities:
  – Obviously, Total df = Model df + Residual df.
  – Not so clearly, Total SSq = Model SSq + Residual SSq (but remember the geometry).
• The SSqs are of the estimators of ψ and ε, and of Y.
• If the p-value is less than the significance level, α, then H0 is rejected. Usually, α = 0.05.
Squared length of a vector = SSq of its elements
[Figure: right-angled decomposition of y into ψ̂ = Q_M y in the model space and ε̂ = Q_R y in the residual space, with the squared lengths ‖y‖², ‖ψ̂‖² and ‖ε̂‖² marked.]
• From Pythagoras' theorem,
$$\|y\|^2 = \|\hat{\psi}\|^2 + \|\hat{\varepsilon}\|^2$$
• This is equivalent to the SSq identity
$$y'y = \hat{\psi}'\hat{\psi} + \hat{\varepsilon}'\hat{\varepsilon}$$
Example I.3 Single sample (continued)
It is easy to verify that the squared lengths, or SSq, are 9, 8.33 and 0.67 for total, fitted and residual, respectively.
• Because the data are very close to the fitted line, there is only a small vector in the residual space, with a small squared length.
• But the fitted values involve only 1 value and so have only 1 df, whereas the residuals have 2 independent values and so 2 df.
• Adjusting by dividing each SSq by its df, to yield mean squares, gives 8.33 and 0.33. A bigger difference!
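These SSq and MSq values can be checked with a few lines of R (a sketch for y′ = (2, 1, 2)):

y      <- c(2, 1, 2)
fitted <- rep(mean(y), 3)                 # grand-mean fit
resid  <- y - fitted
sum(y^2); sum(fitted^2); sum(resid^2)     # SSq: 9, 8.33, 0.67
sum(fitted^2) / 1; sum(resid^2) / 2       # MSq: 8.33, 0.33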
Example I.1 House price (continued)
• For this example, using the computer we find that
$$\hat{\theta} = \begin{bmatrix} 33.063 \\ -0.190 \\ 10.718 \end{bmatrix}$$
• The estimated expected value is given by
$$\widehat{E[Y]} = 33.0626 - 0.1897\, x_1 + 10.7182\, x_2$$
• In this case Q_M, a 5 × 5 projection matrix, is somewhat more complicated. Using R:
$$Q_M = \begin{bmatrix}
 0.448 &  0.359 &  0.249 & 0.138 & -0.193 \\
 0.359 &  0.683 & -0.245 & 0.160 &  0.042 \\
 0.249 & -0.245 &  0.805 & 0.188 &  0.004 \\
 0.138 &  0.160 &  0.188 & 0.215 &  0.298 \\
-0.193 &  0.042 &  0.004 & 0.298 &  0.849
\end{bmatrix}$$
• Fitted values are obtained from
  – the fitted equation, or
  – applying Q_M to y.
Fitted values and residuals
Observations    Fitted values        Residuals
    (y)         (ψ̂ = Q_M y)         (ε̂ = y − Q_M y = Q_R y)
    50            43.59116              6.408840
    40            42.83241             -2.832413
    52            53.55064             -1.550645
    47            52.60221             -5.602210
    65            61.42357              3.576427
SSq               13142.32             95.67588

• Note that
  – the Observations are equal to the sums of the Fitted values and the Residuals, and
  – the sum of the last two SSqs is approximately equal to the Total SSq.
ANOVA table
Source        DF    SSq         MSq        F        p
Regression    3     13142.32    4380.78    91.58    0.0108
Residual      2        95.68      47.84
Total         5     13238.00

• Note that the p-value is obtained using R.
• As the p-value is less than 0.05, the null hypothesis is rejected.
• The expectation model E[Yi] = 0 does not provide as good a description of the data as the model E[Yi] = β0 + β1x1i + β2x2i.
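A sketch of how the entries in this table can be reproduced in R, reusing the y and X defined above for this example:

y   <- c(50, 40, 52, 47, 65)
X   <- cbind(1, c(1, 5, 5, 10, 20), c(1, 1, 2, 2, 3))
fit <- X %*% solve(t(X) %*% X) %*% t(X) %*% y    # fitted values
res <- y - fit                                   # residuals
ssM <- sum(fit^2); ssR <- sum(res^2)             # 13142.32 and 95.68
Fo  <- (ssM / 3) / (ssR / 2)                     # F = s_M^2 / s_R^2, approximately 91.58
pf(Fo, 3, 2, lower.tail = FALSE)                 # p-value, approximately 0.0108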
Testing that a subset of the
expectation parameters are zero
• A more useful test involves testing that just some of the βs are zero.
• For example, in multiple linear regression you might want to choose between the expectation models
  – E[Yi] = β0 + β1x1i + β2x2i
  – E[Yi] = β0
• Again, we first state the null and alternative hypotheses for the hypothesis test:
  – H0: β1 = β2 = 0 (equivalent to E[Yi] = β0)
  – H1: β1, β2 ≠ 0 (equivalent to E[Yi] = β0 + β1x1i + β2x2i)
Computing test statistic using
ANOVA table
Source             DF       SSq                 MSq      F            p
Model0             1        G′G
Model1 − Model0    q − 1    ψ̂₁′ψ̂₁ − G′G       s²_M     s²_M/s²_R    Pr(F(q−1, n−q) ≥ F_O)
Residual           n − q    ε̂′ε̂               s²_R
Total              n        Y′Y

where s²_M = (ψ̂₁′ψ̂₁ − G′G) / (q − 1) and s²_R = ε̂′ε̂ / (n − q).

• Now the null model is not 0, but the grand-mean model considered in Example I.3, Single sample.
  – There we showed that the estimator of the expected values under that model is the vector G.
• The Model SSq does not look like a SSq, but a difference. However, it can be shown that
$$\hat{\psi}_1'\hat{\psi}_1 - G'G = (\hat{\psi}_1 - G)'(\hat{\psi}_1 - G)$$
Factoring out Intercept term
• It is so unusual to test a hypothesis about a model that does not include the intercept term that the SSq for the model involving only the intercept is usually subtracted out of the ANOVA.
• This forms the corrected Total SSq.
• Again, one can
  – either subtract the grand mean from the observations and form the SSq,
  – or subtract the SSq for the grand-mean model from the uncorrected Total SSq,
• because
$$Y'Y - G'G = (Y - G)'(Y - G)$$
Revised ANOVA table
Source                     DF       SSq                     MSq      F            p
Model1 − Model0 (Model)    q − 1    (ψ̂₁ − G)′(ψ̂₁ − G)     s²_M     s²_M/s²_R    Pr(F(q−1, n−q) ≥ F_O)
Residual                   n − q    ε̂′ε̂                   s²_R
(Corrected) Total          n − 1    (Y − G)′(Y − G)

where s²_M = (ψ̂₁ − G)′(ψ̂₁ − G) / (q − 1), s²_R = ε̂′ε̂ / (n − q) and G = 1_n Ȳ.

• In this analysis, we obtain SSqs of the following quantities:
  – Model SSq: differences between the estimators of the expected values for the two models in the hypotheses.
  – Residual SSq: estimators of the errors, obtained by subtracting the estimators of the expected values under the alternative model from the random vector Y.
  – (Corrected) Total SSq: deviations from the grand mean, obtained by subtracting the grand-mean estimator from the random vector Y.
Example I.1 House price (continued)
• Take the previously computed fitted values and residuals.
• Subtract g = 50.8 from the response variable and from the fitted values to obtain:

Observations   Deviations   Fitted values   Model differences   Residuals
    (y)         (y − g)       (ψ̂ₐ)           (ψ̂ₐ − g)          (y − ψ̂ₐ)
    50           -0.8         43.59116         -7.20884           6.408840
    40          -10.8         42.83241         -7.96759          -2.832413
    52            1.2         53.55064          2.75064          -1.550645
    47           -3.8         52.60221          1.80221          -5.602210
    65           14.2         61.42357         10.62357           3.576427
SSq            13238          334.800         13142.32           239.12409         95.67588

• Note that
  – the Deviations are equal to the sum of the Model differences and the Residuals, and
  – the sum of the last two sums of squares is approximately equal to the Deviations sum of squares.
ANOVA table for the example
Source        DF    SSq        MSq        F       p
Regression    2     239.124    119.562    2.50    0.2857
Residual      2      95.676     47.838
Total         4     334.800

• As the p-value is greater than 0.05, the null hypothesis cannot be rejected.
• The expectation model E[Yi] = β0 + β1x1i + β2x2i does not describe the data any better than the model E[Yi] = β0.
• As the latter model is simpler, it will be adopted as the model that best describes the data.
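In R, this comparison of the two expectation models can be made directly by fitting both and comparing them with anova(); a sketch using the house-price data (x1 = age, x2 = area):

y  <- c(50, 40, 52, 47, 65)
x1 <- c(1, 5, 5, 10, 20)
x2 <- c(1, 1, 2, 2, 3)
m0 <- lm(y ~ 1)            # null model:        E[Y_i] = beta_0
m1 <- lm(y ~ x1 + x2)      # alternative model: E[Y_i] = beta_0 + beta_1 x1 + beta_2 x2
anova(m0, m1)              # F = 2.50 on 2 and 2 df, p approximately 0.286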
I.E Exercises
• There are exercises at the end of the
chapter that review the material covered in
this chapter.