Inferences about slope

732G21/732A35/732G28
Formal statement
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$




$Y_i$ is the i-th response value
$\beta_0$, $\beta_1$ are the model (regression) parameters: intercept and slope
$X_i$ is the i-th predictor value
$\varepsilon_i$ are i.i.d. normally distributed random variables with expectation zero and variance $\sigma^2$
Inference about regression coefficients and response:

Interval estimates and test concerning coefficients

Confidence interval for Y

Prediction interval for Y

ANOVA-table

After fitting the data, we may obtain a regression line
$$\hat{y} = 1.5 + 0.00005\,x$$


Is 0.00005 significantly different from zero, or is it just due to random variation (hence, no linear dependence between Y and X)?
How to do?
◦ Use hypothesis testing (later)
◦ Derive a confidence interval for $\beta_1$. If 0 does not fall within this interval, there is dependence

Estimated slope $b_1$ is a random variable (look at formula)
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
Properties of $b_1$
◦ Normally distributed (show)
◦ $E(b_1) = \beta_1$
◦ Variance $\sigma^2\{b_1\} = \dfrac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
Further:
Test statistic $\dfrac{b_1 - \beta_1}{s\{b_1\}}$ is distributed as t(n-2)
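The "(show)" items for $b_1$ can be sketched by writing $b_1$ as a linear combination of the $Y_i$. The weights $k_i$ are notation introduced here for illustration only, and the $Y_i$ are assumed independent, as in the model statement.

```latex
\begin{aligned}
b_1 &= \sum_{i=1}^{n} k_i Y_i,
  \qquad k_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}, \\
\sum_{i=1}^{n} k_i &= 0, \qquad
\sum_{i=1}^{n} k_i X_i = 1, \qquad
\sum_{i=1}^{n} k_i^2 = \frac{1}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \\
E(b_1) &= \sum_{i=1}^{n} k_i (\beta_0 + \beta_1 X_i) = \beta_1, \\
\sigma^2\{b_1\} &= \sum_{i=1}^{n} k_i^2 \, \sigma^2
  = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.
\end{aligned}
```

Since $b_1$ is a linear combination of independent normal variables, it is itself normally distributed.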


See table B.2 (p. 1317)
Example: one-sided interval at 95% confidence, 15 observations (n − 2 = 13 degrees of freedom)
$t(0.95;\, 13) = 1.771$

Confidence interval for $\beta_1$ (show…)
$$b_1 \pm t(1 - \alpha/2;\, n-2)\, s\{b_1\}$$

If the variance in the data is unknown, it is estimated by $s^2 = MSE$:
$$s^2\{b_1\} = \frac{s^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
Example: Compute confidence interval for slope, Salary dataset
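A minimal sketch of the interval computation, in plain Python. The data are a hypothetical toy sample (not the Salary dataset, which is not reproduced in these slides), the function name is made up for illustration, and the critical value is assumed to be looked up in a t table by the caller.

```python
import math

# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def slope_confidence_interval(x, y, t_crit):
    """Return (b1, s_b1, lower, upper) for the slope of a simple linear regression.

    t_crit is t(1 - alpha/2; n - 2), taken from a t table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                 # s^2, the estimate of sigma^2
    s_b1 = math.sqrt(mse / sxx)         # estimated standard deviation of b1
    return b1, s_b1, b1 - t_crit * s_b1, b1 + t_crit * s_b1

# n = 5, so n - 2 = 3 degrees of freedom; t(0.975; 3) = 3.182 from a t table.
b1, s_b1, lower, upper = slope_confidence_interval(X, Y, t_crit=3.182)
print(f"b1 = {b1:.4f}, s{{b1}} = {s_b1:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

For this toy sample the interval contains 0, so the data do not establish a linear dependence at the 5% level.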
[Figure: scatter plot of Salary (y) against Age (x) with fitted regression line y = 0.5471x + 8.4545]

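Often we have a sample and test a hypothesis about a parameter $\lambda$ at some significance level α
$H_0: \lambda = \lambda_0$
$H_a: \lambda \neq \lambda_0$
or
$H_0: \lambda \le \lambda_0$
$H_a: \lambda > \lambda_0$
or
$H_0: \lambda \ge \lambda_0$
$H_a: \lambda < \lambda_0$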
How to do?



Step 1: Find and compute an appropriate test function T = T(sample, λ0)
Step 2: Plot the test function's distribution and mark a critical area that depends on α
Step 3: If T is in the critical area, reject H0 (accept Ha); otherwise do not reject H0

Test
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Step 1: Compute $t^* = \dfrac{b_1}{s\{b_1\}}$
Step 2: Plot the t(n-2) distribution, mark the points $\pm t(1 - \alpha/2;\, n-2)$ and the critical area.
Step 3: Determine where $t^*$ lies and reject H0 if it is in the critical area
Example
Test the hypothesis for Salary dataset:
◦ Manually, compute also P-values
◦ By Minitab
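The three steps can be sketched as follows. The data are a hypothetical toy sample rather than the Salary dataset, the function name is illustrative, and the critical value is assumed to come from a t table.

```python
import math

# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def slope_t_test(x, y, t_crit):
    """Two-sided test of H0: beta1 = 0. Returns (t_star, reject_h0)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    t_star = b1 / math.sqrt(mse / sxx)   # Step 1: t* = b1 / s{b1}
    return t_star, abs(t_star) > t_crit  # Steps 2-3: is t* beyond +/- t_crit?

# t(0.975; 3) = 3.182 from a t table (alpha = 0.05, n - 2 = 3).
t_star, reject_h0 = slope_t_test(X, Y, t_crit=3.182)
print(f"t* = {t_star:.4f}, reject H0: {reject_h0}")
```

A two-sided P-value additionally needs the t(n-2) distribution function, which the Python standard library does not provide; `scipy.stats.t.sf` is one option if SciPy is available.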
Sometimes we need to know: is $\beta_0 = 0$?
Do confidence intervals and hypothesis testing in the same way, using the formulas below!

$$b_0 = \bar{Y} - b_1 \bar{X}$$
Properties of $b_0$
◦ Normally distributed (show)
◦ $E(b_0) = \beta_0$
◦ Variance (show..) $\sigma^2\{b_0\} = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$
Further:
Test statistic $\dfrac{b_0 - \beta_0}{s\{b_0\}}$ is distributed as t(n-2)


If the error distribution is not normal: slight departures are OK; otherwise the results hold only asymptotically (for large samples)
Spacing of the X values affects the variance (larger spacing gives smaller variance)
Example: Test β0 = 0 for Salary data
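A minimal sketch of the intercept test, again on a hypothetical toy sample (not the Salary data) with an illustrative function name and a critical value assumed to come from a t table.

```python
import math

# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def intercept_t_test(x, y, t_crit):
    """Two-sided test of H0: beta0 = 0. Returns (b0, s_b0, t_star, reject_h0)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # s^2{b0} = MSE * (1/n + X_bar^2 / sum (Xi - X_bar)^2)
    s_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    t_star = b0 / s_b0
    return b0, s_b0, t_star, abs(t_star) > t_crit

# t(0.975; 3) = 3.182 from a t table (alpha = 0.05, n - 2 = 3).
b0, s_b0, t_star_b0, reject_b0 = intercept_t_test(X, Y, t_crit=3.182)
print(f"b0 = {b0:.4f}, s{{b0}} = {s_b0:.4f}, t* = {t_star_b0:.4f}, reject H0: {reject_b0}")
```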
Estimate at $X = X_h$ ($X_h$ is any value):
$$\hat{Y}_h = b_0 + b_1 X_h$$
Properties of $\hat{Y}_h$, the estimate of $E(Y_h)$
◦ Normally distributed (show)
◦ $E(\hat{Y}_h) = E(Y_h)$
◦ Variance $\sigma^2\{\hat{Y}_h\} = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$
Further:
Test statistic $\dfrac{\hat{Y}_h - E(Y_h)}{s\{\hat{Y}_h\}}$ is distributed as t(n-2)
Confidence interval
$$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{\hat{Y}_h\}$$

Make a plot…
CONFIDENCE INTERVAL: we estimate the position of the mean in the population with X = Xh
POINT ESTIMATE: $\hat{Y}_h$, the centre of both intervals
PREDICTION INTERVAL: we estimate the position of the individual observation in the population with X = Xh

When the parameters are unknown, the mean E(Yh) is itself uncertain: it may lie anywhere within its confidence interval
New observation = mean + random error
-> the prediction interval should be wider than the confidence interval

Further:
Test statistic $\dfrac{Y_{h(new)} - \hat{Y}_h}{s\{pred\}}$ is distributed as t(n-2)
Prediction interval
$$\hat{Y}_h \pm t(1 - \alpha/2;\, n-2)\, s\{pred\}$$

How to estimate $s\{pred\}$? The new observation is any value of $b_0 + b_1 X_h + \varepsilon$. Hence
$$\sigma^2\{pred\} = \sigma^2\{b_0 + b_1 X_h + \varepsilon\} = \sigma^2\{b_0 + b_1 X_h\} + \sigma^2\{\varepsilon\} = \sigma^2\{\hat{Y}_h\} + \sigma^2$$

Standard error (show)
$$s^2\{pred\} = MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right]$$
Example
◦ Calculate confidence and prediction intervals for a 35-year-old person
◦ Compare with output in Minitab
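A minimal sketch of both intervals at a chosen $X_h$. The data are a hypothetical toy sample (so $X_h = 4$ is used instead of age 35), the function name is illustrative, and the critical value is assumed to come from a t table.

```python
import math

# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def intervals_at(x, y, x_h, t_crit):
    """Return (y_hat_h, s_mean, s_pred, conf_int, pred_int) at X = x_h."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat_h = b0 + b1 * x_h                       # point estimate
    leverage = 1.0 / n + (x_h - x_bar) ** 2 / sxx
    s_mean = math.sqrt(mse * leverage)            # s{Y_hat_h}
    s_pred = math.sqrt(mse * (1.0 + leverage))    # s{pred}
    conf_int = (y_hat_h - t_crit * s_mean, y_hat_h + t_crit * s_mean)
    pred_int = (y_hat_h - t_crit * s_pred, y_hat_h + t_crit * s_pred)
    return y_hat_h, s_mean, s_pred, conf_int, pred_int

# t(0.975; 3) = 3.182 from a t table (alpha = 0.05, n - 2 = 3).
y_hat_h, s_mean, s_pred, conf_int, pred_int = intervals_at(X, Y, x_h=4.0, t_crit=3.182)
print(f"Y_hat_h = {y_hat_h:.4f}")
print(f"confidence interval: ({conf_int[0]:.4f}, {conf_int[1]:.4f})")
print(f"prediction interval: ({pred_int[0]:.4f}, {pred_int[1]:.4f})")
```

Both intervals share the same centre; the prediction interval is wider because $s^2\{pred\}$ exceeds $s^2\{\hat{Y}_h\}$ by MSE.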

Total sum of squares
$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
Error sum of squares
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Regression sum of squares
$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
$$SSTO = SSR + SSE$$

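SSTO has n-1 degrees of freedom (the deviations $Y_i - \bar{Y}$ sum to zero)

SSE has n-2 degrees of freedom (2 model parameters are estimated)

SSR has 1 degree of freedom (the fitted values lie on the regression line: 2 degrees; the deviations $\hat{Y}_i - \bar{Y}$ sum to zero: minus 1 degree)
n-1 = n-2 + 1
SSTO = SSE + SSR
Important:
MSxx = SSxx/degrees_of_freedom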
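The decomposition and the mean squares can be checked numerically. A minimal sketch on a hypothetical toy sample (not the Salary dataset); all variable names are illustrative.

```python
# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in X]

ssto = sum((yi - y_bar) ** 2 for yi in Y)                # total, df = n - 1
sse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))   # error, df = n - 2
ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression, df = 1

msr = ssr / 1          # mean square = sum of squares / degrees of freedom
mse = sse / (n - 2)

print(f"SSTO = {ssto:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}")
```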

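ANOVA table

| Source of variation | SS | df | MS |
|---|---|---|---|
| Regression | $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ | 1 | $MSR = \dfrac{SSR}{1}$ |
| Error | $SSE = \sum (Y_i - \hat{Y}_i)^2$ | n - 2 | $MSE = \dfrac{SSE}{n-2}$ |
| Total | $SSTO = \sum (Y_i - \bar{Y})^2$ | n - 1 | |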
Expected mean squares
$$E(MSE) = \sigma^2$$
$$E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2$$
E(MSE) does not depend on the slope, even when the slope is zero
E(MSR) = E(MSE) when the slope is zero
-> If MSR is much larger than MSE, the slope is not zero; if they are approximately the same, the slope can be zero
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test statistic F* = MSR/MSE, use F(1, n-2) (see p. 1320)
Decision rules:
◦ If F* > F(1-α; 1, n-2) conclude Ha
◦ If F* ≤ F(1-α; 1, n-2) conclude H0
Note: F test and t test about β1 are equivalent
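A short sketch of why the two tests agree, using only the definitions of SSR, MSE and $s^2\{b_1\}$ given earlier:

```latex
\begin{aligned}
\hat{Y}_i - \bar{Y} &= b_1 (X_i - \bar{X})
  \quad\Longrightarrow\quad
  SSR = b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2, \\
F^* &= \frac{MSR}{MSE}
  = \frac{b_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}{MSE}
  = \frac{b_1^2}{s^2\{b_1\}}
  = (t^*)^2, \\
F(1-\alpha;\, 1,\, n-2) &= \left[ t(1-\alpha/2;\, n-2) \right]^2 .
\end{aligned}
```

So F* exceeds its critical value exactly when |t*| exceeds its critical value in the two-sided t test.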

General approach
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Full model: (linear)
$$SSE(F) = \sum_{i=1}^{n} \left( Y_i - (b_0 + b_1 X_i) \right)^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = SSE$$
Reduced model: (constant)
$$SSE(R) = \sum_{i=1}^{n} (Y_i - b_0)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SSTO$$
It is known (why?..)
SSE(F) ≤ SSE(R).
Large difference: different models; small difference: can be the same

Test statistic
$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \Big/ \frac{SSE(F)}{df_F}$$

For univariate linear model, equivalent to F* = MSR/MSE
F* belongs to F(dfR-dfF, dfF) distribution (plot critical area..)
Test rule: F* > F(1-α; dfR-dfF, dfF) -> reject H0
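A minimal sketch of the general linear test for the simple regression case, where dfR = n-1 and dfF = n-2. The data are a hypothetical toy sample (not the Salary dataset), the function name is illustrative, and the critical value is assumed to come from an F table.

```python
# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

def general_linear_test(sse_r, df_r, sse_f, df_f, f_crit):
    """Return (f_star, reject_h0) for reduced vs. full model."""
    f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    return f_star, f_star > f_crit

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar

# Full model: Y = b0 + b1 X, so SSE(F) = SSE with n - 2 degrees of freedom.
sse_full = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(X, Y))
# Reduced model: Y = constant, so SSE(R) = SSTO with n - 1 degrees of freedom.
sse_reduced = sum((yi - y_bar) ** 2 for yi in Y)

# F(0.95; 1, 3) = 10.13 from an F table (alpha = 0.05).
f_star, reject_h0_f = general_linear_test(sse_reduced, n - 1, sse_full, n - 2, f_crit=10.13)
print(f"F* = {f_star:.4f}, reject H0: {reject_h0_f}")
```

Here F* reduces to MSR/MSE, as stated for the univariate linear model.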
Example: For Salary dataset
◦ Compose ANOVA table and compare with MINITAB
◦ Perform F-test and compare with MINITAB

Coefficient of determination:
$$R^2 = \frac{SSR}{SSTO}$$

Coefficient of correlation:
$$r = \pm\sqrt{R^2}$$
(the sign of r is the sign of the slope $b_1$)
Limitations:
◦ High $R^2$ does not mean a good fit
◦ Low $R^2$ does not mean that X and Y are not related
Example: For Salary dataset, compute $R^2$ and compare with MINITAB
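A minimal sketch of both coefficients on a hypothetical toy sample (not the Salary dataset); variable names are illustrative.

```python
import math

# Hypothetical toy data (an assumption; NOT the Salary dataset).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
b0 = y_bar - b1 * x_bar

ssto = sum((yi - y_bar) ** 2 for yi in Y)
ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in X)

r_squared = ssr / ssto                        # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)   # r takes the sign of the slope

print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```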

Chapter 2 up to page 78