Week 12, Lecture 1, Hypothesis tests for regression

Business Statistics - QBM117
Interval estimation for the slope and y-intercept
Hypothesis tests for regression
Objectives

To determine confidence interval estimators of the slope
and the y-intercept.

To test hypotheses about the slope of the regression line.
Estimating the slope and the y-intercept
As we saw in our earlier study of confidence interval
estimators, there are two types of estimators for a
population parameter:
point estimators and interval estimators.
The point estimators for the slope and the y-intercept can
easily be determined from the Excel output generated when
fitting the regression.
The interval estimators can be just as easily determined from
the Excel output generated.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.86673932
R Square            0.75123705
Adjusted R Square   0.68904631
Standard Error      6.51658723
Observations        6

ANOVA
             df   SS         MS         F          Significance F
Regression   1    512.9697   512.9697   12.07956   0.025454365
Residual     4    169.8636   42.46591
Total        5    682.8333

             Coefficients   Std Error   t Stat     P-value    Lower 95%     Upper 95%
Intercept    15.3181818     6.213322    2.465377   0.06929    -1.93280173   32.5691654
Experience   1.67272727     0.481282    3.475567   0.025454   0.336471833   3.00898271

Reading the Lower 95% and Upper 95% columns for Experience,
the 95% confidence interval estimate of the slope is
from 0.336 to 3.009, i.e. from $336 to $3009.
Excel also generates a confidence interval estimate for
the y-intercept. This will only be considered if the
y-intercept has a sensible interpretation in the situation
described.
For our salary and experience example, the y-intercept
does have a sensible interpretation, i.e. it is the salary for a
person with no experience. As such, we would also be
interested in determining a confidence interval estimate
of the intercept.
Therefore the 95% confidence interval estimate of the
intercept is from -1.933 to 32.569, i.e. from -$1933 to $32 569.
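
The two intervals can be reproduced by hand from the Coefficients and Std Error columns, using estimate ± t × standard error. The sketch below is only an illustration of that arithmetic: the function name is made up for this example, and the critical value 2.776445 is t(0.025, 4) written to more decimal places than the rounded 2.776 used in the slides.

```python
# Values copied from the Excel output in the lecture.
T_CRIT = 2.776445  # assumed: t(0.025, 4) to six decimals (slides round to 2.776)


def confidence_interval(coefficient, std_error, t_crit=T_CRIT):
    """Return (lower, upper) for coefficient +/- t_crit * std_error."""
    margin = t_crit * std_error
    return coefficient - margin, coefficient + margin


slope_ci = confidence_interval(1.67272727, 0.481282)      # Experience row
intercept_ci = confidence_interval(15.3181818, 6.213322)  # Intercept row

print("Slope 95%% CI:     %.3f to %.3f" % slope_ci)
print("Intercept 95%% CI: %.3f to %.3f" % intercept_ci)
```

The results should agree with Excel's Lower 95% and Upper 95% columns up to rounding of the inputs.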
Testing whether the relationship is real or
coincidence


We can easily summarise the relationship between two
variables in a sample, whether or not it really exists.
Hypothesis testing will tell us whether the relationship that
appears to be there is pure coincidence or whether there is
in fact a significant relationship between the two variables.

The null hypothesis states that there is no relationship
between x and y.

Therefore the hypotheses for testing a significant
relationship are
\( H_0: \beta_1 = 0 \)
\( H_A: \beta_1 \neq 0 \)
Why Statistical Inference?

Because there can seem to be a relationship when, in fact,
the population is just random.

Below are plots of the data from samples of size n = 10
from a population with no relationship (correlation 0).
• Notice that the sample correlations are not zero!
• This is due to the randomness of sampling
[Figure: three scatter plots with sample correlations r = -0.471, r = 0.089 and r = 0.395]
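
The same effect can be seen in a small simulation. The sketch below is an illustration only: it assumes x and y are drawn independently from a standard normal distribution (so the population correlation is 0), and the seed and function names are arbitrary choices for this example, not part of the lecture.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_correlations(num_samples=3, n=10, seed=117):
    """Draw several samples of size n from a population with correlation 0."""
    rng = random.Random(seed)  # arbitrary seed, only for repeatability
    results = []
    for _ in range(num_samples):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        results.append(pearson_r(xs, ys))
    return results


correlations = sample_correlations()
for r in correlations:
    print("r = %.3f" % r)
```

Each printed r comes from a population with no relationship, yet none of them will be exactly zero.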
For our example, we would be testing: is there a
significant relationship between salary and experience?
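Step 1
\( H_0: \beta_1 = 0 \)
\( H_A: \beta_1 \neq 0 \)
Step 2
\[ t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \]
Step 3
\( \alpha = 0.05 \), \( t_{\alpha/2,\,n-2} = t_{0.025,\,4} = 2.776 \)
Step 4
Reject \( H_0 \) if \( t_{\text{sample}} > 2.776 \) or \( t_{\text{sample}} < -2.776 \)
Step 5
\[ t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} = \frac{1.67 - 0}{0.48} = 3.48 \]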
The test statistic can also be read directly from the Excel
output (t Stat for Experience): \( t_{\text{sample}} = 3.476 \)
Step 6
Since 3.48 > 2.776 we reject H0.
There is sufficient evidence at α = 0.05 to conclude that
there is a significant linear relationship between salary and
experience.
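
The calculation in Steps 2 to 6 can be sketched in a few lines. This is a minimal illustration using the unrounded coefficient and standard error from the Excel output and the critical value 2.776 from Step 3; the function and variable names are invented for this example.

```python
def slope_t_test(b1, se_b1, t_crit, beta1_null=0.0):
    """Two-tailed t test of H0: beta1 = beta1_null.

    Returns the sample t statistic and whether H0 is rejected.
    """
    t_sample = (b1 - beta1_null) / se_b1
    reject = abs(t_sample) > t_crit  # rejection region: t > t_crit or t < -t_crit
    return t_sample, reject


# Coefficient and Std Error for Experience; t(0.025, 4) = 2.776 from Step 3.
t_sample, reject_h0 = slope_t_test(b1=1.67272727, se_b1=0.481282, t_crit=2.776)

print("t sample = %.3f" % t_sample)
print("Reject H0" if reject_h0 else "Do not reject H0")
```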
Using the p-value to test: is there a significant
relationship between salary and experience?
\( H_0: \beta_1 = 0 \)
\( H_A: \beta_1 \neq 0 \)
Level of significance: \( \alpha = 0.05 \)
Decision rule: Reject \( H_0 \) if p-value < 0.05
From the Excel output, the p-value for Experience is 0.025454.
Since p-value = 0.025 < 0.05, we reject \( H_0 \).
There is sufficient evidence at α = 0.05 to conclude that
there is a significant linear relationship between salary and
experience.
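
To see where Excel's p-value comes from, the two-tailed area beyond the sample t statistic can be computed directly. The sketch below is an illustration under a stated assumption: it uses the closed-form CDF of the t distribution that holds only for exactly 4 degrees of freedom (n - 2 = 4 in this example), so it is not a general-purpose routine. For other sample sizes a statistics library would normally be used. The function names are hypothetical.

```python
import math


def t_cdf_4df(t):
    """CDF of Student's t distribution with exactly 4 degrees of freedom."""
    scaled = 1.0 + t * t / 4.0
    u = t * t / scaled  # t^2 / (1 + t^2/4)
    return 0.5 + 0.375 * (t / math.sqrt(scaled)) * (1.0 - u / 12.0)


def two_tailed_p_value_4df(t_sample):
    """Area in both tails beyond |t_sample| (4 degrees of freedom only)."""
    return 2.0 * (1.0 - t_cdf_4df(abs(t_sample)))


alpha = 0.05
p_value = two_tailed_p_value_4df(3.475567)  # t Stat for Experience

print("p-value = %.6f" % p_value)
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Applying the same function to the intercept's t Stat (2.465377) should give roughly 0.069, in line with the P-value column of the Excel output.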
An important point to remember about using the p-value to
test a hypothesis is that the p-value can give us a good
indication of how much evidence exists to support the
alternative hypothesis.
The smaller the p-value, the more overwhelming is the
evidence to support the alternative hypothesis.
In our example here, the p-value was only 0.025. This
allows us to conclude that a linear relationship exists when
testing at α = 0.05 or 0.1, but our conclusion would be
different at α = 0.01: since 0.025 > 0.01, we would not
reject H0.
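
The effect of the significance level on the decision can be checked with a short sketch. It simply applies the decision rule "reject H0 if p-value < α" to the p-value from the Excel output; the helper name is made up for this illustration.

```python
P_VALUE = 0.025454  # P-value for Experience from the Excel output


def decision(p_value, alpha):
    """Apply the decision rule: reject H0 if p-value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


decisions = {alpha: decision(P_VALUE, alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, outcome in decisions.items():
    print("alpha = %.2f: %s" % (alpha, outcome))
```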
Testing for a significant correlation



In situations where we are interested in how the
independent variable affects the dependent variable, we
estimate and test hypotheses about the linear regression
model.
In many situations, however, one variable does not influence
the other and therefore we are not interested in estimating
how the independent variable affects the dependent
variable.
We simply want to test whether there is a linear correlation
between the two variables.

For these situations the null hypothesis states that there is
no linear correlation between x and y.

Therefore the hypotheses for testing a significant linear
correlation are
\( H_0: \rho = 0 \)
\( H_A: \rho \neq 0 \)
When we test for a significant correlation, you will find
that the value of the test statistic and the conclusion are
exactly the same as when we test for a significant
relationship between two variables.
This is because we are in fact testing the same thing.
Are the two variables linearly related (correlated)?
Therefore we perform one test or the other - not both!
For our previous example, we would be testing: is there a
significant linear correlation between salary and
experience?
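Step 1
\( H_0: \rho = 0 \)
\( H_A: \rho \neq 0 \)
Step 2
\[ t = \frac{r - \rho}{s_r} \quad \text{where} \quad s_r = \sqrt{\frac{1 - r^2}{n - 2}} \]
Step 3
\( \alpha = 0.05 \), \( t_{\alpha/2,\,n-2} = t_{0.025,\,4} = 2.776 \)
Step 4
Reject \( H_0 \) if \( t_{\text{sample}} > 2.776 \) or \( t_{\text{sample}} < -2.776 \)
Step 5
From the Excel output, r = 0.867 (Multiple R) and \( r^2 = 0.751 \) (R Square).
\[ s_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - 0.751}{6 - 2}} = \sqrt{\frac{0.249}{4}} = 0.249 \]
\[ t = \frac{r - \rho}{s_r} = \frac{0.867 - 0}{0.249} = 3.48 \]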
This agrees with the t Stat for Experience in the Excel
output: \( t_{\text{sample}} = 3.476 \)
Step 6
Since 3.48 > 2.776 we reject H0.
There is sufficient evidence at α = 0.05 to conclude that
there is a significant linear correlation between salary and
experience.
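
The correlation version of the test can be sketched the same way. This is only an illustration, using r (Multiple R) and n (Observations) from the Excel output and the critical value 2.776 from Step 3; the function name is invented for this example.

```python
import math


def correlation_t_test(r, n, t_crit, rho_null=0.0):
    """Two-tailed t test of H0: rho = rho_null with n - 2 degrees of freedom.

    Returns the standard error s_r, the sample t statistic,
    and whether H0 is rejected.
    """
    s_r = math.sqrt((1.0 - r * r) / (n - 2))
    t_sample = (r - rho_null) / s_r
    return s_r, t_sample, abs(t_sample) > t_crit


s_r, t_corr, reject_h0_corr = correlation_t_test(r=0.86673932, n=6, t_crit=2.776)

print("s_r      = %.3f" % s_r)
print("t sample = %.3f" % t_corr)
print("Reject H0" if reject_h0_corr else "Do not reject H0")
```

The t statistic obtained here should match the t Stat for the slope in the Excel output, which is why only one of the two tests needs to be performed.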
Using the p-value to test: is there a significant correlation
between salary and experience?
\( H_0: \rho = 0 \)
\( H_A: \rho \neq 0 \)
Level of significance: \( \alpha = 0.05 \)
Decision rule: Reject \( H_0 \) if p-value < 0.05
From the Excel output, the p-value for Experience is 0.025454.
Since p-value = 0.025 < 0.05, we reject \( H_0 \).
There is sufficient evidence at α = 0.05 to conclude that
there is a significant linear correlation between salary and
experience.
Reading for next lecture
Read Chapter 18 Sections 18.6
(Chapter 11 Sections 11.6 abridged)
Exercises to be completed before next lecture
S&S 18.27, 18.29
(11.27, 11.29 abridged)