Chapter 10.1 — Inference for Simple Linear Regression

Stat 226 – Introduction to Business Statistics I
Is the linear relationship between x and y significant or not?
Spring 2009
Professor: Dr. Petrutza Caragea
Section A
Tuesdays and Thursdays 9:30-10:50 a.m.
Do New Jersey banks serve minority communities?
Financial institutions have a legal and social responsibility to serve all
communities. Do banks adequately serve both inner-city and suburban
neighborhoods, both poor and wealthy communities? In New Jersey, banks
have been charged with withdrawing from urban areas with a high
percentage of minorities. To examine this charge, a regional New Jersey
newspaper, the Asbury Park Press, compiled county-by-county data on the
number (y) of people in each county per branch bank in the county and
the percentage (x) of the population in each county that is minority.
Chapter 10, Section 10.1
Inference for simple linear regression
Source: McClave, J.T., Benson, P.G., Sincich, T. (2007), Statistics for Business and Economics, 10th ed., Prentice Hall, Upper
Saddle River, NJ.
Stat 226 (Spring 2009)
Introduction to Business Statistics I
Section 10.1
data:

      county          number of people     percentage of
                      per bank branch      minority population
  1   Atlantic             3,073                23.3
  2   Bergen               2,095                13
  3   Burlington           2,905                17.8
  4   Camden               3,330                23.4
  5   Cape May             1,321                 7.3
  ...
 21   Warren               2,349                 2.8

If the charge against New Jersey banks holds true, we should see an increase in the
number of people per bank branch (i.e. fewer bank branches) as the minority
percentage of the population increases.
[Figure: scatterplot of number of people per bank branch against percentage of minority population]
population regression line
Because we have complete data for all 21 New Jersey counties and only
New Jersey is of interest to us, we have data on the entire population.
The least squares regression line fitted through the 21 observations
therefore corresponds to the so-called population regression line
µy = β0 + β1 x
β0 and β1 are population parameters describing the linear relationship
between x and y in the entire population.
[Figure: scatterplot of number of people per bank branch against percentage of minority population]
Correlation:
LS regression line:
data for the 21 New Jersey counties (the entire population):
[Figure: scatterplot of number of people per bank branch against percentage of minority population]
Note:
The population regression line µy = β0 + β1 x describes the linear
relationship between the explanatory variable x and µy, i.e. the
relationship between x and the average/mean value of y for a given x.
If we are interested in describing each individual y in the population, we
need to account for the fact that not all y are equal to µy and therefore
will not fall on the straight line but will deviate from the line by some error
ε:
y = β0 + β1 x + ε, where µy = β0 + β1 x
New Jersey counties:
The simple linear regression model
y = β0 + β1 x + ε
allows us to describe the linear relationship between each yi and the
corresponding value of the explanatory variable xi (i = 1, 2, ..., 21), i.e.
yi = β0 + β1 xi + εi
The εi's are independent and normally distributed with mean 0 and
standard deviation σ. This is an important assumption to which we will
come back later.
Typically we are not as fortunate and won't be able to observe an entire
population. Hopefully though, with the help of a representative random
sample, we will still obtain reliable information about the true underlying
linear relationship in the population.
Recall the general form of the fitted least squares regression line from
Chapter 2
ŷ = a + bx,
where a and b are obtained from the sample as follows:
b = r · (sy / sx) and a = ȳ − b · x̄
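The two formulas for b and a can be sketched in a few lines of Python. This is only an illustration: it uses the six counties that are actually listed in the data table of these notes (the other 15 are elided), so the numbers it prints will not match the fit for all 21 counties. The function name `least_squares_line` is our own choice.

```python
import math

def least_squares_line(x, y):
    """Return (a, b, r) for the fitted line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # sample correlation coefficient r
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = cross / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x          # slope: b = r * sy / sx
    a = y_bar - b * x_bar      # intercept: a = y-bar - b * x-bar
    return a, b, r

# Only the six counties shown in the notes (illustrative subset, not all 21)
minority_pct = [23.3, 13.0, 17.8, 23.4, 7.3, 2.8]
people_per_branch = [3073, 2095, 2905, 3330, 1321, 2349]

a, b, r = least_squares_line(minority_pct, people_per_branch)
print(f"y-hat = {a:.1f} + {b:.2f} x   (r = {r:.3f})")
```

Because a = ȳ − b · x̄, the fitted line always passes through the point (x̄, ȳ), which is a quick sanity check on any hand calculation.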
We can use a to estimate β0 and b to estimate β1:
Both a and b are sample statistics and will vary from sample to sample.
If we took another sample we would get different values of a and b
(sampling variability).
Consequently, a and b have a sampling distribution.
The textbook unfortunately switches notation from Chapter 2 to Chapter 10. In
the following we will denote a as b0 and b as b1.
Knowing the sampling distribution of b0 and b1 allows us to:
1. construct confidence intervals for the slope β1 and intercept β0
2. test whether the response y depends linearly on x, i.e. whether there is a
significant linear relationship between x and y in the population
Generally, we will focus on the slope β1 because the value of the slope
determines whether or not a linear relationship between x and y exists.
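To make the idea of a sampling distribution concrete, here is a small simulation sketch. Everything numeric in it is an assumption made for illustration: the population values `BETA0`, `BETA1` and `SIGMA`, the 21 x-values, and the number of repetitions are hypothetical and are not taken from the New Jersey data. It repeatedly generates samples from the model y = β0 + β1 x + ε and records the fitted slope b1 each time.

```python
import random

def fit_slope_intercept(x, y):
    """Least squares estimates (b0, b1) for one sample."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical population parameters (assumptions, not estimates from the notes)
BETA0, BETA1, SIGMA = 2000.0, 35.0, 400.0
rng = random.Random(226)                     # fixed seed so the run is repeatable
x_values = [2.5 * i for i in range(1, 22)]   # 21 hypothetical x-values

slopes = []
for _ in range(2000):
    # draw a new sample: same x-values, fresh normal errors with mean 0 and sd SIGMA
    y_values = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in x_values]
    _, b1 = fit_slope_intercept(x_values, y_values)
    slopes.append(b1)

mean_b1 = sum(slopes) / len(slopes)
sd_b1 = (sum((s - mean_b1) ** 2 for s in slopes) / (len(slopes) - 1)) ** 0.5
print(f"mean of b1 over samples: {mean_b1:.2f}")
print(f"standard deviation of b1 over samples: {sd_b1:.2f}")
```

The slopes differ from sample to sample but cluster around the assumed β1; their spread is what the standard error SEb1 estimates from a single sample.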
Note: in order to test whether a linear relationship exists between x
and y, we need to test whether the population slope β1 = 0.
Why? If β1 = 0, we get the following regression model
y = β0 + β1 · x + ε
  = β0 + 0 · x + ε
  = β0 + ε
If β1 = 0 ⇒ x does not help explain the behavior of y.
assumptions for regression inference
Before we construct CIs and tests, we should have a look at the
assumptions that are necessary for inference on regression parameters:
1. simple random sample (ensuring independence of the y's)
2. linear relationship between x and µy
3. the standard deviation of the responses about the population line is the
same for all values of the explanatory variable x
4. the response y varies according to a normal distribution about the
population regression line for all values of the explanatory variable x
checking the assumptions
1. independence:
2. linear relationship:
1. normality:
2. constant variance:
confidence intervals for slope β1
recall: the general form of a confidence interval is given by
estimate ± margin of error,
where the margin of error corresponds to critical value × standard error.
The CI for the slope β1 is of the same form:
b1 ± t* SEb1,
where the standard error SEb1 can be obtained from the JMP output.
Note that the critical value t* now corresponds to a t-distribution with
df = n − 2.
New Jersey example: Let’s construct a 95% confidence interval for the
slope β1 :
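A minimal sketch of this calculation, not necessarily the worked solution from class: the slope estimate and its standard error are read from the JMP Parameter Estimates table reproduced later in these notes, and the critical value t* = 2.093 is the usual t-table entry for 95% confidence with df = 19. Hard-coding t* is our simplification.

```python
# Values read from the JMP Parameter Estimates table for the New Jersey data
b1 = 35.287737       # estimated slope
se_b1 = 7.676707     # standard error of the slope
n = 21               # number of counties
df = n - 2           # degrees of freedom for inference on the slope

# Critical value for 95% confidence and df = 19, read from a t-table
# (assumption: hard-coded here instead of computed)
t_star = 2.093

margin_of_error = t_star * se_b1
lower = b1 - margin_of_error
upper = b1 + margin_of_error
print(f"df = {df}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

With these inputs the interval runs from roughly 19.2 to 51.4 people per branch for each additional percentage point of minority population, and it does not contain 0.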
Interpretation cont’d:
testing for a significant linear relationship, i.e. β1 ≠ 0
example: New Jersey data example
Is there a significant linear relationship between the percentage of the
minority population and the number of people per bank branch?
Recall the population regression line
µy = β0 + β1 x
We are interested in showing that β1 is significantly different from zero,
i.e. β1 ≠ 0, because this implies that there is indeed a linear
relationship between x and y.
We therefore set up the following hypotheses
H0: β1 = 0 (no linear relationship between x and y)
Ha: β1 ≠ 0 (there exists a linear relationship between x and y)
Note: If there exists a linear relationship (β1 ≠ 0), then this linear
relationship can be either positive or negative
β1 < 0 ⇒ negative relationship
β1 > 0 ⇒ positive relationship
If we are specifically interested in showing either a positive or negative
relationship we need to set up the alternatives accordingly, i.e.
Ha: β1 < 0 for negative linear relationship
Ha: β1 > 0 for positive linear relationship
If we are simply interested in showing that a linear relationship exists and
the direction (either positive or negative) is not important, we test H0
against the two-sided alternative Ha: β1 ≠ 0
A general form of the test statistic is given by
t = (b1 − β1) / SEb1
with df = n − 2 for a t-distribution, where b1 is the estimate of β1 based on the sample.
Under the null hypothesis we assume β1 = 0; the test statistic therefore
simplifies to
t = (b1 − 0) / SEb1 = b1 / SEb1
Often this test statistic is called the t-ratio (e.g. in JMP)
finding the p-value
p-values are found in exactly the same way as before.
Depending on the alternative, the p-value corresponds to
Ha: β1 ≠ 0
Ha: β1 > 0
Ha: β1 < 0
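The following sketch computes the t-ratio and the p-value for each of the three alternatives, using the slope and standard error from the JMP output for the New Jersey data. It assumes the usual correspondence: the two-tail area beyond |t| for Ha: β1 ≠ 0, the upper-tail area for Ha: β1 > 0, and the lower-tail area for Ha: β1 < 0. To stay within the Python standard library, the tail area is obtained by numerically integrating the t density with Simpson's rule; the helper names are our own, and a statistics package would normally do this step.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on the density between 0 and |t|."""
    t_abs = abs(t)
    h = t_abs / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    upper = 0.5 - area_0_to_t              # P(T >= |t|) by symmetry
    return upper if t >= 0 else 1 - upper

def slope_p_value(t, df, alternative="two-sided"):
    if alternative == "greater":           # Ha: beta1 > 0
        return t_upper_tail(t, df)
    if alternative == "less":              # Ha: beta1 < 0
        return 1 - t_upper_tail(t, df)
    return 2 * t_upper_tail(abs(t), df)    # Ha: beta1 != 0

b1, se_b1, n = 35.287737, 7.676707, 21     # from the JMP output
t_ratio = (b1 - 0) / se_b1                 # test statistic under H0: beta1 = 0
print(f"t-ratio = {t_ratio:.2f}, df = {n - 2}")
for alt in ("two-sided", "greater", "less"):
    print(alt, slope_p_value(t_ratio, n - 2, alt))
```

The t-ratio agrees with the value 4.60 that JMP reports for the slope, and the two-sided p-value is consistent with JMP's rounded 0.0002.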
Note: JMP gives p-values corresponding to a two-sided alternative, i.e.
Ha: β1 ≠ 0. We need to divide the JMP p-value by 2 if we are
interested in testing a one-sided alternative such as Ha: β1 > 0 or
Ha: β1 < 0 (provided the sign of b1 agrees with the direction of Ha)!
decision rule: as before, we reject H0 if p-value ≤ α
conclusion: Rejecting H0 implies that there exists a statistically
significant linear relationship between x and y.
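The halving rule and the decision rule can be written down as two small helper functions. This is a hedged sketch: the function names are hypothetical, the significance level α = 0.05 is an assumed default (the notes do not fix α), and the inputs are the rounded slope values from the JMP output. When the sign of the t-ratio points away from the one-sided alternative, the one-sided p-value is 1 minus half the two-sided value instead of half of it.

```python
def one_sided_p_from_jmp(p_two_sided, t_ratio, alternative):
    """Convert a two-sided p-value into a one-sided one.

    alternative is "greater" (Ha: beta1 > 0) or "less" (Ha: beta1 < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # does the observed t-ratio point in the direction of Ha?
    sign_agrees = (t_ratio > 0) if alternative == "greater" else (t_ratio < 0)
    half = p_two_sided / 2
    return half if sign_agrees else 1 - half

def reject_h0(p_value, alpha=0.05):
    """Decision rule: reject H0 if p-value <= alpha (alpha = 0.05 is an assumption)."""
    return p_value <= alpha

# Rounded values as displayed by JMP for the slope
p_jmp, t_ratio = 0.0002, 4.60

p_greater = one_sided_p_from_jmp(p_jmp, t_ratio, "greater")
p_less = one_sided_p_from_jmp(p_jmp, t_ratio, "less")
print("Ha: beta1 > 0:", p_greater, "reject H0:", reject_h0(p_greater))
print("Ha: beta1 < 0:", p_less, "reject H0:", reject_h0(p_less))
```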
Does this conclusion imply a change in the response y can be caused by a
change in the explanatory variable x?
Linear Fit
number of people per bank branch = 2082.0153 + 35.287737 * percentage of
minority population
Summary of Fit
RSquare                       0.526538
RSquare Adj                   0.501619
Root Mean Square Error        400.2546
Mean of Response              2693.333
Observations (or Sum Wgts)    21
Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio
Model        1         3385090.2        3385090    21.1299
Error       19         3043870.4         160204    Prob > F
C. Total    20         6428960.7                   0.0002*
Parameter Estimates
Term                                 Estimate     Std Error    t Ratio    Prob>|t|
Intercept                            2082.0153    159.107      13.09      <.0001*
percentage of minority population    35.287737    7.676707     4.60       0.0002*
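The pieces of the JMP output are tied together by a few standard identities, which the short check below recomputes from the sums of squares and the slope row. The variable names are our own, and the formulas are the usual ones for simple linear regression with one explanatory variable: RSquare = SS Model / SS Total, Root Mean Square Error = √(Mean Square Error), F Ratio = MS Model / MS Error, and F Ratio = (t Ratio of the slope)².

```python
import math

# Values copied from the JMP output
ss_model, ss_error, ss_total = 3385090.2, 3043870.4, 6428960.7
df_model, df_error = 1, 19
b1, se_b1 = 35.287737, 7.676707

n = df_model + df_error + 1                       # 21 observations
r_square = ss_model / ss_total                    # share of variation explained
r_square_adj = 1 - (1 - r_square) * (n - 1) / (n - 2)
ms_error = ss_error / df_error
rmse = math.sqrt(ms_error)                        # estimate of sigma
f_ratio = (ss_model / df_model) / ms_error
t_ratio = b1 / se_b1

print(f"RSquare      {r_square:.6f}")
print(f"RSquare Adj  {r_square_adj:.6f}")
print(f"RMSE         {rmse:.4f}")
print(f"F Ratio      {f_ratio:.4f}")
print(f"t Ratio      {t_ratio:.2f}   (t squared = {t_ratio ** 2:.4f})")
```

Each recomputed value matches the corresponding entry in the output up to rounding, which is a useful way to see that the F test in the Analysis of Variance table and the t test for the slope are the same test here.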
Example: New Jersey banks