Chapter 13 Notes

Chapter 13 Simple Linear Regression and Correlation: Inferential Methods
13.1: Simple Linear Regression Model
Deterministic Relationship:
A description of the relationship between two variables that are not deterministically related can
be given by a probabilistic model.
The equation for an additive probabilistic model is:
The simple linear regression model assumes that there is a line with y-intercept α and slope β,
called the population regression line.
When a value of the independent variable x is fixed and an observation on the dependent variable y
is made,
[Figure: x and y axes, with the values x1 and x2 marked on the x-axis]
Basic Assumptions of the Simple Linear Regression Model
1.
2.
3.
4.
Let’s look at the heights and weights of a population of adult women.
[Figure: Weight versus Height, with heights of 60, 62, 64, 66, and 68 marked on the horizontal axis]
We use 𝑦̂ = 𝑎 + 𝑏𝑥 to estimate the true population regression line.
Medical researchers have noted that adolescent females are much more likely to deliver low-birth-weight
babies than are adult females. Because low-birth-weight babies have higher mortality rates,
a number of studies have examined the relationship between birth weight and mother’s age for
babies born to young mothers.
The following data are on x = maternal age (in years) and y = birth weight of baby (in grams).
x    15    17    18    15    16    19    17    16    18    19
y  2289  3393  3271  2648  2897  3327  2970  2535  3138  3573
The statistic for estimating the variance σ² is
The estimate for the standard deviation σ is
Why are there n − 2 degrees of freedom?
s_e =
r² =
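As a check on hand or calculator work, the least-squares line, s_e, and r² for the maternal age data can be sketched in a few lines of Python. This is only an illustration: the variable names (s_xx, s_xy, ss_resid, ss_total) are chosen here and are not from the notes, and the formulas assumed are the usual ones, b = Sxy/Sxx, a = ȳ − b·x̄, s_e = sqrt(SSResid/(n − 2)), and r² = 1 − SSResid/SSTo.

```python
import math

# Maternal age (years) and baby's birth weight (grams) from the data above.
x = [15, 17, 18, 15, 16, 19, 17, 16, 18, 19]
y = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates of the slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residuals, residual sum of squares, and total sum of squares.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_resid = sum(r ** 2 for r in residuals)
ss_total = sum((yi - y_bar) ** 2 for yi in y)

# Two parameters (a and b) are estimated, so the divisor is n - 2.
s_e = math.sqrt(ss_resid / (n - 2))
r_squared = 1 - ss_resid / ss_total

print(f"y-hat = {a:.2f} + {b:.2f} x")
print(f"s_e = {s_e:.2f}, r^2 = {r_squared:.3f}")
```

The divisor n − 2 in s_e reflects the two quantities (intercept and slope) that were estimated from the data before the residuals could be computed.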
Practice Problems:
1. Can the number of watts be used to help determine the price of a microwave? The table
below contains data on microwave ovens from Target.
Power (watts)  1100   700   700  1200  1200  1200  1200  1000  1000  1000   700
Price ($)        80    80    50    90   100    90   110    90    75    80    63
a. Describe in words the X and Y variables and write a simple linear regression model in
context.
b. Check that the basic assumptions are met.
c. Estimate the slope and intercept of the line using your calculator.
d. Interpret the slope of the line in context.
e. Predict the price of a microwave that has a power of 900 watts.
f. What range of possible power values are you comfortable using this model to predict
price? Explain why.
g. Obtain the residuals: Go to [Stat] "1: Edit". Select L3 with the arrow keys. [Enter] [2nd]
"list". Scroll down and select RESID. [Enter] [Enter] again. Fill in the residuals in the
table below:
Power (watts)  1100   700   700  1200  1200  1200  1200  1000  1000  1000   700
Price ($)        80    80    50    90   100    90   110    90    75    80    63
RESID
h. Compute SSResid: Obtain 1-Var Stats for the RESID list; SSResid is Σx²:
i. What are the values of s_e and r²?
j. Interpret s_e and r² in the context of the problem.
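The calculator steps in parts (c), (e), (g), and (h) can be mirrored in Python as a way to check answers. This is a rough sketch, not the method the notes require: the list named residuals plays the role of the calculator's RESID list, and all variable names are made up for illustration.

```python
import math

# Microwave data from the table in Problem 1.
power = [1100, 700, 700, 1200, 1200, 1200, 1200, 1000, 1000, 1000, 700]
price = [80, 80, 50, 90, 100, 90, 110, 90, 75, 80, 63]

n = len(power)
x_bar = sum(power) / n
y_bar = sum(price) / n

s_xx = sum((x - x_bar) ** 2 for x in power)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(power, price))

# Part (c): slope and intercept of the least-squares line.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Part (e): predicted price at 900 watts (inside the observed range of 700 to 1200).
predicted_900 = a + b * 900

# Part (g): residuals, the analogue of the calculator's RESID list.
residuals = [y - (a + b * x) for x, y in zip(power, price)]

# Part (h): SSResid is the sum of the squared residuals
# (the "sum of x squared" line in 1-Var Stats for RESID).
ss_resid = sum(r ** 2 for r in residuals)

# Part (i): s_e and r^2.
ss_total = sum((y - y_bar) ** 2 for y in price)
s_e = math.sqrt(ss_resid / (n - 2))
r_squared = 1 - ss_resid / ss_total

print(f"price-hat = {a:.3f} + {b:.5f} * power")
print(f"predicted price at 900 W: {predicted_900:.2f}")
print(f"SSResid = {ss_resid:.2f}, s_e = {s_e:.2f}, r^2 = {r_squared:.3f}")
```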
Homework: 1, 3-5, 8, 11
13.2: Inferences About the Slope of the Population Regression Line
Properties of the Sampling Distribution of b (Since β is almost always unknown, it must be
estimated from independently selected observations. The slope b of the least-squares line gives a
point estimate for β.)
When the four basic assumptions of the simple linear regression model are satisfied, the following
statements are true:
1.
2.
3.
Confidence Interval for β
When the four basic assumptions of the simple linear regression model are satisfied, a confidence
interval for β, the slope of the population regression line, has the form:
Is cardiovascular fitness (as measured by time to exhaustion from running on a treadmill) related to
an athlete’s performance in a 20-km ski race?
The following data on x = treadmill time to exhaustion (in minutes) and y = 20-km ski time (in
minutes) were taken from the article “Physiological Characteristics and Performance of Top U.S.
Biathletes” (Medicine and Science in Sports and Exercise, 1995):
x   7.7   8.4   8.7   9.0   9.6   9.6  10.0  10.2  10.4  11.0  11.7
y  71.0  71.4  65.0  68.7  64.4  69.4  63.0  64.6  66.9  62.6  61.7
Find a 95% confidence interval for the slope of the true regression line. (LinRegTInt)
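For comparison with the LinRegTInt output, the interval can also be sketched by hand in Python. The sketch assumes the usual form b ± (t critical value)·s_b with s_b = s_e / sqrt(Sxx) and df = n − 2. The critical value 2.262 is typed in from a t table for df = 9 rather than computed, and the variable names are illustrative only.

```python
import math

# Treadmill time to exhaustion (min) and 20-km ski time (min) from the data above.
x = [7.7, 8.4, 8.7, 9.0, 9.6, 9.6, 10.0, 10.2, 10.4, 11.0, 11.7]
y = [71.0, 71.4, 65.0, 68.7, 64.4, 69.4, 63.0, 64.6, 66.9, 62.6, 61.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx
a = y_bar - b * x_bar

ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(ss_resid / (n - 2))

# Estimated standard deviation of the statistic b.
s_b = s_e / math.sqrt(s_xx)

# Assumption: t critical value for 95% confidence with df = n - 2 = 9,
# read from a t table rather than computed.
t_crit = 2.262

lower = b - t_crit * s_b
upper = b + t_crit * s_b

print(f"b = {b:.4f}, s_b = {s_b:.3f}")
print(f"95% CI for beta: ({lower:.3f}, {upper:.3f})")
```

Because both endpoints come out negative, the interval is consistent with longer treadmill times being associated with shorter ski times.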
Interpretation:
Summary of Hypothesis Tests Concerning β
Null hypothesis:
Test Statistic:
Alternative Hypothesis:
P-value:
For this test to be appropriate the four basic assumptions of the simple regression model must be
met:
1.
2.
3.
4.
A slope of zero:
The Model Utility Test for Simple Linear Regression
The model utility test for simple linear regression is the test of:
Test Statistic:
Biathletes Revisited: (LinRegTTest)
Hypotheses:
t=
P-value:
α:
df:
Conclusion:
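The LinRegTTest quantities for the biathlete data can be reproduced approximately in Python. This sketch assumes the model utility test of H0: β = 0 against Ha: β ≠ 0 with test statistic t = b / s_b and df = n − 2. Since the standard library has no t distribution, the two-sided P-value is approximated by numerically integrating the t density with Simpson's rule. The helper names t_pdf and two_sided_p_value are made up for this illustration.

```python
import math

x = [7.7, 8.4, 8.7, 9.0, 9.6, 9.6, 10.0, 10.2, 10.4, 11.0, 11.7]
y = [71.0, 71.4, 65.0, 68.7, 64.4, 69.4, 63.0, 64.6, 66.9, 62.6, 61.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(ss_resid / (n - 2))
s_b = s_e / math.sqrt(s_xx)

# Test statistic for H0: beta = 0 (hypothesized value 0).
df = n - 2
t_stat = (b - 0) / s_b


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_value, df, steps=2000):
    """Approximate 2 * P(T > |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return max(0.0, 2 * (0.5 - central_area))


p_value = two_sided_p_value(t_stat, df)
alpha = 0.05  # assumed significance level

print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```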
Practice Problems:
2. How is resting body temperature dependent on heart rate? The Minitab output below is an
analysis of body temperature data from the Journal of Statistics Education Data Archive
using a simple linear regression model.
a. Perform a hypothesis test for the model utility, including stating the hypotheses, checking
assumptions, setting a reasonable value for α, giving the formula for the test statistic,
identifying the relevant values from the Minitab output, and making a conclusion.
b. Construct a 95% confidence interval for the slope of the regression line (include all necessary
information and justifications!)
Homework: 13, 15, 18-21
13.3: Checking Model Accuracy
The simple linear regression model is
We must estimate these deviations using the residuals from the estimated line. Thus, we use the
residuals to check our assumptions.
Residual Analysis:

Standardize the residuals to look at their magnitudes

A Look at Standardized Residual Plots
Biathletes Revisited:
r=
sr=
Normal Probability Plot:
Standardized Residual Plot:
Residual Plot:
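The residuals and standardized residuals for the biathlete data can be computed as sketched below, ready to be plotted against x or used in a normal probability plot. The sketch assumes the common leverage-based standardization, residual / (s_e · sqrt(1 − 1/n − (x − x̄)² / Sxx)). If the course uses a different definition, the last step should be adjusted. Variable names are illustrative.

```python
import math

x = [7.7, 8.4, 8.7, 9.0, 9.6, 9.6, 10.0, 10.2, 10.4, 11.0, 11.7]
y = [71.0, 71.4, 65.0, 68.7, 64.4, 69.4, 63.0, 64.6, 66.9, 62.6, 61.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Ordinary residuals: observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s_e = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Assumed standardization: divide each residual by its estimated standard
# deviation, which depends on how far x is from the mean of the x values.
standardized = [
    r / (s_e * math.sqrt(1 - 1 / n - (xi - x_bar) ** 2 / s_xx))
    for xi, r in zip(x, residuals)
]

for xi, r, z in zip(x, residuals, standardized):
    print(f"x = {xi:5.1f}  residual = {r:6.2f}  standardized = {z:6.2f}")
```

Standardized residuals with absolute value well above 2 would flag unusual observations, and a pattern in the plot against x would suggest curvature or nonconstant variance.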
Practice Problems:
3. The Federal Trade Commission evaluates cigarettes to determine their tar and carbon
monoxide contents. An analysis of a random sample of 35 brands using a simple linear
regression model had the following results:
a. State the simple linear regression model and its assumptions.
b. What is the equation of the estimated regression line?
c. Identify the values of s_e and r². How well does this model perform?
d. Use the scatterplot to roughly check the model assumptions.
e. Examine the five plots below.
Which of the plots shows an unusual observation?
Which of the plots shows a potentially influential observation?
Which of the plots shows evidence of nonconstant variance?
Which of the plots shows evidence of a curvilinear relationship?
f. Which of the following plots show evidence of non-normality? Explain.
g. Do the following residual plots of the cigarette data show any interesting features that lead
you to feel the simple linear regression model assumptions have been met?
Homework: 27, 28, 31