Sec2-6 - Personal.psu.edu

STAT 250
Dr. Kari Lock Morgan
Simple Linear Regression
SECTION 2.6
• Least squares line
• Interpreting coefficients
• Prediction
• Cautions
Statistics: Unlocking the Power of Data
Lock5
Want More Stats???
• If you have enjoyed learning how to analyze data, and
want to learn more:
• STAT 460 (Intermediate Applied Statistics)
• STAT 461 (Analysis of Variance)
• STAT 462 (Applied Regression Analysis)
• All applied, only prerequisite is STAT 200 or 250
• If you like math and want to learn more of the
mathematical theory behind what we’ve learned:
• take STAT 414 (Probability)
• and then STAT 415 (Mathematical Statistics)
• Prerequisite: MATH 230 or MATH 231
MODELING
Crickets and Temperature
• Can you estimate the temperature on a
summer evening, just by listening to crickets
chirp?
• We will fit a model to predict temperature
based on cricket chirp rate
Crickets and Temperature
[Figure: scatterplot of temperature (response variable, y) against cricket chirp rate (explanatory variable, x)]
Linear Model
A linear model predicts a response
variable, y, using a linear function of
explanatory variables
Simple linear regression predicts one
response variable, y, as a linear
function of one explanatory variable, x
Regression Line
Goal: Find a straight line that best fits the data in a
scatterplot
Equation of the Line
The estimated regression line is
$\hat{y} = \beta_0 + \beta_1 x$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, x is the explanatory variable, and $\hat{y}$ is the
predicted response variable.
• Slope: increase in predicted y for every unit
increase in x
• Intercept: predicted y value when x = 0
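The two definitions can be checked numerically. The following is a minimal Python sketch; the coefficients 2.0 and 0.5 are made-up values chosen only for illustration and are not estimates from any data set in these slides.

```python
def predict(x, b0, b1):
    """Predicted response for explanatory value x on the line y-hat = b0 + b1*x."""
    return b0 + b1 * x

# Hypothetical coefficients, for illustration only.
b0, b1 = 2.0, 0.5

# Intercept: the predicted y when x = 0.
intercept_check = predict(0, b0, b1)

# Slope: the change in predicted y when x increases by one unit,
# no matter where on the line we start.
slope_check = predict(11, b0, b1) - predict(10, b0, b1)

print(intercept_check, slope_check)
```

The same one-unit difference would come out for any starting x, which is what makes a single slope number a complete summary of a straight line.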
Regression in Minitab
Stat -> Regression -> Fitted Line Plot
Regression Model
𝑇𝑒𝑚𝑝 = 37.68 + 0.23𝐶ℎ𝑖𝑟𝑝𝑠
Which is a correct interpretation?
a) The average temperature is 37.68
b) For every extra 0.23 chirps per minute, the
predicted temperature increases by 1 degree
c) Predicted temperature increases by 0.23
degrees for each extra chirp per minute
d) For every extra 0.23 chirps per minute, the
predicted temperature increases by 37.68
Units
• It is helpful to think about units when
interpreting a regression equation
$\hat{y} = \beta_0 + \beta_1 x$
• $\hat{y}$ and $\beta_0$: y units
• $\beta_1$: y units per x unit
• x: x units
𝑇𝑒𝑚𝑝 = 37.68 + 0.23𝐶ℎ𝑖𝑟𝑝𝑠
• Temp and 37.68: degrees
• 0.23: degrees per (chirp per minute)
• Chirps: chirps per minute
Explanatory and Response
• Unlike correlation, for linear regression it
does matter which is the explanatory
variable and which is the response
𝑇𝑒𝑚𝑝 = 37.68 + 0.23𝐶ℎ𝑖𝑟𝑝𝑠
𝐶ℎ𝑖𝑟𝑝𝑠 = −157.8 + 4.25𝑇𝑒𝑚𝑝
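One way to see that the roles matter: if regression were symmetric, solving the first equation for Chirps would reproduce the second. The Python check below uses only the two fitted equations quoted above (the variable names are illustrative). The slide coefficients are rounded, so treat this as an illustration rather than a proof.

```python
# Fitted line for predicting temperature from chirps (from the slide above).
b0, b1 = 37.68, 0.23

# Algebraically solving Temp = b0 + b1*Chirps for Chirps gives
# Chirps = -b0/b1 + (1/b1)*Temp.
inverted_intercept = -b0 / b1
inverted_slope = 1 / b1

# Fitted line for predicting chirps from temperature (from the slide above).
fitted_intercept, fitted_slope = -157.8, 4.25

# In general, the product of the two fitted slopes equals r squared.
implied_r_squared = b1 * fitted_slope

print(round(inverted_intercept, 1), round(inverted_slope, 2))
print(fitted_intercept, fitted_slope)
print(round(implied_r_squared, 4))
```

The algebraically inverted line is roughly Chirps = −163.8 + 4.35 Temp, not the fitted −157.8 + 4.25 Temp. Least squares minimizes vertical distances in the response, so swapping the variables changes which distances are minimized; the two lines coincide only when the correlation is exactly ±1.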
Regression Line
Which plot goes with the line 𝑦 = 𝑥 + 10?
[Figure: four scatterplots labeled (a), (b), (c), and (d)]
Prediction
• The regression equation can be used to
predict y for a given value of x
𝑇𝑒𝑚𝑝 = 37.68 + 0.23𝐶ℎ𝑖𝑟𝑝𝑠
• If you listen and hear crickets chirping
about 140 times per minute, your best
guess at the outside temperature is
$37.68 + 0.23 \times 140 = 69.88$
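The same arithmetic can be wrapped in a small function so that other chirp rates can be tried. This is only a sketch; the function name `predict_temp` is illustrative, and the coefficients are the ones from the fitted line above.

```python
def predict_temp(chirps_per_minute):
    """Predicted temperature (degrees) from Temp = 37.68 + 0.23 * Chirps."""
    return 37.68 + 0.23 * chirps_per_minute

prediction = predict_temp(140)
print(round(prediction, 2))
```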
Prediction
If the crickets are
chirping about 180
times per minute,
your best guess at
the temperature is
(a) 60
(b) 70
(c) 80
Prediction
𝑇𝑒𝑚𝑝 = 37.68 + 0.23𝐶ℎ𝑖𝑟𝑝𝑠
The intercept tells us that the predicted
temperature when the crickets are not
chirping at all is 37.68. Do you think this
is a good prediction?
(a) Yes
(b) No
Regression Line
• How do we find the best-fitting line?
Predicted and Actual Values
• The actual response value, y, is the response
value observed for a particular data point
• The predicted response value, $\hat{y}$, is the
response value that would be predicted for a
given x value, based on a model
• In linear regression, the predicted values fall
on the regression line directly above each x
value
• The best fitting line is that which makes the
predicted values closest to the actual values
Predicted and Actual Values
[Figure: scatterplot with the regression line, with one point's actual value y and predicted value $\hat{y}$ marked]
Residual
The residual for each data point is
actual − predicted = $y - \hat{y}$
• The residual is also the signed vertical distance from
each point to the line: positive when the point lies above the line, negative when it lies below
Residual
residual = $y - \hat{y} = 63.5 - 61.44 = 2.06$
• Want to make all the residuals as small as possible.
• How would you measure this?
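One tempting answer is to add up the residuals, but positive and negative residuals cancel. The Python sketch below shows this with made-up numbers; both the data and the candidate line are hypothetical, chosen so the arithmetic is easy to follow.

```python
# Hypothetical data and a hypothetical candidate line y-hat = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [4, 4, 8, 8]
b0, b1 = 1, 2

# Residual for each point: actual minus predicted.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Positive and negative residuals cancel in a plain sum...
total = sum(residuals)
# ...but squaring makes every miss count, whatever its sign.
sum_of_squares = sum(r ** 2 for r in residuals)

print(residuals, total, sum_of_squares)
```

Here every point misses the line by one unit, yet the plain sum is zero; the sum of squares is not fooled by the cancellation.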
Least Squares Regression
Least squares regression chooses
the regression line that minimizes
the sum of squared residuals
minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
*Note: you will never have to find the equation by hand; this is
just letting you know what technology is doing…
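For the curious, here is a rough Python sketch of what the technology might be doing. The slides do not say which formulas Minitab uses; this sketch assumes the standard closed-form least squares solution, and the data are made up for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (actual - predicted)^2 over all data points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)

# Nudging either coefficient away from the fitted values should never
# give a smaller sum of squared residuals.
nudged = [
    sum_squared_residuals(xs, ys, b0 + db0, b1 + db1)
    for db0 in (-0.5, 0, 0.5)
    for db1 in (-0.2, 0, 0.2)
]
print(b0, b1, best, min(nudged) >= best)
```

Trying a handful of nudged lines is not a proof of minimality, but it gives a feel for what "minimizes the sum of squared residuals" means.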
Regression Caution 1
• Do not use the regression equation or line
to predict outside the range of x values
available in your data (do not extrapolate!)
• If none of the x values are anywhere near
0, then the intercept is meaningless!
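One way to build this caution into code is to refuse predictions outside the observed x range. The sketch below uses the cricket coefficients from earlier in these slides; the observed range of 80 to 200 chirps per minute is an assumed placeholder, not a value taken from the data, and the function name is illustrative.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Predict from the line only if x lies inside the observed x range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "refusing to extrapolate"
        )
    return intercept + slope * x

# ASSUMED observed range of chirps per minute (placeholder, not from the data).
OBSERVED_MIN, OBSERVED_MAX = 80, 200

inside = predict_within_range(140, 37.68, 0.23, OBSERVED_MIN, OBSERVED_MAX)

try:
    # Zero chirps is far from any assumed observed value.
    predict_within_range(0, 37.68, 0.23, OBSERVED_MIN, OBSERVED_MAX)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True

print(round(inside, 2), extrapolation_blocked)
```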
Alkalinity and Mercury
Does the relationship look linear?
(a) Yes
(b) No
Regression Caution 2
• Computers will calculate a regression line for
any two quantitative variables, even if they are
not associated or if the association is not linear
• ALWAYS PLOT YOUR DATA!
• The regression line/equation should only be
used if the association is approximately linear
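A small Python sketch of how a fitted line can mislead when the association is not linear. The data are hypothetical and follow a perfect curve, and the fitting function assumes the standard closed-form least squares solution.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data following a perfect curve, y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

intercept, slope = fit_least_squares(xs, ys)
print(intercept, slope)
```

The computation returns a flat line at y = 2 with slope 0, even though y is completely determined by x. Only a plot reveals the curve.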
Regression Caution 3
• Outliers (especially outliers in both
variables) can be very influential on the
regression line
• ALWAYS PLOT YOUR DATA!
http://illuminations.nctm.org/LessonDetail.aspx?ID=L455
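To see how influential a single outlier can be, the Python sketch below fits a line with and without one extreme point. The data are made up for illustration, and the fitting function assumes the standard closed-form least squares solution.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data lying exactly on the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
_, slope_without = fit_least_squares(xs, ys)

# Add one point that is an outlier in both variables.
_, slope_with = fit_least_squares(xs + [20], ys + [0])

print(slope_without, round(slope_with, 2))
```

With these numbers, one added point is enough to turn a slope of 1 into a negative slope.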
Life Expectancy and Birth Rate
Coefficients:
   (Intercept)  LifeExpectancy
       83.4090         -0.8895
Which of the following interpretations is correct?
(a) A decrease of 0.89 in the birth rate corresponds to a
1 year increase in predicted life expectancy
(b) Increasing life expectancy by 1 year will cause birth
rate to decrease by 0.89
(c) Both
(d) Neither
Regression Caution 4
• Higher values of x may lead to higher (or lower)
predicted values of y, but this does NOT mean
that changing x will cause y to increase or
decrease
• Causation can only be determined if the values
of the explanatory variable were determined
randomly (which is rarely the case for a
continuous explanatory variable)
Exam 3
• Exam 3 next Friday, 4/24
• Will primarily be on Chapters 5, 6, 7, and 9.1,
but could also have questions from Units A or
B (Chapters 1, 2, 3, 4)
• 3 double-sided pages of notes
• Bring a calculator (not a cell phone)
• Should know z* for 90%, 95%, 99%, but do
not need to know t* values
Practice Problems
• Units C and D review reading posted online
  ◦ We covered all of Unit C
  ◦ We skipped Chapter 8, 9.2, and 9.3 and haven’t
    covered Chapter 10 yet, so there will be some
    unfamiliar topics in the Unit D review
• Good sources of practice problems:
  ◦ Unit C Exercises and Review Exercises: All odd
    problems
  ◦ Unit D Exercises and Review Exercises: 1, 3, 7, 11,
    13, 21, 23, 27, 29, 31, 33, 35, 45, 47
  ◦ Chapters 5, 6, 7, and 9.1: All odd problems
To Do
• Read Section 2.6
• Do HW 2.6 (due Friday, 4/24)