3. Correlation and Regression

Bivariate Data and Scatter Plots
Bivariate Data: The values of two different variables that are
obtained from the same population element.
While the variables may be either categorical or quantitative, we
will focus on cases where they are both quantitative.
Can we predict values of one variable from values of the
other variable?
Do the values of one variable cause the values of the other
variable?
Section 3.1, Page 59
Scatter Plot Example
TI-83
Scatter plots always have
an explanatory variable and
a response variable. The
choice is arbitrary. The
explanatory variable is
always plotted on the x-axis,
and the response variable is
always plotted on the y-axis.
STAT – EDIT – ENTER; Enter x data in L1, and y in L2
2nd STAT PLOT – ENTER -1: Plot 1
Highlight ON
Type: Highlight first icon
XList: 2nd L1
YList: 2nd L2
ZOOM 9: ZoomStat
TRACE; Use arrows to move to points and display values.
Section 3.1, Page 60
Linear Correlation
Linear Correlation: A measure of the strength of a linear
relationship between two variables. The closer to a
straight line the dots are, the stronger the relationship.
If there is correlation, then we say the two variables
are associated. Changes in the value of one
variable are associated with changes in the value
of the other variable.
Section 3.1, Page 61
Coefficient of Correlation
Measure of Strength
$$r = \frac{\sum Z_x Z_y}{n - 1}, \quad \text{where } Z_x = \frac{x - \bar{x}}{s_x}; \quad Z_y = \frac{y - \bar{y}}{s_y}$$
$-1 \le r \le 1$
r = −1: perfect straight line with negative slope
r = 0: no linear relationship at all
r = 1: perfect straight line with positive slope
r is not resistant to outliers.
Also known as the Pearson Correlation Coefficient.
Section 3.2, Page 62
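The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator program's method: the function name `pearson_r` and the data lists are made up for the example, and the standard deviations are the sample versions (dividing by n − 1).

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # points on a rising line: r is 1 up to rounding
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # points on a falling line: r is -1 up to rounding
print(pearson_r(xs, [2, 4, 6, 8, 40]))  # one outlier pulls r down to about 0.8
```

The last call shows why r is described as not resistant to outliers: changing a single y-value moves r from 1 to roughly 0.8.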
Problems
Problems, Page 71
Correlation Coefficient
TI-83 Add-In Program
Finding r.
STAT – EDIT – ENTER: Enter data in L1 and L2
PRGM-CORRELTN
2nd L1 – ENTER – 2nd L2 – ENTER
SCATTER PLOT? – 1=YES; (Displays scatter plot)
ENTER; (Displays: r=.8394)
This is a moderately strong positive relationship.
Section 3.2, Page 62
Association and Causality
[Scatter plot: Elementary School Students. x-axis: Shoe Size (1 to 8); y-axis: Reading Scores, Grade Level (1 to 8).]
Is this a reasonable association?
Does giving students bigger shoes cause reading
scores to improve?
What explains this association?
Lurking Variable: A variable that is not included
in the study but has an effect on the variables in
the study and makes it appear that those variables are
related.
Association alone can never establish causality!
Section 3.2, Page 63
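A small simulation can make the lurking-variable idea concrete. The sketch below assumes grade level is the lurking variable and uses invented students: within each grade, shoe size and reading score vary independently, yet both rise with grade. All names and numbers are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation coefficient from z-scores (sample standard deviations)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical offsets chosen so that, inside one grade, shoe size and
# reading score are unrelated (their products sum to zero).
SHOE_OFFSETS = [-0.5, 0.0, 0.5, 0.0]
READING_OFFSETS = [0.0, 0.5, 0.0, -0.5]

# Each invented student is (grade, shoe size, reading grade level).
students = [
    (grade, grade + s, grade + t)
    for grade in range(1, 7)
    for s, t in zip(SHOE_OFFSETS, READING_OFFSETS)
]

shoe = [s for _, s, _ in students]
reading = [r for _, _, r in students]
overall_r = pearson_r(shoe, reading)

# Hold the lurking variable fixed by looking at a single grade.
grade3 = [(s, r) for g, s, r in students if g == 3]
within_r = pearson_r([s for s, _ in grade3], [r for _, r in grade3])

print(round(overall_r, 3))  # strong association across all grades
print(round(within_r, 3))   # association disappears within one grade
```

Across all grades the correlation is strong, but once grade is held fixed it vanishes, which is the pattern a lurking variable produces.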
Problems
Problems, Page 71
Problems
Problems, Page 72
Linear Regression
Line of Best Fit
If a straight line model seems appropriate, the
best fit straight line is found by using the method
of least squares. Suppose that $\hat{y} = a + bx$ is the
equation of a straight line, where $\hat{y}$ (read “y-hat”)
represents the predicted value of y that
corresponds to a particular value of x. The least
squares criterion requires that we find the
constants a and b such that $\sum (y - \hat{y})^2$ is as small
as possible.
$$\hat{y} = a + bx$$
Section 3.3, Page 65
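The minimization behind the least squares criterion can be sketched with the standard calculus argument. The notation $S(a, b)$, the index $i$, and the intercept formula are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
S(a, b) &= \sum_i (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_i (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x} \\
\frac{\partial S}{\partial b} &= -2 \sum_i x_i (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad
  b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
\end{align*}
```

Since $\sum_i (x_i - \bar{x})(y_i - \bar{y}) = (n - 1)\, r\, s_x s_y$ and $\sum_i (x_i - \bar{x})^2 = (n - 1)\, s_x^2$, the slope can equivalently be written as $b = r\, s_y / s_x$.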
Line of Best Fit
The best line will be the one where the sum of the squares of the
“misses” is at a minimum. Calculus procedures are used to find the
coefficients a and b such that the line $\hat{y} = a + bx$ has the least sum of squares.
$$b = r \times \frac{s_y}{s_x}$$
r is the correlation coefficient, $s_y$ is the standard
deviation of the y-values, and $s_x$ is the standard
deviation of the x-values.
Section 3.3, Page 66
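The slope formula can be tried out with a short Python sketch. The data and the names `fit_line` and `sum_squared_misses` are hypothetical. The intercept is computed as a = ȳ − b·x̄, the standard least-squares result, which the slide does not state.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * s_y / s_x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar  # assumption: standard intercept formula, not on the slide
    return a, b

def sum_squared_misses(xs, ys, a, b):
    """Sum of squared differences between actual and predicted y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_misses(xs, ys, a, b)
print(round(a, 4), round(b, 4), round(best, 4))

# Nudging either coefficient away from the fit increases the sum of squares.
print(sum_squared_misses(xs, ys, a + 0.1, b) > best)
print(sum_squared_misses(xs, ys, a, b + 0.1) > best)
```

The two comparisons at the end are a quick sanity check of the least squares property: any other line through these points misses by more in total.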
Linear Regression
TI-83 Add-In Program
a.  For the above data, make a scatter plot, and
comment on the suitability of the data for
regression analysis.
STAT – EDIT; Enter Height in L1, and Weight in L2.
PRGM – REGBASIC
X LIST=2ND L1; Y LIST=2ND L2
SCATTER PLOT: 1=YES
The pattern looks positive and
linear, with no outliers that
could cause problems.
[Figure: scatter plot]
Section 3.3, Page 68
Linear Regression
TI-83 Add-In Program
b.  Find the regression equation and r.
ENTER; The program pauses so the graph can be viewed; pressing
ENTER moves the program along.
The equation is:
$\hat{y} = -186.4706 + 4.7059x$
r, the coefficient of
correlation = .7979, a
relatively strong relationship.
c. Check the plot of the regression line versus the
scatter plot.
ENTER – 1=YES
Section 3.3, Page 68
Linear Regression
TI-83 Add-In Program
d.  What is the value of the slope of the line, and what
does it mean?
b = 4.7059 is the slope of the line. It indicates the
number of units of change in the y value for every one
unit increase in the x value. In this problem, for each
one inch increase in height, predicted weight increases by
4.7059 lbs. Its units are lbs/inch.
e.  What is the value of the intercept of the line, and what
does it mean?
a = -186.4706 is the y-intercept. It has no meaning in
this problem. It would be the weight of a person of
zero height.
f.  What is the value of $r^2$ and what does it tell you?
It is called the index of determination. It measures the
strength of the model, 1 being perfect and 0 being
useless. Expressed as a percentage, it equals the share of the variance
in the y-values explained by the model.
$r^2$ = .6367, indicating a relatively strong positive
correlation, with the model explaining 63.67% of the y variance.
Section 3.3, Page 68
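The "variance explained" reading of r² can be checked numerically. The sketch below uses invented data and a hypothetical helper `fit_summary`. It computes r² two ways: by squaring r, and as 1 minus the unexplained variation over the total variation in y.

```python
from math import sqrt
from statistics import mean

def fit_summary(xs, ys):
    """Return (r, explained_share) for the least-squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = s_xy / sqrt(s_xx * s_yy)
    return r, 1 - unexplained / s_yy

# Hypothetical data, invented for illustration only.
r, explained_share = fit_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r ** 2, 4), round(explained_share, 4))  # the two values agree

# Squaring the rounded r reported on the slide:
print(round(0.7979 ** 2, 4))
```

Squaring the rounded value r = .7979 gives .6366 rather than the .6367 shown on the slide. The small difference is presumably because the calculator squares the unrounded r.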
Linear Regression
TI-83 Add-In Program
g.  Check the residual plot and explain what it means.
ENTER; 1 = YES
The horizontal line represents
the regression line. For each
actual value of x, the residual is
the actual y-value minus the predicted
y-value. The dots show the
“misses” or residuals.
If the residuals show some kind
of pattern, it means that the
linear regression model is not
appropriate for the data, so
another model, e.g. quadratic,
may be better. Since there is
no pattern in this plot, the
linear model is appropriate for
this data.
Section 3.3, Page 68
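To see what a patterned residual plot looks like in numbers, the sketch below fits a least-squares line to two invented data sets and lists the residuals. The helper name `residuals` and both data sets are hypothetical. The second set follows y = x², so a straight line is the wrong model for it.

```python
from statistics import mean

def residuals(xs, ys):
    """Actual y minus predicted y for the least-squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
roughly_linear = [2, 4, 5, 4, 5]
curved = [x ** 2 for x in xs]  # quadratic data: 1, 4, 9, 16, 25

linear_res = residuals(xs, roughly_linear)
curved_res = residuals(xs, curved)

print([round(e, 2) for e in linear_res])  # scattered above and below zero
print([round(e, 2) for e in curved_res])  # positive at both ends, negative in the middle
```

The U-shaped run of residuals for the curved data is the kind of pattern that signals a linear model is not appropriate. In both cases the residuals of a least-squares line sum to zero, so the sum alone cannot reveal the problem.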
Linear Regression
TI-83 Add-In Program
h. Use the model to predict the weight of a woman who
is 65 inches tall.
PREDICTED Y: 1 = YES
X=65
Answer: 119.4 lbs
i. Use the model to predict the weight of a woman who
is 77 inches tall.
ENTER: 1 = YES
X=77
Answer: 175.9 lbs
Notice that the range of the
x values is from 61 to 69
inches. 77 inches is too far
above the actual values
used to develop the model.
While the result is
mathematically correct, the
result is not valid in the
context of the problem.
Section 3.3, Page 68
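The two predictions can be reproduced with the coefficients and the 61 to 69 inch range given above. The function name `predict_weight` and the range check are illustrative additions, not part of the calculator program.

```python
A = -186.4706           # intercept from the regression equation above
B = 4.7059              # slope from the regression equation above, in lbs per inch
X_MIN, X_MAX = 61, 69   # range of heights used to build the model

def predict_weight(height_in):
    """Return (predicted weight in lbs, True if height is inside the data range)."""
    within_range = X_MIN <= height_in <= X_MAX
    return A + B * height_in, within_range

for height in (65, 77):
    weight, ok = predict_weight(height)
    note = "inside data range" if ok else "extrapolation, not reliable"
    print(f"{height} in -> {weight:.1f} lbs ({note})")
```

Flagging inputs outside the observed x-range is one simple way to guard against treating an extrapolated value as a valid estimate.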
Problems
Problems, Page 72
Problems
a.  Construct a scatter diagram.
b.  Does the pattern appear linear?
c.  Find the equation of best fit.
d.  What is the value of r and what does it mean?
e.  What is the slope? What are its units? Interpret
its meaning.
f.  What is the y-intercept value? What does it
mean?
g.  What does the residual plot show? What does it
mean?
h.  Estimate the stride rate for a speed of 19.2
ft/sec. Is the estimate reliable? Why?
i.  Estimate the stride rate for a speed of 31 ft/sec.
Is the estimate reliable? Why?
Problems, Page 73
Problems
A study was conducted to investigate the
relationship between the resale price, y (in
hundreds of dollars), and the age, x (in years), of
midsize luxury American automobiles. The
equation of the line of best fit is given
below.
$$\hat{y}\ (\$00) = 183.2 - 21.02x$$
(a) Find the resale value in hundreds of dollars
of such a car when it is 3 years old. $(00)
(b) Find the resale value in dollars of such a car
when it is 3 years old. $ (Hint: Multiply the
answer in (a) by 100)
(c) Find the resale value in dollars of such a car
when it is 6 years old. $
(d) What is the decrease in the resale price in
dollars of these cars each year? $
Problems, Page 73
Problems
c.  What is the value of r and what does it mean?
d.  What is the slope? What are its units? Interpret
its meaning.
e.  What is the y-intercept value? What does it
mean?
f.  What does the residual plot show? What does it
mean?
g.  Estimate the # of intersections for a state with
450 miles. Is the estimate reliable? Why?
h.  Estimate the # of intersections for a state with
950 miles. Is the estimate reliable? Why?
Problems, Page 73
Problems
a.  Construct a scatter diagram. What does it
indicate to you?
b.  Find the equation of best fit.
c.  What is the value of r and what does it mean?
d.  What is the slope? What are its units? Interpret
its meaning.
e.  What is the y-intercept value? What does it
mean?
f.  What does the residual plot show? What does it
mean?
g.  Estimate the price of an 8 year old car. Is the
estimate reliable? Why?
h.  Estimate the price of a 22 year old car. Is the
estimate reliable? Why?
Problems, Page 73