Chapter 3 Review: Examining Relationships

Sam R. Kenan F. Rohil T. Daisy S.







The relationship between two variables can be strongly influenced
by other variables that are lurking in the background.
Explanatory variables can help explain or even cause changes in
response variables.
However, explanatory variables don’t necessarily cause changes in
response variables.
A scatterplot is the most effective way to display the relationship
between two quantitative variables.
Our eyes are not good judges of how strong a linear relationship
is.
Correlation requires that both variables be quantitative, and it
does not describe curved relationships between variables.
Correlation is not resistant, and it is not a complete summary of
two-variable data.
The Big Idea 3.1




Regression requires that we have an
explanatory variable and a response variable.
Extrapolation can be used to predict outside
the range of values of the explanatory
variable.
Residual plots make it easier to study the
residuals.
The coefficient of determination tells us how
well the least-squares line does at predicting
values of the response variable.
The Big Idea 3.2






Correlation and regression describe only
linear relationships.
Extrapolation often produces unreliable
predictions.
Correlation is not resistant.
Lurking variables can make a correlation or
regression misleading.
Association does not imply causation.
Correlations based on averages are usually
too high when applied to individuals.
The Big Idea 3.3
A response variable measures an outcome of a study.
An explanatory variable helps explain or influences changes in a response
variable.
• Calling one variable explanatory and the other response doesn’t necessarily
mean that changes in one cause changes in the other!
• A scatterplot shows the relationship between two quantitative variables
measured on the same individuals.
• Direction: positive or negative association
• Form: linear relationships, curved relationships,
and clusters
• Strength: determined by how close the points in the scatterplot lie to a simple
form such as a line
• Correlation (r) measures the strength and direction of the linear association
between two quantitative variables x and y.
• r>0 for a positive association and r<0 for a negative association
• Correlation is always between -1 and 1. It is strongest when closest to 1 or -1.
• Correlation is not resistant, so outliers can greatly change the value of r.


Vocabulary 3.1
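The definition of r and the warning that it is not resistant can be checked numerically. The sketch below computes r as the average product of standardized x and y values (dividing by n-1), then adds a single outlier. The data values and the function names are made up for illustration and are not from the chapter.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1)
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """r = (1/(n-1)) * sum of products of standardized x and y values."""
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    total = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return total / (len(x) - 1)

# Made-up data with a strong positive linear pattern
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r_clean = correlation(x, y)

# The same data plus one point far below the pattern
r_outlier = correlation(x + [6], y + [0.0])

print(round(r_clean, 3), round(r_outlier, 3))
```

One added point is enough to pull r from nearly 1 down to a weak positive value, which is what "not resistant" means in practice.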







A regression line is a straight line that describes how a response variable y
changes as an explanatory variable x changes.
The slope b of a regression line ŷ=a+bx is the rate at which the predicted
response ŷ changes along the line as the explanatory variable x changes. b is
the change in ŷ when x increases by 1.
The y intercept “a” of a regression line is the predicted response ŷ when the
explanatory variable x=0.
Extrapolation is the use of a regression line for prediction outside the range
of values of the explanatory variable x from which the line was
calculated.
The least-squares regression line is the line that minimizes the sum of the
squares of the vertical distances of the observed points from the line.
Residuals are the differences between observed and predicted values of y.
The coefficient of determination (r^2) is the fraction of the variation in the
values of y that is explained by the least-squares regression of y on
x.
Vocabulary 3.2
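One way to make the definition of extrapolation concrete is to have a prediction function report whether x lies outside the observed data. This is a minimal sketch using the line ŷ=3.505-0.00344x and the NEA range (-94 to 690 calories) from the review questions in this chapter. The function name and the two test inputs (400 and 1500) are hypothetical choices for illustration.

```python
# Line and data range taken from this review's NEA / fat gain example
INTERCEPT = 3.505      # a, in kg
SLOPE = -0.00344       # b, in kg per calorie of NEA change
NEA_MIN, NEA_MAX = -94, 690

def predict_fat_gain(nea_change):
    """Return (predicted fat gain in kg, True if this is an extrapolation)."""
    predicted = INTERCEPT + SLOPE * nea_change
    is_extrapolation = not (NEA_MIN <= nea_change <= NEA_MAX)
    return predicted, is_extrapolation

inside = predict_fat_gain(400)    # inside the observed range
outside = predict_fat_gain(1500)  # hypothetical value far outside the range

print(inside, outside)
```

For the out-of-range input the line predicts a negative fat gain, a reminder that the linear pattern has only been observed between the smallest and largest x values.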

Outliers-
◦ An observation that lies outside the overall pattern of the
other observations.
◦ Points that are outliers in the y direction of a scatterplot
have large regression residuals, but other outliers need not
have large residuals.

Influential observations-
◦ An observation is influential for a statistical calculation if
removing it would markedly change the result of the
calculation.
◦ Points that are outliers in the x direction of a scatterplot are
often influential for the least-squares regression line.

Lurking variables-
◦ A variable that is not among the explanatory or response
variables in a study and yet may influence the interpretation
of relationships among those variables.
Vocab 3.3
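The difference between a large residual and an influential point can be seen by fitting a line with and without an outlier in the x direction. The sketch below uses a small made-up data set and a hypothetical helper, least_squares. Neither comes from the chapter.

```python
def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Made-up data with a positive association
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a_before, b_before = least_squares(x, y)

# Add one point that is an outlier in the x direction
x_new = x + [15]
y_new = y + [2]
a_after, b_after = least_squares(x_new, y_new)

# Residual = observed y minus predicted y, using the refitted line
residuals = [yi - (a_after + b_after * xi) for xi, yi in zip(x_new, y_new)]

print(b_before, b_after)
print([round(res, 2) for res in residuals])
```

The added point flips the sign of the slope, yet its own residual is smaller than the residual of the first point, because the line has been pulled toward it.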

When exploring a bivariate relationship:
◦ Make a scatterplot and remember to interpret it:
  • Strength, Direction, Form
◦ Define x and y:
  • Describe the mean and standard deviation of each in
context
◦ Find the least-squares regression line.
  • Write it in context.
◦ Construct and interpret a residual plot.
◦ Interpret r and r^2 in context.
◦ Use the LSRL to make predictions...
Key topics








Correlation always satisfies -1 ≤ r ≤ 1
If r is equal to ±1, then all points lie exactly on a
straight line
The least squares regression line is:
ŷ=a+bx
b=r∙Sy/Sx
a=ȳ-b∙x̄
r^2=(SST-SSE)/SST
SST=∑(y-ȳ)^2
SSE=∑(y-ŷ)^2
r=(1/(n-1))∙∑[(x-x̄)/Sx]∙[(y-ȳ)/Sy]
Formula Cheat sheet
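The cheat-sheet formulas can be checked against each other on the NEA and fat gain data from the review questions in this chapter. The sketch below is one way to do that in plain Python: it computes r from standardized values, the slope from b=r∙Sy/Sx, the intercept from the standard formula a=ȳ-b∙x̄, and r^2 from the sums of squares. The variable names are illustrative choices, and the data are entered as printed in this review.

```python
from math import sqrt

# NEA change (calories) and fat gain (kg) for the 16 subjects
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 489, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

n = len(nea)
x_bar, y_bar = sum(nea) / n, sum(fat) / n
s_x = sqrt(sum((x - x_bar) ** 2 for x in nea) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in fat) / (n - 1))

# Correlation: average product of standardized values
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(nea, fat)) / (n - 1)

# Least-squares line y-hat = a + b*x
b = r * s_y / s_x
a = y_bar - b * x_bar

# r^2 from the sums of squares
predicted = [a + b * x for x in nea]
sst = sum((y - y_bar) ** 2 for y in fat)                         # about the mean
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(fat, predicted))  # about the line
r_squared = (sst - sse) / sst

print(round(a, 3), round(b, 5), round(r, 3), round(r_squared, 3))
```

The value of (SST-SSE)/SST agrees with the square of r, which is why the same symbol r^2 is used for both.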
• LinRegBx(Xlist,Ylist,frequency) for data table
• ShowLinear() for graphs
• 2VarStat(Xlist,Ylist)

Calculator keystrokes

Explain why you should not use the LSRL
calculated earlier to make such a
prediction.
Question 1
NEA (cal)    Fat gain (kg)
-94          4.2
-57          3.0
-29          3.7
135          2.7
143          3.2
151          3.6
245          2.4
355          1.3
392          3.8
473          1.7
489          1.6
535          2.2
571          1.0
580          0.4
620          2.3
690          1.1
• Use the data from Example 3.9 and your calculator to obtain the
equation of the LSRL that would be appropriate for predicting NEA
from fat gain.
Question 2


Suppose the new subject’s fat gain is 3.0 kg.
One of the original 16 subjects had a fat gain
of 3.0 kg, and that subject’s NEA change was -57 calories. Explain why you should not just
predict an NEA change of -57 calories for this
new subject. What NEA change should you
predict for this individual?
Interpret the value of r^2 you obtained in part
(b). How does this compare to the r^2 we
obtained earlier for the line ŷ=3.505-0.00344x?
Explain why this makes sense.
Questions 3 and 4

5): The Sanchez household is about to install solar panels
to reduce the cost of heating their house. In order to know
how much the panels help, they record their consumption
of natural gas before the panels are installed. Gas
consumption is higher in cold weather, so the relationship
between outside temp and gas consumption is important.

A) Describe the direction, form, and strength of the
relationship.
B) About how much gas does the regression line predict
that the family will use in a month that averages 20
degree-days per day?

Question 5

Consider the following historical data:

6) How strong is the relationship and is it
positive or negative?
Question 6
1. We cannot use the provided model because we would be
extrapolating outside the range of data that the LSRL
covers.
2. Swapping the roles of the variables gives a different LSRL from
ŷ=3.505-0.00344x. For predicting NEA from fat gain, the line is
approximately predicted NEA = 745.7 - 176.3(fat gain).
3 & 4. The residuals are large, so a single individual's
observation does not represent the whole association.
The LSRL for predicting NEA from fat gain gives a prediction of
about 217 calories for a fat gain of 3.0 kg.
r^2 is still about 0.606, because r does not depend on which
variable is treated as explanatory. It means that about 60.6% of
the variation in NEA change is explained by the least-squares
regression on fat gain.
5. A) Positive, linear, and very strong
B) 500 cubic feet per day
6. There is a perfect linear relationship between x and y.
This is true because r^2 is equal to 1.00.
Answers
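Questions 2 to 4 turn on the fact that the LSRL for predicting NEA from fat gain is not the earlier line solved for x. The sketch below fits both lines to the NEA and fat gain table as printed in this review and compares the two ways of getting an NEA value for a fat gain of 3.0 kg. The helper least_squares is a hypothetical name, not a calculator command.

```python
# NEA change (calories) and fat gain (kg) for the 16 subjects
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 489, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Earlier line: fat gain predicted from NEA change
a_fat, b_fat = least_squares(nea, fat)

# Line with the roles swapped: NEA change predicted from fat gain
a_nea, b_nea = least_squares(fat, nea)
predicted_nea = a_nea + b_nea * 3.0

# Solving the earlier line for x at y = 3.0 is a different calculation
inverted = (3.0 - a_fat) / b_fat

print(round(a_nea, 1), round(b_nea, 1))
print(round(predicted_nea, 1), round(inverted, 1))
```

The two results differ because least squares minimizes vertical distances, and "vertical" changes when the explanatory and response variables are swapped. The correlation, and therefore r^2, is the same for both lines.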






Correlation makes no distinction between
explanatory and response variables. (It makes no
difference which variable is x or y.)
The correlation, r, does not change when units of
measurement are changed.
Correlation never describes curved relationships
between variables, no matter the strength.
Correlation is not resistant and is strongly
affected by outlying observations.
The size of the regression slope does not tell how
important a relationship is.
Extrapolation often produces unreliable results.
Helpful Hints
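Three of these hints can be verified directly: r is unchanged when x and y are swapped, r is unchanged by a change of units, and the slope does change with units (which is why its size alone says little about importance). The sketch below uses made-up data, with y recorded once in kilograms and once in grams. The function names are illustrative.

```python
from math import sqrt

def correlation(x, y):
    """Correlation r computed from sums of deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def slope(x, y):
    """Slope b of the least-squares line for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Made-up data: the same y values recorded in kilograms and in grams
x = [10, 20, 30, 40, 50]
y_kg = [1.2, 1.9, 3.2, 3.8, 5.1]
y_g = [1000 * v for v in y_kg]

r_kg = correlation(x, y_kg)
r_g = correlation(x, y_g)
r_swapped = correlation(y_kg, x)

slope_kg = slope(x, y_kg)
slope_g = slope(x, y_g)

print(round(r_kg, 4), round(r_g, 4), round(r_swapped, 4))
print(round(slope_kg, 4), round(slope_g, 1))
```

The three correlations agree, while the slope in grams is 1000 times the slope in kilograms even though the relationship itself is unchanged.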