Sec. 4.2 PowerPoint - Palisades School District

CHAPTER 4: MORE ON TWO
VARIABLE DATA
Sec. 4.2 – Cautions about Correlation and
Regression
CAUTIONS ABOUT CORRELATION AND
REGRESSION

Recall from chapter 3:

That correlation and regression describe only linear
relationships

That correlation and the LSRL are not resistant


One influential point or incorrectly entered data point can
completely change the correlation and the regression line.
Always plot your data before interpreting regression
or correlation
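To illustrate why correlation and the LSRL are not resistant, here is a minimal sketch in Python. The data are made up for illustration, and the helper name fit_line is an assumption of this sketch, not something from the textbook. It fits the line twice: once on clean data, and once after adding a single mistyped point.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Made-up data that follow a nearly perfect line (illustration only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]

# Same data plus one mistyped point: 1.8 entered where 18.0 was meant.
xs_bad = xs + [9]
ys_bad = ys + [1.8]

slope_clean, _, r_clean = fit_line(xs, ys)
slope_bad, _, r_bad = fit_line(xs_bad, ys_bad)

print(f"clean data:     slope = {slope_clean:.2f}, r = {r_clean:.2f}")
print(f"with bad point: slope = {slope_bad:.2f}, r = {r_bad:.2f}")
```

With the extra point, both the slope and r drop sharply. A scatterplot would show the bad point immediately, which is the reason to plot first.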
EXTRAPOLATION



Extrapolation is the use of a regression line far outside
the domain of values of the explanatory variable x that you
used to obtain the line or curve.
 Such predictions are often not accurate.
Example
 Suppose that you have data on a child’s growth between
the ages of 3 and 8. You find a strong linear relationship
between age x and height y. If you fit a regression line
to these data and use it to predict the child’s height at
age 25, you would predict a height of about 8 feet.
Don’t stray far from the domain of x that actually
appears in your data
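The growth example can be sketched in Python. The heights below are invented values, chosen only to look like plausible measurements for ages 3 to 8. They are not data from the textbook, and the function names are assumptions of this sketch. The code fits a line, then checks whether a new age falls outside the observed domain before trusting the prediction.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical heights (inches) for one child at ages 3 through 8.
ages = [3, 4, 5, 6, 7, 8]
heights_in = [37.6, 40.0, 42.8, 45.3, 47.9, 50.6]

slope, intercept = fit_line(ages, heights_in)

def predict_height(age):
    """Predicted height in inches from the fitted line."""
    return intercept + slope * age

def is_extrapolation(age):
    """True when age lies outside the ages used to fit the line."""
    return age < min(ages) or age > max(ages)

age_new = 25
predicted_ft = predict_height(age_new) / 12  # convert inches to feet

print(f"predicted height at age {age_new}: {predicted_ft:.1f} ft")
print(f"outside the observed ages? {is_extrapolation(age_new)}")
```

The line describes growth well between ages 3 and 8, but at age 25 it predicts a height near 8 feet, because it assumes the child keeps growing at the same rate.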
LURKING VARIABLES



Sometimes the relationship between two variables is influenced
by other variables that we did not measure or even think about
A lurking variable is a variable that is not among the
explanatory or response variables in a study and yet may influence
the interpretation of relationships among those variables.
The relationship between two variables can be strongly influenced
by lurking variables.
 A lurking variable can falsely suggest a strong relationship
between x and y or it can hide a relationship that is really
there.
LURKING VARIABLES


Because lurking variables are often unrecognized
and unmeasured, detecting their effect is a
challenge
Many lurking variables change systematically
over time.

One method of detecting whether time has an influence is to
plot both the response variable and the regression residuals
against the time order of the observations, if available.
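A rough numeric stand-in for that plot, sketched in Python. The data are constructed for illustration: the response depends on x and also drifts upward with the order in which observations were taken. The 0.4 drift per time step and all names are assumptions of this sketch. If the residuals from the LSRL of y on x are strongly correlated with time order, a lurking variable that changes over time is a likely suspect.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def lsrl_residuals(xs, ys):
    """Residuals y - y_hat from the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Observations listed in the order they were collected (made-up values).
times = list(range(12))
xs = [5, 2, 8, 3, 7, 1, 6, 4, 9, 2, 8, 5]
# The response depends on x plus a steady drift over time (assumed: 0.4 per step).
ys = [2 + 1.5 * x + 0.4 * t for x, t in zip(xs, times)]

residuals = lsrl_residuals(xs, ys)
r_resid_time = pearson_r(times, residuals)

print(f"r between residuals and time order = {r_resid_time:.2f}")
```

In practice you would look at the plot itself. The correlation here only summarizes the upward pattern that the plot of residuals against time order would show.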
THE QUESTION OF CAUSATION


In many studies of the relationship between two
variables, the goal is to establish that changes in
the explanatory variable cause changes in the
response variable.
Even when a strong association is present, the
conclusion that this association is due to a causal
link between the variables is often elusive.
EXPLAINING ASSOCIATION
Strong associations can generally be explained by
one of three relationships.
1. Causation
2. Common Response
3. Confounding
Variables x and y show a strong association (dashed
line). This association may be the result of any of
several causal relationships (solid arrows).
EXPLAINING ASSOCIATION
Causation:
x causes y
Common Response:
x and y are reacting to
a lurking variable z
Confounding:
x may cause y, but y may instead be
caused by a confounding variable z
CAUSATION



Causation is not easily established.
The best evidence for causation comes from
experiments that change x while holding all other
factors fixed.
Even a very strong association between two
variables is not by itself good evidence that there is
a cause-and-effect link between the variables.
EXAMPLES OF DIRECT CAUSATION

The following relationships are examples of
direct causation, but “causation” is not a
simple idea.

Refer to p.233 for explanations
1. x = mother’s BMI
y = daughter’s BMI
2. x = amount of saccharin in a rat’s diet
y = count of tumors in the rat’s bladder
COMMON RESPONSE


Beware of lurking variables when thinking
about an association between two variables.
The observed association between the variables x
and y is explained by a lurking variable z. Both x
and y change in response to changes in z.
 This common response creates an association
even though there may be no direct causal link
between x and y.
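Common response can be imitated with a small simulation, sketched here in Python. Everything in it is an assumption made for illustration: z is random noise, and x and y are each built from z plus their own independent noise, so neither one causes the other. The code compares the correlation between x and y with the correlation that remains after removing the part of each that z explains.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def lsrl_residuals(xs, ys):
    """Residuals y - y_hat from the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Lurking variable z drives both x and y; x never enters y, and y never enters x.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Remove what z explains from each variable, then correlate the leftovers.
r_xy_given_z = pearson_r(lsrl_residuals(z, x), lsrl_residuals(z, y))

print(f"r between x and y:               {r_xy:.2f}")
print(f"r after accounting for z:        {r_xy_given_z:.2f}")
```

The association between x and y is strong, yet it nearly vanishes once z is accounted for. In a real study z is usually unmeasured, so this check is not available and the association can be mistaken for causation.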
EXAMPLES OF COMMON RESPONSE

The following relationships are examples of how
common response can create an association.

Refer to p.233 for explanations
3. x = a high school senior’s SAT score
y = the student’s first-year college GPA
4. x = monthly flow of money into stock mutual funds
y = monthly rate of return for the stock market
CONFOUNDING


Two variables are confounded when their effects
on a response variable cannot be distinguished
from each other. The confounded variables may be
either explanatory variables or lurking variables.
Confounding of several variables often prevents us
from drawing conclusions about causation.
EXAMPLES OF CONFOUNDING

The following relationships are examples of
confounding.

Refer to p.234 for explanations
5. x = whether a person regularly attends religious services
y = how long the person lives
6. x = the number of years of education a worker has
y = the worker’s income

Homework: p.237-239 #’s 33-36, 38 & 41