LESSON 4 – 2
( DAY 1 )
Cautions About Correlation and Regression
What is causation and how
can it be determined?
• To examine data for causation.
•To be careful when using models for extrapolation.
• To understand that causation may be hard to
determine due to the effects of lurking variables, common
response, and confounding variables.
Correlation and Regression
• Correlation and regression describe only
linear relationships.
• The correlation r and the least-squares
regression line are not resistant. One
influential or incorrectly entered data point
can greatly change these measures.
• Always plot data before interpreting
regression or correlation.
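The three cautions above can be checked numerically. The sketch below uses small made-up data sets (not taken from the lesson) and a hand-written helper, pearson_r, to show that one mis-entered point can wreck r and that a perfect curved relationship can still give r = 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfectly linear relationship, y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# Same data, but the last y-value (10) was mis-entered as 1.
ys_bad = [2, 4, 6, 8, 1]
r_bad = pearson_r(xs, ys_bad)

# Hypothetical curved data: y = x^2 exactly, yet not linear.
xs_curve = [-2, -1, 0, 1, 2]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

print(f"r, clean linear data:   {r_clean:.3f}")
print(f"r, one bad data point:  {r_bad:.3f}")
print(f"r, perfect curve y=x^2: {r_curve:.3f}")
```

A scatterplot of each data set would reveal the bad point and the curve immediately, which is why plotting comes before interpreting r.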
Extrapolation is the use of a regression
line for prediction far outside the domain
of values of the explanatory variable x that
you used to obtain the line or curve. Such
predictions are often not accurate.
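As a rough illustration of why extrapolation fails, the sketch below fits a least-squares line to data from a curved relationship. The relationship true_y and the x-values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def true_y(x):
    """Hypothetical curved relationship (an assumption for this demo)."""
    return 10 * x - x ** 2

# The line is fitted only on x-values from 0 to 4.
xs = [0, 1, 2, 3, 4]
ys = [true_y(x) for x in xs]
slope, intercept = fit_line(xs, ys)

def predict(x):
    return intercept + slope * x

# Prediction error inside the observed x-values versus far outside them.
error_inside = abs(predict(3.5) - true_y(3.5))
error_outside = abs(predict(10) - true_y(10))

print(f"fitted line: y = {intercept:.1f} + {slope:.1f}x")
print(f"error at x = 3.5 (inside):  {error_inside:.2f}")
print(f"error at x = 10  (outside): {error_outside:.2f}")
```

The line tracks the data well where it was fitted, but it has no way of knowing that the pattern bends outside that interval.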
Page 226
Example 4.10
Lurking Variables
A lurking variable is a variable that is not
among the explanatory or response
variables in a study and yet may influence
the interpretation of relationships among
those variables.
Lurking Variables
• The relationship between two variables can
be strongly influenced by lurking variables.
• Many lurking variables change
systematically over time.
• One way to detect whether time has an
influence is to plot the residuals and the response
variable against time, if the times are available.
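A minimal sketch of the residuals-over-time check, using invented data in which the response depends on x and also drifts upward with time t. The variable names and values are assumptions for illustration; in practice the residuals would be plotted against time rather than printed.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data, listed in the order the observations were taken.
times = [0, 1, 2, 3, 4, 5]
xs = [1, 2, 3, 1, 2, 3]
ys = [2 * x + t for x, t in zip(xs, times)]  # time is the lurking variable

# Regress y on x only, ignoring time.
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Listing residuals in time order exposes the pattern.
for t, res in zip(times, residuals):
    print(f"time {t}: residual {res:+.1f}")
```

The early residuals are all negative and the later ones all positive, a pattern that suggests something changing over time is affecting the response.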
Using averaged data
• Many regression or correlation studies work with
averages or other measures that combine
information from many individuals. Note this
carefully and resist the temptation to apply the
results of such studies to individuals.
• Correlations based on averages are usually too
high when applied to individuals. This is another
reminder that it is important to note exactly what
variables were measured in a statistical study.
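The effect of averaging can be seen in a small constructed example. The three groups and the individual offsets below are hypothetical, arranged so that group averages fall exactly on a line while individuals scatter widely around their group's average.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical groups: each has the same center for x and y.
centers = [0, 2, 4]
# Individual departures (dx, dy) from the group center; they average to zero.
offsets = [(-3, 3), (3, -3), (-3, -3), (3, 3)]

# Individual-level data: 4 individuals in each of 3 groups.
ind_x = [c + dx for c in centers for dx, _ in offsets]
ind_y = [c + dy for c in centers for _, dy in offsets]

# Group-level data: one averaged point per group.
avg_x = [sum(c + dx for dx, _ in offsets) / len(offsets) for c in centers]
avg_y = [sum(c + dy for _, dy in offsets) / len(offsets) for c in centers]

r_averages = pearson_r(avg_x, avg_y)
r_individuals = pearson_r(ind_x, ind_y)

print(f"r using group averages: {r_averages:.3f}")
print(f"r using individuals:    {r_individuals:.3f}")
```

Averaging removes the individual-to-individual variation, so the averaged points look far more tightly related than the individuals are.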
The question of causation
In many studies of the relationship between two
variables, the goal is to establish that changes in
the explanatory variable cause changes in the
response variable.
Even when a strong association is present, the
conclusion that this association is due to a causal
link between the variables is often elusive.
Explaining association:
Variables x and y show a strong association
(dashed line). This association may be the
result of any of several causal relationships
(solid arrow).
Causal associations
• Causation: changes in x cause changes in y.
• Common response: Changes in both x and y
are caused by changes in a lurking variable z.
• Confounding: The effect (if any) of x on y is
confounded with the effect of a lurking variable.
• Even when direct causation is present, it is
rarely a complete explanation of an association
between two variables.
Explaining association:
Common Response
• Beware of lurking variables when thinking
about an association between two variables.
• The observed association between the
variables x and y is explained by a lurking
variable z. Both x and y change in response to changes in
z. This common response creates an
association even though there may be no
direct causal link between x and y.
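Common response can be imitated with a small constructed data set. In the sketch below, x and y are each built from a lurking variable z plus unrelated offsets, and neither is computed from the other; all values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical lurking variable z and unrelated offsets for x and y.
z_levels = [0, 10, 20, 30]
offsets = [(-1, 1), (1, -1), (-1, -1), (1, 1)]

# x and y each respond to z; neither depends on the other.
data = [(z, z + ex, z + ey) for z in z_levels for ex, ey in offsets]
xs = [x for _, x, _ in data]
ys = [y for _, _, y in data]
r_overall = pearson_r(xs, ys)
print(f"r between x and y, all data: {r_overall:.3f}")

# Hold z fixed: look at x and y within each level of z separately.
r_within = {}
for level in z_levels:
    sub_x = [x for z, x, _ in data if z == level]
    sub_y = [y for z, _, y in data if z == level]
    r_within[level] = pearson_r(sub_x, sub_y)
    print(f"r between x and y when z = {level}: {r_within[level]:.3f}")
```

The strong overall association disappears once z is held fixed, which is what one would expect when z alone drives both variables.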
• Two variables are confounded when
their effects on a response variable
cannot be distinguished from each
other. The confounded variables may
be either explanatory variables or
lurking variables.
• Even a very strong association
between two variables is not by itself
good evidence that there is a cause-and-effect link between the variables.
What are the criteria for
establishing causation when
we can’t do an experiment?
• The association is strong.
• The association is consistent.
• Higher doses are associated with stronger responses.
• The alleged cause precedes the effect in time.
• The alleged cause is plausible.