Chapter 4 – Scatterplots and Correlation
A response variable (also called dependent variable) measures an outcome of
a study.
An explanatory variable (also called independent variable) explains or
influences changes in a response variable.
A scatter plot reveals relationships or associations between two quantitative
variables. Such relationships show up as any non-random structure in the
plot. Several common types of patterns are shown in the examples.
Scatter plot: A scatter plot is a plot of the values of Y versus the
corresponding values of X:
• Vertical axis: variable Y, usually the response variable
• Horizontal axis: variable X, usually a variable we suspect may be related
to the response, i.e., the explanatory variable
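The axis convention above can be tried out without any plotting library. The following is only a rough sketch: the function name ascii_scatter and the hours/score data are made up for illustration and do not come from these notes.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Return a text scatter plot: X on the horizontal axis, Y on the vertical axis."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range when all values on one axis are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is printed first, so the largest Y value goes at the top.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    lines = ["|" + "".join(cells).rstrip() for cells in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


# Hypothetical data: hours studied (X, explanatory) and exam score (Y, response).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 71, 75, 80]
plot = ascii_scatter(hours, score)
print(plot)
```

With these made-up values the points should drift from the lower left to the upper right, which is the visual signature of a positive association.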
Scatter plots can provide answers to the following questions:
1. Are variables X and Y related?
2. Are variables X and Y linearly (+ve/-ve) related?
3. Are variables X and Y non-linearly related?
4. Does the variation in Y change depending on X?
5. Are there outliers?
Some examples:
[Figure: example scatter plots showing no relationship; strong linear relationship (negative correlation); quadratic relationship; strong linear relationship (positive correlation); exact linear relationship (positive correlation); sinusoidal relationship (damped).]
Measuring Linear Association: Correlation
The purpose of studying correlation is to measure the strength of a
relationship. The correlation r is a quantity that measures the strength of
the linear relationship between two variables (-1 ≤ r ≤ 1).
Draw pictures of scatter plots along with numerical values of r.
Formula:
\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \]
where
\[ S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2, \quad S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \]
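The formula translates almost line by line into code. The sketch below is one possible transcription, not a prescribed implementation: the function name correlation and the sample points (chosen to lie exactly on the line y = 2x + 1) are assumptions made for illustration.

```python
import math


def correlation(xs, ys):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        # The denominator vanishes when one variable never changes.
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data (not from the notes): points exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
r_line = correlation(xs, ys)
print(r_line)
```

Because these points fall exactly on an upward-sloping line, r comes out as 1, the largest value the formula can produce.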
Example: Calculate r for simple data set
X:  -1   0   0   1
Y:   1   4   9  14
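One way to carry out this calculation by hand is sketched below, assuming the values are paired in the order listed, i.e. the points are (-1, 1), (0, 4), (0, 9), (1, 14).

```latex
\begin{align*}
\bar{X} &= \tfrac{1}{4}(-1 + 0 + 0 + 1) = 0, \qquad
\bar{Y} = \tfrac{1}{4}(1 + 4 + 9 + 14) = 7 \\
S_{xx} &= (-1)^2 + 0^2 + 0^2 + 1^2 = 2 \\
S_{yy} &= (-6)^2 + (-3)^2 + 2^2 + 7^2 = 36 + 9 + 4 + 49 = 98 \\
S_{xy} &= (-1)(-6) + (0)(-3) + (0)(2) + (1)(7) = 13 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} = \frac{13}{\sqrt{2 \cdot 98}} = \frac{13}{14} \approx 0.93
\end{align*}
```

Under that pairing, the result is a strong positive linear association, though not an exact one, since the two points with X = 0 have different Y values.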
Facts about correlation:
1. Requires both variables to be quantitative.
2. Doesn’t depend on units of measurement.
3. Doesn’t matter which variable is X, and which is Y.
4. -1 ≤ r ≤ 1, and r = ±1 only when the points lie exactly on a straight line.
5. Measures the strength of only a linear relationship between two
variables.
6. Like the mean and standard deviation, the correlation is not a resistant
measure (it can be strongly affected by a few outliers).
7. Correlation ≠ Causation
• Correlation can be produced by chance (an NFL team wins the Super Bowl and
the stock market goes up).
• There is a relationship, but what is cause and what is effect? (anxiety
and bad grades)
• There is no direct relationship between the two, but there is a correlation,
typically because both are related to a third variable (eating sushi and
speaking Japanese well; foot size and reading skill).
Q: The price of rice in India has a strong correlation to teachers' salaries
in Texas. Does this mean that one is causing the other?
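Facts 2, 3, 5 and 6 from the list above can be checked numerically. The sketch below reuses the four-point data set from the example; the helper name correlation, the rescaling factor, the quadratic data, and the added outlier (10, -20) are all made up for illustration.

```python
import math


def correlation(xs, ys):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


# Data set from the worked example, paired in the order listed.
x = [-1, 0, 0, 1]
y = [1, 4, 9, 14]
r = correlation(x, y)

# Fact 2: changing the units of Y (here, multiplying by 100) leaves r unchanged.
r_units = correlation(x, [100 * v for v in y])

# Fact 3: swapping the roles of X and Y leaves r unchanged.
r_swapped = correlation(y, x)

# Fact 5: a perfect but non-linear (quadratic) relationship can have r = 0.
x_quad = [-2, -1, 0, 1, 2]
y_quad = [v ** 2 for v in x_quad]
r_quad = correlation(x_quad, y_quad)

# Fact 6: a single hypothetical outlier can change r drastically.
r_outlier = correlation(x + [10], y + [-20])

print(r, r_units, r_swapped, r_quad, r_outlier)
```

In this sketch the single added point is enough to turn a strong positive correlation into a negative one, which is why it helps to look at the scatter plot before trusting the value of r.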