Chapter 4 – Scatterplots and Correlation

A response variable (also called a dependent variable) measures an outcome of a study. An explanatory variable (also called an independent variable) explains or influences changes in a response variable.

A scatter plot reveals relationships or associations between two quantitative variables. Such relationships show up as any non-random structure in the plot. Various common types of patterns are demonstrated in the examples.

Scatter plot: A scatter plot is a plot of the values of Y versus the corresponding values of X:
• Vertical axis: variable Y, usually the response variable
• Horizontal axis: variable X, usually some variable we suspect may be related to the response, i.e. the explanatory variable

Scatter plots can provide answers to the following questions:
1. Are variables X and Y related?
2. Are variables X and Y linearly related (positively or negatively)?
3. Are variables X and Y non-linearly related?
4. Does the variation in Y change depending on X?
5. Are there outliers?

Some examples:
[Figure: example scatter plots. Panels: No relationship; Strong linear relationship (positive correlation); Strong linear relationship (negative correlation); Exact linear relationship (positive correlation); Quadratic relationship; Sinusoidal relationship (damped)]

Measuring Linear Association: Correlation

The purpose of studying correlation is to measure the strength of a relationship. The correlation r is a quantity that measures the strength of the linear relationship between two variables (-1 ≤ r ≤ 1).

Draw pictures of scatter plots along with numerical values of r.

Formula:

$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} $$

where

$$ S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$

Example: Calculate r for a simple data set

X   -1    0    0    1
Y    1    4    9   14

Facts about correlation:
1. Requires that both variables be quantitative.
2. Doesn't depend on the units of measurement.
3. Doesn't matter which variable is X and which is Y.
4. -1 ≤ r ≤ 1, and r = ±1 only when the points lie exactly on a straight line.
5. Measures the strength of only a linear relationship between two variables.
6. Like the mean and the standard deviation, the correlation is not a resistant measure (it is affected by a few outliers).
7. Correlation ≠ Causation
• Correlation can be produced by chance (the NFL winning the Super Bowl and the stock market going up).
• There is a relationship, but what is cause and what is effect? (anxiety and bad grades)
• There is no real relationship between the two, but there is a correlation (eating sushi and speaking Japanese well; foot size and reading skill).

Q: The price of rice in India has a strong correlation to teachers' salaries in Texas. Does this mean that one is causing the other?
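
As a check on the hand calculation for the example data set (X = -1, 0, 0, 1 and Y = 1, 4, 9, 14), here is a minimal Python sketch. It assumes the formula r = S_xy / sqrt(S_xx · S_yy), with S_xx, S_yy and S_xy the sums of squared and cross deviations from the means. The function name `correlation` and the rescaling constants are illustrative choices, not part of the notes. The sketch also illustrates facts 2 and 3: swapping X and Y, or changing units with a positive scale factor, should leave r unchanged.

```python
import math

def correlation(xs, ys):
    """Return r = S_xy / sqrt(S_xx * S_yy) for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross deviations from the means.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Data set from the example.
x = [-1, 0, 0, 1]
y = [1, 4, 9, 14]

# Here S_xx = 2, S_yy = 98, S_xy = 13, so r = 13 / sqrt(196) = 13/14.
r = correlation(x, y)

# Fact 3: swapping the roles of X and Y leaves r unchanged.
r_swapped = correlation(y, x)

# Fact 2: a change of units (positive scale factor, any shift) leaves r unchanged.
# The constants 100, 2.54 and 10 are arbitrary, chosen only for illustration.
r_rescaled = correlation([100 * v for v in x], [2.54 * v + 10 for v in y])

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
# prints: 0.9286 0.9286 0.9286
```

The value r ≈ 0.93 indicates a strong positive linear association in this small data set. With only four points, a single changed value can move r substantially, which is consistent with fact 6.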