Review

• association: an association exists between two variables if particular values of one variable are more likely to occur with certain values of the other variable.
• response variable & explanatory variable
• positive association & negative association
• (categorical variable, categorical variable): contingency table
• (quantitative variable, quantitative variable): scatterplot
• correlation: the correlation, denoted by r, summarizes the direction of the association between two quantitative variables and the strength of its straight-line trend. It takes values between −1 and +1.

$$ r = \frac{1}{n-1} \sum z_x z_y = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) $$

where n is the number of points, x̄ and ȳ are the means, and s_x and s_y are the standard deviations of x and y. The sum is taken over all n observations.
• quadrant: a quadrant is any of the four regions into which a plane is divided by a horizontal line and a vertical line.

Properties of the Correlation:
• The correlation r always falls between −1 and +1.
• The closer the absolute value of r is to 1, the stronger the linear (straight-line) association, as the data points fall nearer to a straight line.
• A positive correlation indicates a positive association, and a negative correlation indicates a negative association.
• The value of the correlation does not depend on the variables' units.
• Two variables have the same correlation no matter which is treated as the response variable.
• The correlation is designed for linear associations (straight-line relationships).

• Regression Line: an equation for predicting the response outcome. The regression line predicts the value of the response variable y as a straight-line function of the value x of the explanatory variable. Let ŷ denote the predicted value of y. The equation for the regression line has the form

$$ \hat{y} = a + bx $$

In the above formula, a denotes the y-intercept and b denotes the slope.
• The y-intercept is the predicted value of y when x = 0.
• The slope b is the amount that ŷ changes when x changes by one unit.
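
The correlation formula and the regression-line equation reviewed above can be checked numerically. The following is a minimal sketch, not a reference implementation: the data (`hours`, `score`), the helper names (`mean`, `sample_sd`, `correlation`, `predict`), and the coefficients `a = 47.0`, `b = 4.5` are all hypothetical values chosen for illustration, and the coefficients are not fitted to the data. It computes r as the sum of z-score products divided by n − 1, then checks two of the listed properties (same r when the variables are swapped, same r when the units change) and the interpretation of the intercept and slope.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """r = (1 / (n - 1)) * sum of z_x * z_y over all n observations."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    z_products = (
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    )
    return sum(z_products) / (n - 1)


def predict(a, b, x):
    """Regression line: y-hat = a + b * x."""
    return a + b * x


# Hypothetical data, for illustration only.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]

r = correlation(hours, score)

# Same correlation no matter which variable is treated as the response.
r_swapped = correlation(score, hours)

# Same correlation after changing units (hours -> minutes).
r_minutes = correlation([h * 60 for h in hours], score)

# Hypothetical intercept and slope (assumed, not fitted to the data above).
a, b = 47.0, 4.5
at_zero = predict(a, b, 0.0)                     # equals the y-intercept a
step = predict(a, b, 3.0) - predict(a, b, 2.0)   # equals the slope b

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, minutes = {r_minutes:.4f}")
print(f"prediction at x = 0: {at_zero}, change per unit of x: {step}")
```

Because each observation enters the formula only through its z-scores, rescaling x or y leaves every z-score unchanged, which is why the value of r does not depend on the units.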