Review

• z-score: the z-score for an observation is the number of standard deviations that it falls from the mean.

  $$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$

• An observation in a bell-shaped distribution is regarded as a potential outlier if it falls more than three standard deviations from the mean.
• association: an association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
• response variable: the outcome variable on which comparisons are made.
• explanatory variable: the variable that defines the groups to be compared with respect to values on the response variable.
• contingency table: a display for two categorical variables. Its rows list the categories of one variable and its columns list the categories of the other variable. Each entry in the table (called a cell) is the number of observations in the sample with certain outcomes on the two variables.
• conditional proportions & marginal proportions
• positive association & negative association: two quantitative variables (say x and y) are said to have a positive association when high values of x tend to pair with high values of y, and low values of x tend to pair with low values of y. They are said to have a negative association when high values of one variable tend to pair with low values of the other variable, and low values of one pair with high values of the other.
• correlation: the correlation summarizes the direction of the association between two quantitative variables and the strength of its straight-line trend. Denoted by r, it takes values between −1 and +1.

Interpretation of correlation:
  • A positive value for r indicates a positive association and a negative value for r indicates a negative association.
  • The closer r is to ±1, the closer the data points fall to a straight line, and the stronger the linear association. The closer r is to 0, the weaker the linear association.
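As a small illustration of the z-score and the three-standard-deviation rule from the review, here is a minimal Python sketch. The function names `z_scores` and `potential_outliers` and the sample data are hypothetical, and the sketch assumes the sample standard deviation (divisor n − 1), which the review does not specify for z-scores.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (value - mean) / standard deviation for each value.

    Assumption: uses the sample standard deviation (divisor n - 1).
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def potential_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Made-up data: twenty values near 50 and one far from the rest.
data = [48, 49, 50, 51, 52] * 4 + [90]
flagged = potential_outliers(data)
print(flagged)
```

Note that the rule is meant for bell-shaped distributions, and with very few observations no z-score can exceed 3 in absolute value, so the check only becomes informative for reasonably large samples.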
Calculating the correlation r:

To obtain r, we first calculate the z-scores for the x value and the y value of each observation and then find a typical value (average) of the products of the z-scores.

$$r = \frac{1}{n-1}\sum z_x z_y = \frac{1}{n-1}\sum \left(\frac{x-\bar{x}}{s_x}\right)\left(\frac{y-\bar{y}}{s_y}\right)$$

where n is the number of points, $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ are the standard deviations of x and y. The sum is taken over all n observations.
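The calculation of r can be sketched in a few lines of Python, following the formula step by step: compute z-scores for x and y, multiply them pairwise, and divide the sum by n − 1. The function name `correlation` and the data are made up for illustration; the standard deviations are sample standard deviations (divisor n − 1), matching the n − 1 in the formula.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r as the sum of products of z-scores divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Made-up example data with a moderately strong positive linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

Points that lie exactly on an increasing straight line give r = 1 (up to floating-point rounding), and points on a decreasing line give r = −1, in line with the interpretation of r given in the review.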