AP Statistics Unit 3: Examining Relationships
3.1A Scatterplots & Correlation
Name_________________________ Date____________ Hour____________

1. What is the difference between a response variable and an explanatory variable?
The explanatory variable is your x variable. It is the variable that you are using to predict your y, or response, variable.

2. How are response and explanatory variables related to dependent and independent variables?
The explanatory variable is the independent variable and the response variable is the dependent variable: your prediction for the response variable depends on the value of the explanatory variable.

3. When is it okay to use a scatterplot to display data?
When you have two quantitative variables measured on the same individuals.

4. You can describe the overall pattern of a scatterplot by the ….
Strength, direction, and form

5. Which variable always appears on the horizontal axis of a scatterplot?
Explanatory variable

6. Explain the difference between a positive association and a negative association.
A positive association means that as the explanatory variable increases, the response variable tends to increase as well, so the overall pattern has a positive slope. A negative association means that as the explanatory variable increases, the response variable tends to decrease, so the overall pattern has a negative slope.

7. How can quantitative data which belongs to different categories be differentiated on a scatterplot?
Use different colors or symbols for the points.

8. What does correlation measure?
Correlation measures the direction and strength of the linear relationship between two quantitative variables, that is, how well the data can be modeled by a line.

9. Explain why two variables must both be quantitative in order to find the correlation between them.
Because the formula requires a mean and a standard deviation for both variables, and categorical data does not have a measure of center and spread.

10. What is true about the relationship between two variables if the correlation is:
a) Near 0? There is no linear relationship, or only a very weak one.
b) Near 1? The relationship is positive and could almost perfectly be modeled by a line.
c) Near -1? The relationship is negative and could almost perfectly be modeled by a line.
d) Exactly 1? All the points would fall exactly on a line with positive slope.
e) Exactly -1? All the points would fall exactly on a line with negative slope.

11. Is correlation resistant to extreme observations?
No.
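The answers to questions 10 and 11 can be checked numerically. The sketch below is only an illustration: the function name `correlation` and the data points are made up for this example and do not come from the worksheet. It computes r as the sum of products of standardized values divided by n - 1, first for points that fall exactly on a line and then for the same points plus one extreme observation.

```python
import math

def correlation(xs, ys):
    """Correlation r computed from standardized values (z-scores)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Sum the products of the z-scores, then divide by n - 1.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical points that fall exactly on the line y = 2x.
x_line = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]
r_line = correlation(x_line, y_line)

# The same points plus one made-up extreme observation at (6, 0).
r_outlier = correlation(x_line + [6], y_line + [0])

print(f"r on the line:        {r_line:.3f}")
print(f"r with extreme point: {r_outlier:.3f}")
```

With these made-up points, r is 1 when every point lies on the line and falls to about 0.14 once the single extreme point is added, which is what "not resistant" means in practice.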
12. What does it mean if two variables have high correlation?
It means that the explanatory variable does a very good job of predicting what the response variable will be. The relationship follows a linear model very well, and we can use a linear equation to make predictions about the response variable based on a given value of the explanatory variable.

13. What does it mean if two variables have weak correlation?
A linear model does not do a very good job of explaining the relationship between the two variables.

14. What does it mean if two variables have no correlation?
A linear model would be useless for describing the relationship.

15. Are hot dogs that are high in calories also high in salt? The scatterplot below shows calories and salt content (measured as milligrams of sodium) in 17 brands of meat hot dogs.
a) Roughly what are the highest and lowest calorie counts among these brands? Roughly what is the sodium level in the brands with the fewest and with the most calories?
b) Does the scatterplot show a clear positive or negative association? Say in words what this association means about calories and salt in hot dogs.
c) Are there any outliers? Is the relationship (ignoring any outliers) roughly linear in form? Still ignoring any outliers, how strong would you say the relationship between calories and sodium is?

(a) The lowest calorie count is about 107 calories, and the sodium level for this brand is about 145 mg. The highest calorie count is about 195 calories, and the sodium level for this brand is about 510 mg.
(b) The scatterplot shows a positive association: high-calorie hot dogs tend to be high in salt, and low-calorie hot dogs tend to have low sodium.
(c) The lower left point is an outlier. Ignoring this point, the relationship is linear and moderately strong.

16.
The figure below is a scatterplot of school grade point average versus IQ score for 78 seventh-grade students.
a) Is the correlation r for these data near -1, clearly negative but not near -1, near 0, clearly positive but not near 1, or near 1? Explain your answer.
b) Refer to the hot dog scatterplot in the last problem. Is the correlation in that scatterplot closer to 1 than the correlation in the scatterplot for this problem, or closer to 0? Explain your answer.
c) Both scatterplots contain outliers. Removing the outliers will increase the correlation r in one figure and decrease r in the other figure. What happens in each figure and why?

(a) The correlation r is clearly positive but not near 1. The scatterplot shows that students with high IQs tend to have high grade point averages, but there is more variation in the grade point averages for students with moderate IQs.
(b) The correlation r for the data in Figure 3.8 (the hot dog scatterplot) would be closer to 1. The overall positive relationship between calories and sodium is stronger than the positive association between IQs and GPAs.
(c) The outliers with moderate IQ scores in Figure 3.4 (the IQ and GPA scatterplot) weaken the positive relationship between IQ and GPA, so removing them would increase r. The outlier in the lower left corner of Figure 3.8 strengthens the positive, linear relationship between calories and sodium, so removing this outlier would decrease r.

17. The gas mileage of an automobile first increases and then decreases as the speed increases. Suppose that this relationship is very regular, as shown by the following data on speed (miles per hour) and mileage (miles per gallon):

Speed (mph)     20  30  40  50  60
Mileage (mpg)   24  28  30  28  24

a) Make a scatterplot of mileage versus speed.
b) Show that the correlation between speed and mileage is r = 0. Use the formula
$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$.
Explain why the correlation is 0 even though there is a strong relationship between speed and mileage.
(b) The speeds have a mean of 40 mph and a standard deviation of 15.81 mph. The mileages have a mean of 26.8 mpg and a standard deviation of 2.68 mpg. The table below shows the standardized values (labeled zspeed and zmpg) obtained by subtracting the mean and dividing by the standard deviation. The column labeled “product” contains the product (zspeed × zmpg) of the standardized measurements. The sum of the products is 0.0, so the correlation coefficient is also 0.0. The correlation coefficient r measures the strength of linear association between two quantitative variables, and this plot shows a nonlinear relationship between speed and mileage: mileage rises and then falls as speed increases, so the positive and negative products cancel.
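The arithmetic in answer (b) can be reproduced with a short script. This is a minimal sketch using only the speed and mileage values given in problem 17. The variable names `z_speed`, `z_mpg`, and `products` are chosen here to mirror the column labels described in the answer. Standard deviations use the n - 1 divisor, matching the formula in the problem.

```python
from statistics import mean, stdev

# Data from problem 17.
speed = [20, 30, 40, 50, 60]      # miles per hour
mileage = [24, 28, 30, 28, 24]    # miles per gallon

n = len(speed)
x_bar, s_x = mean(speed), stdev(speed)      # stdev divides by n - 1
y_bar, s_y = mean(mileage), stdev(mileage)

# Standardize each value: subtract the mean, divide by the standard deviation.
z_speed = [(x - x_bar) / s_x for x in speed]
z_mpg = [(y - y_bar) / s_y for y in mileage]

# Product of the standardized values for each observation.
products = [zx * zy for zx, zy in zip(z_speed, z_mpg)]

# Correlation: sum of the products divided by n - 1.
r = sum(products) / (n - 1)

print(f"mean speed = {x_bar}, s = {s_x:.2f}")
print(f"mean mileage = {y_bar}, s = {s_y:.2f}")
for x, y, zx, zy, p in zip(speed, mileage, z_speed, z_mpg, products):
    print(f"{x:5d} {y:5d} {zx:8.3f} {zy:8.3f} {p:8.3f}")
print(f"r is zero up to rounding: {abs(r) < 1e-9}")
```

The products for the two lowest speeds are equal in size and opposite in sign to the products for the two highest speeds, and the middle product is 0, so the sum is 0 apart from floating-point rounding.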