MTH 109, Statistics, Fall 1999
Second Test: Correlation & Regression, Chapters 8 - 12

You must show your work to receive full credit (for those problems requiring computation). All numbered problems in part I are worth 9 points each. All numbered problems in part II are worth 5 points each, unless noted otherwise.

I. The summary statistics comparing male and female recruits of Big Ten colleges are shown below. Assume the scatter diagram is football shaped.

      Average # of male recruits = 73, SD = 14
      Average # of female recruits = 30, SD = 12
      with r = 0.7

   A. The slope of the SD line (where the number of male recruits is shown on the horizontal axis and the number of female recruits on the vertical axis) would be: _____________;
      the slope of the regression line used to predict the number of female recruits, given the number of male recruits, would be: _______________;
      the slope of the regression line used to predict the number of male recruits, given the number of female recruits, would be: _______________.

   B. Find the equation of the regression line to predict the number of female recruits, given the number of male recruits.

   C. What would you predict for the number of female recruits, if you knew nothing about the number of male recruits? How much is your prediction likely to be off?

   D. What would you predict for the number of female recruits, if you knew that the number of male recruits was 53? How much is your prediction likely to be off?

   E. If it were known that the number of male recruits for a certain college was in the 40th percentile, what would probably be the percentile range of the number of female recruits?

   F. Of the colleges having 80 male recruits, what percentage of them had 40 or more female recruits?

II. Given below is the scatter diagram of 106 major league infielders, comparing their number of times at bat to their number of hits. Use the diagram to answer the following questions. The questions are intended to be short answer with no computations needed.

   A. Do the two variables have a positive or negative association? How do you know?

   B. Do they have a strong or weak association? How do you know?

   C. Is the graph a homoscedastic or heteroscedastic scatter diagram? How do you know?

   D. Which of the following values would you say is closest to the correlation coefficient?
      -0.95   -0.6   -0.2   0   0.2   0.6   0.95

   E. Which of the following values would you say is closest to the average of the at bats?
      1,000   3,500   7,000   14,000

   F. Would you say that the SD of the at bats is around:
      1,000   2,500   4,000   8,000

   G. Having the correlation that it does, it is plain that times at bat causes hits. Suppose that Cal Ripken (data point (3210, 927) on this graph) was in a batting slump. The manager just needs to send him to bat more often to cause his hits to go back up. Do you agree? Why or why not?

   H. Should r be used as a summary statistic? Why or why not? (Worth 9 points)
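
For anyone checking hand computations on Part I, the arithmetic can be sketched in a few lines of Python. This is only an illustrative check, not part of the test: the variable names and the helper `predict_female` are hypothetical, and the last step assumes the usual normal approximation inside a vertical strip, which the "football shaped" assumption is meant to justify.

```python
import math
from statistics import NormalDist

# Summary statistics from Part I (x = male recruits, y = female recruits).
avg_x, sd_x = 73, 14
avg_y, sd_y = 30, 12
r = 0.7

# Part I.A: the SD line has slope SD_y / SD_x (positive because r > 0);
# each regression line shrinks the corresponding SD ratio by a factor of r.
sd_line_slope = sd_y / sd_x
slope_y_on_x = r * sd_y / sd_x   # predict females from males
slope_x_on_y = r * sd_x / sd_y   # predict males from females

# Part I.B: the regression line passes through the point of averages.
intercept = avg_y - slope_y_on_x * avg_x


def predict_female(male_recruits):
    """Hypothetical helper: regression estimate of female recruits."""
    return slope_y_on_x * male_recruits + intercept


# Parts I.C and I.D: with no information about x, predict the average of y
# and expect to be off by about SD_y; with x known, the typical error is
# the r.m.s. error of the regression line.
rms_error = math.sqrt(1 - r ** 2) * sd_y

# Part I.F: within the strip at x = 80, treat y as roughly normal, centered
# on the regression estimate with spread equal to the r.m.s. error.
strip = NormalDist(mu=predict_female(80), sigma=rms_error)
pct_40_or_more = 100 * (1 - strip.cdf(40))

print("SD line slope:", round(sd_line_slope, 3))
print("slope (females on males):", round(slope_y_on_x, 3))
print("slope (males on females):", round(slope_x_on_y, 3))
print("intercept:", round(intercept, 2))
print("prediction at 53 males:", round(predict_female(53), 2))
print("r.m.s. error:", round(rms_error, 2))
print("percent with 40 or more females at 80 males:", round(pct_40_or_more, 1))
```

The printed values are rounded for readability; answers worked by hand from a normal table may differ slightly in the last digit.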