Name: ________________________   Lab section: ________

Practice Exam 1, STAT 2331

Please do all questions.

1. Which of the following statements is ALWAYS TRUE?
   a. The sample median is more sensitive to extreme values than the mean.
   b. The sample standard deviation will be larger than the mean in a right-skewed distribution.
   c. The sample standard deviation is a measure of variation around the sample mean.
   d. The IQR is a measure of variation of the lower half of the data.
   e. If a distribution is perfectly symmetric, then the mean will be equal to the standard deviation.

2. The stem-and-leaf plot below summarizes the final-year averages of a graduating honors class in Dedman College. Select the correct statement.

    6 | 88
    7 | 24567
    8 | 2346
    9 | 014
   10 |

   a. The distribution is bimodal.
   b. The mean is 95.
   c. The IQR is 50.
   d. The maximum score is 94.
   e. The median score is 68.

3. Suppose that two variables X and Y are known to have a correlation of 0.98. Which of the following statements do we know must always be true?
   a. X is normally distributed.
   b. There is a 95% chance that Y values will be found within 2 standard deviations of their mean.
   c. A regression of Y on X will produce a line with a positive slope.
   d. The X variable will have a larger standard deviation than the Y variable.
   e. 98% of Y's variation is explained by Y's linear relationship with X.

4. Which of the following are true statements about the correlation coefficient r?
   (i) The correlation r is always between -1 and 1.
   (ii) Correlation r measures how well a straight line models your data.
   (iii) Correlation r varies with your units of measurement.
   (iv) The square of the correlation is the coefficient of determination.

   (a) (i) and (ii) only.
   (b) (i) and (iii) only.
   (c) (i) and (iv) only.
   (d) (i), (ii) and (iv).
   (e) (i) only.

The following two multiple-choice questions refer to cats. The weights of the cats in Questions 5 and 6 are normally distributed with a mean of 9.5 pounds and a standard deviation of 1.5 pounds.

5. How much do the heaviest 2.5% of these cats weigh?
   a. 9.5 pounds or more
   b. 12.5 pounds or more
   c. 6.5 pounds or less
   d. 11 pounds or less
   e. 11 pounds or more

6. What percentage of these cats weigh less than 8 pounds?
   a. 32%
   b. 20%
   c. 16%
   d. 5%
   e. 50%

7. A STAT 2331 student has a grade of 86% going into the final exam. The final exam is worth 20% of the overall grade. What percentage score does the student need on the final exam to ensure she gets at least 80% overall in the class?
   a. 80%
   b. 90%
   c. 86%
   d. 17%
   e. 56%

The following four multiple-choice questions relate to the information below on frogs. We are interested in predicting how far a frog can jump (measured in cm) based on its leg length (also in cm). We collect data on 50 frogs, allowing each frog one leap. We find that the correlation r is 0.7, the mean leg length is 16 cm with a standard deviation of 2 cm, and the mean jumping distance is 20 cm with a standard deviation of 5 cm.

8. The equation for the least-squares regression line of jumping distance on leg length is
   a. Jumping distance = 10.4 + 0.28 (Leg length)
   b. Leg length = -8 + 1.75 (Jumping distance)
   c. Jumping distance = -8 + 1.75 (Leg length)
   d. Jumping distance = -8 + 0.28 (Leg length)
   e. Leg length = 10.4 + 0.28 (Jumping distance)

9. A frog with legs 20 cm long would be predicted to jump
   a. 31.4 cm
   b. 25.8 cm
   c. 29.3 cm
   d. 15.7 cm
   e. 27 cm

10. What proportion of the variation in jumping distances is explained by the linear relationship with leg length?
   a. 49%
   b. 90%
   c. 0.9%
   d. 0.49%
   e. 81%

11. The predicted distance jumped by a frog with a leg length of 4 cm is -1 cm. The best explanation of this is
   a. Frogs this small can only jump backwards.
   b. The researchers made a mistake in measuring distances.
   c. The straight-line relationship should not be extrapolated to such small frogs.
   d. Frogs with legs this small are tasty.
   e. We should have done a regression of leg length on distance jumped.

12. Suppose we have recorded the weights and heights of 50 people.
   We decide to regress weight (Y) on height (X) because we wish to predict Y from X. We find that the coefficient of determination is 25%. Suppose we change our mind and now want to regress height on weight. What will be the coefficient of determination for this new regression?
   a. 75%
   b. -25%
   c. 25%
   d. 50%
   e. It is not possible to calculate it given this information.

The following questions concern a study of two airlines, “Alaskan Skies” and “America Best”. Two departure locations are considered, Seattle and Phoenix, and for each airline and location, flights are categorized as either “on time” or “delayed”. The following tables categorize flights by these three variables.

Seattle     Alaskan Skies (AS)   America Best (AB)
On time                   3213                 335
Delayed                    976                 228
Total                     4189                 563

Phoenix     Alaskan Skies (AS)   America Best (AB)
On time                    400                2345
Delayed                     47                 468
Total                      447                2813

13) The on-time rates for airlines AS and AB for flights from Seattle are, respectively:
   a) 23.3% and 40%
   b) 76.7% and 60%
   c) 23.3% and 60%
   d) 32% and 40%
   e) 93% and 35%

14) The on-time rates for airlines AS and AB for flights from Phoenix are, respectively:
   a) 93% and 87%
   b) 83% and 98%
   c) 73% and 56%
   d) 11% and 17%
   e) 89% and 83%

15) What percentage of Seattle passengers use AS?
   a) 50%
   b) 88%
   c) 76%
   d) 44%
   e) 24%

16) What percentage of Phoenix passengers use AB?
   a) 24%
   b) 77%
   c) 23%
   d) 44%
   e) 86%

Now suppose we collapse the tables over airport and form a new table as follows:

            Alaskan Skies   America Best
On time              3613           2680
Delayed              1023            696
Total                4636           3376

17) The on-time rates for airlines AS and AB for all passengers are, respectively:
   a) 50% and 60%
   b) 44% and 66%
   c) 90% and 50%
   d) 78% and 79%
   e) 22% and 17%

18) A correct statement about the variables is
   a) The collapsing variable is airport location.
   b) The collapsing variable is on time/delayed.
   c) The collapsing variable is airline (AS, AB).
   d) There is no Simpson's paradox here.
   e) The response variable is airport location.
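The rates behind questions 13 to 18 can be rechecked from the table counts with a few lines of Python. This is only a sketch: the dictionary layout and the names `counts`, `on_time_rate`, `collapsed` and `phoenix_share` are illustrative choices, not part of the exam. The percentages are printed to one decimal place, so they can differ slightly from the rounded answer options.

```python
# Flight counts from the Seattle and Phoenix tables:
# counts[airport][airline] = (on_time, delayed)
counts = {
    "Seattle": {"AS": (3213, 976), "AB": (335, 228)},
    "Phoenix": {"AS": (400, 47), "AB": (2345, 468)},
}


def on_time_rate(on_time, delayed):
    """Fraction of flights that departed on time."""
    return on_time / (on_time + delayed)


# On-time rate for each airline within each airport (questions 13 and 14).
by_airport = {
    airport: {airline: on_time_rate(*c) for airline, c in airlines.items()}
    for airport, airlines in counts.items()
}

# Collapse over airport location by summing the counts (question 17).
collapsed = {}
for airlines in counts.values():
    for airline, (on_time, delayed) in airlines.items():
        prev_on, prev_delayed = collapsed.get(airline, (0, 0))
        collapsed[airline] = (prev_on + on_time, prev_delayed + delayed)
overall = {airline: on_time_rate(*c) for airline, c in collapsed.items()}

# Share of each airline's flights that leave from Phoenix, the airport
# where both airlines have their higher on-time rate.
phoenix_share = {
    airline: sum(counts["Phoenix"][airline]) / sum(totals)
    for airline, totals in collapsed.items()
}

for airport, rates in by_airport.items():
    print(airport, {a: f"{100 * r:.1f}%" for a, r in rates.items()})
print("Both airports", {a: f"{100 * r:.1f}%" for a, r in overall.items()})
print("Phoenix share", {a: f"{100 * s:.1f}%" for a, s in phoenix_share.items()})
```

Within each airport AS has the higher on-time rate, yet AB is ahead once the airports are pooled. The last line shows why: most AB flights leave from Phoenix, where on-time rates are higher for both airlines, while most AS flights leave from Seattle.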
19) The correct conclusion to draw here is
   a) Airport location is not a factor in whether or not a flight is on time.
   b) Yes, airline AB was slightly more on time overall, but it is because it mostly had passengers leaving from Phoenix, and airline AS mostly had passengers leaving from Seattle.
   c) Yes, airline AS was slightly more on time overall, but it is because it mostly had passengers leaving from Phoenix, and airline AB mostly had passengers leaving from Seattle.
   d) We should fly airline AB, as it had a higher on-time rate overall.
   e) Airport location is not related to airline.

Each of the next several statements has just two options, true and false. Circle the one you believe to be correct in each case. Here we say a statement is true if it is ALWAYS true; otherwise we designate it false.

20) T F The mode is another name for the median.
21) T F The standard deviation is an outlier-resistant measure of variation.
22) T F If the data are normally distributed, 68% of the data are within one standard deviation of the mean.
23) T F If two variables are positively correlated, the regression line must have a positive slope.
24) T F The correlation coefficient (r) does not depend on the units of measurement.
25) T F Simpson's paradox concerns three categorical variables.
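The frog questions (8 to 11) and true/false question 23 rely on the standard least-squares formulas for a regression of y on x: slope = r * s_y / s_x and intercept = y_mean - slope * x_mean. Below is a minimal Python sketch that applies these formulas to the frog summary statistics. The functions `least_squares_from_summary` and `predict_jump` are hypothetical helpers introduced only for illustration.

```python
def least_squares_from_summary(r, x_mean, x_sd, y_mean, y_sd):
    """Least-squares line of y on x from summary statistics.

    Uses slope = r * s_y / s_x and intercept = y_mean - slope * x_mean.
    Returns (intercept, slope).
    """
    slope = r * y_sd / x_sd
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Frog summary statistics from the exam (all lengths in cm):
# x = leg length, y = jumping distance.
r = 0.7
intercept, slope = least_squares_from_summary(
    r, x_mean=16, x_sd=2, y_mean=20, y_sd=5
)


def predict_jump(leg_cm):
    """Predicted jumping distance (cm) for a given leg length (cm)."""
    return intercept + slope * leg_cm


r_squared = r ** 2  # coefficient of determination for a simple linear regression

print(f"Jumping distance = {intercept:.2f} + {slope:.2f} (Leg length)")
print(f"Leg length 20 cm -> {predict_jump(20):.2f} cm")
print(f"Leg length 4 cm -> {predict_jump(4):.2f} cm (6 SDs below the mean leg length: extrapolation)")
print(f"R^2 = {r_squared:.2f}")
```

Standard deviations are positive, so the slope always has the same sign as r, which is the point behind question 23. Regressing leg length on jumping distance would give the same `r_squared`, because r does not depend on which variable is treated as the response (question 12).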