MIDTERM REVIEW PODCAST OUTLINE AU20

1. The distribution of budget for the top-grossing movies of 2018 is summarized below. What describes the shape of this data set?
   a. Symmetric
   b. Skewed right
   c. Skewed left
   d. None of the above

2. Which measure of center for movie budget is going to be higher, based on the previous histogram?
   a. The mean will be higher
   b. The median will be higher
   c. The mean and median will be equal
   d. Not enough information to tell

For questions 3-6, use the following descriptive statistics from the 2018 movie data. (Note that the median for budget is shown as 1e8, which is in exponential notation. This means a 1 followed by 8 zeros before the decimal point, or $100,000,000.)

Column              Mean           Std. dev.      Median
Runtime             118.05882      17.227429      118
Days Released       115.82353      43.793227      110.5
THEATERS            4042.2069      310.24591      4118
BUDGET              1.1367647e8    74325541       1e8
OPENING WEEKEND     67120349       54785996       47802879
U.S. REVENUE        2.2249474e8    1.569501e8     1.7324568e8
INT'L REVENUE       3.6916133e8    2.8464669e8    3.0070489e8
WORLD REVENUE       5.9165607e8    4.1461165e8    4.5143926e8
Critics Ratings     69.294118      22.824136      71
Audience Ratings    67.088235      17.180623      71

3. The distribution of the data for opening weekend revenue (in $) is shaped how?
   a. Skewed right
   b. Skewed left
   c. Symmetric
   d. Not enough information to tell

4. The correlation between opening weekend and U.S. revenue is the inverse of the correlation between U.S. revenue and opening weekend.
   a. True
   b. False

5. Looking (only) at the descriptive statistics above, you can tell that critics ratings and audience ratings are at least moderately correlated.
   a. True
   b. False

6. The above table shows you that the standard deviation of any data set can never be larger than the mean of that same data set.
   a. True
   b. False

7. True or False: The best-fitting line always has an SSE of 0.
   a. True
   b. False

8. Bob runs an experiment to see which brand of paper towel is more absorbent: Brand A or Brand B.
   He takes a random sample of 10 sheets from each brand of paper towel, puts each sheet in a cup of water, and measures how much water the sheet absorbed by squeezing it tightly for 10 seconds and weighing the water that comes out. What is the response variable?
   a. Weight of the water squeezed out
   b. Which brand is more absorbent in the end
   c. Brand type (A or B)
   d. None of the above

9. Undercoverage occurs when a certain group from the population is sampled but does not respond.
   a. True
   b. False

10. Confidentiality is ________________ than anonymity.
   a. Weaker
   b. Stronger
   c. No different

11. The results of a well-designed experiment are ____________ than the results of a well-designed observational study (assuming it is ethical to do an experiment).
   a. Stronger
   b. Weaker
   c. No different

12. What are the units of the residuals?
   a. Same as the original units of X
   b. Same as the units of Y
   c. No units
   d. None of the above

13. Suppose the best-fitting line for a data set is y = x + 2. The 3 points in the data set are (0, 2), (1, 2), and (2, 5). What is the value of SSE in this case?

14. If the correlation is zero, what is the equation of the best-fitting regression line through the data?

15. Bob wants to survey OSU students regarding their opinions on textbook costs. What would response bias mean here?

16. Bob wants to survey OSU students regarding their opinions on textbook costs. Give a clear example of a self-selected (volunteer) sample in this case.

Use the following edited StatCrunch output:

Simple linear regression results:
Sample size: 34
R-sq = 0.89443651

Parameter estimates:
Parameter  Coeff        Std. Err.    Alternative  DF   T-Stat      P-value
Constant   40641523     14171899     ≠ 0          32   2.8677541   0.0073
X          2.7093604    0.16454091   ≠ 0          32   16.466181   <0.0001

17. What is the equation of the best-fitting regression line? Assume the variables are just named X and Y.

18. If P(A) = .3, P(B) = .2, and P(B|A) = .1, what is P(A OR B)?
   a. .44
   b. .47
   c. .48
   d. None of the above

19. If A and B are disjoint, then A and B are independent.
   a. True
   b. False

20. In the following two-way table, are opinion and gender independent?

            YES   NO
   FEMALE   15    30
   MALE     30    60

   a. True
   b. False

21. In the following two-way table, what notation stands for the probability that a female selected at random said yes?

            YES   NO
   FEMALE   15    30
   MALE     30    60

   a. P(F|Y)
   b. P(Y|F)
   c. P(Y and F)

22. What is the name of the distribution shown by the pie chart below?

   [Pie chart: MALES (n = 100), categories favor and oppose, slices of 28.6% and 71.4%]

   a. The marginal distribution of opinion
   b. The marginal distribution of opinion for the males
   c. The conditional distribution of males given opinion
   d. The conditional distribution of opinion given males

23. Bob guesses on every question of a 5-question multiple-choice test where there are 4 choices for each answer. What’s the chance Bob gets at least 1 problem correct?

24. 40% of the workers at a certain factory work on the first floor, and the rest work on the 2nd floor. 80% of the workers on the first floor come to work on time, and 75% of the workers on the 2nd floor come to work on time. You randomly select a worker who was on time. Which floor are they more likely to be working on, the first floor or the 2nd floor?
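The arithmetic in questions 13, 18, 20, 23 and 24 can be checked with a short Python sketch. It uses only the numbers stated in those questions; the helper names (`sse`, `prob_union`, `is_independent`, `p_at_least_one`, `posterior`) are hypothetical and chosen for illustration, and exact `Fraction` values are used so that results come out without floating-point rounding.

```python
from fractions import Fraction as F


def sse(points, slope, intercept):
    """Sum of squared residuals (observed y minus predicted y) for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def prob_union(p_a, p_b, p_b_given_a):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B), where P(A and B) = P(A) * P(B|A)."""
    return p_a + p_b - p_a * p_b_given_a


def is_independent(table):
    """True if, in every row of a two-way count table, P(column | row) equals the marginal P(column)."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        F(count, sum(row)) == F(col_totals[j], total)
        for row in table
        for j, count in enumerate(row)
    )


def p_at_least_one(n_trials, p_success):
    """Complement rule: P(at least one success) = 1 - P(no successes) for independent trials."""
    return 1 - (1 - p_success) ** n_trials


def posterior(priors, likelihoods):
    """Bayes' rule: P(group | event) for each group, from P(group) and P(event | group)."""
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [part / total for part in joint]


# Question 13: line y = x + 2 through the points (0, 2), (1, 2), (2, 5)
q13 = sse([(0, 2), (1, 2), (2, 5)], slope=1, intercept=2)

# Question 18: P(A) = .3, P(B) = .2, P(B|A) = .1
q18 = prob_union(F("0.3"), F("0.2"), F("0.1"))

# Question 20: rows are FEMALE and MALE; columns are YES and NO
q20 = is_independent([[15, 30], [30, 60]])

# Question 23: 5 independent guesses, each correct with probability 1/4
q23 = p_at_least_one(5, F(1, 4))

# Question 24: floor 1 (40% of workers, 80% on time) vs floor 2 (60%, 75% on time)
q24 = posterior([F("0.4"), F("0.6")], [F("0.8"), F("0.75")])

print("Q13 SSE:", q13)
print("Q18 P(A or B):", q18, "=", float(q18))
print("Q20 independent:", q20)
print("Q23 P(at least 1 correct):", q23, "~", round(float(q23), 4))
print("Q24 P(floor 1 | on time), P(floor 2 | on time):", [round(float(p), 4) for p in q24])
```

Each helper applies one rule from the outline's topics: squaring and summing residuals for SSE, the general addition rule with P(A and B) = P(A)P(B|A), comparing conditional and marginal proportions to check independence, the complement rule for "at least one", and Bayes' rule for the factory question.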