MDM4U1 Statistics of Two Variables Practice Test 8

1. Revenues at a golf course are compared to mean monthly temperatures and amounts of rainfall, as shown in the table below.

Monthly Revenue ($1000s)    135   178   140   161   170   127
Mean Monthly Temperature     15    25    29    26    19    14
Amount of Rainfall (cm)      35    18    18    15    16    33

a) Create a scatter plot for revenue versus temperature and classify the linear correlation.
Answer: moderate positive linear correlation; Revenue = 1.52 × temperature + 119; r² = 0.21

b) Determine the correlation coefficient.
Answer: r = 0.46

c) Repeat parts a) and b) for monthly revenue versus amount of rainfall.
Answer: strong negative linear correlation; r = −0.78; Revenue = −1.78 × rainfall + 192; r² = 0.60

d) Which of these has a stronger linear correlation? Explain.
Answer: Rainfall vs. revenue has the stronger linear correlation, because the absolute value of r is greater in that case.

Use the information in the following table to answer questions 2 to 5. These data represent the number of applicants to a first-year university program over time. The last four values were projected numbers, estimated at the time the study was conducted.

Year                   1998   1999   2000   2001   2002   2003   2004   2005   2006
Number of Graduates     114    125    133    144    180    215    198    191    203

2. a) Create a scatter plot and classify the linear correlation.
Answer: strong (positive) linear correlation; Graduates = 12.5833 × year − 25025; r² = 0.82

b) Determine the correlation coefficient. Use linear regression to find the best-fit line.
Answer: r = 0.91

c) Do there appear to be any outliers? Explain.
Answer: (2003, 215) appears to be an outlier; (2002, 180) and (2004, 198) may be outliers.

3. In one particular year, a “double cohort” of students graduated from high school. This graduating class consisted of those students leaving OAC for the last time plus a full complement of grade 12 graduates.

a) Is it evident from looking at the graph in question 2a) when this happened? Explain.
Answer: Yes, this likely happened in 2003.

b) Explain how this hidden variable has distorted the trend of these data.
Answer: A “bulge” of graduates occurs near the double-cohort year; this has caused the line of best fit to appear above all of the other data.

c) Is the effect of this hidden variable confined to just one year? Describe the “disturbance” that appears between 2002 and 2004, and explain why this might happen.
Answer: No, the effect is spread over three years (from 2002 to 2004). A number of students may either fast track (graduate early) or take an extra year to graduate, in order to avoid graduating in the double-cohort year.

4. a) Repeat the analysis of question 2, with the outlying region of points from 2002 to 2004 removed.
Answer: strong (positive) linear correlation; r = 1; no outliers; Graduates = 11.196 × year − 22258; r² = 1.00

b) Describe the effects on the linear model of removing this cluster of data.
Answer: Removing the outlying cluster of points has created an almost perfect linear correlation.

5. a) Use both models developed in questions 2 and 4 to predict the number of applicants in 2010.
Answer: original model: 267; modified model: 247

b) Which model do you think has provided a more reliable prediction? Explain.
Answer: The second prediction is probably more accurate, because after the double cohort, the original trend continues.

Use the information in this table to answer questions 6 and 7.

Number of Rounds     1    2    3    4
Number of Players    2    4    8   16

The number of players required for a single-elimination tennis tournament depends on the number of rounds, as shown in the table above.

6. a) Create a scatter plot, and perform a quadratic regression. Record the equation of the best-fit curve and the coefficient of determination.
Answer:

b) Is this a good mathematical model for this situation? Explain.
Answer: This is a good model for the data shown, because it fits the points well, but it will not give good predictions when extrapolated.

c) Use this model to determine the number of players required for a 6-round tournament of this nature. Is this a reasonable answer? Explain.
Answer: 40; not reasonable. The tournament will not function properly (64 players are needed).

7. a) Determine a better mathematical model, using non-linear regression. Record the equation of the best-fit curve and the coefficient of determination.
Answer: Exponential model is better.

b) Use this model to determine the number of players required for a 6-round tournament of this nature. Is this a reasonable answer? Explain.
Answer: 64; reasonable. The tournament will function properly.

c) Account for the discrepancies between these two models.
Answer: Quadratic and exponential curves behave differently. Although both of these models fit the given data well, the exponential model makes more sense when considering data beyond what is given and is thus superior for extrapolation.

8. Explain each of the following types of cause and effect. Illustrate with an example.

a) common-cause factor
Answer: Common-Cause Factor: An external variable causes two variables to change in the same way. For example, suppose that a town finds that its revenue from parking fees at the public beach each summer correlates with the local tomato harvest. It is extremely unlikely that cars parked at the beach have any effect on the tomato crop. Instead, good weather is a common-cause factor that increases both the tomato crop and the number of people who park at the beach.

b) accidental relationship
Answer: Accidental Relationship: A correlation exists without any causal relationship between variables. For example, the number of females enrolled in undergraduate engineering programs and the number of “reality” shows on television both increased for several years.
These two variables have a positive linear correlation, but it is likely entirely coincidental.

c) presumed relationship
Answer: Presumed Relationship: A correlation does not seem to be accidental even though no cause-and-effect relationship or common-cause factor is apparent. For example, suppose you found a correlation between people’s level of fitness and the number of adventure movies they watched. It seems logical that a physically fit person might prefer adventure movies, but it would be difficult to find a common cause or to prove that the one variable affects the other.

d) reverse cause-and-effect relationship
Answer: Reverse Cause-and-Effect Relationship: The dependent and independent variables are reversed in the process of establishing causality. For example, suppose that a researcher observes a positive linear correlation between the amount of coffee consumed by a group of medical students and their levels of anxiety. The researcher theorizes that drinking coffee causes nervousness, but instead finds that nervous people are more likely to drink coffee.

9. Claire, a high school student, claims that listening to loud music helps her study. To defend her argument, she compiles the following results for four recent tests and her study habits:

Test           Volume of Music While Studying (Dial Setting)    Score (percent)
History        0                                                48
English        1                                                59
Mathematics    2                                                72
Science        1.5                                              70

a) Create a scatter plot for these data and classify the linear correlation.
Answer: strong (positive) linear correlation; Score = 8.03 × volume + 51.2; r² = 0.82

b) Do these data support Claire’s claim? Explain.
Answer: Yes; there is a strong positive linear correlation.

c) To what extent has Claire established a cause-and-effect relationship?
Answer: Causality has not been established to any extent.

d) Identify at least two extraneous variables.
Answer: Answers may vary, e.g., aptitude for various subjects, amount of study time, etc.
e) Identify at least two types of bias that could be present in this study. Do you think that this bias could be intentional or unintentional? Explain.
Answer: measurement bias, sample bias, response bias. Answers may vary; for example, it could be intentional, since Claire may be trying to convince her parents to let her listen to music while she studies.

f) Suggest ways that Claire might improve the validity of her study.
Answer: Answers may vary, e.g., create experimental and control groups, or control extraneous variables (for example, compare test results for one subject only).

10. a) Explain the term “hidden variable.”
Answer: an extraneous variable that is difficult to detect

b) Describe an example of a relationship between two variables that could be obscured by a hidden variable, and how the hidden variable could be recognized.
Answer: Answers may vary, for example, a time series of the Consumer Price Index in which the hidden variable is a dramatic change in energy costs. It could be detected by looking for a big jump in the graph.
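
Checking the regression answers: the numerical answers to questions 1, 2, 4, 5, 6 and 7 can be reproduced without a graphing calculator. The following is a minimal Python sketch, not the method the test prescribes. The helper names linear_fit, det3 and quadratic_fit are hypothetical and chosen only for this illustration. The exponential model is fitted by an ordinary linear regression on the base-2 logarithm of the player counts, which is one common way to do it and is an assumption here.

```python
import math


def linear_fit(xs, ys):
    """Return slope, intercept and correlation coefficient r of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def det3(a):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def quadratic_fit(xs, ys):
    """Return (a, b, c) of the least-squares parabola y = a*x^2 + b*x + c."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^0 .. y*x^2
    # Normal equations for the unknowns (a, b, c), solved with Cramer's rule.
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = rhs[i]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


# Question 1: revenue versus temperature and revenue versus rainfall.
revenue = [135, 178, 140, 161, 170, 127]
temperature = [15, 25, 29, 26, 19, 14]
rainfall = [35, 18, 18, 15, 16, 33]
m_temp, b_temp, r_temp = linear_fit(temperature, revenue)
m_rain, b_rain, r_rain = linear_fit(rainfall, revenue)

# Questions 2, 4 and 5: all years, then with 2002 to 2004 removed.
years = list(range(1998, 2007))
counts = [114, 125, 133, 144, 180, 215, 198, 191, 203]
m_all, b_all, r_all = linear_fit(years, counts)
kept = [(x, y) for x, y in zip(years, counts) if not 2002 <= x <= 2004]
m_cut, b_cut, r_cut = linear_fit([x for x, _ in kept], [y for _, y in kept])
pred_all = m_all * 2010 + b_all
pred_cut = m_cut * 2010 + b_cut

# Questions 6 and 7: quadratic model versus exponential model at 6 rounds.
rounds = [1, 2, 3, 4]
players = [2, 4, 8, 16]
qa, qb, qc = quadratic_fit(rounds, players)
quad_6 = qa * 6 ** 2 + qb * 6 + qc
k, c, r_exp = linear_fit(rounds, [math.log2(p) for p in players])
exp_6 = 2 ** (k * 6 + c)

print(f"Q1 temperature: slope={m_temp:.2f}, intercept={b_temp:.0f}, r={r_temp:.2f}")
print(f"Q1 rainfall:    slope={m_rain:.2f}, intercept={b_rain:.0f}, r={r_rain:.2f}")
print(f"Q2 all years:   slope={m_all:.4f}, r={r_all:.2f}, 2010 -> {pred_all:.1f}")
print(f"Q4 trimmed:     slope={m_cut:.3f}, r={r_cut:.3f}, 2010 -> {pred_cut:.1f}")
print(f"Q6 quadratic at 6 rounds:   {quad_6:.1f}")
print(f"Q7 exponential at 6 rounds: {exp_6:.1f}")
```

Because the sketch substitutes unrounded coefficients, the 2010 predictions can differ by about one applicant from values obtained with the rounded equations in the answers. The quadratic and exponential predictions for 6 rounds illustrate the point made in question 7c: both curves fit the four given points closely, yet they diverge once the model is used beyond the data.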