MDM4U1 Statistics of Two Variables
Practice Test 8
1. Revenues at a golf course are compared to mean monthly
temperatures and amounts of rainfall, as shown in the table below.
Monthly Revenue ($1000s)    135  178  140  161  170  127
Mean Monthly Temperature     15   25   29   26   19   14
Amount of Rainfall (cm)      35   18   18   15   16   33
a) Create a scatter plot for revenue versus temperature and classify the
linear correlation.
Answer: moderate positive linear correlation
Revenue = 1.52 temperature + 119; r² = 0.21
b) Determine the correlation coefficient.
Answer: r = 0.46
c) Repeat parts a) and b) for monthly revenue versus amount of rainfall.
Answer: strong negative linear correlation; r = −0.78
Revenue = −1.78 rainfall + 192; r² = 0.60
d) Which of these has a stronger linear correlation? Explain.
Answer: Rainfall vs. revenue has the stronger linear correlation, because
the absolute value of r is greater in that case.
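The comparison in question 1 can be checked with a short script. This is a minimal sketch that computes the correlation coefficient directly from its definition; the helper name pearson_r is an assumption made for illustration, and a graphing calculator or spreadsheet would normally be used instead.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data from the table in question 1.
revenue = [135, 178, 140, 161, 170, 127]   # $1000s
temperature = [15, 25, 29, 26, 19, 14]
rainfall = [35, 18, 18, 15, 16, 33]        # cm

r_temp = pearson_r(temperature, revenue)
r_rain = pearson_r(rainfall, revenue)

print(f"revenue vs temperature: r = {r_temp:.2f}, r^2 = {r_temp ** 2:.2f}")
print(f"revenue vs rainfall:    r = {r_rain:.2f}, r^2 = {r_rain ** 2:.2f}")

# The stronger linear correlation is the one with the larger |r|.
stronger = "rainfall" if abs(r_rain) > abs(r_temp) else "temperature"
print("stronger linear correlation:", stronger)
```

Comparing absolute values matters here because the rainfall correlation is negative; its sign describes direction, not strength.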
Use the information in the following table to answer questions 2 to 5.
These data represent the number of applicants to a first-year university
program over time. The last four values were projected numbers,
estimated at the time the study was conducted.
Year                 1998 1999 2000 2001 2002 2003 2004 2005 2006
Number of Graduates   114  125  133  144  180  215  198  191  203
2. a) Create a scatter plot and classify the linear correlation.
Answer: strong (positive) linear correlation
Graduates = 12.5833 year − 25025; r² = 0.82
b) Determine the correlation coefficient. Use linear regression to find the
best-fit line.
Answer: r = 0.91
c) Do there appear to be any outliers? Explain.
Answer: (2003, 215) appears to be an outlier; (2002, 180) and (2004,
198) may be outliers
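One way to check the regression in question 2 and to look for outliers is to fit the line and list the residuals (observed minus predicted). The sketch below assumes a least-squares fit computed by hand; the function name fit_line is hypothetical, and ranking by residual size is only one of several reasonable ways to flag candidate outliers.

```python
years = list(range(1998, 2007))
graduates = [114, 125, 133, 144, 180, 215, 198, 191, 203]

def fit_line(xs, ys):
    """Least-squares slope, intercept and r^2 for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)

slope, intercept, r_squared = fit_line(years, graduates)
print(f"slope = {slope:.4f}, intercept = {intercept:.0f}, r^2 = {r_squared:.2f}")

# Residual = observed - predicted; large residuals flag candidate outliers.
residuals = {x: y - (slope * x + intercept) for x, y in zip(years, graduates)}
for year, res in sorted(residuals.items(), key=lambda kv: -abs(kv[1])):
    print(year, round(res, 1))

largest = max(residuals, key=lambda yr: abs(residuals[yr]))
```

The point with the largest residual is the one that sits farthest from the line, which is the natural first candidate when judging outliers from the scatter plot.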
3. In one particular year, a “double cohort” of students graduated from
high school. This graduating class consisted of those students
leaving OAC for the last time plus a full complement of grade 12
graduates.
a) Is it evident from looking at the graph in question 2a) when this
happened? Explain.
Answer: Yes, this likely happened in 2003, where the number of
graduates peaks at 215, well above the surrounding values.
b) Explain how this hidden variable has distorted the trend of these data.
Answer: A “bulge” of graduates occurs near the double-cohort year; this
has pulled the line of best fit upward, so that it lies above all of the
other data points.
c) Is the effect of this hidden variable confined to just one year?
Describe the “disturbance” that appears between 2002 and 2004, and
explain why this might happen.
Answer: No, the effect is spread over three years (from 2002 to 2004). A
number of students may either fast track (graduate early), or take an
extra year to graduate, in order to avoid graduating in the double cohort
year.
4. a) Repeat the analysis of question 2, with the outlying region of points
from 2002 to 2004 removed.
Answer: strong (positive) linear correlation
r ≈ 1.00; no outliers
Graduates = 11.196 year − 22258; r² ≈ 1.00
b) Describe the effects on the linear model of removing this cluster of
data.
Answer: Removing the outlying cluster of points has created an almost
perfect linear correlation.
5. a) Use both models developed in questions 2 and 4 to predict the
number of applicants in 2010.
Answer: original model: 267; modified model: 247
b) Which model do you think has provided a more reliable prediction?
Explain.
Answer: The second prediction is probably more accurate, because after
the double cohort, the original trend continues.
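The two predictions in question 5 can be reproduced by fitting both data sets and evaluating each line at 2010. This is a sketch under the assumption that unrounded regression coefficients are used; results obtained from the rounded equations can differ by about one, so the values printed here need not match a rounded answer key exactly. The name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

data = {1998: 114, 1999: 125, 2000: 133, 2001: 144, 2002: 180,
        2003: 215, 2004: 198, 2005: 191, 2006: 203}

# Modified data set: drop the cluster around the double-cohort year.
trimmed = {year: n for year, n in data.items() if not 2002 <= year <= 2004}

predictions = {}
for label, points in (("original", data), ("modified", trimmed)):
    slope, intercept = fit_line(list(points), list(points.values()))
    predictions[label] = slope * 2010 + intercept
    print(f"{label} model: about {predictions[label]:.1f} in 2010")
```

Because 2010 lies outside the observed years, both values are extrapolations; the gap between them shows how much the 2002 to 2004 cluster tilts the original line.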
Use the information in this table to answer questions 6 and 7.
Number of Rounds 1 2 3 4
Number of Players 2 4 8 16
The number of players required for a single-elimination tennis
tournament depends on the number of rounds, as shown in the table
above.
6. a) Create a scatter plot, and perform a quadratic regression. Record
the equation of the best-fit curve and the coefficient of determination.
Answer: Players = 1.5 rounds² − 2.9 rounds + 3.5; r² = 0.998
b) Is this a good mathematical model for this situation? Explain.
Answer: This is a good model for the data shown, because the curve fits
the four points closely, but it is not suitable for extrapolation: the
number of players doubles with each round, and a quadratic curve cannot
keep up with that growth beyond the given data.
c) Use this model to determine the number of players required for a
6-round tournament of this nature. Is this a reasonable answer? Explain.
Answer: 40; not reasonable. A 6-round single-elimination tournament
needs 64 players, so it will not function properly with 40.
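The quadratic regression in question 6 can be sketched without a calculator by solving the least-squares normal equations for y = a x² + b x + c. The helper names solve, power_sum, moment, and predict are assumptions made for this illustration, and exact fractions are used only to avoid rounding noise on such a small data set.

```python
from fractions import Fraction

rounds = [1, 2, 3, 4]
players = [2, 4, 8, 16]

def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def power_sum(k):
    return sum(x ** k for x in rounds)

def moment(k):
    return sum(y * x ** k for x, y in zip(rounds, players))

# Normal equations for the least-squares fit of y = a*x^2 + b*x + c.
a, b, c = solve(
    [[power_sum(4), power_sum(3), power_sum(2)],
     [power_sum(3), power_sum(2), power_sum(1)],
     [power_sum(2), power_sum(1), power_sum(0)]],
    [moment(2), moment(1), moment(0)],
)

def predict(x):
    return a * x ** 2 + b * x + c

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
mean_y = Fraction(sum(players), len(players))
ss_res = sum((y - predict(x)) ** 2 for x, y in zip(rounds, players))
ss_tot = sum((y - mean_y) ** 2 for y in players)
r_squared = 1 - ss_res / ss_tot

print(f"a = {float(a)}, b = {float(b)}, c = {float(c)}")
print(f"r^2 = {float(r_squared):.3f}")
print(f"6 rounds: quadratic model gives {float(predict(6))}, actual need is {2 ** 6}")
```

The high coefficient of determination alongside a poor 6-round prediction illustrates why a close fit inside the data range does not guarantee a sound extrapolation.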
7. a) Determine a better mathematical model, using non-linear
regression. Record the equation of the best-fit curve and the coefficient
of determination.
Answer: An exponential model is better: Players = 2^rounds; r² = 1
b) Use this model to determine the number of players required for a
6-round tournament of this nature. Is this a reasonable answer? Explain.
Answer: 64; reasonable; The tournament will function properly.
c) Account for the discrepancies between these two models.
Answer: Both models fit the given data well, but quadratic and
exponential curves behave differently outside that range. The
exponential model makes more sense for data beyond what is given, and
is therefore superior for extrapolation.
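The exponential model in question 7 can be sketched by linearizing the data: taking logarithms turns an exponential relationship into a straight line, which can then be fitted by ordinary linear regression. Using base-2 logarithms is an assumption chosen for convenience here; calculators typically report the same curve in the form y = a·b^x. The names k, log_a, base, and predict are hypothetical.

```python
from math import log2

rounds = [1, 2, 3, 4]
players = [2, 4, 8, 16]

# Linearize: if players = a * 2**(k * rounds),
# then log2(players) = log2(a) + k * rounds.
logs = [log2(p) for p in players]

n = len(rounds)
mean_x = sum(rounds) / n
mean_l = sum(logs) / n
sxx = sum((x - mean_x) ** 2 for x in rounds)
sll = sum((l - mean_l) ** 2 for l in logs)
sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(rounds, logs))

k = sxl / sxx                         # slope of the linearized fit
log_a = mean_l - k * mean_x           # intercept of the linearized fit
a = 2 ** log_a
base = 2 ** k
r_squared = sxl ** 2 / (sxx * sll)    # r^2 of the linearized fit

def predict(x):
    return a * base ** x

print(f"Players = {a:.3f} * {base:.3f}^rounds; r^2 = {r_squared:.2f}")
print(f"6 rounds: {predict(6):.0f} players")
```

Unlike the quadratic, this model encodes the doubling that each extra round requires, so its extrapolation agrees with how a single-elimination tournament works.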
8. Explain each of the following types of cause and effect. Illustrate with
an example.
a) common-cause factor
Answer:
Common-Cause Factor: An external variable causes two variables to
change in the same way. For example, suppose that a town finds that its
revenue from parking fees at the public beach each summer correlates
with the local tomato harvest. It is extremely unlikely that cars parked at
the beach have any effect on the tomato crop. Instead good weather is a
common-cause factor that increases both the tomato crop and the
number of people who park at the beach.
b) accidental relationship
Answer:
Accidental Relationship: A correlation exists without any causal
relationship between variables. For example, the number of females
enrolled in undergraduate engineering programs and the number of
“reality” shows on television both increased for several years. These two
variables have a positive linear correlation, but it is likely entirely
coincidental.
c) presumed relationship
Answer:
Presumed Relationship: A correlation does not seem to be accidental
even though no cause-and-effect relationship or common-cause factor is
apparent. For example, suppose you found a correlation between
people’s level of fitness and the number of adventure movies they
watched. It seems logical that a physically fit person might prefer
adventure movies, but it would be difficult to find a common cause or to
prove that the one variable affects the other.
d) reverse cause and effect relationship
Answer:
Reverse Cause-and-Effect Relationship: The dependent and
independent variables are reversed in the process of establishing
causality. For example, suppose that a researcher observes a positive
linear correlation between the amount of coffee consumed by a group of
medical students and their levels of anxiety. The researcher theorizes
that drinking coffee causes nervousness, but instead finds that nervous
people are more likely to drink coffee.
9. Claire, a high school student, claims that listening to loud music helps
her study. To defend her argument, she compiles the following results
for four recent tests and her study habits:
Test                                            History  English  Mathematics  Science
Volume of Music While Studying (Dial Setting)         0        1            2      1.5
Score (percent)                                      48       59           72       70
a) Create a scatter plot for these data and classify the linear correlation.
Answer: strong (positive) linear correlation
Score = 12.74 volume + 47.9; r² = 0.96
b) Do these data support Claire’s claim? Explain.
Answer: Yes; there is a strong positive linear correlation.
c) To what extent has Claire established a cause and effect relationship?
Answer: Causality has not been established to any extent.
d) Identify at least two extraneous variables.
Answer: Answers may vary, e.g., aptitude for various subjects, amount
of study time, etc.
e) Identify at least two types of bias that could be present in this study.
Do you think that this bias could be intentional or unintentional?
Explain.
Answer: measurement bias, sample bias, response bias. Answers may vary
on intent; for example, the bias could be intentional, since Claire may
be trying to convince her parents to let her listen to music while she
studies.
f) Suggest ways that Claire might improve the validity of her study.
Answer: Answers may vary, e.g., create experimental and control
groups, control extraneous variables – for example compare test results
for one subject only, etc.
10. a) Explain the term “hidden variable.”
Answer: an extraneous variable that is difficult to detect
b) Describe an example of a relationship between two variables that
could be obscured by a hidden variable, and how the hidden variable
could be recognized.
Answer: Answers may vary. For example, in a time series of the Consumer
Price Index, the hidden variable could be a dramatic change in energy
costs. It could be detected by looking for a big jump in the graph.