AP Statistics Testbank 2
Name ____________________________________ Date _____________________ Period __________
A Few Formulas:
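$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
$\hat{y} = a + bx, \quad b = \frac{r s_y}{s_x}, \quad a = \bar{y} - b\bar{x}$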
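The formulas above can be sketched in code. The following is a minimal Python sketch, not part of the original handout; the function names (`sample_sd`, `correlation`, `least_squares`) and the small data set are hypothetical and chosen only for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

def least_squares(xs, ys):
    """Return (a, b) for y-hat = a + b x, with b = r s_y / s_x and a = y-bar - b x-bar."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"r = {correlation(xs, ys):.3f}, y-hat = {a:.2f} + {b:.2f} x")
```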
Section 1: Multiple-Choice and Short Response Questions
1) A researcher is interested in determining if one could predict the score on a statistics exam from the amount
of time spent studying for the exam. In this study, the explanatory variable is (circle the correct answer)
a) the researcher.
b) the amount of time spent studying for the exam.
c) the score on the exam.
d) the fact that this is a statistics exam.
e) the likelihood of your passing Mr. S’s statistics course.
2) You are in the process of trying to determine if the score on a statistics examination can be predicted from
the amount of time spent studying. In this study, which is the explanatory variable and which is the response
variable?
Explanatory variable: ___________________________________
Response variable: ____________________________________
3) Suppose that you were to draw a scatterplot relating the heights and weights of mature adults in a particular
ethnic tribe. Either determine what the explanatory variable and the response variable should be, or state that
it really doesn’t matter which variable is called which.
Your answer: ____________________________________________________________________
4) Consider n data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Assume that the mean of the x-values is $\bar{x} = 5$ and that
the sample standard deviation of the x-values is $s_x = 4$. Assume also that the mean of the y-values is $\bar{y} = 10$ and that
the sample standard deviation of the y-values is $s_y = 10$. Assume finally that the correlation of the data is given by
$r = 0.6$. Of the following, which could be the least squares regression line? (Circle one.)
a) $\hat{y} = -5.0 + 3.0x$
b) $\hat{y} = 1.5x$
c) $\hat{y} = 2.5 + 1.5x$
d) $\hat{y} = 2.5 - 1.5x$
e) $\hat{y} = -2.5 + 1.5x$
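One way to check a candidate line in question 4 is to apply the formulas for b and a from the top of the sheet to the given summary statistics. The sketch below does this in Python; the dictionary of candidates simply restates the answer choices, and the variable names are assumptions made for illustration.

```python
# Summary statistics given in question 4.
x_bar, s_x = 5.0, 4.0
y_bar, s_y = 10.0, 10.0
r = 0.6

slope = r * s_y / s_x               # b = r * s_y / s_x
intercept = y_bar - slope * x_bar   # a = y-bar - b * x-bar

# Answer choices as (intercept, slope) pairs.
candidates = {
    "a": (-5.0, 3.0),
    "b": (0.0, 1.5),
    "c": (2.5, 1.5),
    "d": (2.5, -1.5),
    "e": (-2.5, 1.5),
}

# Keep the choices whose intercept and slope both agree with the formulas.
matches = [
    label
    for label, (a, b) in candidates.items()
    if abs(a - intercept) < 1e-9 and abs(b - slope) < 1e-9
]
print(slope, intercept, matches)
```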
5) A study found a correlation of r = − 0.61 between the gender of a worker and his or her income. You may
correctly conclude (circle the correct answer)
a) that the study is flawed; correlation makes no sense in this context.
b) that the study shows that women typically earn less than men.
c) that the study shows that the greater the salary, the greater the tendency to be a man.
d) that the study is flawed; only a positive correlation would be possible in this situation.
6) Suppose that we have 10 data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_{10}, y_{10})$ with correlation $r = -0.61$. Then the
correlation of the new set of data $(2x_1 + 1, y_1), (2x_2 + 1, y_2), \ldots, (2x_{10} + 1, y_{10})$ has value
a) 0.39
b) − 0.22
c) – 0.61
d) 0.61
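The effect of a transformation such as $x \mapsto 2x + 1$ on the correlation can be explored numerically. The sketch below uses a hypothetical data set of 10 pairs (not the data of question 6, which are not given) and compares the correlation before and after the transformation.

```python
import math

def correlation(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [9, 7, 8, 5, 6, 4, 5, 2, 3, 1]

r_original = correlation(xs, ys)
r_transformed = correlation([2 * x + 1 for x in xs], ys)
print(r_original, r_transformed)
```

Because standardizing removes both the shift and the positive scale factor, the two printed values agree up to rounding.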
7) We measure a response variable Y at each of several times. The resulting scatterplot of log Y versus time of
measurement looks approximately like a positively sloping straight line. We may conclude that
a) the correlation between time of measurement and Y is negative, since logarithms of positive fractions
(such as correlations) are negative.
b) the rate of growth of Y is positive, but slowing down over time.
c) an exponential curve would approximately describe the relationship between Y and time of
measurement.
d) a mistake has been made. It would have been better to plot Y versus the logarithm of the time of
measurement.
8) A researcher wishes to study how the average weight Y (in kilograms) of children changes during the first
year of life. He plots these averages versus the age X (in months) and decides to fit a least-squares
regression line to the data with X as the explanatory variable and Y as the response variable. He computes
the following quantities.
$r$ = correlation between X and Y = 0.9
$\bar{x}$ = mean of the values of X = 6.5
$\bar{y}$ = mean of the values of Y = 6.6
$s_x$ = standard deviation of the values of X = 3.6
$s_y$ = standard deviation of the values of Y = 1.2
The least-squares regression line has equation (circle the correct answer):
a) $\hat{y} = 4.65 + 0.3x$
b) $\hat{y} = 4.65 - 0.3x$
c) $\hat{y} = 0.3 + 4.65x$
d) $\hat{y} = 4.65 + 2.7x$
e) $\hat{y} = 2.7 + 4.65x$
9) Using least-squares regression, it is determined that the logarithm (base 10) of the population of a country is
approximately described by the equation
log(population) = –13.5 + 0.01x
where x is the year. Based on this equation, the population of the country in the year 2000 should be about
a) 7.5
b) 665
c) 2,000,000
d) 3,167,277
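Undoing a base-10 logarithm means raising 10 to the predicted value. The sketch below applies this to the equation in question 9; the function name `predicted_population` is hypothetical.

```python
def predicted_population(year):
    """Back-transform the fitted line log10(population) = -13.5 + 0.01 * year."""
    log10_population = -13.5 + 0.01 * year
    return 10 ** log10_population

population_2000 = predicted_population(2000)
print(f"{population_2000:,.0f}")
```

The question asks for an approximate value, so compare the printed result with the closest answer choice.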
10) Assume that the scatterplot of (log x, log y) appears linear. Then the scatterplot of (x, y) will look
a) linear.
b) quadratic.
c) exponential.
d) logarithmic.
e) No discernible pattern will be recognizable.
11) The following is a two-way table describing the age and marital status of American women in 1995. The
table entries are in thousands of women.
                              Marital status
Age (years)   Never married   Married   Widowed   Divorced    Total
18–24                 9,289     3,046        19        260   12,614
25–39                 6,948    21,437       206      3,408   31,999
40–64                 2,307    26,679     2,219      5,508   36,713
≥ 65                    768     7,767     8,636      1,091   18,262
Total                19,312    58,929    11,080     10,267   99,588
What percentage of the women aged 25–39 have never married?
a) 48%
b) 22%
c) 12%
d) 36%
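A percentage "of the women aged 25–39" is a conditional percentage: the cell count divided by that age group's row total. The sketch below computes such row percentages from the counts in question 11; the names `counts` and `conditional_percentages` are assumptions made for illustration.

```python
# Counts in thousands of women, copied from the table in question 11.
statuses = ["Never married", "Married", "Widowed", "Divorced"]
counts = {
    "18-24": [9289, 3046, 19, 260],
    "25-39": [6948, 21437, 206, 3408],
    "40-64": [2307, 26679, 2219, 5508],
    ">=65": [768, 7767, 8636, 1091],
}

def conditional_percentages(age_group):
    """Percentage of each marital status within one age group (row percentages)."""
    row = counts[age_group]
    total = sum(row)
    return {status: 100 * count / total for status, count in zip(statuses, row)}

pct_25_39 = conditional_percentages("25-39")
for status, pct in pct_25_39.items():
    print(f"{status}: {pct:.1f}%")
```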
Section 2: Free-Response Questions
12) Suppose that you have 20 data pairs, all of which lie on the straight line whose equation is $\hat{y} = -1.2x + 40$.
Compute the correlation of these data.
Ans: ______________________________
13) Suppose that you have data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with correlation $r = -0.85$ and whose
regression line has equation $\hat{y} = 4.7 - 2.1x$. Compute $s_y$ given that $s_x = 2.8$.
Ans: ______________________________
14) Suppose that you have 20 data pairs, all of which lie on the straight line whose equation is $\hat{y} = 1.2x + 21.3$.
Assume that the mean of the 20 x-values is $\bar{x} = 4.3$ with sample standard deviation $s_x = 1.8$. Compute the mean and
sample standard deviation of the 20 y-values.
$\bar{y}$ = _________________
$s_y$ = _________________
15) John's parents recorded his height at various ages up to 66 months. Below is a record of the results.
Age (months)       36   48   54   60   66
Height (inches)    35   38   41   43   45
a) Compute the correlation coefficient of these data, using age as the explanatory variable.
Ans: ______________________________
b) Compute the equation of the regression line.
Ans: ______________________________
c) Use the regression line to extrapolate John’s height at age 6 years.
Ans: ______________________________
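For a small data set like the one in question 15, the correlation and the regression line can be checked without a calculator's regression menu. The sketch below is one possible approach, using the sums of squares and cross-products (the slope $S_{xy}/S_{xx}$ is algebraically the same as $r s_y / s_x$). The variable names are assumptions, and the prediction at 72 months (6 years) lies outside the observed ages, so it is an extrapolation.

```python
import math

# Data from question 15.
ages = [36, 48, 54, 60, 66]        # months
heights = [35, 38, 41, 43, 45]     # inches

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(heights) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, heights))

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
b = sxy / sxx                    # slope, equal to r * s_y / s_x
a = y_bar - b * x_bar            # intercept

height_at_72 = a + b * 72        # extrapolated height at 72 months

print(f"r = {r:.4f}")
print(f"y-hat = {a:.3f} + {b:.4f} x")
print(f"predicted height at 72 months = {height_at_72:.2f} inches")
```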
16) The scatterplot below plots the city miles per gallon on the horizontal axis versus the highway miles per
gallon on the vertical axis for 17 automobiles.
[Scatterplot: highway miles per gallon (vertical axis) versus city miles per gallon (horizontal axis)]
a) On the graph above, sketch the regression line that best fits these data.
b) Suppose that the actual data resulting in the above scatterplot are {(11, 14), (14, 17), (16, 18), (16, 17),
(17, 18), (17, 19), (17, 17), (19, 21), (19, 20), (20, 21), (20, 22), (21, 19), (21, 22), (24, 21), (25, 28),
(28, 29), (29, 31)}.
i) Use your calculator to compute the equation of the regression line.
Ans: ______________________________
ii) Use your calculator to compute the correlation.
Ans: _______________________________
17) Suppose that the scatterplot of log y versus x revealed close to a linear relationship with regression
equation log y = 12.3 − 3.2 x . Give a prediction of the response variable y given x = 2.1 .
Ans: _______________________________
18) Draw an example of a scatterplot (containing at least 10 data pairs) whose correlation r is negative and
such that | r | is fairly close to one.
19) Below is a scatterplot together with a particular point indicated, marked “x.”
a) Directly on the graph above, sketch a possible line of regression for the data.
b) Directly on the graph above, sketch a possible line of regression through the data with the point x
excluded.
c) Is the point marked “x” influential? In addition to answering “yes” or “no,” explain what this means.
20) Animal-waste lagoons and spray fields near aquatic environments may significantly degrade water quality
and endanger health. The National Atmospheric Deposition Program has monitored the atmospheric
ammonia at swine farms since 1978. The data on the swine population size (in thousands) and atmospheric
ammonia (in parts per million) for one decade are given below.
Year    Swine Population    Atmospheric Ammonia
1988    0.38                0.13
1989    0.50                0.21
1990    0.60                0.29
1991    0.75                0.22
1992    0.95                0.19
1993    1.20                0.26
1994    1.40                0.36
1995    1.65                0.37
1996    1.80                0.33
1997    1.85                0.38
a) Construct a scatterplot for these data. Be sure to label the axes and give the units of measurement.
b) Compute the correlation coefficient.
Ans: ______________________________
c) Compute the equation of the regression line.
Ans: ______________________________
d) Based on the data (and your work), does it appear that the amount of atmospheric ammonia is linearly
related to the swine population size?
Ans: ______________________________
e) What percent of the variability of the atmospheric ammonia can be explained by the swine population
size?
Ans: ______________________________
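Part (e) asks for the proportion of variability explained, which for a least-squares line is $r^2$. As a minimal sketch, the code below fits the line to the data of question 20 and computes $r^2$ as 1 minus the ratio of the residual sum of squares to the total sum of squares; the variable names are assumptions made for illustration.

```python
import math

# Data from question 20.
swine = [0.38, 0.50, 0.60, 0.75, 0.95, 1.20, 1.40, 1.65, 1.80, 1.85]
ammonia = [0.13, 0.21, 0.29, 0.22, 0.19, 0.26, 0.36, 0.37, 0.33, 0.38]

n = len(swine)
x_bar = sum(swine) / n
y_bar = sum(ammonia) / n

sxx = sum((x - x_bar) ** 2 for x in swine)
syy = sum((y - y_bar) ** 2 for y in ammonia)   # total sum of squares
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(swine, ammonia))

b = sxy / sxx             # least-squares slope
a = y_bar - b * x_bar     # least-squares intercept
r = sxy / math.sqrt(sxx * syy)

# Residual sum of squares around the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(swine, ammonia))
r_squared = 1 - sse / syy

print(f"y-hat = {a:.4f} + {b:.4f} x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

The value of `r_squared` computed this way agrees with `r ** 2` up to rounding, which is a useful consistency check.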
21) The following two-way table is extracted from Moore and McCabe’s Introduction to the Practice of
Statistics.
Years of School Completed, by Age
                                            Age group
Education                       25 to 34   35 to 54   55 and over     Total
Did not complete high school       5,325      9,152        16,035    30,512
Completed only high school        14,061     24,070        18,320    56,451
College 1 to 3 years              11,659     19,926         9,662    41,247
College graduate                  10,342     19,878         8,005    38,225
Total                             41,388     73,028        52,022   166,438
a) What are the two categorical variables in this study?
Ans: _______________________________________________________________________
b) Which age group has the highest percentage of college graduates? What is this percentage?
Ans: _______________________________________________________________________
c) Plot a bar graph for the marginal distribution of levels of education.
[Blank axes for the bar graph; vertical axis: relative frequency]
22) Commercial airlines need to know the operating cost per hour of flight for each plane in their fleet. In a
study of the relationship between operating cost per hour and the number of passenger seats, investigators
computed the regression of operating cost per hour on the number of passenger seats. The 12 sample aircraft
used in the study included planes with as few as 216 passenger seats and planes with as many as 410
passenger seats. Operating cost per hour ranged between $3,600 and $7,800. Some computer output from
the regression analysis of these data is shown below.
[Scatterplot: operating cost per hour ($1000s) versus number of passenger seats]
Predictor   Coef     StDev   T      P
Constant    1136     1226    0.93   0.376
Seats       14.673   4.027   3.64   0.005
S = 845.3   R-Sq = 57.0%   R-Sq (adj) = 52.7%
a) What is the equation of the least squares regression line that describes the relationship between
operating cost per hour and the number of passenger seats in the plane? Define any variables used in this
equation. Also, sketch this regression line on the graph above.
Ans: __________________________________________________
b) What is the value of the correlation coefficient for operating cost per hour and the number of passenger
seats in the plane? Interpret this correlation.
Correlation:_______________________
c) Suppose that you want to describe the relationship between operating cost per hour and the number of
passenger seats in the plane for planes only in the range of 250 to 350 seats. Does the line shown in the
scatterplot still provide the best description of the relationship for data in this range? Why or why not?
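Regression output of this kind reports R-Sq rather than r. For a simple linear regression, r is the square root of R-Sq with the sign of the slope. The sketch below applies that to the values printed in the output for question 22; the variable names are hypothetical.

```python
import math

r_squared = 0.570   # R-Sq = 57.0% from the computer output
slope = 14.673      # coefficient of Seats from the computer output

# r has the same sign as the slope of the regression line.
r = math.copysign(math.sqrt(r_squared), slope)
print(f"r = {r:.3f}")
```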
23) Two pain relievers, A and B, are being compared for relief of postsurgical pain. Twenty different strengths
(doses in milligrams) of each drug were tested. Eight hundred postsurgical patients were randomly divided
into 40 different groups. Twenty groups were given drug A. Each group was given a different strength.
Similarly, the other twenty groups were given different strengths of drug B. Strengths used ranged from 210
to 400 milligrams. Thirty minutes after receiving the drug, each patient was asked to describe his or her pain
relief on a scale of 0 (no decrease in pain) to 100 (pain totally gone).
The strength of the drug given in milligrams and the average pain rating for each group are shown in the
scatterplot below. Drug A is indicated with a's and drug B with b's.
[Scatterplot: pain relief (vertical axis) versus strength in milligrams (horizontal axis)]
a) Based on the scatterplot, describe the effect of drug A and how it is related to strength in milligrams.
b) Based on the scatterplot, describe the effect of drug B and how it is related to strength in milligrams.
c) Which drug would you give and at what strength, if the goal is to get pain relief of at least 50 at the
lowest possible strength? Justify your answer based on the scatterplot.
24) Lydia and Bob were searching the Internet to find information on air travel in the United States. They found
data on the number of commercial aircraft flying in the United States during the years 1990–1998. The dates
were recorded as years since 1990. Thus, the year 1990 was recorded as year 0. They fit a least-squares
regression line to the data. The graph of the residuals and part of the computer output for their regression are
given below.
[Residual plot: residuals (vertical axis) versus years since 1990 (horizontal axis)]
Predictor   Coef      Stdev   t-ratio   p
Constant    2939.93   20.55   143.09    0.000
Years       233.517   4.316   54.11     0.000
s = 33.43
a) Is a line an appropriate model to use for these data? What information tells you this?
b) What is the value of the slope of the least-squares regression line? Interpret the slope in the context of
this situation.
c) What is the value of the intercept of the least-squares regression line? Interpret the intercept in the context
of this situation.
d) What is the predicted number of commercial aircraft flying in 1992?
e) What was the actual number of commercial aircraft flying in 1992?
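Parts (d) and (e) rest on two relationships: the predicted value comes from the fitted line, and a residual is the actual value minus the predicted value. The sketch below encodes both using the coefficients from the output for question 24. The function names are hypothetical, and the residual for 1992 must be read from the residual plot, so it is left as an argument rather than filled in.

```python
# Coefficients from the computer output in question 24.
intercept = 2939.93
slope = 233.517

def predicted_aircraft(years_since_1990):
    """Predicted number of commercial aircraft from the fitted line."""
    return intercept + slope * years_since_1990

def actual_from_residual(years_since_1990, residual):
    """Recover the actual value, using residual = actual - predicted."""
    return predicted_aircraft(years_since_1990) + residual

predicted_1992 = predicted_aircraft(1992 - 1990)
print(f"predicted for 1992: {predicted_1992:.1f}")
```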
25) A simple random sample of 9 students was collected from a large university. Each of these students
reported the number of hours he or she had allocated to studying and the number of hours allocated to work
each week. A least squares linear regression was performed and part of the resulting computer output is
shown below.
Dependent variable is: percent killed
R squared = 97.2%   R squared (adjusted) = 96.9%
s = 4.505 with 14 - 2 = 12 degrees of freedom

Source       Sum of Squares   df   Mean Square   F-ratio
Regression   8330.16           1   8330.16       410
Residual     243.589          12   20.2990

Variable        Coefficient   s.e. of Coeff   t-ratio   Prob
Constant        –20.5893      3.242           –6.35     ≤ 0.0001
No. Teaspoons   24.3929       1.204           20.3      ≤ 0.0001

[Residual plot: residuals (vertical axis) versus predicted values (horizontal axis)]
a) What is the equation of the least-squares regression line given by this analysis? Define any variables
used in this equation.
b) If someone uses this equation to predict the percentage of weeds killed when 2.6 teaspoons of weed
killer are used, which of the following would you expect? (Check one)
[ ] The prediction will be too large.
[ ] The prediction will be too small.
[ ] A prediction cannot be made based on the information given on the computer output.
Explain your reasoning.
26) A simple random sample of 9 students was collected from a large university. Each of these students
reported the number of hours he or she had allocated to studying and the number of hours allocated to work
each week. A least squares linear regression was performed and part of the resulting computer output is
shown below.
Predictor   Coef     StDev    T      P
Constant    8.107    2.731    2.97   0.021
Work        0.4919   0.1950   2.52   0.040
S = 4.349   R-Sq = 47.6%   R-Sq (adj) = 40.1%
The scatterplot below displays the data that were collected from the 9 students.
a) After point P, labeled on the graph above, was removed from the data, a second linear regression was
performed and the computer output is shown below.
Predictor   Coef     StDev    T      P
Constant    11.123   3.986    2.79   0.032
Work        0.1500   0.3834   0.39   0.709
S = 4.327   R-Sq = 2.5%   R-Sq (adj) = 0.0%
Does the point P exercise a large influence on the regression line? Explain.
b) The researcher who conducted the study discovered that the number of hours spent studying reported by
the student represented by P was recorded incorrectly. The corrected data point for this student is
represented by the letter Q in the scatterplot below.
Explain how the least squares regression line for the corrected data (in this part) would differ from the least
squares regression line for the original data.