Final Review Packet - Everett Public Schools

TOPICS AP STATS FINAL SEMESTER 1
Chapter 1 – Exploring Data
• Quantitative data
• Histograms, stemplots, boxplots, dotplots
• Categorical data
• Two-way tables, bar graphs, pie charts
• Describe distributions
  o SOCS – Shape, Outliers, Center, Spread
• Outliers – 1.5*IQR rule

Chapter 2 – Density Curves
• Normal Distributions
• Cumulative relative frequency graphs
• Z-scores/percentiles/using Table A
• Empirical rule (68-95-99.7)
• Calculator – normalcdf(lower, upper, µ, σ)

Chapter 3 – Regression
• Least Squares Regression Line
• Find these values from formulas/calculator
• Interpretations/definitions for the following:
  o slope, intercept, r, r², residual, residual plot, s (standard dev of residuals)
• Scatterplot – DOFS – direction, outliers, form, strength

Chapter 4 – Sampling/Experimentation
• Know Vocabulary!
• Simple Random Sample
• Cluster vs. Stratified
• Types of Sampling/Bias
• Experiment vs. Observational Study
• Blocking/matched pairs
• Confounding/Lurking Variables
• Randomization/Control/Replication

Chapter 5 – Probability
• “or” vs. “and”
• Independent
• Mutually exclusive (disjoint)
• Calculating probabilities from two-way tables, tree diagrams, word problems
• Venn diagrams

Chapter 6 – Random Variables
• Probability distributions
• Discrete and continuous random variables
  o mean and standard deviation
• Transforming and combining random variables
  o mean and standard deviation
• Binomial distribution
  o BINS
  o calculating probabilities
  o mean
  o std dev
• Geometric distribution
  o BITS
  o calculating probabilities
  o mean

Chapter 7 – Sampling Distributions
• Definitions of populations, samples, parameters, statistics, sampling distributions
• Know the difference between a population distribution, distribution of a sample, and sampling distribution
• Sample proportions
  o mean, std dev, tests for normality, probability
• Sample means
  o mean, std dev, tests for normality, probability
  o Central Limit Theorem
And whatever else I taught you… It was all important.
All of it. 
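For practice away from the calculator, the normalcdf(lower, upper, µ, σ) command from the Chapter 2 topics can be imitated in a few lines of Python. This is only a sketch using the standard library; the function name normalcdf is chosen here to match the calculator and is not a built-in Python function.

```python
import math

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve between lower and upper."""
    def cdf(x):
        # Normal CDF written with the error function from the math module.
        z = (x - mu) / sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(upper) - cdf(lower)

# Empirical rule check: area within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(normalcdf(-k, k), 4))
```

For a one-sided area, pass float("inf") or float("-inf") as a bound, the same way you would type 1E99 or -1E99 on the calculator.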
AP STATS SEMESTER 1 FINAL REVIEW 2015-16
1. Here are the IQ test scores of 10 randomly chosen fifth-grade students:
145 139 126 122 125 130 96 110 118 118
To make a stemplot of these scores, you would use as stems
A. 0 and 1.
B. 09, 10, 11, 12, 13, and 14.
C. 96, 110, 118, 122, 125, 126, 130, 139, and 145.
D. 0, 2, 3, 5, 6, 8, 9.
E. None of the above is a correct answer.
2. If a distribution is skewed to the right, which of the following is true?
A. The mean must be less than the median.
B. The mean and median must be equal.
C. The mean must be greater than the median.
D. The mean is either equal to or less than the median.
E. It’s impossible to tell which of the above statements is true without seeing the data.
3. A small company estimating its photocopying expenses finds that the mean number of copies made per day for the past 12
months is 258 copies per day with a standard deviation of 24 copies per day. Which of the following is a correct
interpretation of standard deviation?
A. The number of copies made per day was always between 234 and 282.
B. About 95% of the time, the number of copies made per day was between 234 and 282.
C. The difference between the mean number of copies made per day and the median number of copies made per day was 24.
D. On average, the number of copies made each day was about 24 copies per day away from the mean, 258.
E. 1.5 times the interquartile range of copies made per day is 24.
4. At the beginning of the school year, a high-school teacher asks every student in her classes to fill out a survey that asks for
their age, gender, the number of years they have lived at their current address, their favorite school subject, and whether they
plan to go to college after high school. Which of the following best describes the variables that are being measured?
A. four quantitative variables
B. five quantitative variables
C. two categorical variables and two quantitative variables
D. two categorical variables and three quantitative variables
E. three categorical variables and two quantitative variables
5. The heights of American men aged 18 to 24 are approximately Normally distributed with a mean of 68 inches and a
standard deviation of 2.5 inches. Only about 5% of young men have heights outside the range
A. 65.5 inches to 70.5 inches
B. 63 inches to 73 inches
C. 60.5 inches to 75.5 inches
D. 58 inches to 78 inches
E. none of the above
6. Use the information in the previous problem. About what percentage of the men are over 70.5 inches tall?
A. 2.5
B. 5
C. 16
D. 32
E. 68
7. Which of the following properties is true for all Normal density curves?
I. They are symmetric.
II. The curve reaches its peak at the mean.
III. 95% of the area under the curve is within one standard deviation of the mean.
A. I only
B. II only
C. I and II only
D. I and III only
E. All three statements are correct.
8. Kitchen appliances don’t last forever. The lifespan of all microwave ovens sold in the United States is approximately
Normally distributed with a mean of 9 years and a standard deviation of 2.5 years. What percentage of the ovens last more
than 10 years?
A. 11.5%
B. 34.5%
C. 65.5%
D. 69%
E. 84.5%
9. Other things being equal, larger automobile engines consume more fuel. You are planning an experiment to study the
effect of engine size (in liters) on the gas mileage (in miles per gallon) of sport utility vehicles. In this study,
A. gas mileage is a response variable, and you expect to find a negative association.
B. gas mileage is a response variable, and you expect to find a positive association.
C. gas mileage is an explanatory variable, and you expect to find a strong negative association.
D. gas mileage is an explanatory variable, and you expect to find a strong positive association.
E. gas mileage is an explanatory variable, and you expect to find very little association.
10. A set of data describes the relationship between the size of annual salary raises and the performance ratings for
employees of a certain company. The least squares regression equation is 𝑦̂ = 1400 + 2000x where y is the raise amount (in
dollars) and x is the performance rating. Which of the following statements is not necessarily true?
A. For each one-point increase in performance rating, the raise will increase on average by $2000.
B. The actual relationship between salary raises and performance rating is linear.
C. A rating of 0 will yield a predicted raise of $1400.
D. The correlation between salary raise and performance rating is positive.
E. If the average performance rating is 1.2, then the average raise is $3800.
11. A copy machine dealer has data on the number of copy machines x at each of 89 customer locations and the number of
service calls in a month y at each location. Summary calculations give 𝑥̅ = 8.4, 𝑠𝑥 = 2.1, 𝑦̅ = 14.2, 𝑠𝑦 = 3.8, and r = 0.86.
About what percent of the variation in the number of service calls is explained by the linear relation between number of
service calls and number of machines?
A. 86%
B. 93%
C. 74%
D. 55%
E. Can’t tell from the information given
12. We wish to draw a sample of 5 without replacement from a population of 50 households. Suppose the households are
numbered 01, 02, . . . , 50, and suppose that the relevant line of the random number table is
11362 35692 96237 90842 46843 62719 64049 17823
Then the households selected are
A. households 11 13 36 62 73
B. households 11 36 23 08 42
C. households 11 36 23 23 08
D. households 11 36 23 56 92
E. households 11 35 96 90 46
13. A maple sugar manufacturer wants to estimate the average trunk diameter of Sugar Maple trees in a large forest. There
are too many trees to list them all and take a SRS, so he divides the forest into several hundred 10 meter by 10 meter plots,
selects 25 plots at random, and measures the diameter of every Sugar Maple in each one. This is an example of a
A. multistage sample.
B. stratified sample.
C. simple random sample.
D. cluster sample.
E. convenience sample.
14. A survey was done in the town of Mechanicsville to estimate the proportion of cars that are red and made by companies
based in Japan. A random sample of 25 cars from a student parking lot at Lee-Davis High School was taken. Which of the
following statements is not correct?
A. This sample may not be representative of the cars in Mechanicsville because mainly students park at Lee-Davis High
School.
B. If the particular parking space is vacant, we can simply select another parking space at random because it is unlikely that a
space being vacant is related to the color or manufacturer of the car.
C. It would be an error to simply select the first 25 parking spaces in the lot closest to the auditorium because there are a number
of parking spaces there reserved for Drivers Ed vehicles, whose primary color is white.
D. A different team doing the sampling independently would obtain different answers for their sample proportions.
E. The results will be the same regardless of the time of day that the sample is taken.
15. To test the effects of a new fertilizer, 100 plots were divided in half. Fertilizer A is randomly applied to one half, and B
to the other. This is
A. an observational study.
B. a matched pairs experiment.
C. a completely randomized experiment.
D. a block design, but not a matched pairs experiment.
E. not enough information
16. The Hemlock Woolly Adelgid is an insect that has accidentally been released in Eastern U.S. forests from Asia. Since it
has no natural enemies in the U.S., it is spreading rapidly. A forester studying the abundance of the insect in southern
Vermont wants to determine if it has spread that far north. He randomly selects 200 hemlock trees in a large Vermont forest
and finds that 46 of them show signs of damage from this insect. It would be appropriate to generalize the results of the
study to
A. all hemlock trees in southern Vermont.
B. all trees in southern Vermont.
C. the 200 hemlock trees that were randomly selected.
D. all hemlock trees in the United States.
E. all hemlock trees in the forest from which the 200 trees were selected.
17. The probability that you will be ticketed for illegal parking on campus is about 1/3. During the last nine days, you have
illegally parked every day and have NOT been ticketed (you lucky person!). Today, on the 10th day, you again decide to park
illegally. Assuming the outcomes are independent from day to day, the probability that you will be caught is
A.
B.
C.
D.
E.
Use the two way table to answer the next 3 questions. The two-way table below gives information on seniors and juniors at
a high school and by which means they typically get to school.
           Car   Bus   Walk   Totals
Juniors    146   106    48     300
Seniors    146    64    40     250
Totals     292   170    88     550
18. You select one student from this group at random. What is the probability that this student typically takes a bus to
school?
A. 0.256
B. 0.309
C. 0.353
D. 0.455
E. 0.604
19. You select one student from this group at random. If the student says he is a junior, what is the probability that he walks
to school?
A. 0.073
B. 0.160
C. 0.455
D. 0.600
E. 0.833
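To check your work on questions 18 and 19 after you have tried them, here is a small Python sketch of the two calculations a two-way table supports: a marginal probability (column total over grand total) and a conditional probability (cell count over row total). The helper names marginal and conditional are made up for this illustration.

```python
# Counts copied from the two-way table above.
counts = {
    "Juniors": {"Car": 146, "Bus": 106, "Walk": 48},
    "Seniors": {"Car": 146, "Bus": 64, "Walk": 40},
}
total = sum(sum(row.values()) for row in counts.values())

def marginal(column):
    """P(column): column total divided by the grand total."""
    return sum(row[column] for row in counts.values()) / total

def conditional(column, given_row):
    """P(column | given_row): cell count divided by that row's total."""
    row = counts[given_row]
    return row[column] / sum(row.values())

print(round(marginal("Bus"), 3))
print(round(conditional("Walk", "Juniors"), 3))
```

Notice that the denominator changes: the grand total for a marginal probability, but only the "given" row's total for a conditional one.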
20. If P(A) = 0.24 and P(B) = 0.52 and A and B are independent, what is P(A or B)?
A. 0.1248
B. 0.28
C. 0.6352
D. 0.76
E. The answer cannot be determined from the information given.
21. There are 10 red marbles and 8 green marbles in a jar. If you take three marbles from the jar (without replacement), the
probability that they are all red is:
A. 0.069
B. 0.088
C. 0.147
D. 0.444
E. 0.171
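Sampling without replacement, as in question 21, can be handled two ways: multiply the conditional probabilities draw by draw, or count combinations. The sketch below does both with exact fractions so you can see that they agree; the variable names are just for illustration.

```python
from fractions import Fraction
from math import comb

red, green, draws = 10, 8, 3
total = red + green

# Multiplication rule: each draw changes the counts left in the jar.
p_chain = Fraction(1)
for i in range(draws):
    p_chain *= Fraction(red - i, total - i)

# Counting approach: ways to pick 3 reds over ways to pick any 3 marbles.
p_count = Fraction(comb(red, draws), comb(total, draws))

print(p_chain, p_count, round(float(p_chain), 3))
```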
22. You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of variables
you have measured is
(a) 1463; all quantitative.
(b) four; two categorical and two quantitative.
(c) four; one categorical and three quantitative.
(d) three; two categorical and one quantitative.
(e) three; one categorical and two quantitative.
23. Consumers’ Union measured the gas mileage in miles per gallon of 38 1978–1979 model automobiles on a special test
track. The pie chart below provides information about the country of manufacture of the model cars used by Consumers
Union. Based on the pie chart, we may conclude that:
(a) Japanese cars get significantly lower gas mileage than cars of other countries.
This is because their slice of the pie is at the bottom of the chart.
(b) U.S. cars get significantly higher gas mileage than cars from other countries.
(c) Swedish cars get gas mileages that are between those of Japanese and U.S.
cars.
(d) Mercedes, Audi, Porsche, and BMW represent approximately a quarter of the
cars tested.
(e) More than half of the cars in the study were from the United States.
24. The following is a cumulative relative frequency graph on the number of ounces of alcohol (one ounce is about 30 mL)
consumed per week in a sample of 150 students.
A study wished to classify the students as “light”, “moderate”, “heavy” and “problem” drinkers by the amount consumed per
week. About what percentage of students are moderate drinkers, that is consume between 4 and 8 ounces per week?
(a) 30%
(b) 50%
(c) 40%
(d) 75%
(e) 60%
25. “Normal” body temperature varies by time of day. A series of readings was taken of the body temperature of a subject.
The mean reading was found to be 36.5° C with a standard deviation of 0.3° C. When converted to °F, the mean and
standard deviation are
(°F = °C(1.8) + 32).
(a) 97.7, 32
(b) 97.7, 0.30
(c) 97.7, 0.54
(d) 97.7, 0.97
(e) 97.7, 1.80
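Question 25 is about how a linear transformation a·x + b affects center and spread: the mean is multiplied by a and shifted by b, while the standard deviation is only multiplied by |a|. The sketch below demonstrates this on a handful of made-up Celsius readings (hypothetical values, not data from the problem).

```python
import statistics

# Hypothetical Celsius readings, invented purely for illustration.
celsius = [36.1, 36.4, 36.5, 36.7, 36.8]
fahrenheit = [1.8 * c + 32 for c in celsius]

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

# The mean follows the whole rule a*x + b; adding 32 does not change the spread.
print(mean_f, 1.8 * mean_c + 32)
print(sd_f, 1.8 * sd_c)
```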
26. The following is a histogram showing the actual frequency of the closing prices on the New York exchange of a
particular stock. Based on the frequency histogram for New York Stock exchange, the class that contains the 80th
percentile is:
(a) 20-30
(b) 10-20
(c) 40-50
(d) 50-60
(e) 30-40
27. There are three children in a room, ages three, four, and five. If a four-year-old child enters the room the
(a) mean age will stay the same but the variance will increase.
(b) mean age will stay the same but the variance will decrease.
(c) mean age and variance will stay the same.
(d) mean age and variance will increase.
(e) mean age and variance will decrease.
28. The weights of the male and female students in a class are summarized in the following boxplots:
Which of the following is NOT correct?
(a) About 50% of the male students have weights between 150 and 185 pounds.
(b) About 25% of female students have weights more than 130 pounds.
(c) The median weight of male students is about 162 pounds.
(d) The mean weight of female students is about 120 pounds because of symmetry.
(e) The male students have less variability than the female students.
29. For the density curve shown to the right,
which statement is true?
(a) The density curve is symmetric.
(b) The density curve is skewed right.
(c) The area under the curve between 0 and 1 is 1.
(d) The density curve is normal.
(e) None of the above is correct.
30. For the density curve shown in question 29, which statement is true?
(a) The mean and median are equal.
(b) The mean is greater than the median.
(c) The mean is less than the median.
(d) The mean could be either greater than or less than the median.
(e) None of the above is correct.
31. For the density curve shown, what is the mean?
(a) 0
(b) 0.25
(c) 0.50
(d) 0.75
(e) 1.0
32. A normal density curve has which of the following properties?
(a) It is symmetric.
(b) It has a peak centered above its mean.
(c) The spread of the curve is proportional to its standard deviation.
(d) All of the properties, (a) to (c), are correct.
(e) None of the properties, (a) to (c), is correct.
33. In a statistics course, a linear regression equation was computed to predict the final exam score from the score on the first
test. The equation was y = 10 + .9x where y is the final exam score and x is the score on the first test. Carla scored 95 on
the first test. What is the predicted value of her score on the final exam?
(a) 95
(b) 85.5
(c) 90
(d) 95.5
(e) None of the above
34. Refer to the previous problem. On the final exam Carla scored 98. What is the value of her residual?
(a) 98
(b) 2.5
(c) –2.5
(d) 0
(e) None of the above
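Questions 33 and 34 use the two basic regression calculations: plug x into the line to get a predicted value, then subtract the prediction from the observed value to get the residual. A short Python sketch, with function names invented for this example:

```python
def predict(x, intercept=10.0, slope=0.9):
    """Predicted final exam score from the line y-hat = 10 + 0.9x."""
    return intercept + slope * x

def residual(observed_y, x):
    """Residual = observed y minus predicted y."""
    return observed_y - predict(x)

print(predict(95))
print(residual(98, 95))
```

A positive residual means the actual score was above the line's prediction; a negative residual means it was below.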
35. A study of the fuel economy for various automobiles plotted the fuel consumption (in liters of gasoline used per 100
kilometers traveled) vs. speed (in kilometers per hour). A least squares line was fit to the data. Here is the residual plot
from this least squares fit.
What does the pattern of the residuals tell you about the linear model?
(a) The evidence is inconclusive.
(b) The residual plot confirms the linearity of the fuel economy data.
(c) The residual plot does not confirm the linearity of the data.
(d) The residual plot clearly contradicts the linearity of the data.
(e) None of the above
36. Suppose we fit the least squares regression line to a set of data. What do we call any individual
points with unusually large values of the residuals?
(a) Response variables
(b) The slope
(c) Outliers
(d) Correlations
(e) None of the above
37. The effect of removing the right-most point (near the positive x-axis) in the scatterplot shown would be:
(a) The slope of the LSRL will increase; r will increase
(b) The slope of the LSRL will increase; r will decrease
(c) The slope of the LSRL will decrease; r will increase
(d) The slope of the LSRL will decrease; r will decrease
(e) No change
38. If removing an observation from a data set would have a marked change on the position of the LSRL fit to the data, what
is the point called:
(a) Robust
(b) A residual
(c) A response
(d) Influential
(e) None of the above
39. Which of the following distributions are more likely to be skewed to the left than skewed to the right?
I. Scores on an easy test
II. Scores on a hard test
III. Scores in a soccer match
(a) I only
(b) I and II
(c) I and III
(d) II and III
(e) I, II, and III
40. What do we call a sample that consists of the entire population?
(a) A stratum
(b) A multistage sample
(c) A mistake. A sample can never be the entire population.
(d) A census
(e) None of the above. The answer is _________________________.
41. A member of Congress wants to know what his constituents think of proposed legislation on health insurance. His staff
reports that 228 letters have been received on the subject, of which 193 oppose the legislation. What is the population in
this situation?
(a) The constituents
(b) The 228 letters received
(c) The 193 opposing the legislation
(d) Congress
(e) None of the above. The answer is _____________________________.
42. Which of the following is a method for improving the accuracy of a sample?
(a) Use no more than 3 or 4 words in any question
(b) When possible, avoid the use of human interviewers, relying on computerized dialing instead
(c) Use large sample sizes
(d) Use smaller sample sizes
(e) None of the above. The answer is _____________________________.
43. We say that the design of a study is biased if which of the following is true?
(a) There is a very large sample
(b) Random placebos have been used
(c) Certain outcomes are systematically favored
(d) The correlation is greater than 1 or less than –1
(e) None of the above. The answer is _____________________________.
44. Control groups are used in experiments in order to . . .
(a) Control the effects of lurking variables such as the placebo effect
(b) Control the subjects of a study so as to ensure all participate equally
(c) Guarantee that someone other than the investigators, who have a vested interest in the outcome, control how the
experiment is conducted
(d) Achieve a proper and uniform level of randomization
(e) None of the above. The answer is ______________________________.
45. The probability of any outcome of a random phenomenon is
(a) The precise degree of randomness present in the phenomenon
(b) Any number as long as it is between 0 and 1
(c) Either 0 or 1, depending on whether the phenomenon can actually occur
(d) The proportion of a very long series of repetitions on which the outcome occurs
(e) None of the above
46. If you choose a card at random from a well-shuffled deck of 52 cards, what is the probability that the card chosen is not a
heart?
(a) 0.25
(b) 0.50
(c) 0.75
(d) 1
(e) None of the above
47. You play tennis regularly with a friend, and from past experience, you believe that the outcome of each match is
independent. For any given match you have a probability of .6 of winning. The probability that you win the next two
matches is
(a) 0.16
(b) 0.36
(c) 0.4
(d) 0.6
(e) 1.2
48. If P(A) = 0.24 and P(B) = 0.52 and A and B are independent, what is P(A or B)?
(a) 0.1248
(b) 0.28
(c) 0.6352
(d) 0.76
(e) The answer cannot be determined from the information given.
(Same as #20. Whoops.)
49. In a population of students, the number of calculators owned is a random variable X with P(X = 0) = 0.2, P(X = 1) = 0.6,
and P(X = 2) = 0.2. The mean of this probability distribution is
(a) 0
(b) 2
(c) 1
(d) 0.5
(e) The answer cannot be computed from the information given.
50. Refer to the previous problem. The variance of this probability distribution is
(a) 1
(b) 0.63
(c) 0.5
(d) 0.4
(e) The answer cannot be computed from the information given.
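For questions 49 and 50, the mean of a discrete random variable is the sum of value × probability, and the variance is the sum of (value − mean)² × probability. Here is a minimal Python sketch of those two formulas using the distribution from question 49.

```python
# Probability distribution from question 49: value -> probability.
dist = {0: 0.2, 1: 0.6, 2: 0.2}
assert abs(sum(dist.values()) - 1.0) < 1e-12  # probabilities must total 1

mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
std_dev = variance ** 0.5

print(mean, variance, std_dev)
```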
51. The weight of reports produced in a certain department has a normal distribution with mean 60g and standard deviation
12g. What is the probability that the next report will weigh less than 45g?
(a) 0.1042
(b) 0.1056
(c) 0.3944
(d) 0.0418
(e) The answer cannot be computed from the information given.
52. In a large population of college students, 20% of the students have experienced feelings of math anxiety. If you take a
random sample of 10 students from this population, the probability that exactly 2 students have experienced math
anxiety is
(a) 0.3020
(b) 0.2634
(c) 0.2013
(d) 0.5
(e) 1
(f) None of the above
53. Refer to the previous problem. The standard deviation of the number of students in the sample who have experienced
math anxiety is
(a) 0.0160
(b) 1.265
(c) 0.2530
(d) 1
(e) .2070
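Questions 52 and 53 are binomial: P(X = k) = C(n, k)·p^k·(1 − p)^(n − k), with standard deviation √(np(1 − p)). The sketch below mirrors the calculator's binompdf; the Python function name is chosen to match the calculator and is not a built-in.

```python
from math import comb, sqrt

def binompdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.2  # sample size and success probability from question 52

print(round(binompdf(n, p, 2), 4))    # exactly 2 successes
print(round(n * p, 4))                # mean n*p
print(round(sqrt(n * p * (1 - p)), 4))  # standard deviation
```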
54. In a certain large population, 40% of households have a total annual income of over $70,000. A simple random sample of 4
of these households is selected. What is the probability that 2 or more of the households in the survey have an annual
income of over $70,000?
(a) 0.3456
(b) 0.4000
(c) 0.5000
(d) 0.5248
(e) The answer cannot be computed from the information given.
55. A factory makes silicon chips for use in computers. It is known that about 90% of the chips meet specifications. Every
hour a sample of 18 chips is selected at random for testing. Assume a binomial distribution is valid. Suppose we collect
a large number of these samples of 18 chips and determine the number meeting specifications in each sample. What is
the approximate mean of the number of chips meeting specifications?
(a) 16.20
(b) 1.62
(c) 4.02
(d) 16.00
(e) The answer cannot be computed from the information given.
56. Which of the following are true statements?
I. The expected value of a geometric random variable is determined by the formula (1 – p)^(n–1) p.
II. If X is a geometric random variable and the probability of success is .85, then the probability distribution of X will
be skewed left, since .85 is closer to 1 than to 0.
III. An important difference between binomial and geometric random variables is that there is a fixed number of trials in
a binomial setting, and the number of trials varies in a geometric setting.
(a) I only
(b) II only
(c) III only
(d) I, II, and III
(e) None of the above gives the complete set of true responses.
FREE RESPONSE:
1. Suppose that jumps by Olympic men high jumpers have a normal distribution with mean 2.12 meters and standard
deviation .12 meters; women’s jumps have a normal distribution with mean 1.80 meters and standard deviation .09 meters. A
man and woman Olympic high jumper are picked at random.
(a) What is the probability the sum of their jumps is over 4 meters?
(b) What is the probability that the man jumped higher than the woman?
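Free response 1 relies on the rules for combining independent Normal random variables: means add (or subtract), variances always add, and the result is still Normal. The sketch below sets this up with NormalDist from Python's statistics module (Python 3.8 or later); try the problem by hand first and use this only to check.

```python
from math import sqrt
from statistics import NormalDist

man = NormalDist(mu=2.12, sigma=0.12)
woman = NormalDist(mu=1.80, sigma=0.09)

# Independent random variables: variances add for both the sum and the difference.
sd_both = sqrt(man.variance + woman.variance)
total = NormalDist(man.mean + woman.mean, sd_both)
diff = NormalDist(man.mean - woman.mean, sd_both)

p_sum_over_4 = 1 - total.cdf(4.0)  # part (a): sum of jumps over 4 meters
p_man_higher = 1 - diff.cdf(0.0)   # part (b): man minus woman is positive

print(round(sd_both, 4), round(p_sum_over_4, 4), round(p_man_higher, 4))
```

The common mistake is adding the standard deviations directly (0.12 + 0.09); it is the variances that add.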
2. A company advertises it has a process that can extract 5 kg of protein from 100 kg of seaweed. The company claims that
its yields follow a Normal distribution with mean 5 kg and a standard deviation of 1.5 kg. An independent group of scientists
replicated their process on a random sample of fifteen 100 kg clumps of seaweed. The scientists’ yields had a mean of 4.82 kg
of protein.
(a) Is there sufficient evidence to dispute the advertisement? Justify your answer.
A large scale test of a second company’s process shows yields of protein that are normally distributed with a mean of 4.75 kg
and a standard deviation of .83 kg.
(b) What is the probability that using this second process a 100-kg clump of seaweed will yield at least 5 kg of protein?
(c) What is the probability that using this second process on ten randomly selected 100-kg clumps of seaweed, at least two of
them yield at least 5 kg of protein?
3. The GPAs of random samples of 50 male and 50 female students at a large university are noted and summarized below.
Write a few sentences comparing the distributions of male and female students at this university.
4. Suppose the number of minutes per day high school students spend text messaging is normally distributed with a mean of
21 minutes.
(a) Which is more likely: an SRS of 25 students text messaging an average of less than 20 minutes per day, or an SRS of 100
students text messaging an average of less than 20 minutes per day? Explain.
(b) Suppose the sampling distribution of 𝑥̅ for samples of size 100 has a standard deviation of .8 minutes. What is the
probability that an SRS of 100 students text messages an average of more than 23 minutes per day?
(c) Suppose the original population is not normal, but rather is skewed right (to the higher values). How would your answer
in part b change?
5. In a simple random sample, 20 college graduates were asked their starting salaries at their first employments.
Salary ($1,000)   Frequency
0 – 10            1
10 – 20           0
20 – 30           10
30 – 40           5
40 – 50           4
(a) Draw a cumulative relative frequency plot of this data.
(b) If given the raw data, would the mean or median best describe the typical starting salary for college students? Explain.
6. Sixty professional runners participated in an experiment to evaluate two new training programs, A and B. The athletes
were randomly divided into two groups, one group to undergo program A and the other program B. Every runner’s time (in
seconds) at a preselected distance was obtained and then evaluated a second time after 3 months on the program. They then
recorded the difference in times.
a.) What type of experiment was this?
b.) Name the experimental units, treatments, explanatory and response variables in this study.
c.) Create a diagram for this experiment.
A summary of the calculated differences in time for the sixty runners is in the table below. The differences were calculated
as old time minus new time (so positives represent improvements).
Program         A         B
Values < Q1     -3, -2    -3, 1
Q1              -1        2
median          1         3
Q3              9         4
Values > Q3     11, 14    5, 9
d.) Draw parallel boxplots, showing outliers if there are any.
e.) Which program should be chosen if the goal is to have the greatest percentage of runners improve their times? Explain.
f.) Which program should be chosen if the goal is to have the greatest mean improvement in the runners’ times? Explain.
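For part d, outliers are identified with the 1.5*IQR rule: anything below Q1 − 1.5·IQR or above Q3 + 1.5·IQR gets its own point on the boxplot. The Python sketch below applies the rule to the tail values in the summary table; the function names are made up for this illustration.

```python
def iqr_fences(q1, q3):
    """Lower and upper outlier fences from the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Quartiles and tail values copied from the summary table above.
programs = {
    "A": {"q1": -1, "q3": 9, "tails": [-3, -2, 11, 14]},
    "B": {"q1": 2, "q3": 4, "tails": [-3, 1, 5, 9]},
}
for name, s in programs.items():
    fences = iqr_fences(s["q1"], s["q3"])
    print(name, fences, flag_outliers(s["tails"], s["q1"], s["q3"]))
```

Only the values outside the quartiles need checking, since nothing between Q1 and Q3 can be beyond a fence.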
7. A random sample of adults with various blood alcohol levels was given a concentration test consisting of a series of small
objects each of which could be fit into an appropriately shaped hole. Each person had 5 minutes to find the proper holes for
as many objects as possible. Computer regression output of number of objects successfully placed plotted against blood
alcohol content (.01 percent) is as follows:
(a) What is the equation of the regression line?
(b) Interpret the slope in context.
(c) Interpret the y-intercept in context.
(d) Calculate r and describe the relationship between BAC and the performance task.
(e) Interpret r².
(f) Interpret “s”.
(g) Test subject 32 had a BAC level of .04% and a residual of about 2.1. How many small objects was he able to fit?
8. The probability that Bonds hits a homer on any given at-bat is .12, and each at-bat is independent.
(a) What is the probability that the next homer will be on his fifth at-bat?
(b) What is the probability that he has exactly one homer in five at-bats?
(c) What is the expected number of homers in every 10 at-bats?
(d) What is the expected number of at-bats until the next homer?
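Free response 8 mixes the two settings from Chapter 6: geometric (count at-bats until the first homer) and binomial (count homers in a fixed number of at-bats). The sketch below writes out both formulas in Python so you can check your answers; geometpdf and binompdf are named after the calculator commands and are not Python built-ins.

```python
from math import comb

p = 0.12  # chance of a homer on any single at-bat, from the problem

def geometpdf(p, k):
    """P(first success happens on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p

def binompdf(n, p, k):
    """P(exactly k successes in n trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(geometpdf(p, 5), 4))    # (a) first homer on the fifth at-bat
print(round(binompdf(5, p, 1), 4))  # (b) exactly one homer in five at-bats
print(round(10 * p, 4))             # (c) binomial mean n*p
print(round(1 / p, 4))              # (d) geometric mean 1/p
```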