Exam I Review - Monica Dabos

Math 140
Review I
PART I
Multiple Choice
1. Which of the following measurements is likely to have the least variation?
a. The individual weights in ounces of oranges in a randomly selected five-pound bag of oranges at the market.
b. The individual mass measured in grams of quarters in a randomly selected ten dollar roll of quarters.
c. The individual heights of children, measured in inches, in a randomly selected class of sixth grade students.
2. Marital status of each member of a randomly selected group of adults is an example of what type of
variable?
a. Numerical variable
b. Categorical variable
c. Neither
3. “People with diabetes are at higher risk for certain cancers than those without the blood sugar disease,
suggests a new study based on a telephone survey of nearly 400,000 adults.”
a. Observational study b. Controlled experiment
A fitness instructor measured the heart rates of the participants in a yoga class at the conclusion of the class.
The data is summarized in the histogram below. There were fifteen people who participated in the class
between the ages of 25 and 45. Use the histogram to answer questions (4) and (5).
4. How many participants had a heart rate between 120 and 130 bpm?
a. 2 b. 4
c. 3
d. 5
5. What percentage of the participants had a heart rate greater than 130 bpm?
a. 13% b. 27%
c. 33% d. 53%
6. Determine whether the variable would best be modeled as numerical or categorical: The temperature
of a greenhouse at a certain time of the day.
a. Numerical
b. Categorical
7. Determine whether the variable would best be modeled as numerical or categorical: The number of
tomatoes harvested each week from a greenhouse tomato plant.
a. Numerical
b. Categorical
8. Below are the standard deviations of extreme 10k finish times for a randomly selected group of women and
men. Choose the statement that best summarizes the meaning of the standard deviation.
Women: s
Men: s
a. On average, men’s finish times will be 0.21 hours faster than the overall average finish time.
b. On average, women’s finish times will be 0.17 hours less than men’s finish times.
c. The distribution of men’s finish times is less varied than the distribution of women’s finish times.
d. The distribution of women’s finish times is less varied than the distribution of men’s finish times.
9. In simple regression analysis the quantity that gives the amount by which Y (dependent variable) changes
for a unit change in X (independent variable) is called the
A) Coefficient of determination
B) Slope of the regression line
C) Y intercept of the regression line
D) Correlation coefficient
E) Standard error
10. The correlation coefficient may assume any value between
A) 0 and 1
B) -∞ and ∞
C) 0 and 8
D) -1 and 1
E) -1 and 0
11. In simple regression analysis, if the correlation coefficient is a positive value, then
A) The Y intercept must also be a positive value.
B) The coefficient of determination can be either positive or negative, depending on the value of the slope.
C) The least squares regression equation could either have a positive or a negative slope.
D) The slope of the regression line must also be positive.
E) The standard error of estimate can either have a positive or a negative value.
12. The strength of the relationship between two quantitative variables can be measured by the:
A) slope of a simple linear regression equation
B) Y intercept of the simple linear regression equation
C) coefficient of correlation
PART II (MUST SHOW ALL YOUR WORK)
13. The waiting times (in minutes) to be served at a bank for a simple random sample of 22 customers are:
8.35 3.82 10.49 8.37 5.64 8.02 6.17 9.66 5.47 5.90 5.79
2.54 4.23 1.45 4.90 5.41 4.08 8.01 3.00 3.96 2.24 1.00
A. Complete a frequency table; you choose the intervals.
Include columns for frequency and relative frequency.
Interval   Frequency   Relative Frequency
[0,2)      2           0.09
[2,4)      5           0.23
Total
B. Create a histogram for these waiting times.
C. Describe the distribution. Use as much description as possible, and make sure to include descriptions of
shape, center, and spread in context.
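The frequency table in part A of problem 13 can be checked with a short script. This is only a sketch: the bin width of 2 is an assumption taken from the two intervals already filled in, [0,2) and [2,4), and the names `frequency_table` and `BIN_WIDTH` are illustrative, not part of the handout.

```python
from collections import Counter

# Waiting times (minutes) from problem 13.
waiting_times = [
    8.35, 3.82, 10.49, 8.37, 5.64, 8.02, 6.17, 9.66, 5.47, 5.90, 5.79,
    2.54, 4.23, 1.45, 4.90, 5.41, 4.08, 8.01, 3.00, 3.96, 2.24, 1.00,
]

BIN_WIDTH = 2  # assumption: width-2 intervals, matching [0,2) and [2,4)


def frequency_table(data, width):
    """Return rows (low, high, frequency, relative frequency) for [low, high) bins."""
    # Each value x falls in bin number floor(x / width).
    counts = Counter(int(x // width) for x in data)
    n = len(data)
    rows = []
    for k in range(max(counts) + 1):
        low, high = k * width, (k + 1) * width
        freq = counts.get(k, 0)
        rows.append((low, high, freq, freq / n))
    return rows


table = frequency_table(waiting_times, BIN_WIDTH)
for low, high, freq, rel in table:
    print(f"[{low},{high})  {freq:2d}  {rel:.2f}")
```

Relative frequency is simply frequency divided by the sample size (22 here), so the relative frequencies should sum to 1 up to rounding.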
14. For each of the following variables, state whether it is qualitative or quantitative. If the variable is
quantitative, say whether it is discrete or continuous.
A. Total amount of snowfall in New York City in a year
B. Number of females in State Prisons in 2003
C. Ethnicity of students at SBCC
D. The number of different zip codes in California counties
15. How much do users pay for Internet Fax Providers? Here are the monthly fees (in dollars) paid by a
random sample of 6 users of commercial Internet Fax service providers in February 2010.
14 10 13 10 15 4
a) What is the standard deviation for the monthly fees paid? (MUST do it by hand showing all your
work to receive credit)
b) What does this number tell you about the monthly fees paid?
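The by-hand steps for part (a) of problem 15 can be mirrored in code as a way to check the arithmetic. The sketch below assumes the sample standard deviation (dividing by n - 1), which is the usual choice for a random sample; the variable names are illustrative.

```python
import math

# Monthly fees (dollars) from problem 15.
fees = [14, 10, 13, 10, 15, 4]

n = len(fees)
mean = sum(fees) / n                                  # step 1: sample mean
squared_devs = [(x - mean) ** 2 for x in fees]        # step 2: squared deviations
variance = sum(squared_devs) / (n - 1)                # step 3: divide by n - 1
s = math.sqrt(variance)                               # step 4: square root

print(f"mean = {mean}, sum of squared deviations = {sum(squared_devs)}")
print(f"variance = {variance}, standard deviation = {s}")
```

Each line corresponds to one column or step of the usual hand calculation, so the intermediate values can be compared against written work.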
16. Consider the following data, which give the ages (in numerical order) at which a sample of 35 American mothers
first gave birth.
14 16 16 16 17 17 18
18 18 19 19 19 20 20
20 20 20 21 21 21 22
23 23 24 24 24 24 26
27 28 28 31 32 33 50
a) FIND THE FIVE-NUMBER SUMMARY
b) Construct a BOX PLOT
c) IDENTIFY OUTLIERS
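Parts (a) and (c) of problem 16 can be sketched in code. Two assumptions are made here: quartiles are computed as the medians of the lower and upper halves of the sorted data (excluding the overall median when n is odd), and outliers are flagged with the 1.5 × IQR rule. Other textbooks and software use slightly different quartile conventions, so results may differ by a small amount. The function name is illustrative.

```python
from statistics import median

# Ages at first birth from problem 16 (already sorted).
ages = [
    14, 16, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 21,
    21, 21, 22, 23, 23, 24, 24, 24, 24, 26, 27, 28, 28, 31, 32, 33, 50,
]


def five_number_summary(sorted_data):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = sorted_data[half + 1:] if n % 2 else sorted_data[half:]
    return (sorted_data[0], median(lower), median(sorted_data),
            median(upper), sorted_data[-1])


summary = five_number_summary(ages)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

print("five-number summary:", summary)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

A value exactly on a fence is not counted as an outlier under this rule, which matters for the largest non-extreme ages in this data set.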
17. The World Almanac and Book of Facts 2004 reported the percent of people not covered by health
insurance in the 50 states and Washington, D. C., for the year 2002. Computer output gives these summaries
for the percent of people not covered by health insurance
a) Is there an outlier in the data?
b) Looking at the histogram, what measures of center and spread would you use? Explain.
18. Data were collected on the handgrip strength of adults. The histogram below summarizes the data. Which
statement is true about the distribution of the data shown in the graph? Describe the graph using measures of
center, shape, and spread in context, and try to explain what you think may be the reason for this shape.
19. The side-by-side boxplots below show cumulative GPAs for sophomores, juniors, and seniors taking an intro
stats course in Autumn 2003. Compare the groups; use as much description as possible.
20. Line segments with slopes 2, 1, , 0, - , -1, -2, and undefined are shown. Match the slopes with their
corresponding lines:
21. Use the following formulas to compute the slope $b_1 = r \cdot \frac{s_y}{s_x}$ and y-intercept
$b_0 = \bar{y} - b_1 \cdot \bar{x}$ for the regression line. Then give the equation of the regression line
$\hat{y} = b_0 + b_1 x$. ($r = 0.847$)
                       Mean    Standard Deviation
Explanatory Variable   11.6    2.35
Response Variable      54.52   7.21
22. Use the following formulas to compute the slope and y-intercept for the regression line. Then
give the equation of the regression line. ($r = 0.746$) (This problem is worth 5 points)
$\hat{y} = a + bx$, where $b = r \cdot \frac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.
                       Mean    Standard Deviation
Explanatory Variable   23.7    7.2
Response Variable      97.5    13.4
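Problems 21 and 22 use the same two formulas under different names (b1 and b0 in one, b and a in the other): the slope is r times the ratio of standard deviations, and the intercept makes the line pass through the point of means. A minimal sketch for checking hand calculations follows; the function name `regression_line` and its parameter names are illustrative, not from the handout.

```python
def regression_line(r, mean_x, sd_x, mean_y, sd_y):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * sd_y / sd_x                 # b1 = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x     # b0 = y-bar - b1 * x-bar
    return intercept, slope


# Problem 21: r = 0.847, explanatory (11.6, 2.35), response (54.52, 7.21).
b0, b1 = regression_line(0.847, 11.6, 2.35, 54.52, 7.21)
print(f"Problem 21: y-hat = {b0:.3f} + {b1:.3f} x")

# Problem 22: r = 0.746, explanatory (23.7, 7.2), response (97.5, 13.4).
a, b = regression_line(0.746, 23.7, 7.2, 97.5, 13.4)
print(f"Problem 22: y-hat = {a:.3f} + {b:.3f} x")
```

Computing the slope first matters because the intercept depends on it; rounding the slope too early in hand work can shift the intercept noticeably.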
23. The following scatterplot, correlation coefficient and regression line describe the relationship between the
year (x) and the millions of dollars spent on Halloween candy (y) in the U.S.
(This problem is worth 15 points)
r = 0.941
The regression equation: Y = - 106 + 0.0542 X
[Figure: Scatterplot of Halloween Candy Sales (Millions $ vs Years). Vertical axis: Halloween Candy Sales (in Millions of $), 1.4 to 2.3. Horizontal axis: Years, 1994 to 2006.]
a) What is the slope of the regression line? What does the slope mean in this context?
b) How well does the regression line fit this data? How confident would you be in
making predictions with the regression line?
c) Use the regression equation to predict how much was spent in 2004 on candy sales.
(Hint: Since the x variable is in actual years, plug in 2004 into the regression
equation.)
d) Can we use this regression equation to predict candy sales in the year 2037? Why?
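The prediction step in problem 23 amounts to substituting a year into the regression equation, and the question about 2037 is about whether that year lies inside the range of the observed data. The sketch below uses the coefficients exactly as printed (they may be rounded, so the result is only as precise as those values) and assumes the observed range is 1994 to 2006, read from the plot's horizontal axis. The function names are illustrative.

```python
INTERCEPT = -106        # as printed in the regression equation
SLOPE = 0.0542          # as printed in the regression equation
OBSERVED_YEARS = (1994, 2006)  # assumption: range shown on the plot's x-axis


def predict_sales(year):
    """Predicted Halloween candy sales (millions of dollars) for a given year."""
    return INTERCEPT + SLOPE * year


def is_extrapolation(year):
    """True when the year falls outside the range of the observed data."""
    low, high = OBSERVED_YEARS
    return not (low <= year <= high)


for year in (2004, 2037):
    note = "extrapolation" if is_extrapolation(year) else "within observed range"
    print(f"{year}: predicted {predict_sales(year):.4f} million dollars ({note})")
```

The equation will return a number for any year, so the check on the observed range is what separates a supported prediction from an extrapolation.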
24. The following scatterplot and $r^2$ describe the relationship between the year and the millions of dollars
spent on Halloween candy in the U.S. Tell whether each of the following statements is a valid or invalid
interpretation of $r^2$. (This problem is worth 5 points)
$r^2 = 0.885$
[Figure: Scatterplot of Halloween Candy Sales (Millions $ vs Years). Vertical axis: Halloween Candy Sales (in Millions of $), 1.4 to 2.3. Horizontal axis: Years, 1994 to 2006.]
a) Since the $r^2$ value is 88.5%, this indicates that time causes candy sales to increase.
b) There is an 88.5% chance that we can accurately predict the candy sales for a
given year between 1994 and 2006.
c) 88.5% of the variability in candy sales can be attributed to the linear relationship
with the year.
d) There was an average of 88.5 million dollars in candy sales.
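In simple linear regression, the coefficient of determination is the square of the correlation coefficient. A quick check, assuming the r = 0.941 reported in problem 23 and the 0.885 reported in problem 24 describe the same data:

```python
r = 0.941                 # correlation coefficient from problem 23
r_squared = r ** 2        # coefficient of determination

print(f"r^2 = {r_squared:.3f}")              # proportion of variability
print(f"as a percentage: {r_squared:.1%}")
```

Squaring removes the sign of r, so the percentage alone says nothing about whether the relationship is increasing or decreasing.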
Complete the following problems.
25. The calculated correlation coefficient values are -0.977, -0.487, 0.006, and 0.777.
Match each correlation coefficient value with its scatterplot.