AP STATISTICS Linear Regression Review
Name
Part I: Free Response
Show all your work. Indicate clearly the methods you use, because you will be graded on the
correctness of your methods as well as on the accuracy and completeness of your results and
explanations.
The weights of children in the Egyptian village of Nahya were recorded. Here are the mean weights of
the 170 children in that village:
Age (months)   1     2     3     4     5     6     7     8     9     10    11    12
Weight (kg)    4.3   5.1   5.7   6.3   6.8   7.1   7.2   7.2   7.2   7.2   7.5   7.8

1. Make a scatterplot of mean weight against time. Don't forget to scale and label your axes appropriately.
2. Is there an explanatory-response relationship? If so, which is the explanatory variable, and which is the response variable?
3. Determine the equation of the LSRL for these data. Find and interpret the correlation.
4. What is the value of the coefficient of determination? Interpret this value.
5. Interpret the slope of the LSRL.
6. Interpret the y-intercept of the LSRL.
7. Use the LSRL to predict the WEIGHT when AGE = 18 months.
8. Sketch a residual plot.
9. Is the least squares line an acceptable summary of the overall pattern of growth? Explain.
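As a way to check hand or calculator work on questions 3 through 9, the following is a minimal Python sketch that fits the least-squares line to the Nahya table (ages 1 to 12 months). The function name `least_squares` and the variable names are illustrative choices, not part of the worksheet.

```python
# Mean weights from the Nahya table (age in months, weight in kg).
ages = list(range(1, 13))
weights = [4.3, 5.1, 5.7, 6.3, 6.8, 7.1, 7.2, 7.2, 7.2, 7.2, 7.5, 7.8]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


intercept, slope, r = least_squares(ages, weights)

# Residual = observed weight minus the weight predicted by the line.
residuals = [w - (intercept + slope * a) for a, w in zip(ages, weights)]

print(f"weight-hat = {intercept:.3f} + {slope:.3f} * age")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"prediction at 18 months: {intercept + slope * 18:.2f} kg (an extrapolation)")
for a, e in zip(ages, residuals):
    print(f"age {a:2d}: residual {e:+.2f}")
```

In this output the residual signs run negative, then positive, then negative again, which is the kind of curved pattern question 9 asks you to consider. Note also that 18 months lies outside the observed ages.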
Joey read in his biology book that fish activity increases with water temperature, and he decided to
investigate this issue by conducting an experiment. On nine successive days, he measures fish activity
and water temperature in his aquarium. Larger values of his measure of fish activity denote more
activity. The figure below presents the scatterplot of his data.
[Figure: scatterplot of Activity (vertical axis) against Water Temperature in °F (horizontal axis, 69.0 to 81.0)]
10. One of the following numbers is the correlation coefficient between fish activity and water
temperature; circle the correct number.
-0.20
0.03
0.52
0.86
11. How would each of the following change (or not change) the value of the correlation?
a) converting the temperature to Celsius
b) switching the axes so that activity was on the x axis and temperature on the y-axis
c) adding a new point at (66, 500), i.e., water temperature= 66°F and fish activity= 500 to the plot
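The three changes in question 11 can be explored numerically. The sketch below uses HYPOTHETICAL data, since Joey's actual values are not given in this worksheet, so the specific numbers mean nothing. Only the comparisons between correlations are of interest. The function name `correlation` is an illustrative choice.

```python
def correlation(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


# HYPOTHETICAL values made up for illustration; NOT Joey's data.
temps_f = [69, 70, 72, 73, 75, 76, 78, 79, 81]
activity = [310, 300, 340, 330, 380, 370, 420, 400, 450]

r_original = correlation(temps_f, activity)

# (a) Convert Fahrenheit to Celsius, a linear change of units.
temps_c = [(t - 32) * 5 / 9 for t in temps_f]
r_celsius = correlation(temps_c, activity)

# (b) Swap which variable plays the role of x and which plays y.
r_swapped = correlation(activity, temps_f)

# (c) Add the point (66, 500): low temperature, high activity.
r_with_point = correlation(temps_f + [66], activity + [500])

print(f"original:       {r_original:.3f}")
print(f"Celsius:        {r_celsius:.3f}")
print(f"axes swapped:   {r_swapped:.3f}")
print(f"with (66, 500): {r_with_point:.3f}")
```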
12. At summer camp, one of Carla's counselors told her that air temperature can be determined from the
number of cricket chirps per minute.
To determine a formula, Carla collected data on temperature and number of chirps per minute on
12 occasions. She entered the data into lists and found the following results:
r = 0.461
x̄ = 166.8, sx = 31.0
ȳ = 78.83, sy = 9.11
a) Use this information to determine the equation of the LSRL for predicting temperature from the
number of cricket chirps. Define the variables.
b) One of Carla's data points was recorded on a particularly hot day (93°F). She counted 249 cricket
chirps in one minute. What temperature would Carla's model predict for this number of cricket
chirps? (Round to the nearest degree.)
c) What is the residual for the data point in part b?
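The LSRL can be built from summary statistics alone, using slope = r(sy/sx) and intercept = ȳ − slope·x̄. The following minimal sketch checks that arithmetic. It assumes x is the number of chirps per minute and y is the temperature in °F, which is how part a) frames the prediction. The variable names are illustrative.

```python
# Summary statistics from Carla's lists.
# Assumption: x = chirps per minute, y = temperature in degrees F.
r = 0.461
x_bar, s_x = 166.8, 31.0
y_bar, s_y = 78.83, 9.11

# The least-squares line passes through (x_bar, y_bar) with slope r * s_y / s_x.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

# Parts b) and c): prediction at 249 chirps and residual for the observed 93 degrees F.
predicted = intercept + slope * 249
residual = 93 - predicted  # observed minus predicted

print(f"temp-hat = {intercept:.2f} + {slope:.4f} * chirps")
print(f"predicted temperature at 249 chirps: {predicted:.1f}")
print(f"residual: {residual:.2f}")
```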
13. A mathematics professor wishes to analyze the relationship between the number of papers (in
hundreds) graded by his department's student homework graders and the total amount of money paid to
the graders. He collects data for 12 randomly chosen graders and uses MINITAB to do regression
analysis. Below is a portion of the MINITAB output. (Here, COST = amount paid, PAPERS = # papers
in hundreds, and the intervals listed at the bottom are computed for 1,600 papers.)
Predictor    Coef       Stdev     t-ratio    P
Constant     35.80      17.06     2.10       0.062
PAPERS       12.0835    0.9738    12.41      0.000

s = 6.526    R-sq = 93.9%    R-sq (adj) = 93.3%
a. What is the least-squares regression equation?
b. Interpret se and sb.
c. Is the linear model a useful tool in predicting cost from the number of papers? Give statistical
evidence to support your answer.
d. Compute a 95% confidence interval for the slope of the true regression line.
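For part d, a confidence interval for the slope has the form b ± t*·SE(b) with n − 2 degrees of freedom. The sketch below applies that formula to the MINITAB output. The critical value 2.228 is taken from a standard t table (df = 10, 95% confidence) and hard-coded as an assumption rather than computed. The variable names are illustrative.

```python
# Values read from the MINITAB output for the PAPERS row.
coef_papers = 12.0835
se_papers = 0.9738
n_graders = 12

df = n_graders - 2  # simple linear regression uses n - 2 degrees of freedom
t_star = 2.228      # assumption: t-table critical value for df = 10, 95% confidence

# The t-ratio in the output is the coefficient divided by its standard error.
t_ratio = coef_papers / se_papers

margin = t_star * se_papers
lower, upper = coef_papers - margin, coef_papers + margin

print(f"df = {df}, t-ratio = {t_ratio:.2f}")
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f})")
```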
14. Typical heights and weights of growing children are presented in the table. Construct a model that
will allow the prediction of weight from a child’s known height. Justify why your model is more
appropriate than a model using simple linear regression.
Age (year)     1       2       3       4       5       6       7
Height (cm)    73.6    83.8    91.4    99.0    104.1   111.7   119.3
Weight (kg)    9.1     11.3    13.6    15.0    17.2    20.4    22.2

Age (year)     8       9       10      11      12      13      14
Height (cm)    127.0   132.0   137.1   142.2   147.3   152.4   157.5
Weight (kg)    25.4    28.1    31.3    34.9    39.0    45.5    48.5
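One candidate approach to question 14, though not the only defensible one, is an exponential model fitted by regressing ln(weight) on height. The sketch below compares r² for that transformed fit against a straight-line fit on the raw data. The choice of a log transformation and the names `fit_line` and `predict_weight` are assumptions made for illustration.

```python
import math

# Height (cm) and weight (kg) for ages 1 through 14 years, from the table.
heights = [73.6, 83.8, 91.4, 99.0, 104.1, 111.7, 119.3,
           127.0, 132.0, 137.1, 142.2, 147.3, 152.4, 157.5]
weights = [9.1, 11.3, 13.6, 15.0, 17.2, 20.4, 22.2,
           25.4, 28.1, 31.3, 34.9, 39.0, 45.5, 48.5]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


# Straight-line fit on the raw data.
_, _, r2_linear = fit_line(heights, weights)

# Assumed alternative: fit ln(weight) against height (an exponential model).
log_weights = [math.log(w) for w in weights]
a, b, r2_log = fit_line(heights, log_weights)


def predict_weight(height_cm):
    """Back-transform the fitted line to predict weight in kg."""
    return math.exp(a + b * height_cm)


print(f"r^2, weight vs height:     {r2_linear:.4f}")
print(f"r^2, ln(weight) vs height: {r2_log:.4f}")
print(f"ln(weight-hat) = {a:.4f} + {b:.5f} * height")
```

Comparing residual plots for the two fits, in addition to r², would be the usual way to justify the choice of model.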
Part II: Multiple Choice
The following information is used in questions 1–4.
A random sample of 80 companies from the Forbes 500 list was selected and the relationship between
sales (in hundreds of thousands of dollars) and profits (in hundreds of thousands of dollars) was
investigated by regression. A least-squares regression line was fit to the data using statistical software,
with sales as the explanatory variable and profits as the response variable. Here is the output from the
software:
Dependent variable is Profits
R squared = 66.2%
s = 466.2 with 80 – 2 = 78 degrees of freedom
Variable    Coefficient    s.e. of Coefficient    P-value
Constant    –176.644       61.16                  0.0050
Sales       0.092498       0.0075                 ≤0.0001
1. Using the above data, approximately what is the intercept of the least-squares regression line?
(a) 0.0925
(b) 0.0075
(c) –176.64
(d) 61.16
(e) None of the above.
2. Using the above data, approximately what is a 90% confidence interval for the slope of the least-squares regression line?
(a) 0.0925 ± 0.0075
(b) 0.0925 ± 0.012
(c) –0.0925 ± 0.0075
(d) –0.0925 ± 0.012
(e) None of the above.
3. Using the above data, what is the value of the t statistic for testing whether the slope of the least-squares regression line is 0?
(a) 0.0075
(b) 0.082
(c) 0.092
(d) 12.33
(e) None of the above.
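The arithmetic behind questions 2 and 3 can be sketched as follows: the t statistic is the slope divided by its standard error, and the margin of error is t* times that standard error. The critical value 1.665 is an approximate t-table value for 78 degrees of freedom at 90% confidence, hard-coded here as an assumption. The variable names are illustrative.

```python
# Values read from the software output for the Sales row.
slope = 0.092498
se_slope = 0.0075
df = 80 - 2

# Assumption: approximate t-table critical value for df = 78, 90% confidence.
t_star_90 = 1.665

# Test statistic for H0: slope = 0.
t_stat = slope / se_slope

# Margin of error for the 90% confidence interval on the slope.
margin = t_star_90 * se_slope

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print(f"90% CI: {slope:.4f} +/- {margin:.3f}")
```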
4. Using the above data, is there strong evidence (and if so, why) of a straight line relationship between
sales and profits?
(a) Yes, because the slope of the least-squares line is positive.
(b) Yes, because the P-value for testing if the slope is 0 is quite small.
(c) No, because the value of the square of the correlation is relatively small.
(d) It is impossible to say because we are not given the actual value of the correlation.
(e) None of the above. The answer is ______.
5. A simple random sample of years and earnings was organized into pairs (time in years, earnings in
$1000’s). The scatterplot appears exponential and the transformation (x, y) → (x, ln y) is applied to
the data. A TI calculator yields the linear regression equation y = a + bx where a = .3079, b = .464,
and r² = .922. Which of the following is a valid conclusion?
(a) The earnings gained after 12 years are approximately $5.88
(b) The earnings gained after 12 years are approximately $356,345.
(c) The earnings will increase by approximately $464,000 each year.
(d) The original investment was $307.90.
(e) None of the above.
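Because the regression in question 5 was fitted to (x, ln y), a prediction from it is a value of ln y and has to be exponentiated to return to the original units ($1000's). A minimal sketch of that back-transformation at x = 12 years, with illustrative variable names:

```python
import math

# Coefficients of the line fitted to the transformed data (x, ln y).
a = 0.3079
b = 0.464

years = 12
ln_y = a + b * years               # prediction on the log scale
earnings_thousands = math.exp(ln_y)  # back-transform to $1000's

print(f"predicted ln(y) = {ln_y:.4f}")
print(f"predicted earnings = about ${earnings_thousands * 1000:,.0f}")
```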
6. A linear regression t-test on the slope was performed on some data showing how pollutants
affected a population of fish. The computer output is below.
SOURCE         DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PR > F
MODEL          1     2.21459712        2.21459712     5.49       0.0324
ERROR          16    6.45556062        0.40347254
CORR. TOTAL    17    8.67015774

PARAMETER    ESTIMATE    T FOR H0: PARAMETER=0    PR > |T|    STD ERROR OF ESTIMATE
INTERCEPT    7.5641      3.82                     0.0015      1.978
POLLUTANT    -1.0269     -2.34                    0.0324      0.438
It is expected that the number of fish should decrease with an increase in pollutant. An appropriate null
and alternate hypothesis to test the slope, the test statistic, and the p-value are:
(a) H0: β = 0, Ha: β ≠ 0, t = –2.34, and p-value = .0324
(b) H0: β = 0, Ha: β ≠ 0, t = 3.82, and p-value = .0007
(c) H0: β = 0, Ha: β < 0, t = –2.34, and p-value = .0324
(d) H0: β = 0, Ha: β ≠ 0, t = 3.82, and p-value = .0015
(e) H0: β = 0, Ha: β < 0, t = –2.34, and p-value = .0162
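The output in question 6 reports PR > |T|, which is a two-sided p-value. A minimal sketch of how a one-sided p-value relates to it when the alternative is "slope less than 0" is given below. The function name `one_sided_p_less` is a hypothetical helper introduced for illustration.

```python
# Values read from the POLLUTANT row of the output.
estimate = -1.0269
std_error = 0.438
two_sided_p = 0.0324

# The reported t is the estimate divided by its standard error.
t_stat = estimate / std_error


def one_sided_p_less(t, p_two_sided):
    """One-sided p-value for the alternative 'parameter < 0'.

    Half the two-sided value when t falls on the side the alternative
    predicts (t < 0); otherwise one minus that half.
    """
    half = p_two_sided / 2
    return half if t < 0 else 1 - half


one_sided_p = one_sided_p_less(t_stat, two_sided_p)

print(f"t = {t_stat:.2f}")
print(f"one-sided p-value = {one_sided_p:.4f}")
```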
7. The equation of the least squares regression line for a certain set of data is ŷ = 1.3 + 0.73x.
What is the residual for the point (4, 7)?
(a) 2.78
(b) 3.00
(c) 4.00
(d) 4.22
(e) 7.00
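A residual is the observed y minus the y predicted by the line. A minimal sketch of that calculation for the point (4, 7) in question 7, with illustrative variable names:

```python
# Least-squares line from question 7: predicted y = 1.3 + 0.73 * x.
intercept, slope = 1.3, 0.73

x_obs, y_obs = 4, 7
y_hat = intercept + slope * x_obs  # predicted value at x = 4
residual = y_obs - y_hat           # observed minus predicted

print(f"predicted = {y_hat:.2f}, residual = {residual:.2f}")
```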
10. A set of data relates the amount of annual salary raise and the performance rating. The least squares
regression equation is y = 1,400 + 2,000x where y is the estimated raise and x is the
performance rating. Which of the following statements is not correct?
(a) For each increase of one point in performance rating, the raise will increase on average by
$2,000.
(b) This equation produces predicted raises with an average error of 0.
(c) A rating of 0 will yield a predicted raise of $1,400.
(d) The correlation for the data is positive.
(e) All of the above are true.
11. Which of the following would not be a correct interpretation of a correlation of r = -.30?
(a) The variables are inversely related.
(b) The coefficient of determination is 0.09.
(c) 30% of the variation between the variables is linear.
(d) There exists a weak relationship between the variables.
(e) All of the above statements are correct.
12. A local community college announces the correlation between college entrance exam grades and scholastic
achievement was found to be -1.08. On the basis of this you would tell the college that:
(a) The entrance exam is a good predictor of success.
(b) The exam is a poor predictor of success.
(c) Students who do best on this exam will be poor students.
(d) Students at this school are underachieving.
(e) The college should hire a new statistician.