Final Exam ST511, Fall 2011

Final Exam ST511
Score ________________
Name ________________________
Date__________________________
Because most of the questions are multiple choice, it is important that you round your answers as
the LAST step.
Questions 1 through 9 refer to the following.
Old Faithful Geyser in Yellowstone National Park (YNP) derives its name and fame from the regularity of its eruptions. Predicting times of geyser eruptions is one of the unique aspects of a Yellowstone Park Ranger's job. Predictions are made using statistical methods, in particular regression analysis. These questions use data from Old Faithful Geyser. The variables in the data set are eruption duration (x, in minutes) and the time interval to the start of the next eruption (y, in minutes). (For example, the point $(x_i, y_i) = (3, 58)$ means that the ith eruption had a duration of 3 minutes and was followed by an interval of 58 minutes before the start of the next eruption.) Theory suggests that the longer an eruption lasts, the more water and steam are released, and thus the longer it takes for pressure to build up for a subsequent eruption. This suggests the hypothesis that long (short) eruption durations tend to be followed by long (short) intervals to the next eruption. A scatter plot and the results of a regression analysis of the time interval to the subsequent eruption on eruption duration are given below.
$\bar{y} = 71.00901$, $s_y^2 = 163.81892$
$\bar{x} = 3.576126$, $s_x^2 = 1.1749479$
| Parameter | Estimate | SE Coef | T-Stat | P-Value |
|---|---|---|---|---|
| Intercept | 33.96676 | 1.4278698 | 23.788412 | <0.0001 |
| Slope | 10.358206 | 0.3821834 | 27.102709 | <0.0001 |
Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Regression | 1 | 27859.926 | 27859.926 | 734.5569 | <0.0001 |
| Error | 220 | 8344.056 | 37.92753 | | |
| Total | 221 | 36203.98 | | | |
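The numbers in the two tables above are linked by standard simple-linear-regression identities, which makes a quick consistency check possible. The Python sketch below uses only the reported values; the sample size n = 222 is inferred from the total degrees of freedom (n − 1 = 221), and the variable names are illustrative. It recomputes each t statistic as estimate / SE, the MSE as SSE / df, the F statistic as MSR / MSE (which equals the slope's t statistic squared), the slope's standard error from the MSE and $S_{xx} = (n-1)s_x^2$, and the intercept as $\bar{y} - \hat\beta_1 \bar{x}$.

```python
import math

# Reported summary statistics and regression output (n inferred from total df = 221)
n = 222
x_bar, s2_x = 3.576126, 1.1749479
y_bar, s2_y = 71.00901, 163.81892
b0, se_b0 = 33.96676, 1.4278698
b1, se_b1 = 10.358206, 0.3821834
ssr, sse, sst = 27859.926, 8344.056, 36203.98
df_error = n - 2

# t statistics: estimate divided by its standard error
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# Mean squared error and the regression F statistic (1 numerator df)
mse = sse / df_error
f_stat = ssr / mse

# Standard error of the slope: sqrt(MSE / Sxx), with Sxx = (n - 1) * s_x^2
sxx = (n - 1) * s2_x
se_b1_check = math.sqrt(mse / sxx)

# The least-squares line passes through (x_bar, y_bar)
b0_check = y_bar - b1 * x_bar

print(f"t(b0) = {t_b0:.4f}, t(b1) = {t_b1:.4f}, t(b1)^2 = {t_b1**2:.2f}")
print(f"MSE = {mse:.5f}, F = {f_stat:.2f}")
print(f"SE(b1) from MSE and Sxx = {se_b1_check:.5f}")
print(f"y_bar - b1 * x_bar = {b0_check:.4f}")
print(f"SST / (n - 1) = {sst / (n - 1):.5f}")
```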
1. What is the estimated regression line of interval on duration?
a) $y = \beta_0 + \beta_1 x$
b) $\hat{y} = 1.42786988 + 0.3821834x$
c) $y = 1.42786988 + 0.3821834x + \varepsilon$
d) $\hat{y} = 33.96676 + 10.358206x$
e) $y = \beta_0 + \beta_1 x + \varepsilon$
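2. What hypothesis test should be conducted in order to determine whether there is a linear relationship between durations and waiting intervals?
a) $H_0: \beta_0 \mathrel{?} 0$ vs. $H_A: \beta_0 \mathrel{?} 0$
b) $H_0: \beta_1 \mathrel{?} 1$ vs. $H_A: \beta_1 \mathrel{?} 1$
c) $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$
d) $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$
e) $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$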
3. What hypothesis test should be conducted in order to determine whether the data support the theory that longer (shorter) durations tend to be followed by longer (shorter) waiting intervals?
a) $H_0: \beta_1 \mathrel{?} 1$ vs. $H_A: \beta_1 \mathrel{?} 1$
b) $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$
c) $H_0: \beta_0 \mathrel{?} 12$ vs. $H_A: \beta_0 \mathrel{?} 12$
d) $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$
e) $H_0: \beta_0 \mathrel{?} 0$ vs. $H_A: \beta_0 \mathrel{?} 0$
4. What percentage of the variation in the waiting intervals is explained by the preceding eruption durations?
a) $(77^2)\%$
b) $(\sqrt{77})\%$
c) $(23^2)\%$
d) 77%
e) 23%
5. What is the correlation between interval and duration?
a) -0.77
b) 0.77
c) 0.88
d) 88
e) 0.23
f) None of the above
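In simple linear regression, $R^2 = SSR/SST$ is the proportion of the variation in y explained by x, and the sample correlation is $r = \pm\sqrt{R^2}$, taking the sign of the estimated slope. A short Python sketch using the sums of squares from the ANOVA table (variable names are illustrative):

```python
import math

# Sums of squares from the regression ANOVA table
ssr = 27859.926
sst = 36203.98
slope = 10.358206  # estimated slope; its sign gives the sign of r

r_squared = ssr / sst                            # proportion of variation explained
r = math.copysign(math.sqrt(r_squared), slope)   # correlation carries the slope's sign

print(f"R^2 = {r_squared:.4f} ({100 * r_squared:.1f}%)")
print(f"r = {r:.4f}")
```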
6. To test $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$, which test statistic(s) should be used?
i) $t_{calc} = 23.788412$
ii) $t_{calc} = 27.102709$
iii) $F_{calc} = 734.5569$

a) i only
b) ii only
c) iii only
d) ii and iii
e) i, ii, and iii
7. To test $H_0: \beta_1 \mathrel{?} 0$ vs. $H_A: \beta_1 \mathrel{?} 0$, which test statistic(s) should be used?
i) $t_{calc} = 23.788412$
ii) $t_{calc} = 27.102709$
iii) $F_{calc} = 734.5569$

a) i only
b) ii only
c) iii only
d) i and iii
e) ii and iii

8. Which of the following is a 95% confidence interval for the slope of the linear regression of interval on duration?
a) $\hat{\beta}_1 \pm t_{220,0.025}\,\hat{\sigma}_{\hat{\beta}_1}$
b) $\hat{\beta}_1 \pm t_{220,0.05}\,\hat{\sigma}_{\hat{\beta}_1}$
c) $\hat{\beta}_1 \pm t_{220,0.025}(37.92753)$
d) $\hat{\beta}_1 \pm t_{220,0.025}(\sqrt{37.92753})$
e) $[33.96676 - 1.96(1.4278698),\ 33.96676 + 1.96(1.4278698)]$
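A 95% interval for the slope has the form $\hat\beta_1 \pm t_{220,0.025}\,\hat\sigma_{\hat\beta_1}$, built from the slope's own standard error. The Python standard library has no Student t quantile function, so this sketch approximates $t_{220,0.025}$ with the Cornish-Fisher expansion around the normal quantile. That approximation is an assumption made for illustration; a t table or statistical software would normally supply the critical value. The function name `t_quantile_upper` is hypothetical.

```python
from statistics import NormalDist


def t_quantile_upper(alpha: float, df: int) -> float:
    """Approximate the upper-alpha quantile of Student's t with df degrees of freedom.

    Cornish-Fisher expansion around the standard normal quantile z, keeping the
    first two correction terms in 1/df; very accurate when df is large.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


b1, se_b1 = 10.358206, 0.3821834  # slope estimate and its standard error
df_error = 220                    # error degrees of freedom (n - 2)

t_crit = t_quantile_upper(0.025, df_error)
margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"t(220, 0.025) is approximately {t_crit:.4f}")
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

With 220 error degrees of freedom the critical value is only slightly larger than the normal value 1.96.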
9. An eruption ends after a duration of 4 minutes. What is the predicted waiting time until the start of the next eruption? Round your final answer to the hundredths place.
Answer: ______________
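The prediction substitutes $x = 4$ into the fitted line $\hat{y} = \hat\beta_0 + \hat\beta_1 x$, rounding only at the end as the exam instructions require. A minimal Python sketch (the function name `predict_interval` is illustrative):

```python
# Estimated intercept and slope from the regression output (unrounded)
b0, b1 = 33.96676, 10.358206


def predict_interval(duration_min: float) -> float:
    """Predicted waiting interval (minutes) after an eruption of the given duration."""
    return b0 + b1 * duration_min


y_hat = predict_interval(4)
print(round(y_hat, 2))  # round only as the last step
```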
10. Consider the problem of testing $H_0: \pi \mathrel{?} 0.6$ versus $H_A: \pi \mathrel{?} 0.6$ using binomial data, $Y \sim \text{Bin}(n = 1273, \pi)$. A scientist reported the test statistic $z_{calc} = 1.73$. What is the p-value associated with this test statistic?
a) 0.9582
b) 0.9163
c) 0.0418
d) 0.0836
e) 0.0500
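For a z statistic, the p-value depends on the direction of $H_A$: an upper-tailed alternative uses $P(Z \ge z_{calc})$, a lower-tailed one uses $P(Z \le z_{calc})$, and a two-sided one doubles the smaller tail. A short sketch with Python's standard-library `statistics.NormalDist`, computing all three so the dependence on $H_A$ is visible:

```python
from statistics import NormalDist

z_calc = 1.73
std_normal = NormalDist()  # mean 0, standard deviation 1

p_upper = 1 - std_normal.cdf(z_calc)     # H_A: parameter > null value
p_lower = std_normal.cdf(z_calc)         # H_A: parameter < null value
p_two_sided = 2 * min(p_upper, p_lower)  # H_A: parameter != null value

print(f"upper: {p_upper:.4f}, lower: {p_lower:.4f}, two-sided: {p_two_sided:.4f}")
```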
11. If we have a random sample of size n from a normal distribution with mean $\mu$ and variance $\sigma^2$, then we know the following about the sampling distribution of $\bar{x}$:
a) $\bar{x}$ is normally distributed with mean $\mu$ and variance $s^2/n$
b) $\bar{x}$ is normally distributed with mean $\mu$ and variance $s/n$
c) $\bar{x}$ has mean $\mu$ and variance $s/n$, but we don't know if $\bar{x}$ has a normal distribution or not
d) $\bar{x}$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$
e) $\bar{x}$ has mean $\mu$ and variance $\sigma^2/n$, but we don't know if $\bar{x}$ has a normal distribution or not
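A quick Monte Carlo check of the sampling distribution of $\bar{x}$: draw many samples of size n from a normal population and compare the mean and variance of the resulting sample means with $\mu$ and $\sigma^2/n$. The population values ($\mu = 10$, $\sigma = 4$, $n = 25$), the number of replications, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(511)  # fixed seed so the run is reproducible

mu, sigma, n = 10.0, 4.0, 25   # illustrative population mean, SD, and sample size
reps = 20_000                  # number of simulated samples

# Mean of each simulated sample of size n
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

print(f"mean of x-bar:     {statistics.fmean(sample_means):.3f} (mu = {mu})")
print(f"variance of x-bar: {statistics.variance(sample_means):.3f} "
      f"(sigma^2 / n = {sigma**2 / n})")
```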
12. Suppose a die is tossed 5 times. What is the probability of the die coming up 1 exactly 2 times?
a) 0.028
b) 0.161
c) 0.167
d) 0.333
e) There is not enough information to answer this question.
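Assuming a fair die (each face has probability 1/6) and independent tosses, the number of 1s in 5 tosses follows a Binomial(5, 1/6) distribution, so $P(X = 2) = \binom{5}{2}(1/6)^2(5/6)^3$. In Python, using `math.comb` for the binomial coefficient:

```python
from math import comb

n, k, p = 5, 2, 1 / 6  # tosses, number of 1s wanted, P(a 1) on a fair die (assumed)

prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(f"P(exactly {k} ones in {n} tosses) = {prob:.4f}")
```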
13. Nine hundred (900) high school freshmen were randomly selected for a national survey. Among
survey participants, the mean grade-point average (GPA) was 2.7. The population standard
deviation is known to be 0.4. What is the margin of error, assuming a 95% confidence level?
a) 0.013
b) 0.026
c) 0.500
d) 1.960
e) None of the above
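With a known population standard deviation, the margin of error for a mean at 95% confidence is $z_{0.025}\,\sigma/\sqrt{n}$. A minimal Python sketch:

```python
import math
from statistics import NormalDist

n, sigma, conf = 900, 0.4, 0.95

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95% confidence
margin_of_error = z * sigma / math.sqrt(n)
print(f"margin of error = {margin_of_error:.4f}")
```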
14. A national achievement test is administered annually to 3rd graders, and its scores are known to be normally distributed. The test has a mean score of $\mu = 100$ and a variance of $\sigma^2 = 225$. If Jane's z-score is 1.20, what was her score on the test?
a) 82
b) 88
c) 100
d) 112
e) 118
f) None of the above
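Converting a z-score back to the original scale uses $x = \mu + z\sigma$, where $\sigma$ is the square root of the variance rather than the variance itself. For example:

```python
import math

mu, variance, z = 100, 225, 1.20

sigma = math.sqrt(variance)  # standard deviation, not the variance
score = mu + z * sigma
print(score)
```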
15. The stemplot below shows the number of hot dogs eaten by contestants in a recent hot dog eating contest. Which of the following statements are true?
I. The sample range is 70.
II. The sample median is 46.
III. The sample mean is 47.
a) I only
b) II only
c) III only
d) I and II
e) I, II, and III
16. The probability distribution of the random variable pest type, X, is as follows:

| Pest type, x | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Probability, P(X = x) | 0.25 | 0.50 | 0.15 | 0.10 |

What is the standard deviation, $\sigma$, of the probability distribution?
a) 0.50
b) 0.62
c) 0.79
d) 0.89
e) 2.10
f) None of the above
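For a discrete distribution, $\mu = \sum x\,P(X = x)$ and $\sigma^2 = \sum (x - \mu)^2 P(X = x)$, and the standard deviation is $\sqrt{\sigma^2}$. A short Python sketch using the pest-type probabilities from the table:

```python
import math

# Pest type values mapped to their probabilities
dist = {1: 0.25, 2: 0.50, 3: 0.15, 4: 0.10}

mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```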
17. A public opinion poll surveyed a simple random sample of voters. Respondents were classified by
gender (male or female) and by voting preference (Republican, Democrat, or Independent).
Results are shown below.
Voting Preferences

| | Republican | Democrat | Independent | Row total |
|---|---|---|---|---|
| Male | 200 | 150 | 50 | 400 |
| Female | 250 | 300 | 50 | 600 |
| Column total | 450 | 450 | 100 | 1000 |

If you conduct a chi-square test of independence, what is the expected frequency count of male Independents?
a) 40
b) 50
c) 60
d) 180
e) 270
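Under the independence hypothesis, each expected count is (row total × column total) / grand total. A short Python sketch computing the full table of expected counts from the observed counts in the poll (the dictionary layout is an illustrative choice):

```python
observed = {
    "Male":   {"Republican": 200, "Democrat": 150, "Independent": 50},
    "Female": {"Republican": 250, "Democrat": 300, "Independent": 50},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {
    col: sum(observed[row][col] for row in observed)
    for col in observed["Male"]
}
grand_total = sum(row_totals.values())

# Expected count for each cell under the null hypothesis of independence
expected = {
    row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
    for row in observed
}

print(expected["Male"]["Independent"])
```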
18. Suppose X and Y are independent random variables. The variance ($\sigma_X^2$) of X is equal to 16, and the variance ($\sigma_Y^2$) of Y is equal to 9. Let $Z = X - Y$. What is the standard deviation ($\sigma_Z$) of Z?
a) 2.65
b) 5.00
c) 7.00
d) 25.0
e) It is not possible to answer this question, based on the information given.
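For independent X and Y, variances add even when the variables are subtracted: $\operatorname{Var}(X - Y) = \sigma_X^2 + \sigma_Y^2$, and the standard deviation is the square root of that sum. In Python:

```python
import math

var_x, var_y = 16, 9

var_z = var_x + var_y   # Var(X - Y) = Var(X) + Var(Y) for independent X and Y
sd_z = math.sqrt(var_z)
print(sd_z)
```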
19. Twelve identical samples of tissue were divided randomly into three groups of four specimens each. The three levels of glucose concentration were randomly assigned to the three groups, one level per group, and each specimen in a group was treated with the assigned level of glucose. After a period of time, the insulin released by the tissue samples was recorded. These values appear in the table below.
Concentration

| Low | Medium | High |
|---|---|---|
| 1.49 | 3.33 | 1.88 |
| 1.75 | 4.21 | 2.02 |
| 3.50 | 3.39 | 1.87 |
| 1.87 | 2.87 | 2.39 |

a) Explain what the ANOVA F-test is testing:
H0:
HA:
b) Complete the ANOVA table below, SHOWING WORK to support your answers.

| Source | Sum of Squares | Degrees of Freedom | Mean Square | F test |
|---|---|---|---|---|
| Between samples | i) ???? | 2 | iii) ???? | 6.1304 |
| Within samples | 3.60587 | 9 | iv) ???? | |
| Total | 8.51823 | ii) ???? | | |

i)
ii)
iii)
iv)
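The one-way ANOVA quantities can be computed directly from the insulin data: the between-groups sum of squares compares each group mean with the grand mean, the within-groups sum of squares pools squared deviations inside each group, and $F = MS_{between}/MS_{within}$. A self-contained Python sketch, with group labels taken from the data table:

```python
from statistics import fmean

groups = {
    "Low":    [1.49, 1.75, 3.50, 1.87],
    "Medium": [3.33, 4.21, 3.39, 2.87],
    "High":   [1.88, 2.02, 1.87, 2.39],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = fmean(all_values)
k, n_total = len(groups), len(all_values)

# Between groups: group size times squared distance of each group mean from the grand mean
ss_between = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values())
# Within groups: squared deviations of observations from their own group mean
ss_within = sum((y - fmean(ys)) ** 2 for ys in groups.values() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between = {ss_between:.5f}, SS within = {ss_within:.5f}, SS total = {ss_total:.5f}")
print(f"df: between {df_between}, within {df_within}, total {n_total - 1}")
print(f"MS between = {ms_between:.5f}, MS within = {ms_within:.5f}, F = {f_stat:.4f}")
```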
20. Suppose we have a dependent variable y, and we believe the independent variables x1 and x2 can be used to predict y using a multiple linear regression (MLR) model. After fitting an MLR model to the data, the following output was obtained:
Analysis of Variance

| Source | df | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 181369.07387 | 90684.53693 | 4.665 | 0.0205 |
| Error | 22 | 427625.96613 | 19437.54392 | | |
| Total | 24 | 608995.04000 | | | |

Parameter Estimates

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 426.033719 | 252.81248037 | 1.685 | 0.1061 |
| x1 | -6.341394 | 2.61207977 | -2.428 | 0.0238 |
| x2 | 3.375532 | 1.92991857 | 1.749 | 0.0942 |
a) Write the estimated model:
b) Based on the t test statistic and the p-value associated with x2, what hypotheses are being tested?
H0:
HA:
c) In non-statistical terms, based on the p-value associated with x2, what can we conclude about x2? Use $\alpha = 0.05$.
d) Can the p-value 0.0238 be used to test $H_0: \beta_1 \mathrel{?} 0$ versus $H_A: \beta_1 \mathrel{?} 0$? Explain why or why not.
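Each row of the MLR output follows the same recipe as in simple regression: T = Coef / SE Coef for testing whether a coefficient is zero, MS = SS / df, and F = MS(Regression) / MS(Error). The sketch below recomputes these from the reported numbers (dictionary keys and variable names are illustrative). The reported coefficient P values are two-sided; because the t distribution is symmetric, a one-sided p-value is half the two-sided value when the estimate falls in the direction of the one-sided alternative.

```python
# Reported coefficient estimates and standard errors from the MLR output
coefs = {
    "Constant": (426.033719, 252.81248037),
    "x1": (-6.341394, 2.61207977),
    "x2": (3.375532, 1.92991857),
}

# t statistic for H0: coefficient = 0 is estimate / standard error
t_stats = {name: est / se for name, (est, se) in coefs.items()}

# ANOVA pieces: mean squares and the overall F statistic
ss_reg, df_reg = 181369.07387, 2
ss_err, df_err = 427625.96613, 22
ms_reg, ms_err = ss_reg / df_reg, ss_err / df_err
f_stat = ms_reg / ms_err

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
print(f"MS(reg) = {ms_reg:.5f}, MS(err) = {ms_err:.5f}, F = {f_stat:.3f}")
```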