practice problems for final exam

advertisement
ST 305
PRACTICE PROBLEMS FOR FINAL EXAM
Reiland
Topics covered on final exam: Chapters 24-27 in text.
This material is covered in webassign homework assignments 12 through 14
Exam information: 3 hour time limit; materials allowed: calculator (no laptops), 1 8 "# x11 sheet
of paper (2 sided)with notes, definitions, formulas, etc. Normal, t, and chi-square tables will be
provided with the exam.
The questions on the final exam will be multiple choice format.
Answers are at the end of the document.
1.
To investigate the possible link between fluoride content of drinking water and cancer, the cancer
death rates (number of deaths per 100,000 population) from 1952-1969 in 20 selected U. S. cities - the
ten largest fluoridated cities and the ten largest cities not fluoridated by 1969 - were recorded. These
data were used to calculate for each city the annual rate of increase in cancer death rate over this 18
year period. The data are given below:
FLUORIDATED
City
Chicago
Philadelphia
Baltimore
Cleveland
Washington
Milwaukee
St. Louis
San Francisco
Pittsburgh
Buffalo
NONFLUORIDATED
Annual Increase in
Cancer Death Rate
1.0640
1.4118
2.1115
1.9401
3.8772
-.4561
4.8359
1.8875
4.4964
1.4045
City
Los Angeles
Boston
New Orleans
Seattle
Cincinnati
Atlanta
Kansas City
Columbus
Newark
Portland
Annual Increase in
Cancer Death Rate
.8875
1.7358
1.0165
.4923
4.0155
-1.1744
2.8132
1.7451
-.5676
2.4471
a.
Construct a 95% confidence interval for the difference between the mean annual increases in
cancer death rates for fluoridated and nonfluoridated cities. (Use 18 degrees of freedom; do not
assume equal variances).
b.
What assumption(s) must be satisfied for the confidence interval in part a) to be valid?
2a. We are interested in comparing the average supermarket prices of two leading colas. Our sample was
taken by randomly selecting eight supermarkets and recording the price of a six-pack of each brand of
cola at each supermarket. The data are shown in the following table:
Brand 1 cola
Brand 2 cola
Difference
1
$2.25
$2.30
$-0.05
2
2.47
2.45
0.02
Supermarket
3
4
2.38 2.27
2.44 2.29
-0.06 -0.02
5
2.15
2.25
-0.10
6
2.25
2.25
0
7
2.36
2.42
-0.06
8
2.37
2.40
-0.03
Summary statistics: . œ  !Þ!$(&ß =. œ !Þ!$)"
Find a 98% confidence interval for the difference in mean price of brand 1 and brand 2.
i)  !Þ!$(& „ !Þ!%!% ii)  !Þ!$(& „ !Þ"$*$ iii)  !Þ!$(& „ !Þ!%("
iv)  !Þ!$(& „ !Þ!$%(
2b. Perform a hypothesis test to test if the difference in mean prices of brand 1 and brand 2 colas is
different from 0. Use α = .02. Define the parameters, state hypotheses, calculate test statistic and T value.
ST 305
3.
Final Exam Practice Problems
page 2
A new weight-reducing technique, consisting of a liquid protein diet, is currently undergoing tests by
the Food and Drug Administration (FDA) before its introduction into the market. The weights of a
random sample of five people are recorded before they are introduced to the liquid protein diet. The
five individuals are then instructed to follow the liquid protein diet for 3 weeks. At the end of this
period, their weights (in pounds) are again recorded. The results are listed in the table. Let ." be the
true mean weight of individuals before starting the diet and let .# be the true mean weight of
individuals after 3 weeks on the diet.
1
2
3
4
5
Weight before diet 156 201 194 203 210
Weight after diet
149 196 191 197 206
Test to determine if the diet is effective at reducing weight. Use α œ Þ"!.
4.
A cell phone company wants to determine if the use of text messaging is independent of age. The
following data has been collected from a random sample of customers.
Under 21
21-39
40 and over
Regularly use
text messaging
82
57
6
Do not regularly use
text messaging
38
34
83
4a. What is the expected value for the “under 21” and “regularly use text messaging” cell.
4b. To conduct a hypothesis test using a 0.01 level of significance, the value of the critical value is:
i) 16.812 ii) 15.086 iii) 9.210 iv) 2.576 v) 2.33
4c. What contribution does the cell “21-39” and ”Do not regularly use text messaging” make to the value
of the test statistic?
4d. The value of the test statistic is 88.3. The appropriate conclusion is
i) reject H! and conclude the variables have a curvilinear relationship;
ii) reject H! and conclude the variables are not related, that is, conclude that the variables are
independent.
iii) do not reject H! and conclude the variables are independent.
iv) reject H! and conclude the variables are related.
4e. Use the values of the standardized residuals
9,=/<@/./B:/->/.
/B:/->/.
to select the correct statement regarding
the relationship between text messaging and age.
i) The ”Under 21” age group uses text messaging less than the other age groups.
ii) For each age group the standardized residual has a negative value and a positive value, so text
message and age are not related.
iii) The older the age group, the less that text messaging is used.
5.
A simple linear regression was used to predict the score y on a final exam from the score x on the first
exam. The slope of the least squares regression line is .(&. The standard error of the slope is ."" and
the sample size is 200. A 90% confidence interval for the true slope is (though the .0 is 8  #, use
200 .0 in the >-table)
a. .'% to .)' b. .&$ to .*( c. .&( to .*$ d.  .'% to .)' e. none of the above
6.
Refer to the previous problem. To test the null hypothesis that the slope is zero versus the one-sided
alternative that the slope is positive, we use the test statistic t œ
a. 6.82 b. .05 c. .15 d. .95 e. .75
ST 305
7.
Final Exam Practice Problems
page 3
When two competing teams are equally matched, the probability that each team wins any game is
!Þ&!Þ The National Basketball Association (NBA) championship goes to the team that wins four
games in a best-of-seven series. If the teams were evenly matched, the probability that the final series
ends with one of the teams sweeping four straight games would be #Ð!Þ&Ñ% œ !Þ"#& [team A wins in 4
games with probability Ð!Þ&Ñ% à team B can also win in 4 games with the same probability, so the
probability the series ends in 4 games is #Ð!Þ&Ñ% ].
Similarly team A can win in 5 games if team A wins 3 of the first 4 games and then wins game 5. So
team A wins in 5 games with probability % G$ Ð!Þ&Ñ$ Ð!Þ&чÐ!Þ&Ñ œ !Þ"#&. But team B can also win in
5 games with the same probability, so the probability that the series ends in 5 games is !Þ#&. Similar
probability calculations show that the probability is !Þ$"#& that the series lasts six games, and the
probability is !Þ$"#& that the series lasts the full seven games. The table below shows the number of
games it took to decide each of the last 57 NBA champs. Do you think the teams are usually equally
matched? Give statistical evidence to support your conclusion.
Length of series
NBA finals
8.
4 games
7
6 games
22
7 games
15
The president of a large university has been studying the relationship between male/female
supervisory structures in his institution and the level of employees' job satisfaction. The results of a
recent survey are shown in the table below. Conduct a test at the 5% significance level to determine
whether the level of job satisfaction depends on the boss/employee gender relationship.
Level of Satisfaction
Satisfied
Neutral
Dissatsfied
9.
5 games
13
Male/Female
60
27
13
Boss/Employee
Female/Male
15
45
32
Male/Male
50
48
12
Female/Female
15
50
55
(True or false) In a hypothesis test, a T -value of !Þ!$ means that there is only probability !Þ!$ that the
null hypothesis is true.
10. A chi-squared test for independence with 6 degrees of freedom results in a test statistic 13.61. Using
the tables, the most accurate statement that can be made about the P-value for this test is that:
a. P-value > .10 b. P-value > .05
c. .05 < P-value < .10
d. .025 < P-value < .05
11. Sociologists are of the opinion that there has been a decrease in the difference in ages at first marriage
for men and women since 1975. We want to examine data to determine if this decrease is significant.
The following data summary and regression results were obtained, where the B variable is year and the
C variable is the age difference (husband age  wife age) at first marriage.
Variable
Count
Year (B)
24
husband-wife age (C) 24
Mean
1986.5
2.3125
StDev
7.071
0.249
ST 305
Final Exam Practice Problems
page 4
a.
b.
c.
Interpret the value of the least squares slope ," .
What is the value of the test statistic for testing L! À "" œ !?
For the hypothesis test L! À "" œ ! vs L+ À ""  !ß select the choice below that gives the
correct T -value and correct conclusion.
i. The T -value is 0.68; do not reject L! À "" œ !; there is no linear relationship since 1975
between year and age difference between husband and wife at first marriage.
ii. The T -value is 0.000152; reject L! À "" œ !; there is evidence that since 1975 the age
difference (husband age  wife age) has increased.
iii The T -value is 0.0001275; reject L! À "" œ !; there is evidence that since 1975 the age
difference (husband age  wife age) has decreased.
iv The T -value is 0.0001275; do not reject L! À "" œ !; there is no linear relationship since 1975
between year and age difference between husband and wife at first marriage.
d.
e.
What is a 95% confidence interval for the slope?
Find a 95% confidence interval for the mean difference (husband age  wife age) at first marriage
in 1998.
Find a 95% prediction interval for the difference (husband age  wife age) for a particular couple
getting married for the first time in 1998.
f.
12. Over 6 decades the Gallup Organization has periodically asked the following question:
If your party nominated a generally well-qualified person for president who happened to
be a woman, would you vote for that person?
Below is a table showing the percentage answering “yes” and the year of the century (37 =
1937).
% Yes 92 82 78 80 76 73 66 53 57 55 57 54 52 48 33 33
Year
99 87 84 83 78 75 71 69 67 63 59 58 55 49 45 37
summary statistics:
B œ '(Þ%%
a.
b.
c.
=B œ "'Þ(
C œ '"Þ)"
=C œ "(Þ"*
< œ Þ*("
Determine the estimates ,! and ," of the parameters "! and "" in the linear model
y œ "!  "" x  %, where B is year and C is the percentage who respond “yes”.
Use the least squares line to estimate the percentage of respondents that would say “yes” in 1997.
Determine the estimate =/ of the standard deviation 5 of the error component %
ST 305
Final Exam Practice Problems
page 5
(note that the sum of squares of residuals  ÐC3  sC3 Ñ# œ 255.748).
"'
3œ"
d.
e.
f.
Calculate a 95% confidence interval for the slope "" . (Note that WIÐ," Ñ œ  / )
8" =B
=
Conduct an appropriate hypothesis test (use α = .05) to determine if the year of the century is
useful for predicting the percentage of respondents that would answer “yes” to the above question.
State the hypotheses, find the value of the test statistic, and state your conclusion based on the Pvalue or the rejection region.
What assumptions must be (approximately) satisfied for the above statistical procedures to be
valid? Perform 2 diagnostics to check these assumptions.
SOLUTIONS
1.
a.
B" œ 2.2573, s21 œ 2.753, n" œ 10; B# œ 1.3411, s22 œ 2.429, n# œ 10, t.025,") œ 2.101,
(2.2573  1.3411) „ 2.101 2.753
10 
2.429
10
œ .9162 „ 1.5124 œ (  .5962, 2.4286).
b. (i) The two samples must be independently selected (ii) The distributions of increases in cancer
death rates for fluoridated and nonfluoridated cities can be approximated by normal models. (iii) The
two groups we are comparing must be independent of each other.
2a. i). . „ >Þ!"ß( =.8 œ  !Þ!$(& „ #Þ**)! !Þ!$)"
œ  !Þ!$(& „ !Þ!%!%Þ
)
2b. L! À ."  .# œ .. œ !
LEÀ ."  .# œ .. Á !
α œ Þ!#
where ." is the mean price of brand 1 cola and .# is the mean price of brand 2 cola.
test statistic: > œ Þ!$(&
œ  #Þ()%.
Þ!$)"
)
3.
T  @+6?/ œ #T Ð>(  #Þ()%Ñ œ Þ!#(Þ Since T  @+6?/  Þ!#, do not reject the null hypothesis
L! À ."  .# œ .. œ !. There appears to be no significant difference in the mean price of the 2
brands of soda.
L! À ."  .# œ .. œ !ß LE À ."  .# œ ..  !à α œ Þ"!
. œ &ß =. œ "Þ&)Þ
&
> œ =.. œ "Þ&)
œ (Þ!)à T -value œ T Ð>%  (Þ!)Ñ ¸ Þ!!"Þ
8
&
Conclusion: Reject L! and conclude that the mean weight of individuals after the diet is less than the
mean weight of individuals before the diet.
4.
Under 21
21-39
40 and over
4a.
4b.
4c.
4d.
4e.
Regularly use
text messaging
82 (58)
57 (43.983)
6 (43.017)
Do not regularly use
text messaging
38 (62)
34 (47.017)
83 (45.983)
‡ -96?78 >9>+6
expected count: <9A >9>+6
œ &)Þ
1<+8. >9>+6
iii) 9.210 (.0 œ Ð$  "чÐ#  "ÑÑ
#
Ðobserved - expected)#
œ Ð$%%(Þ!"(Ñ
œ $Þ'!%
expected
%(Þ!"(
iv) reject H! and conclude the variables are related.
iii) The older the age group, the less that text messaging is used. (see table below)
ST 305
Final Exam Practice Problems
standardized residuals
Under 21
21-39
40 and over
Regularly use
text messaging
$Þ"&
"Þ*'
 &Þ'%
page 6
9,=/B:
/B:
Do not regularly use
text messaging
 $Þ!&
 "Þ*!
&Þ%'
5. c ," „ >‡ WIÐ," Ñ œ Þ(& „ "Þ'&#&‡Þ""
,"
6. a > œ WIÐ,
œ Þ(&
Þ""
"Ñ
7. (chap 26)
L! À teams are evenly matched
L+ À teams are not evenly matched
The table below shows the observed values in each cell and the expected cell values in parentheses if
the teams are evenly matched.
Length of series
NBA finals
Test statistic:
#
\ # œ Ð((Þ"#&Ñ

(Þ"#&
4 games
7 (7.125)
Ð"$"%Þ#&Ñ#
"%Þ#&

5 games
13 (14.25)
Ð##"(Þ)"#&Ñ#
"(Þ)"#&
6 games
22 (17.8125)

Ð"&"(Þ)"#&Ñ#
"(Þ)"#&
7 games
15 (17.8125)
œ "Þ&%
Rejection region: if the null hypothesis is true, the test statistic \ # has a chi-square distribution with
Ð5  "Ñ œ Ð%  "Ñ œ $ .0 . If α œ Þ!&, the RR is \ #  (Þ)"&. Note that T  @+6?/ œ !Þ'(
8.
Conclusion: do not reject L! . There is no evidence that the NBA championship series are inconsistent
with the conjecture that the teams are evenly matched.
(chap 26)
Expected cell counts are in parentheses:
Boss/Employee
Level of Satisfaction Male/Female Female/Male
Male/Male
Female/Female
Satisfied
60 (33.175) 15 (30.521)
50 (36.493) 15 (39.81)
Neutral
27 (40.284) 45 (37.062)
48 (44.313 50 (48.341)
Dissatsfied
13 (26.54)
32 (24.417)
12 (29.194) 55 (31.848)
L! À Boss/employee relationship and job satisfaction are independent
L+ À Boss/employee relationship and job satisfaction are dependent
Test statistic: ;# œ  Ð9,=/<@/./B:/->/.Ñ
œ *#Þ(!*; degrees of freedom œ Ð3  "чÐ4  "Ñ œ '
/B:/->/.
"#
#
"
T  @+6?/ œ Þ!!!. Conclusion: reject H! and conclude that boss/employee relationship and job
satisfaction are related.
9. false 10. d
11. a. ," œ  !Þ!#% (approximately); this means that each year since 1975 the average difference
(husband age  wife age) has decreased by Þ!#%
,"
b. > œ WIÐ,
œ !Þ!#$*&'ÞÞÞ
!Þ!!&&!$ÞÞÞ œ  %Þ$&$""
"Ñ
c. iii. The T -value given in the Excel output is always for a 2-tail test; since we are conducting a 1tail test L+ À ""  !, the T -value is !Þ!!!#&&
œ !Þ!!!"#(&
#
d. Ð  !Þ!$&$'*(ß  !Þ!"#&%$$&Ñ from the output; notice that the interval is entirely negative.
e. use sC"**) „ >‡8# WIÐ.
s"**) Ñ;
for the calculations below, note that from the output we have =/ œ !Þ")''#'%*;
sC"**) œ %*Þ*!#"$!%$  !Þ!#$*&'&##Ð"**)Ñ œ #Þ!$(
ST 305
Final Exam Practice Problems
WIÐ.
s"**) Ñ œ WI # Ð," Ñ ‚ ÐB/  BÑ# 
=#/
8
WIÐC
s"**) Ñ œ WI # Ð," Ñ ‚ ÐB/  BÑ# 
=#/
8
œ ÐÞ!!&&!$$"&Ñ# ‚ Ð"**)  "*)'Þ&Ñ# 
page 7
!Þ")''#'%*#
#%
œ Þ!($)'))*à
>‡8# WIÐ.
s"**) Ñ œ #Þ!(%Ð Þ!($)'))*Ñ œ Þ"&$#!%à
so sC"**) „ >‡8# WIÐ.
s"**) Ñ œ #Þ!$( „ Þ"&$#!% Ê ("Þ))$(*', #Þ"*!#!%)
f. use sC"**) „ >‡8# WIÐC
s"**) Ñ;
for the calculations below, note that from the output we have =/ œ !Þ")''#'%*;
sC"**) œ %*Þ*!#"$!%$  !Þ!#$*&'&##Ð"**)Ñ œ #Þ!$(à
ÐÞ!!&&!$$"&Ñ# ‚ Ð"**)  "*)'Þ&Ñ# 
 =#/ œ
!Þ")''#'%*#
#%
 !Þ")''#'%*# œ Þ#!!("%à
>‡8# WIÐC
s"**) Ñ œ #Þ!(%ÐÞ#!!("%Ñ œ Þ%"'#)à
so sC"**) „ >‡8# WIÐC
s"**) Ñ œ #Þ!$( „ Þ%"'#) Ê "Þ'#!(#ß #Þ%&$#)
=
12. a. ," œ < =BC œ Þ*(" "(Þ"*
œ
Þ***%*à ,! œ C  ," B œ '"Þ)"  Þ***%*Ð'(Þ%%Ñ œ  &Þ&*$&);
"'Þ(
b. sC*( œ  &Þ&*$&)  Þ***%*Ð*(Ñ œ *"Þ$'
 #&&Þ(%)
c. =/ œ  WWI
œ %Þ#(%;
8# œ
"%
d. WIÐ," Ñ œ  / =  %Þ#(% œ Þ!''";
8" =B
"&‡"'Þ(
confidence interval is ," „ >‡8# WIÐ," Ñ œ Þ***%* „ #Þ"%&ÐÞ!''"Ñ œ Þ***%* „ Þ"%"() Ê
ÐÞ)&(("ß "Þ"%"#(Ñ
e. L! À "" œ ! vs L+ À "" Á !à
,"
test statistic > œ WIÐ,
œ Þ***%*
Þ!''" œ "&Þ"#à
"Ñ
for α œ Þ!&ß the rejection region is >  #Þ"%& and >   #Þ"%& (8  # œ "'  # œ "% .0 Ñ;
the T -value is 0 to nine decimal places.
Conclusion: since the test statistic is in the rejection region, reject L! À "" œ ! and conclude that year
is useful for predicting the percentage of respondents that will answer “yes” to the question.
f. In the model C3 œ "!  "" B3  %3 , the error terms %3 are assumed to be independent and normally
distributed with mean 0 and standard deviation 5 for all B, that is, %3 µ iid R Ð!ß 5Ñ for all BÞ
=
Below is the least squares line on the scatterplot, a plot of the residuals against the B-values, and a
histogram of the residuals. The first 2 plots cast some doubt on the assumption of independent errors
since the residuals appear to occur in groups of positive and negative residuals. The histogram casts
some doubt on the normality of the errors since the residuals show some left-skewness.
%Yes vs Year
105
y = 0.9994x ‐ 5.5827
R² = 0.9423
95
85
75
65
55
45
35
25
35
45
55
65
75
85
95
105
ST 305
Final Exam Practice Problems
Residual Plot
10
Residuals
5
0
‐5 30
50
70
‐10
‐15
Year
90
110
page 8
Download