Stat 101 L – Final Exam
May 5, 2008
Name: ______________________
INSTRUCTIONS: Read the questions carefully and completely. Answer each question
and show work in the space provided. Partial credit will not be given if work is not
shown. When asked to explain, describe, or comment, do so within the context of the
problem. Be sure to include units when dealing with quantitative variables.
1. [15 pts] Short answer.
a) [2] Statistics is about _______________. (Fill in the blank with one word.)
b) [2] A ____________________ is a numerical summary of a sample, while
a _____________________ is a numerical summary of a population.
c) [3] What does “90% confidence” mean? In your explanation you cannot
use the words chance, sure, probability or confident.
d) [3] Explain why we use t instead of z when constructing a confidence
interval for the population mean.
e) [2] Holding all other things the same, if the sample size is decreased the
width of the confidence interval will _______________________.
f) [3] Sketch a normal model with μ = 80 and σ = 20.
2. [12 pts] Multiple Choice:
a) ___ The correct interpretation of a 95% confidence interval is?
A: I am 95% confident that the sample mean is in the interval.
B: 95% of the sample values are in the interval.
C: 95% of the population values are in the interval.
D: I am 95% confident that the population mean is in the interval.
b) ___ The P-value is …?
A: The probability of getting a value of the test statistic more extreme than
the one observed when the null hypothesis is false.
B: The probability that the null hypothesis is true.
C: The probability of getting a value of the test statistic more extreme than
the one observed when the null hypothesis is true.
D: The probability that the null hypothesis is false.
c) ___ You have calculated the correlation coefficient between two variables to
be –0.95. This would indicate?
A: A strong positive linear relationship.
B: A strong negative linear relationship.
C: No relationship.
D: No linear relationship.
d) ___ The difference between an observed value of a response variable and the
corresponding predicted value of the response is called?
A: An outlier.
B: A statistic.
C: An influential point.
D: A residual.
e) ___ When you get one measurement on each individual in a sample of males
and one measurement on each individual in a sample of females you have
A: paired sample data.
B: two independent sample data.
C: population data.
D: cluster sample data.
f) ___ Which is not a fundamental principle of experimental design?
A: Control.
B: Randomization.
C: Replication.
D: They are all fundamental principles.
3. [10 pts] Below is a screen shot of the web page that simulates sampling from a
population of Reese’s Pieces that contains 36% orange pieces.
a) [2] If we randomly select samples of 100 Reese’s Pieces what will be the mean of
the distribution of the sample proportion of orange Reese’s Pieces, p̂ ?
b) [4] If we randomly select samples of 100 Reese’s Pieces what will be the standard
deviation of the distribution of the sample proportion of orange Reese’s Pieces?
c) [4] Sketch the distribution of the sample proportion of orange Reese’s Pieces, p̂ ,
for random samples of size 100. Use the space to the right of the Reese’s Pieces
machine above.
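The sampling-distribution formulas from the formula sheet (Mean: p, SD(p̂) = sqrt(p(1 − p)/n)) can be sketched numerically as follows. This is only an illustration with the values stated in the problem (p = 0.36, n = 100); the variable names are hypothetical.

```python
import math

# Values taken from the problem statement
p = 0.36   # proportion of orange pieces in the population
n = 100    # number of pieces in each random sample

# Sampling distribution of the sample proportion (formula sheet)
mean_p_hat = p                          # Mean: p
sd_p_hat = math.sqrt(p * (1 - p) / n)   # SD(p-hat) = sqrt(p(1 - p)/n)

print(f"mean of p-hat = {mean_p_hat:.2f}")
print(f"SD of p-hat   = {sd_p_hat:.3f}")
```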
4. [22 pts] Suppose you received a diamond as a gift. You know the size of the
diamond (50 milligrams) but you don’t know the price ($). A random sample of
25 diamonds (weighing from 25 to 100 mg) is obtained and you find the
corresponding prices.
a) [4] Answer the questions Who? and What? for this problem.
b) [4] Below is a plot of price versus weight.
[Figure: scatterplot of Price ($), axis from 100 to 800, versus Weight (mg), axis from 20 to 80]
Describe the relationship in terms of direction, form, strength, and indicate
any unusual points.
Below is partial JMP output for the least squares regression line.

Linear Fit
Predicted Price ($) = –172.04 + 12.273*Weight (mg)

Summary of Fit
RSquare                       0.975972
RSquare Adj                   0.974928
Root Mean Square Error        23.77098
Mean of Response              330.64
Observations (or Sum Wgts)    25
c) [5] Give an interpretation of the slope within the context of the problem.
d) [2] Use the least squares regression line to predict the price of a diamond
weighing 50 mg.
e) [2] One of the 25 diamonds in the sample weighs 50 mg and has a price of
$489. What is the residual for this diamond?
f) [3] Graph the least squares regression line on the plot in b). In order to get
full credit it must be obvious to me that you are using the equation to draw
the line.
g) [2] How much of the variability in the price of diamonds can be explained
by the linear relationship with weight?
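The arithmetic behind a prediction and a residual can be sketched as below, using the fitted equation from the JMP output and the formula sheet's residual = y − ŷ. The function and variable names are hypothetical; the coefficients, the 50 mg weight, and the $489 price come from the problem.

```python
# Coefficients from the JMP "Linear Fit" output
b0 = -172.04   # intercept ($)
b1 = 12.273    # slope ($ per mg)

def predicted_price(weight_mg):
    """Predicted price ($) from the least squares line: y-hat = b0 + b1 * x."""
    return b0 + b1 * weight_mg

weight = 50            # mg, from the problem
observed_price = 489   # $, the sampled 50 mg diamond

y_hat = predicted_price(weight)
residual = observed_price - y_hat   # residual = y - y-hat

print(f"predicted price = ${y_hat:.2f}")
print(f"residual        = ${residual:.2f}")
```

A positive residual means the observed price lies above the fitted line.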
5. [16 pts] A major medical center in the Northeastern U.S. conducted a study
looking at blood cholesterol levels and incidence of heart attack. Below is JMP
output on cholesterol for 16 randomly selected people who had had a heart attack.

[Figure: JMP plots of Blood Cholesterol, axis from 150 to 350, including a Normal Quantile Plot]

Moments
Mean              264.6875
Std Dev           42.115268
Std Err Mean      10.528817
upper 95% Mean    287.12914
lower 95% Mean    242.24586
N                 16

Test Mean = value
Hypothesized Value    240
Actual Estimate       264.688
df                    15
Std Dev               42.1153

t Test
Test Statistic    2.3448
Prob > |t|        0.0332
Prob > t          0.0166
Prob < t          0.9834
We wish to see if the mean cholesterol level of all people who had a heart attack
is greater than 240.
a) [3] Set up appropriate null and alternative hypotheses. Be sure to clearly
define the population parameter you are testing.
b) [4] Verify the nearly normal condition is met. Be sure to refer to all three
plots.
c) [3] What are the values of the test statistic and the P-value?
d) [2] Reach a decision using the P-value.
e) [4] State a conclusion in the context of the problem.
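As a check on how the JMP numbers fit together, the one-sample test statistic can be recomputed from the Moments output with the formula sheet's t = (ȳ − μ0)/(s/√n). This is a sketch with hypothetical variable names; it does not compute the P-value, which would need a t table or a statistics library.

```python
import math

# Summary statistics from the JMP "Moments" output
y_bar = 264.6875   # sample mean cholesterol
s = 42.115268      # sample standard deviation
n = 16             # sample size
mu_0 = 240         # hypothesized value

std_err = s / math.sqrt(n)       # SE(y-bar) = s / sqrt(n)
t_stat = (y_bar - mu_0) / std_err
df = n - 1

print(f"standard error = {std_err:.6f}")   # compare with Std Err Mean
print(f"t = {t_stat:.4f} with df = {df}")  # compare with Test Statistic
```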
6. [12 pts] For each of the following indicate whether the study is an experiment or
an observational study. Also indicate the explanatory and response variables and
whether it is paired sample or independent sample data. Explain each choice
briefly.
a) [6] In a study of dietary calcium on blood pressure, 30 participants
experience two different diets each for one month. One diet is low in
calcium and the other diet is high in calcium. Which diet the participant
experiences first is determined by a flip of a coin. After each diet the
blood pressure of each participant is measured. The researchers want to
see if average blood pressure differs with diet.
b) [6] In a study on exercise and diet, 30 participants are asked to keep a
diary of the food they eat and the exercise they do. After 30 days the
diaries are examined and the participants are classified into one of two
groups, the healthy lifestyle group and the unhealthy lifestyle group. The
body mass index (BMI) of each participant is calculated. The researchers
wish to see if average BMI is different for the lifestyle groups.
7. [15 pts] A study was done in Michigan with students in grades 4 – 6. The
students were asked the following question: What would you most like to do at
school? The choices were Make good grades, Be good at sports, or Be popular.
Students came from Rural, Suburban and Urban schools. Below are the data.
                      Rural   Suburban   Urban   Total
Make good grades.        57         87     103     247
Be good at sports.       50         42      49     141
Be popular.              42         22      26      90
Total                   149        151     178     478
a) [2] What is the probability that a student selected at random is from an Urban
school?
b) [2] What is the probability that a student selected at random wants to be good
at sports?
c) [3] If location of school (rural, urban, suburban) is independent of what
students most like to do at school, what is the expected count for the cell,
urban and be good at sports?
d) [3] What is the contribution to the χ² test statistic for the cell, urban and be
good at sports?
e) [1] How many degrees of freedom are there for the test of independence
between location and what students most like to do at school?
f) [4] The value of the χ² test statistic is 18.828 with an associated P-value =
0.0008. What does this tell you about school location and what students most
like to do at school? Be sure to justify your answer statistically.
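The expected counts and cell contributions in a test of independence follow the formula sheet: Expected = (row total)(column total)/(total in sample) and χ² = Σ (Observed − Expected)²/Expected. The sketch below applies these to the table of counts given in the problem; the variable names are hypothetical, and the summed statistic can be compared with the 18.828 quoted in part f).

```python
# Observed counts from the problem (rows: goals, columns: school location)
rows = ["Make good grades", "Be good at sports", "Be popular"]
cols = ["Rural", "Suburban", "Urban"]
observed = [
    [57, 87, 103],
    [50, 42, 49],
    [42, 22, 26],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Expected = (row total)(column total) / (total in sample)
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Each cell's contribution: (Observed - Expected)^2 / Expected
contrib = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

chi2 = sum(sum(r) for r in contrib)
df = (len(rows) - 1) * (len(cols) - 1)   # df = (r - 1)(c - 1)

i, j = rows.index("Be good at sports"), cols.index("Urban")
print(f"expected count (urban, sports) = {expected[i][j]:.3f}")
print(f"contribution   (urban, sports) = {contrib[i][j]:.4f}")
print(f"chi-square = {chi2:.3f} with df = {df}")
```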
8. [23 pts] A researcher is interested in seeing if the rain in a major city in Illinois is
more acid (has a lower pH) than rain in a major city in Texas. The researcher
measures the pH of rain on 20 randomly selected days in the city in Texas and on
a separate set of 16 randomly selected days in the city in Illinois. Below is JMP
output for the data.

[Figure: for each city, a distribution plot of pH of Rain with counts and a Normal Quantile Plot]

pH of Rain              Mean     Std Dev    N
(1) City in Illinois    4.495    0.1432     16
(2) City in Texas       4.880    0.3843     20
a) [4] Describe the distribution of the pH of rain in the city in Illinois. Be sure
to comment on center, spread, and shape.
b) [4] Describe what you see in the Normal Quantile Plot for the pH of rain in
the city in Illinois. What does this indicate about the nearly normal condition?
We wish to test a hypothesis to see if the mean pH of rain in the two cities is the
same against an alternative that the mean pH of rain in the city in Illinois is less
than that in the city in Texas. Note: df = 25 for this problem.
c) [2] Set up the null and alternative hypotheses.
d) [4] Calculate the value of the test statistic.
e) [3] Use Table t to find the P-value.
f) [2] Use the P-value to reach a decision.
g) [4] State a conclusion within the context of the problem.
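The two-independent-samples test statistic on the formula sheet, t = ((ȳ1 − ȳ2) − 0)/sqrt(s1²/n1 + s2²/n2), can be sketched from the summary statistics in the JMP output. Variable names are hypothetical; group 1 is the city in Illinois and group 2 the city in Texas, as labeled in the output. The P-value is not computed here, since the problem asks for Table t with df = 25.

```python
import math

# Summary statistics from the JMP output
y1_bar, s1, n1 = 4.495, 0.1432, 16   # (1) City in Illinois
y2_bar, s2, n2 = 4.880, 0.3843, 20   # (2) City in Texas

# Standard error of the difference in sample means
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic for a hypothesized difference of 0
t_stat = ((y1_bar - y2_bar) - 0) / se_diff

print(f"SE of difference = {se_diff:.4f}")
print(f"t = {t_stat:.3f}")
```

A negative value here reflects the lower sample mean pH in the Illinois city.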
The Final Exam is worth 125 points. How many points do you think you got? _______
I will probably not finish grading Final Exams until the end of finals week. Course
grades will be available on Access Plus approximately one week after final exams. I
keep the final exam papers for one semester. If you wish to pick up your final exam you
can do so during final exam week fall 2008.
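Formulas

$\bar{x} = \frac{\sum x}{n}$        $\bar{y} = \frac{\sum y}{n}$

$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$        $s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$

$z_x = \frac{x - \bar{x}}{s_x}$        $z_y = \frac{y - \bar{y}}{s_y}$

$r = \frac{\sum z_x z_y}{n - 1}$

$b_1 = r \frac{s_y}{s_x}$        $b_0 = \bar{y} - b_1 \bar{x}$

$\hat{y} = b_0 + b_1 x$        residual $= y - \hat{y}$

Sampling Distribution of $\hat{p}$:
Mean: $p$
Standard Deviation: $SD(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}$

Sampling Distribution of $\bar{y}$:
Mean: $\mu$
Standard Error: $SE(\bar{y}) = \frac{s}{\sqrt{n}}$

Single sample (Categorical Variable)
Confidence interval for $p$: $\hat{p} - z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ to $\hat{p} + z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Test Statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$

Single sample (Numerical Variable)
Confidence interval for $\mu$: $\bar{y} - t^* \frac{s}{\sqrt{n}}$ to $\bar{y} + t^* \frac{s}{\sqrt{n}}$
Test Statistic: $t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$, df = $n - 1$

Two independent samples (Numerical Response, Categorical Explanatory)
Confidence interval for $\mu_1 - \mu_2$: $(\bar{y}_1 - \bar{y}_2) - t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ to $(\bar{y}_1 - \bar{y}_2) + t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Test Statistic: $t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Paired samples
Confidence interval for $\mu_d$: $\bar{d} - t^* \frac{s_d}{\sqrt{n_d}}$ to $\bar{d} + t^* \frac{s_d}{\sqrt{n_d}}$
Test Statistic: $t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n_d}}$, df = $n_d - 1$

Test of Independence (Categorical Response, Categorical Explanatory)
Expected $= \frac{(\text{row total})(\text{column total})}{\text{total in sample}}$
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
df = $(r - 1)(c - 1)$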
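The regression formulas on this sheet chain together: z-scores give r, r gives the slope b1, and b1 gives the intercept b0. The sketch below walks through that chain on a small made-up data set (the x and y values are hypothetical, chosen only for illustration, and are not from the exam).

```python
import math

# Hypothetical illustration data, not from the exam
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample standard deviations (divide by n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# z-scores and correlation: r = sum(z_x * z_y) / (n - 1)
z_x = [(xi - x_bar) / s_x for xi in x]
z_y = [(yi - y_bar) / s_y for yi in y]
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Slope and intercept of the least squares line
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

# Residuals: y - y-hat
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.4f}, b1 = {b1:.4f}, b0 = {b0:.4f}")
```

For a least squares line the residuals sum to zero (up to rounding), which is a quick sanity check on the arithmetic.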