Practice Final Exam Answers

Math 311
Practice Final Exam
Part I—Definitions and Brief Answers (Type your answers.)
1. Define Confidence Level.
A confidence level is a probability, based on a probability distribution, that a score
will end up within a particular confidence interval.
2. Define Type I Error.
A Type I error is rejecting Ho when, in fact, Ho is true. Its probability is the significance level α.
3. Define Type II Error.
A Type II error is NOT rejecting Ho when, in fact, Ho is false. Its probability is denoted β.
4. Define Population Variance and state the formula.
The population variance is the average of the squared differences from the mean of all the population's scores: $\sigma^2 = \frac{\sum_{i=1}^{n} (\mu - x_i)^2}{n}$
5. Define Population Standard Deviation and state the formula.
The population standard deviation is the square root of the population variance: $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (\mu - x_i)^2}{n}}$
6. State precisely how and why the formulas in (4.) and (5.) are different from those for samples.
The sample variance and standard deviation differ from those of the population by dividing by n-1 instead of n (and by measuring differences from the sample mean rather than μ). This is done to control for bias: the scores in a relatively small sample tend to lie close to the sample mean, making measures of spread too small, unless we “average” by dividing by a smaller number (n-1).
7. There are many types of standard deviations. What is their common usage?
Intuitively, the standard deviation is a scaling factor (yardstick) for calculating probability; for a normal distribution, roughly 68% of the scores lie within 1 SD of the mean, 95% lie within 2 SDs of the mean, and 99.7% lie within 3 SDs of the mean.
8. An anthropologist wishes to use a normal curve to make a statistical decision about population height on a combined group of Pygmy people (average height 4’11”) and Watusi people (average height = 5’9”). Briefly comment on the validity of the procedure.
The combined population of Pygmy and Watusi people is NOT normally distributed with respect to height. In fact it is bimodal (two humps). Normal-curve statistics based on heights of this population will not be valid.
9. A quality control engineer tests two small samples of electronic switches for duration of service. Sample A switches all burn out after similar durations, but Sample B switches burn out after very different durations, lasting a few seconds to a few years. What type of statistic should the engineer use to test differences of life expectancies of the two types of switches?
The scenario suggests an independent-samples (small-sample) t-test in which the (unequal) variances are not pooled.
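The population and sample formulas for variance and standard deviation, and the 68/95/99.7 rule, can be checked numerically. The sketch below uses Python's standard `statistics` module; the eight scores are made up for illustration (they are not exam data), and `share_within` is a helper name introduced here.

```python
from statistics import NormalDist, pstdev, pvariance, stdev, variance

# Hypothetical population of eight scores (illustration only, not exam data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population formulas: divide the sum of squared deviations by n.
pop_var = pvariance(scores)
pop_sd = pstdev(scores)

# Sample formulas: divide by n - 1, which gives a larger value.
samp_var = variance(scores)
samp_sd = stdev(scores)


def share_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


print(pop_var, pop_sd)
print(round(samp_var, 4), round(samp_sd, 4))
print([round(share_within(k), 4) for k in (1, 2, 3)])
```

Treating the same eight scores once as a population and once as a sample shows the effect of the n-1 divisor directly: the sample figures come out larger than the population figures.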
Part II: Hand Calculations
Calculate the following using the data set: 10, 15, 25, 23, 26, 36, 50. SHOW YOUR WORK
1. Mean
The mean is the arithmetic average of the data, approximately
26.42857143
2. Median
The ordered data set is 10, 15, 23, 25, 26, 36, 50, and the median is the middle score: 25.
3. Assume the sample above is from a normally distributed population. Calculate the sample standard deviation. (Start with the formula and show the main steps.)
$s = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n-1}} \approx \sqrt{\frac{\sum_{i=1}^{7} (26.43 - x_i)^2}{6}} \approx \sqrt{\frac{1061.71}{6}} \approx 13.30234$
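The hand calculations for the mean, median, and sample standard deviation can be cross-checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, median, stdev

data = [10, 15, 25, 23, 26, 36, 50]

x_bar = mean(data)    # arithmetic average, 185 / 7
mid = median(data)    # middle score of the ordered data

# Sample SD by hand: squared deviations from the sample mean, divided by n - 1.
squared_devs = sum((x_bar - x) ** 2 for x in data)
s_by_hand = sqrt(squared_devs / (len(data) - 1))

# The library routine applies the same n - 1 formula.
s_library = stdev(data)

print(x_bar, mid, round(squared_devs, 2), round(s_by_hand, 5), round(s_library, 5))
```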
4. Assume the sample in the data set above is from a normally distributed population. Estimate a 68% confidence interval around the Sample Mean. (Show your reasoning!)
68% of the scores should lie within approximately 1 SD = 13.3 of the sample mean = 26.43. Therefore the confidence interval is approximated by (26.43-13.30, 26.43+13.30) = (13.13, 39.73).
5. Mars Candy Company claims that 13% (by number) of their plain m&ms are brown. In a sample of 20 m&ms, what is the probability that 10 to 13 of them (inclusive) are brown? (Show your reasoning.)
We can compute this probability exactly, using the binomial distribution:
$P = C(20,10)(.13)^{10}(.87)^{10} + C(20,11)(.13)^{11}(.87)^{9} + C(20,12)(.13)^{12}(.87)^{8} + C(20,13)(.13)^{13}(.87)^{7} \approx 0.0000729$
6. Suppose μ = 55, σ = 3, 𝑥̅ = 45, n = 30 for a normally distributed sample. Transform to the z-scale and demonstrate if 𝑥̅ is inside or outside a 99% confidence interval.
Use a table or MS Excel to find the 99% CI on the z-scale: (-2.576, 2.576).
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{45 - 55}{3/\sqrt{30}} \approx -18.26$
Since -18.26 lies far outside (-2.576, 2.576), 𝑥̅ is outside the 99% confidence interval.
7. Suppose a pre-test and post-test are given to 10 students with the average of the post - pre scores being 3.1. S = 5.06. A significant improvement was expected at the 95% confidence level. Was the improvement significant? Calculate a confidence interval and test the proper hypothesis.
Ho: μpost ≤ μpre
Ha: μpost > μpre
The correct design is a paired (dependent) sample test with a one-sided, 95% CI, which on the score difference scale is (-∞, 0+1.83*5.06/√10) = (-∞, 2.93). Here 1.83 is the one-sided t value for 9 degrees of freedom.
Note that post-pre = 3.1 is outside of this interval, so the improvement was significant at the 95% confidence level.
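The arithmetic in items 4 through 7 can be re-run with the Python standard library. This is a sketch, not the course's required method: the variable names are introduced here, and the t cutoff is typed in from a table (the standard library has no t distribution), using the 1.83 quoted in the answer.

```python
from math import comb, sqrt
from statistics import NormalDist

# Item 4: interval of one sample SD on either side of the sample mean.
x_bar, s = 26.43, 13.30
interval_68 = (x_bar - s, x_bar + s)

# Item 5: exact binomial probability of 10 to 13 brown m&ms out of 20.
n, p = 20, 0.13
p_10_to_13 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, 14))

# Item 6: z-score of the sample mean against the two-sided 99% cutoff.
mu, sigma, sample_mean, size = 55, 3, 45, 30
z = (sample_mean - mu) / (sigma / sqrt(size))
z_cut = NormalDist().inv_cdf(0.995)  # upper cutoff leaving 0.5% in each tail
inside_99 = -z_cut <= z <= z_cut

# Item 7: one-sided 95% bound on the mean post - pre difference under Ho.
T_CRIT = 1.83  # table value for df = 9, typed in by hand (assumption)
mean_diff, s_diff, n_pairs = 3.1, 5.06, 10
upper_bound = 0 + T_CRIT * s_diff / sqrt(n_pairs)
significant = mean_diff > upper_bound

print(interval_68, p_10_to_13, round(z, 2), inside_99, round(upper_bound, 2), significant)
```

Summing the four binomial terms in a loop avoids the transcription slips that are easy to make when each term is keyed into a calculator separately.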
Part III—Scenarios
Download Data Sets here. Type your answers.
Scenario 1. A random sample of voters from Math 311 indicated the voting choices shown in the data set. Can you be
95% certain that voting for a gubernatorial candidate was associated with voting for a presidential candidate?
Ho: All four observed proportions are the same as the expected proportions.
Ha: At least one observed proportion is not equal to its expected proportion.

Paste 3-D Pivot Chart Here.
[3-D pivot chart: Presidential Choice vs Gubernatorial Choice]

Paste Pivot Table Here
Count of Gubernatorial Choice (row labels down the side, column labels across the top)

Row Labels     O     R     Grand Total
I              16    1     17
M                    6     6
Grand Total    16    7     23

Paste Observed Value Matrix
[ 16   1 ]
[  0   6 ]

Paste Expected Value Matrix
[ 11.83   5.17 ]
[  4.17   1.83 ]

X² = 18.55
Df = (n-1)(m-1) = 1
P = .0000165

Conclusion
Reject Ho.
Choice of presidential candidate is associated with (dependent on) choice of governor (or vice-versa). However, results are compromised by cell values less than 5.
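For readers without Excel, the expected counts, the chi-square statistic, and the p-value can be reproduced from the observed matrix alone. The sketch below uses only Python's `math` module; the p-value line relies on an identity that holds for 1 degree of freedom only, so it would not carry over to larger tables.

```python
from math import erfc, sqrt

# Observed counts from the pivot table (a blank cell counts as 0).
observed = [[16, 1],
            [0, 6]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# With df = 1 the chi-square tail area equals erfc(sqrt(chi2 / 2)).
# This shortcut is valid only for df = 1.
p_value = erfc(sqrt(chi2 / 2))

print([[round(e, 2) for e in row] for row in expected], round(chi2, 2), df, p_value)
```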
Scenario 2
A mathematics instructor is wondering if his students performed significantly better than last year’s students on his final
exam at the 95% confidence level. Last year’s average was 74. This year’s data can be found in the data set.
Ho: μ ≤ 74
Ha: μ > 74
Name your technology method here: Calculator
Circle one: One-sided / Two-sided (answer: One-sided)
Type of Test: 1-sample t-test, one-sided (easy with calculator; otherwise done by hand; Excel doesn’t do this.)
Note: I would accept a two-sided test, because the problem suggests only slight evidence for a one-sided test.
Appropriate t or z (circle one) value: t = -.203
P-value: p = .58
Circle one: Retain Ho / Reject Ho (answer: Retain Ho)
Conclusion: This year’s students did NOT significantly outperform last year’s students at the 95% confidence level.
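The one-sample, one-sided t-test can also be done "by hand" in a few lines. The exam data set is not reproduced in this document, so the scores below are hypothetical and will not give the t value reported above; `one_sample_t` and `T_CRIT` are names introduced for this sketch, and the critical value is typed in from a t table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, mu0):
    """t statistic for Ho: mu <= mu0 against Ha: mu > mu0."""
    n = len(scores)
    return (mean(scores) - mu0) / (stdev(scores) / sqrt(n))


# Hypothetical scores, NOT the exam data set.
scores = [72, 75, 71, 78, 74, 70]
T_CRIT = 2.015  # one-sided 95% table value for df = 5, typed in by hand

t_stat = one_sample_t(scores, 74)
reject_h0 = t_stat > T_CRIT

print(round(t_stat, 3), reject_h0)
```

A negative t statistic, as in the reported answer, already means the sample mean is below 74, so a one-sided test for improvement cannot reject Ho.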
Scenario 3
A quality control engineer suspects an extruding machine loses accuracy when the shop gets warm in the afternoon. To
test her conjecture the engineer sets a 95% confidence interval and randomly measures circular extruded lids from the
machine in both morning and afternoon samples. The lids are paired by the order in which they are extruded during
these shifts, and there is a 1 hour lunch break between samples. What can the engineer rightly conclude? (See the
data.) Here you must guide yourself through the complete statistical process, and show the way through it. You may use
technology to help you, but indicate which technology you used.
What I am looking for here is that you thoroughly consider the issue rather than follow a protocol. In essence, this
would be “data mining” for a quality control engineer. Here is a synopsis:
This scenario suggests a paired-sample t-test, but losing accuracy is not well-defined, so a two-sided, paired-sample test
is best. Homogeneity of variances is assumed here, and the differences of scores are skewed to the left. Importantly,
the sample variances differ by a factor of 100, and are greater in the afternoon. Although the paired sample t-test
shows no significant differences at the 95% confidence level (p = .43), the great afternoon variance may explain
accuracy loss in the heat of the afternoon. Also, the tolerance level of the lid diameters was NOT stated, but the
afternoon average lid diameter is almost .2 units larger than that in the morning. All of this suggests (WITHOUT
SIGNIFICANCE) that there might be some effect that cannot be measured in diameter (only).
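The two checks described in the synopsis, a two-sided paired t-test on the differences and a comparison of the morning and afternoon variances, might be sketched as follows. The lid diameters here are hypothetical (the exam data set is not reproduced in this document), and the critical value is typed in from a t table.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical lid diameters, NOT the exam data set.
morning = [10.0, 10.1, 9.9, 10.0, 10.1]
afternoon = [10.5, 9.2, 10.9, 9.6, 10.8]

# Paired design: work with the afternoon - morning difference of each pair.
diffs = [a - m for a, m in zip(afternoon, morning)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

T_CRIT = 2.776  # two-sided 95% table value for df = 4, typed in by hand
significant = abs(t_stat) > T_CRIT

# Spread comparison: how much larger the afternoon variance is.
variance_ratio = variance(afternoon) / variance(morning)

print(round(t_stat, 3), significant, round(variance_ratio, 1))
```

In this made-up example the mean difference is not significant even though the afternoon variance is far larger, which is the same pattern the synopsis describes.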
[Chart: Differences (image not reproduced)]