final, F08 - Humboldt State University

Statistics 108, Final, Fall 2008
Name (print):_____________________
137 points.
I confirm that I am allowed only a calculator and one 3-by-5 inch note card of notes for this exam. I will not look at
anybody else’s exam and I will take all necessary efforts to prevent others from seeing my exam. I will use the provided
probability tables to cover up my work and answers. The consequence of using additional test aids, copying from others,
or allowing others to copy my work can result in an F grade for this class.
I have read and agree with the above statement.
Signature: _____________________
(a.) Show all work to receive full credit.
(b.) Circle your final answer.
(c.) Ask the instructor for clarification if any questions are unclear.
(1) Suppose you are estimating the mean weight of adult fish of a particular species. You collect a sample of 100 fish.
After calculating the mean and standard deviation of the 100 fish, you state that a 95% confidence interval for the mean
weight of fish is (486, 522) grams.
(1a. 3pts) “You are 95% confident that the mean weight of _________________ is between 486 and 522 grams.”
Which phrase, (i) or (ii), most correctly fills the blank?
(i) the 100 sampled fish
(ii) the fish in the population
(1b. 3pts) Expanding on question (1a), specifically explain what is meant by being “95% confident”. Use the idea
of repeated samples or the movie “Groundhog Day”.
(1c. 3pts) The general structure of most confidence intervals is (fill in blank):
Estimate ± Multiplier × _____________
(2. 2pts) Suppose insects of a particular species were sampled from two different environments. 40 insects were sampled
from environment A and 39 insects from environment B. A two-sample t-test was performed to test the null hypothesis of
equal mean lengths. The resulting test statistic was 2.39, giving a p-value of 0.02. Using a level of significance of 5%, we
should ____________ the null hypothesis. (Circle: (i) keep, (ii) reject)
(3. 3pts) When performing a statistical test, we do our calculations assuming the __________ hypothesis is true. Circle
the correct answer to fill in the blank:
(a) alternative (b) null
(c) power (d) third
(4. 2pts) In general, when performing a statistical test, we first calculate a test statistic. The test statistic is then compared
to a probability distribution to obtain a _________________ which we compare to 0.05 (α) to determine our decision
regarding the hypotheses.
Circle the correct answer to fill in the blank:
(a) beta value, (b) mean, (c) p-value, (d) standard deviation
(5. 4pts) Circle the statement that best describes a p-value.
• The p-value is the probability of the null hypothesis being true.
• The p-value is the probability of the alternative hypothesis being true.
• Assuming the alternative hypothesis is true, the p-value is the probability of getting a test statistic as
extreme or more extreme than the test statistic you calculated from your sampled data.
• Assuming the null hypothesis is true, the p-value is the probability of getting a test statistic as extreme or
more extreme than the test statistic you calculated from your sampled data.
• The p-value is the probability of committing a type 1 error.
• The p-value is the probability of committing a type 2 error.
(6. 3pts) Inspect the dotplots below. Which sample has the larger variance?
Circle: (a) Sample 1, (b) Sample 2, (c) Same variance
[Figure: Dotplot of Sample 1 and Sample 2; horizontal axis "Data" with ticks from 36 to 162]
(7) A company claims on average there are 16 ounces of cereal in their boxes of cereal. A suspicious statistician
performed a 1-sample t-test to determine if the mean weight of the company’s cereal boxes was less than 16 ounces.
The contents from 100 boxes were measured resulting in a mean of 15.8 ounces and a standard deviation of 1.
(7a. 3pts) Complete the null and alternative hypotheses.
𝐻0 : µ ____ 16
𝐻𝐴 : µ ____ 16
(7b. 3pts) Calculate the t-statistic.
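For checking work on (7b) afterwards, here is a minimal sketch of the one-sample t-statistic, t = (x̄ − µ0) / (s / √n), using the numbers stated in problem (7). The function name `one_sample_t` and its parameter names are illustrative and not part of the exam.

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return the one-sample t-statistic (x̄ - mu0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# Values from problem (7): 100 boxes, mean 15.8 oz, standard deviation 1 oz,
# hypothesized mean 16 oz.
t_stat = one_sample_t(sample_mean=15.8, mu0=16.0, sample_sd=1.0, n=100)
print(round(t_stat, 2))
```

A negative value points in the direction of the "less than 16 ounces" alternative; the p-value would still come from the t table with n − 1 degrees of freedom.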
(8) Suppose you performed a t-test where 𝐻0 : 𝜇 = 𝜇0 vs. 𝐻𝐴 : 𝜇 ≠ 𝜇0 and your degrees-of-freedom was 10.
(8a. 3pts) Using α =0.05, state the critical value(s) for this test.
(8b. 3pts) Suppose the t-statistic was 1.37. Calculate the p-value.
(9. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴̅ .
(10. 4pts) Circle the one statement that best describes the key idea of the central limit theorem.
• As the sample size increases, the distribution of the sample becomes distributed like the normal
distribution.
• As the sample size increases, the distribution of the sample means becomes distributed more like the
normal distribution.
• As the sample size increases, the population variance decreases.
• As the sample size increases, the population becomes distributed more like the normal distribution.
(11. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴 ∩ 𝐵 .
(12. 4pts) Fill in the blanks: A ______________ [choose: continuous or discrete] random variable’s distribution is
described by a density curve. The height of the density curve will always be 0 or greater and the total area beneath the
curve will be ______________ [hint: a number].
(13) Suppose the university performed a study to get an understanding about the finances of all students at Humboldt State
University. To get this picture, the university randomly sampled 200 student ID numbers, contacted those students and
interviewed them.
(13a. 3pts): The researcher calculated the mean amount of debt of the 200 students. Is the mean amount of debt
for these 200 students a parameter or statistic? Circle: (i) Parameter (ii) Statistic
(13b. 4pts): 80 of these 200 students had student loans. Calculate a 95% confidence interval for the proportion
of HSU students that have student loans. Show the formula you used.
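For checking work on (13b) afterwards, here is a minimal sketch of the large-sample interval p̂ ± z·√(p̂(1 − p̂)/n). It assumes the normal-approximation formula with multiplier z = 1.96; the course may have intended a slightly different multiplier or method. The name `proportion_ci` is illustrative.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a proportion: p_hat ± z * SE."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Values from problem (13b): 80 of the 200 sampled students had loans.
lower, upper = proportion_ci(successes=80, n=200)
print(round(lower, 3), round(upper, 3))
```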
(14. 4pts) In simple linear regression, the best line for y = β₀ + β₁x is determined via “least-squares”. Describe in detail
how the line is determined. (Hint: Explain “least-squares”.)
(15. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴⋃𝐵̅ .
(16. 2pts) In the Venn diagram below, shade in the appropriate region or regions to represent 𝐴̅ ∩ 𝐵̅ .
(17. 3pts) Morning weather conditions were recorded for 80 days of winter in Eureka. Suppose 20 mornings were rainy,
40 foggy, and 20 sunny. Sketch a pie chart of these data, making sure you label the rainy, foggy, and sunny sections. How
many degrees should be assigned to the “foggy” slice?
(18.) The number of eggs in a nest is distributed according to the following fictitious probability distribution. A nest can
contain as few as one egg and as many as 4 eggs. The probability distribution for the number of eggs (k) is given below
with the exception of the probability for 3 eggs.
k    P(X=k)    cdf: P(X ≤ k)
1    0.2       ???=
2    0.3       ???=
3    ?????     ???=
4    0.1       ???=
(18a. 4pts) Fill in the table for P(X=3). Show your work.
(18b. 3pts) Fill in the table for the cumulative distribution function (cdf) column. Show your work.
(18c. 3pts) Calculate the expected value for X, E(X). Show your work.
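For checking work on (18a) through (18c) afterwards, here is a minimal sketch. It relies only on the facts that the probabilities must sum to 1, that the cdf is a running total, and that E(X) = Σ k·P(X=k). The variable names are illustrative.

```python
# Known probabilities from the table in problem (18); P(X=3) is missing.
known = {1: 0.2, 2: 0.3, 4: 0.1}

# (18a) All probabilities must sum to 1.
p3 = 1 - sum(known.values())
pmf = {**known, 3: p3}

# (18b) The cdf at k is the running total of P(X=j) for j <= k.
cdf = {}
running_total = 0.0
for k in sorted(pmf):
    running_total += pmf[k]
    cdf[k] = running_total

# (18c) Expected value: sum of k * P(X=k).
expected_value = sum(k * p for k, p in pmf.items())

print(round(p3, 2), {k: round(v, 2) for k, v in cdf.items()}, round(expected_value, 2))
```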
(19.) Below is a scatter plot and simple linear regression using tree diameter to predict height. The regression equation is
HEIGHT = 62.03 + 12.65 DIAMETER.
[Figure: Fitted Line Plot, HEIGHT = 62.03 + 12.65 DIAMETER; x-axis DIAMETER (0.6 to 1.8), y-axis HEIGHT (60 to 90); S = 5.53846, R-Sq = 27.0%, R-Sq(adj) = 24.4%]
(19a. 4pts) If a randomly sampled tree had a diameter of 1.72 units, using the regression analysis, what would be
its expected height? Show your calculations. Answer: ___________________.
(19b. 2pts) Calculate the residual value for the height of the tree in problem 19a, assuming its actual height
was 88. Show your work.
Answer: ___________________________
(19c. 3pts) For each unit change in diameter, you would expect height to increase by how many units?
Answer: __________________
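For checking work on (19a) and (19b) afterwards, here is a minimal sketch that plugs a diameter into the fitted equation from problem (19) and computes residual = observed − predicted. The names `INTERCEPT`, `SLOPE`, and `predict_height` are illustrative.

```python
# Coefficients taken from the regression equation stated in problem (19).
INTERCEPT = 62.03
SLOPE = 12.65


def predict_height(diameter):
    """Predicted height from the fitted line HEIGHT = 62.03 + 12.65 * DIAMETER."""
    return INTERCEPT + SLOPE * diameter


predicted = predict_height(1.72)   # (19a) expected height at diameter 1.72
residual = 88 - predicted          # (19b) residual = observed - predicted
print(round(predicted, 3), round(residual, 3))
```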
(20. 5pts) Suppose the random variable X is distributed according to the normal distribution with mean=50 and sd=5.
Furthermore suppose a random sample of 25 values will be collected and their sample mean 𝑋̅ calculated.
Calculate P(𝑋̅ > 51). Show your work.
Answer:__________________
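For checking work on (20) afterwards, here is a minimal sketch. It uses the fact that the mean of n independent normal values is itself normal with standard deviation σ/√n, and reads the tail probability from Python's standard-library `statistics.NormalDist` instead of a printed table. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from problem (20): X ~ Normal(mean=50, sd=5), sample size 25.
mu, sigma, n = 50, 5, 25

# Standard deviation of the sample mean (the standard error).
standard_error = sigma / math.sqrt(n)

# Sampling distribution of the sample mean, then the upper-tail probability.
sampling_dist = NormalDist(mu=mu, sigma=standard_error)
p_greater = 1 - sampling_dist.cdf(51)

z_score = (51 - mu) / standard_error
print(z_score, round(p_greater, 4))
```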
(21. 3pts) A fisheries student studied the weights of 2-year old rainbow trout raised in three different creeks. 35 fish were
captured in creek A, 33 in creek B, and 35 in creek C. Below is Minitab ANOVA output for the data.
One-way ANOVA: A, B, C

Analysis of Variance
Source    DF       SS     MS      F      P
Factor     2     4.09   2.04   2.02  0.138
Error    100   100.93   1.01
Total    102   105.02

Level    N    Mean   StDev
A       35   2.503   0.997
B       33   2.985   1.033
C       35   2.656   0.985

Pooled StDev = 1.005

[Plot: Individual 95% CIs For Mean, Based on Pooled StDev; one interval each for A, B, and C; axis ticks at 2.45, 2.80, 3.15]
Which conclusion is most appropriately inferred from the ANOVA output if using a level of significance of 5%?
• There is insignificant statistical evidence to suggest that the mean weight of 2-year old rainbow trout
differs between creeks A, B, and C. (One-way ANOVA, P=0.138)
• There is statistically significant evidence that the mean weight of 2-year old rainbow trout differs
between creeks A, B, and C. (One-way ANOVA, P=0.138)
• There is insignificant statistical evidence to suggest that the standard deviations of 2-year old rainbow
trout differ between creeks A, B, and C. (One-way ANOVA, P=0.138)
• There is statistically significant evidence that the standard deviations of 2-year old rainbow trout differ
between creeks A, B, and C. (One-way ANOVA, P=0.138)
(22. 3pts) Suppose a statistical test is performed using 𝛼 = 0.05. Also suppose the null hypothesis is true, but when
performing the test you do not know that the null hypothesis is true. What is the probability of committing a Type 1 error
for this test?
Answer:_________________
(23. 5pts) Suppose the average length of 27 salamanders is 56mm with a standard deviation of 14mm. Calculate the 95%
confidence interval for the mean salamander length. Show your work.
(24.) Suppose you were doing a psychological test on mice and wanted to know whether or not they favor a specific color.
You placed food in red, black, and white bowls. On 198 different occasions, a mouse (a different mouse each time) was
observed to see which bowl it emptied first. The data were: red=64, black=60, and white=74.
Calculate the chi-square statistic for a goodness-of-fit test on these data to test the idea that the mice had no preference.
(24a. 3pts.) Expected number of red =________, black=__________, white= ______________. Show your
work.
(24b. 3pts) Chi-square statistic = _________________________. Show your work.
(24c. 2pts.) Determine the critical value: df=____________, critical value=________________
(24d. 2pts) True or False: Reject the null hypothesis.
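For checking work on (24a) through (24d) afterwards, here is a minimal sketch of the goodness-of-fit calculation. Under "no preference" each bowl is expected to receive one third of the 198 trials, and the statistic is Σ (observed − expected)² / expected. The critical value 5.991 is the usual chi-square table entry for df = 2 at α = 0.05 and is hard-coded here as an assumption; the variable names are illustrative.

```python
# Observed counts from problem (24).
observed = {"red": 64, "black": 60, "white": 74}

# (24a) Under no preference, each color is expected equally often.
total = sum(observed.values())
expected_each = total / len(observed)

# (24b) Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((count - expected_each) ** 2 / expected_each
                 for count in observed.values())

# (24c) Degrees of freedom = number of categories - 1.
degrees_of_freedom = len(observed) - 1
CRITICAL_VALUE_DF2 = 5.991  # assumed table value for df = 2, alpha = 0.05

# (24d) Reject only if the statistic exceeds the critical value.
reject_null = chi_square > CRITICAL_VALUE_DF2
print(expected_each, round(chi_square, 3), degrees_of_freedom, reject_null)
```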
(25. 4pts) Circle the correlation that best fits the graph below.
(i) –1.34 (ii) –1 (iii) –0.79 (iv) –0.05 (v) 0 (vi) +0.05 (vii) +0.79 (viii) +1 (ix) +1.34
[Figure: Scatterplot of X vs Y; vertical axis X and horizontal axis Y, both ranging from 90 to 115]
(26. 4pts) Suppose you toss 3 fair dice in a row.
What is the probability of the first die being a 6, the second being a 1, and the third being a 6? Show your work.
(27.) The below contingency table consists of data describing the number of cycles between stopping birth control and a
planned pregnancy. Women are categorized by smokers (1st row) and non-smokers (2nd row).
               first cycle   2+ cycles   total
Smokers             29           71       100
non-smokers        198          288       486
Total              227          359       586
(27a. 3pts) Estimate the probability: P( 2+ cycle | smoker ).
Probability = _________________
(27b. 3pts) Assuming independence between smoking and number of cycles, how many women were expected to
fall in the category of non-smoker & 2+ cycles ? Show your work.
Expected value=____________________
(27c. 2pts) The chi-square statistic value for this chi-square test of independence analysis was 4.817. Using a
level of significance of 5% (α=0.05), determine the appropriate critical value.
Df = ____________
Critical value= ______________
(27d. 3pts) In words, state the null and alternative hypotheses for this test.
Ho:
Ha:
(27e. 2pts) True or False: The null hypothesis should be rejected.
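For checking work on (27a) through (27e) afterwards, here is a minimal sketch built from the contingency table in problem (27). It uses P(2+ | smoker) = count / row total, expected count = (row total × column total) / grand total, and df = (rows − 1)(columns − 1). The critical value 3.841 is the usual chi-square table entry for df = 1 at α = 0.05 and is hard-coded as an assumption; all names are illustrative.

```python
# Observed counts from the table in problem (27).
table = {
    "smoker": {"first": 29, "2+": 71},
    "non-smoker": {"first": 198, "2+": 288},
}
columns = ("first", "2+")

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
grand_total = sum(row_totals.values())

# (27a) Conditional probability estimated within the smoker row.
p_2plus_given_smoker = table["smoker"]["2+"] / row_totals["smoker"]


def expected_count(row, col):
    """Expected cell count under independence."""
    return row_totals[row] * col_totals[col] / grand_total


# (27b) Expected count for non-smoker & 2+ cycles.
expected_nonsmoker_2plus = expected_count("non-smoker", "2+")

# Chi-square statistic, to compare with the 4.817 quoted in (27c).
chi_square = sum(
    (table[row][col] - expected_count(row, col)) ** 2 / expected_count(row, col)
    for row in table
    for col in columns
)

# (27c) df = (rows - 1) * (columns - 1); (27e) compare with the critical value.
degrees_of_freedom = (len(table) - 1) * (len(columns) - 1)
CRITICAL_VALUE_DF1 = 3.841  # assumed table value for df = 1, alpha = 0.05
reject_null = chi_square > CRITICAL_VALUE_DF1

print(p_2plus_given_smoker, round(expected_nonsmoker_2plus, 2),
      round(chi_square, 3), degrees_of_freedom, reject_null)
```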
(28. 3pts) A doctor wanted to find out if smoking affected the physical endurance of mice. The doctor forced 100 mice to
breathe high daily doses of cigarette smoke and another 100 mice were not exposed to smoke. After 2 months the mice
were forced to run on a hamster wheel until they tumbled from exhaustion. The length of time until they started to tumble
was measured for each mouse. It was thought that the mean length of time for the smoking mice would be less than that
of the non-smoking mice. (This was not a real study – no mice were harmed in the course of making this problem.) Circle which statistical test
would be most appropriate for this dataset.
• Chi-square test for independence
• 1-sample t-test
• 2-sample t-test
• ANOVA
• Chi-square goodness-of-fit test
• paired t-test
• simple linear regression
(29. 3pts) People have a “dominant” and “non-dominant” eye. If you were to look through a small hole in a paper (or
your hands) at arm’s length, you’d use your dominant eye. A scientist wanted to study whether people’s dominant eye
is, on average, stronger than their non-dominant eye. (Perhaps the dominant eye is exercised more.) Using randomly
selected people, she measured from how many feet away each person could read a sign with their dominant eye and again
with their non-dominant eye. Circle which statistical test would be most appropriate for this dataset.
• Chi-square test for independence
• 1-sample t-test
• 2-sample t-test
• ANOVA
• Chi-square goodness-of-fit test
• paired t-test
• simple linear regression