Statistics 101 Name Final Exam

INSTRUCTIONS: Read the questions carefully and completely. Answer each question and show
all your work in the space provided. Partial credit cannot be given if work is not shown. When
asked to explain, describe, or comment, do so in the context of the problem.
1. (2 pts per answer) Multiple Choice. Please write the correct answer in CAPITAL LETTERS
in the blanks provided.
(a)
For data that is skewed to the right, the mean will be (A) less than (B) greater
than (C) about the same as the median.
(b)
As the sample size increases, the standard deviation of the sampling distribution
of the sample mean will (A) stay the same (B) increase (C) decrease.
(c)
As the sample size increases, the mean of the sampling distribution of the sample
mean will (A) stay the same (B) increase (C) decrease.
(d)
CNN posts a question on their web page concerning recent world events and
invites users to respond to the question. This is an example of a (A) simple random
sample (B) unbiased sample (C) convenience sample (D) voluntary response sample.
(e)
Which of the following is NOT a type of bias? (A) Non-response (B) Sampling
Variability (C) Response (D) Poor wording of questions.
(f)
Everything else being equal, the width of a 95% confidence interval will be (A)
less than (B) greater than (C) the same as the width of a 90% confidence interval.
(g)
Everything else being equal, the margin of error for a confidence interval with
a sample size of 30 will be (A) less than (B) greater than (C) the same as the margin of
error for a confidence interval with a sample size of 15.
(h)
The type of categorical display that must include all possible categories is a
(A) bar graph (B) pie chart (C) scatter plot (D) two-way table.
(i)
You have calculated the correlation coefficient between two variables as r = 0.
This would indicate (A) a strong positive linear relationship (B) a weak negative linear
relationship (C) no relationship (D) no linear relationship between the two variables.
(j)
In order to determine the effect of a new migraine drug on the length and severity of migraines, fifty subjects were randomly divided into two groups. One group took
the new drug at the onset of a migraine while the other group took no drugs. At the end
of the migraine, subjects were asked to rate the migraine by length and severity. This
experiment does not take into account (A) control (B) randomization (C) the placebo
effect (D) replication.
(k)
A 95% confidence interval for the population mean is calculated to be (10.5,
11.6). Choose the correct interpretation of this confidence interval. (A) We are 95%
confident that the sample mean is between 10.5 and 11.6. (B) 95% of the values from
the population are between 10.5 and 11.6. (C) 95% of the values from the sample are
between 10.5 and 11.6 (D) We are 95% confident that the population mean is between
10.5 and 11.6.
(l)
The mean and standard deviation should be used to summarize data that (A)
is skewed to the right (B) has no outliers and is symmetric (C) is skewed to the left (D)
has outliers.
(m)
An observation that does not fit the overall pattern of the data is called (A) an
outlier (B) an experiment (C) a variable (D) a bias.
(n)
In conducting a hypothesis test, the p-value is defined as the probability of (A)
getting the observed value of the test statistic or more extreme when the null hypothesis
is false (B) getting the observed value of the test statistic when the alternative hypothesis
is true (C) getting exactly the observed value of the test statistic when the null
hypothesis is true (D) getting the observed value of the test statistic or more extreme
when the null hypothesis is true.
(o)
The middle 50% of any data set is located between (A) the minimum and maximum (B) Q1 and Q3 (C) the range and interquartile range (D) the mean and median.
(p)
If the equation for a regression line is ŷ = 25 − 5.25x and R² = 0.41, the value of
the correlation between the two variables is (A) -0.64 (B) 0.64 (C) -0.1681 (D) 0.1681.
(q)
The Central Limit Theorem states that as n increases, the sampling distribution
of the sample mean ȳ will tend towards a (A) normal distribution with mean µ and
standard deviation σ/n (B) normal distribution with mean µ and standard deviation σ
(C) skewed distribution with mean µ and standard deviation σ/√n (D) normal
distribution with mean µ and standard deviation σ/√n.
2. (4 pts) The four graphs below depict the sampling distribution for the sample mean from a
sample of size n from a population that is normally distributed with mean 10 and standard
deviation 10.
[Figure: four density curves in panels labeled (A), (B), (C), and (D), each drawn over a horizontal axis running from about -20 to 40.]
Which graph depicts the sampling distribution of the sample mean when
(a)
n = 1?
(b)
n = 10?
(c)
n = 50?
(d)
n = 100?
3. Short Answer I (4 pts per answer) For each problem, state the null and alternative hypotheses.
DO NOT PERFORM THE ANALYSIS.
(a) The design of controls and instruments affects how easily people can use them. Twenty-five left-handed people were asked to turn a knob clockwise and counterclockwise. The
time it took each subject to turn the knob each way was then recorded. Do left-handed
people turn the knob faster counterclockwise?
(b) A previous study claimed the mean cellulose content of alfalfa hay is 142 mg/g. An
agronomist believes the true value is different from 142 mg/g. To test his belief, he
takes a simple random sample of 15 cuttings from the population and finds the sample
mean cellulose content is 145 mg/g with a sample standard deviation of 3 mg/g. Is this
sufficient evidence that the mean cellulose content of alfalfa hay is different from 142
mg/g?
(c) In a randomized comparative experiment on the effect of dietary calcium on blood pressure, 54 healthy white males were divided at random into two groups. One group received
calcium; the other, a placebo. Blood pressures for all 54 men were taken before and after
the study. The researchers believe the men in the calcium group will have a greater mean
reduction in systolic blood pressure than the men taking the placebo.
(d) In a survey, 75% of ISU undergrads surveyed would like to see VEISHA continued. Is
this sufficient evidence that at least 70% of all ISU undergrads would like to see VEISHA
continued?
4. (16 pts) Short Answer, II.
(a) (4 pts) In order to determine the effect of a new migraine drug on the length and severity
of migraines, fifty subjects were divided into two groups based on the personal preference
of the subject. One group took the new drug while the other group took the current
migraine drug on the market. At the end of each migraine, subjects were asked to rate
the migraine by length and severity.
Name two things wrong with this experiment.
(b) (4 pts) Iowa State University would like to obtain information from its students about
support for VEISHA across campus. A random sample of 500 students is taken and
surveys are sent to the campus addresses of the 500 students.
Name two sources of bias that could potentially affect the results of this sample.
(c) (4 pts) Of the four principles of experimentation, control, randomization, replication and
blocking, which one is not required for all experiments? Explain your answer.
(d) (4 pts) Of the four principles of experimentation, control, randomization, replication and
blocking, which one is the most important? Explain your answer.
5. (14 pts) The height of men in Statistics 101 is normally distributed with a mean height of
µ = 70.5 in and a standard deviation of σ = 3 in.
(a) (4 pts) What is the probability that a man selected at random from the class will have
a height less than 68 inches?
(b) (3 pts) What is the sampling distribution of the sample mean height of 5 men selected
at random from the class?
(c) (5 pts) What is the probability that the mean height of 5 men selected at random from
the class will be less than 68 inches?
(d) (2 pts) Do you need the Central Limit Theorem to answer parts (b) and (c)? Explain
your answer.
6. (15 pts) Many high school mathematics teachers believe the language and reading skills of a
student are an important part of the ability to learn and understand mathematics. The JMP
output attached contains a scatterplot of the ACT English and Math Scores for 100 students.
(a) (2 pts) Why was the ACT Math variable chosen to be the response variable?
(b) (2 pts) What is the least squares regression line for predicting ACT Math score from
ACT English score?
(c) (4 pts) What is the value of the slope? Give its interpretation in the context of the
problem.
(d) (4 pts) What is the value of R²? Give its interpretation in the context of the problem.
(e) (3 pts) Describe the residual plot and make note of any potential problems with the
regression.
7. (10 pts) In the Gallup Poll from April 16 - 18, 2004, 520 out of the 1000 adults 18 and older
surveyed approved of the job President Bush is doing as president.
(a) (2 pts) What is the population the Gallup Poll is surveying?
(b) (4 pts) Calculate a 95% confidence interval for the population proportion of Americans
who approve of the President’s job performance. Assume all assumptions are met.
(c) (4 pts) Give an interpretation of the confidence interval you calculated in part (b).
8. (28 pts) A manufacturer of small appliances employs a market research firm to estimate retail
sales of its products by gathering information from a sample of retail stores. This month a
simple random sample of n = 30 stores in the Midwest sales region finds that these stores
sold an average of 23.37 of the manufacturer’s hand mixers, with a sample standard deviation
of 2.43. A histogram of the 30 data points from the sample is given below.
[Histogram: Number of Stores (vertical axis, 0 to 10) versus Number of Mixers (horizontal axis, 18 to 28).]
(a) (3 pts) Describe the distribution of the number of mixers sold using the histogram. Make
sure to mention shape, center, spread and any outliers.
(b) (4 pts) Find a 95% confidence interval for the mean number of mixers sold by all stores
in the Midwest region.
(c) (4 pts) Give an interpretation of the confidence interval you calculated in part (b).
(d) (17 pts) Based on previous experience, company executives expect to sell a mean of
µ = 22 hand mixers during this month. Based on the data, is there sufficient evidence
to conclude the company had sales exceeding this expectation? Do the appropriate
hypothesis test. Assume a significance level of α = 0.05.
9. (21 pts) In an experiment to study the effect of the spectrum of ambient light on the growth of
plants, researchers assigned tobacco seedlings at random to two groups of eight plants each.
The plants were grown in a greenhouse under identical conditions except for lighting. The
experimental group was grown under blue light, while the control group was grown under
natural light. Here are the data on stem growth in millimeters:
Control Group
4.0 3.5 3.9 3.7
4.0 3.6 3.8 3.7
Experimental Group
3.6 3.4 3.7 3.7
3.2 3.4 3.5 3.6
(a) (4 pts) Calculate the mean and median of the control group.
(b) (4 pts) The experimenter believes that the mean stem growth will be longer for the
control group than for the experimental group. What are the null and alternative hypotheses for this hypothesis test?
(c) (5 pts) The sample standard deviation for the control group is 0.18 and the sample
standard deviation for the experimental group is 0.17. Calculate the appropriate test
statistic for this hypothesis test.
(d) (3 pts) Find the p-value for this significance test.
(e) (2 pts) What is your decision for this significance test? Explain your answer. Assume a
significance level of α = 0.1.
(f) (3 pts) State your conclusion in the context of the problem.
10. (17 pts) A poll was conducted by the University of Montana in which the 202 respondents
were classified according to the area of Montana that they live in and their political party
affiliation. The results are in the table below.
              West   Northeast   Southeast   Total
Democrat        39          15          30      84
Republican      17          30          31      78
Independent     12          12          16      40
Total           68          57          77     202
(a) (4 pts) The pollster would like to test if there is a relationship between the area of Montana in which a person lives and their party affiliation. State the appropriate null and
alternative hypotheses.
(b) (4 pts) Calculate the expected cell counts for the West - Democrat cell and for the
Northeast - Independent cell.
(c) (4 pts) Calculate the contribution of the cells West - Democrat and Northeast - Independent to the χ² statistic.
(d) (2 pts) The total χ² statistic for the two-way table is 13.849 with a p-value of 0.0078.
What is your decision about the hypothesis test if the significance level is α = 0.05?
Explain your answer.
(e) (3 pts) What is the conclusion for the test in the context of the problem?
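Formulas
$\bar{x} = \frac{\sum x_i}{n}$
$\bar{y} = \frac{\sum y_i}{n}$
$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
$s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$
$r = \frac{1}{n-1} \sum \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$
$b_1 = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
$z = \frac{y - \mu}{\sigma}$
$z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$
$\hat{p} \pm z^\star \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
$z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o (1 - p_o)}{n}}}$
$n = \frac{(z^\star)^2 (0.5)(0.5)}{(ME)^2}$
$\bar{y} \pm t^\star \frac{s}{\sqrt{n}}$
$t = \frac{\bar{y} - \mu_o}{s / \sqrt{n}}$
$(\bar{y}_1 - \bar{y}_2) \pm t^\star \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$\text{expected cell count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

C%    z⋆
90    1.645
95    1.960
98    2.326
99    2.576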
Bivariate Fit of ACT Math Score By ACT English Score
[Scatterplot: ACT Math Score (vertical axis, 15 to 35) versus ACT English Score (horizontal axis, 15 to 35), with the linear fit overlaid.]

Linear Fit
ACT Math Score = 11.727885 + 0.5483277 ACT English Score

Summary of Fit
RSquare                      0.40537
RSquare Adj                  0.399303
Root Mean Square Error       3.42642
Mean of Response             24.97
Observations (or Sum Wgts)   100

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1         784.3554       784.355   66.8085     <.0001
Error      98        1150.5546        11.740
C. Total   99        1934.9100

Parameter Estimates
Term                Estimate    Std Error   t Ratio   Prob>|t|
Intercept           11.727885   1.655936       7.08     <.0001
ACT English Score   0.5483277   0.067085       8.17     <.0001

[Residual plot: Residual (vertical axis, -10 to 10) versus ACT English Score (horizontal axis, 15 to 35).]