Answer key

STATISTICS FOR THE SOCIAL AND BEHAVIORAL SCIENCES
RECITATION 8
ANSWER KEY
Chapter 4
1 True or False: As the sample size increases, the standard error of the sampling
distribution of 𝑦̅ increases. (Explain your answer).
False. As the sample size increases, the standard error of the sampling distribution
decreases, because the standard error equals σ/√n. Intuitively, as n gets larger we
come closer to sampling the entire population. If we sampled the entire population,
the standard error would be 0 because our sample mean would equal the population mean.
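The relationship can be checked numerically. The following is a minimal sketch that assumes the usual formula se = σ/√n; the value of sigma is hypothetical and chosen only for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean, assuming se = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    # Each fourfold increase in n cuts the standard error in half.
    print(n, standard_error(sigma, n))
```

Each time n is multiplied by four, the printed standard error is halved.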
2 The standard deviation of a discrete probability distribution is
$$\sigma = \sqrt{\sum (y - \mu)^2 P(y)}$$
Suppose P(y=1)=π and P(y=0)=1- π, where π represents a number between 0
and 1. Find the standard deviation and standard error of a sample proportion for
a random sample of size n (in terms of π and n).
The first thing we need to notice is that
𝜇 = 𝐸(𝑌) = 1 × 𝑃(𝑦 = 1) + 0 × 𝑃(𝑦 = 0) = 𝜋
Then,
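$$\begin{aligned}
\sigma &= \sqrt{(1 - \pi)^2 \times \pi + (0 - \pi)^2 \times (1 - \pi)} \\
&= \sqrt{(1 - 2\pi + \pi^2) \times \pi + (0 - 2 \times 0 \times \pi + \pi^2) \times (1 - \pi)} \\
&= \sqrt{\pi - 2\pi^2 + \pi^3 + \pi^2 - \pi^3} \\
&= \sqrt{\pi - \pi^2} = \sqrt{\pi \times (1 - \pi)}
\end{aligned}$$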
The standard error is
$$se = \frac{\sqrt{\pi \times (1 - \pi)}}{\sqrt{n}}$$
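As a check on the algebra, the standard deviation can be computed directly from the definition and compared with the closed form. This is only a sketch: the function names and the values of pi and n are illustrative assumptions, not part of the problem.

```python
import math

def bernoulli_sd(pi: float) -> float:
    """Standard deviation from the definition sqrt(sum (y - mu)^2 P(y))."""
    dist = {1: pi, 0: 1.0 - pi}  # P(y=1) = pi, P(y=0) = 1 - pi
    mu = sum(y * p for y, p in dist.items())
    return math.sqrt(sum((y - mu) ** 2 * p for y, p in dist.items()))

def proportion_se(pi: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(pi (1 - pi)) / sqrt(n)."""
    return math.sqrt(pi * (1.0 - pi)) / math.sqrt(n)

pi, n = 0.3, 100  # hypothetical values
print(bernoulli_sd(pi))               # from the definition
print(math.sqrt(pi * (1.0 - pi)))     # closed form, should agree
print(proportion_se(pi, n))
```

The first two printed values should agree up to floating-point rounding.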
3 Which of the following best explains the phenomenon that while in 10 flips of a
fair coin it may not be very surprising to get 8 Heads, it would be very surprising
to get 8,000 Heads in 10,000 flips of the coin?
A) The frequencies of events with the same likelihood of occurrence even
out, given enough trials or instances.
B) If we kept taking samples of 10 coin flips to record the number of Heads,
then 95% of the time the sample mean would be contained in a given 95%
confidence interval.
C) If we have a large enough sample (as n approaches infinity), our sample
mean will approach the expected value of the random variable.
D) $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
E) Regardless of the shape of the population distribution, the shape of the
sampling distribution tends to be normal for a large enough sample size.
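A direct calculation shows how differently the two cases in this question behave. The sketch below computes the exact probability of at least 8 Heads in 10 flips of a fair coin, and the base-10 logarithm of the probability of at least 8,000 Heads in 10,000 flips (the probability itself is too small for an ordinary float). Reading the question as "at least that many Heads" is an assumption, and the function names are illustrative.

```python
import math

def tail_count(n: int, k: int) -> int:
    """Number of length-n Head/Tail sequences with at least k Heads."""
    return sum(math.comb(n, j) for j in range(k, n + 1))

def log10_tail_prob(n: int, k: int) -> float:
    """log10 of P(at least k Heads in n fair flips), via exact integer counts."""
    return math.log10(tail_count(n, k)) - n * math.log10(2)

# 56 of the 1024 equally likely sequences have at least 8 Heads.
print(tail_count(10, 8) / 2**10)  # 0.0546875

# The same 80% share of Heads in 10,000 flips is astronomically unlikely.
print(log10_tail_prob(10_000, 8_000))
```

The proportion of Heads concentrates more and more tightly around 0.5 as the number of flips grows, so a proportion of 0.8 goes from mildly unusual to essentially impossible.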
4 Which of the following statements is false?
A) Two disjoint outcomes (of the same event) cannot occur at the same time.
B) Two independent events cannot occur at the same time.
C) Two mutually exclusive outcomes (of the same event) cannot occur at the
same time.
D) Two complementary outcomes (of the same event) cannot occur at the
same time.
5 A statistician is studying blood pressure levels of Italians in the age range 75-80. The following is some information about her study:
I. The researcher collected the data by recording the responses of Italians in
the age range 75-80 who responded to an email in which the survey was
included.
II. The sample observations only make up about 4% of the population.
III. The sample size is 2,047.
IV. The distribution of sample observations is skewed - the skew is easy to see,
although not very extreme.
The researcher is ready to use the Central Limit Theorem (CLT) in the main part
of her analysis. Which aspect of her study is most likely to prevent her from
using the CLT?
a) IV and I
b) I
c) III
d) II and III
e) IV
f) All of the above
6 About 30% of human twins are identical, and the rest are fraternal. Identical
twins are necessarily the same sex: half are male and the other half are female.
One-quarter of fraternal twins are both male, one-quarter are both female, and
one-half are mixed: one male, one female. You have just become a parent of twins and
are told they are both girls. Given this information, what is the probability that
they are identical?
A) 50%
B) 72%
C) 33%
D) 46%
Given:
P(I) = 0.3
P(~I)= 0.7
P(FF|I)= 0.5
P(MM|I)= 0.5
P(MM|~I)= 0.25
P(FF|~I)= 0.25
P(FM|~I)= 0.5
$$P(FF) = P(FF \mid I)P(I) + P(FF \mid \sim I)P(\sim I) = (0.5)(0.3) + (0.25)(0.7) = 0.325$$
$$P(I \mid FF) = \frac{P(I)\,P(FF \mid I)}{P(FF)} = \frac{(0.3)(0.5)}{0.325} \approx 0.46$$
This corresponds to answer D (46%).
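The Bayes' rule calculation can be reproduced in a few lines. This is a sketch using the probabilities given in the problem; the variable names are illustrative.

```python
# Probabilities stated in the problem.
p_identical = 0.3
p_fraternal = 1 - p_identical
p_ff_given_identical = 0.5
p_ff_given_fraternal = 0.25

# Law of total probability: overall chance that both twins are girls.
p_ff = p_ff_given_identical * p_identical + p_ff_given_fraternal * p_fraternal

# Bayes' rule: chance the twins are identical, given both are girls.
p_identical_given_ff = p_ff_given_identical * p_identical / p_ff

print(round(p_ff, 3))
print(round(p_identical_given_ff, 2))
```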
Chapter 5
7 How large a sample size is needed to estimate the mean annual income of
Native Americans correct to within $1000 with probability 0.99? We know that
about 95% of their incomes are between $6000 and $50000 and that this
distribution of incomes is approximately normal.
For a 99% confidence level, z = 2.58.
We need to approximate the population standard deviation. Because the distribution
is approximately normal, about 95% of incomes lie within two standard deviations of
the mean, so the mean is the midpoint (6000+50000)/2 = 28000, and 50000 is two
standard deviations away from 28000. Thus the standard deviation is
(50000-28000)/2 = 11000.
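$$1000 = 2.58 \times \frac{11000}{\sqrt{N}}$$
$$\sqrt{N} = 2.58 \times \frac{11000}{1000}$$
$$N = \left(2.58 \times \frac{11000}{1000}\right)^2 \approx 805.4$$
Rounding up to the next whole number, N = 806.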
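The same arithmetic can be sketched in code. The function name is illustrative, and the standard deviation is approximated, as in the solution above, by treating the 95% range as four standard deviations wide.

```python
import math

def required_sample_size(z: float, sigma: float, margin: float) -> int:
    """Smallest whole n satisfying z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

low, high = 6000, 50000
sigma = (high - low) / 4  # 95% of a normal distribution spans about 4 sd

n = required_sample_size(z=2.58, sigma=sigma, margin=1000)
print(sigma, n)
```

Rounding up rather than to the nearest integer guarantees the margin of error does not exceed the target.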
8 Suppose we collected a sample of size n = 100 from some population and used
the data to calculate a 95% confidence interval for the population mean. Now
suppose we are going to increase the sample size to n = 300. Keeping all else
constant, which of the following would we expect to occur as a result of
increasing the sample size?
I. The standard error would decrease.
II. The width of the 95% confidence interval would increase.
III. The margin of error would decrease.
a) II and III
b) I and III
c) I and II
d) I, II, and III
e) None
9 Increasing the confidence level causes the width of a confidence interval to
a) increase
b) decrease
c) stay the same
10 Other things being equal, if we quadruple the sample size the width of a
confidence interval
a) becomes one quarter as wide
b) halves
c) stays the same
d) doubles
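The effect of sample size and confidence level on interval width in questions 8 to 10 can be explored numerically. The sketch below assumes a z-interval with known standard deviation, so the width is 2 × z × σ/√n; the value of sigma is hypothetical.

```python
import math
from statistics import NormalDist

def ci_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Width of a z-based confidence interval for a mean: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / math.sqrt(n)

sigma = 15.0  # hypothetical population standard deviation

w100 = ci_width(sigma, 100)
w300 = ci_width(sigma, 300)           # larger sample, same confidence level
w400 = ci_width(sigma, 400)           # quadrupled sample size
w100_99 = ci_width(sigma, 100, 0.99)  # same sample, higher confidence level

print(w100, w300, w400, w100_99)
```

Under these assumptions, the width shrinks as n grows, is cut in half when n is quadrupled, and grows when the confidence level is raised.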
11 Based on responses of 1467 subjects in General Social Surveys, a 95%
confidence interval for the mean number of close friends equals (6.8, 8.0). Which
of the following interpretations is (are) correct?
a) We can be 95% confident that 𝑦̅ is between 6.8 and 8.0.
b) We can be 95% confident that 𝜇 is between 6.8 and 8.0.
c) Ninety-five percent of the values of y = number of close friends (for this
sample) are between 6.8 and 8.0.
d) If random samples of size 1467 were repeatedly selected, then 95% of the
time 𝑦̅ would fall between 6.8 and 8.0.
e) If random samples of size 1467 were repeatedly selected, then in the long run
95% of the confidence intervals formed would contain the true value of 𝜇.
12 A random sample of 50 records yields a 95% confidence interval for the mean
age at first marriage of women in a certain county of 21.5 to 23.0. Explain what is
wrong with each of the following interpretations of this interval and give the
right interpretation.
a) If random samples of 50 records were repeatedly selected, then 95% of the
time the sample mean age at first marriage for women would be between 21.5
and 23.0 years.
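This describes the sample mean, but a confidence interval is a statement about the
population mean. The correct statement would be: If random samples of 50 records
were repeatedly selected, then in the long run 95% of the confidence intervals
formed would contain the true mean age at first marriage for women.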
b) Ninety-five percent of the ages at first marriage for women in the county are
between 21.5 and 23.0 years.
This is not true. The confidence interval describes the population mean, not
individual ages: we can be 95% confident that the true mean age at first marriage
is in our interval.
c) We can be 95% confident that 𝑦̅ is between 21.5 and 23.0.
No, we are 100% confident that 𝑦̅ is between 21.5 and 23.0. We use the sample
mean to calculate the confidence interval, so 𝑦̅ is its midpoint, 22.25.
d) If we repeatedly sampled the entire population, then 95% of the time the
population mean would be between 21.5 and 23.0 years.
No, if we repeatedly sampled the entire population we would always get the true
mean.
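The long-run interpretation of a confidence interval can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is normal, its standard deviation is known, and the values of mu and sigma are hypothetical, not taken from the marriage records.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu: float, sigma: float, n: int, reps: int,
             confidence: float = 0.95, seed: int = 0) -> float:
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw a fresh sample of size n and build an interval around its mean.
        ybar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if ybar - half_width <= mu <= ybar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population: mean age 22.25, standard deviation 2.7.
print(coverage(mu=22.25, sigma=2.7, n=50, reps=2000))
```

The printed fraction should be close to 0.95. The intervals move from sample to sample while the population mean stays fixed, which is why the 95% refers to the procedure rather than to any one interval.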