Week12slides

STT 200 – LECTURE 5, SECTION 23,24
RECITATION 12
(4/2/2013)
TA: Zhen Zhang
zhangz19@stt.msu.edu
Office hour: (C500 WH) 3-4 PM Tuesday
(office tel.: 432-3342)
Help-room: (A102 WH) 9:00AM-1:00PM, Monday
Class meets on Tuesday:
12:40 – 1:30PM A224 WH, Section 23
1:50 – 2:40PM A234 WH, Section 24
Example (sampling distribution)

Recall that the data from last time contain “yes/no” responses from a population of 400 people who were asked whether they have wireless internet access at home. The population proportion of “yes” responses is $p = 0.5575$.

If we draw many random samples of size n = 37, the sampling distribution of the sample proportion $\hat p$ can be approximated by
$\hat p \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$.
What if we don’t know $p$?
[Figure: simulated sampling distribution of $\hat p$ for samples of size n = 37, with the population proportion p = 0.5575 marked.]
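The Normal model above can be checked by simulation. The sketch below assumes a reconstructed population of 223 “Yes” and 177 “No” responses (223/400 = 0.5575); the real data are the recitation 11 data used in Appendix 1, and the ordering here is hypothetical. It compares the mean and standard deviation of the simulated $\hat p$ values with $p$ and $\sqrt{p(1-p)/n}$.

```r
# Hypothetical population consistent with N = 400 and p = 0.5575:
# 223 "Yes" and 177 "No" (the order of responses is assumed).
set.seed(241)
pop <- rep(c("Yes", "No"), times = c(223, 177))
p <- mean(pop == "Yes")          # population proportion, 0.5575
n <- 37                          # sample size
replica <- 10000                 # number of simulated samples

# Draw samples without replacement and record each sample proportion.
phats <- replicate(replica, mean(sample(pop, size = n) == "Yes"))

# Compare the simulation with the Normal model N(p, sqrt(p(1-p)/n)).
sd_model <- sqrt(p * (1 - p) / n)
c(mean_phat = mean(phats), p = p,
  sd_phat = sd(phats), sd_model = sd_model)
```

The simulated standard deviation should come out slightly below the model value, because each sample takes about 9% of a finite population without replacement.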
Example (confidence interval)

To study $p$, we draw a sample of size n = 37, obtain $\hat p$, and construct a 95% confidence interval using
$\hat p \pm z^* \sqrt{\frac{\hat p(1-\hat p)}{n}}$.
We are 95% confident that $p$ lies in this interval.
“95% confident” means that if we drew samples and constructed intervals this way many times, approximately 95% of the intervals would cover $p$.
[Figure: 95% confidence intervals from repeated samples of size n = 37, with the true value of p marked.]
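The “95% confident” statement can be checked by counting how often the interval covers $p$ over many samples. This is a minimal sketch of the Appendix 1 idea, again assuming a hypothetical reconstructed population of 223 “Yes” and 177 “No” responses rather than the actual recitation 11 data.

```r
# Hypothetical population reconstructed from p = 0.5575 and N = 400.
set.seed(241)
pop <- rep(c("Yes", "No"), times = c(223, 177))
p <- mean(pop == "Yes")
n <- 37
replica <- 10000
zstar <- qnorm((1 + 0.95) / 2)   # 95% critical value, about 1.96

covered <- logical(replica)
for (t in seq_len(replica)) {
  ph  <- mean(sample(pop, size = n) == "Yes")       # sample proportion
  moe <- zstar * sqrt(ph * (1 - ph) / n)            # margin of error
  covered[t] <- (ph - moe <= p) && (p <= ph + moe)  # does this interval cover p?
}
mean(covered)   # fraction of intervals that contain p
```

The fraction should be close to 0.95; it need not match exactly, since $\hat p$ takes only a few discrete values when n = 37.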
Example (check conditions)
To validate the confidence interval, we need to check several conditions:
• Independence condition: the n = 37 responses in the sample are chosen independently.
• Randomness condition: the n = 37 responses in the sample are chosen randomly. We used a table of random digits to select people numbered 1 to 400.
• 10% condition: the sample size n = 37 is less than 10% of the population size 400 (37 < 40).
• Success/failure condition: the n = 37 responses in the sample contain at least 10 yeses and 10 nos.
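Two of these conditions can be checked from counts alone. The helper below, `check_conditions`, is a hypothetical name used only for illustration; independence and randomness depend on how the data were collected, so the code cannot check them. The 21/16 split for the n = 37 example is also hypothetical.

```r
# Hypothetical helper: checks the 10% and success/failure conditions only.
check_conditions <- function(successes, n, N = Inf) {
  failures <- n - successes
  c(ten_percent     = n < 0.10 * N,                      # sample < 10% of population
    success_failure = successes >= 10 && failures >= 10) # at least 10 of each
}

# Example slide: n = 37 from N = 400; the split 21 yes / 16 no is hypothetical.
check_conditions(successes = 21, n = 37, N = 400)

# Chapter 19 #27: 20 "only" children out of 226; population size treated as large.
check_conditions(successes = 20, n = 226)
```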
Construct confidence interval step by step
To construct a confidence interval for $p$ with confidence level C:
• Determine the critical value $z^*$, using either the Normal table or R/a calculator. To use R or a calculator, note that the total area below $z^*$ is $C + \frac{1-C}{2} = \frac{C+1}{2}$, so we can find $z^*$ using qnorm((C+1)/2) in R, or invNorm((C+1)/2) from the DISTR menu on a TI-83 Plus calculator. For example, on a calculator:
  - for 95% confidence, $z^*$ = invNorm((0.95+1)/2) = 1.96;
  - for 90% confidence, $z^*$ = invNorm((0.90+1)/2) = 1.645.
• Find $SE(\hat p) = \sqrt{\frac{\hat p(1-\hat p)}{n}}$.
• The margin of error is $ME = z^* \, SE(\hat p) = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}}$.
• The confidence interval is $\hat p \pm ME = (\hat p - ME, \hat p + ME)$.
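The four steps translate directly into a short R function. `prop_ci` is a hypothetical name; the sketch implements the one-proportion z-interval described above and is applied to the numbers from Chapter 19 #27 (20 of 226 students).

```r
# A minimal sketch of the four steps (one-proportion z-interval).
prop_ci <- function(x, n, C = 0.95) {
  phat  <- x / n                          # sample proportion
  zstar <- qnorm((C + 1) / 2)             # step 1: critical value
  se    <- sqrt(phat * (1 - phat) / n)    # step 2: standard error
  me    <- zstar * se                     # step 3: margin of error
  c(phat = phat, zstar = zstar, me = me,
    lower = phat - me, upper = phat + me) # step 4: the interval
}

qnorm((0.95 + 1) / 2)   # 1.959964
qnorm((0.90 + 1) / 2)   # 1.644854

# Chapter 19 #27: 20 of 226 students are "only" children.
round(prop_ci(20, 226, C = 0.95), 4)
```

R's unrounded $z^*$ gives the same interval, (0.0515, 0.1255), to four decimals as the rounded 1.96 used on the slides.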
Understand confidence interval backwards
If a 95% confidence interval for $p$ is (0.6184, 0.8616), can you figure out $\hat p$, the margin of error, and the sample size?
Ans. $\hat p$ is the midpoint of this interval, i.e., the average of the two endpoints, so
$\hat p = \frac{0.6184 + 0.8616}{2} = 0.74$,
and the margin of error is half of the width, i.e., |endpoint − midpoint|:
$ME = 0.8616 - 0.74 = 0.74 - 0.6184 = 0.1216$.
Now since $ME = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}}$, we have $n = \frac{(z^*)^2 \hat p(1-\hat p)}{ME^2}$, so
$n = \frac{1.96^2 \times 0.74 \times 0.26}{0.1216^2} \approx 50$.
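The same backward arithmetic in R, as a quick check. The sketch uses the rounded $z^* = 1.96$, as the slide does.

```r
# Recover phat, ME and n from the reported 95% interval (0.6184, 0.8616).
lower <- 0.6184; upper <- 0.8616
zstar <- 1.96                                 # rounded 95% critical value

phat <- (lower + upper) / 2                   # midpoint: 0.74
me   <- (upper - lower) / 2                   # half-width: 0.1216
n    <- zstar^2 * phat * (1 - phat) / me^2    # slightly below 50 before rounding
c(phat = phat, me = me, n = n, n_rounded = round(n))
```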
Relationship
The margin of error $ME = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}}$ determines the width of the confidence interval. The following plots show the relationship between ME and each of $z^*$, $\hat p$ and n when the other two are fixed.
For example, if the confidence level C increases, $z^*$ increases, so ME increases and the confidence interval becomes wider.
[Figure: margin of error versus $\hat p$ (fixed confidence level = 95%, n = 100; the red line marks $\hat p$ = 0.5), versus sample size n (fixed confidence level = 95%, $\hat p$ = 0.5), and versus confidence level C (fixed n = 100, $\hat p$ = 0.5).]
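The three panels can also be read off numerically. This sketch uses the same formula as the function `a` in Appendix 2 (renamed `me` here for readability) and prints ME at a few illustrative values of $\hat p$, n and C, holding the others at their defaults.

```r
# Margin of error as a function of phat, confidence level C and sample size n.
me <- function(phat = 0.5, C = 0.95, n = 100) {
  qnorm((1 + C) / 2) * sqrt(phat * (1 - phat) / n)
}

round(me(phat = c(0.1, 0.3, 0.5, 0.7, 0.9)), 4)  # largest at phat = 0.5
round(me(n = c(25, 100, 400)), 4)                 # halves each time n is quadrupled
round(me(C = c(0.80, 0.90, 0.95, 0.99)), 4)       # grows with C
```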
Sample size determination
Recall that from $ME = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}}$ we have
$n = \frac{(z^*)^2 \hat p(1-\hat p)}{ME^2}$,
and we want to determine the sample size n of the data we will collect. Since the data have not been collected yet, we need to guess $\hat p$:
• If “it is believed” or some “national study” gives a value for the population proportion $p$, we can use it in place of $\hat p$.
• We can also use $\hat p$ from a pilot sample, if we have one.
• If we have no idea about $p$, we can use a conservative guess based on the “worst” scenario: $\hat p(1-\hat p)$ reaches its maximum (0.25) at $\hat p = 0.5$, which gives the largest required sample size.
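To see why $\hat p = 0.5$ is the conservative choice, the sketch below computes the required n for several guesses at the same ME and confidence level. `sample_size` is a hypothetical helper name, the guesses are illustrative, and the result is rounded up to the next whole person, as the worked example later in these slides does.

```r
# Required sample size n = (z*)^2 phat (1 - phat) / ME^2, rounded up.
sample_size <- function(ME, C = 0.95, phat = 0.5) {
  zstar <- qnorm((1 + C) / 2)
  ceiling(zstar^2 * phat * (1 - phat) / ME^2)
}

# Illustrative guesses for p, with ME = 0.05 and 95% confidence.
guesses <- c(0.1, 0.25, 0.5, 0.75, 0.9)
setNames(sample_size(ME = 0.05, phat = guesses), guesses)
```

The guess 0.5 gives the largest requirement (385 here); any other guess needs fewer people, so 0.5 is safe when nothing is known about $p$.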
NEED SOME COFFEE?
Chapter 19 (Page 504): #7:
Which statements are true?
a) For a given sample size, higher confidence means a smaller margin of error.
b) For a specified confidence level, larger samples provide smaller margins of error.
c) For a fixed margin of error, larger samples provide greater confidence.
d) For a given confidence level, halving the margin of error requires a sample twice as large.
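Statement d) is about how ME scales with n. Since $ME \propto 1/\sqrt{n}$, the sketch below checks numerically what multiplying n by 2 or by 4 does to ME, holding $\hat p = 0.5$ and 95% confidence fixed (illustrative values; the ratios do not depend on them).

```r
# Margin of error for sample size n (phat and C held fixed).
me <- function(n, phat = 0.5, C = 0.95) qnorm((1 + C) / 2) * sqrt(phat * (1 - phat) / n)

n0 <- 100   # any starting sample size gives the same ratios
ratios <- sapply(c(2, 4), function(k) me(k * n0) / me(n0))
round(setNames(ratios, c("n x 2", "n x 4")), 4)
```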
Chapter 19 (Page 504): #8:
Which statements are true?
a) For a given sample size, reducing the margin of error will mean lower confidence.
b) For a certain confidence level, you can get a smaller margin of error by selecting a bigger sample.
c) For a fixed margin of error, smaller samples will mean lower confidence.
d) For a given confidence level, a sample 9 times as large will make a margin of error one third as big.
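Statements a) and c) trade confidence against margin of error. Solving $ME = z^* \, SE(\hat p)$ for the confidence level gives $C = 2\Phi(ME/SE) - 1$, where $\Phi$ is the standard Normal CDF (`pnorm` in R). The sketch below, with the hypothetical helper `conf_level` and illustrative values ME = 0.05 and $\hat p = 0.5$, shows how the attainable confidence changes with n.

```r
# Confidence level implied by a fixed margin of error and sample size:
# ME = z* SE  =>  z* = ME / SE  =>  C = 2 * pnorm(z*) - 1.
conf_level <- function(ME, n, phat = 0.5) {
  se <- sqrt(phat * (1 - phat) / n)
  2 * pnorm(ME / se) - 1
}

ns <- c(100, 200, 400)
round(sapply(ns, function(n) conf_level(ME = 0.05, n = n)), 4)
```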
Chapter 19 (Page 505): #14:
11% of a random sample of 1003 adults approved of attempts to clone a human.
a) Find the margin of error if we want 95% confidence.
$ME = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}} = 1.96 \times \sqrt{\frac{0.11 \times 0.89}{1003}} = 0.0194$
b) Explain what that margin of error means.
The pollsters are 95% confident that the true proportion of adults who approve of attempts to clone humans is within about 1.9 percentage points of the estimated 11%.
c) If we only need to be 90% confident, will the margin of error be larger or smaller? Explain.
Smaller, since the critical value $z^*$ decreases as the confidence level decreases.
d) Find that margin of error.
$ME = z^* \sqrt{\frac{\hat p(1-\hat p)}{n}} = 1.645 \times \sqrt{\frac{0.11 \times 0.89}{1003}} = 0.0163$
e) In general, if all other aspects of the situation remain the same, would smaller samples produce smaller or larger margins of error?
Larger.
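The margins of error in parts a), d) and e) can be reproduced in R. The sketch uses R's unrounded critical values; the smaller sample size of 250 in the last line is a hypothetical value chosen only to illustrate part e).

```r
# Chapter 19 #14: phat = 0.11, n = 1003.
phat <- 0.11; n <- 1003
se <- sqrt(phat * (1 - phat) / n)

me95 <- qnorm(0.975) * se      # part a): about 0.0194
me90 <- qnorm(0.95)  * se      # part d): about 0.0163, smaller than me95
me95_small <- qnorm(0.975) * sqrt(phat * (1 - phat) / 250)  # part e): hypothetical n = 250
round(c(me95 = me95, me90 = me90, me95_n250 = me95_small), 4)
```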
Chapter 19 (Page 506): #27:
In a random survey of 226 college students, 20 reported being “only” children. Use this sample to estimate the proportion of college students nationwide who are “only” children.
a) Check the conditions for constructing a confidence interval.
The students’ birth orders are likely to be independent. The sample was random and consisted of less than 10% of the population. There were 20 successes and 206 failures (both at least 10).
b) Construct a 95% confidence interval.
$\hat p = \frac{20}{226} = 0.0885$, $ME = 1.96 \sqrt{\frac{\hat p(1-\hat p)}{226}} = 0.0370$.
Hence the confidence interval is 0.0885 ± 0.0370 = (0.0515, 0.1255).
c) Interpret your interval.
We are 95% confident that between 5.15% and 12.55% of all college students are “only” children.
d) Explain what “95% confidence” means in this context.
If we were to select repeated samples like this, we’d expect about 95% of the confidence intervals we created to contain the true proportion of all college students who are “only” children.
Chapter 19 (Page 506): #28:
74% of 1644 randomly selected college freshmen returned to college the next year. Estimate the national freshman-to-sophomore retention rate.
a) Verify that the conditions are met.
It’s a random sample; both 74% and 26% of 1644 (about 1217 and 427) are greater than 10, and 1644 is far less than 10% of all college freshmen nationwide.
b) Construct a 98% confidence interval.
The critical value is invNorm((1+0.98)/2) = 2.326, hence the margin of error is 2.326 × sqrt(0.74 × 0.26 / 1644) = 0.0252.
Hence the confidence interval is 0.74 ± 0.0252 = (0.7148, 0.7652).
c) Interpret your interval.
We’re 98% confident that between 71.48% and 76.52% of college freshmen return to college for their sophomore year.
d) Explain what “98% confidence” means in this context.
If we were to select repeated samples like this, we’d expect about 98% of the confidence intervals we created to contain the true proportion of all college freshmen who return as sophomores.
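Part d) can be illustrated by simulation. The sketch assumes, purely hypothetically, that the true national retention rate is exactly 0.74, draws many samples of 1644 freshmen (modeled with `rbinom`, i.e. a very large population), and counts how often the 98% interval contains that assumed rate.

```r
# Hypothetical true retention rate p = 0.74; repeated samples of n = 1644.
set.seed(28)
p <- 0.74; n <- 1644; replica <- 10000
zstar <- qnorm((1 + 0.98) / 2)                     # about 2.326

phat  <- rbinom(replica, size = n, prob = p) / n   # one phat per simulated sample
me    <- zstar * sqrt(phat * (1 - phat) / n)       # margin of error for each sample
cover <- (phat - me <= p) & (p <= phat + me)       # does each interval contain p?
mean(cover)                                        # should be close to 0.98
```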
Sample size determination
At a university, it is believed that 25% of adults over 30 love Statistics. We wish to see whether this percentage is the same among the 18-to-25 age group.
a) How many people in this younger age group must we survey to estimate the proportion who love Statistics to within 5% with 90% confidence?
With 90% confidence, the critical value is $z^* = 1.645$. Thus
$n = \frac{(z^*)^2 \hat p(1-\hat p)}{ME^2} = \frac{1.645^2 \times 0.25 \times (1-0.25)}{0.05^2} = 202.9519$.
Rounding up, the required sample size is 203.
b) If we want to cut the margin of error in half, how many people in this younger age group must we survey? Do you have any concerns about this sample? Explain.
$n = \frac{(z^*)^2 \hat p(1-\hat p)}{ME^2} = \frac{1.645^2 \times 0.25 \times (1-0.25)}{0.025^2} = 811.8075$.
Rounding up, the required sample size is 812.
This large sample might be more than 10% of the population of 18-to-25-year-olds at the university, which would violate the 10% condition.
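A quick R check of both parts, using the guess $\hat p = 0.25$ and the rounded $z^* = 1.645$ from the slide. Halving the margin of error multiplies n by exactly 4, and using R's unrounded $z^*$ instead does not change the answer to part a).

```r
# Parts a) and b): guess phat = 0.25 (the value believed for adults over 30).
zstar <- 1.645                                 # rounded 90% critical value
phat  <- 0.25
n_a <- zstar^2 * phat * (1 - phat) / 0.05^2    # 202.9519 before rounding up
n_b <- zstar^2 * phat * (1 - phat) / 0.025^2   # 811.8075 before rounding up
c(n_a = ceiling(n_a), n_b = ceiling(n_b), ratio = n_b / n_a)

# With the unrounded z* = qnorm(0.95), part a) still needs 203 people.
ceiling(qnorm(0.95)^2 * phat * (1 - phat) / 0.05^2)
```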
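APPENDIX 1
R code for the example:
# Please import the data from the recitation 11 slides first; otherwise this won't work.
haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No", ...)  # truncated here; full vector in recitation 11
p <- mean(haswi == "Yes")          # population proportion
N <- length(haswi)                 # population size (400)
n <- 37                            # sample size
replica <- 10000                   # number of simulated samples
set.seed(241)
phats <- numeric(replica)
interval <- matrix(0, replica, 2)  # one row (lower, upper) per sample
zstar <- qnorm((1 + 0.95) / 2)     # 95% critical value
for (t in 1:replica) {
  mysamples <- haswi[sample(1:N, size = n)]   # simple random sample without replacement
  ph <- sum(mysamples == "Yes") / n           # sample proportion
  moe <- zstar * sqrt(ph * (1 - ph) / n)      # margin of error
  phats[t] <- ph
  interval[t, ] <- c(ph - moe, ph + moe)
}
phats <- na.omit(phats)
dev.new(width = 12, height = 6)    # portable replacement for win.graph(), which works only on Windows
par(xaxt = 'n', mar = c(.8, 2, .8, .8))
B <- 100                           # plot the first B intervals
plot(1:B, ylim = range(interval[1:B, ]) + c(-.01, .01), type = 'n', ylab = '', xlab = '')
grid(col = 'gray60')
abline(h = p, col = 'red', lwd = 2)   # true population proportion
for (t in 1:B) {
  lines(x = c(t, t), y = interval[t, ], col = 'gray40', lwd = 2)
  lines(x = t + c(-.3, .2), y = rep(interval[t, 1], 2), col = 'gray40', lwd = 2)
  lines(x = t + c(-.3, .2), y = rep(interval[t, 2], 2), col = 'gray40', lwd = 2)
  points(x = t, y = mean(interval[t, ]), pch = 16, cex = .8, col = 'blue2')
}
mean(p >= interval[1:B, 1] & p <= interval[1:B, 2])   # coverage among the B plotted intervals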
APPENDIX 2
R code for the plots showing the relationship between the margin of error and the sample proportion, sample size and confidence level.
# a(): margin of error for sample proportion p, confidence level z and sample size n
a = function(p=0.5,z=0.95,n=100) return(qnorm((1+z)/2)*sqrt(p*(1-p)/n))
ps = seq(0,1,length=1e3)
dev.new(width=9,height=4)   # portable replacement for win.graph(), which works only on Windows
par(mfrow=c(1,3), mar=c(4,4,0,0)+1, cex.lab=2, yaxt='n', cex.sub=1.3)
# Panel 1: ME versus phat
plot(a(ps)~ps, type='l', xlab=expression(hat(p)),ylab='margin of error', lwd=2,
sub="fixed confidence level = 95%, n = 100"); grid(col='gray70')
abline(v=0.5, col='red2',lwd=2); text(y=0,x=0.65,labels="p = 0.5",col='red2',cex=1.2)
# Panel 2: ME versus sample size n
ns <- seq(30,150,by=1)
plot(a(n=ns)~ns, type='l', xlab='sample size n',ylab='', lwd=2,
sub="fixed confidence level = 95%, p = 0.5"); grid(col='gray70')
# Panel 3: ME versus confidence level C
zs <- seq(0.8,0.99,length=1e3)
plot(a(z=zs)~zs, type='l', xlab='confidence level C',ylab='', lwd=2,
sub="fixed n = 100, p = 0.5"); grid(col='gray70')
Thank you.