Chapter 4
Simple Random Sampling
• Definition of a simple random sample (SRS) and how to select an SRS
• Estimation of the population mean and total; sample size for estimating the population mean and total
• Estimation of the population proportion; sample size for estimating the population proportion
• Comparing estimates

Simple Random Samples

• We want the sample to be representative of the population from which it is selected.
• Each individual in the population should have an equal chance to be selected.
• Is this good enough?
Example

Select a sample of high school students as follows:
1. Flip a fair coin.
2. If heads, select all female students in the school as the sample.
3. If tails, select all male students in the school as the sample.

• Each student has an equal chance to be in the sample.
• But every sample consists of a single gender, hardly representative.
• Each individual in the population has an equal chance to be selected. Is this good enough? NO!!
Simple Random Sample
A simple random sample (SRS) of size n consists of n units from the population chosen in such a way that every set of n units has an equal chance to be the sample actually selected.
Simple Random Samples (cont.)
• Suppose a large History class of 500 students has 250 male and 250 female students.
• To select a random sample of 250 students from the class, I flip a fair coin one time.
• If the coin shows heads, I select the 250 males as my sample; if the coin shows tails, I select the 250 females as my sample.
• What is the chance any individual student from the class is included in the sample? 1/2
• This is a random sample. Is it a simple random sample? NO!
• Not every possible group of 250 students has an equal chance to be selected.
• Every sample consists of only 1 gender, hardly representative.
Simple Random Samples (cont.)
• The easiest way to choose an SRS is with random numbers. Statistical software can generate random digits (e.g., Excel "=RAND()", the Ran# button on a calculator).
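Any software generator works the same way in practice. A minimal sketch (Python's random module standing in for Excel or a calculator; the helper name select_srs is ours, not from the slides):

import random

def select_srs(population, n, seed=None):
    """Select a simple random sample of size n: every set of n units
    is equally likely to be the chosen sample."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    return rng.sample(population, n)   # sampling without replacement

# Example: an SRS of 5 students from a class of 500 (labeled 1..500)
students = list(range(1, 501))
print(select_srs(students, 5, seed=2024))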
Example: simple random sample
• An academic department wishes to randomly choose a 3-member committee from the 28 members of the department. Label the members with two-digit numbers 01-28:

01 Abbott       08 Goodwin      15 Pillotte     22 Theobald
02 Cicirelli    09 Haglund      16 Raman        23 Vader
03 Crane        10 Johnson      17 Reimann      24 Wang
04 Dunsmore     11 Keegan       18 Rodriguez    25 Wieczoreck
05 Engle        12 Lechtenb'g   19 Rowe         26 Williams
06 Fitzpat'k    13 Martinez     20 Sommers      27 Wilson
07 Garcia       14 Nguyen       21 Stone        28 Zink
Solution
• Use a random number table; read 2-digit pairs until you have chosen 3 committee members.
• For example, start in row 121:
  71487 09984 29077 14863 61683 47052 62224 51025
• Reading pairs 71, 48, 70, 99, 84, 29, 07, ... and keeping only pairs between 01 and 28, the first three are 07, 22, and 10: Garcia (07), Theobald (22), Johnson (10).
• Your calculator generates random numbers; you can also generate random numbers using Excel.
Sampling Variability
• Suppose we had started in line 145?
  19687 12633 57857 95806 09931 02150 43163 58636
• Our sample would have been 19 Rowe, 26 Williams, 06 Fitzpatrick.
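The two-digit-pair rule is easy to automate. A small sketch (the helper name pick_committee is ours; it assumes repeated labels are skipped) reproduces both draws above:

def pick_committee(digit_rows, n_needed=3, low=1, high=28):
    """Read non-overlapping 2-digit pairs from rows of random digits,
    keeping the first n_needed distinct labels in [low, high]."""
    digits = "".join(digit_rows).replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if low <= label <= high and label not in chosen:
            chosen.append(label)
            if len(chosen) == n_needed:
                break
    return chosen

print(pick_committee(["71487 09984 29077 14863 61683 47052 62224 51025"]))  # [7, 22, 10]
print(pick_committee(["19687 12633 57857 95806 09931 02150 43163 58636"]))  # [19, 26, 6]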
Sampling Variability
• Samples drawn at random generally differ from one another.
• Each draw of random numbers selects different people for our sample.
• These differences lead to different values for the variables we measure.
• We call these sample-to-sample differences sampling variability.
• Variability is OK; bias is bad!!
Example: simple random sample
• Using Excel tools
• Using StatCrunch (NFL)
4.3 Estimation of the population mean

Usual estimator:
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Recall that $E(\bar{y}) = \mu$.

What about the variability of $\bar{y}$ from sample to sample?
4.3 Estimation of the population mean

For a simple random sample of size n chosen without replacement from a population of size N,
$V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$

The correction factor takes into account that an estimate based on a sample of n = 10 from a population of N = 20 items contains more information than a sample of n = 10 from a population of N = 20,000.
4.3 Estimating the variance of the sample mean

Recall the sample variance
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$

It can be shown (Appendix A) that
$E(s^2) = \frac{N}{N-1}\sigma^2$
4.3 Estimating the variance of the sample mean

So $V(\bar{y})$ can be unbiasedly estimated by
$\hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

$1 - \frac{n}{N}$ is called the finite population correction (fpc).
4.3 Estimating the variance of the sample mean

If $\frac{n}{N} < \frac{1}{20}$, the fpc is usually ignored, so
$\hat{V}(\bar{y}) \approx \frac{s^2}{n}$
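A minimal sketch of this estimator (the helper name var_hat_ybar is ours):

import statistics

def var_hat_ybar(sample, N):
    """Estimated variance of the sample mean under SRS without replacement:
    (1 - n/N) * s^2 / n, where s^2 is the usual sample variance."""
    n = len(sample)
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    return (1 - n / N) * s2 / n

# The sample {1, 2} from the population {1, 2, 3, 4} in the next slide
print(var_hat_ybar([1, 2], N=4))       # 0.125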
4.3 Example

Population {1, 2, 3, 4}; n = 2, equal weights

Sample   Pr. of sample   ȳ     s²    V̂(ȳ)
{1, 2}        1/6        1.5   0.5   0.125
{1, 3}        1/6        2.0   2.0   0.500
{1, 4}        1/6        2.5   4.5   1.125
{2, 3}        1/6        2.5   0.5   0.125
{2, 4}        1/6        3.0   2.0   0.500
{3, 4}        1/6        3.5   0.5   0.125

For example, for {1, 2}:
$\hat{V}(\bar{y}) = \left(1 - \frac{2}{4}\right)\frac{0.5}{2} = 0.125$
4.3 Example

Population {1, 2, 3, 4}; μ = 2.5, σ² = 5/4; n = 2, equal weights
(table of the six possible samples as above)

$E(\bar{y}) = 1.5\left(\tfrac{1}{6}\right) + 2.0\left(\tfrac{1}{6}\right) + 2.5\left(\tfrac{1}{6}\right) + 2.5\left(\tfrac{1}{6}\right) + 3.0\left(\tfrac{1}{6}\right) + 3.5\left(\tfrac{1}{6}\right) = 2.5 = \mu$

$V(\bar{y}) = (1.5 - 2.5)^2\left(\tfrac{1}{6}\right) + \cdots + (3.5 - 2.5)^2\left(\tfrac{1}{6}\right) = \frac{5}{12} = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$
4.3 Example

Population {1, 2, 3, 4}; μ = 2.5, σ² = 5/4; n = 2, equal weights
(table of the six possible samples as above)

$E(s^2) = \frac{0.5 + 2.0 + 4.5 + 0.5 + 2.0 + 0.5}{6} = \frac{5}{3} = \left(\frac{N}{N-1}\right)\sigma^2$

$E(\hat{V}(\bar{y})) = \frac{0.125 + 0.5 + 1.125 + 0.125 + 0.5 + 0.125}{6} = \frac{5}{12} = V(\bar{y})$
4.3 Example Summary

Population {1, 2, 3, 4}; μ = 2.5, σ² = 5/4; n = 2, equal weights
(table of the six possible samples as above)

$E(\bar{y}) = 2.5 = \mu, \qquad V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{5}{12}$

$E(\hat{V}(\bar{y})) = E\left[\left(1 - \frac{n}{N}\right)\frac{s^2}{n}\right] = \frac{5}{12} = V(\bar{y})$
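The whole summary can be checked by brute force, since there are only six possible samples. A short sketch (variable names are ours):

from itertools import combinations
from statistics import mean, variance

population = [1, 2, 3, 4]
N, n = len(population), 2

rows = []
for sample in combinations(population, n):
    ybar = mean(sample)
    s2 = variance(sample)                 # sample variance, divisor n - 1
    v_hat = (1 - n / N) * s2 / n          # estimated variance with fpc
    rows.append((sample, ybar, s2, v_hat))

# Each of the 6 samples has probability 1/6, so expectations are plain averages
print(mean(r[1] for r in rows))           # E(ybar)  = 2.5  = mu
print(mean(r[2] for r in rows))           # E(s^2)   = 5/3  = (N/(N-1)) * sigma^2
print(mean(r[3] for r in rows))           # E(V-hat) = 5/12 = V(ybar)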
4.3 Margin of error when estimating the population mean

Margin of error (MOE), also called the "bound on the error of estimation":
$\text{MOE} = t_{.025,\,n-1}\sqrt{\hat{V}(\bar{y})} = t_{.025,\,n-1}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$

Often the value of z from N(0, 1) is used instead:
$1.96\sqrt{\hat{V}(\bar{y})} = 1.96\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
t distributions

• Very similar to z ~ N(0, 1)
• Sometimes called Student's t distribution; Gosset, brewery employee
• Properties:
  i) symmetric around 0 (like z)
  ii) indexed by degrees of freedom ν:
      if ν > 1, E(t_ν) = 0;
      if ν > 2, Var(t_ν) = ν/(ν − 2), which is always bigger than 1.
Student's t Distribution

[Figure: t distribution with 10 df: P(t > 2.2281) = .025, P(t < -2.2281) = .025, central area .95.
 Standard normal: P(z > 1.96) = .025, P(z < -1.96) = .025, central area .95.]
Student's t Distribution

$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}, \qquad t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$

[Figure 11.3, page 372: density curves of Z and t on the same axes.]
Student's t Distribution

$t = \frac{\bar{y} - \mu}{s/\sqrt{n}}, \qquad \text{degrees of freedom} = n - 1, \qquad s = \sqrt{s^2}, \quad s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$

[Figure 11.3, page 372: density curves of Z, t with 1 df, and t with 7 df on the same axes.]
4.3 Margin of error when estimating the population mean

Adding the MOE to ȳ and subtracting the MOE from ȳ gives a 95% confidence interval:
$\bar{y} \pm t_{.025,\,n-1}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}
\quad\text{or}\quad
\bar{y} \pm 1.96\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
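A sketch of this interval in code (the helper name ci_mean is ours; scipy supplies the t quantile):

import math
import statistics
from scipy import stats

def ci_mean(sample, N, conf=0.95):
    """Confidence interval for the population mean under SRS without replacement."""
    n = len(sample)
    ybar = statistics.mean(sample)
    s2 = statistics.variance(sample)
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    moe = t_crit * math.sqrt((1 - n / N) * s2 / n)
    return ybar - moe, ybar + moe

# Illustrative data only: a sample of 10 measurements from a population of N = 200
print(ci_mean([12, 15, 11, 14, 13, 16, 12, 15, 14, 13], N=200))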
4.3 Margin of error when estimating the population mean

• Understanding confidence intervals; behavior of confidence intervals.
4.3 Margin of error when estimating the population mean

More generally, a 100(1 − α)% confidence interval is
$\bar{y} \pm t_{\alpha/2,\,n-1}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}
\quad\text{or}\quad
\bar{y} \pm z_{\alpha/2}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
Comparing t and z Critical Values

Conf. level   z         t (n = 30, df = 29)
90%           1.645     1.6991
95%           1.96      2.0452
98%           2.33      2.4620
99%           2.58      2.7564
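These critical values can be reproduced with scipy (a quick check, not part of the original slides):

from scipy import stats

for conf in (0.90, 0.95, 0.98, 0.99):
    tail = (1 - conf) / 2
    z_crit = stats.norm.ppf(1 - tail)       # standard normal critical value
    t_crit = stats.t.ppf(1 - tail, df=29)   # t critical value for n = 30
    print(f"{conf:.0%}: z = {z_crit:.3f}, t = {t_crit:.4f}")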
4.4 Determining Sample Size to Estimate μ

Required Sample Size to Estimate a Population Mean μ

• If you desire a C% confidence interval for a population mean μ with an accuracy specified by you, how large does the sample size need to be?
• We will denote the accuracy by MOE, which stands for Margin of Error.
Example: Sample Size to Estimate a Population Mean μ

• Suppose we want to estimate the unknown mean height μ of male students at NC State with a confidence interval.
• We want to be 95% confident that our estimate is within .5 inch of μ.
• How large does our sample size need to be?
Confidence Interval for μ

In terms of the margin of error MOE, the CI for μ can be expressed as
$\bar{y} \pm \text{MOE}$

The confidence interval for μ is
$\bar{y} \pm t^*_{n-1}\left(\frac{s}{\sqrt{n}}\right)$, so $\text{MOE} = t^*_{n-1}\left(\frac{s}{\sqrt{n}}\right)$

So we can find the sample size by solving this equation for n:
$\text{MOE} = t^*_{n-1}\left(\frac{s}{\sqrt{n}}\right)$, which gives $n = \left(\frac{t^*_{n-1}\, s}{\text{MOE}}\right)^2$

• Good news: we have an equation.
• Bad news:
  1. We need to know s.
  2. We don't know n, so we don't know the degrees of freedom needed to find t*_{n-1}.
A Way Around this Problem: Use the Standard Normal

Use the corresponding z* from the standard normal to form the equation
$\text{MOE} = z^*\left(\frac{s}{\sqrt{n}}\right)$

Solve for n:
$n = \left(\frac{z^*\, s}{\text{MOE}}\right)^2$
Sampling distribution of ȳ

[Figure: sampling distribution of ȳ with confidence level .95; the central 95% lies between μ − 1.96σ/√n and μ + 1.96σ/√n, each a distance MOE from μ.]

Set $\text{MOE} = 1.96\frac{\sigma}{\sqrt{n}}$ and solve for n:
$n = \left(\frac{1.96\,\sigma}{\text{MOE}}\right)^2$
Estimating s

• Use previously collected data or prior knowledge of the population.
• If the population is normal or near-normal, then s can be conservatively estimated by
  $s \approx \frac{\text{range}}{6}$
  since 99.7% of observations lie within 3σ of the mean.
Example: sample size to estimate the mean height µ of NCSU undergraduate male students

$n = \left(\frac{z^*\, s}{\text{MOE}}\right)^2$

• We want to be 95% confident that we are within .5 inch of µ, so MOE = .5 and z* = 1.96.
• Suppose previous data indicate that s is about 2 inches.
• n = [(1.96)(2)/(.5)]² = 61.47
• We should sample 62 male students.
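A one-line version of this calculation (the helper name n_for_mean is ours; it rounds up as the slide does):

import math

def n_for_mean(z_star, s, moe):
    """Sample size for estimating a mean to within moe, ignoring the fpc."""
    return math.ceil((z_star * s / moe) ** 2)

print(n_for_mean(1.96, 2, 0.5))    # 62 (heights example)
print(n_for_mean(2.33, 85, 25))    # 63 (textbook-cost example on the next slide)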
Example: Sample Size to Estimate a Population Mean μ - Textbooks

Suppose the financial aid office wants to estimate the mean NCSU semester textbook cost μ within MOE = $25 with 98% confidence. How many students should be sampled? Previous data show σ is about $85.

$n = \left(\frac{z^*\sigma}{\text{MOE}}\right)^2 = \left(\frac{(2.33)(85)}{25}\right)^2 = 62.76$; round up to n = 63.
Example: Sample Size to Estimate a Population Mean μ - NFL footballs

• The manufacturer of NFL footballs uses a machine to inflate new footballs.
• The mean inflation pressure is 13.0 psi, but random factors cause the final inflation pressure of individual footballs to vary from 12.8 psi to 13.2 psi.
• After throwing several interceptions in a game, Tom Brady complains that the balls are not properly inflated.
• The manufacturer wishes to estimate the mean inflation pressure to within .025 psi with a 99% confidence interval. How many footballs should be sampled?

$n = \left(\frac{z^*\sigma}{\text{MOE}}\right)^2$

• 99% confidence ⇒ z* = 2.58; MOE = .025
• σ = ? Inflation pressures range from 12.8 to 13.2 psi, so range = 13.2 − 12.8 = .4 and σ ≈ range/6 = .4/6 = .067

$n = \left(\frac{(2.58)(.067)}{.025}\right)^2 = 47.8 \Rightarrow 48$
Required Sample Size to Estimate a Population Mean μ

The formula
$n = \left(\frac{z^*\, s}{\text{MOE}}\right)^2$
assumes an infinite population or sampling with replacement (so no fpc).

• It is frequently the case that we are sampling without replacement.
Required Sample Size to Estimate a Population Mean μ When Sampling Without Replacement

100(1 − α)% confidence interval:
$\bar{y} \pm t_{\alpha/2,\,n-1}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$

We can't use this, since we don't know n; instead use
$\bar{y} \pm z_{\alpha/2}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
so that
$\text{MOE} = z_{\alpha/2}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
Required Sample Size to Estimate a Population Mean μ When Sampling Without Replacement

Set
$\text{MOE} = z_{\alpha/2}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
and solve for n:
$n = \frac{z_{\alpha/2}^2\, s^2}{(\text{MOE})^2 + \dfrac{z_{\alpha/2}^2\, s^2}{N}}$
Required Sample Size to Estimate a Population Mean μ When Sampling Without Replacement

$n = \frac{z_{\alpha/2}^2\, s^2}{(\text{MOE})^2 + \dfrac{z_{\alpha/2}^2\, s^2}{N}}
 = \frac{\dfrac{z_{\alpha/2}^2\, s^2}{(\text{MOE})^2}}{1 + \dfrac{1}{N}\dfrac{z_{\alpha/2}^2\, s^2}{(\text{MOE})^2}}
 = \frac{n_0}{1 + \dfrac{n_0}{N}}, \qquad \text{where } n_0 = \frac{z_{\alpha/2}^2\, s^2}{(\text{MOE})^2}$
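A short sketch of this rule (the helper name n_for_mean_fpc is ours):

import math

def n_for_mean_fpc(z, s, moe, N):
    """Sample size for estimating a mean to within moe under SRS
    without replacement from a population of size N."""
    n0 = (z * s / moe) ** 2          # sample size ignoring the fpc
    return math.ceil(n0 / (1 + n0 / N))

# Illustrative: the heights example (s = 2, MOE = .5) with an assumed N = 10000
print(n_for_mean_fpc(1.96, 2, 0.5, 10000))   # 62 (the fpc barely matters here)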
4.3 Estimation of the population total τ

Since τ = Nμ, the estimator of τ is N times the estimator ȳ of μ, and the MOE for estimating the total is N times the MOE for estimating the mean.
4.3 Estimation of the population total τ

$\hat{\tau} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i$

$V(\hat{\tau}) = V(N\bar{y}) = N^2 V(\bar{y}) = N^2\left(\frac{N-n}{N-1}\right)\frac{\sigma^2}{n}$

$\hat{V}(\hat{\tau}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

Margin of error (MOE):
$t_{.025,\,n-1}\sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}
\quad\text{or}\quad
1.96\sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$
Required Sample Size to Estimate a Population Total τ

$\text{MOE} = z_{\alpha/2}\sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$

Solving for n:
$n = \frac{z_{\alpha/2}^2\, N^2 s^2}{(\text{MOE})^2 + z_{\alpha/2}^2\, N s^2}
 = \frac{N^2 n_0}{1 + N n_0},
\qquad \text{where } n_0 = \frac{z_{\alpha/2}^2\, s^2}{(\text{MOE})^2}$
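The same rule in code, shown only to illustrate the algebra (the helper name n_for_total and the numbers in the example are ours):

import math

def n_for_total(z, s, moe, N):
    """Sample size for estimating a population total to within moe
    under SRS without replacement from a population of size N."""
    n0 = (z * s / moe) ** 2
    return math.ceil(N**2 * n0 / (1 + N * n0))

# Illustrative: N = 400 units, s about 5, estimate the total to within 200.
# Estimating the total to within N*moe is the same as estimating the mean
# to within moe, so this agrees with n_for_mean_fpc above.
print(n_for_total(1.96, 5, 200, 400))   # 196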
4.3 Estimation of the population total τ

• Example: estimate the number of lakes in Minnesota, the "Land of 10,000 Lakes".
4.5 Estimation of the population proportion p

• We are interested in the proportion p of a population that has a characteristic of interest.
• Estimate p with a sample proportion.
• http://packpoll.com/
4.5 Estimation of the population proportion p

The data:
$y_i = \begin{cases} 1 & \text{if item } i \text{ has the characteristic of interest} \\ 0 & \text{if item } i \text{ does not have the characteristic} \end{cases}$

$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$

Underlying model: each $y_i \sim B(1, p)$, so $\sum_{i=1}^{n} y_i \sim B(n, p)$ with
$E\left(\sum_{i=1}^{n} y_i\right) = np, \qquad V\left(\sum_{i=1}^{n} y_i\right) = np(1-p)$
4.5 Estimation of the population proportion p

$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad E\left(\sum_{i=1}^{n} y_i\right) = np, \qquad V\left(\sum_{i=1}^{n} y_i\right) = np(1-p)$

So:
$E(\hat{p}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n} E\left(\sum_{i=1}^{n} y_i\right) = \frac{np}{n} = p$

$V(\hat{p}) = V\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2} V\left(\sum_{i=1}^{n} y_i\right) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$
4.5 Estimation of the population proportion p

$\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}$

$\text{MOE} = z\sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}}$
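A compact sketch of these two formulas (the helper name moe_proportion and the numbers in the example are ours):

import math

def moe_proportion(p_hat, n, N, z=1.96):
    """Margin of error for a sample proportion under SRS without replacement."""
    return z * math.sqrt((1 - n / N) * p_hat * (1 - p_hat) / (n - 1))

# Illustrative: p-hat = 0.55 from a poll of n = 500 students out of N = 30000
print(moe_proportion(0.55, 500, 30000))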
Required Sample Size to Estimate a Population Proportion p When Sampling Without Replacement

Since $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$, we can use the sample size formula for μ:
$n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \qquad \text{where } n_0 = \frac{z^2 s^2}{(\text{MOE})^2}$

Since $V(y_i) = \sigma^2 = p(1-p)$, for $s^2$ we can use $p(1-p)$ (use prior information about p, or $p = \tfrac{1}{2}$).
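As a sketch (the helper name n_for_proportion and the example numbers are ours; p = 1/2 is the conservative default from the slide):

import math

def n_for_proportion(moe, N, p=0.5, z=1.96):
    """Sample size for estimating a proportion to within moe under SRS
    without replacement; p = 0.5 gives the conservative (largest) answer."""
    n0 = z**2 * p * (1 - p) / moe**2
    return math.ceil(n0 / (1 + n0 / N))

# Illustrative: a campus poll to within 3 points, N = 30000 students
print(n_for_proportion(0.03, 30000))    # about 1031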
4.6 Comparing Estimates

We often want to compare the means μ₁ and μ₂ of two populations, or the proportions p₁ and p₂.
• To compare μ₁ and μ₂, we estimate the difference μ₁ − μ₂.
• To compare p₁ and p₂, we estimate the difference p₁ − p₂.
4.6 Comparing Estimates: Comparing Means

Background: for random variables X and Y,
$E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y$
$V(X - Y) = V(X) + V(Y) - 2\,\mathrm{Cov}(X, Y)$
If X and Y are independent, then Cov(X, Y) = 0 (we will focus on the independent case).
4.6 Comparing Estimates: Comparing Means

$x_1, x_2, \ldots, x_{n_1}$: random sample from population 1 ($\mu_1, \sigma_1$ unknown)
$y_1, y_2, \ldots, y_{n_2}$: random sample from population 2 ($\mu_2, \sigma_2$ unknown)
The samples are independent random samples from the two populations.

$E(\bar{x} - \bar{y}) = E(\bar{x}) - E(\bar{y}) = \mu_1 - \mu_2$
$V(\bar{x} - \bar{y}) = V(\bar{x}) + V(\bar{y})$
$\hat{V}(\bar{x}) = \left(1 - \frac{n_1}{N_1}\right)\frac{s_1^2}{n_1}, \qquad \hat{V}(\bar{y}) = \left(1 - \frac{n_2}{N_2}\right)\frac{s_2^2}{n_2}$

Population 1: parameters µ₁ and σ₁² (values unknown); sample size n₁; statistics x̄₁ and s₁²
Population 2: parameters µ₂ and σ₂² (values unknown); sample size n₂; statistics x̄₂ and s₂²
Estimate µ₁ − µ₂ with x̄₁ − x̄₂.
Sampling distribution model for x̄₁ − x̄₂?

$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad SD(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

With the fpc:
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\left(1 - \frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right)\frac{s_2^2}{n_2}}$

Shape? The standardized statistic
$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}$
has approximately a t distribution with
$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$

A simpler estimate of the degrees of freedom is min(n₁ − 1, n₂ − 1).
4.6 Comparing Estimates: Comparing Means

Bound on the error of estimation:
$t_{\alpha/2,\,df}\sqrt{\left(1 - \frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right)\frac{s_2^2}{n_2}}
\quad\text{or}\quad
z\sqrt{\left(1 - \frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right)\frac{s_2^2}{n_2}}$
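A sketch of this bound, z version (the helper name boe_diff_means and the example numbers are ours):

import math

def boe_diff_means(s1, n1, N1, s2, n2, N2, z=1.96):
    """Bound on the error of estimation for xbar1 - xbar2 under independent
    SRS without replacement from two populations (z version)."""
    v1 = (1 - n1 / N1) * s1**2 / n1
    v2 = (1 - n2 / N2) * s2**2 / n2
    return z * math.sqrt(v1 + v2)

# Illustrative numbers only
print(boe_diff_means(s1=3.1, n1=50, N1=1000, s2=2.8, n2=60, N2=1200))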
4.6 Comparing Estimates: Comparing Means (Special Case, Seldom Used)

Assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Pooled estimate of the common variance:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$V(\bar{x} - \bar{y}) = V(\bar{x}) + V(\bar{y}) = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}$

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}, \qquad df = n_1 + n_2 - 2$

$\hat{V}(\bar{x}) = \left(1 - \frac{n_1}{N_1}\right)\frac{s_p^2}{n_1}, \qquad \hat{V}(\bar{y}) = \left(1 - \frac{n_2}{N_2}\right)\frac{s_p^2}{n_2}$

BOE:
$t_{\alpha/2,\,n_1+n_2-2}\sqrt{\left(1 - \frac{n_1}{N_1}\right)\frac{s_p^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right)\frac{s_p^2}{n_2}}
\quad\text{or}\quad
z\sqrt{\left(1 - \frac{n_1}{N_1}\right)\frac{s_p^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right)\frac{s_p^2}{n_2}}$
4.6 Comparing Estimates: Comparing Proportions, Two Cases

• Difference between two polls: difference of proportions between 2 independent polls.
• Differences within a single poll question: comparing proportions for a single poll question, e.g. horse-race polls (dependent proportions).
4.6 Comparing Estimates: Comparing Proportions in Two Independent Polls

$\hat{p}_1$ estimates the population proportion $p_1$ in poll #1
$\hat{p}_2$ estimates the population proportion $p_2$ in poll #2
The polls are independent, so
$V(\hat{p}_1 - \hat{p}_2) = V(\hat{p}_1) + V(\hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$
4.6 Comparing Estimates: Comparing Proportions in Two Independent Polls

$\hat{V}(\hat{p}_1 - \hat{p}_2) = \hat{V}(\hat{p}_1) + \hat{V}(\hat{p}_2)
 = \left(1 - \frac{n_1}{N_1}\right)\frac{\hat{p}_1(1-\hat{p}_1)}{n_1 - 1} + \left(1 - \frac{n_2}{N_2}\right)\frac{\hat{p}_2(1-\hat{p}_2)}{n_2 - 1}$

$\text{BOE} = z\sqrt{\left(1 - \frac{n_1}{N_1}\right)\frac{\hat{p}_1(1-\hat{p}_1)}{n_1 - 1} + \left(1 - \frac{n_2}{N_2}\right)\frac{\hat{p}_2(1-\hat{p}_2)}{n_2 - 1}}$
4.6 Comparing Estimates: Comparing Dependent Proportions in a Single Poll

• Multinomial sampling situation: typically 3 or more choices in a poll.

$V(\hat{p}_1 - \hat{p}_2) = \frac{(p_1 + p_2) - (p_1 - p_2)^2}{n} = V(\hat{p}_1) + V(\hat{p}_2) + 2\frac{p_1 p_2}{n}$

$\hat{V}(\hat{p}_1 - \hat{p}_2) = \hat{V}(\hat{p}_1) + \hat{V}(\hat{p}_2) + 2\frac{\hat{p}_1\hat{p}_2}{n-1}
 = \left(1 - \frac{n}{N}\right)\frac{\hat{p}_1(1-\hat{p}_1)}{n-1} + \left(1 - \frac{n}{N}\right)\frac{\hat{p}_2(1-\hat{p}_2)}{n-1} + 2\frac{\hat{p}_1\hat{p}_2}{n-1}$
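A sketch of the corresponding bound, z version (the helper name boe_diff_dependent and the example numbers are ours; it follows the estimator above, with the fpc on the two variance terms):

import math

def boe_diff_dependent(p1_hat, p2_hat, n, N, z=1.96):
    """Bound on the error for p1_hat - p2_hat from a single multinomial poll,
    using the variance estimator above."""
    fpc = 1 - n / N
    v = (fpc * p1_hat * (1 - p1_hat) / (n - 1)
         + fpc * p2_hat * (1 - p2_hat) / (n - 1)
         + 2 * p1_hat * p2_hat / (n - 1))
    return z * math.sqrt(v)

# Illustrative: a horse-race poll with candidate shares 46% and 42%, n = 800, N = 200000
print(boe_diff_dependent(0.46, 0.42, 800, 200000))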
Worksheet
• http://packpoll.com/
End of Chapter 4