Sampling Methods and Sampling Distributions

Learning Objectives

Explain Types of Samples

Describe the Properties of Estimators

Explain Sampling Distribution

Describe the Relationship between
Populations & Sampling Distributions

State the Central Limit Theorem

Solve Probability Problems Involving
Sampling Distributions
Sampling
Methods
Types of Samples
Nonprobability: Judgment, Quota, Chunk
Probability: Simple Random, Systematic, Stratified, Cluster
Simple Random Sample
1. Each Population Element
Has an Equal Chance of
Being Selected
2. Selecting 1 Subject Does
Not Affect Selecting
Others
3. May Use Random
Number Table, Lottery,
‘Fish Bowl’
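A simple random sample can be sketched in Python with the standard library. The sampling frame of 100 numbered elements and the function name `simple_random_sample` are assumptions made for illustration; they do not come from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(population, n)

# Hypothetical sampling frame: elements numbered 1..100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

`random.sample` plays the role of the lottery or fish bowl: it draws without replacement and gives every element the same chance of being chosen.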
Random Number Table
Row   Columns 01-05   Columns 06-10   Columns 11-15   Columns 16-20
01    49280           88924           35779           00283
02    61870           41657           07468           08612
03    43898           65923           25078           86129
Systematic Sample
1. Items of Population Are
Arranged in Some
Way, e.g., Alphabetically or
by Date Received
2. Every kth Element Is
Selected After a
Random Start within
the First k Elements
3. Used in Telephone
Surveys
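The "every kth element after a random start" rule can be sketched as follows. The frame of 100 ordered items and the name `systematic_sample` are hypothetical, chosen only to make the example runnable.

```python
import random

def systematic_sample(items, k, seed=None):
    """Take every kth item after a random start within the first k items."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k elements
    return items[start::k]

# Hypothetical ordered frame (e.g. sorted by date received)
frame = list(range(1, 101))
picked = systematic_sample(frame, k=10, seed=3)
print(picked)
```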
© 1984-1994 T/Maker Co.
Stratified Sample
1. Divide Population into Subgroups
   - Mutually Exclusive
   - Collectively Exhaustive
   - At Least 1 Common Characteristic of Interest
2. Select Simple Random Samples from Subgroups
[Figure: All Students divided into Commuters and Residents, with a Sample drawn from each subgroup]
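A minimal sketch of the two stratified-sampling steps, assuming a made-up student population already split into commuters and residents. The member labels, stratum sizes, and the equal allocation of 5 per stratum are illustrative assumptions.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from each subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: mutually exclusive and collectively exhaustive
strata = {
    "commuters": [f"C{i}" for i in range(60)],
    "residents": [f"R{i}" for i in range(40)],
}
chosen = stratified_sample(strata, n_per_stratum=5, seed=7)
print(chosen)
```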
Cluster Sample
1. Divide Population into Clusters
   - If Managers Are Elements, then Companies Are Clusters
2. Select Clusters Randomly
3. Survey All or a Random Sample of Elements in Each Selected Cluster
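Cluster sampling can be sketched the same way. The company names and manager labels below are invented for illustration, and this version surveys every element in each selected cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random, then keep every element in them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: list(clusters[name]) for name in picked}

# Hypothetical clusters: companies, whose elements are managers
companies = {
    "Acme": ["a1", "a2", "a3"],
    "Globex": ["g1", "g2"],
    "Initech": ["i1", "i2", "i3", "i4"],
    "Umbrella": ["u1"],
}
surveyed = cluster_sample(companies, n_clusters=2, seed=5)
print(surveyed)
```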
Nonprobability Samples
1. Judgment
Use Experience to Select Sample
e.g., Test Markets
2. Quota
Similar to Stratified Sampling Except
No Random Sampling
3. Chunk (Convenience)
Use Elements Most Available
Errors Due to Sampling
1. Sampling Error: Occurs Because a Sample Is Taken Instead of a Census
   - Errors are due to chance
   - Equally likely to be too high or too low
   - Reduced by increasing sample size
2. Nonsampling Error: Bias
   - A directional error
   - Cannot be reduced by increasing sample size
Sampling Distributions
Statistical Methods: Descriptive Statistics, Inferential Statistics
Inferential Statistics
1. Involves
   - Estimation
   - Hypothesis Testing
2. Purpose
   - Make Decisions about Population Characteristics
Inference Process
[Figure: a Sample is drawn from the Population; Sample Statistics ($\bar{X}$, $P_s$) yield Estimates & Tests about the Population]
Estimators
1. Random Variables Used to Estimate a
Population Parameter
-Sample Mean, Sample Proportion,
Sample Median
2. Sample Mean $\bar{X}$ Is an Estimator of
Population Mean $\mu$
If $\bar{X} = 3$, then 3 Is the Estimate of $\mu$
3. Theoretical Basis Is Sampling
Distribution
Properties of Mean
1. Unbiasedness
   - Mean of Sampling Distribution Equals Population Mean
2. Efficiency
   - Sample Mean Comes Closer to Population Mean Than Any Other Unbiased Estimator
3. Consistency
   - As Sample Size Increases, Variation of Sample Mean from Population Mean Decreases
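Unbiasedness
[Figure: sampling distributions $P(\bar{X})$ of an unbiased estimator A, centered at the population mean $\mu$, and a biased estimator C, centered away from $\mu$]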
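Efficiency
[Figure: two curves labeled A and B, both centered at $\mu$, comparing the Sampling Distribution of the Mean with the Sampling Distribution of the Median; horizontal axis $\bar{X}$, vertical axis $P(\bar{X})$]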
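Consistency
[Figure: two curves labeled A and B, both centered at $\mu$, comparing the sampling distribution for a Larger Sample Size with that for a Smaller Sample Size; horizontal axis $\bar{X}$, vertical axis $P(\bar{X})$]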
Sampling Distribution
1. Theoretical Probability Distribution
2. Random Variable Is a Sample Statistic
   - Sample Mean, Sample Proportion, etc.
3. Results from Drawing All Possible Samples of a Fixed Size
4. List of All Possible [$\bar{X}$, $P(\bar{X})$] Pairs
   - Sampling Distribution of Mean
Developing
Sampling Distributions

Suppose There’s a
Population ...

Population Size, N = 4

Random Variable, X,
Is # Errors in Work

Values of X: 1, 2, 3, 4

Uniform Distribution
Population Mean and Standard Deviation

X (# of errors)   $\mu$   $(X - \mu)$   $(X - \mu)^2$
1                 2.5     -1.5          2.25
2                 2.5     -0.5          0.25
3                 2.5      0.5          0.25
4                 2.5      1.5          2.25
Sum: 10                                 5.00

Population Characteristics
Summary Measures
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{10}{4} = 2.5$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{5}{4}} = 1.12$$
[Figure: Population Distribution, P(X) vs. X = 1, 2, 3, 4; each value has probability .25]
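The population summary measures can be checked in a few lines of Python. The variable names are illustrative; `pstdev` is the population standard deviation (divisor N), which matches the formula used for this population of N = 4.

```python
from statistics import mean, pstdev

population = [1, 2, 3, 4]   # number of errors, each value equally likely

mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation, divisor N

print(mu, round(sigma, 2))
```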
All Possible Samples of Size n = 2
(Sample With Replacement)

16 Samples
           2nd Observation
1st Obs    1      2      3      4
1          1,1    1,2    1,3    1,4
2          2,1    2,2    2,3    2,4
3          3,1    3,2    3,3    3,4
4          4,1    4,2    4,3    4,4

16 Sample Means
           2nd Observation
1st Obs    1      2      3      4
1          1.0    1.5    2.0    2.5
2          1.5    2.0    2.5    3.0
3          2.0    2.5    3.0    3.5
4          2.5    3.0    3.5    4.0
Sampling Distribution of All Sample Means

$\bar{X}$    f    $P(\bar{X})$
1.0    1    1/16
1.5    2    2/16
2.0    3    3/16
2.5    4    4/16
3.0    3    3/16
3.5    2    2/16
4.0    1    1/16
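The 16 samples and the frequency of each sample mean can be reproduced by brute-force enumeration. This is a sketch with illustrative variable names; `itertools.product` generates ordered samples with replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2

# All ordered samples of size n drawn with replacement: 4**2 = 16 of them
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

freq = Counter(means)  # how many samples give each mean
dist = {m: Fraction(f, len(samples)) for m, f in sorted(freq.items())}

for m, p in dist.items():
    print(m, freq[m], p)  # Fraction prints in lowest terms, e.g. 2/16 as 1/8
```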
Sampling Distribution of All Sample Means
[Figure: histogram of the sampling distribution, $P(\bar{X})$ vs. $\bar{X}$ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, built from the 16 sample means and peaking at $\bar{X}$ = 2.5]
$$\mu_{\bar{X}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = \frac{40}{16} = 2.5$$
(here N = 16 is the number of possible samples)

$\bar{X}$    $\mu_{\bar{X}}$    $(\bar{X} - \mu_{\bar{X}})$    $(\bar{X} - \mu_{\bar{X}})^2$
1.0    2.5    -1.5    2.25
1.5    2.5    -1.0    1.00
1.5    2.5    -1.0    1.00
2.0    2.5    -0.5    0.25
2.0    2.5    -0.5    0.25
2.0    2.5    -0.5    0.25
2.5    2.5     0.0    0.00
2.5    2.5     0.0    0.00
2.5    2.5     0.0    0.00
2.5    2.5     0.0    0.00
3.0    2.5     0.5    0.25
3.0    2.5     0.5    0.25
3.0    2.5     0.5    0.25
3.5    2.5     1.0    1.00
3.5    2.5     1.0    1.00
4.0    2.5     1.5    2.25
Sum: 40.0                     10.00
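Summary Measures of All Possible Sample Means
$$\mu_{\bar{X}} = \frac{\sum_{i=1}^{N} \bar{X}_i}{N} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$$
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(1.0 - 2.5)^2 + (1.5 - 2.5)^2 + \cdots + (4.0 - 2.5)^2}{16}} = \sqrt{\frac{10}{16}} = .79$$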
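Comparison of Population & Sampling Distribution
Population: $\mu = 2.5$, $\sigma = 1.12$
Sampling Distribution: $\mu_{\bar{X}} = 2.5$, $\sigma_{\bar{X}} = .79$
[Figure: Population distribution, P(X) vs. X = 1, 2, 3, 4, beside the Sampling Distribution, $P(\bar{X})$ vs. $\bar{X}$ = 1, 1.5, 2, 2.5, 3, 3.5, 4]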
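Standard Error of Mean
(Standard Deviation of the Sampling Distribution of Means)
1. Standard Deviation of All Possible Sample Means, $\bar{X}$
   - Measures Scatter in All Sample Means, $\bar{X}$
2. Less Than Population Standard Deviation (for n > 1)
3. Formula (Sampling With Replacement)
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \frac{\sigma}{\sqrt{n}}$$
where N is the number of possible samples and n is the sample size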
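The shortcut $\sigma / \sqrt{n}$ can be compared against the standard deviation of all possible sample means for the population 1, 2, 3, 4. This is a sketch; the helper name `se_by_enumeration` is an assumption made for illustration.

```python
from itertools import product
from math import sqrt
from statistics import pstdev

population = [1, 2, 3, 4]
sigma = pstdev(population)  # population standard deviation, about 1.12

def se_by_enumeration(n):
    """Standard deviation of the means of all samples of size n (with replacement)."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    return pstdev(means)

for n in (1, 2, 3, 4):
    # The two columns should agree: enumeration vs. sigma / sqrt(n)
    print(n, round(se_by_enumeration(n), 4), round(sigma / sqrt(n), 4))
```

For n = 2 both columns give about .79, the value obtained from the 16 sample means.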
Sampling Distribution of the Sample Means
Summary
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
when
Sampling is done with replacement
or
Population is infinite
or
n/N < .05
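Sampling from Normal Populations
1. Central Tendency
   $\mu_{\bar{X}} = \mu$
2. Dispersion (Sampling With Replacement)
   $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution with $\mu = 50$, $\sigma = 10$; Sampling Distributions centered at $\mu_{\bar{X}} = 50$, with $\sigma_{\bar{X}} = 5$ for n = 4 and $\sigma_{\bar{X}} = 2.5$ for n = 16]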
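Standardizing Sampling Distribution of Mean
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
[Figure: Sampling Distribution of $\bar{X}$ (mean $\mu_{\bar{X}}$, standard deviation $\sigma_{\bar{X}}$) mapped to the Standardized Normal Distribution of Z ($\mu_Z = 0$, $\sigma_Z = 1$)]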
Thinking Challenge
You’re an operations
analyst for AT&T. Long-distance telephone calls
are normally distributed
with $\mu = 8$ min. & $\sigma = 2$
min. If you select random
samples of 25 calls, what
percentage of the sample
means would be between
7.8 & 8.2 minutes?

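Sampling Distribution Solution*
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{7.8 - 8}{2 / \sqrt{25}} = -.50$$
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 8}{2 / \sqrt{25}} = .50$$
$\sigma_{\bar{X}} = 2 / \sqrt{25} = .4$
$P(7.8 \le \bar{X} \le 8.2) = P(-.50 \le Z \le .50) = .1915 + .1915 = .3830$
About 38.3% of the sample means would be between 7.8 & 8.2 minutes.
[Figure: Sampling Distribution of $\bar{X}$ with 7.8, 8, 8.2 marked, beside the Standardized Normal Distribution ($\sigma_Z = 1$) with -.50, 0, .50 marked; shaded area .3830]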
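The table lookup for this problem can be cross-checked with `statistics.NormalDist` from the Python standard library. This is a sketch with illustrative variable names; the exact normal area differs from the table value only in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 2.0, 25

se = sigma / sqrt(n)            # standard error of the mean
xbar_dist = NormalDist(mu, se)  # sampling distribution of the sample mean

# Probability that a sample mean falls between 7.8 and 8.2 minutes
prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(round(se, 2), round(prob, 4))
```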
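Sampling from Non-Normal Populations
1. Central Tendency
   $\mu_{\bar{X}} = \mu$
2. Dispersion (Sampling With Replacement)
   $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
[Figure: non-normal Population Distribution with $\mu = 50$, $\sigma = 10$; Sampling Distributions centered at $\mu_{\bar{X}} = 50$, with $\sigma_{\bar{X}} = 5$ for n = 4 and $\sigma_{\bar{X}} = 1.8$ for n = 30]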
Central Limit Theorem
For a population with a mean $\mu$ and a
standard deviation $\sigma$, the sampling
distribution of the means of all
possible samples of size n generated
from the population will be
approximately normally distributed
assuming that the sample size is
sufficiently large.
Central Limit Theorem
As sample size gets large enough ($n \ge 30$) ... the sampling distribution of $\bar{X}$ becomes almost normal.
Central Limit Theorem
1. The sampling distribution of means is a normal distribution if the population is normally distributed
2. Even if the population is not normally distributed, the sampling distribution of means is approximated by a normal distribution for large n ($n \ge 30$)
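A small simulation can illustrate the theorem. The exponential population (mean 1, standard deviation 1), the 5000 repetitions, and the function name are all assumptions chosen for illustration; any strongly skewed population would serve.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_sample_means(n, reps=5000, seed=0):
    """Means of `reps` random samples of size n from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

means_30 = simulate_sample_means(30)

# The sample means should center near the population mean (1.0) with a
# spread near sigma / sqrt(n) = 1 / sqrt(30)
print(round(mean(means_30), 3), round(pstdev(means_30), 3), round(1 / sqrt(30), 3))
```

Plotting a histogram of `means_30` would show a roughly bell-shaped curve even though the population itself is far from normal.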

Proportions

Categorical Variable (e.g., Gender)

% Population Having a Characteristic

If Two Outcomes, Binomial Distribution


Possess - Don’t Possess Characteristic
Sample Proportion Formula:
$$P_s = \frac{X}{n} = \frac{\text{number of successes}}{\text{sample size}}$$
Sampling Distribution of Proportion
1. Approximated by Normal Distribution when
   $n \cdot p \ge 5$ and $n \cdot (1 - p) \ge 5$
2. Mean
   $\mu_{P_s} = p$
3. Standard Error
   $\sigma_{P_s} = \sqrt{\frac{p (1 - p)}{n}}$
   where p = Population Proportion
[Figure: Sampling Distribution, $P(P_s)$ vs. $P_s$ from .0 to 1.0]
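Standardizing Sampling Distribution of Proportion
$$Z \cong \frac{P_s - \mu_{P_s}}{\sigma_{P_s}} = \frac{P_s - p}{\sqrt{\frac{p (1 - p)}{n}}}$$
[Figure: Sampling Distribution of $P_s$ (mean $\mu_{P_s}$, standard deviation $\sigma_{P_s}$) mapped to the Standardized Normal Distribution of Z ($\mu_Z = 0$, $\sigma_Z = 1$)]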
Thinking Challenge
You’re the manager of a
bank. 40% of depositors
have multiple accounts.
You select a random
sample of 200 customers.
What is the probability that
the sample proportion of
depositors with multiple
accounts would be
between 40% & 43% ?
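Solution*
$P(.40 \le P_s \le .43)$
Check: $n \cdot p = 200(.40) = 80 \ge 5$ and $n \cdot (1 - p) = 200(.60) = 120 \ge 5$
$$Z \cong \frac{P_s - p}{\sqrt{\frac{p (1 - p)}{n}}} = \frac{.43 - .40}{\sqrt{\frac{.40 (1 - .40)}{200}}} = .87$$
$\sigma_{P_s} = .0346$
$P(.40 \le P_s \le .43) = P(0 \le Z \le .87) = .3078$
[Figure: Sampling Distribution of $P_s$ with $\mu_{P_s} = .40$ and .43 marked, beside the Standardized Normal Distribution ($\sigma_Z = 1$) with 0 and .87 marked; shaded area .3078]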
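The same answer can be approximated in Python without a table. This is a sketch with illustrative variable names. Because the code keeps the unrounded z of about .866 rather than the table value .87, it gives a probability near .307 instead of .3078.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 200

# Normal approximation is reasonable when both counts are at least 5
assert n * p >= 5 and n * (1 - p) >= 5

se = sqrt(p * (1 - p) / n)         # standard error of the sample proportion
z = (0.43 - p) / se                # standardized upper limit
prob = NormalDist().cdf(z) - 0.5   # area between the mean (z = 0) and z

print(round(se, 4), round(z, 2), round(prob, 4))
```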
Sampling from Finite Populations
1. Modify Standard Error if Sample Size (n) Is Large Relative to Population Size (N)
   - n > .05·N (or n/N > .05)
2. Use Finite Population Correction (fpc) Factor for Standard Errors if n/N > .05
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
$$\sigma_{P_s} = \sqrt{\frac{p (1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$$
Sampling Distribution of the Sample Means
Summary
$\mu_{\bar{X}} = \mu$

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
when
Sampling is done with replacement
or
Population is infinite
or
n/N < .05

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
when
Sampling is without replacement
and
Population is finite
and
n/N > .05
Thinking Challenge
You’re the manager of a
bank. 40% of all 1000
depositors have multiple
accounts. You select a
random sample of 200
customers. What is the
probability that the
sample proportion of
depositors with multiple
accounts would be
between 40% & 43% ?
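Solution*
$P(.40 \le P_s \le .43)$
$$Z \cong \frac{P_s - p}{\sqrt{\frac{p (1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}} = \frac{.43 - .40}{\sqrt{\frac{.40 (1 - .40)}{200}} \sqrt{\frac{1000 - 200}{1000 - 1}}} = .97$$
$\sigma_{P_s} = .0310$
$P(.40 \le P_s \le .43) = P(0 \le Z \le .97) = .3340$
[Figure: Sampling Distribution of $P_s$ with $\mu_{P_s} = .40$ and .43 marked, beside the Standardized Distribution ($\sigma_Z = 1$) with 0 and .97 marked; shaded area .3340]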
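A sketch of the same calculation with the finite population correction applied, using illustrative variable names. Keeping the unrounded z (about .968) gives a probability near .333, slightly below the table value of .3340 for z = .97.

```python
from math import sqrt
from statistics import NormalDist

p, n, N = 0.40, 200, 1000

assert n / N > 0.05  # sample is large relative to the population, so apply the fpc

fpc = sqrt((N - n) / (N - 1))     # finite population correction factor
se = sqrt(p * (1 - p) / n) * fpc  # corrected standard error
z = (0.43 - p) / se
prob = NormalDist().cdf(z) - 0.5  # area between z = 0 and z

print(round(se, 4), round(z, 2), round(prob, 4))
```

The correction shrinks the standard error from .0346 to .0310, so the same interval covers more probability than in the uncorrected solution.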
Selecting a Sample Size
1. The Degree of Confidence Selected
2. The Maximum Allowable Error
3. The Population Standard Deviation
Sample Size for Means
$$n = \frac{z^2 \sigma^2}{E^2} = \left( \frac{z \sigma}{E} \right)^2$$
E is the allowable error
z is the z score associated with degree of confidence
$\sigma$ is the population standard deviation
The marketing manager
would like to estimate the
population mean annual
usage of home heating oil
to within 50 gallons of the
true value and desires to be 95% confident of
correctly estimating the true mean. Based on a
previous study taken last year, the marketing
manager feels that the standard deviation can be
estimated as 325 gallons. What sample
size is needed to obtain these results?
Confidence = 95%, so z = 1.96
E = 50
$\sigma = 325$
$$n = \frac{z^2 \sigma^2}{E^2} = \frac{(1.96)^2 (325)^2}{(50)^2} = \frac{(3.8416)(105,625)}{2500} = 162.31$$
Round up: n = 163 homes need to be sampled
Sample Size for Proportions
$$n = \frac{p (1 - p) z^2}{E^2}$$
E is the maximum allowable error
z is the z value associated with the degree of confidence
p is the estimated proportion
A political pollster would like to
estimate the proportion of voters who
will vote for the Democratic candidate
in a presidential campaign. The
pollster would like 95% confidence
that her prediction is correct to within
.04 of the true proportion. What
sample size is needed?

Confidence = 95%, so z = 1.96
E = .04
p = unknown, so use p = .5
$$n = \frac{p (1 - p) z^2}{E^2} = \frac{.5 (1 - .5)(1.96)^2}{(.04)^2} = 600.25$$
Round up: n = 601 voters
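The proportion version follows the same pattern. The function name and the default p = .5 are illustrative assumptions; p = .5 maximizes p(1 - p), so it gives the most conservative (largest) sample size when the proportion is unknown.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(error, confidence, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= error at the given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z value
    return ceil(p * (1 - p) * z ** 2 / error ** 2)  # always round up

n_voters = sample_size_for_proportion(error=0.04, confidence=0.95)
print(n_voters)
```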
Conclusion

Examined Sampling Methods

Described the Properties of Estimators

Explained Sampling Distribution

Described the Relationship between
Populations & Sampling Distributions

Stated the Central Limit Theorem

Solved Probability Problems Involving
Sampling Distributions
End of Chapter