Sampling and Sampling Distributions
Sumeetha Sharma
Sampling Methods
Technical Terminology
• An element is an object on which a measurement is
taken.
• A population is a collection of elements about which
we wish to make an inference.
• Sampling units are nonoverlapping collections of
elements from the population that cover the entire
population.
Technical Terms
• A sampling frame is a list of sampling units.
• A sample is a collection of sampling units drawn from
a sampling frame.
• Parameter: numerical characteristic of a population
• Statistic: numerical characteristic of a sample
Errors of nonobservation
• The deviation between an estimate from an ideal
sample and the true population value is the
sampling error.
• Almost always, the sampling frame does not
match up perfectly with the target population,
leading to errors of coverage.
Errors of observation
• These errors can be classified as due to
the interviewer, respondent, instrument, or
method of data collection.
Why sample?
• The population of interest is usually
too large to attempt to survey all of
its members.
• A carefully chosen sample can be
used to represent the population.
– The sample reflects the characteristics
of the population from which it is drawn.
Probability versus
Nonprobability
• Probability Samples: each member of the population
has a known non-zero probability of being selected
– Methods include simple random sampling, systematic
sampling, stratified sampling and cluster sampling.
• Nonprobability Samples: members are selected from
the population in some nonrandom manner
– Methods include convenience sampling, judgment sampling,
and quota sampling.
Simple Random Sampling
Simple Random sampling is the purest form of
probability sampling.
• Each member of the population has an equal
and known chance of being selected.
• For very large populations, it is often difficult
to identify every member of the population, so
the pool of available subjects becomes biased.
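A minimal Python sketch of simple random sampling, assuming the sampling frame is available as a list. The frame of member IDs, the seed, and the helper name `simple_random_sample` are illustrative assumptions, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has the same chance n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame of 1,000 member IDs.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, n=50, seed=42)
print(len(sample), "units drawn, all distinct:", len(set(sample)) == len(sample))
```

Fixing the seed only makes the draw reproducible; it does not change the selection probabilities.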
Systematic Sampling
• Systematic sampling is often used instead of
random sampling. It is also called an Nth name
selection technique.
• After the required sample size has been calculated,
every Nth record is selected from a list of population
members.
• As long as the list does not contain any hidden
order, this sampling method is as good as the
random sampling method.
• Its only advantage over the random sampling
technique is simplicity (and possibly cost
effectiveness).
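The "every Nth record" rule can be sketched as follows. The random starting point within the first interval is a common convention assumed here, and the frame and helper name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th record after a random start, with k = len(frame) // n."""
    k = len(frame) // n              # sampling interval (the "N" in "every Nth record")
    if k < 1:
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)
    start = rng.randrange(k)         # assumed: random start inside the first interval
    return frame[start::k][:n]

# Hypothetical list of 1,000 population members.
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=50, seed=1)
print(sample[:5])
```

If the list has a hidden periodic order that lines up with k, the selected records will share that pattern, which is the caveat noted above.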
Stratified Sampling
• Stratified sampling is a commonly used probability
method that can be superior to simple random sampling
because it reduces sampling error.
• A stratum is a subset of the population whose members
share at least one common characteristic, such as males
or females.
– Identify the relevant strata and their actual representation in
the population.
– Random sampling is then used to select a sufficient
number of subjects from each stratum.
– Stratified sampling is often used when one or more of the
strata in the population have a low incidence relative to
the other strata.
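A sketch of the two steps above, assuming proportional allocation (each stratum's share of the sample matches its share of the population). The population, the `stratum_of` function, and the helper name are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Allocate n proportionally across strata, then draw a random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = {}
    for name, members in strata.items():
        # Rounded proportional share, with at least one unit per stratum.
        n_h = max(1, round(n * len(members) / len(units)))
        sample[name] = rng.sample(members, min(n_h, len(members)))
    return sample

# Hypothetical population: 600 units labelled "F" and 400 labelled "M".
units = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
sample = stratified_sample(units, stratum_of=lambda u: u[0], n=100, seed=7)
print({name: len(drawn) for name, drawn in sample.items()})
```

For a low-incidence stratum, the `max(1, ...)` floor could be raised to oversample it; that choice is outside what the slides specify.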
Cluster Sampling
• Cluster Sample: a probability sample in which each
sampling unit is a collection of elements.
• Effective under the following conditions:
– A good sampling frame is not available or costly, while a
frame listing clusters is easily obtained
– The cost of obtaining observations increases as the distance
separating the elements increases
• Examples of clusters:
– City blocks – political or geographical
– Housing units – college students
– Hospitals – illnesses
– Automobile – set of four tires
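A small sketch of cluster sampling, assuming a one-stage design in which whole clusters are drawn at random and every element inside a chosen cluster is observed. The block and household names are made up for illustration.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: draw m clusters and keep all of their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # the frame lists clusters, not elements
    return {c: clusters[c] for c in chosen}

# Hypothetical frame: 20 city blocks, each containing 5 households.
clusters = {f"block_{b}": [f"block_{b}/house_{h}" for h in range(5)] for b in range(20)}
sample = cluster_sample(clusters, m=4, seed=3)
print(sorted(sample))
```

Only a list of clusters is needed to draw the sample, which matches the first condition listed above.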
Convenience Sampling
• Convenience sampling is used in exploratory
research where the researcher is interested in getting
an inexpensive approximation.
• The sample units are selected because they are convenient to reach.
• It is a nonprobability method.
– Often used during preliminary research efforts to get an
estimate without incurring the cost or time required to select a
random sample
Judgment Sampling
• Judgment sampling is a common
nonprobability method.
• The sample is selected based upon judgment.
– an extension of convenience sampling
• When using this method, the researcher must
be confident that the chosen sample is truly
representative of the entire population.
Quota Sampling
• Quota sampling is the nonprobability equivalent of
stratified sampling.
– First identify the strata and their proportions as
they are represented in the population.
– Then convenience or judgment sampling is used to
select the required number of subjects from each
stratum.
Sample Size?
• The more heterogeneous a population is, the larger
the sample needs to be.
• Depends on the topic: how frequently does the
characteristic of interest occur?
• For probability sampling, the larger the sample size,
the better.
• Nonprobability samples are not generalizable regardless
of size, but the stability of results should still be considered.
Sampling Distribution
Introduction
• In practice, calculating population parameters
directly is usually prohibitive because
populations are very large.
• Rather than investigating the whole
population, we take a sample, calculate a
statistic related to the parameter of interest,
and make an inference.
• The sampling distribution of the statistic is
the tool that tells us how close the statistic
is to the parameter.
Sample Statistics as Estimators
of Population Parameters
• A sample statistic is a
numerical measure of a
summary characteristic
of a sample.
• A population parameter
is a numerical measure of
a summary characteristic
of a population.
• An estimator of a population parameter is a sample
statistic used to estimate or predict the population
parameter.
• An estimate of a parameter is a particular numerical
value of a sample statistic obtained through
sampling.
• A point estimate is a single value used as an
estimate of a population parameter.
Estimators
• The sample mean, $\bar{X}$, is the most common
estimator of the population mean, $\mu$.
• The sample variance, $s^2$, is the most common
estimator of the population variance, $\sigma^2$.
• The sample standard deviation, $s$, is the most
common estimator of the population standard
deviation, $\sigma$.
• The sample proportion, $\hat{p}$, is the most common
estimator of the population proportion, $p$.
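As a rough illustration, the four estimators can be computed with Python's standard `statistics` module. The sample values and the "greater than 5" criterion for the proportion are invented for this sketch.

```python
import statistics

# Hypothetical sample of 8 measurements (illustrative values only).
sample = [4, 8, 6, 5, 3, 7, 9, 6]

x_bar = statistics.mean(sample)       # point estimate of the population mean
s2 = statistics.variance(sample)      # sample variance, divides by n - 1
s = statistics.stdev(sample)          # sample standard deviation
# Sample proportion of observations greater than 5 (assumed criterion).
p_hat = sum(x > 5 for x in sample) / len(sample)

print(x_bar, s2, s, p_hat)
```

Each printed number is an estimate: a particular value of the estimator for this one sample.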
Sampling Distribution of $\bar{X}$
• The sampling distribution of $\bar{X}$ is the
probability distribution of all possible values
the random variable $\bar{X}$ may assume when a
sample of size n is taken from a specified
population.
Sampling Distribution of the Mean
• An example
– A die is thrown infinitely many times. Let X
represent the number of spots showing on
any throw.
– The probability distribution of X is

x      1    2    3    4    5    6
p(x)  1/6  1/6  1/6  1/6  1/6  1/6

$$E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \dots + 6(1/6) = 3.5$$
$$V(X) = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + \dots + (6-3.5)^2(1/6) = 2.92$$
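The two sums can be checked with exact fractions. This is only a verification sketch of the arithmetic for a fair die; the variable names are arbitrary.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                                   # each face is equally likely

mean = sum(x * p for x in faces)                     # E(X) = 7/2
variance = sum((x - mean) ** 2 * p for x in faces)   # V(X) = 35/12

print(float(mean), round(float(variance), 2))        # 3.5 2.92
```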
Throwing a die twice – sampling
distribution of sample mean
• Suppose we want to estimate $\mu$
from the mean $\bar{x}$ of a sample of
size n = 2.
• What is the distribution of $\bar{x}$?
Throwing a die twice – sample
mean
Sample  Throws  Mean    Sample  Throws  Mean    Sample  Throws  Mean
   1     1,1    1.0       13     3,1    2.0       25     5,1    3.0
   2     1,2    1.5       14     3,2    2.5       26     5,2    3.5
   3     1,3    2.0       15     3,3    3.0       27     5,3    4.0
   4     1,4    2.5       16     3,4    3.5       28     5,4    4.5
   5     1,5    3.0       17     3,5    4.0       29     5,5    5.0
   6     1,6    3.5       18     3,6    4.5       30     5,6    5.5
   7     2,1    1.5       19     4,1    2.5       31     6,1    3.5
   8     2,2    2.0       20     4,2    3.0       32     6,2    4.0
   9     2,3    2.5       21     4,3    3.5       33     6,3    4.5
  10     2,4    3.0       22     4,4    4.0       34     6,4    5.0
  11     2,5    3.5       23     4,5    4.5       35     6,5    5.5
  12     2,6    4.0       24     4,6    5.0       36     6,6    6.0
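The distribution of $\bar{x}$ when n = 2
Note: $\mu_{\bar{x}} = \mu_x$ and $\sigma^2_{\bar{x}} = \sigma^2_x / 2$
$$E(\bar{x}) = 1.0(1/36) + 1.5(2/36) + \dots = 3.5$$
$$V(\bar{x}) = (1.0-3.5)^2(1/36) + (1.5-3.5)^2(2/36) + \dots = 1.46$$
[Figure: histogram of the sampling distribution of $\bar{x}$; horizontal axis $\bar{x}$ from 1.0 to 6.0 in steps of 0.5, vertical axis probability from 1/36 to 6/36]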
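The n = 2 case is small enough to enumerate exactly. The sketch below lists all 36 ordered pairs of throws, tabulates the distribution of the sample mean, and recomputes its mean and variance; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered samples of two throws of a fair die.
means = [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]
dist = {m: Fraction(c, 36) for m, c in sorted(Counter(means).items())}

e_xbar = sum(m * p for m, p in dist.items())                  # 7/2
v_xbar = sum((m - e_xbar) ** 2 * p for m, p in dist.items())  # 35/24

for m, p in dist.items():
    print(float(m), p)
print(float(e_xbar), round(float(v_xbar), 2))                 # 3.5 1.46
```

The variance 35/24 is exactly half of the single-throw variance 35/12, in line with the note that the variance of the mean is the population variance divided by n = 2.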
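Sampling Distribution of the
Mean
n = 5:   $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = 0.5833 \; (= \sigma^2_x / 5)$
n = 10:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = 0.2917 \; (= \sigma^2_x / 10)$
n = 25:  $\mu_{\bar{x}} = 3.5$,  $\sigma^2_{\bar{x}} = 0.1167 \; (= \sigma^2_x / 25)$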
Notice that $\sigma^2_{\bar{x}}$ is smaller than $\sigma^2_x$.
The larger the sample size, the
smaller $\sigma^2_{\bar{x}}$. Therefore, $\bar{x}$ tends
to fall closer to $\mu$ as the sample
size increases.
Relationships between Population Parameters and
the Sampling Distribution of the Sample Mean
The expected value of the sample mean is equal to the population mean:
$$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$$
The variance of the sample mean is equal to the population variance divided by
the sample size:
$$V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2_X}{n}$$
The standard deviation of the sample mean, known as the standard error of
the mean, is equal to the population standard deviation divided by the square
root of the sample size:
$$\text{s.e.} = SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$
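These relationships can be checked by simulation. The sketch below repeatedly draws samples of fair-die throws for n = 5, 10, and 25 and compares the simulated mean and variance of the sample mean with the theoretical values. The number of replications (20,000) and the seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)
sigma2 = 35 / 12          # variance of a single fair-die throw (about 2.92)
results = {}
for n in (5, 10, 25):
    # 20,000 simulated sample means, each from n throws.
    means = [statistics.fmean([rng.randint(1, 6) for _ in range(n)])
             for _ in range(20_000)]
    results[n] = (statistics.fmean(means),      # should be near 3.5
                  statistics.pvariance(means),  # should be near sigma2 / n
                  sigma2 / n)

for n, (m, v, theory) in results.items():
    print(n, round(m, 3), round(v, 4), round(theory, 4))
```

The simulated variances only approximate 0.5833, 0.2917, and 0.1167; the agreement improves with more replications.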
The Central Limit Theorem
When sampling from a population
with mean $\mu$ and finite standard
deviation $\sigma$, the sampling
distribution of the sample mean will
tend to be a normal distribution with
mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ as
the sample size becomes large
(n > 30).
For “large enough” n: $\bar{X} \sim N(\mu, \sigma^2/n)$
[Figure: distribution of $\bar{X}$ for n = 5, n = 20, and large n, with the large-n panel showing a normal density centered at $\mu$]
The Central Limit Theorem Applies to
Sampling Distributions from Any Population
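[Figure: grid of panels for four population shapes (Normal, Uniform, Skewed, General), showing the population distribution and the sampling distribution of $\bar{X}$ for n = 2 and n = 30]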
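One way to see the theorem at work on a skewed population is to track the skewness of the simulated sample means. The sketch assumes an exponential population with mean 1; that population, the seed, the replication count, and the `skewness` helper are illustrative choices, not from the slides.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(1)
skew = {}
for n in (2, 30):
    # 20,000 sample means from an assumed exponential population (mean 1).
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
             for _ in range(20_000)]
    skew[n] = skewness(means)

print({n: round(s, 2) for n, s in skew.items()})
```

The skewness for n = 30 should come out much smaller than for n = 2, reflecting a sampling distribution that is closer to the symmetric normal shape.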
Finding Z for the Sampling
Distribution of the Mean
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
The Central Limit Theorem
(Example)
Mercury makes a 2.4 liter V-6 engine, used in speedboats. The company’s
engineers believe the engine delivers an average horsepower of 220 HP and
that the standard deviation of power delivered is 15 HP. A potential buyer
intends to sample 100 engines. What is the probability that the sample mean
will be less than 217 HP?
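$$
\begin{aligned}
P(\bar{X} < 217) &= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{217 - \mu}{\sigma/\sqrt{n}}\right) \\
&= P\left(Z < \frac{217 - 220}{15/\sqrt{100}}\right) = P\left(Z < \frac{217 - 220}{15/10}\right) \\
&= P(Z < -2) = 0.0228
\end{aligned}
$$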
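The same calculation can be sketched with `statistics.NormalDist` from the Python standard library, under the normal approximation used above.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 220, 15, 100     # values from the engine example
se = sigma / sqrt(n)            # standard error of the mean: 1.5 HP
z = (217 - mu) / se             # standardized value: -2.0
prob = NormalDist().cdf(z)      # P(Z < -2)

print(z, round(prob, 4))        # -2.0 0.0228
```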
Student’s t Distribution
If the population standard deviation, $\sigma$, is unknown, replace $\sigma$ with
the sample standard deviation, s. If the population is normal, the
resulting statistic
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and
symmetric distributions, one for each
number of degrees of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but
approaches 1 as the number of degrees of
freedom increases.
• The t distribution approaches a standard
normal as the number of degrees of
freedom increases.
• When the sample size is small (n < 30) and $\sigma$
is unknown, we use the t distribution.
[Figure: density curves of the standard normal, t with df = 20, and t with df = 10]
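A short sketch of computing the t statistic from a sample. The data and the hypothesized mean `mu0` are invented for illustration. The last line uses the known result that a t distribution with df > 2 degrees of freedom has variance df / (df - 2), to show how the variance approaches 1.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a sample and a hypothesized mean mu0."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)           # sample standard deviation (n - 1 divisor)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical sample and hypothesized population mean.
sample = [4, 8, 6, 5, 3, 7, 9, 6]
t, df = t_statistic(sample, mu0=5)
print(round(t, 3), df)

# Variance of t for a few degrees of freedom: df / (df - 2), valid for df > 2.
t_variance = {d: d / (d - 2) for d in (10, 20, 100)}
print(t_variance)
```

The standard library has no t-distribution CDF, so turning `t` into a probability would need a t table or a third-party package.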
Sampling Distributions
Finite Population Correction Factor
If the sample size is more than 5% of the
population size and the sampling is done
without replacement, then a correction needs
to be made to the standard error of the
mean.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$
Sampling Distribution of $\bar{x}$
Standard Deviation of $\bar{x}$
Finite Population
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$
Infinite Population
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
• A finite population is treated as being
infinite if n/N < .05.
• $\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor.
• $\sigma_{\bar{x}}$ is referred to as the standard error of the
mean.
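The rule can be sketched as a small helper that applies the correction only when the sample is at least 5% of the population. The function name and the example numbers are assumptions for illustration.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction
    applied when a population size N is given and n / N >= 0.05."""
    se = sigma / sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se

# Hypothetical values: sigma = 15, n = 100.
print(standard_error(15, 100))              # population treated as infinite
print(standard_error(15, 100, N=10_000))    # n/N = 0.01, no correction
print(standard_error(15, 100, N=1_000))     # n/N = 0.10, correction applied
```

The corrected standard error is always smaller than the uncorrected one, because sampling without replacement from a small population leaves less room for the sample mean to vary.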
Sampling Distribution of the Sample Mean
• The amount of soda pop in each bottle is normally
distributed with a mean of 32.2 ounces and a
standard deviation of 0.3 ounces.
• Find the probability that a carton of four bottles will
have a mean of more than 32 ounces of soda per
bottle.
• Solution
– Define the random variable as the mean amount of soda per
bottle.
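$$P(\bar{x} > 32) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{0.3/\sqrt{4}}\right) = P(z > -1.33) = 0.9082$$
[Figure: normal curve for $\bar{x}$ centered at $\mu_{\bar{x}} = 32.2$, with the area to the right of $\bar{x} = 32$ (0.9082) shaded]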
Sampling Distribution of the
Sample Mean
• Example
– Dean’s claim: The average weekly income of
M.B.A graduates one year after graduation is
$600.
– Suppose the distribution of weekly income has a
standard deviation of $100. What is the
probability that 25 randomly selected graduates
have an average weekly income of less than
$550?
– Solution
$$P(\bar{x} < 550) = P\left(\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{550 - 600}{100/\sqrt{25}}\right) = P(z < -2.5) = 0.0062$$
Sampling Distribution Example
• During any hour in a large department
store, the average number of shoppers is
448, with a standard deviation of 21
shoppers. What is the probability that a
random sample of 49 different shopping
hours will yield a sample mean between
441 and 446 shoppers?
• (Ans: 0.2415 or 24.15%)
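A sketch of how the stated answer can be reproduced, assuming the normal approximation for the sample mean. The helper name and the `z_decimals` option are illustrative: rounding each z value to two decimals mimics a printed z-table lookup, which appears to be how the 0.2415 figure arises.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, lo, hi, z_decimals=None):
    """P(lo < sample mean < hi) under the normal approximation."""
    se = sigma / sqrt(n)                              # standard error of the mean
    z_lo, z_hi = (lo - mu) / se, (hi - mu) / se
    if z_decimals is not None:                        # mimic a z-table lookup
        z_lo, z_hi = round(z_lo, z_decimals), round(z_hi, z_decimals)
    std = NormalDist()
    return std.cdf(z_hi) - std.cdf(z_lo)

exact = prob_mean_between(448, 21, 49, 441, 446)
table_style = prob_mean_between(448, 21, 49, 441, 446, z_decimals=2)
print(round(exact, 4), round(table_style, 4))
```

With unrounded z values the probability comes out slightly higher (roughly 0.243), so the small gap from 0.2415 is a rounding effect rather than a different method.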
Thank you