Chapter 7:
Sampling and Sampling Distributions
Learning Objectives
LO1
Contrast sampling to census and differentiate among
different methods of sampling, which include simple,
stratified, systematic, and cluster random sampling;
and convenience, judgment, quota, and snowball
nonrandom sampling, by assessing the advantages
associated with each.
LO2
Describe the distribution of a sample’s mean using
the central limit theorem, correcting for a finite
population if necessary.
LO3
Describe the distribution of a sample’s proportion
using the z formula for sample proportions.
Reasons for Sampling
• Sampling is used for gathering useful information about a
population
• Sampling can provide information in a timely and convenient
form
• Sampling can save time and money.
• For given resources, sampling is more efficient and can
broaden the scope of a study.
• The research process sometimes requires destroying the
product; sampling can reduce the cost of destroying product.
• If accessing the entire population is impossible, sampling is the only
option.
Reasons for Taking a Census
• When it is essential to eliminate the possibility that by chance
a random sample may not be representative of the
population.
• When sampling errors have fatal consequences, a census is
required for the safety of the consumer.
Population Frame
• A list, map, directory, or other sources used to identify
and locate the population
• A list, map, or directory such as a school list, trade
association list, telephone directory, or even a list sold
by list brokers is called a frame
• The frame should ideally be in one-to-one
correspondence with the population, but may have a
gap due to over-registration or under-registration
Population Frame
• Over-registration: the frame contains all members of the target
population and some additional elements
– Example: using the Bell Montreal telephone registry as a listing of
residences with Bell telephones in Montreal
• Under-registration: the frame does not contain all members of
the target population.
– Example: using the chamber of commerce membership directory as the
frame for a target population of all businesses.
Random Versus Nonrandom Sampling
• Random sampling
– A chance process mechanism used to select some units of the
population
– Every unit of the population has the same probability of being
included in the sample.
– Eliminates bias in the selection process
– Also called probability sampling
• Nonrandom Sampling
– Not every unit of the population has the same probability of
being included in the sample.
– Open to selection bias
– Not an appropriate data gathering technique for use in most statistical
methods presented in this text
– Also known as nonprobability sampling
Basic Random Sampling Techniques
• Four Basic Sampling Techniques
• Simple Random Sampling
• Systematic Random Sampling
• Stratified Random Sampling
• Cluster (or Area) Sampling
Simple Random Sample
• The most elementary sampling technique
• The basis for developing other random sampling
methods
• Use a random number generator to select units
• Random numbers: a sequence of numbers that lack
any pattern
• Number or code each frame unit from 1 to N.
• Easier to perform for small populations
• Cumbersome for large populations
• Seldom used in practice
Application of the
Simple Random Sample Technique
• Uses a random number table or a random number
generator.
• Each unit of the frame is numbered from 1 to N
• Each unit of the frame has an equal chance of being
selected for the sample
• Use a random number table to select n distinct numbers
between 1 and N, inclusive
• Does not guarantee that the sample is representative of
the population
Simple Random Sampling:
Random Number Generator Table
Simple Random Sample:
Sample Members Selected
01 Acceleware Corp.
02 Apption Software
03 Auctionwire Inc.
04 Audability Inc.
05 b5media Inc.
06 Bond Consulting Group
07 Cadre Staffing Inc.
08 Direct Sales Force Inc.
09 Diversified Brands 2005 Inc.
10 Eagle Wake Ltd./Ticket Gold
11 EFT Canada Inc.
12 Filemobile Inc.
13 Hutton Forest Products Inc.
14 KMA Contracting Inc.
15 League Assets Corp.
16 Lettuce Eatery (Freshii Inc.)
17 LOGiQ3 Inc.
18 MedicLINK Systems Ltd.
19 Mortgagebrokers.com Holdings Inc.
20 Rapido Trains Inc.
21 Pacesetter Directional and Performance Drilling Ltd.
22 PrecisionERP Inc.
23 Scalar Decisions Inc.
24 Siamons International Inc.
25 Simcoe Canada Land Development Inc.
26 Stiris Research Inc.
27 Sweetspot.ca Inc.
28 TAG Recruitment Group Inc.
29 Unity Telecom Corp.
30 Vortex Mobile (Vortxt Interactive Inc.)
• Population Size = N = 30
• Sample Size = n = 6
Simple Random Sample:
Numbered Population Frame
Use Excel’s RANDBETWEEN function to generate a random sample of size 6.
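As a minimal sketch of the same selection outside Excel, the Python snippet below draws n = 6 distinct unit numbers from a frame numbered 1 to N = 30. The function name `simple_random_sample` and the fixed seed are illustrative assumptions, not part of the original example. RANDBETWEEN can return the same number more than once, so repeats have to be redrawn there; `random.sample` returns distinct values by construction.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct unit numbers drawn from 1..population_size."""
    rng = random.Random(seed)  # seed is only for reproducibility (assumption)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

selected = simple_random_sample(population_size=30, sample_size=6, seed=7)
print(selected)  # six distinct frame numbers between 1 and 30
```

Each selected number is then looked up in the numbered population frame to identify the sampled company.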
Stratified Random Sample
• Population is divided into nonoverlapping subpopulations called
strata.
• Internally, sub-populations should be as homogeneous as
possible;
• Externally, they should contrast with each other.
• A random sample is selected from each stratum.
• Potential for reducing sampling error
• Proportionate: the percentage of the sample taken from each
stratum is proportionate to the percentage that each stratum is
within the population
• Disproportionate: proportions of the strata within the sample
are different from the proportions of the strata within the
population
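Proportionate allocation can be sketched in a few lines of Python. The strata names, their sizes, and the function `proportionate_stratified_sample` are hypothetical and chosen only to make the arithmetic easy to follow; a real study would take the strata from its own frame.

```python
import random

def proportionate_stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion
    to that stratum's share of the population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportionate allocation; rounding may shift the total by a unit or two.
        n_h = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: 1,000 unit IDs split into three strata (illustrative only).
strata = {
    "stratum_A": list(range(0, 500)),
    "stratum_B": list(range(500, 800)),
    "stratum_C": list(range(800, 1000)),
}
sample = proportionate_stratified_sample(strata, sample_size=100, seed=1)
print({name: len(units) for name, units in sample.items()})
# {'stratum_A': 50, 'stratum_B': 30, 'stratum_C': 20}
```

A disproportionate design would replace the allocation line with stratum sizes chosen by the researcher.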
Stratified Random Sample:
Population of FM Radio Listeners
Systematic Sampling
• Convenient and relatively easy to
administer
• Population elements are an
ordered sequence (at least,
conceptually).
• The first sample element is selected
randomly from the first k
population elements.
• Thereafter, sample elements are
selected at a constant interval, k,
from the ordered sequence frame.
Problems With Systematic Sampling
• When used with an alphabetically ordered frame, it is no better
than simple random sampling and therefore does not
guarantee representative samples.
• The sample becomes nonrandom when the data are
subject to periodicity
Systematic Sampling: Example
• Frame: Scott’s National manufacturers of Canada Directory listing
N= 105,000 manufacturers in alphabetic order
• Sample n = 1,000
• k = 105,000/1,000 = 105
• First sample element randomly selected from the first 105
manufacturers.
• Assume the number 5 was selected from a random number table:
the first element is the manufacturer coded 5
• Subsequent sample elements are k+5, 2k+5, 3k+5, etc.: 110, 215, 320, . . .
until 1,000 manufacturers are selected.
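The selection rule in this example can be sketched in Python as follows. The function name `systematic_sample` and its parameters are illustrative assumptions; the numbers N = 105,000, n = 1,000, and the starting unit 5 come from the example above.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return unit numbers chosen at a constant interval k = N // n."""
    k = population_size // sample_size
    if start is None:
        # Random start among the first k units of the ordered frame.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

units = systematic_sample(population_size=105_000, sample_size=1_000, start=5)
print(units[:4])   # [5, 110, 215, 320]
print(len(units))  # 1000
```

Leaving `start` unset draws the first element at random from units 1 to k, which is the step that makes the procedure a random sampling method.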
Cluster or Area Sampling
• The population is divided into nonoverlapping and
internally heterogeneous clusters or areas.
• Each cluster is a miniature, or microcosm, of the population.
• A subset of the clusters is selected randomly for the sample.
• If the number of elements in the subset of clusters is larger
than the desired value of n, these clusters may be
subdivided to form a new set of clusters and subjected to a
random selection process.
Cluster Sampling
• Advantages
– More convenient for geographically dispersed populations
– Reduced travel costs to contact sample elements
– Simplified administration of the survey
– Unavailability of sampling frame prohibits using other random
sampling methods
• Disadvantages
– Statistically less efficient when the cluster elements are similar
– Costs and problems of statistical analysis are greater than for
simple random sampling.
Two-Stage-Cluster Sampling
• In cluster sampling, sometimes the clusters are too large, and
a second set of clusters is taken from each original cluster.
– This technique is called two-stage sampling.
– Canadian Example: divide Canada into clusters of cities; then divide
the cities into clusters of blocks; and randomly select individual houses
from the block clusters.
• Advantages:
– Clusters are usually convenient to obtain
– Cost of sampling entire population is reduced due to reduction in
scope of study
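A minimal two-stage sketch in Python is shown below. It simplifies the Canadian example to two levels (blocks, then houses); the block and house labels, the counts, and the function `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical frame: 10 blocks of 40 houses each (labels are illustrative).
blocks = {f"block_{b}": [f"house_{b}_{h}" for h in range(40)] for b in range(10)}
sample = two_stage_cluster_sample(blocks, n_clusters=3, n_per_cluster=5, seed=2)
for block, houses in sample.items():
    print(block, houses)
```

Only the three selected blocks need to be visited, which is where the travel and administration savings come from.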
Nonrandom Sampling
• Convenience Sampling: sample elements are selected for
the convenience of the researcher
• Judgment Sampling: sample elements are selected by the
judgment of the researcher
• Quota Sampling: sample elements are selected until the
quota controls are satisfied
• Snowball Sampling: survey subjects are selected based on
referral from other survey respondents
Nonsampling Errors
• Data from nonrandom samples are not appropriate for analysis
by inferential statistical methods.
• All errors other than sampling errors are nonsampling errors
• Sampling error occurs when the sample is not representative
of the population. Sampling errors are unavoidable and usually
not measurable.
• Biases may be avoidable and are usually measurable.
• Causes of Nonsampling Errors
– Missing data, recording, data entry, and analysis errors
– Poorly conceived concepts, unclear definitions, and defective
questionnaires
– Response errors occur when people do not know, will not say, or
overstate in their answers
– Virtually no statistical method exists to control for nonsampling errors.
Diligence in planning and executing the survey is required
Sampling Distribution of x̄
• Proper analysis and interpretation of a sample statistic
requires knowledge of its distribution.
• The sample mean is one of the more common statistics used
in inferential statistics. Its underlying probability function and
the inferential process
Distribution
of a Small Finite Population
Suppose a small finite population consists of only
N = 8 numbers: 54, 55, 59, 63, 64, 68, 69, 70
Sample Space of All Possible Samples
of Size n = 2 Taken with Replacement
Excel Produced Histogram
of the 64 Sample Means for n=2
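The 64 sample means can be enumerated directly. The sketch below uses the eight population values given earlier; the variable names are illustrative.

```python
from itertools import product
from statistics import mean

population = [54, 55, 59, 63, 64, 68, 69, 70]

# All ordered samples of size 2 drawn with replacement: 8 * 8 = 64.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print(len(sample_means))   # 64
print(mean(population))    # 62.75
print(mean(sample_means))  # 62.75
```

The mean of the 64 sample means equals the population mean, while the individual sample means cluster more tightly around it than the original values do.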
Histogram of a
Poisson-Distributed Population, λ = 1.25
Histogram of Sample Means for the Data
In Previous Slide
The Changing Shape of The Distribution of Sample
Means Relative to the Sample Size n
• The previous slides illustrate that as the sample size n
increases and as the number of samples increases, the shape of
the histogram of sample means becomes more symmetric and smoother.
• The next set of slides demonstrates this for the case where
sampling is from a population that has a uniform
distribution in which a = 10 and b = 30
• Note that even for small sample sizes, the distribution of
sample means begins to pile up in the middle
• General rule: as sample sizes become larger, the
distribution of sample means approaches a normal
distribution and the variation among the means decreases.
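The general rule can be checked with a small simulation. The sketch below draws 60 samples of each size from a uniform population with a = 10 and b = 30; the function name, the seed, and the choice of summary statistics are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(n, num_samples=60, a=10, b=30, seed=0):
    """Means of num_samples samples of size n from a uniform(a, b) population."""
    rng = random.Random(seed)
    return [mean(rng.uniform(a, b) for _ in range(n)) for _ in range(num_samples)]

spread = {}
for n in (2, 5, 30):
    means = simulate_sample_means(n)
    spread[n] = stdev(means)
    print(f"n={n:2d}  mean of means={mean(means):6.2f}  std dev of means={spread[n]:5.2f}")
```

The means stay near the population mean of 20 for every n, while their standard deviation shrinks as n grows.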
Means of 90 Samples (n = 2 to n = 30)
from a Uniformly Distributed Population
1,800 Randomly Selected Values
from a Uniform Distribution
Means of 60 Samples (n = 2)
from a Uniform Distribution
Means of 60 Samples (n = 5)
from a Uniform Distribution
Means of 60 Samples (n = 30)
from a Uniform Distribution
Central Limit Theorem
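In its standard form, restated here as a reminder and not necessarily in the slide's exact wording: if samples of size $n$ are drawn randomly from a population with mean $\mu$ and standard deviation $\sigma$, the sample means $\bar{x}$ are approximately normally distributed for sufficiently large $n$, regardless of the shape of the population distribution, with

```latex
\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
```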
∗ Note that the central limit theorem itself does not specify what a “large sample size” is. As a guideline, it is
assumed to be 30 or more, although this does not follow from the central limit theorem itself. The
derivations are beyond the scope of this text and are not shown.
Shapes of the Distribution of Sample Means for 3 Sample
Sizes and the Normal and Uniform Distributions
Distribution of Sample Means
for 3 Sample Sizes and the U-shape and Normal Distributions
Sampling from a Normal Population
• The distribution of sample means is normal for any sample
size.
Z Formula for Sample Means
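The usual form of this formula standardizes a sample mean $\bar{x}$ by the standard error $\sigma/\sqrt{n}$, where $\mu$ and $\sigma$ are the population mean and standard deviation and $n$ is the sample size:

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
```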
Tire Store Example in Figure 7.6
Graphic Solution to the Store Example
Demonstration Problem 7.1
For this problem, μ = 448, σ = 21, and n = 49. The
problem is to determine P(441 ≤ x̄ ≤ 446). The following
diagram depicts the problem.
Demonstration Problem 7.1
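One way to check the calculation numerically is sketched below. It uses the standard normal CDF computed from `math.erf` instead of a z table, so the result can differ slightly from a table-based answer that rounds z to two decimals. The helper `normal_cdf` is an illustrative name.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 448, 21, 49
std_error = sigma / sqrt(n)      # 21 / 7 = 3
z_low = (441 - mu) / std_error   # about -2.33
z_high = (446 - mu) / std_error  # about -0.67
probability = normal_cdf(z_high) - normal_cdf(z_low)
print(round(probability, 4))     # roughly 0.24
```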
Sampling from a Finite Population
without Replacement
• In this case, the standard deviation of the distribution of
sample means is smaller than when sampling from an infinite
population (or from a finite population with replacement).
• The correct value of this standard deviation is computed by
applying a finite correction factor to the standard deviation
for sampling from an infinite population.
• If the sample size is less than 5% of the population size, the
adjustment is unnecessary.
Sampling from a Finite Population
• Finite Correction Factor
• Modified Z Formula
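In their standard textbook form, with $N$ the population size and $n$ the sample size, the two expressions are:

```latex
\text{Finite correction factor: } \sqrt{\frac{N - n}{N - 1}}
\qquad
z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}
```

Because the factor is less than 1 whenever $n > 1$, it shrinks the standard error relative to the infinite-population case.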
Finite Correction Factor
for Selected Sample Sizes
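A table of this kind can be regenerated with a short Python sketch. The (N, n) pairs below are hypothetical values chosen for illustration and are not necessarily the ones shown on the slide; the function name is also an assumption.

```python
from math import sqrt

def finite_correction_factor(population_size, sample_size):
    """sqrt((N - n) / (N - 1)) for sampling without replacement."""
    return sqrt((population_size - sample_size) / (population_size - 1))

# Hypothetical (N, n) pairs chosen for illustration.
for N, n in [(2000, 30), (2000, 500), (500, 30), (500, 200), (200, 30), (200, 75)]:
    share = n / N
    fcf = finite_correction_factor(N, n)
    print(f"N={N:5d}  n={n:4d}  n/N={share:5.1%}  factor={fcf:.3f}")
```

When n is under 5% of N the factor stays close to 1, which is why the adjustment is usually skipped in that case.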
Sampling Distribution of p̂
• If research or an experiment produces countable rather than
measurable items, such as the frequency with which an
attribute occurs, the sample proportion is often the
statistic of choice
• Example: Take samples of 3 with replacement from a group
of 5 things. Total possible samples is 2^5 = 32. If there are only
two attributes or countable outcomes possible (defective and
non-defective), each sample has a certain proportion of
things defective or non-defective.
• There will be 32 possible proportions. As in the case of
the means of measurable outcomes, these 32 proportions
have a distribution, with parameters that differ from those of
the original population.
Sampling Distribution of p
• Sampling Distribution of the population proportion and its
parameters:
• The Sample Proportion
• The standard deviation of the distribution is
P Q
n
Sampling Distribution of p̂
The sampling distribution is approximately normal if
• n ∙ p > 5 and n ∙ q > 5 where
(p is the population proportion and q = 1 − p)
• The mean of sample proportions for all samples of size n
randomly drawn from a population is p (the population
proportion) and the standard deviation of sample
proportions is
$\sqrt{p \cdot q / n}$,
which is sometimes referred to as the standard error of the
proportion.
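The normality check and the standard error can be computed together, as in the sketch below. The values p = 0.10 and n = 80 are hypothetical and chosen only for illustration, and the function name is an assumption.

```python
from math import sqrt

def proportion_sampling_distribution(p, n):
    """Mean, standard error, and normal-approximation check for sample proportions."""
    q = 1 - p
    return {
        "mean": p,
        "std_error": sqrt(p * q / n),
        "approximately_normal": n * p > 5 and n * q > 5,
    }

# Hypothetical values chosen for illustration: p = 0.10, n = 80.
result = proportion_sampling_distribution(p=0.10, n=80)
print(result)
```

With these values n ∙ p = 8 and n ∙ q = 72, so the normal approximation applies; a case such as p = 0.01 with n = 100 would fail the check.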
Z Formula for Sample Proportions
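In its usual form, with $\hat{p}$ the sample proportion, $p$ the population proportion, $q = 1 - p$, and $n$ the sample size, the formula standardizes a sample proportion by the standard error of the proportion:

```latex
z = \frac{\hat{p} - p}{\sqrt{\dfrac{p \cdot q}{n}}}
```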
Solution for Demonstration Problem 7.3
COPYRIGHT
Copyright © 2014 John Wiley & Sons Canada, Ltd. All
rights reserved. Reproduction or translation of this work
beyond that permitted by Access Copyright (The Canadian
Copyright Licensing Agency) is unlawful. Requests for
further information should be addressed to the
Permissions Department, John Wiley & Sons Canada, Ltd.
The purchaser may make back-up copies for his or her
own use only and not for distribution or resale. The
author and the publisher assume no responsibility for
errors, omissions, or damages caused by the use of these
programs or from the use of the information contained
herein.