4.2 Random Sampling

4.2 Random Sampling: Playing It Safe by Taking Chances
Selecting a sample by chance is the only method guaranteed to be unbiased. This
randomization helps protect against bias and makes inference possible.
probability sample – a sample in which each unit in the population has a known, nonzero
probability of ending up in the sample (the probabilities need not be equal), e.g. simple
random sampling.
This section gives an overview of five different methods of sampling.
1. Simple Random Sampling (SRS) – all possible samples of the same size have the same
chance of being chosen.
The simplest way of choosing an SRS is to put the names of all the units in a hat and draw
them at random. In practice this is often not possible, so a better way is outlined below.
Choosing an SRS
1. Make a list of all the units in the population.
2. Number all the units in the list.
3. Use a random number table or random number generator (see Calculator note
4A) to choose units from the numbered list (without replacement) until you have
as many as you want.
How to Use a Random Number Table
1. Number each member of the population.
2. Determine population size (N).
3. Determine sample size (M).
4. Determine starting point in table by dropping your finger on the page with your
eyes closed (Note: In the AP Stats exam you would be given a starting point).
5. Choose a direction in which to read (up to down, left to right, or right to left).
6. Select the first M numbers read from the table whose last X digits are between 1
and N, where X is the number of digits in N. (If N is a two digit number, then X
would be 2; if it is a four digit number, X would be 4; etc.).
7. Once a number is chosen, generally do not use it again, and also ignore numbers
which are too large.
8. If you reach the end of the table before obtaining your M numbers, pick another
starting point, read in a different direction, use the first X digits, and continue
until done.
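The table procedure above can be sketched in code. This is only an illustration, under two assumptions that are not in the steps themselves: the table is represented as one long string of digits, and the digits are read left to right in consecutive groups of X. The function name and the digit row are made up for the example.

```python
def sample_from_digit_table(digits, N, M):
    """Read a string of random digits in groups of X digits and
    return the first M distinct labels that fall between 1 and N."""
    X = len(str(N))  # digits per group: 2 if N is a two digit number, etc.
    chosen = []
    for i in range(0, len(digits) - X + 1, X):
        label = int(digits[i:i + X])
        # Ignore labels that are too large (or 0) and labels already chosen.
        if 1 <= label <= N and label not in chosen:
            chosen.append(label)
        if len(chosen) == M:
            return chosen
    raise ValueError("ran out of digits before choosing M units")


# Illustrative row of digits standing in for a printed table.
row = "19223950340575628713964091253142"
print(sample_from_digit_table(row, N=30, M=5))
```

With N = 30 and M = 5, the groups 39, 50, 34, 75, and so on are skipped because they are too large, and reading stops as soon as five usable labels have been found.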
JellyBlubber Colony Activity
SRS is the basic building block for all the other designs. However, few large-scale surveys use
only simple random sampling, because other designs often provide greater accuracy or
efficiency or both.
Example:
A small catering business serves 9 reception centers. The owner wants to interview a sample of
4 clients in detail to find ways to improve services to his/her clients. To avoid bias, the owner
chooses a simple random sample of size 4.
Step 1:
Each reception center is assigned a numerical label 1-9.
1 - Darlene’s Wedding Center
2 - Magic Moments Reception Hall
3 - Rustic Realm Weddings
4 - Romance Gardens
5 - Classic Weddings
6 - Old Time Chapel
7 - Lovers Lane Weddings
8 - Accents-Modern Weddings
9 - Century Falls Reception Center
Step 2:
The owner decides to use a statistical software program to generate 4 numerical labels
between 1 and 9 at random. The software returns the following numbers: 5, 8, 6, 4.
Therefore, the simple random sample to be interviewed in detail will be:
• Classic Weddings (5)
• Accents-Modern Weddings (8)
• Old Time Chapel (6)
• Romance Gardens (4)
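The example does not say which software the owner used. As one possible sketch, Python's standard random module can draw the 4 labels without replacement. The function name choose_srs and the optional seed are assumptions made for illustration; a different run will usually give different labels than 5, 8, 6, 4.

```python
import random

# Numerical labels from Step 1.
centers = {
    1: "Darlene’s Wedding Center",
    2: "Magic Moments Reception Hall",
    3: "Rustic Realm Weddings",
    4: "Romance Gardens",
    5: "Classic Weddings",
    6: "Old Time Chapel",
    7: "Lovers Lane Weddings",
    8: "Accents-Modern Weddings",
    9: "Century Falls Reception Center",
}


def choose_srs(labels, n, seed=None):
    """Return n distinct labels chosen at random (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(sorted(labels), n)


chosen = choose_srs(centers, 4)
for label in chosen:
    print(label, centers[label])
```

Because random.sample draws without replacement, every set of 4 centers is equally likely to be chosen, which is the defining property of an SRS.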
2. Stratified Random Sampling – a sample is obtained by separating the population units into
non-overlapping groups, called strata, and then selecting a simple random sample
from each stratum. In general, the size of the sample in each stratum is taken in proportion to
the size of the stratum.
Stratified random sampling is relatively easy to carry out. The ideal situation for stratified
random sampling is to have all measurements within any one stratum equal (homogeneous)
but have differences occurring as we move from stratum to stratum.
Stratification tends to give estimates that are closer to the value for the population than SRS
(see example on page 235).
Stratified Sampling Applet: http://cnx.org/content/m11188/latest/
Example:
Suppose an administrator wants to survey students at four schools about cafeteria food, and
each school contracts with a different vendor to bring food to its cafeteria. We would expect
opinions about cafeteria food to vary widely from school to school. Therefore, it makes sense
to treat each school as a stratum and sample from each. Suppose the schools are as follows:
School 1: 1050 students
School 2: 565 students
School 3: 1554 students
School 4: 306 students
Total students: 1050 + 565 + 1554 + 306 = 3475 students
The administrator wishes to take a sample of 150 students.
Step 1:
The first step is to find the total number of students (3475 above) and calculate the
proportion of students in each stratum.
School 1: 1050 / 3475 = .30
School 2: 565 / 3475 = .16
School 3: 1554 / 3475 = .45
School 4: 306 / 3475 = .09
Step 2:
Next, to select a sample in proportion to the size of each stratum (in this case school),
the following number of students should be randomly selected:
School 1: 150 x .30 = 45
School 2: 150 x .16 = 24
School 3: 150 x .45 ≈ 67
School 4: 150 x .09 ≈ 14
This tells us that our sample of 150 students should be comprised of:
• 45 students randomly selected from School 1
• 24 students randomly selected from School 2
• 67 students randomly selected from School 3
• 14 students randomly selected from School 4
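The proportional allocation in the example above can also be computed in code. The sketch below is one possible approach, not the method used in the example: it works from the unrounded proportions and hands out any leftover places to the strata with the largest fractional parts (a "largest remainder" rule, which is an assumption added here so that the counts always add up to the sample size). The function name is made up for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Exact (fractional) share of the sample for each stratum.
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    # Start from the whole-number part of each share.
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


schools = {"School 1": 1050, "School 2": 565, "School 3": 1554, "School 4": 306}
allocation = proportional_allocation(schools, 150)
print(allocation)
```

Because this sketch does not round the proportions to two decimal places first, it gives School 2 one more student (25) and School 4 one fewer (13) than the hand calculation. Either allocation totals 150 and is close to proportional.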
3. Cluster Sampling – a sample is obtained by dividing the population into non-overlapping
groups, called clusters, and then selecting a simple random sample of clusters, and
obtaining data on all units in each cluster.
Cluster sampling generally provides less precision than either simple random sampling or
stratified sampling. This is the main disadvantage of cluster sampling. Cluster sampling is
generally employed because of cost effectiveness or because no adequate sampling frame
is available.
Cluster sampling is less costly than simple or stratified random sampling if the cost of
obtaining a sampling frame is very high or if the cost of obtaining observations increases as
the distance separating the elements increases.
However, cluster sampling may be better than either simple or stratified random sampling if
the measurements within clusters are heterogeneous and the cluster means are nearly
equal. The ideal situation for cluster sampling is, then, to have each cluster contain
measurements as different as possible (heterogeneous) but to have the cluster means
equal. This condition is in contrast to that for stratified random sampling in which strata are
to be homogeneous but stratum means are to differ.
Example:
Suppose that the Department of Agriculture wishes to investigate the use of pesticides by farmers in
England. A cluster sample could be taken by treating the different counties in England as clusters.
A sample of these counties (clusters) would then be chosen at random, and all farmers in the
selected counties would be included in the sample. The practical advantage is that it is easier to
visit several farmers in the same county than to travel to each farm in a simple random sample to
observe the use of pesticides.
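A small sketch of the cluster idea in the example above: choose whole clusters at random, then keep every unit in the chosen clusters. The county names and farm labels are invented placeholders, and the function name is an assumption for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Choose n_clusters clusters at random and return them with ALL of their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical clusters: each county maps to the farms located in it.
farms_by_county = {
    "County A": ["A1", "A2", "A3"],
    "County B": ["B1", "B2"],
    "County C": ["C1", "C2", "C3", "C4"],
    "County D": ["D1", "D2", "D3"],
}

counties, farms = cluster_sample(farms_by_county, 2)
print(counties)
print(farms)
```

Note that the sample size is not fixed in advance: it depends on how many farms happen to be in the counties that are chosen.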
The Difference between Strata and Clusters
Although strata and clusters are both non-overlapping subsets of the population, they differ in
several ways.
• All strata are represented in the sample; but only a subset of clusters are in the sample.
• With stratified sampling, the best survey results occur when elements within strata are
internally homogeneous (the same). However, with cluster sampling, the best results
occur when elements within clusters are internally heterogeneous (different).
4. Two-Stage Cluster Sampling – as in cluster sampling, a simple random sample of clusters is
taken from a numbered list of all clusters in the population, but rather than obtaining data
on all individuals in each chosen cluster, a simple random sample is taken from each one.
This method is useful when it is easier to list clusters than individuals, but still reasonably
easy and sufficient to sample individuals once the clusters are chosen.
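The two stages can be sketched as follows. The classroom and student labels are invented placeholders, and the function name and parameters are assumptions for illustration: stage 1 takes an SRS of clusters, and stage 2 takes an SRS of individuals inside each chosen cluster.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of individuals within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # A cluster smaller than n_per_cluster contributes all of its members.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical clusters: classrooms and the students in each.
classrooms = {
    "Room 101": ["s1", "s2", "s3", "s4", "s5"],
    "Room 102": ["s6", "s7", "s8", "s9"],
    "Room 103": ["s10", "s11", "s12", "s13", "s14", "s15"],
}

result = two_stage_sample(classrooms, n_clusters=2, n_per_cluster=3)
print(result)
```

Only the list of clusters is needed up front; the individuals have to be listed only for the clusters that are actually chosen.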
5. Systematic Sampling – a sample is obtained by choosing a random starting point between 1
and k, and then choosing every kth individual.
If the population elements are in random order, systematic sampling behaves much like simple
random sampling. If the population elements have trends or periodicities (for example, a
pattern that repeats every k units), systematic sampling may be worse than simple random
sampling.
Example:
Suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th
house is chosen after a random starting point between 1 and 15 is selected. If the random
starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.
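The house example can be reproduced with a short sketch. The function name and its parameters are assumptions for illustration, and the sketch assumes the population size is a multiple of the sample size, as it is here (120 = 8 x 15).

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Choose every kth unit from 1..N, where k = N // n, after a random start in 1..k."""
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random starting point between 1 and k
    return list(range(start, N + 1, k))[:n]


# Fixing the starting point at 11 reproduces the houses in the example.
houses = systematic_sample(120, 8, start=11)
print(houses)  # [11, 26, 41, 56, 71, 86, 101, 116]
```

Leaving start unset picks the starting point at random, which is the only random step in the whole procedure.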
The diagram below shows how the five different sampling techniques might look for sampling
the blobs in the rectangles.