4.2 Random Sampling: Playing It Safe by Taking Chances

Selecting a sample by chance is the only method guaranteed to be unbiased. This randomization helps protect against bias and makes inference possible.

probability sample – a sample in which each unit in the population has a known probability of ending up in the sample. In simple random sampling, for example, these probabilities are also equal.

This section gives an overview of five different methods of sampling.

1. Simple Random Sampling (SRS) – all possible samples of the same size have the same chance of being chosen.

The simplest way of choosing an SRS is to put the names of all the units in a hat and draw them at random. In practice this is often not possible, so a better way is outlined below.

Choosing an SRS
1. Make a list of all the units in the population.
2. Number all the units in the list.
3. Use a random number table or random number generator (see Calculator note 4A) to choose units from the numbered list (without replacement) until you have as many as you want.

How to Use a Random Number Table
1. Number each member of the population.
2. Determine the population size (N).
3. Determine the sample size (M).
4. Determine a starting point in the table by dropping your finger on the page with your eyes closed. (Note: In the AP Stats exam you would be given a starting point.)
5. Choose a direction in which to read (top to bottom, left to right, or right to left).
6. Select the first M numbers read from the table whose last X digits are between 1 and N, where X is the number of digits in N. (If N is a two-digit number, then X would be 2; if it is a four-digit number, X would be 4; etc.)
7. Once a number is chosen, generally do not use it again, and ignore numbers that are too large.
8. If you reach the end of the table before obtaining your M numbers, pick another starting point, read in a different direction, use the first X digits, and continue until done.

JellyBlubber Colony Activity

SRS is the basic building block for all the other designs.
However, few large-scale surveys use only simple random sampling, because other designs often provide greater accuracy or efficiency or both.

Example: A small catering business serves 9 reception centers. The owner wants to interview a sample of 4 clients in detail to find ways to improve services to them. To avoid bias, the owner chooses a simple random sample of size 4.

Step 1: Each reception center is assigned a numerical label from 1 to 9.
1 - Darlene’s Wedding Center
2 - Magic Moments Reception Hall
3 - Rustic Realm Weddings
4 - Romance Gardens
5 - Classic Weddings
6 - Old Time Chapel
7 - Lovers Lane Weddings
8 - Accents-Modern Weddings
9 - Century Falls Reception Center

Step 2: The owner decides to use a statistical software program to generate 4 different numerical labels between 1 and 9 at random. The software returns the following numbers: 5, 8, 6, 4. Therefore, the simple random sample to be interviewed in detail will be:
Classic Weddings (5)
Accents-Modern Weddings (8)
Old Time Chapel (6)
Romance Gardens (4)

2. Stratified Random Sampling – a sample is obtained by separating the population units into non-overlapping groups, called strata, and then selecting a simple random sample from each stratum.

In general, the size of the sample taken from each stratum is proportional to the size of the stratum. Stratified random sampling is relatively easy to carry out. The ideal situation for stratified random sampling is to have all measurements within any one stratum equal (homogeneous) but to have differences occurring as we move from stratum to stratum. Stratification tends to give estimates that are closer to the value for the population than SRS does (see example on page 235).

Stratified Sampling Applet: http://cnx.org/content/m11188/latest/

Example: Suppose an administrator wants to survey students at four schools about their cafeteria food, and each of the schools contracts with a different vendor to bring food to its cafeteria.
We would expect opinions about cafeteria food to vary widely from school to school. Therefore, it makes sense to treat each school as a stratum to sample from. Suppose the schools are as follows:

School 1: 1050 students
School 2: 565 students
School 3: 1554 students
School 4: 306 students
Total students: 1050 + 565 + 1554 + 306 = 3475 students

The administrator wishes to take a sample of 150 students.

Step 1: The first step is to find the total number of students (3475 above) and calculate the proportion of students in each stratum.

School 1: 1050 / 3475 ≈ .30
School 2: 565 / 3475 ≈ .16
School 3: 1554 / 3475 ≈ .45
School 4: 306 / 3475 ≈ .09

Step 2: Next, to select a sample in proportion to the size of each stratum (in this case, each school), the following number of students should be randomly selected:

School 1: 150 x .30 = 45
School 2: 150 x .16 = 24
School 3: 150 x .45 = 67.5, rounded to 67
School 4: 150 x .09 = 13.5, rounded to 14

Rounding 67.5 down and 13.5 up keeps the total at 45 + 24 + 67 + 14 = 150. This tells us that our sample of 150 students should consist of:

45 students randomly selected from School 1
24 students randomly selected from School 2
67 students randomly selected from School 3
14 students randomly selected from School 4

3. Cluster Sampling – a sample is obtained by dividing the population into non-overlapping groups, called clusters, then selecting a simple random sample of clusters and obtaining data on all units in each selected cluster.

Cluster sampling generally provides less precision than either simple random sampling or stratified sampling. This is the main disadvantage of cluster sampling.

Cluster sampling is generally employed because it is cost-effective or because no adequate sampling frame is available. Cluster sampling is less costly than simple or stratified random sampling if the cost of obtaining a sampling frame is very high or if the cost of obtaining observations increases as the distance separating the elements increases.
However, cluster sampling may be better than either simple or stratified random sampling if the measurements within clusters are heterogeneous and the cluster means are nearly equal. The ideal situation for cluster sampling is, then, to have each cluster contain measurements as different as possible (heterogeneous) but to have the cluster means equal. This condition is in contrast to that for stratified random sampling, in which strata are to be homogeneous but stratum means are to differ.

Example: Suppose that the Department of Agriculture wishes to investigate the use of pesticides by farmers in England. A cluster sample could be taken by identifying the different counties in England as clusters. A sample of these counties (clusters) would then be chosen at random, and all farmers in the selected counties would be included in the sample. This shows the practical advantage of the design: it is easier to visit several farmers in the same county than to travel to each farm in a simple random sample to observe the use of pesticides.

The Difference between Strata and Clusters
Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways. All strata are represented in the sample, but only a subset of clusters is in the sample. With stratified sampling, the best survey results occur when elements within strata are internally homogeneous (the same). With cluster sampling, however, the best results occur when elements within clusters are internally heterogeneous (different).

4. Two-Stage Cluster Sampling – as for cluster sampling, a simple random sample of clusters is taken from a numbered list of all clusters in the population, but rather than obtaining data on all individuals in each selected cluster, a simple random sample is taken from each selected cluster.

This method is useful when it is easier to list clusters than individuals, but still reasonably easy and sufficient to sample individuals once the clusters are chosen.
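The two cluster designs described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original notes: the function name `cluster_sample`, its parameters, and the county and farm labels are all hypothetical, and Python's standard `random` module stands in for the random number table.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Sketch of one-stage and two-stage cluster sampling.

    clusters      : dict mapping a cluster label to a list of its units
    n_clusters    : number of clusters to select in stage 1
    n_per_cluster : units to select from each chosen cluster in stage 2;
                    None means keep every unit (ordinary cluster sampling)
    seed          : optional seed so the selection can be reproduced
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of cluster labels, without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for label in chosen:
        units = clusters[label]
        if n_per_cluster is None:
            # One-stage design: collect data on all units in the cluster.
            sample[label] = list(units)
        else:
            # Stage 2: simple random sample within the chosen cluster.
            k = min(n_per_cluster, len(units))
            sample[label] = rng.sample(units, k)
    return sample

# Hypothetical population: 6 counties (clusters) with 10 farms each.
counties = {f"County {c}": [f"{c}-farm{i}" for i in range(1, 11)]
            for c in "ABCDEF"}

one_stage = cluster_sample(counties, n_clusters=2, seed=1)
two_stage = cluster_sample(counties, n_clusters=2, n_per_cluster=3, seed=1)
```

With the same seed, both calls pick the same two counties; the one-stage sample keeps all 10 farms in each, while the two-stage sample keeps only 3 farms per county.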
5. Systematic Sampling – a sample is obtained by choosing a random starting point between 1 and k, and then choosing every kth individual after it.

If the population elements are in random order, systematic sampling is equivalent to simple random sampling. If the population elements have trends or periodicities, systematic sampling may be worse than simple random sampling.

Example: Suppose you want to sample 8 houses from a street of 120 houses. 120/8 = 15, so every 15th house is chosen after a random starting point between 1 and 15 is selected. If the random starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.

The diagram below shows how the five different sampling techniques might look for sampling the blobs in the rectangles.
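Returning to the systematic sampling example above (8 houses from a street of 120), the selection rule can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `systematic_sample` and its parameters are hypothetical, the interval k is taken as the whole-number part of N/n, and the standard `random` module supplies the random starting point.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the positions chosen by systematic sampling.

    The interval k is population_size // sample_size. If no start is
    given, a random starting point between 1 and k is drawn.
    """
    k = population_size // sample_size
    if k < 1:
        raise ValueError("sample_size cannot exceed population_size")
    if start is None:
        start = random.Random(seed).randint(1, k)  # inclusive on both ends
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Take the starting unit, then every kth unit after it.
    return [start + i * k for i in range(sample_size)]

# The example from the notes: random starting point 11, interval 15.
houses = systematic_sample(120, 8, start=11)
print(houses)  # [11, 26, 41, 56, 71, 86, 101, 116]

# Letting the function draw its own starting point.
random_houses = systematic_sample(120, 8, seed=4)
```

Only the starting point is random here; once it is chosen, the rest of the sample is fixed, which is why trends or periodicities in the list order can make this design perform worse than an SRS.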