BVHS Statistics and Probability
Chapter 10--Samples
Name ___________________________ Date ________________ Period _____

Sampling Designs: Random Rectangles Activity

The Rectangles Sheet has 100 random rectangles on it. They are already numbered so that you can easily sample them by randomly generating numbers. The rectangles vary in size (area). Each box is 1 unit of area.

Examples (figures): 3 units², 6 units², 8 units²

You will be doing FIVE different TYPES of samples. Each time you will be sampling a total of 5 rectangles.

1. Guess
Your teacher will display 100 rectangles on the board. Look at the rectangles for a few seconds and write down your guess as to the average area of the rectangles. (Each small square is one square unit.)

Guess of Average Area: ____________________________

2. Judgmental Sample
Study the rectangles on the front side of the sheet (side #1). After studying the 100 rectangles, select ANY 5 that you believe are representative of the whole population. Let Z be the size (area) of a rectangle. Write the number of each rectangle you chose and its size (area) below, then find the mean size (z̄).

Rectangle #    _____   _____   _____   _____   _____
Area (Z)       _____   _____   _____   _____   _____
z̄ = _____

Basic Sampling Concepts

A population is the entire group of objects or people about which information is desired.

The individual members of the population are called units/subjects once selected in a sample.

A frame is a listing of all units/subjects in the population.

A sample is the subset of the population that is actually examined in order to gather information.

The sampling design is the method used to select the sample from the population.

The simple random sample (SRS) is a sampling design that gives every possible sample (combination of units/subjects) of a given size the same chance of being selected.

Simple random samples may be obtained from the frame by using random numbers to select the units/subjects to be sampled.

There are also specialized sampling designs, including stratified sampling and cluster sampling.
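An SRS of 5 rectangle numbers can also be drawn with software instead of a calculator. The following is a minimal Python sketch, not part of the original activity; the function name draw_srs and its parameters are made up for illustration. It relies on random.sample, which draws without replacement, so every set of 5 distinct rectangle numbers is equally likely.

```python
import random

def draw_srs(population_size=100, sample_size=5, seed=None):
    """Return sample_size distinct rectangle numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional; pass one to repeat a draw
    # random.sample draws without replacement, so no number can repeat
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

print(draw_srs())  # five distinct rectangle numbers between 1 and 100
```

Drawing without replacement matters here: calling a random-integer function five separate times can produce the same rectangle number twice.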
Failure to use proper probability sampling often results in bias, i.e. in systematic errors in the way the sample represents the population.

Voluntary response samples, which involve self-selection, are particularly prone to large bias.

Some common problems in sampling human populations are
o Undercoverage: the list of units, or frame, from which the sample is selected does not include every member of the population,
o Non-response bias: information is not available for some units selected for the sample,
o Response bias: respondents give inaccurate information, and
o Poorly worded questions: the wording of the questions suggests the response desired by the interviewer.

3. Simple Random Sample (SRS)
Using the calculator, generate 5 random numbers from 1 to 100 (use randInt(1, 100)). If a number repeats, generate a replacement so that you have 5 different rectangles. Find each matching rectangle and its size. Record the info below, then find the mean.

Rectangle #    _____   _____   _____   _____   _____
Area (Z)       _____   _____   _____   _____   _____
z̄ = _____

4. Stratified Sample
On the back side of the sheet, the rectangles have been separated into 5 groups (strata) based on size. Each stratum has 20 rectangles of similar size. We want to take a sample from each stratum, then combine these to make our total sample. Generate a random number between 1 and 20 (randInt(1,20)), and use that rectangle in Stratum #1. Record the number you generated and the rectangle size below. Then do the same thing for each of the other strata (generate a new random number between 1 and 20 each time). Record the sizes and find the mean.

               Stratum 1   Stratum 2   Stratum 3   Stratum 4   Stratum 5
Rectangle #    _____       _____       _____       _____       _____
Area (Z)       _____       _____       _____       _____       _____
z̄ = _____

A potential advantage of stratified sampling is a more precise estimate of the population mean, μ, than one could find from a simple random sample (SRS) of the same size. This occurs because stratified sampling reduces the sampling variability: rectangles within a stratum vary less than rectangles across the whole population, so our estimates can be more precise.
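The claim that stratifying reduces sampling variability can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions: the areas in AREAS are made up (they are NOT the areas on the rectangle sheet), and the helper names srs_mean, stratified_mean, and spread_of_means are hypothetical. It repeats each design many times and compares the standard deviation of the resulting sample means.

```python
import random
import statistics

# Hypothetical areas, NOT the real rectangle sheet: 100 values grouped by size,
# so each consecutive block of 20 plays the role of one stratum.
AREAS = [1 + 3 * (i // 20) + (i % 4) for i in range(100)]
STRATA = [AREAS[k:k + 20] for k in range(0, 100, 20)]

def srs_mean(rng):
    """Mean area of a simple random sample of 5 rectangles."""
    return statistics.mean(rng.sample(AREAS, 5))

def stratified_mean(rng):
    """Mean area of one randomly chosen rectangle from each of the 5 strata."""
    return statistics.mean(rng.choice(stratum) for stratum in STRATA)

def spread_of_means(draw, reps=2000, seed=0):
    """Standard deviation of the sample mean over many repeated samples."""
    rng = random.Random(seed)
    return statistics.stdev(draw(rng) for _ in range(reps))

print("SRS spread:       ", round(spread_of_means(srs_mean), 2))
print("Stratified spread:", round(spread_of_means(stratified_mean), 2))
```

With these made-up areas the stratified means cluster more tightly than the SRS means, because each stratified sample is forced to include one small, one medium, and one large rectangle. How large the gain is on the real sheet depends on how similar the rectangles within each stratum actually are.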
Potential disadvantages of stratified sampling are [1] the sampling procedure may be more difficult, because one needs to divide the population into strata before sampling, and [2] the formula for variance is more complicated.

5. Systematic Sample
Use the original sheet of rectangles (side #1). Randomly generate a number between 1 and 20. This is the first rectangle you will sample. Write this rectangle number in the first box below. Add 20 to the random number to get the second rectangle you will sample. Continue to add 20 to get the next three rectangles for your sample. This is like sampling every 20th person who passes you.

Ex: You get the random number 6. Then you would inspect rectangles 6, 26, 46, 66, and 86.

Record the rectangle numbers and sizes, then find the mean size.

Random Number r = _____
Rectangle #    _____   _____   _____   _____   _____
Area (Z)       _____   _____   _____   _____   _____
z̄ = _____

6. Cluster Sample
Here we want to sample a group of rectangles that are near each other. We still want a sample of five rectangles total, and we still want it to be a random sample. Look back at the original sheet of rectangles (side #1). We can put the rectangles into clusters (groups) of five based on their assigned numbers. The first cluster is rectangles #1-5, the second cluster is rectangles #6-10, and so on, giving us 20 clusters to sample from.

Let's choose a cluster for our sample. To do this, choose a random number, r, between 1 and 20. This is the cluster you will use for your sample. Now calculate 5r - 4 and 5r. The rectangles numbered from 5r - 4 to 5r are your cluster. This should be 5 rectangles.

Ex: You get the random number 6. 5(6) - 4 = 26 and 5(6) = 30, so you look at rectangles #26-30.

Record the 5 rectangle numbers below, then their sizes, then the mean.

Random Number r = _____
Rectangle #    _____   _____   _____   _____   _____
Area (Z)       _____   _____   _____   _____   _____
z̄ = _____

A potential advantage of cluster sampling is its convenience. It may be difficult to find a simple random sample of organisms that are clustered.
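The index arithmetic in steps 5 and 6 can be written as two short functions. This is a minimal Python sketch; the function names systematic_sample and cluster_sample are made up for illustration, and the defaults (step of 20, clusters of 5) simply mirror the worksheet's setup of 100 rectangles.

```python
def systematic_sample(start, step=20, size=5):
    """Rectangle numbers start, start+step, ... where start is the random number (1-20)."""
    return [start + step * k for k in range(size)]

def cluster_sample(r, cluster_size=5):
    """Rectangle numbers 5r-4 through 5r for the randomly chosen cluster r (1-20)."""
    first = cluster_size * r - (cluster_size - 1)
    return list(range(first, cluster_size * r + 1))

print(systematic_sample(6))  # [6, 26, 46, 66, 86]
print(cluster_sample(6))     # [26, 27, 28, 29, 30]
```

Both calls reproduce the worked examples for the random number 6. In each design only one random number is generated; the rest of the sample follows from it.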
Potential disadvantages of cluster sampling are [1] the units in a cluster may be similar, which leads to a loss of efficiency (i.e. we would need to sample more units to get the same level of precision as with an SRS), and [2] the formula for variance is more complicated.

RECORD the data from your fellow classmates in the table given below.

Guesses   Judgmental   SRS   Stratified   Systematic   Cluster
_____     _____        _____ _____        _____        _____
_____     _____        _____ _____        _____        _____
_____     _____        _____ _____        _____        _____

Compare sampling distributions

1. Sketch a dotplot of the class guesses and of the class means from Judgmental sampling, SRS, Stratified sampling, Systematic sampling, and Cluster sampling.

2. Discuss similarities and differences regarding the shapes and spreads of the dotplots above.

3. Calculate the mean of the sample averages for the guess, the judgmental sample, and each of the other sampling techniques. Mark this value on each of the dotplots with a star symbol. How do these CENTERS of the distributions of the means compare?

4. Which method do you think is the least accurate? Why?

5. Do you think one method is doing a better job? Why?

6. Sampling bias--The actual mean is 7.42.
a. Do any of the plots have a center that is very close to the true average? If so, which one(s)?
b. Do any of the plots have a center that is larger than the true average? If so, which one(s)?
c. Which of the sampling strategies are most biased? Explain your answer.
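The comparison of centers in questions 3 and 6 can also be done in software once the class table is filled in. The Python sketch below is one possible way to do it, not part of the original activity: the function name centers_and_bias is hypothetical, and the numbers in the example dictionary are placeholders, not real class results. Only the true mean of 7.42 comes from the worksheet.

```python
import statistics

TRUE_MEAN = 7.42  # actual mean area stated in the worksheet

def centers_and_bias(class_means, true_mean=TRUE_MEAN):
    """Map each method to (center of its class means, center minus true mean)."""
    result = {}
    for method, means in class_means.items():
        center = statistics.mean(means)
        result[method] = (center, center - true_mean)
    return result

# Placeholder numbers only; replace with the means recorded in the class table.
example = {
    "Method A": [6.0, 8.0, 10.0],
    "Method B": [7.0, 7.4, 7.8],
}
for method, (center, bias) in centers_and_bias(example).items():
    print(f"{method}: center = {center:.2f}, center - true mean = {bias:+.2f}")
```

A center far from 7.42 suggests bias in that method, while a wide spread of class means around the center reflects sampling variability rather than bias.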