Statistics 251 – Dr. Uebersax 03 Sampling 1 Student Research Project on Statistics courses: http://goo.gl/UeoRrU 1. Populations and Samples Today our general topic will be how to construct a sample of a population. We'll consider some good methods, and some not-so-good methods. Sampling gets at the essence of inferential statistics: to draw accurate generalizations about a population using a smaller sample. Population characteristics are called parameters. We symbolize these as Greek letters: μ, σ, π Sample characteristics are called sample statistics. We symbolize these as English letters. X , s, p 2. Representative vs. Biased Samples We want our samples to be representative of (i.e., accurately correspond to) populations. Two things limit the representativeness of samples: Sampling variation (random nonrepresentiveness); this is pure random error; it can produce under- or over-estimation, depending on the sample. Bias (systematic nonrepresentiveness); bias is a consistent error, i.e., produces samples that tend to have too much or too little of some characteristics. Sampling variation is reduced by having larger samples. Bias is reduced by using random sampling methods. Statistics 251 – Dr. Uebersax 03 Sampling 2 3. Sample Size Small samples are inherently variable (Example: conducting an opinion survey using only 5 people). The larger the sample size, the better our inferences about the population from which it comes. But larger samples are more expensive. So we need to find a happy medium – not too large, not too small. 4. Random Sampling We minimize sampling bias by using random sampling methods. With simple random sampling (SRS), each object in the population has the same chance of being selected. Example: place everybody's name in a hat and select n names (Iottery method). With stratified random sampling, we divide the population into subgroups (called strata), and randomly sample within each subgroup. Example: We want a random sample of 40 CalPoly students, but also want each class (Fr, So, Jr, Sr) to be equally represented. Therefore we draw separate samples of n = 10 from each of the four classes separately: 10 random Fr, 10 random So, 10 random Jr, 10 random Sr. If we just selected any 40 students at random, we might get, say, only a few Freshmen With weighted random samples, we assign cases in some groups a higher probability of being sampled. Example: suppose we are equally interested in men's and women's opinions, but women are under-represented by 50% in a population; we could then use lottery-style random sampling, but place each woman's name 'in the hat' twice. Systematic samples follow some fixed rule – for example, taking every 10th name from a list. This method is still random, provided we begin sampling from a random location. However, if there are patterns or periodicities in the original list, the sample might be biased. 5. Nonrandom Sampling Nonrandom samples are almost always biased and should be avoided if possible. Sometimes, however, they are a practical alternative to random samplings. Voluntary sampling. Ask people to volunteer. Convenience sampling. Use as sample whoever is convenient (e.g., students on the mall). Statistics 251 – Dr. Uebersax 03 Sampling 6. Using Excel to Generate a Simple Random Sample 1. Open Excel 2. File > Options > Formulas 3. In "Calculation Options" field: Check: Uncheck: Click: OK Manual Recalculate Workbook before saving 4. Enter this formula in a cell: =RANDBETWEEN(1, N), where N is your population size. 5. Copy this formula to n * 1.5 (or more) cells (where n is the sample size) below the first one (this includes some extra random numbers. 6. Press F9 to calculate. 7. Copy values to a different column: Highlight target range Ctrl-C Position cursor in first cell of destination range Ctrl-V When clipboard icon appears, Choose Paste Values; otherwise, on the Home tab, in the Clipboard group, click Paste icon, then Paste Values). 3 Statistics 251 – Dr. Uebersax 03 Sampling 8. Remove duplicate values: Highlight column that contains random number values. Data > Remove Duplicates Homework: Given a population with N = 20, numbered from 1 to 20: Use Excel to generate a random sample with n = 10. Place results in two adjacent columns of Excel and print and bring to class to turn in. Remember to indicate lecture number (03) on homework. 4