STAT 680 BIOSTATISTICS
Sampling Methods

Estimation
1. Census – complete enumeration
   • Measure every individual / attribute of interest.
   • Accurate description of the population.
   • Drawbacks:
     – Only viable with small populations (e.g., a small number of trees)
     – Only cost-effective with high-valued features
2. Sampling – subset of the population
   • Accurate description of the population
   • Drawbacks:
     – Requires planning and (most likely) pre-sampling
     – Results are accompanied by a confidence interval

Sampling
• Sampling is the most widely used technique for obtaining information on parameters of interest
• Examples:
  – Amounts: average individual weight or height
  – Classification proportions: proportion of individuals that are female, or proportion diseased
  – Totals: total harvestable timber, or total biomass
  – Associations: relationship of soil nutrition level to plant biomass, or of tree age to lumber yield
• Challenges
  1. To select units that represent the population of interest (unbiasedness)
  2. To use these measurements to estimate the population parameter of interest
  3. To determine the statistical quality of these estimates

Defining the population
• Target population: a clearly defined population from which the sample will be drawn
• Inferential population: a clearly defined population to which our results will be applied
• Sampled population: the collection of all possible observation units that might have been chosen in a sample

Defining the sample
• Observation unit: an entity on which a measurement is taken (also called an element)
• Sample: a subset of the population
• Sampling unit: an individual item in a sample, which may be
  – An observation unit (a person or a household)
  – A set of observation units (people from a household)

Ideal vs. real sampling situation
[Figure: two diagrams, one for the ideal and one for the real situation, each relating the target population, inferential population, sampled population, and sample]

Example
• We are interested in knowing whether Tech football fans are satisfied with on-campus parking
  – Target population: all current season ticket holders
  – Sampling frame: list of the season ticket holders as of 06/2010
  – Inferential population: all season ticket holders 2009-2010
  – Observation unit: a household that has season tickets
  – Sample: those who respond and complete the survey
  – Sampled population: those season ticket holders who were on the list as of June 1 and chose to respond

Example: graphics
• Target population: 2010 season ticket holders
• Inferential population: ticket holders as of 06/2010
• Sampled population: people who responded and intend to attend 2010 games
• Sample: randomly selected valid respondents

Sample selection
• Random selection
  – A probability-based selection protocol in which each sampling unit has a known, positive probability of being selected
  – The probability of selection need not be equal for each unit, as long as the probability is known for each unit
• Systematic selection
  – The first sampling unit is selected randomly; subsequent units are not
  – Each sampling unit in the population has the same probability of being selected
  – The probabilities of different sets of units being included in the sample are not all equal

Methods of selecting sampling units
• Simple random selection (SRS)
• Systematic random selection (SyRS)
• Stratified random selection (StRS)
[Figure: three panels labeled SRS, SyRS, and StRS]

When to use SRS
• The sampling frame explicitly lists the sampling units
• Sampling units are identified by a location (an (x,y) pair or a moment in time)
• Assumptions
  – Every possible combination of sampling units has an equal and independent chance of being selected.
  – The selection of a particular unit to be sampled is not influenced by the other units that have been selected or will be selected.
  – Samples are chosen either with replacement or without replacement.

When to use SyRS
• Sampling units are easy to locate.
• Sampling follows a pattern.
• The initial sampling unit is randomly selected. All other sampling units are spaced at uniform intervals throughout the area sampled.
• Assumption
  – There is no pattern in the population (e.g., no periodicity that coincides with the sampling interval)

When to use StRS
• Stratified random sampling should be used when:
  1. The distribution of items is skewed
  2. The variability is too large and separate entities can be identified
• It allows a more representative sample to be drawn.
  – (i.e., if there are more individuals of a certain type in the population, the sample has more of that type, and if there are fewer of another type, there are fewer of the latter type in the sample)
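
The three selection methods above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the course material: the function names, the seed, the population of 100 numbered units, and the strata A, B, and C are all hypothetical. The stratified version assumes proportional allocation, in line with the note that larger groups contribute more units to the sample.

```python
import random


def simple_random_sample(units, n, rng):
    """SRS without replacement: every subset of size n is equally likely."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """SyRS: random start, then every k-th unit, with k = len(units) // n.

    Selection probabilities are exactly equal only when len(units) is a
    multiple of n; otherwise the last few units are never reached.
    """
    k = len(units) // n
    start = rng.randrange(k)  # only this first unit is chosen at random
    return [units[start + i * k] for i in range(n)]


def stratified_sample(strata, n, rng):
    """StRS: an independent SRS inside each stratum.

    Assumes proportional allocation; because of rounding, the stratum
    sample sizes may not add up to exactly n for arbitrary inputs.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)  # stratum share of the sample
        sample[name] = rng.sample(members, n_h)
    return sample


# Hypothetical sampling frame: 100 units numbered 1..100.
rng = random.Random(680)
population = list(range(1, 101))

srs = simple_random_sample(population, 10, rng)
syrs = systematic_sample(population, 10, rng)

# Hypothetical strata of unequal size (60, 30, and 10 units).
strata = {"A": population[:60], "B": population[60:90], "C": population[90:]}
strs = stratified_sample(strata, 10, rng)

print("SRS: ", sorted(srs))
print("SyRS:", syrs)
print("StRS:", {name: sorted(chosen) for name, chosen in strs.items()})
```

With these inputs the systematic sample is always evenly spaced at intervals of 10, and the stratified sample draws 6, 3, and 1 units from strata A, B, and C, mirroring their shares of the population.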