Census
- Collects data from every individual in the population

Population
- In a statistical study, the entire group of individuals about which we want info

Sample
- Part of the population from which we actually collect info
- Use info to draw conclusions/make inferences about the entire population
- Inference: drawing conclusions about a population on the basis of sample data

Creating a Sample Survey
1) Define population of interest
- Sampling frame: list of individuals that you draw a sample from
2) Determine what variable you want to measure
3) Decide how to choose a representative sample

Bias
- Using a method that will consistently overestimate/underestimate the value that you want to know
- Design of a study systematically favors certain outcomes (can be attributed to flaws in design/data collection, not always a personal bias)
- Exam tip: indicate the direction of the bias and explain why

Biased Sampling Methods

Convenience Sample
- Choosing individuals who are easiest to reach
- Bias → often results in a sample of like-minded people

Voluntary Response Sample
- Choosing individuals who voluntarily respond to a general appeal/invitation
- Bias → people w/strong opinions (often in the same direction) are most likely to respond
- Ex. Internet, write-in, and call-in opinion polls

Personal Choice
- Creates bias
- Interviewers choose in a convenience sample (CS), individuals choose in a voluntary response sample (VRS)
- Combat this problem by relying on chance (random = due to chance)

Random Sampling
- The use of chance to select a sample
- Central principle of statistical sampling

Simple Random Sample (SRS)
- An SRS of n individuals → chosen from the population so that every set of n individuals has an equal chance of being selected

Choosing an SRS
- Label: assign a numerical label to every individual in the population
- Use Table D (table of random digits) to select random numbers
- Or generate random integers on a calculator/use the hat method

Stratified Random Sample
- Involves sampling important groups (strata) separately → combined to form one stratified random sample
- Ex. dividing HS by grade level, dividing districts by income level, etc.
1) Classify population into groups of similar individuals (strata)
2) Select an SRS from each stratum
3) Combine to form the full sample

Cluster Sample
- More often used for convenience
- Clusters: mirror characteristics of population, contain variety
1) Classify population into clusters (groups near each other)
2) Select an SRS of clusters (choose a couple of entire clusters out of all of them)
3) Combine selected clusters into a sample

Multistage Sample
- Two or more sampling methods combined

Margin of Error
- Sets bounds on the size of the likely error
- Tells us how much sampling variability to expect
- Results from random samples fall within

Errors
- Errors in sample surveys can introduce bias
- Two main sources: sampling errors and nonsampling errors

Sampling Errors
- Use of bad sampling methods
- Undercoverage: when some groups in the population are left out of the process of choosing the sample
- Sampling frame should list all inds in pop (not often available)
- Ex. calling landline telephone numbers → excludes people w/only a cell phone or no phone at all, visiting households → excludes students in dorms, etc.

Nonsampling Errors
- Nonresponse: occurs when an individual chosen for the sample can’t be contacted or refuses to participate
- Nonresponse can only occur after a sample has been selected
- Different from voluntary response (inds already selected)

Response Bias
- Systematic pattern of incorrect responses
- People falsely tell interviewers that they voted bc it’s a social expectation
- Race/gender of interviewer can affect responses
- Forcing respondents to recall past events can lead to inaccurate info

Wording of Questions
- Confusing/leading questions can introduce strong bias and change outcomes
- Order of questions matters too

Observational Study
- Observes individuals and measures variables of interest but doesn’t attempt to influence responses
- Compare groups, examine relationships between variables, describe groups/situations

Experiment
- Deliberately imposes a treatment on individuals to measure their responses
- Determine whether the treatment causes a change in the response
- Can help understand cause and effect

Confounding/Lurking Variable
- Variable that isn’t the explanatory variable (EV) or the response variable (RV) but can influence the RV

Confounding
- When two variables are associated in a way that their effects on an RV can’t be distinguished from one another
- Observational studies of the effect of one variable on another often fail bc of confounding

Treatment
- Specific condition applied to individuals
- Can be a combination of variables if there are multiple EVs

Experimental Units
- Smallest collection of individuals to which treatments are applied
- Subjects: when EUs are humans

Factors
- EVs in an experiment
- When studying the joint effects of factors, each treatment is formed by combining a specific value (level) of each factor

EV vs. Treatment
- Combinations of levels of EVs/factors form treatments

Random Assignment
- Experimental units are assigned to treatments at random
- Solution to problem of bias

Comparative Experimental Design
- Compares two or more treatments
- Randomized when EUs are assigned to treatments by chance

Completely Randomized Design
- Treatments are assigned to all the EUs by chance
- Happens after participants have been selected
- Can compare any number of treatments
- Difficult to make each group the same size

Control Group
- Provides a baseline for comparing the effects of other treatments

Principles of Experimental Design
1) Comparison
- 2 or more treatments that you can compare the results from
2) Control
- Control for confounding variables that might affect responses
- Use comparative design and ensure that the only systematic difference between the groups is the treatment administered
- Compare to a placebo
- Treatment with natural option for EV
3) Random Assignment
- Use chance to assign individuals to treatment groups
- Helps reduce effects of confounding variables that you can’t control
- Forms groups of EUs that should be similar before the treatments are applied
- As a result, differences should be due to treatment or chance
4) Replication
- Using enough EUs to distinguish a difference in the effects of the treatments from chance variation

Placebo Effect
- Response to a dummy treatment

Double-Blind
- Neither subjects nor those who interact with them/measure the RV know which treatment a subject received

Single-Blind
- Either the subjects or the individuals who interact with them/measure the RV don’t know which treatment a subject received, but not both
- Sometimes, it isn’t possible for subjects to not know

Statistically Significant
- An observed effect so large that it would rarely occur by chance
- A statistically significant association in data from a well-designed experiment implies causation

After participants have been selected:

Completely Randomized Design
- See above

Block
- A group of EUs that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
- Formed based on important and unavoidable sources of variability (confounding variables)

Randomized Block Design
- Random assignment of EUs to treatments is carried out separately within each block
- Control what you can, block what you can’t, and randomize to create comparable groups

Matched Pairs Design
- A type of randomized block design
- Create blocks by matching pairs of similar EUs
- Use chance to decide which member of a pair gets the first treatment
- The other member gets the second treatment
- Random assignment of subjects to treatments is done within each matched pair
- Sometimes, a “pair” consists of just one EU that gets both treatments, one after the other
- EU serves as own control → order of treatments can influence response

Establishing Cause & Effect
- Well-designed experiment w/randomized treatments and statistically significant results
- EV occurs before RV
- Correlation between EV and RV
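
Sketch: chance-based selection and assignment in code
- A minimal illustration of the procedures in these notes (SRS, stratified sample, cluster sample, random assignment within blocks), using Python's standard random module in place of Table D or a calculator
- Everything here is an assumption made for the example: the function names, the 40 hypothetical students labeled 0-39, the grade levels used as strata/blocks, the homerooms used as clusters, and the fixed seed (only so the demo repeats)

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Classify into strata, take an SRS from each stratum, then combine."""
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of whole clusters and keep everyone in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [ind for name in chosen for ind in clusters[name]]


def randomized_block_assignment(units, block_of, treatments, rng):
    """Within each block, shuffle the units and deal out treatments in turn."""
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for key in sorted(blocks):
        members = blocks[key][:]
        rng.shuffle(members)  # chance alone decides the order within the block
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical data: (label, grade) pairs, 10 students in each of grades 9-12.
rng = random.Random(1)  # fixed seed only so the demo is repeatable
students = [(i, 9 + i % 4) for i in range(40)]

srs = simple_random_sample(students, 8, rng)
strat = stratified_sample(students, lambda s: s[1], 2, rng)

# Hypothetical clusters: five homerooms of eight students each.
homerooms = {f"room{r}": students[r * 8:(r + 1) * 8] for r in range(5)}
clus = cluster_sample(homerooms, 2, rng)

# Blocks are grade levels; each block is split evenly between two treatments.
plan = randomized_block_assignment(
    students, lambda s: s[1], ["treatment", "placebo"], rng
)
```

- A matched pairs design corresponds to calling randomized_block_assignment with blocks of size 2 and two treatments, so chance decides which member of each pair gets which treatment
- A completely randomized design corresponds to putting every EU in a single block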