Foundations of Research
Research Sampling
• What is the target population for your study?
• Sampling: Probability vs. Non-Probability
© Dr. David J. McKirnan, 2014, The University of Illinois Chicago. McKirnanUIC@gmail.com. Do not use or reproduce without permission.

Research Sampling
• Define your target population: What group do you want to generalize to?
• What is your sampling frame? Who is / is not a member of the group?

The target population
• Any study assesses only a sample of the population. Even the census does not enroll 100% of Americans.
• We always must generalize from our sample to the larger population.
• Research often addresses a specific population or subpopulation.
• Our definition of the population to target is an important step.
• The size and breadth of the population we are generalizing to can affect the internal or external validity of the study.

The overall flow of sampling decisions
General population or targeted sub-population → Sampling frame → The sample → Study results → Inference
• We begin with a decision about who we are interested in.
• We then make decisions about who is in the target population and how to recruit them.
• From there we collect our sample, representing a very small % of the larger population.
• We get results of our experiment or study within our sample…
• …and attempt to infer what the entire target population must be like.

Sampling: the population
• We may design our study to inform us about a very general population. So, a cognitive-neuroscience study may test hypotheses about the brain generally.
• Often we study a subgroup, e.g., women, a class of medical patients, the homeless...
• Many studies compare segments of the general population, e.g., African-Americans & Caucasians.
• This narrower focus makes generalization clearer.
• Psychology studies purport to generalize broadly, but by enrolling only college students or survey volunteers they actually generalize to a very small portion of the population.

The Sampling Frame
Sampling frame refers to several elements of our study:
• What do we already know about our population? Do we have census or other data?
• What criteria do we use to determine who is or is not a member of a target population? If we are studying homelessness, who “counts” as homeless?
• How do we contact and recruit participants? Where may a representative sample of our target population be reached? How do we actually approach and enroll them?

Sampling: assignment
• There are many ways for us to sample our target population(s). The main distinction is Probability vs. Non-Probability sampling. We will address that later.
• Within our sample we may use Blocking Variables to compare different segments, e.g., age or ethnic groups.
• For experiments we assign participants to groups, typically using Random Assignment.
• Randomized Block Assignment may, for example, randomize within ethnic blocks to ensure that the same proportion of African-Americans, Caucasians and Latinos is in each group.

Sampling: population inferences
• In Inferential Research we are not interested in simply describing or analyzing our sample. We use our results to infer the characteristics of the larger population we sampled from.
• The quality of our inference is shaped by factors such as how Reliable and Valid our measures are.
• From a sampling perspective, Statistical Power is a key element. Power refers to whether we had enough participants to adequately test our hypothesis. With too few participants we may not be able to tell an important effect from simple chance results. We will discuss this in the statistics section.

Whom do you want to generalize to?
From broadest to narrowest: Mammals; Humans; All Western people; All Americans; Young Americans; College students; This college; This class.
• Sampling a broader population (i.e., a larger sampling frame) increases external validity.
• Sampling a more specific or smaller frame generally increases internal validity.

Whom do you want to generalize to?
Samples typically represent targeted sub-populations:
• Demographic or ‘status’ groups: ethnicity, income or educational groups…
• Geography: e.g., urban dwellers…
• Medical / clinical groups: people with a specific diagnosis or condition
• Behavioral groups: registered voters; home owners; ever used marijuana…
• Groups defined by self-identification or subjective state: views oneself as “highly likely to vote…”; above a ‘cut point’ on a stress or depression scale; “Conservatives” vs. “Liberals”
Targeting specific groups increases internal validity by decreasing the complexity of the sample, but may lessen external validity by narrowing the focus.

Who is a group member?
Are you between 14 and 30 and have a computer or smart phone available?
A = Yes
B = No

Who is a group member?
Do you use Facebook or other social media 5 times a week or more?
A = Yes
B = No
C = Not sure – lost count.

Who is a group member?
Are you a “Facebook user”?
A = Yes
B = No
C = Not sure – let me Facebook that.

Who is a group member?
Do you live in Pilsen, Humboldt Park or another neighborhood that is mostly Latino?
A = Yes
B = No
C = Maybe – I’m not sure

Who is a group member?
Do you speak Spanish?
A = Yes
B = No
C = ¿Cuál era la pregunta? (“What was the question?”)

Who is a group member?
Are you Latino?
A = Yes
B = No
C = Maybe – I’m not sure

Whom do you want to generalize to: who is in the group?
Once we choose our population group, we must decide on criteria for membership…
To sample social media users, do I use a…
• Rough demographic criterion?
• Behavioral criterion (which behavior?)
• Self-identification?
To sample “Latinos”…
• Is geographic status specific enough?
• Is Spanish language the defining characteristic?
• Can / must one call oneself “Latino” (even if you do not speak Spanish…)?
Clearer and narrower group criteria increase internal validity by making the sample more homogeneous.
Some of these criteria are easier to reliably measure than others:
• Demographic variables are often available in census data.
• Behavioral or subjective criteria require direct assessment, and can be less reliable.
Of course, different criteria may yield very different samples.
Our choice of sampling criteria must be based on our theory, hypothesis, or research question.

Who is in the group: Sampling criteria
Three categories of criterion: demographic or ‘status’ marker; behavioral criterion; subjective / self-identification.
Who is “Latino”?
• Demographic or ‘status’ marker: Neighborhood residence?
• Behavioral: Spanish speaking? Cultural practices?
• Subjective / self-identification: Self-description?
Who is a “student”?
• Demographic or ‘status’ marker: Lives on a campus
• Behavioral: # hours registered
• Subjective / self-identification: Describes occupation as ‘student’
Who is “gay” or “lesbian”?
• Demographic or ‘status’ marker: Lives in a same-sex 2-person household?
• Behavioral: Sexual or other patterns?
• Subjective / self-identification: Self-identification as gay / lesbian?
Who is “depressed”?
• Demographic or ‘status’ marker: Received a diagnosis from a MH professional? Presents at doctor’s office for general malaise?
• Behavioral: Pattern of behaviors and feelings?
• Subjective / self-identification: Describes self as “depressed”?
Notes:
• Each criterion may meet the goals of a particular hypothesis or empirical question.
• Of course, different choices may lead to very different samples.
• Some criteria are easy to assess but may be only approximate.
• Others may require relatively difficult assessments.

Whom do you want to generalize to: Your “Sampling Frame”
What is known about your larger population? Are there Census or survey data?
E.g., are there “population” data on depressed people? Do we know the demographic profiles of Facebook users?
• Data about your target population will help you determine how well your sample represents that population: what are its size, sub-groups, location…?
• Where / how can I best recruit members of the population?
• Will some sub-groups require different recruitment methods than others?
• Will different recruitment methods be biased in favor of some subgroups? Internet surveys may be biased against older people. Studies that use monetary incentives pull for poorer people.

Overview: From research question to sample
What is the research question?
• Are we describing some natural process? …testing a theory?
What is the population of interest? (Target population)
• What population does your research address? Whom do you want to generalize to?
Category of participant criterion? (Sampling frame)
• Demographic or “status” criteria?
• Behavioral criterion?
• Self-identification, attitudes or beliefs?
Operational definition of enrollment criteria?
• Inclusion criteria: Requirements for eligibility; definition of your population of interest, e.g., “young adult”, 20/20 vision, alcohol user…
• Exclusion criteria: Characteristics that rule a participant out, e.g., severe mental illness, previous exposure to the experiment…
Actual recruitment? (The sample)
• Concrete (operational) processes to recruit and enroll participants.

From theory to sample: Asthma among African-Americans (EXAMPLE)
Study structure & research question:
• Adherence to a medication regimen is key to health among people with asthma.
• Medication adherence is generally low, particularly among African-American adolescents, who have high rates of asthma.
• Self-determination theory proposes that autonomous motivation (being self-directed), self-confidence, and relatedness (family routines & parental support) underlie adherence.
• This study tests the hypothesis that the three variables comprising self-determination theory will be associated with patients’ adherence to medications.
• Because young African-Americans have a significant health burden from asthma, the study focuses on them.
Bruzzese, J., Idalski C, Lam, P, Deborah A.; Naar-King, S. (2014). Adherence to asthma medication regimens in urban African American adolescents: Application of self-determination theory. Health Psychology, Vol. 33, No. 5 (May 2014): 461-464.

From theory to sample: Asthma among African-Americans (EXAMPLE)
Population of interest?
• Young African-Americans who suffer from poorly controlled asthma.
Category of participant criterion?
• Demographic or status criteria: African-American adolescents participating in a long-term asthma control study.
• Behavioral criterion: Poorly controlled asthma.
• Self-identification / attitudes: not a criterion in this study.
Operational definition of enrollment criteria?
• “Adolescent”: Age 10 – 18.
• “Poorly controlled”: At least one asthma-related hospitalization or two asthma-related emergency department visits in the last 12 months.
Actual recruitment?
• N = 162 participants recruited from the hospital’s outpatient immunology clinic after an asthma-related clinic visit or hospitalization.

Results (EXAMPLE)
• Having asthma regulation embedded in the family routine was the only predictor of medication adherence.
• Analysis: multiple regression (all variables are tested simultaneously).

Having a broad population helps with…
A = Avoiding confounds.
B = External validity.
C = Internal validity.
D = Specificity of the design.

Having a more narrow population helps with…
A = Avoiding confounds.
B = External validity.
C = Internal validity.
D = Specificity of the design.

It is not true that demographic, behavioral or subjective sample criteria…
A = Must be based on the theory or hypothesis you are testing.
B = Typically lead to the same sample characteristics.
C = Require different measures to screen participants.
D = Can substantially affect the results of your study.

What is a sampling frame?
A = A sample of the different stimuli that will be used in the experiment.
B = The decision to use a behavioral versus a self-identification or subjective criterion for group membership.
C = The list of sub-populations we plan to study.
D = Census, survey or other data about the target population that allows us to know if our sample is representative.

Research sampling
• Defining your target population
• Probability & Non-Probability sampling methods

Major forms of sampling
Probability (Random) Sampling
• Recruit (or select) participants to maximize the representativeness of the sample to a known population.
• Uses some form of random selection.
• Requires that each member of the population has a known (often equal) probability of being selected.
• Most externally valid approach to sampling general populations.
Non-Probability Sampling
• Use available samples for convenience, or targeted outreach to unusual or small populations.
• Selection may be either systematic or haphazard, but is not random.
• Often the most externally valid approach to unusual, small, or extreme groups, or groups where little is known.
• When used only for convenience it is the least externally valid.

Watch that word ‘random’!
• Participant Selection (the sample): Random Selection or a Random Sample refers to how we recruit participants; who is in the sample.
• Participant Assignment: Random Assignment is how we (should) assign participants to different groups.
Design diagram (columns: group; experimental procedures; experimental treatment or manipulation; results):
• Group A: Procedure; Treatment; Outcome
• Group B: Procedure; Control; Outcome
• (Group C): (Procedure); (Alternate Treatment?); (Outcome)

Probability / Random Sampling
• Core feature: all members of the study population have an equal (or known) chance of being sampled.
• Procedure: Choose participants in a systematic, random fashion, e.g.:
  • Every 100th student ID.
  • Every 1000th person on a voter registration record.
  • Random digit dialing for telephone surveys.
• Advantages: eliminates obvious biases of convenience sampling.
• Limitations:
  • May under-sample unusual / hard to reach participants.
  • Some may be unavailable in, e.g., telephone lists, computer files.

Basic forms of random sampling
• Simple Random Sampling: All members of the population have about equal chance of selection.
• Multi-Stage: Randomly select population units (census tracts, households, schools…), then randomly select individuals within each unit.
• Stratified or Cluster: Random selection within:
  • Population blocks, e.g., gender (randomly select 50 women and 50 men), ethnicity (…25 African-Americans, Caucasians, Latinos, Asians, etc.)
  • Venues or events, e.g., randomly select 40 men & women at Lollapalooza…

Simple random sampling
Objective: Attempts to truly represent the general population; absolute minimal selection bias.
Procedure: Recruitment method where all members of the population have an approximately equal chance of being selected.
Examples:
• Gallup polls or surveys using random digit dialing.
• “Long form” of the census to a small % of U.S. households.
Advantages: Most representative sampling frame for the general population.
Disadvantages:
• Any recruitment method excludes some people (no telephone, no stable address, etc.).
• Very expensive for face-to-face (non-telephone or internet) data collection.

Multi-stage random sampling
Objectives:
• Develop a focused & efficient random sample.
• Use random sampling “stages” to reach hidden, stigmatized or other hard to reach groups.
Simple random sampling is optimal. However:
• It is biased when relying on telephone, internet or similar contact methods.
• A simple random sample for face-to-face interviews or recruitment for a larger research study (e.g., a public health intervention) is prohibitively expensive.

Multi-stage random sampling
Selecting a random sample of college students: rather than trying to randomly sample the entire population of students, we narrow our approach.
• General population → Universities: Randomly select subpopulation blocks.
• Universities → Classes: Randomly select smaller groups within them…
• Classes → Students: Randomly select students within the target classes…
• Final sample: …to arrive at an efficient random sample.

Multi-stage random sampling
“Real World” example: NIDA (National Institute on Drug Abuse) household surveys of drug use.
• Step 1: Randomly select a moderate # of census tracts.
• Step 2: Randomly select a modest # of households within each target tract…
• Step 3: Interview the first adult who answers the phone in each household…
• Final sample: …to arrive at an efficient random sample.

Stratified or cluster sampling
Objective: Represent every key segment of the population.
Procedure:
• Decide which population segments are important, e.g., ethnic groups, geographic areas, self-identification. This decision is based on your hypothesis or empirical question.
• Randomly select from each segment.
Proportionate: Sampling fraction from each segment should approximate the overall population.
[Figure: the distribution of ethnicity in the U.S.]
A stratified sample would randomly select from each ethnic group to approximate this distribution.

Stratified or cluster sampling (continued)
Dis-Proportionate: Over-sampling population groups to ensure you have large enough samples of small groups.
[Figure: estimated distribution of legal vs. illegal immigrants.] To directly compare groups we over-sample illegal immigrants.

Probability Sampling overview (Summary)
Core features:
• Random selection of participants from the population.
• Most externally valid approach.
Assumes:
• A clear sampling frame.
• All segments of the population are available.
Variations:
• Simple: Each member of the entire population has an approximately equal chance of being selected.
• Multi-stage: (a) Select segments of the population, e.g., census tracts, registered voters. (b) Each segment member has an approximately equal chance of being selected.
• Cluster: (a) Identify clusters, e.g., sports fans at a sports bar. (b) Each cluster member has an approximately equal chance of being selected.
• Stratified: (a) Identify strata, e.g., ethnic groups, gender, age groups. (b) Randomly select a proportion of each stratum.

Non-Probability Sampling
Useful for populations that:
• Cannot be randomly sampled; “hidden” or difficult to reach.
• Have no sampling frame available, such as census data, describing their size, composition, etc.
• Examples: drug users, recent immigrants, gay men…
Cautions:
• Likely to misrepresent the population. It may be difficult or impossible to detect this misrepresentation.
• Can be over-sensitive to incentives: paying participants attracts more poor people.
“Respondent Driven” sampling (RDS) allows for “targeted” population estimates.
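
Before turning to the non-probability methods, the proportionate and dis-proportionate stratified approaches described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure from the lecture: the function name `stratified_sample`, the `equal_allocation` flag, and the made-up population of 900 "majority" and 100 "minority" members are all assumptions chosen for the example.

```python
import random

def stratified_sample(population, stratum_of, n_total, rng, equal_allocation=False):
    """Randomly select participants within each stratum.

    equal_allocation=False: proportionate; each stratum's share of the sample
        approximates its share of the population.
    equal_allocation=True: dis-proportionate; the same n per stratum, which
        over-samples small groups so they can be compared directly.
    """
    # Sort the population into strata (hypothetical labels supplied by stratum_of).
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)

    sample = []
    for members in strata.values():
        if equal_allocation:
            n = n_total // len(strata)
        else:
            n = round(n_total * len(members) / len(population))
        n = min(n, len(members))               # cannot draw more people than exist
        sample.extend(rng.sample(members, n))  # simple random selection within stratum
    return sample

# Hypothetical population: 900 majority-group and 100 minority-group members.
rng = random.Random(0)
population = [{"id": i, "group": "majority" if i < 900 else "minority"}
              for i in range(1000)]

proportionate = stratified_sample(population, lambda p: p["group"], 100, rng)
disproportionate = stratified_sample(population, lambda p: p["group"], 100, rng,
                                     equal_allocation=True)

def count(sample, group):
    return sum(1 for p in sample if p["group"] == group)

print(count(proportionate, "minority"), count(disproportionate, "minority"))
```

With these made-up numbers the proportionate sample contains about 10% minority-group members, mirroring the population, while the dis-proportionate sample contains equal numbers from each group. Results from an over-sampled design would normally be re-weighted before describing the whole population; that step is not shown here.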

Types of Non-probability Samples
• Haphazard
• Modal instance
• Venue – time / space
• Multi-frame
• Snowball / Respondent driven
• Web
• Heterogeneity

Haphazard Sampling
“Man on the street”; recruiting the most easily available participants:
• Literally recruiting in public places for, e.g., a brief interview.
• College psychology majors.
• Medical / therapy clients in a clinic.
Often used for quick interview / attitude studies on current topics.
Advantage: Participants are readily available.
Problem: No evidence for representativeness.

Modal Instance Sampling
Recruit a “typical case” of a target population.
• A member of a target population group: a ‘typical’ college student with debt; an injection drug user, homeowner, etc.
• A person affected by a major event: a 9/11 survivor; a witness to a natural disaster.
These methods are often used by journalists, e.g., New York Times: “down low” sexual patterns in African-American men.
Often used to describe an event or way of life, or to generate hypotheses for later research.
Advantage: Direct, personal description.
Problems:
• Potentially strong self-selection bias in who volunteers for such personal disclosure.
• Social desirability responding or biased recall may compromise accuracy of answers.

Targeted Multi-frame Sampling
Sample a specific, hard to reach group…
• …no census or similar data for sampling frame.
• No clear “population blocks” to use in a multi-stage sample.
• Population spread among many venues or locations…
• …more or less sensitive to different recruitment approaches.
Uses multiple (convenience) sampling “frames”:
• Direct outreach.
• Newsletters, internet lists, chat rooms.
• Organizations or meeting places.
Most common & valid convenience sample.
Example: recruiting gay men for HIV research
We use multiple ways to approach men or for them to contact us. Multi-frame sample:
• Direct outreach in bars, clubs, street.
• Community events (festivals…).
• Newspaper ads.
• Flyers in bars & stores.
• Medical clinics.
• “Snowball” / word of mouth.
Different people are available through different “frames”.
• This approach recruits a broad cross-section of the population.
• One of the more common forms of convenience sampling.
• It also allows us to “test” different sample venues. In my HIV research: riskier men came from bar or street outreach; minorities responded more to personal contact such as snowball recruitment.

Snowball / Network sampling
• An initial set of participants are paid to recruit others, who are paid to recruit still others, etc.
• Targeted sampling: participants use eligibility criteria to recruit others.
• Recruit a network of “linked” people: a network of, e.g., drug users, gamblers, sexual partnerships…; the population of an organization or school…
Problem: Sensitive to incentives!
Advantages:
• Access unusual or “hidden” people related by a common behavior or venue.
• Using participants to recruit others they know may yield a more representative sample.
• Researchers can collect data on who recruits whom to examine the structure of social networks.

Example of social network sampling: Bearman et al., Romantic ties among adolescents
• A substantial majority of students are in an extended, linked chain of relationships.
• With a number of smaller chains.
• And a small % in 2 to 4 person chains.
• From a sampling perspective, several “seeds” access most of the population.
• Findings suggest a clear potential for STI transmission.

Web Sampling
• Typically highly targeted samples: gay / bisexual men…; adolescents…; “gamers”…
• Typically access through existing venues: users of specific web sites; list-serves, e-mail lists; active recruitment in “chat rooms”.
Problem: Inherent bias in computer literacy.
Advantages: Cheap large national sample; access unusual or “hidden” people who reach others via internet.

Heterogeneity Sampling
• Sample every sector of a population (at least several people from each) without worrying about proportions:
  • At least some members of each geographic area
  • …ethnic group
  • …behavioral group (voters & non-voters…)
• Assume that a few people are a good proxy for the group.
Examples: focus groups or qualitative interviews about products, social issues...
Problem: Cannot be sure a few people really represent their sub-group.
Advantage: At least some representation of all sub-groups.

A probability sample is…
A = Based on some form of random selection.
B = Always representative of the population.
C = Best for any population.
D = Usually easier to collect than other sample approaches.

A Gallup poll or telephone survey is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

Respondent-driven sampling, where target people recruit people like them, is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

My distributing a survey to this class is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

Selecting every 100th registered voter and contacting them for a survey is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

Randomly selecting classes across the university, then sampling every 3rd person, is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

A non-probability sample…
A = Is perfectly OK if you have limited resources.
B = Just consists of grabbing the most convenient possible participants.
C = Is never adequate to generalize from.
D = Can be best for hard to reach or unusual participants.

Sampling overview (Summary)
Whom do you want to generalize to?
• Who is the target population? Broad: external validity. Narrow: internal validity.
• How do you decide who is a member? Demographic / behavioral criteria? Subjective / attitudinal?
• What do you know about the population already: what is the “sampling frame”?
Is a probability or random sample possible?
• “Hidden” population?
• Socially undesirable research topic?
• Easily available via telephone, door-to-door?
• Sampling frame adequate to choose selection method?

Overview, 2 (Summary)
Types of Non-probability Samples: Haphazard, Modal instance, Venue – time / space, Multi-frame, Snowball / Respondent driven, Web, Quota, Heterogeneity.

Overview, 3 (Summary)
Probability sampling
• Most externally valid.
• Assumes: clear sampling frame; population is available.
• Less externally valid for hidden groups.
Non-probability sampling
• Less externally valid.
• High “convenience”.
• Best when: no clear sampling frame; hidden / avoidant population.
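
As a closing illustration, the difference between random selection (who gets into the sample) and random assignment (which group a participant ends up in) can be made concrete with a short Python sketch. It is a minimal sketch under stated assumptions, not the lecture's own procedure: the function names, the frame of 1,000 student IDs, and the 20 classes of 30 students are hypothetical and exist only for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """Selection: every member of the sampling frame has an equal chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Selection: every k-th member of the list, from a random starting point."""
    start = rng.randrange(k)
    return frame[start::k]

def multi_stage_sample(units, n_units, n_per_unit, rng):
    """Selection in stages: randomly pick units (e.g., classes),
    then randomly pick individuals within each chosen unit."""
    chosen_units = rng.sample(sorted(units), n_units)
    return [person
            for unit in chosen_units
            for person in rng.sample(units[unit], min(n_per_unit, len(units[unit])))]

def random_assignment(sample, group_names, rng):
    """Assignment (not selection): shuffle the sample, then deal participants
    into groups in turn, so group sizes differ by at most one."""
    shuffled = sample[:]
    rng.shuffle(shuffled)
    return {name: shuffled[i::len(group_names)]
            for i, name in enumerate(group_names)}

rng = random.Random(42)
student_ids = list(range(1, 1001))                      # hypothetical sampling frame
classes = {f"class_{c}": list(range(c * 100, c * 100 + 30))
           for c in range(20)}                          # hypothetical class rosters

srs = simple_random_sample(student_ids, 50, rng)        # 50 students, equal chance
every_100th = systematic_sample(student_ids, 100, rng)  # "every 100th student ID"
staged = multi_stage_sample(classes, n_units=5, n_per_unit=10, rng=rng)
groups = random_assignment(srs, ["treatment", "control"], rng)

print(len(srs), len(every_100th), len(staged),
      len(groups["treatment"]), len(groups["control"]))
```

Note that none of these functions helps if the sampling frame itself is incomplete: a list of student IDs or class rosters can only represent the people who appear on it.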