Foundations of Research
12. Research Sampling
• What is the target population for your study?
• Probability & Non-Probability sampling methods.
© Dr. David J. McKirnan, 2014. The University of Illinois Chicago. McKirnanUIC@gmail.com. Do not use or reproduce without permission.

Define your target population
• What group do you want to generalize to?
• What is your sampling frame?
• Who is / is not a member of the group?

Sampling
• Any study assesses only a sample of the population. Even the census does not enroll 100% of Americans.
• We always must generalize from our sample to the larger population.
• Research often addresses a specific population or subpopulation. Our definition of the population to target is an important step.
• There are many different ways we may collect a research sample.
• The size and breadth of the population we are generalizing to can affect the Internal or External validity of the study.
We will cover these topics here.

The overall flow of sampling decisions
• We begin with a decision about who we are interested in.
• We then make decisions about who is in the target population and how to recruit them.
• From there we collect our sample, typically a very small % of the larger population.
• We get results of our experiment or study within our sample…
• …and attempt to infer what the entire target population must be like.
Diagram labels: General population or targeted sub-population; Sampling frame; The sample; Study results; Inference.

Sampling; the population
• We may design our study to inform us about a very general population. So, a cognitive-neuroscience study may test hypotheses about the brain generally.
• Often we study a subgroup, e.g.
women, a class of medical patients, the homeless...
• Many studies compare segments of the general population, e.g., African-Americans & Caucasians.
• This narrower focus makes generalization clearer.
• Many Psychology studies purport to generalize broadly, but by enrolling only college sophomores they actually generalize to a very small portion of the population.

The Sampling Frame
Sampling frame refers to several elements of our study:
• What do we already know about our population? Do we have census or other data?
• What criteria do we use to determine who is or is not a member of a target population? If we are studying homelessness, who “counts” as homeless?
• How do we contact and recruit participants? Where may a representative sample of our target population be reached? How do we actually approach and enroll them?

Sampling; assignment
• There are many ways for us to sample our target population(s). The main distinction is Probability (or random) vs. Non-Probability sampling. We will address that later.
• Within our sample we may use Blocking Variables to compare different segments, e.g., age or ethnic groups.
• For experiments we assign participants to groups, typically using Random Assignment.
• Randomized Block Assignment may, for example, randomize within ethnic blocks, to ensure that the same proportion of African-Americans, Caucasians and Latinos are in each group.

Sampling: population inferences
• In Inferential Research we are not interested in simply describing or analyzing our sample.
• We use our results to Infer the characteristics of the larger population we sampled from.
• The quality of our inference is shaped by factors such as how Reliable and Valid our measures are.
• From a sampling perspective, Statistical Power is a key element. Power refers to whether we had enough participants to adequately test our hypothesis. With too few participants we may not be able to tell an important effect from simple chance results. We will discuss this in the statistics section.

Who do you want to generalize to?
From the broadest to the narrowest sampling frame: Mammals; Humans; All Western people; All Americans; Young Americans; College students; This College; This class.
• Sampling a broader population (i.e., larger sampling frame) increases external validity.
• Sampling a more specific or smaller frame generally increases internal validity.

Who do you want to generalize to?
Samples typically represent targeted sub-populations:
• Demographic or ‘status’ groups: ethnicity, income or educational groups…
• Geography: e.g., urban dwellers…
• Medical / clinical groups: people with a specific diagnosis or condition.
• Behavioral groups: registered voters; home owners; ever used marijuana…
• Groups defined by self-identification or subjective state: views oneself as “highly likely to vote…”; above a ‘cut point’ on a stress, depression, or alcohol use scale; “Conservatives” vs. “Liberals”.
Targeting specific groups increases Internal validity by decreasing the complexity of the sample, but may lessen External validity by narrowing the focus.

Research samples & validity (EXAMPLE)
• Clinical drug trials illustrate the conflict between internal v. external validity in sampling.
• People with diverse symptoms and backgrounds see physicians for depression.
• To enhance internal validity drug researchers use exclusion criteria to select only participants who fit a specific definition of depression.
• Zimmerman et al.
suggest that too many exclusion criteria compromise the validity of this research area.
Zimmerman, M., Mattia, J.I., & Posternak, M.A. (2002). Are Subjects in Pharmacological Treatment Trials of Depression Representative of Patients in Routine Clinical Practice? Am J Psychiatry, 159, 469–473.

Exclusion criteria & validity (EXAMPLE)
• The study begins with a large N of people self-referred for depression.
• They exclude those with serious mental illness, drug abuse or personality disorder…
• …whose symptoms are not severe enough, who are suicidal, or who have other affective disorders…
• …whose symptoms are too recent OR too long-standing…
• …and end up with a small, carefully selected sub-set of patients (8.4% of general depression patients).

External vs. internal validity in sampling (EXAMPLE)
• Applying rigorous study selection criteria for drug trials excludes the great majority of routine depression patients.
• Rigorous participant selection for internal validity seriously compromises external validity in these studies.
• This leaves the actual usefulness of anti-depressant (and other) medications for the general population in doubt.
• To be useful, research must balance the need for careful subject selection with the need for representativeness.

Who is a group member?
Are you between 14 and 30 and have a computer or smart phone available?
A = Yes B = No

Who is a group member?
Do you use Facebook or other social media 5 times a week or more?
A = Yes B = No C = Not sure – lost count.
Is Facebook making us lonely?

Who is a group member?
Are you a “Facebook user”?
A = Yes B = No C = Not sure – let me Facebook that.

Who is a group member?
A = Yes B = No C = Maybe – I’m not sure
Washington Post story: Who is Latino?
Do you live in a neighborhood or town that is mostly Latino?

Who is a group member?
Do you speak Spanish?
A = Yes B = No C = ¿cuál era la pregunta? (“What was the question?”)

Who is a group member?
Are you Latino?
A = Yes B = No C = Maybe – I’m not sure

Define the target population
Who do you want to generalize to: who is in the group?
Once we choose our sampling group, we must decide on criteria for membership…
To sample social media users do I use a …
• Rough demographic criterion?
• Behavioral criterion (which behavior?)
• Self-identification?
To sample “Latinos”…
• Is geographic status specific enough?
• Is Spanish language the defining characteristic?
• Can / must one call oneself “Latino” (even if you do not speak Spanish…)?
Clearer and narrower group criteria increase Internal validity by making the sample more homogeneous.
Some of these criteria are easier to reliably measure than others:
• Demographic variables are often available in census data.
• Behavioral or subjective criteria require direct assessment, and can be less reliable.
Of course different criteria may yield very different samples.
Our choice of sampling criteria must be based on our theory, hypothesis, or research question.

Sampling criteria
Who is “Latino”?
• Demographic or ‘status’ marker: Neighborhood residence?
• Behavioral: Spanish speaking? Cultural practices?
• Subjective / self-identification: Self-description?
Who is a “Student”?
• Demographic or ‘status’ marker: Lives on a campus
• Behavioral: # Hours registered
• Subjective / self-identification: Describes occupation as ‘student’
Who is “gay” or “lesbian”?
• Demographic or ‘status’ marker: Lives in a same-sex 2-person household?
• Behavioral: Sexual or other patterns?
• Subjective / self-identification: Self-identification as gay / lesbian?
Who is “depressed”?
• Demographic or ‘status’ marker: Received a diagnosis from MH professional?
• Behavioral: Pattern of behaviors and feelings?
• Subjective / self-identification: Describes self as “depressed”?
• Presents at Doctor’s office for general malaise?
Each criterion may meet the goals of a particular hypothesis or empirical question.
• Of course different choices may lead to very different samples.
• Some criteria are easy to assess but may be only approximate.
• Others may require relatively difficult assessments.

Who do you want to generalize to: Your “Sampling Frame”.
What is known about your larger population?
• Are there Census or survey data?
• E.g., are there “population” data on depressed people?
• Do we know the demographic profiles of Facebook users?
• Data about your target population will help you determine how well your sample represents that population. What are its size, sub-groups, location…?
Where / how can I best recruit members of the population?
• Will some sub-groups require different recruitment methods than others?
• Will different recruitment methods be biased in favor of some subgroups? Internet surveys may be biased against older people. Studies that use monetary incentives pull for poorer people.

Overview: From research question to sample
• What is the research question? Are we describing some natural process? …testing a theory?
• What is the population of interest? What population does your research address? Whom do you want to generalize to?
• Category of participant criterion? Demographic or “Status” criteria? Behavioral criterion? Self-Identification, attitudes or beliefs?
• Operational definition of enrollment criteria? Inclusion & Exclusion Criteria: Specific measures that define who does / does not qualify for enrollment.
• Actual recruitment? Concrete (operational) processes to recruit and enroll participants.

From theory to sample: Asthma among African-Americans (EXAMPLE)
Study structure & research question:
• Adherence to a medication regimen is key to health among people with asthma.
• Medication adherence is generally low, particularly among African-American adolescents, who have high rates of asthma.
• Self-determination theory proposes that autonomous motivation (being self-directed), self-confidence, and relatedness (family routines & parental support) underlie adherence.
• This study tests the hypothesis that three variables comprising self-determination theory will be associated with patients’ adherence to medications.
• Because young African-Americans have a significant health burden from asthma, the study focuses on them.
Bruzzese, J., Idalski C, Lam, P, Deborah A.; Naar-King, S. (2014). Adherence to asthma medication regimens in urban African American adolescents: Application of self-determination theory. Health Psychology, 33(5), 461–464.

From theory to sample: Asthma among African Americans (EXAMPLE)
• Population of interest? Young African-Americans who suffer from poorly controlled asthma.
• Category of participant criterion? Demographic or Status criteria: African-American adolescents; poorly controlled asthma. Self-Identification / attitudes: not a criterion in this study. Behavioral criterion: Already participating in long-term asthma control study.
• Operational definition of enrollment criteria? “Adolescent”: Age 10 – 18. “Poorly controlled”: At least one asthma-related hospitalization or two asthma-related emergency department visits in the last 12 months.
• Actual recruitment? N = 162 participants recruited from the hospital’s outpatient immunology clinic after an asthma-related clinic visit or hospitalization.

Results (EXAMPLE)
• Having asthma regulation embedded in the family routine was the only predictor of medication adherence.
• Multiple regression analysis (all variables are tested simultaneously).

Research sampling
• Defining your target population
• Probability & Non-Probability sampling methods.

Major forms of sampling
Probability (Random) Sampling
• Recruit (or select) participants to maximize the representativeness of the sample to a known population.
• Uses some form of random selection.
• Requires that each member of the population has a known (often equal) probability of being selected.
• Most externally valid approach to sampling general populations.
Non-Probability Sampling
• Use available samples for convenience, or targeted outreach to unusual or small populations.
• Selection may be either systematic or haphazard, but is not random.
• Often the most externally valid approach to unusual, small, or extreme groups, or groups where little is known.
• When used only for convenience it is the least externally valid.

Watch that word ‘random’!
• Participant Selection: Random Selection or a Random Sample refer to how we recruit participants; who is in the sample.
• Participant Assignment: Random Assignment is how we (should) assign participants to different groups.
Diagram columns: Participant Selection (Sample); Participant Assignment (Group A, Group B, (Group C)); Experimental Procedures; Experimental Treatment or Manipulation (Treatment, Control, (Alternate Treatment?)); Results (Outcome).

Probability / Random Sampling
• Core feature: all members of the study population have an equal (or known) chance of being sampled.
• Procedure: Choose participants in a systematic, random fashion, e.g., every 100th student ID; every 1000th person on a voter registration record; random digit dialing for telephone surveys.
• Advantages: eliminates obvious biases of convenience sampling.
• Limitations: May under-sample unusual / hard to reach participants. Some may be unavailable in, e.g., telephone lists, computer files.

Basic Forms of random sampling
• Simple Random Sampling: Select a specific % of a target population; all members of population have about equal chance of selection.
• Multi-Stage: Randomly select population units (census tracts, households, schools..), then randomly select individuals within unit.
• Stratified: Random within population sub-blocks, e.g., gender (randomly select 50 women and randomly select 50 men), ethnicity, etc.
• Cluster: Random within (potentially convenience) clusters, e.g., specific locations or “venues”, events, times of day, etc.
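
The basic forms of random sampling listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 people (600 women, 400 men), the 20 schools, and all function and variable names are hypothetical and do not come from any study in these slides.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng):
    """Simple random sampling: every member has an equal chance of selection."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n, rng):
    """Proportionate stratified sampling: draw randomly within each stratum,
    sized by that stratum's share of the population.
    (With awkward proportions the rounded sizes may not sum exactly to n.)"""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

def multi_stage_sample(units, n_units, n_per_unit, rng):
    """Multi-stage sampling: randomly pick units, then individuals within each."""
    chosen_units = rng.sample(sorted(units), n_units)
    return [p for u in chosen_units for p in rng.sample(units[u], n_per_unit)]

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: (id, gender) pairs, 600 women and 400 men.
population = [(i, "woman" if i < 600 else "man") for i in range(1000)]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda person: person[1], 100, rng)

# Hypothetical population units: 20 schools with 50 students each.
schools = {f"school_{s}": [f"school_{s}/student_{i}" for i in range(50)]
           for s in range(20)}
staged = multi_stage_sample(schools, n_units=4, n_per_unit=10, rng=rng)

print(Counter(gender for _, gender in srs))    # roughly 60/40, varies by chance
print(Counter(gender for _, gender in strat))  # 60 women, 40 men by construction
print(len(staged), "students from", len({s.split("/")[0] for s in staged}), "schools")
```

The simple random sample only approximates the 60/40 split, while the stratified sample matches it by design; the multi-stage sample trades some coverage (only 4 of 20 schools) for efficiency.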
Simple Random sampling
• Objective: Attempts to truly represent the general population; absolute minimal selection bias.
• Procedure: Recruitment method where all members of the population have an approximately equal chance of being selected.
• Example: Polls (e.g., Gallup) or surveys using random digit dialing. Dialing random phone numbers eliminates any selection bias on the part of the researcher; calls are made by a computer. Everyone who has a phone has an equal probability of being selected. Of course not everyone has a phone, many people block calls not in their contact list, and so on. These and other demographic trends induce an unknown degree of sampling bias.
• Example: “American Community Survey”; census-based random sample of households. The census is designed to assess literally every American household. A small sample (2.5%) of census respondents receive a longer survey of demographics and social patterns. Despite being a mandatory, highly systematic sample, not everyone completes either the census or the sub-sample survey. Both the poor or homeless and the very wealthy can be difficult to contact.
• Advantages: Most representative sampling frame for the general population.
• Disadvantages: Any recruitment method excludes some people (no telephone, no stable address, etc.). Very expensive for face-to-face (non-telephone or internet) data collection.

Multi-stage random sampling
• Objectives: Develop a focused & efficient random sample. Use random sampling “stages” to reach hidden, stigmatized or other hard to reach groups.
• Simple random sampling is optimal. However, it is biased when relying on telephone, internet or similar contact methods.
• A simple random sample for face-to-face interviews or recruitment for a larger research study (e.g., a public health intervention) is prohibitively expensive.
• Procedure: Use successive levels of random selection to narrow the sample.
• 1st level random selection: From the General Population, randomly select sub-population blocks.
• 2nd level random selection: From those blocks randomly select smaller groups…
• 3rd level random selection: And smaller groups within those blocks...
• Final sample: To arrive at an efficient random sample.

Multi-stage random sampling
Selecting a random sample of college students: Rather than trying to randomly sample the entire population of students, we narrow our approach.
• Randomly select subpopulation blocks.
• Randomly select smaller groups within them…
• Randomly select students within the target classes...
• To arrive at an efficient random sample.
Diagram labels: General population; Universities; Classes; Students; Final sample.

Multi-stage random sampling
“Real World” example: NIDA* household surveys of drug use.
• Randomly select a moderate # of census tracts.
• Randomly select modest # of households within each target tract…
• Interview the first adult who answers the phone in each household...
• To arrive at an efficient random sample.
*National Institute on Drug Abuse

Multi-stage random sampling
Multi-stage sampling is also useful for “hard to reach” populations where we have no sampling frame:
• For the student or drug study we know what the population looks like: We have census data or university & class lists. This gives us known population blocks that we can randomize around.
• For a “hidden” population where we have no sampling frame… We do not know the size or geographic distribution of the group. We do not have simple population blocks available.
• …we must often resort to convenience sampling in venues (locations, events) where the population may be available. This can lead to substantial bias due to non-random selection.

Multi-stage random sampling
How do we randomize within the constraints of convenience (venue) sampling? This example is from the “CITY” study of HIV risk among youth.
• Population of unknown size & distribution.
• Randomize Venues: In each of 4 cities, randomly select bars, clubs & other venues attended by young gay men.*
• Randomize Day & Time: Randomly select days & times to recruit in selected venues.
• Randomly select people: Randomly approach every 4th person who enters the venue for an interview.
• Final quasi-random sample.
* Use qualitative interviews / direct observation to determine which venues are relevant.

Multi-stage random sampling
“CITY” study of HIV risk among youth.
• Many important research questions involve people who would not be found in simple random selection methods.
• With “hidden” or stigmatized groups sampling must be very targeted.
• Targeted sampling can be biased toward members who are easiest to contact.
• This approach adds random selection to what ordinarily would be a (biased) convenience sample.
• Of course the sample is biased by larger issues: Not all gay youth attend venues relevant to them. Concern over being identified as gay may lead some # of men to refuse the study.
• Despite its limitations, this example shows how creative sampling approaches can provide a less biased sample of a difficult group to reach.

Stratified or cluster sampling
• Objective: Represent every key segment of the population.
• Procedure: Decide which population segments are important, e.g., ethnic groups, geographic areas, self-identification. This decision is based on your hypothesis or empirical question. Randomly select from each segment.
• Proportionate: Sampling fraction from each segment should approximate the overall population. (Figure: the distribution of ethnicity in the U.S.) A stratified sample would randomly select from each ethnic group to approximate this distribution.
• Dis-Proportionate: Over-sampling population groups to ensure you have large enough samples of small groups. (Figure: estimated distribution of legal vs. illegal immigrants.) To directly compare groups we over-sample illegal immigrants.

Probability Sampling overview (Summary)
• Core features: Random selection of participants from the population. Most externally valid approach.
• Assumes: A clear sampling frame. All segments of the population are available.
Variations:
• Simple: Each member of the entire population has an approximately equal chance of being selected.
• Multi-stage: a. Select segments of the population, e.g., census tracts, registered voters. b. Each segment member has an approximately equal chance of being selected.
• Cluster: a. Identify clusters, e.g. sports fans at a sports bar. b. Each cluster member has an approximately equal chance of being selected.
• Stratified: a. Identify strata, e.g., ethnic groups, gender, age groups. b. Randomly select a proportion of each stratum.

Non-Probability Sampling
Useful for populations that:
• Cannot be randomly sampled; “hidden” or difficult to reach.
• Have no sampling frame available, such as census data, describing their size, composition, etc.
• Examples: drug users, recent immigrants, gay men…
Likely to misrepresent the population.
• It may be difficult or impossible to detect this misrepresentation.
• Can be over-sensitive to incentives: paying participants attracts more poor people.
• “Respondent Driven” sampling (RDS) allows for “targeted” population estimates.

Types of Non-probability Samples
• Haphazard
• Modal instance
• Venue – time / space
• Multi-frame
• Snowball / Respondent driven
• Web
• Quota
• Heterogeneity

Haphazard Sampling
“Man on the street”; recruiting the most easily available participants.
• Literally recruiting in public places for, e.g., a brief interview.
• College psychology majors.
• Medical / therapy clients in a clinic.
Often used for quick interview / attitude studies on current topics.
• Advantage: Participants are readily available.
• Problem: No evidence for representativeness.

Modal Instance Sampling
Recruit a “typical case” of a target population.
• A member of a target population group: a ‘typical’ college student with debt; an injection drug user, homeowner, etc.
• A person affected by a major event: a 9/11 survivor; a witness to a natural disaster.
Often used to describe an event or way of life, or to generate hypotheses for later research.
• Advantage: Direct, personal description.
• Problems: Potentially strong self-selection bias in who volunteers for such personal disclosure. “Social desirability responding” – presenting oneself in a positive light – or biased recall may compromise accuracy of answers.

Haphazard & Modal Instance Sampling
• Both Haphazard & Modal Instance Sampling are used for Case Studies. Researchers elicit in-depth personal accounts of an event or life pattern.
• Multiple Case Studies interview a set of individuals who express the phenomenon, e.g., members of a key group…
• In the Qualitative Research module we will discuss how in-depth interviews can be analyzed.
• These methods, plus direct observation, are often used by journalists. For an interesting example see a New York Times discussion of the “down low” phenomenon among African-American, bisexually active men.

Venue and Time / Space Sampling
• Assumes that population group members are well represented in particular places & times (“venues”).
• Used to sample a specific, well-defined, often hard to reach group.
• Venue sampling uses “Intercept” methods to reach participants. Outreach workers use a standard recruitment script to approach potential participants. Data may be collected on site via a brief interview, such as shopping mall intercepts. Often the contact is used to collect (or distribute) contact information for later participation.
• Time / Space randomization lessens bias due to choice of venue: Randomly approach different venues at different times. Randomly select participants within the venue (e.g., every 4th person…).
• These strategies must be based on a clear epidemiological or theory question.

Example of Venue sampling
• Recruiting gay or bisexual men for HIV research can present challenges.
• Simple recruitment methods for any targeted population include newspaper or internet ads, and flyers in health clinics or popular stores.
• Many gay/bisexual men – particularly younger & minority men – do not respond to simple methods, due to distrust of or disenfranchisement from the health care system; unwillingness to disclose sexual orientation in other than ‘gay friendly’ settings; or the perception that research is irrelevant to them or may harm them.
• Direct personal contact within gay/bisexual venues such as clubs can help break down barriers to recruitment.

Outreach / venue sampling (Example)
Project MIX was a national safer sex intervention study sponsored by the Centers for Disease Control.
It had a complex sampling frame:
• ⅓ each African-American, Latino & Caucasian,
• ½ HIV infected / uninfected within each ethnic group.
Outreach workers recruited in multiple venues:
• Bars and clubs
• Public areas; parks and neighborhoods where men congregated
• Community events, private ‘house’ parties, etc.
Outreach workers were indigenous, i.e. gay/bisexual men from the community.
• They were able to approach men in the target groups.
• They could explain the study and foster trust & cooperation.

Outreach lead sheet
To broaden the sample, recruitment stimuli show different ethnicities, and cite a range of potentially eligible behaviors…
Images used with permission, David McKirnan, Project MIX Principal Investigator.

Targeted Multi-frame Sampling
Often used to sample a specific, hard to reach group…
• …with no census or similar data available to develop a sampling frame.
• No clear “population blocks” to use in a multi-stage sample.
• The population is spread among many venues or locations…
• …and population segments are more or less sensitive to any specific recruitment approach.
Based on preliminary qualitative work (interviews, direct observations…) we develop multiple sampling “frames”:
• Direct outreach.
• Newsletters, internet lists, chat rooms.
• Organizations or meeting places.
Most common & valid convenience sample.

Targeted Multi-frame Sampling
• Sample a specific, hard to reach group.
• No census or similar data for sampling frame.
• Uses multiple (convenience) sampling “frames”.
Example: recruiting gay men for HIV research. We use multiple ways to approach men or for them to contact us.
Multi-frame sample:
• Direct outreach in bars, clubs, street.
• Community events (festivals…)
• Newspaper ads.
• Flyers in bars & stores.
• Medical clinics.
• “Snowball” / word of mouth.

Targeted Multi-frame Sampling
• Each sampling “frame” is a convenience approach – we typically cannot randomly select participants.
• By using multiple frames we can recruit a broad cross-section of the population.
• This also allows us to “test” different sampling approaches or venues. In HIV research: Do riskier men come from one type of sampling frame? E.g. bar venues… Are there ethnic or other differences in the participants who are recruited in one type of venue or another…
• These data can help us better understand our sample and study results.

Snowball / “Respondent Driven” Sampling
• Early participants are paid to recruit others, who recruit others, etc.
• Form of targeted sampling: Recruit network of “linked” people tracked by referrals.
• Problems: Choice of seeds. Eligibility criteria. Sensitive to incentives!
• Advantages: Access unusual or “hidden” people related by a common behavior. With enough “generations” of links can well represent a target population. Often part of multi-frame approach. Useful for people who mistrust research or where personal contact is necessary for recruitment (HIV, drug use).
• With RDS can show “chain” of referrals / links. Portrays “chain” of influence or, e.g., infectious disease.
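
Because snowball / RDS recruitment is tracked by referrals, the “chain” of links can be reconstructed from a simple record of who recruited whom. The sketch below is a hypothetical illustration only: the participant labels, the `referrals` table, and the function names are invented for this example and are not taken from any RDS study or software.

```python
from collections import Counter

# Hypothetical referral records: participant -> recruiter (None marks a seed).
referrals = {
    "seed_A": None, "seed_B": None,
    "p1": "seed_A", "p2": "seed_A", "p3": "seed_B",
    "p4": "p1", "p5": "p1", "p6": "p4",
}

def chain_to_seed(participant, referrals):
    """Follow referral links back from a participant to the seed who started the chain."""
    chain = [participant]
    while referrals[chain[-1]] is not None:
        chain.append(referrals[chain[-1]])
    return chain

def wave_of(participant, referrals):
    """Recruitment wave: number of referral links between a participant and their seed."""
    return len(chain_to_seed(participant, referrals)) - 1

# Wave (generation) of every participant; seeds are wave 0.
waves = {p: wave_of(p, referrals) for p in referrals}

# How many people each participant recruited (the "nodes" with many links).
recruits_per_person = Counter(r for r in referrals.values() if r is not None)

print(waves)
print(chain_to_seed("p6", referrals))
print(recruits_per_person.most_common())
```

Here participant p6 sits three links away from seed_A, and seed_A and p1 are the most active recruiters; in a real study these counts would come from the tracked referral records.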
Snowball / Respondent Driven sampling (RDS)
An initial set of participants is recruited according to a specific set of inclusion / exclusion criteria:
• Characteristics of the initial participants – the “seeds” of the network – help determine who subsequent participants will be.
• The seeds may be, e.g., injection drug users (inclusion criterion) who do not live outside a specific geographic area (exclusion criterion).
Initial participants are paid to recruit others, who are paid to recruit still others… all using the same inclusion / exclusion criteria.
Over “waves” of recruitment RDS recruits a network of “linked” people:
• A network of, e.g., musicians, gamblers, sexual partnerships…
• The population of an organization or school…
With enough waves of recruitment, Snowball / RDS can produce accurate population estimates for a given subpopulation.

Snowball / Respondent Driven sampling (RDS)
Snowball / RDS can also provide insights into links among participants:
• Since participants recruit each other, we can track who is linked to whom.
• Basic study measurements can help us determine the characteristics of people who are linked (e.g., sexually) with each other.
• We can also assess people who are “nodes” in social networks: who knows (recruits) a lot vs. few people? What influence do those people have?
Problem: Sensitive to incentives.
Advantage: Access unusual or “hidden” people related by a common behavior or venue.
• Using participants to recruit others they know may yield a more representative sample.
• Researchers can collect data on who recruits whom to examine the structure of social networks.

RDS coupon examples
These are examples of cards used in an RDS recruitment among injection drug users.
• Initial “seeds” are interviewed, then given 5 cards to distribute to people who meet the eligibility requirements.
• 2nd “wave” participants receive the same card set.
Participants get $30 for the study interview, and $20 for each person they refer.
Heckathorn, D.D. & Magnani, R. (2004). Snowball and Respondent-Driven Sampling. In: Behavioral Surveillance Surveys: Guidelines for Repeated Behavioral Surveys in Populations at Risk of HIV.

RDS; chain description
Heckathorn, D.D. & Magnani, R. (2004). Snowball and Respondent-Driven Sampling. In: Behavioral Surveillance Surveys: Guidelines for Repeated Behavioral Surveys in Populations at Risk of HIV.

Example of social network sampling: Bearman et al., Romantic ties among adolescents
• A substantial majority of students are in an extended, linked chain of relationships,
• with a number of smaller chains,
• and a small % in 2 to 4 person chains.
From a sampling perspective, several “seeds” access most of the population.
Findings suggest a clear potential for STI transmission.
Bearman, P. et al., American Journal of Sociology, Volume 110 Number 1 (July 2004): 44–91.

Non-Probability methods: Quota Sampling
Select people non-randomly according to quotas.
Similar to cluster sampling, except you cannot randomly sample each population segment.
Must have a clear theory / research question to pick the relevant population characteristic(s).
Proportional quota sampling
• Represent major characteristics of a population. If gender is important, and the proportion of women :: men in your population = 65% :: 35%, the sample must meet that quota.
Non-proportional quota sampling
• Sample enough members of each group to test the hypothesis, even if the sample is not proportional (e.g., recruit 50 women & 50 men, even though the real proportion is 65 :: 35).
• Helps assure that you have good representation of smaller population groups.
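The arithmetic behind the two quota approaches can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function names `proportional_quotas` and `fill_quotas`, the arrival list, and the sample sizes are hypothetical. Only the 65% :: 35% split and the 50 / 50 non-proportional quota come from the slide. The sketch assumes group shares are given as whole percentages that sum to 100.

```python
def proportional_quotas(percent_by_group, sample_size):
    """Split sample_size across groups in proportion to integer percentages."""
    if sum(percent_by_group.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Start from the rounded-down share for each group.
    quotas = {g: sample_size * pct // 100 for g, pct in percent_by_group.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand any leftover places to the groups with the largest remainders.
    by_remainder = sorted(
        percent_by_group,
        key=lambda g: (sample_size * percent_by_group[g]) % 100,
        reverse=True,
    )
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas


def fill_quotas(arrivals, quotas):
    """Enroll people in arrival order (non-random) until each quota is full."""
    counts = {g: 0 for g in quotas}
    enrolled = []
    for person_id, group in arrivals:
        if group in counts and counts[group] < quotas[group]:
            counts[group] += 1
            enrolled.append(person_id)
    return enrolled


population = {"women": 65, "men": 35}

# Proportional: the sample mirrors the population split.
proportional = proportional_quotas(population, 100)  # {'women': 65, 'men': 35}

# Non-proportional: equal group sizes chosen by the researcher.
non_proportional = {"women": 50, "men": 50}

# Hypothetical arrivals, enrolled first-come until the quotas are met.
arrivals = [(1, "men"), (2, "men"), (3, "women"), (4, "men"), (5, "women")]
enrolled = fill_quotas(arrivals, {"women": 1, "men": 2})  # [1, 2, 3]
```

Because `fill_quotas` takes whoever arrives first, the result is still a non-probability sample: the quotas control group sizes, not who within each group is selected.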
Non-Probability methods: Web sampling
Typically highly targeted samples:
• Gay / bisexual men…
• Adolescents…
• “Gamers”…
Typically access through existing venues:
• Users of specific web sites
• List-serves, e-mail lists
• Active recruitment in “chat rooms”
Problem: Inherent bias in computer literacy(?)
Advantage:
• Cheap, large national sample.
• Access unusual or “hidden” people who reach others via the internet.

Non-Probability methods: Heterogeneity Sampling
• Sample every sector of a population, with at least several members of each, without worrying about proportions.
• At least some members of each geographic area
• …ethnic group
• …behavioral group (voters & non-voters…)
• Assume that a few people are a good proxy for the group.
Examples: focus groups or qualitative interviews about products, social issues...
Problem: Cannot be sure a few people really represent their sub-group.
Advantage: At least some representation of all subgroups.

A probability sample is…
A = Based on some form of random selection.
B = Always representative of the population.
C = Best for any population.
D = Usually easier to collect than other sample approaches.

A Gallup poll or telephone survey is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

Respondent-driven sampling, where target people recruit people like them, is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

My distributing a survey to this class is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

Selecting every 100th registered voter and contacting them for a survey is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

Randomly selecting classes across the university, then sampling every 3rd person, is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.

A non-probability sample…
A = Is perfectly OK if you have limited resources.
B = Just consists of grabbing the most convenient possible participants.
C = Is never adequate to generalize from.
D = Can be best for hard to reach or unusual participants.

Summary: Sampling overview
Who do you want to generalize to?
• Who is the target population? Broad – external validity; narrow – internal validity.
• How do you decide who is a member? Demographic / behavioral criteria? Subjective / attitudinal?
• What do you know about the population already – what is the “sampling frame”?
Is a Probability or random sample possible?
• “Hidden” population?
• Socially undesirable research topic?
• Easily available via telephone, door-to-door?
• Sampling frame adequate to choose selection method?

Summary: Overview, 2
Types of Non-probability Samples
• Haphazard
• Modal instance
• Venue – time / space
• Multi-frame
• Snowball / Respondent driven
• Web
• Quota
• Heterogeneity

Summary: Overview, 3
Probability sampling
• Most externally valid.
• Assumes: a clear sampling frame; the population is available.
• Less externally valid for hidden groups.
Non-probability sampling (targeted / multi-frame, snowball, quota, etc.)
• Less externally valid.
• High “convenience”.
• Best when: no clear sampling frame; hidden / avoidant population.