Sampling. - Psychology 242, Research Methods in Psychology

Foundations of
Research
12. Research Sampling.
What is the target population
for your study?
Probability & Non-Probability
sampling methods.
Run this as a PowerPoint Show

“Slide show”  “run show”.

Click through by pressing any key.

Focus & think about each point; do not
just passively click.
© Dr. David J. McKirnan, 2014
The University of Illinois Chicago
McKirnanUIC@gmail.com
Do not use or reproduce without
permission
• Define your target population
  • What group do you want to generalize to?
  • What is your sampling frame?
  • Who is / is not a member of the group?

Sampling
Any study assesses only a sample of the population.

Even the census does not enroll 100% of Americans.


We always must generalize from our sample to the larger
population.
Research often addresses a specific population or subpopulation.

Our definition of the population to target is an important step.

There are many different ways we may collect a research
sample.

The size and breadth of the population we are generalizing
to can affect the Internal or External validity of the study.

We will cover these topics here.
The overall flow of sampling decisions.
• General population or targeted sub-population: We begin with a decision about who we are interested in.
• Sampling frame: We then make decisions about who is in the target population and how to recruit them.
• The sample: From there we collect our sample, typically a very small % of the larger population.
• Study results: We get results of our experiment or study within our sample.
• Inference: …and attempt to infer what the entire target population must be like.
Sampling; the population
We may design our study to inform us about a very general population.
• So, a cognitive-neuroscience study may test hypotheses about the brain generally.
Often we study a subgroup, e.g. women, a class of medical patients, the homeless...
Many studies compare segments of the general population, e.g., African-Americans & Caucasians.
• This narrower focus makes generalization clearer.
Many Psychology studies purport to generalize broadly, but by enrolling only college sophomores actually generalize to a very small portion of the population.
The Sampling Frame
Sampling frame refers to several elements of our study;
• What do we already know about our population?
  • Do we have census or other data?
• What criteria do we use to determine who is or is not a member of a target population?
  • If we are studying homelessness, who “counts” as homeless?
• How do we contact and recruit participants?
  • Where may a representative sample of our target population be reached?
  • How do we actually approach and enroll them?
Sampling; assignment
There are many ways for us to sample our target population(s).
• The main distinction is Probability (or random) vs. Non-Probability sampling. We will address that later.
• Within our sample we may use Blocking Variables to compare different segments, e.g., age or ethnic groups.
• For experiments we assign participants to groups, typically using Random Assignment.
Randomized Block Assignment may, for example, randomize within ethnic blocks, to ensure that the same proportion of African-Americans, Caucasians and Latinos are in each group.
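Randomized Block Assignment can be sketched in a few lines of Python. This is only an illustration, not a procedure taken from the lecture: the function name, the participant tuples and the block labels "A", "B", "C" are all hypothetical.

```python
import random
from collections import defaultdict


def randomized_block_assignment(participants, block_of, groups=("treatment", "control"), seed=None):
    """Randomly assign participants to groups separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for person in participants:
        blocks[block_of(person)].append(person)

    assignment = {}
    for block_name in sorted(blocks):
        members = blocks[block_name]
        rng.shuffle(members)  # random order within the block
        for position, person in enumerate(members):
            # deal the shuffled members out to the groups in turn
            assignment[person] = groups[position % len(groups)]
    return assignment


# Hypothetical participants: (id, block label)
participants = [(i, block) for i, block in enumerate(["A"] * 4 + ["B"] * 6 + ["C"] * 2)]
assignment = randomized_block_assignment(participants, block_of=lambda p: p[1], seed=1)
```

Because the shuffle happens inside each block, every block contributes (as nearly as possible) equal numbers to each group, while who lands in which group is still decided by chance.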
Sampling: population inferences
In Inferential Research we are not interested in simply describing or analyzing our sample.
• We use our results to Infer the characteristics of the larger population we sampled from.
• The quality of our inference is shaped by factors such as how Reliable and Valid our measures are.
• From a sampling perspective, Statistical Power is a key element.
Power refers to whether we had enough participants to adequately test our hypothesis.
• With too few participants we may not be able to tell an important effect from simple chance results. We will discuss this in the statistics section.
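The link between sample size and power can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not the analysis used in the course: two groups with normally distributed scores, a known standard deviation of 1, a two-sided z-test, and a hypothetical true effect of half a standard deviation.

```python
import random
from statistics import NormalDist, mean


def simulated_power(n_per_group, effect_size, alpha=0.05, n_sims=2000, seed=0):
    """Share of simulated studies that detect a true effect of the given size."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = (2 / n_per_group) ** 0.5                 # SE of the mean difference when SD = 1
    detections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            detections += 1
    return detections / n_sims


small_study = simulated_power(n_per_group=10, effect_size=0.5, n_sims=300)
large_study = simulated_power(n_per_group=100, effect_size=0.5, n_sims=300)
```

With the same true effect, the study with 100 participants per group detects it far more often than the study with 10 per group; the small study frequently cannot separate the effect from chance.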
Who do you want to generalize to?
• Mammals
• Humans
• All Western people
• All Americans
• Young Americans
• College students
• This College
• This class
Sampling a broader population (i.e., larger sampling frame) increases external validity.
Sampling a more specific or smaller frame generally increases internal validity.
Who do you want to generalize to?
Samples typically represent targeted sub-populations
• Demographic or ‘status’ groups;
  • Ethnicity, income or educational groups…
  • Geography; e.g., urban dwellers…
• Medical / clinical groups; people with a specific diagnosis or condition
• Behavioral groups
  • Registered voters
  • Home owners
  • Ever used marijuana…
• Groups defined by self-identification or subjective state
  • Views oneself as “highly likely to vote…”.
  • Above a ‘cut point’ on a stress, depression, or alcohol use scale
  • “Conservatives” vs. “Liberals”
Targeting specific groups increases Internal validity by decreasing the complexity of the sample…
…but may lessen External validity by narrowing the focus.
Research samples & validity
EXAMPLE
Clinical drug trials illustrate the
conflict between internal v.
external validity in sampling.

People with diverse symptoms and
backgrounds see physicians for
depression.

To enhance internal validity drug
researchers use exclusion criteria to
select only participants who fit a
specific definition of depression

Zimmerman et al. suggest that too many exclusion criteria compromise the validity of this research area.
Zimmerman, M., Mattia, J.I., & Posternak, M.A. (2002). Are Subjects in Pharmacological Treatment Trials of Depression Representative of Patients in Routine Clinical Practice? Am J Psychiatry, 159, 469–473.
Exclusion criteria & validity
EXAMPLE
The study begins with a large N of
people self-referred for depression
They exclude those with serious
mental illness, drug abuse or
personality disorder…
…whose symptoms are not severe
enough, are suicidal, or who have
other affective disorders..
…whose symptoms are too recent OR
too long-standing…
…and end up with a small, carefully
selected sub-set of patients (8.4% of
general depression patients).
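The way successive exclusion criteria shrink a sample can be sketched as a simple "funnel". The records, field names and the severity cut-off of 20 below are invented for illustration only; they are not the criteria or data from Zimmerman et al.

```python
def apply_exclusions(patients, exclusion_rules):
    """Apply exclusion rules in order and record how many patients remain after each."""
    remaining = list(patients)
    funnel = [("screened", len(remaining))]
    for label, is_excluded in exclusion_rules:
        remaining = [p for p in remaining if not is_excluded(p)]
        funnel.append((label, len(remaining)))
    return remaining, funnel


# Hypothetical patient records (made up for this sketch)
patients = [
    {"id": 1, "substance_abuse": False, "severity": 25, "suicidal": False},
    {"id": 2, "substance_abuse": True, "severity": 30, "suicidal": False},
    {"id": 3, "substance_abuse": False, "severity": 12, "suicidal": False},
    {"id": 4, "substance_abuse": False, "severity": 28, "suicidal": True},
]
rules = [
    ("after excluding substance abuse", lambda p: p["substance_abuse"]),
    ("after excluding low severity", lambda p: p["severity"] < 20),  # hypothetical cut-off
    ("after excluding suicidal patients", lambda p: p["suicidal"]),
]
enrolled, funnel = apply_exclusions(patients, rules)
percent_retained = 100 * len(enrolled) / len(patients)
```

Each rule on its own looks reasonable, yet together they leave only a small fraction of the people who originally sought treatment.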
External vs. internal validity in sampling
EXAMPLE

Applying rigorous study selection
criteria for drug trials excludes the
great majority of routine depression
patients.

Rigorous participant selection for
internal validity seriously
compromises external validity in
these studies.

This leaves the actual usefulness of
anti-depressant (and other)
medications for the general
population in doubt.

To be useful research must balance
the need for careful subject selection
with the need for representativeness
Who is a group member?
Are you between 14 and 30 and have a
computer or smart phone available?
A = Yes
B = No
Who is a group member?
Do you use Facebook or other social
media 5 times a week or more?
A = Yes
B = No
C = Not sure – lost count.
Who is a group member?
Are you a “Facebook user”?
A = Yes
B = No
C = Not sure – let me
Facebook that.
Who is a group member?
Do you live in a neighborhood or town that is mostly Latino?
A = Yes
B = No
C = Maybe – I’m not sure
Who is a group member?
Do you speak Spanish?
A = Yes
B = No
C =¿cuál era la pregunta?
Who is a group member?
Are you Latino?
A = Yes
B = No
C = Maybe – I’m not sure
Define the target population
Who do you want to generalize to: who is in the group?
Once we choose our sampling group, we must decide on criteria for membership…
• To sample social media users do I use a …
  • Rough demographic criterion?
  • Behavioral criterion (which behavior?)
  • Self-identification?
• To sample “Latinos”…
  • Is geographic status specific enough?
  • Is Spanish language the defining characteristic?
  • Can / must one call oneself “Latino” (even if you do not speak Spanish…)?
Clearer and narrower group criteria increase Internal validity by making the sample more homogeneous.
Some of these criteria are easier to reliably measure than others;
• Demographic variables are often available in census data
• Behavioral or subjective criteria require direct assessment, and can be less reliable.
• Of course different criteria may yield very different samples.
• Our choice of sampling criteria must be based on our theory, hypothesis, or research question.
Sampling criteria
Who is a “Latino”?
• Demographic or ‘status’ marker: Neighborhood residence?
• Behavioral: Spanish speaking? Cultural practices?
• Subjective / self-identification: Self-description?
Who is a “Student”?
• Demographic or ‘status’ marker: Lives on a campus
• Behavioral: # Hours registered
• Subjective / self-identification: Describes occupation as ‘student’
Who is “gay” or “lesbian”?
• Demographic or ‘status’ marker: Lives in same-sex 2-person household?
• Behavioral: Sexual or other patterns?
• Subjective / self-identification: Self-identification as gay / lesbian?
Who is “depressed”?
• Demographic or ‘status’ marker: Received a diagnosis from MH professional?
• Behavioral: Pattern of behaviors and feelings? Presents at Doctor’s office for general malaise?
• Subjective / self-identification: Describes self as “depressed”?
• Each criterion may meet the goals of a particular hypothesis or empirical question.
• Of course different choices may lead to very different samples.
• Some criteria are easy to assess but may be only approximate.
• Others may require relatively difficult assessments.
Who do you want to generalize to: Your “Sampling Frame”.
What is known about your larger population?
• What is its size, sub-groups, location….
• Are there Census or survey data?
  • E.g., are there “population” data on depressed people?
  • Do we know the demographic profiles of Facebook users?
• Data about your target population will help you determine how well your sample represents that population.
Where / how can I best recruit members of the population?
• Will some sub-groups require different recruitment methods than others?
• Will different recruitment methods be biased in favor of some subgroups?
  • Internet surveys may be biased against older people.
  • Studies that use monetary incentives pull for poorer people.
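When population data exist, checking representativeness can be as simple as comparing the share of each sub-group in the sample with its share in the population. The sketch below assumes made-up age groups and made-up population shares; it is not based on real census figures.

```python
from collections import Counter


def representation_gaps(sample_labels, population_shares):
    """Sample share minus population share for each group (positive = over-represented)."""
    counts = Counter(sample_labels)
    n = len(sample_labels)
    return {
        group: counts.get(group, 0) / n - share
        for group, share in population_shares.items()
    }


# Hypothetical internet survey that over-recruits younger people
sample = ["18-29"] * 60 + ["30-64"] * 35 + ["65+"] * 5
population = {"18-29": 0.20, "30-64": 0.60, "65+": 0.20}  # invented shares
gaps = representation_gaps(sample, population)
```

Here the youngest group is over-represented by 40 percentage points and the oldest under-represented by 15, the kind of bias an internet survey might produce.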
Overview: From research question to sample
What is the research question?
• Are we describing some natural process?
• …testing a theory?
What is the population of interest?
• What population does your research address?
• Whom do you want to generalize to?
Category of participant criterion?
• Demographic or “Status” criteria?
• Behavioral criterion?
• Self-Identification, attitudes or beliefs?
Operational definition of enrollment criteria?
• Inclusion & Exclusion Criteria: Specific measures that define who does / does not qualify for enrollment.
Actual recruitment?
• Concrete (operational) processes to recruit and enroll participants.
From theory to sample: Asthma among African-Americans.
EXAMPLE
Study structure & research question:
• Adherence to a medication regimen is key to health among people with asthma.
• Medication adherence is generally low, particularly among African-American adolescents, who have high rates of asthma.
• Self-determination theory proposes that autonomous motivation (being self-directed), self-confidence, and relatedness (family routines & parental support) underlie adherence.
• This study tests the hypothesis that three variables comprising self-determination theory will be associated with patients’ adherence to medications.
• Because young African-Americans have a significant health burden from asthma, the study focuses on them.
Bruzzese, J., Idalski C, Lam, P, Deborah A.; Naar-King, S. (2014) Adherence to asthma medication regimens in urban African American adolescents: Application of self-determination theory. Health Psychology, Vol. 33.5 (May 2014): 461-464.
From theory to sample: Asthma among African Americans.
EXAMPLE
Population of interest?
• Young African-Americans who suffer from poorly controlled asthma.
Category of participant criterion?
• Demographic or Status criteria
  • African-American adolescents
  • Poorly controlled asthma.
• Behavioral criterion
  • Already participating in long-term asthma control study.
• Self-Identification / attitudes
  • Not a criterion in this study.
Operational definition of enrollment criteria?
• “Adolescent”: Age 10 – 18.
• “Poorly controlled”: At least one asthma-related hospitalization or two asthma-related emergency department visits in the last 12 months.
Actual recruitment?
• N = 162 participants recruited from the hospital’s outpatient immunology clinic after an asthma-related clinic visit or hospitalization
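An operational definition is precise enough to be written as a rule. The sketch below encodes only the two definitions quoted above ("adolescent" and "poorly controlled"); the function and parameter names are hypothetical, and the study's other criteria are left out.

```python
def is_eligible(age, hospitalizations_12mo, ed_visits_12mo):
    """Screen one person against the two operational definitions from the example."""
    is_adolescent = 10 <= age <= 18
    # at least one hospitalization OR at least two emergency department visits
    poorly_controlled = hospitalizations_12mo >= 1 or ed_visits_12mo >= 2
    return is_adolescent and poorly_controlled


# Hypothetical screening records: (age, hospitalizations, ED visits) in the last 12 months
screened = [(15, 1, 0), (15, 0, 2), (15, 0, 1), (19, 2, 0)]
decisions = [is_eligible(*record) for record in screened]
```

Writing the rule out this way makes clear exactly who does and does not qualify: a 15-year-old with a single emergency visit and no hospitalization is not enrolled, and neither is a 19-year-old however severe the asthma.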
Results
EXAMPLE
• Having asthma regulation embedded in the family routine was the only predictor of medication adherence.
Multiple regression analysis (all variables are tested simultaneously)
Research sampling
• Defining your target population
• Probability & Non-Probability sampling methods.
Major forms of sampling
Probability (Random) Sampling
• Recruit (or select) participants to maximize the representativeness of the sample to a known population.
• Uses some form of random selection.
• Requires that each member of the population has a known (often equal) probability of being selected.
• Most externally valid approach to sampling general populations
Non-Probability Sampling
• Use available samples for convenience, or targeted outreach to unusual or small populations.
• Selection may be either systematic or haphazard, but is not random.
• Often the most externally valid approach to unusual, small, or extreme groups, or groups where little is known.
• When used only for convenience it is the least externally valid.
Watch that word ‘random’!
• Random Selection or a Random Sample refer to how we recruit participants; who is in the sample.
• Random Assignment is how we (should) assign participants to different groups.
[Diagram: Participant Selection (Sample) → Participant Assignment (Group A, Group B, (Group C)) → Experimental Procedures (Procedure) → Experimental Treatment or Manipulation (Treatment, Control, (Alternate Treatment?)) → Results (Outcome)]
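The two uses of ‘random’ are separate steps, which a short sketch can make concrete. The population of numbered IDs and the function name are hypothetical; this is an illustration, not a prescribed procedure.

```python
import random


def select_then_assign(population, sample_size, seed=None):
    """Step 1: random selection into the sample. Step 2: random assignment to groups."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # random SELECTION: who is in the study
    shuffled = sample[:]
    rng.shuffle(shuffled)                         # random ASSIGNMENT: which group they join
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


population = list(range(1, 501))  # hypothetical list of 500 person IDs
groups = select_then_assign(population, sample_size=20, seed=42)
```

Selection bears on external validity (does the sample represent the population?), while assignment bears on internal validity (are the groups comparable before treatment?). A study can have one without the other.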
Probability / Random Sampling
• Core feature: all members of the study population have an equal (or
known) chance of being sampled
• Procedure: Choose participants in a systematic, random fashion.
• e.g., every 100th student ID,
• Every 1000th person on a voter registration record.
• Random digit dialing for telephone surveys.
• Advantages: eliminates obvious biases of convenience sampling
• Limitations:
• May under-sample unusual / hard to reach participants
• Some may be unavailable in, e.g., telephone lists, computer
files.
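Selecting "every 100th student ID" is usually called systematic sampling. A minimal sketch follows, assuming the sampling frame is an ordered list and choosing the starting point at random so that every ID has the same chance of selection; the ID numbers are invented.

```python
import random


def systematic_sample(frame, step, seed=None):
    """Pick a random starting position, then take every `step`-th entry of the frame."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start between 0 and step - 1
    return frame[start::step]


student_ids = list(range(10000, 11000))  # hypothetical frame of 1,000 student IDs
selected = systematic_sample(student_ids, step=100, seed=7)
```

Anyone missing from the list (the frame) can never be selected, which is the limitation noted above for telephone lists and computer files.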
Basic Forms of random sampling
• Simple Random Sampling: Select a specific % of a target
population; all members of population have about equal chance
of selection.
• Multi-Stage: Randomly select population units (census tracts,
households, schools..), then randomly select individuals within unit.
• Stratified: Random within population sub-blocks, e.g., gender
(randomly select 50 women and randomly select 50 men), ethnicity, etc.
• Cluster: Random within (potentially convenience) clusters, e.g.,
specific locations or “venues”, events, times of day, etc.
Simple Random sampling
Objective: Attempts to truly represent the general population; absolute
minimal selection bias.
Procedure: Recruitment method where all members of the population have an approximately equal chance of being selected:
Examples:
Polls (e.g., Gallup) or surveys using random digit dialing.
• Dialing random phone numbers eliminates any selection bias on the part of the researcher; calls are made by a computer.
• Everyone who has a phone has an equal probability of being selected.
• Of course not everyone has a phone, many people block calls not in their contact list, and so on.
• These and other demographic trends induce an unknown degree of sampling bias.
“American Community Survey”; census based random sample of households.
• The census is designed to assess literally every American household.
• A small sample (2.5%) of census respondents receive a longer survey of demographics and social patterns.
• Despite being a mandatory, highly systematic sample, not everyone completes either the census or the sub-sample survey.
• Both the poor or homeless and the very wealthy can be difficult to contact.
Advantages:
• Most representative sampling frame for the general population
Disadvantages:
• Any recruitment method excludes some people (no telephone, no stable address, etc.).
• Very expensive for face-to-face (non-telephone or internet) data collection.
Multi-stage random sampling
Objectives:
• Develop a focused & efficient random sample.
• Use random sampling “stages” to reach hidden, stigmatized or other hard to reach groups.
Why not a simple random sample?
• Simple random sampling is optimal.
• However, it is biased when relying on telephone, internet or similar contact methods.
• A simple random sample for face-to-face interviews or recruitment for a larger research study (e.g., a public health intervention) is prohibitively expensive.
Multi-stage random sampling
Objective: Focused & efficient random sample.
Procedure: Use successive levels of random selection to narrow the sample.
• 1st level random selection: From the General Population, randomly select sub-population blocks.
• 2nd level random selection: From those blocks randomly select smaller groups…
• 3rd level random selection: And smaller groups within those blocks...
• Final sample: To arrive at an efficient random sample.
Multi-stage random sampling
Selecting a random sample of college students:
Rather than trying to randomly sample the entire population of students, we narrow our approach.
• Universities: Randomly select sub-population blocks.
• Classes: Randomly select smaller groups within them…
• Students: Randomly select students within the target classes...
• Final sample: To arrive at an efficient random sample.
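The college-student example can be sketched as three nested rounds of random selection. The data structure (a dictionary of universities, each holding classes with rosters), the names and the stage sizes are all hypothetical.

```python
import random


def multistage_sample(universities, n_universities, n_classes, n_students, seed=None):
    """universities maps university name -> {class name -> list of student IDs}."""
    rng = random.Random(seed)
    final_sample = []
    # Stage 1: randomly select universities
    for uni in rng.sample(sorted(universities), n_universities):
        classes = universities[uni]
        # Stage 2: randomly select classes within each selected university
        for cls in rng.sample(sorted(classes), n_classes):
            # Stage 3: randomly select students within each selected class
            final_sample.extend(rng.sample(classes[cls], n_students))
    return final_sample


# Hypothetical frame: 5 universities x 4 classes x 30 students
universities = {
    f"U{u}": {f"C{c}": [f"U{u}-C{c}-S{s}" for s in range(30)] for c in range(4)}
    for u in range(5)
}
sample = multistage_sample(universities, n_universities=2, n_classes=2, n_students=5, seed=3)
```

Only two universities and four classes ever need to be contacted, which is what makes the approach efficient compared with drawing students from one complete national list.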
Multi-stage random sampling
“Real World” example: NIDA* household surveys of drug use.
• Step 1. Randomly select a moderate # of census tracts.
• Step 2. Randomly select modest # of households within each target tract…
• Step 3. Interview the first adult who answers the phone in each household...
• Final sample: To arrive at an efficient random sample.
*National Institute on Drug Abuse
Multi-stage random sampling
Multi-stage sampling is also useful for “hard to reach” populations where we have no sampling frame:
• For the student or drug study we know what the population looks like:
  • We have census data or university & class lists.
  • This gives us known population blocks that we can randomize around.
• For a “hidden” population where we have no sampling frame…
  • We do not know the size or geographic distribution of the group.
  • We do not have simple population blocks available.
• …we must often resort to convenience sampling in venues (locations, events) where the population may be available.
  • This can lead to substantial bias due to non-random selection.
Multi-stage random sampling
How do we randomize within the constraints of convenience (venue) sampling?
This example is from the “CITY” study of HIV risk among youth.
Population of unknown size & distribution.
• Randomize Venues: In each of 4 cities, randomly select bars, clubs & other venues attended by young gay men.*
• Randomize Day & Time: Randomly select days & times to recruit in selected venues.
• Randomly select people: Randomly approach every 4th person who enters the venue for an interview.
• Final quasi-random sample.
* Use qualitative interviews / direct observation to determine which venues are relevant.
Multi-stage random sampling
“CITY” study of HIV risk among youth.
• Many important research questions involve people who would not be found in simple random selection methods.
• With “hidden” or stigmatized groups sampling must be very targeted.
• Targeted sampling can be biased toward members who are easiest to contact.
• This approach adds random selection to what ordinarily would be a (biased) convenience sample.
• Of course the sample is biased by larger issues:
  • Not all gay youth attend venues relevant to them.
  • Concern over being identified as gay may lead some # of men to refuse the study.
• Despite its limitations, this example shows how creative sampling approaches can provide a less biased sample of a difficult group to reach.
Stratified or cluster sampling
Objective: Represent every key segment of the population.
Procedure:
• Decide which population segments are important;
  • E.g., ethnic groups,
  • Geographic areas,
  • Self-identification.
  This decision is based on your hypothesis or empirical question.
• Randomly select from each segment.
  • Proportionate: Sampling fraction from each segment should approximate the overall population.
[Figure: the distribution of ethnicity in the U.S.]
A stratified sample would randomly select from each ethnic group to approximate this distribution.
  • Dis-Proportionate: Over-sampling population groups to ensure you have large enough samples of small groups.
[Figure: estimated distribution of legal vs. illegal immigrants.]
To directly compare groups we over-sample illegal immigrants.
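Proportionate and dis-proportionate stratified sampling differ only in how many people are drawn from each stratum. The sketch below uses two invented strata (a large group of 900 and a small group of 100); the function names are hypothetical.

```python
import random


def proportionate_sizes(strata, total):
    """Sample size per stratum in proportion to its share of the population."""
    population = sum(len(members) for members in strata.values())
    # note: rounding can make the sizes add up to slightly more or less than `total`
    return {name: round(total * len(members) / population) for name, members in strata.items()}


def stratified_sample(strata, sizes, seed=None):
    """Randomly select the requested number of members from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


strata = {"large group": list(range(900)), "small group": list(range(900, 1000))}

prop_sizes = proportionate_sizes(strata, total=100)
proportionate = stratified_sample(strata, prop_sizes, seed=1)

# Dis-proportionate: over-sample the small group so the two groups can be compared directly
disproportionate = stratified_sample(strata, {"large group": 50, "small group": 50}, seed=1)
```

The proportionate sample mirrors the population (90 and 10) but leaves only 10 people in the small group; the dis-proportionate sample gives 50 in each group, at the cost of no longer mirroring the population unless the results are weighted afterwards.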
Probability Sampling overview
Summary
Core features:
• Random selection of participants from the population
• Most externally valid approach.
Assumes:
• A clear sampling frame.
• All segments of the population are available.
Variations:
• Simple
  a. Each member of the entire population has an approximately equal chance of being selected.
• Multi-stage
  a. Select segments of the population, e.g., census tracts, registered voters.
  b. Each segment member has an approximately equal chance of being selected.
• Cluster
  a. Identify clusters, e.g. sports fans at a sports bar.
  b. Each cluster member has an approximately equal chance of being selected.
• Stratified
  a. Identify strata, e.g., ethnic groups, gender, age groups..
  b. Randomly select a proportion of each stratum.
Non-Probability Sampling
Useful for populations that:
• Cannot be randomly sampled; “hidden” or difficult to reach.
• Have no sampling frame available, such as census data, describing their size, composition, etc.
  • Examples: drug users, recent immigrants, gay men…
Likely to misrepresent the population
• It may be difficult or impossible to detect this misrepresentation.
• Can be over-sensitive to incentives: paying participants attracts more poor people.
• “Respondent Driven” sampling (RDS) allows for “targeted” population estimates.
Types of Non-probability Samples
• Haphazard
• Modal instance
• Venue – time / space
• Multi-frame
• Snowball / Respondent driven
• Web
• Quota
• Heterogeneity
Haphazard Sampling
“Man on the street”; recruiting the most easily available participants.
• Literally recruiting in public places for, e.g., brief interview
• College psychology majors.
• Medical / therapy clients in a clinic.
Often used for quick interview / attitude studies on current topics.
Advantage: Participants are readily available.
Problem: No evidence for representativeness.
Modal Instance Sampling
Recruit a “typical case” of a target population.
• A member of a target population group;
  • …a ‘typical’ college student with debt.
  • An injection drug user, homeowner, etc.
• A person affected by a major event;
  • A 9/11 survivor...
  • A witness to a natural disaster.
• Often used to describe an event or way of life, or to generate hypotheses for later research.
Advantage: Direct, personal description.
Problems:
• Potentially strong self-selection bias in who volunteers for such personal disclosure
• “Social desirability responding” – presenting oneself in a positive light – or biased recall may compromise accuracy of answers.
Haphazard & Modal Instance Sampling
Both Haphazard & Modal Instance Sampling are used for Case Studies.
• Researchers elicit in-depth personal accounts of an event or life pattern.
• Multiple Case Studies interview a set of individuals who express the phenomenon, e.g., members of a key group…
  • In the Qualitative Research module we will discuss how in-depth interviews can be analyzed.
• These methods, plus direct observation, are often used by journalists.
  • For an interesting example see a New York Times discussion of the “down low” phenomenon among African-American, bisexually active men.
Venue and Time / Space Sampling
• Assumes that population group members are well represented in particular places & times (“venues”).
• Used to sample a specific, well-defined, often hard to reach group.
• Venue sampling uses “Intercept” methods to reach participants.
  • Outreach workers use a standard recruitment script to approach potential participants.
  • Data may be collected on site via a brief interview, such as shopping mall intercepts.
  • Often the contact is used to collect (or distribute) contact information for later participation
• Time / Space randomization lessens bias due to choice of venue:
  • Randomly approach different venues at different times
  • Randomly select participants within the venue (e.g., every 4th person…)
These strategies must be based on a clear epidemiological or theoretical question.
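Time / Space randomization can be sketched as two small steps: draw a random schedule of venue-and-time shifts, then intercept every k-th person who enters during a shift. The venue names, time slots and function names below are invented for illustration.

```python
import random


def venue_time_schedule(venues, time_slots, n_shifts, seed=None):
    """Randomly choose which venue/time combinations the outreach workers will staff."""
    rng = random.Random(seed)
    all_shifts = [(venue, slot) for venue in venues for slot in time_slots]
    return rng.sample(all_shifts, n_shifts)  # no shift is chosen twice


def intercept_every_kth(entrants, k):
    """Approach the k-th, 2k-th, 3k-th ... person entering the venue."""
    return entrants[k - 1::k]


venues = ["club A", "bar B", "park C", "street fair D"]        # hypothetical venues
time_slots = ["Fri evening", "Sat evening", "Sun afternoon"]   # hypothetical slots
schedule = venue_time_schedule(venues, time_slots, n_shifts=5, seed=11)

entrants = list(range(1, 21))  # the 1st through 20th person to enter during a shift
approached = intercept_every_kth(entrants, k=4)
```

Because the schedule and the counting rule are fixed in advance, outreach workers cannot (even unintentionally) favor the venues, times or people that are easiest to approach.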
Example of Venue sampling
Example
Recruiting gay or bisexual men for HIV research can present challenges.
• Simple recruitment methods for any targeted population include:
  • Newspaper or internet ads,
  • Flyers in health clinics or popular stores..
• Many gay/bisexual men – particularly younger & minority men – do not respond to simple methods…
  • …due to distrust or disenfranchisement from the health care system,
  • Unwillingness to disclose sexual orientation in other than ‘gay friendly’ settings,
  • The perception that research is irrelevant to them or may harm them.
• Direct personal contact within gay/bisexual venues such as clubs can help break down barriers to recruitment.
Outreach / venue sampling
Example
• Project MIX was a national safer sex intervention study sponsored by the Centers for Disease Control.
• It had a complex sampling frame:
  • ⅓ each African-American, Latino & Caucasian,
  • ½ HIV infected / uninfected within each ethnic group.
• Outreach workers recruited in multiple venues;
  • Bars and clubs
  • Public areas; parks and neighborhoods where men congregated
  • Community events, private ‘house’ parties, etc.
• Outreach workers were indigenous, i.e. gay/bisexual men from the community.
  • They were able to approach men in the target groups.
  • They could explain the study and foster trust & cooperation.
Outreach lead sheet
To broaden the sample, recruitment stimuli show different ethnicities, and cite a range of potentially eligible behaviors…
Images used with permission, David McKirnan, Project MIX Principal Investigator.
Targeted Multi-frame Sampling
• Often used to sample a specific, hard to reach group…
• ..with no census or similar data available to develop a sampling frame.
  • No clear “population blocks” to use in a multi-stage sample,
  • The population is spread among many venues or locations…
  • ... and population segments are more or less sensitive to any specific recruitment approach.
• Based on preliminary qualitative work (interviews, direct observations…) we develop multiple sampling “frames”:
  • Direct outreach.
  • Newsletters, internet lists, chat rooms
  • Organizations or meeting places.
Most common & valid convenience sample
Targeted Multi-frame Sampling

Sample a specific, hard to reach group

No census or similar data for sampling frame.

Uses multiple (convenience) sampling “frames”.
Example: recruiting gay men for HIV research
We use multiple ways to approach men or for them to contact us.
Multi-frame sample

Direct outreach in
bars, clubs, street.

Community events
(festivals…)

Newspaper ads.



Flyers in bars &
stores.
Medical clinics.
“Snowball” / word of
mouth.
Targeted Multi-frame Sampling
 Each sampling “frame” is a convenience approach – we typically
cannot randomly select participants.
 By using multiple frames we can recruit a broad cross-section of
the population.
 This also allows us to “test” different sampling approaches or
venues.
 In HIV research:
 Do riskier men come from one type of sampling frame? E.g. bar
venues…
 Are there ethnic or other differences in
the participants who are recruited in one
type of venue or another…
 These data can help us better understand our sample and study results.
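One way to “test” sampling frames against each other is to tabulate a key measure by the frame each participant came from. The sketch below does this for a yes/no risk indicator. All records, frame labels and the function name risk_rate_by_frame are made up for illustration.

```python
from collections import defaultdict

# Hypothetical screening records: (recruitment frame, reported risk behavior?).
# These values are made up for illustration only.
records = [
    ("bar outreach", True), ("bar outreach", True), ("bar outreach", False),
    ("community event", False), ("community event", True),
    ("newspaper ad", False), ("newspaper ad", False),
    ("clinic", True), ("clinic", False), ("clinic", False),
]


def risk_rate_by_frame(rows):
    """Return {frame: (n recruited, proportion reporting risk)}."""
    counts = defaultdict(lambda: [0, 0])  # frame -> [n, n_risk]
    for frame, risky in rows:
        counts[frame][0] += 1
        counts[frame][1] += int(risky)
    return {frame: (n, n_risk / n) for frame, (n, n_risk) in counts.items()}


summary = risk_rate_by_frame(records)
for frame, (n, rate) in sorted(summary.items()):
    print(f"{frame:16s} n={n}  risk rate={rate:.2f}")
```

With real data the same table could be broken down by ethnicity or other participant characteristics, and differences between frames would need a proper statistical test.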
Snowball / “Respondent Driven” Sampling
Form of targeted sampling:
Early participants are paid to recruit others, who recruit others, etc.
Recruit network of “linked” people tracked by referrals.
Problem:
Choice of seeds.
Eligibility criteria.
Sensitive to incentives!
Advantage: Access unusual or “hidden” people related
by a common behavior.




With enough “generations” of links can well represent a
target population.
Often part of multi-frame approach.

With RDS can show “chain” of referrals / links.
Useful for people who mistrust research or where personal
contact is necessary for recruitment (HIV, drug use).
Portrays “chain” of influence or, e.g., infectious disease.
Snowball / Respondent Driven sampling (RDS).
An initial set of participants is recruited according to a specific set of inclusion / exclusion criteria:
Characteristics of the initial participants (the “seeds” of the network) help determine who subsequent participants will be.
The seeds may be, e.g., injection drug users (inclusion criterion) who live within a specific geographic area (living outside it is an exclusion criterion).
Initial participants are paid to recruit others, who are paid to recruit
still others … all using the same in(ex)clusion criteria.
Over “waves” of recruitment RDS recruits a network of “linked”
people;

A network of, e.g., musicians, gamblers, sexual partnerships…

Population of an organization or school…
With enough waves of recruitment, Snowball / RDS can
produce accurate population estimates for a given subpopulation.
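The wave-by-wave growth of an RDS sample can be sketched with a small simulation. This is a toy model under stated assumptions: every participant gets the same number of coupons, and each coupon brings in one eligible person with a fixed probability (recruit_prob, a hypothetical parameter). It shows only how waves build on the seeds, not how population estimates are computed.

```python
import random


def simulate_rds(n_seeds, coupons, n_waves, recruit_prob, rng):
    """Simulate coupon-based recruitment; return a list of participant records."""
    participants = [
        {"id": i, "wave": 0, "recruiter": None} for i in range(n_seeds)
    ]
    current_wave = list(participants)
    for wave in range(1, n_waves + 1):
        next_wave = []
        for recruiter in current_wave:
            for _ in range(coupons):
                # Each coupon brings in an eligible person with some probability.
                if rng.random() < recruit_prob:
                    person = {
                        "id": len(participants),
                        "wave": wave,
                        "recruiter": recruiter["id"],
                    }
                    participants.append(person)
                    next_wave.append(person)
        current_wave = next_wave
    return participants


rng = random.Random(42)  # fixed seed so the run is repeatable
sample = simulate_rds(n_seeds=3, coupons=5, n_waves=4, recruit_prob=0.3, rng=rng)
for wave in range(5):
    print("wave", wave, "size", sum(p["wave"] == wave for p in sample))
```

Because every record stores its recruiter, the same output can later be used to trace referral chains.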
Snowball / Respondent Driven sampling (RDS).
Snowball / RDS can also provide insights into links among
participants:

Since participants recruit each other, we can track who is linked to
whom.

Basic study measurements can help us determine the
characteristics of people who are linked (e.g., sexually) with each
other.

We can also assess people who are “nodes” in social networks:
who knows (recruits) a lot v. few people?
 What influence do those people have?

Problem: Sensitive to incentives;

Advantage: Access unusual or “hidden” people related by a
common behavior or venue.

Using participants to recruit others they know may yield a more
representative sample.

Researchers can collect data on who recruits whom to examine
the structure of social networks
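Tracking who recruits whom amounts to keeping a list of referral links. The sketch below counts how many people each participant recruited (a rough indicator of who is a “node”) and traces one participant back to their seed. The referral records and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical referral records: (recruit id, recruiter id or None for a seed).
referrals = [
    ("A", None), ("B", "A"), ("C", "A"), ("D", "A"),
    ("E", "B"), ("F", "E"), ("G", None), ("H", "G"),
]
recruiter_of = dict(referrals)


def recruits_per_person(links):
    """Count how many people each participant recruited."""
    return Counter(recruiter for _, recruiter in links if recruiter is not None)


def chain_to_seed(person, parent):
    """Follow referral links from one participant back to their seed."""
    chain = [person]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


counts = recruits_per_person(referrals)
print(counts.most_common(1))             # the most active recruiter
print(chain_to_seed("F", recruiter_of))  # referral chain from F back to a seed
```

Joining these links to the study measurements would show the characteristics of people who are linked to each other.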
RDS coupon examples
 These are examples of
cards used in an RDS
recruitment among
injection drug users.
 Initial “seeds” are
interviewed, then given 5
cards to distribute to
people who meet the
eligibility requirements.
 2nd “wave” participants
receive the same card
set.
 Participants get $30 for
the study interview, and
$20 for each person they
refer.
Heckathorn, D.D. & Magnani, R. (2004). Snowball and Respondent-Driven Sampling. In: Behavioral Surveillance Surveys: Guidelines for
Repeated Behavioral Surveys in Populations at Risk of HIV.
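The incentive structure in this example is simple arithmetic: $30 for the interview plus $20 per referral, with 5 cards per participant. A short sketch, assuming a referral is paid once per card and capped at the number of cards (the function name participant_payment is hypothetical):

```python
INTERVIEW_PAYMENT = 30   # dollars for completing the study interview
REFERRAL_PAYMENT = 20    # dollars for each person referred
COUPONS_PER_PERSON = 5   # cards handed to each participant


def participant_payment(n_referred):
    """Total paid to one participant who refers n_referred people."""
    if not 0 <= n_referred <= COUPONS_PER_PERSON:
        raise ValueError("referrals must be between 0 and the number of coupons")
    return INTERVIEW_PAYMENT + REFERRAL_PAYMENT * n_referred


print(participant_payment(0))  # interview only
print(participant_payment(5))  # every card leads to a referral
```

A participant can therefore earn between $30 and $130, which is one reason RDS samples are sensitive to incentives.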
RDS; chain description
Heckathorn, D.D. & Magnani, R. (2004). Snowball and Respondent-Driven Sampling. In: Behavioral
Surveillance Surveys: Guidelines for Repeated Behavioral Surveys in Populations at Risk of HIV.
Example of social network sampling:
Bearman et al., Romantic ties among adolescents
A substantial majority of students are in an extended, linked chain of relationships,
with a number of smaller chains
and a small % in 2 to 4 person chains.
From a sampling perspective, several “seeds” access most of the population.
Findings suggest a clear potential for STI transmission.
Bearman, P. et al., American Journal of Sociology, Volume 110 Number 1 (July 2004): 44–91.
Non-Probability methods
Quota Sampling
Select people non-randomly according to quotas.
Similar to cluster sampling, except you cannot randomly sample each population segment.
 Must have clear theory / research question to pick
relevant population characteristic(s).
 Proportional quota sampling
• Represent major characteristics of a population. If gender is
important, and the proportion of women :: men in your
population = 65% :: 35%, the sample must meet that quota.
 Non-proportional quota sampling
• Sample enough members of each group to test hypothesis,
even if the sample is not proportional. (e.g., recruit 50 women &
50 men, even though the real proportion is 65::35).
• Helps assure that you have good representation of smaller
population groups.
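The two kinds of quota can be sketched as follows, using the 65% :: 35% example above with a hypothetical sample of 100. The function names are illustrative. Rounding in the proportional case may leave the quotas a person or two off the total for other sample sizes.

```python
def proportional_quotas(sample_size, population_shares):
    """Split sample_size across groups in proportion to population shares."""
    return {g: round(sample_size * share) for g, share in population_shares.items()}


def equal_quotas(sample_size, groups):
    """Non-proportional quotas: the same target for every group."""
    per_group = sample_size // len(groups)
    return {g: per_group for g in groups}


def quota_open(group, recruited, quotas):
    """True while a group's quota still has room for another participant."""
    return recruited.get(group, 0) < quotas[group]


shares = {"women": 0.65, "men": 0.35}    # proportions from the example above
print(proportional_quotas(100, shares))  # proportional quota sample
print(equal_quotas(100, list(shares)))   # non-proportional quota sample
```

During recruitment, quota_open would be checked before enrolling each new person, so that recruitment in a group stops once its quota is filled.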
Non-Probability methods
Web sampling
Typically highly targeted samples
 Gay / bisexual men…
 Adolescents…
 “Gamers”…

Typically access through existing venues:
 Users of specific web sites
 Listservs, e-mail lists
 Active recruitment in “chat rooms”
Problem: Inherent bias in computer literacy(?)
Advantage:  Cheap large national sample
 Access unusual or “hidden” people who reach
others via internet
Non-Probability methods;
Heterogeneity Sampling
• Sample every sector of a population, with at least several members of
each, without worrying about proportions.
• At least some members of each geographic area
• …ethnic group
• …behavioral group (voters & non-voters…)
• Assume that a few people are a good proxy for the
group.
Examples: focus groups or qualitative interviews about products,
social issues...
Problem: Cannot be sure a few people really represent
their sub-group.
Advantage: At least some representation of all subgroups.
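A heterogeneity sample is complete when every sector has at least a few members, regardless of proportions. A minimal sketch of that check, assuming “several” means at least 3 (an arbitrary threshold) and using a made-up roster:

```python
from collections import Counter


def coverage_gaps(participants, sectors, minimum=3):
    """Return sectors that have fewer than `minimum` participants so far."""
    counts = Counter(p["sector"] for p in participants)
    return {s: counts[s] for s in sectors if counts[s] < minimum}


# Hypothetical focus-group roster; sector labels are illustrative only.
roster = [
    {"name": "P1", "sector": "voter"}, {"name": "P2", "sector": "voter"},
    {"name": "P3", "sector": "voter"}, {"name": "P4", "sector": "non-voter"},
]
print(coverage_gaps(roster, ["voter", "non-voter"]))
```

Any sector the function returns still needs more recruits before the sample covers the whole population.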
A probability sample is…
A = Based on some form of random selection.
B = Always representative of the population
C = Best for any population
D = Is usually easier to collect than other sample approaches.
A Gallup poll or telephone survey is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.
Respondent-driven sampling, where target
people recruit people like them, is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.
My distributing a survey to this class is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.
Selecting every 100th registered voter and
contacting them for a survey is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.
Randomly selecting classes across the
university, then sampling every 3rd person, is a…
A = Simple random sample.
B = Multi-stage random sample.
C = Social network or “snowball” sample.
D = Haphazard sample.
A non-probability sample…
A = Is perfectly OK if you have limited resources.
B = Just consists of grabbing the most convenient possible
participants.
C = Is never adequate to generalize from.
D = Can be best for hard to reach or unusual participants.
Sampling overview
Who do you want to generalize to?
Summary



Who is the target population?

broad – external validity

narrow – internal validity
How do you decide who is a member?

demographic / behavioral criteria?

subjective / attitudinal?
What do you know about the population already: what
is the “sampling frame”?
Is a probability or random sample possible?

“Hidden” population?

Socially undesirable research topic?

Easily available via telephone, door-to-door?

Sampling frame adequate to choose selection method?
Overview, 2
Summary
Types of Non-probability Samples








Haphazard
Modal instance
Venue – time / space
Multi-frame
Snowball / Respondent driven
Web
Quota
Heterogeneity
Overview, 3
Summary
Probability sampling
 Most externally valid
 Assumes:
 Clear sampling frame
 Population is available
 Less externally valid for
hidden groups.
Non-probability sampling
 targeted / multi-frame
 snowball
 quota, etc.
 Less externally valid
 High “convenience”
 Best when:
 No clear sampling frame
 Hidden / avoidant
population.