Uploaded by Simachew Bayabil

Sampling

SAMPLING
Abraham L.(MPHE)
OBJECTIVES
At the end of the session, participants are expected to:
• Identify and define the population(s) to be studied.
• Identify and describe common methods of sampling.
• Discuss problems of bias that should be avoided when
selecting a sample.
• Compute sample size for different study designs
• Decide on the sampling method(s) and sample size(s) most
appropriate for the research design you are developing.
4/30/2022
Outline
• Population
• Sampling techniques
• Sample size determination
Brainstorming
• Define the following terms:
• Target population
• Source population
• Study population
• Participant population
• Sampling frame
• Sampling unit
• Study unit
Definition of terms
• Population in scientific research refers to the whole group of units being studied, whether human subjects, animals or inanimate objects.
• Target/Reference population
– The population about which the researcher wants to draw
conclusions
– The population of interest for implementing public health
actions
– Example: all children in a town
• Source population
– All or an accessible subset of the target population from which the sample is drawn for a particular study
– Example: all children attending schools, as a proxy for all children living in the town
Definition…
• Study population
– The part of the population from whom you would collect the data
– Sampled individuals who fulfill inclusion and exclusion
criteria
– Example: sampled school children who fulfill the eligibility
criteria
• Participant population
– Refers to eligible members of the sample who are actually investigated
– Non-respondents in the sample population are not
participants
– Example: sampled students who have been studied
Definition…
• Sampling frame
– List of all the sampling units from which the sample is drawn
• Sampling unit
― Smallest unit from which the sample can be selected, or
― The unit of selection in the sampling process
• Sampling fraction
― The ratio of the number of units in the sample to the number
of units in the reference population (n/N).
Definition…
• Study unit or observation unit: the unit from which data are actually collected. Example: a student
• The way we define our study population and our study unit depends on the problem we want to investigate and on the objectives of the study.
• Problem: Malnutrition related to weaning in district X
– Source population: All children 6-24 months of age in district X
– Study population: Children 6-24 months of age in district X who fulfill the eligibility criteria
– Study unit: One child between 6 and 24 months of age in district X
• Problem: High drop-out rates in primary schools in district Y
– Source population: All primary schools in district Y
– Study population: Selected primary schools in district Y
– Study unit: One primary school in district Y
• Problem: Inappropriate record keeping of hypertensive patients registered in hospital Z
– Source population: All records of hypertensive patients registered in hospital Z
– Study population: Records of hypertensive patients in hospital Z
– Study unit: One record of a hypertensive patient in hospital Z
What is sampling?
• Sampling is the process of selecting a number of study units from the entire population of interest.
• When we draw a sample from a population, we are confronted with the following questions:
– What is the group of people (study population) from which we want to draw a sample?
– How many people do we need in our sample?
– How will these people be selected?
Why Sampling?
• Cost in terms of money, time and manpower
• Accessibility
• Utility
– e.g., to run a diagnostic laboratory test, you don’t draw all of a patient’s blood.
Advantages of sampling
• Feasibility: Sampling may be the only feasible method of
collecting information
• Reduced cost: Sampling reduces demands on resources such as finance, personnel and material.
• Greater accuracy: Sampling may improve the accuracy of data collection, because fewer data collectors are needed and they can be better trained and supervised.
• Greater speed: Data can be collected and summarized more quickly.
Disadvantages of sampling
• There is always some sampling error
• Sampling may create a feeling of discrimination within the population
• Not advisable where every unit in the population is legally required to have a record
• Small numbers in minority sub-groups can make subgroup results unreliable
• Sampling bias
Error in sampling
• No sample is the exact mirror image of the population
1. Sampling error (chance/random error): error introduced by chance because a sample, rather than the whole population, is selected and studied.
– Cannot be avoided or totally eliminated
1. Sampling error…
• The chance and random variation in variables that occurs when
any sample is selected from the population
• Sampling error is to be expected
• To avoid sampling error, a census of the entire population must
be taken
• To control/minimize sampling error, increase the sample size and use an appropriate probability sampling method (e.g., stratification).
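To see why a larger sample reduces, but never removes, sampling error, the Python sketch below draws repeated simple random samples from a synthetic population and compares how widely the sample means scatter. The population, its size, the seed and the function name `spread_of_sample_means` are illustrative assumptions, not part of the original material.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats, rng):
    """Standard deviation of the means of `repeats` simple random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.pstdev(means)

rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical population: 10,000 values (e.g., ages) from a normal distribution.
population = [rng.gauss(30, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, n=25, repeats=500, rng=rng)
large = spread_of_sample_means(population, n=400, repeats=500, rng=rng)
print(f"spread of sample means, n=25:  {small:.2f}")
print(f"spread of sample means, n=400: {large:.2f}")
```

With these settings the spread at n = 400 should come out roughly a quarter of the spread at n = 25, because the standard error shrinks in proportion to 1/√n; it only reaches zero when the whole population (a census) is taken.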
2. Non-sampling error (systematic error)
• Error in the design or conduct of a sampling procedure that distorts the sample so that it is no longer representative of the reference population, e.g.:
– Observational error
– Respondent error
– Lack of precise definitions
– Errors in editing and tabulating the data
• It can be eliminated or reduced by careful design and conduct of
the study, not by increasing the sample size
Sampling techniques/methods
• Refers to how the sample will be selected from the study population
• Clearly define study population and study unit
– Study population – individuals, households, institutions,
records
– Study units – one individual, single household, institution or
record
• Types: probability and non-probability
– Probability – quantitative studies
– Non-probability – qualitative studies
Probability sampling methods
• Any method of sampling that utilizes some form of random
selection.
• Involves random selection of a sample
• Every sampling unit has a known and non-zero probability of
selection into the sample
• Involves the selection of a sample from a population based on
chance
Probability sampling…
• Probability sampling is:
– more complex,
– more time-consuming and
– usually more costly than non-probability sampling
• However, because study samples are randomly selected and their probability of inclusion can be calculated:
– reliable estimates can be produced and
– inferences can be made about the population.
Probability sampling…
• There are several different ways in which a probability sample
can be selected
• The method chosen depends on a number of factors, such as
– the available sampling frame
– how spread out the population is
– how costly it is to survey members of the population
– homogeneity of the target population
Types of probability sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling
Simple random sampling (SRS)
• The required number of individuals are selected at random from the sampling frame, a list or a database of all individuals in the population
• Each member of a population has an equal chance of being included in the sample.
• To use an SRS method:
– Make a numbered list of all the units in the population; each unit should be numbered from 1 to N (where N is the size of the population)
– Select the required number of units at random.
Simple random sampling…
• The randomness of the sample is ensured by:
– Use of "lottery" methods
– Table of random numbers
– Computer programs
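As an illustration of the "computer programs" option, here is a minimal Python sketch of SRS from a numbered sampling frame using the standard `random` module. The function name `simple_random_sample` and the example frame size are hypothetical.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Select n distinct unit numbers from a sampling frame numbered 1..N.

    Every unit has the same chance (n/N) of being included.
    """
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical frame: 250 registered households; select 10 of them.
chosen = simple_random_sample(N=250, n=10, seed=2022)
print(chosen)
```

Fixing the `seed` makes the selection reproducible, which helps when the list of selected units has to be documented or checked later.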
Simple random sampling…
• Limitations of SRS
– Requires a sampling frame.
– Difficult if the reference population is dispersed.
– Minority subgroups of interest may not be selected.
Systematic random sampling
• Sometimes called interval sampling
• Selection of individuals from the sampling frame systematically
rather than randomly
• Individuals are taken at regular intervals down the list
• The starting point is chosen at random
Systematic random sampling
• Useful when the reference population is arranged in some order, e.g.:
– Order of registration of patients
– House numbers
– Students’ registration books
• Individuals are taken at fixed intervals (every kth) based on the sampling fraction
Steps in systematic random sampling
• Number the units in the population from 1 to N
• Decide on the n (sample size) that you want
• k = N/n = the interval size
• Randomly select an integer between 1 and k
• Then, take every kth unit
Systematic random sampling…
• E.g., to select 100 students from 1,200, first calculate the sampling interval: 1,200/100 = 12. Then randomly select the first student from among the first 12 on the list, and pick every 12th student thereafter until 100 students are selected.
• Advantage
– Easier and less time-consuming
• Limitations
– Risk of bias
– Difficult to use when a cyclic repetition is inherent in the sampling frame.
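The steps for systematic random sampling can be sketched in Python as below, using the students example (N = 1,200, n = 100, k = 12). The function name `systematic_sample` is hypothetical; when N/n is not a whole number, this sketch rounds k down, which is one common convention among several.

```python
import random

def systematic_sample(N, n, seed=None):
    """Systematic random sampling from a frame numbered 1..N.

    k = N // n is the sampling interval; a random start r in 1..k is chosen,
    then units r, r+k, r+2k, ... are taken until n units are selected.
    """
    k = N // n                      # sampling interval (rounded down)
    if k < 1:
        raise ValueError("sample size cannot exceed N")
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random start between 1 and k inclusive
    return list(range(start, N + 1, k))[:n]

# The students example: 100 students from a list of 1,200 (k = 12).
sample = systematic_sample(N=1200, n=100, seed=7)
print(sample[:5], "...", sample[-1])
```

Only the starting point is random; every later unit is fixed by the interval, which is why a list with a repeating pattern every k units can bias the sample.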
Stratified random sampling
• It is done when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification
• A method of probability sampling in which the population is
divided into different subgroups and samples are selected from
each subgroup
• These subgroups are homogeneous and mutually exclusive groups
called strata
• A population can be stratified by any variable that is available for
all units prior to sampling (e.g., age, sex, province of residence,
income, profession, etc.).
Stratified random sampling…
• Divide the population into non-overlapping groups (i.e., strata)
N1, N2, N3, ... Ni, such that N1 + N2 + N3 + ... + Ni = N.
• A separate sample is taken independently from each stratum
depending on the type of allocation
• Elements within each stratum are homogeneous, but are heterogeneous across strata.
• A simple random or a systematic sample is taken from each stratum
Why stratification?
• It can make the sampling strategy more efficient
• A larger sample is required to get a more accurate estimation if a
characteristic varies greatly from one unit to the other
• For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.
Why stratification?
• If you use an SRS approach in the whole population without stratification, the sample would need to be larger than the total of all stratum samples to get an estimate of total income with the same level of precision.
• Stratified sampling ensures an adequate sample size for subgroups in the population of interest
• When a population is stratified, each stratum becomes an independent population, and you will need to decide the sample size for each stratum
Stratified random sampling…
• There are different allocation methods for distributing the sample across strata:
1. Proportional allocation: the sample is allocated in proportion to the population size of each stratum, using the formula:
$n_i = n \times \frac{N_i}{N}$
where:
– $n$ = total sample size to be selected
– $N$ = total population
– $N_i$ = population size of stratum i
– $n_i$ = sample size from stratum i
2. Equal allocation: the same sample size is allocated to each stratum
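A minimal Python sketch of proportional allocation followed by an independent SRS within each stratum. Because n × N_i / N is rarely a whole number, the sketch assumes largest-remainder rounding so that the stratum samples add up to n; other rounding rules are also used in practice. The function names, stratum names and sizes are hypothetical.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Allocate n across strata with n_i = n * N_i / N (largest-remainder rounding).

    Integer arithmetic is used so the allocations always add up to n.
    """
    N = sum(stratum_sizes.values())
    exact = {s: divmod(n * Ni, N) for s, Ni in stratum_sizes.items()}
    alloc = {s: q for s, (q, r) in exact.items()}
    shortfall = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest remainders.
    for s in sorted(exact, key=lambda s: exact[s][1], reverse=True)[:shortfall]:
        alloc[s] += 1
    return alloc

def stratified_sample(frames, n, seed=None):
    """Draw an independent simple random sample from each stratum's frame."""
    rng = random.Random(seed)
    alloc = proportional_allocation({s: len(f) for s, f in frames.items()}, n)
    return {s: rng.sample(frames[s], alloc[s]) for s in frames}

# Hypothetical frames: 6,000 urban and 4,000 rural household IDs.
frames = {"urban": list(range(1, 6001)), "rural": list(range(6001, 10001))}
print(proportional_allocation({"urban": 6000, "rural": 4000}, 400))  # {'urban': 240, 'rural': 160}
sample = stratified_sample(frames, 400, seed=1)
```

For equal allocation, each stratum would instead receive n divided by the number of strata, regardless of its size.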
Cluster sampling
• Often, SRS is too expensive:
– The population may be large and scattered
– A complete list of the study population may be unavailable
– Travel costs can become expensive if interviewers have to survey people from one end of the country to the other
• Cluster sampling is the most widely used way to reduce this cost
• Ideally, clusters should be similar to one another, each containing a mix of units like the population as a whole; strata in stratified sampling, by contrast, differ from one another
Cluster sampling…
• A cluster sample is a simple random sample of groups or
clusters of elements
• Useful method when it is difficult or costly to develop a
complete list of the population members or when the population
elements are widely dispersed geographically
• Cluster sampling may increase sampling error due to
similarities among cluster members
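The sketch below illustrates one-stage cluster sampling in Python: clusters (here, hypothetical schools) are selected by SRS, and every member of a selected cluster is included. The function name `cluster_sample` and the sizes are illustrative only.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: pick m clusters by SRS and include every unit in them.

    `clusters` maps a cluster label (e.g., a school) to the list of its units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # SRS of cluster labels
    return {c: list(clusters[c]) for c in chosen}

# Hypothetical frame: 20 schools with 30 students each.
schools = {f"school_{i:02d}": [f"s{i:02d}_{j:02d}" for j in range(30)] for i in range(20)}
selected = cluster_sample(schools, m=4, seed=3)
print(sorted(selected), sum(len(v) for v in selected.values()))
```

Only the list of schools is needed to make the selection, which is the practical advantage when no complete list of students exists.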
Stratification vs. clustering
Stratification
• Divide the population into groups that differ from each other: sex, age, race, residence
• Sample randomly from each group (stratum)
• Less error compared to simple random sampling (for the same sample size)
• More expensive to obtain stratification information before sampling
Clustering
• Divide the population into comparable groups: schools, cities
• Sample randomly some of the groups (clusters)
• More error compared to simple random sampling (for the same sample size)
• Reduces costs to sample only some areas or organizations
Multi-stage sampling
• It is the combination of different sampling methods
• Carried out in stages
• Used in very large and diverse populations
• The method used in most community-based studies
• This type of sampling requires at least two stages
• The primary sampling unit (PSU) is the sampling unit in the first sampling stage.
• The secondary sampling unit (SSU) is the sampling unit in the second sampling stage, etc.
Multi-stage sampling…
Example: Woreda (PSU) → Kebele (SSU) → Sub-kebele (tertiary sampling unit, TSU) → Household (HH)
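A two-stage version of multi-stage sampling can be sketched in Python as follows, assuming SRS at both stages and hypothetical kebele and household identifiers. Adding a third stage (e.g., sub-kebele) would repeat the same step inside each selected unit. Real surveys often select PSUs with probability proportional to size instead, which this sketch does not do.

```python
import random

def two_stage_sample(frame, n_psu, n_per_psu, seed=None):
    """Two-stage sampling: SRS of primary units, then SRS of secondary units within each.

    `frame` maps each PSU (e.g., a kebele) to the list of its SSUs (e.g., households).
    In practice, a full list of SSUs is only needed for the selected PSUs.
    """
    rng = random.Random(seed)
    psus = rng.sample(sorted(frame), n_psu)                    # stage 1: select PSUs
    return {p: rng.sample(frame[p], n_per_psu) for p in psus}  # stage 2: select SSUs

# Hypothetical: 15 kebeles with 200 households each; 5 kebeles, 20 households per kebele.
frame = {f"kebele_{k}": [f"hh_{k}_{h}" for h in range(200)] for k in range(15)}
sample = two_stage_sample(frame, n_psu=5, n_per_psu=20, seed=11)
```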
Non-probability sampling
• Non-probability sampling does not involve random selection
• Independent of the rationale of probability theory
• In non-probability sampling, every item has an unknown
chance of being selected
• In non-probability sampling, there is an assumption that there is
an even distribution of a characteristic of interest within the
population
• This is what makes the researcher believe that any sample would
be representative and because of that, results will be accurate.
Non-probability sampling…
• Reliability cannot be measured in non-probability sampling; the
only way to address data quality is to compare some of the
survey results with available information about the population
• They are quick, inexpensive and convenient
• Used when it is not feasible to conduct probability sampling
Types of non-probability sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling
Convenience sampling
• Convenience sampling is sometimes referred to as haphazard or
accidental sampling.
• It is not normally representative of the target population because
sample units are only selected if they can be accessed easily and
conveniently
• The method is easy to use, but that advantage is greatly offset by
the presence of bias
• It can deliver accurate results when the population is
homogeneous
• E.g., including all patients visiting the OPD on one day to study their attitude towards family planning
• Drawback: unrepresentative samples
Volunteer sampling
• As the term implies, this type of sampling occurs when people
volunteer to be involved in the study.
• In pharmaceutical trials (drug testing), for example, it would be
difficult and unethical to enlist random participants from the
general public.
• In these instances, the sample is taken from a group of
volunteers
• Sampling voluntary participants as opposed to the general
population may introduce strong biases.
• Often in opinion polling, only the people who care strongly
enough about the subject tend to respond
• The silent majority does not typically respond, resulting in large
selection bias
Judgment Sampling
• It is used when a sample is taken based on certain judgments
about the overall population
• The underlying assumption is that the investigator will select
units that are characteristic of the population
• The critical issue here is objectivity: how much can judgment be
relied upon to arrive at a typical sample?
• Judgment sampling is subject to the researcher's biases and is
perhaps even more biased than haphazard sampling.
Judgment Sampling
• Researchers often use this method in exploratory studies like
pre-testing of questionnaires and focus groups.
• They also prefer to use this method in laboratory settings where
the choice of experimental subjects (i.e., animal, human) reflects
the investigator's pre-existing beliefs about the population.
• One advantage of judgment sampling is the reduced cost and
time involved in acquiring the sample
Quota Sampling
• This is one of the most common forms of non-probability
sampling
• Sampling is done until a specific number of units (quotas) for
various sub-populations have been selected
• In many cases where the population has no suitable frame, quota
sampling may be the only appropriate sampling method
• E.g., a set number of patients from each religion to assess their attitude towards family planning
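Quota sampling can be sketched as filling fixed quotas in the order people are encountered. The function name `quota_sample`, the groups, the quotas and the arrival order below are all hypothetical; the point is that whoever turns up first is taken, so selection is not random.

```python
def quota_sample(arrivals, quotas):
    """Quota sampling: accept people in the order they are met until each group's quota is filled.

    `arrivals` is an iterable of (person_id, group) pairs; `quotas` maps group -> required count.
    Selection is not random, so inclusion probabilities are unknown.
    """
    filled = {g: [] for g in quotas}
    for person, group in arrivals:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota is met; stop recruiting
    return filled

# Hypothetical OPD attendees in arrival order, labelled by group A, B or C.
arrivals = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "C"), (6, "B"), (7, "C"), (8, "A")]
print(quota_sample(arrivals, {"A": 2, "B": 2, "C": 1}))  # {'A': [1, 2], 'B': [3, 6], 'C': [5]}
```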
Snowball Sampling
• Used in studies involving respondents who are hard to find.
• To start with, the researcher compiles a short list of sample units from various sources
• Each of these respondents is contacted to provide names of other probable respondents.
Sample size determination
• Sample size is the number of study subjects selected to represent
a given study population
• Should be sufficient to represent the characteristics of interest
of the study population
• In estimating a certain characteristic of a population, sample size
calculations are important to ensure that estimates are obtained
with required precision or confidence
Common questions
• “How many subjects should I include in my study?”
• Which variables should be included in the sample size calculation?
– It should be based on the study’s primary outcome variable
– If the study has secondary outcome variables that are considered important, the sample size should also be sufficient for the analysis of these variables
Sample size determination
Depends on
• Objective of the study
• Design of the study
• Plan for statistical analysis
• Accuracy of the measurement to be made
• Degree of precision required for generalization
• Degree of confidence
Sample size determination
• Sample size determination techniques
– Compute manually using formulae
– Use computer software such as StatCalc in Epi Info, OpenEpi and Stata
– Formulae vary depending on the type of design
Sample size – prevalence studies
• For descriptive cross-sectional designs
– Single population proportion estimation formula
– Decide and enter the value of required parameters
• $n = \frac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2}$
Sample size – prevalence studies
• n is the minimum sample size
• p is the estimated prevalence for the population, taken from:
– previous studies, or
– a pilot or preliminary sample
– If no estimate is available, set p = 0.5, which gives the largest (most conservative) sample size.
• d is the margin of sampling error tolerated; commonly taken to be 5% but should be decreased for rare conditions
• $Z_{\alpha/2}$ is the standard normal value for a (1 − α) × 100% confidence level; α is mostly 5%, i.e., a 95% confidence level ($Z_{\alpha/2}$ = 1.96) is used
• N is the population size
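The single population proportion formula, n = (Z_{α/2})² p(1 − p) / d², translates directly into Python. This is a minimal sketch; the function name `single_proportion_n` is hypothetical, and Z is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def single_proportion_n(p, d, confidence=0.95):
    """n = Z_{alpha/2}^2 * p * (1 - p) / d^2, rounded up to the next whole subject."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    return math.ceil(z**2 * p * (1 - p) / d**2)

# No prior estimate of prevalence: use p = 0.5 with a 5% margin of error.
print(single_proportion_n(p=0.5, d=0.05))   # 385
```

With p = 0.5 and d = 0.05 at 95% confidence this gives the familiar conservative figure of 385; a smaller p (for example 0.2) gives a smaller n.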
Exercise – Sample size in prevalence studies
• What sample size do we need to estimate the prevalence of HIV
among residents of Addis Ababa city with 95% confidence so
that the error of estimation is within 5% of its actual value?
– Use OpenEpi software to compute the sample size.
Sample size – comparing two proportions
• For comparative cross-sectional and cohort designs (IP)
– Two population proportions formula (for a risk ratio, RR, or prevalence ratio, PR)
– Decide and enter the value of required parameters
• Confidence level – usually taken to be 95%
• Power – Usually a power of 80% is used
• Ratio of non-exposed to exposed in sample
– 1:1 is statistically efficient
– If exposure is rare increase ratio
• Percent of unexposed with outcome
• Percent of exposed with outcome or
• Risk or prevalence ratio
Exercise – Sample size for comparing two proportions
• A study is designed to compare the proportion of nurses leaving
health services in urban and rural areas. From available literature
30% and 15% of nurses are estimated to leave services in rural
and urban areas within three years of graduation respectively.
What sample size is required for the study?
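The slides do not give the formula for comparing two proportions, so the sketch below assumes the common normal-approximation formula without continuity correction for equal group sizes (1:1). It uses the exercise's 30% and 15% with the usual 95% confidence and 80% power. The function name is hypothetical, and software such as OpenEpi may report a somewhat larger number if it applies a continuity correction.

```python
import math
from statistics import NormalDist

def two_proportions_n(p1, p2, confidence=0.95, power=0.80):
    """Sample size per group (1:1) for comparing two proportions, normal approximation
    without continuity correction:

    n = [Z_{a/2} * sqrt(2 * pbar * qbar) + Z_b * sqrt(p1*q1 + p2*q2)]^2 / (p1 - p2)^2
    """
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * pbar * (1 - pbar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Exercise values: 30% (rural) vs 15% (urban) leaving within three years.
per_group = two_proportions_n(0.30, 0.15)
print(per_group, "per group,", 2 * per_group, "in total")
```

Under these assumptions the sketch returns 121 nurses per group (242 in total), before any adjustment for non-response.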
Sample size – case-control studies
• For case-control studies
– Formula for case-control (unmatched)
– Decide and enter the value of required parameters
• Confidence level – usually taken to be 95%
• Power – Usually a power of 80% is used
• Ratio of controls to cases in sample
– 1:1 is statistically efficient
– If disease is rare increase ratio
• percent of controls exposed
• percent of cases exposed or
• OR
Exercise
• Suppose you want to compare exposure status between cases
and controls at 95% confidence level and with power of 80%
using a 1:1 ratio of cases to controls while looking for an odds
ratio of 2. You assume the prevalence of exposure among controls to be 25%. What sample size do you need?
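For the case-control exercise, one approach (an assumption here, since the slides give no formula) is to convert the odds ratio and the control exposure into the expected exposure among cases, p1 = OR × p0 / (1 + p0(OR − 1)), and then apply the two-proportion normal approximation without continuity correction. With OR = 2 and p0 = 0.25, p1 = 0.40. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def case_control_n(p0, odds_ratio, confidence=0.95, power=0.80):
    """Cases needed (1:1 unmatched design) to detect a given odds ratio.

    p0 = proportion of controls exposed; the proportion of cases exposed is
    p1 = OR * p0 / (1 + p0 * (OR - 1)). Normal approximation, no continuity correction.
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p0 + p1) / 2
    numerator = (z_a * math.sqrt(2 * pbar * (1 - pbar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

cases = case_control_n(p0=0.25, odds_ratio=2)
print(cases, "cases and", cases, "controls")
```

Under these assumptions the sketch gives 152 cases and 152 controls.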
Other considerations in sample size determination
• Sampling technique
– In complex samples (cluster, multistage), increase the sample size to account for the design effect
– Design effect: the ratio of the variance of an estimate derived from a complex sampling design to the variance of the estimate from a simple random sample of the same size
– Usually the sample size is multiplied by a design effect of 2 (or 1.5) in cluster sampling
• The design effect increases with large PSUs, many stages and strongly clustered variables
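A minimal sketch of the design-effect adjustment. The helper `kish_deff` uses Kish's approximation DEFF ≈ 1 + (m − 1) × ICC for equal cluster sizes m and intracluster correlation ICC; it is added for illustration and is not a formula from these slides, and the ICC value used below is hypothetical. Both function names are hypothetical.

```python
import math

def apply_design_effect(n_srs, deff):
    """Inflate a simple-random-sampling sample size by the design effect."""
    return math.ceil(round(n_srs * deff, 9))  # round() guards against float noise before ceil

def kish_deff(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

print(apply_design_effect(385, 2))     # 770
print(apply_design_effect(385, 1.5))   # 578 (577.5 rounded up)
print(apply_design_effect(385, kish_deff(cluster_size=10, icc=0.05)))
```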
Other considerations in sample size determination
• Non-response
– Add a contingency of 10%
• More for sensitive topics or self-administered questionnaires (up to 30%)
– The response rate for a cross-sectional survey should be >85%
• More than one item to be measured
– Use the most important one, or the one that gives the larger sample size
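Adding the non-response contingency is a single multiplication; the sketch below follows the "add 10%" rule as n × (1 + rate). Some texts instead divide by the expected response rate, n / (1 − rate), which gives a slightly larger number. The function name is hypothetical.

```python
import math

def add_nonresponse(n, rate=0.10):
    """Add a contingency for expected non-response: n * (1 + rate), rounded up."""
    return math.ceil(round(n * (1 + rate), 9))  # round() guards against float noise

print(add_nonresponse(385))             # 424
print(add_nonresponse(385, rate=0.30))  # 501 (500.5 rounded up)
```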
Other considerations in sample size determination
• The finite population correction formula can be used as needed
• If N (the entire population) is less than 10,000, the required sample size will be smaller
• In such cases, calculate the final sample size $n_{final}$ using the formula:
$n_{final} = \frac{n}{1 + n/N}$
where $n_{final}$ = the final sample size,
n = initial sample size and
N = total population size
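A sketch of the finite population correction in Python, assuming the commonly used form n_final = n / (1 + n/N). The function name and the population sizes in the usage lines are hypothetical.

```python
import math

def finite_population_correction(n, N):
    """n_final = n / (1 + n / N), rounded up."""
    return math.ceil(n / (1 + n / N))

print(finite_population_correction(385, 2000))       # 323
print(finite_population_correction(385, 1_000_000))  # 385
```

The correction matters only when the initial n is a sizeable fraction of N; for very large populations it leaves n essentially unchanged.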
Sample size for other designs
• Qualitative methods – estimate, not determined
• Reading
– Matched case-control study
– Survival analysis
– Repeated measurement cohort studies