Sampling

Methodology Glossary Tier 1
Obtaining the best estimate
Sampling is the process by which a feature of interest (or parameter) relating to
a group of interest (or population) is estimated by measuring its value in a
smaller but representative subgroup (or sample). The aim of sampling is to
obtain estimates (or statistics) that are as close as possible to the real value in the
population.
Here are some hypothetical examples:
SAMPLE | STATISTIC | POPULATION | PARAMETER
1,000 people selected randomly from the electoral roll | Average peppermint consumption = 3.44 per week | The electoral roll in Scotland | True average = 3.39 peppermints per week
6,000 participants in the Labour Force Survey in Scotland | Unemployment rate = 3.6% in Spring 2011 | All working age people in Scotland (April 2001 Census) | Unemployment rate = 3.57% on Census day (April 2011)
350 new admissions to Glasgow hospitals in January 2010 | Average age = 37 years | All inpatients Glasgow Hospitals | Average age = 39 years
Sampling is necessary when it is impossible or impractical to look at every
individual member of the population, which in reality is most of the time.
However, estimates obtained from samples will almost never exactly match the true
population parameters because information is missing for the non-sampled
population members. The mismatch between the sample statistic and the
population parameter is called the sampling error. The aim of a successful
sampling exercise is to minimise this sampling error so that we can be confident
the statistic is a good enough estimate to be useful. If the sampling error is too
great then a statistic may be unreliable or misleading.
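As a rough illustration of the terms above, the following Python sketch builds a made-up population, draws one random sample and reports the sampling error. The population values, the sizes and the seed are all assumptions chosen for the example. In a real survey the parameter is unknown, so the error cannot be computed directly like this.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly peppermint consumption for 10,000 people.
population = [random.randint(0, 7) for _ in range(10_000)]

# The parameter: the true average, normally unknown in practice.
parameter = statistics.mean(population)

# The statistic: the average in a random sample of 1,000 people.
sample = random.sample(population, 1_000)
statistic = statistics.mean(sample)

# The sampling error is the mismatch between the statistic and the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```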
Sampling error depends essentially on three things:
(i) Sample size. Increasing sample size makes samples more
representative of the population and reduces sampling error.
(ii) Variability in the population. High variability in the population will lead to
high variation from sample to sample, i.e. high sampling error; sample
statistics may not be a very reliable guide to the feature of interest, e.g.
peppermint consumption, in the population.
(iii) Sampling method. The method by which the members of the sample
are chosen. A good sampling method will provide a sample that is
representative of the population of interest.
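The effect of the first two factors can be sketched with a small simulation. The code below is an illustration only: the two populations, their spreads and the helper name spread_of_sample_means are invented for the example. It repeatedly draws simple random samples and measures how much the sample mean varies from sample to sample.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=500, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)

rng = random.Random(42)
# Two hypothetical populations with the same mean but different variability.
low_variability = [rng.gauss(3.4, 0.5) for _ in range(5_000)]
high_variability = [rng.gauss(3.4, 2.0) for _ in range(5_000)]

small_n = spread_of_sample_means(low_variability, 25)
large_n = spread_of_sample_means(low_variability, 400)
high_var = spread_of_sample_means(high_variability, 25)

print(f"n=25,  low variability:  {small_n:.3f}")
print(f"n=400, low variability:  {large_n:.3f}")
print(f"n=25,  high variability: {high_var:.3f}")
```

The larger sample should give the smallest spread, and the more variable population the largest, in line with points (i) and (ii).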
Sampling methods
Simple random sampling - the ideal method, where each individual in the
population has the same chance of being selected. While simple random
sampling represents the best theoretical approach, it is not the best method in
every situation.
If a population is strongly grouped in some way and the size of the sample is very
limited (perhaps for reasons of cost, or accessibility), some groups may not be
adequately represented in the sample (e.g. old people, reindeer farms, isolated
communities) and stratified random sampling may be better.
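To give a feel for how easily a small group can be missed, the sketch below computes the chance that a simple random sample, drawn without replacement, contains nobody from a given group. The population and group sizes are hypothetical, and the function name prob_group_missed is made up for this example.

```python
from math import comb

def prob_group_missed(population_size, group_size, sample_size):
    """Probability that a simple random sample (without replacement)
    contains nobody from a group of the given size."""
    others = population_size - group_size
    return comb(others, sample_size) / comb(population_size, sample_size)

# Hypothetical figures: 10,000 people, 50 of whom live in isolated communities.
p_small = prob_group_missed(10_000, 50, 20)    # very limited sample, about 0.90
p_large = prob_group_missed(10_000, 50, 500)   # larger sample, about 0.08

print(f"sample of 20:  {p_small:.2f}")
print(f"sample of 500: {p_large:.2f}")
```

With these assumed numbers, a sample of 20 would miss the group about nine times in ten.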
Stratified random sampling - The representation of clearly defined groups in a
relatively small population can be improved by stratified random sampling,
whereby random sampling is undertaken separately within each group (or
stratum) of interest within the population.
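A minimal sketch of this idea follows, assuming the same sampling fraction is used in every stratum (other allocations are possible). The data, the stratum labels and the function name stratified_sample are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the same fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Take at least one unit per stratum so no group is left out entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (person_id, area_type) pairs.
people = [(i, "urban") for i in range(950)] + [(i, "island") for i in range(950, 1000)]
chosen = stratified_sample(people, stratum_of=lambda p: p[1], fraction=0.02)

print(len(chosen), "people selected")
```

Because sampling is done separately within each stratum, the small "island" group is guaranteed a place in the sample.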
The main advantage of the simple random and stratified random methods is that
they minimise sampling error and usually produce the most representative
samples of the population of interest. Furthermore, the calculation of sampling
error (used to judge the reliability of a statistic as an estimate of the
corresponding feature in the population) is relatively easy for simple random and
stratified random samples.
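For a simple random sample, one common textbook way to quantify sampling error is the standard error of the mean: the sample standard deviation divided by the square root of the sample size. The sketch below applies it to made-up data. It ignores any finite population correction and uses a normal approximation for the interval, which is crude for a sample this small.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the mean for a simple random sample."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample: weekly peppermint consumption for ten respondents.
sample = [2, 4, 3, 5, 1, 4, 3, 6, 2, 4]
mean = statistics.mean(sample)
se = standard_error_of_mean(sample)

# Approximate 95% confidence interval (normal approximation).
interval = (mean - 1.96 * se, mean + 1.96 * se)
print(f"mean={mean:.2f} se={se:.2f} interval=({interval[0]:.2f}, {interval[1]:.2f})")
```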
Cluster sampling - This method of sampling is useful when the population is
divided into clusters (such as towns, industries, postcode sectors) and when the
amount of sampling that can be undertaken is limited. First, a representative
sample of clusters is selected, and then the sample units (such as households)
are randomly chosen within each selected cluster. The advantages might also
include reduced travelling costs for the survey data collectors when the clusters
are widely scattered geographically.
Example. The selection of bank employees to take part in a staff training survey.
Under cluster sampling, branches of the bank would be chosen at random and
then individual staff members would be chosen at random from within each of
those branches. This method would remove the need to conduct interviews at, or
gather survey questionnaires from, all branches, with the financial and time
costs that this would entail.
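The two stages in the bank example could be sketched as follows. The branch names, staff identifiers, sample sizes and the function name two_stage_cluster_sample are all assumptions made for illustration.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Pick clusters at random, then pick units at random within each one."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters (branches) to visit.
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen_clusters:
        members = clusters[name]
        # Stage 2: choose units (staff) within the chosen cluster.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical bank: 20 branches with 30 staff each.
branches = {
    f"branch_{b:02d}": [f"staff_{b:02d}_{s:02d}" for s in range(30)]
    for b in range(20)
}
selected = two_stage_cluster_sample(branches, n_clusters=4, n_per_cluster=5)

for branch, staff in selected.items():
    print(branch, staff)
```

Only the four selected branches need to be contacted, which is where the cost saving comes from.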
Quota sampling - Quota sampling is most often used when lower cost or
faster results are required and it is less important to have a sample that
represents the whole population. In quota sampling the selection is not random but is
based on predetermined allocations from specified sub-groups of the population.
For example, a researcher may be told to interview 100 males and 100 females,
with 50% of each group aged below 21 years. There is no obligation to
select randomly within each of these groups, only to meet the quota, and so the
sample may not be as representative of the population as one drawn by other sampling
methods. The researcher may decide to interview each person in each age/sex group
as they are encountered, or perhaps each second person encountered, or to use
some other convenient criterion (such as how helpful they look!). The non-random
nature of this method makes it impossible to estimate the sampling error
and calculate confidence intervals, but this may be less important than other
priorities such as speed or targeting a specific group.
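One way a researcher might fill such quotas is simply to accept people in the order they are encountered, as in the sketch below. The list of passers-by, the quota sizes and the function names are hypothetical. The selection involves no randomness at all, which is why no sampling error can be estimated from it.

```python
from collections import Counter

def quota_sample(encounters, group_of, quotas):
    """Accept people in the order encountered until every quota is filled."""
    filled = Counter()
    accepted = []
    for person in encounters:
        group = group_of(person)
        if group in quotas and filled[group] < quotas[group]:
            accepted.append(person)
            filled[group] += 1
        # Stop as soon as all quotas have been met.
        if all(filled[g] >= q for g, q in quotas.items()):
            break
    return accepted

def group_of(person):
    """Classify a (sex, age) pair into a sex and age-band group."""
    sex, age = person
    return (sex, "under 21" if age < 21 else "21 and over")

# Hypothetical passers-by as (sex, age) pairs, in order of encounter.
passers_by = [("F", 19), ("M", 34), ("F", 45), ("M", 20),
              ("F", 18), ("M", 17), ("F", 60), ("M", 52)]

# Toy quota: one person per group (the text's example would use 50 per group).
quotas = {("F", "under 21"): 1, ("M", "under 21"): 1,
          ("F", "21 and over"): 1, ("M", "21 and over"): 1}

interviewed = quota_sample(passers_by, group_of, quotas)
print(interviewed)
```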
Example. A short pilot study to compare the effectiveness of several advertising
campaigns for stair-lifts would not be concerned about the impact of each
campaign on people of all ages but might focus on people aged over 70 years
seen using walking sticks. An interviewer may be given a quota of speaking
to the first 100 people in this category emerging from each of a city’s shopping
centres over a one-week period. A random selection process would be too
time-consuming and unnecessary given the very specific target group.
The goal is to obtain a representative sample of the population of interest within
the practical constraints (usually economic) on the amount of sampling that can
be undertaken. Care must be taken that the results of the analysis are not extended to
any part of a population that is not represented in the survey.
Further Information
Tier 2 Stratified Random Sampling | Cluster Sampling
Link Office for National Statistics methodology pages