06 Memobust course Sample Selection

Sample Selection
Eurostat
Presented by
• Desislava Nedyalkova
• Swiss Federal Statistical Office
The Sample Selection topic
• The topic covers two main subjects which
correspond to two complementary phases in the
process of designing and conducting business
surveys:
– Sample design and selection
– Sample coordination
Overview of the topic (I)
The sample selection part consists of:
• Main theme module which covers the most used
sampling designs in business surveys
• Two method modules:
– Balanced sampling for multi-way stratification
– Subsampling for preliminary estimation
Overview of the topic (II)
The sample coordination part consists of:
• Main theme module on sample coordination
• Three method modules:
– Sample co-ordination using simple random sampling
(SRS) with permanent random numbers (PRNs)
– Sample coordination using Poisson sampling with
permanent random numbers (PRNs)
– Assigning random numbers when co-ordination of
surveys based on different unit types is considered
Sample selection
• Designing a sample in business statistics is a
challenging task (Sigman and Monsour, 1995):
– The population is often skewed.
– Dynamic membership: creation of new businesses,
changes in the structure of businesses, closed-down
businesses, and changes in type or level of activity.
– Inter-business relationship.
Stratified sampling I
• Advantages:
– The population can be divided into distinct, independent
subpopulations called strata.
– Leads to more efficient statistical estimates.
– Different sampling techniques, e.g. simple random
sampling, can be used for different subpopulations.
• Disadvantages:
– Requires the selection of relevant stratification variables.
– It is not useful when there are no homogeneous
subgroups.
Stratified sampling II
• Questions:
– How should strata be constructed?
– How should sample size be allocated to strata?
• Optimal conditions for stratification:
– Elements within a stratum are more similar to each
other than to elements in other strata (homogeneous
strata).
– Large variability between strata, good size variable.
– The stratification variables are strongly correlated with
the variables of interest.
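The allocation question can be illustrated with a minimal sketch. The two rules shown, proportional allocation and Neyman allocation, are standard textbook choices and are not necessarily the ones recommended in the module; the stratum sizes and standard deviations are made-up values.

```python
def proportional_allocation(n, sizes):
    """Share n across strata in proportion to the stratum sizes N_h."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def neyman_allocation(n, sizes, std_devs):
    """Share n across strata in proportion to N_h * S_h (Neyman allocation)."""
    weights = [size * sd for size, sd in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: small, medium and large businesses.
sizes = [1000, 200, 50]        # N_h, number of units in each stratum
std_devs = [1.0, 10.0, 40.0]   # S_h, assumed standard deviation of turnover

# Results are not rounded and not capped at N_h; a production
# allocation would need both steps.
proportional = proportional_allocation(100, sizes)
neyman = neyman_allocation(100, sizes, std_devs)
print(proportional)
print(neyman)
```

With these assumed figures, proportional allocation puts most of the sample in the stratum of small businesses, while Neyman allocation moves it towards the smaller but more variable strata, which is the usual situation in skewed business populations.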
Probability proportional to size (pps)
sampling
• Alternative to stratification
• Main characteristics:
– The probability of inclusion of a unit in the sample is
proportional to some numeric size variable (e.g.
turnover, number of employees).
– PPS designs of fixed (sequential Poisson sampling) or
random (Poisson sampling) sample size.
– Easy implementation (e.g., Hartley and Rao, 1962).
– Preferred usage: small samples.
– In business statistics: price index surveys.
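A minimal sketch of Poisson sampling with inclusion probabilities proportional to size, under stated assumptions: the function names and the size values are hypothetical, and units whose proportional probability would exceed 1 are simply treated as take-all units, which is one common convention rather than the only one.

```python
import random


def pps_inclusion_probabilities(sizes, n):
    """Inclusion probabilities proportional to size with expected sample size n.

    Units whose proportional probability would reach 1 are set to 1
    (take-all) and the rest are recomputed on the remaining units.
    """
    pi = [0.0] * len(sizes)
    take_all = set()
    while True:
        rest = [k for k in range(len(sizes)) if k not in take_all]
        remaining_n = n - len(take_all)
        total = sum(sizes[k] for k in rest)
        new_take_all = {k for k in rest if remaining_n * sizes[k] / total >= 1.0}
        if not new_take_all:
            for k in rest:
                pi[k] = remaining_n * sizes[k] / total
            for k in take_all:
                pi[k] = 1.0
            return pi
        take_all |= new_take_all


def poisson_sample(pi, rng):
    """Select each unit independently with its own probability pi[k]."""
    return [k for k, p in enumerate(pi) if rng.random() < p]


# Hypothetical size measure (e.g. number of employees) for ten businesses.
sizes = [500, 120, 60, 40, 30, 20, 15, 10, 3, 2]
pi = pps_inclusion_probabilities(sizes, n=4)
sample = poisson_sample(pi, random.Random(1))
print(pi)
print(sample)  # the realised sample size is random; only its expectation is 4
```

The two largest units get probability 1 here, so they are in every sample; the size of the rest of the sample varies from draw to draw, which is the random sample size mentioned above.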
Other sampling schemes
• Cut-off sampling (Knaub, 2008)
– Non-probability sampling design where some elements
of the population have no chance of selection.
– Use: in very skewed populations (very many small
businesses and a few large ones).
• Systematic sampling (Cochran, 1977)
• Balanced sampling (Deville & Tillé, 2004)
– The Horvitz-Thompson estimate of the total of the
auxiliary variable is equal to the population total of the
auxiliary variable (design-based approach).
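The balancing property can be written as an equation. The notation is an assumption made for this note: U is the population, s the selected sample, x_k the value of the auxiliary variable for unit k, and \pi_k its inclusion probability.

```latex
\hat{X}_{HT} \;=\; \sum_{k \in s} \frac{x_k}{\pi_k} \;=\; \sum_{k \in U} x_k \;=\; X
```

A sample s is balanced on x when this equality holds for that sample, not just on average over all possible samples.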
One-way stratification
• Stratified sampling (one-way stratified design):
– Can be used when the objective of the survey is to
produce estimates for subpopulations.
– Planned sample size for each domain.
– May have some drawbacks, especially in structural
business surveys (large-scale surveys): the overall
sample size could be too large for the survey's economic
constraints; the sample allocation may be far from the
theoretically desired one; strata with only a few units
can lead to a higher response burden.
• An alternative: multi-way stratification (see e.g.,
Falorsi and Righi, 2008).
Multi-way stratification
• Multi-way stratified designs
– Controlled selection methods including methods based
on controlled rounding problem via linear programming
– Methods based on sample coordination
• Theoretical and operational problems can arise for
large-scale surveys with some of these methods.
• Balanced sampling by the cube method can
overcome these drawbacks.
Subsampling for preliminary
estimates (I)
• In short-term statistics, preliminary estimates are
demanded from the NSIs (EU Regulation).
• A common approach for dealing with them:
– Efficient estimators based on auxiliary information.
– No explicit definition for a sampling design for
preliminary estimates.
– Usually drawn by a non-probabilistic sample design.
• An alternative overall strategy involving sample
design and estimator definition can be found in
the module on preliminary estimates.
Subsampling for preliminary
estimates (II)
• Given a sample survey, a preliminary estimate is
defined on the basis of a sample of quick
respondents. Main strategy:
– A planned subsample for preliminary estimates, the
preliminary theoretical sample (PTS), is drawn.
– Aim: a Planned Preliminary Observed Sample (PPOS) as
close as possible to the PTS.
– Intensive follow-up of the PTS.
• Design-based or model-based approaches for
defining the PTS.
Sample co-ordination (I)
• Sample overlap between surveys: number of
common units at two different sampling
occasions.
• Independent selection: sample overlaps are not
controlled.
• Negative coordination: aims at spreading the
response burden, sample overlap is minimized.
• Positive coordination: for repeated surveys,
sample overlap is maximized.
Sample co-ordination (II)
• Three main dimensions:
– Sample coordination between surveys.
– Sample coordination over time for the same survey.
– Sample coordination of surveys based on different unit
types.
• Two main types of methods:
– Methods based on PRNs (used by most NSIs).
– Methods based on linear programming (non-PRN
methods) – optimal solution, computationally intensive.
Co-ordination between surveys
• Positive coordination:
– Can facilitate the comparisons between variables of
interest on the micro level.
– Can facilitate the production of comparable and coherent
statistics required by the National Accounts for compiling
the GDP using results from different economic surveys.
• Negative coordination:
– Depends on the size of the sampling fractions in the
different surveys.
– Very effective mainly for small businesses.
Co-ordination over time
• Panel: a sample measured repeatedly in time (a
period could be a week, a month, a quarter or a
year).
• Positive coordination over time:
– Used to obtain high precision in estimates of change.
– The size of the overlap is random.
– It depends mainly on the sampling design and changes
in the business population.
• Sample rotation: a tool for spreading the
response burden.
Co-ordination of surveys based on
different unit types (I)
• This kind of coordination is used in Australia,
France and Sweden (PRN-methods).
• The business register (BR) generally consists of
different unit types.
• Each business survey uses a unit type in
accordance with the statistics to be produced.
• PRNs should be assigned to each unit type.
Co-ordination of surveys based on
different unit types (II)
Methods for assigning the PRNs:
– PRNs are assigned to each unit type separately.
Advantage: a simple method; samples are independent of
each other. Disadvantage: does not admit co-ordination
between surveys using different unit types.
– PRNs are assigned so that co-ordination of unit types
through their PRNs is possible. Works well for
single-location and single-activity businesses, where each
unit in a business receives the same PRN; less efficient
for multiple-location and/or multiple-activity businesses.
A top-down or bottom-up approach can be used to assign
the PRNs (see Lindblom, 2003).
Method: Sample co-ordination using
SRS with PRNs (I)
• The Swedish system for co-ordination of business
samples (SAMU) is based on sequential simple
random sampling without replacement (SRSWOR).
• Sequential SRS (SRSWOR):
– Consider a population U of size N (may be a stratum).
– Each unit is assigned a PRN uniformly distributed over
the interval [0,1].
– Units are sorted in ascending order of their PRNs.
– The first n units in the sorted list are selected in the
sample.
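The four steps above can be sketched in a few lines. This is an illustration only, not the SAMU implementation; the unit identifiers, the seed and the function names are hypothetical.

```python
import random


def assign_prns(unit_ids, rng):
    """Give every unit a permanent random number, uniform on [0, 1)."""
    return {unit: rng.random() for unit in unit_ids}


def sequential_srswor(prns, n):
    """Sort units by ascending PRN and keep the first n."""
    ordered = sorted(prns, key=prns.get)
    return ordered[:n]


rng = random.Random(2024)                    # fixed seed so the sketch is repeatable
frame = [f"B{i:03d}" for i in range(20)]     # hypothetical frame (or stratum) of N = 20
prns = assign_prns(frame, rng)
sample = sequential_srswor(prns, 5)
print(sample)
```

Because the PRNs are stored with the units, drawing again from the same frame returns the same units, which is what makes co-ordination across occasions and surveys possible.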
Method: Sample co-ordination using
SRS with PRNs (II)
• Due to the symmetry of the uniform distribution:
– the selection of the last n units in the sorted list also
gives a sequential SRSWOR,
– the selection of the first n units to the left or to the right
of a given point a in [0,1] also yields an SRSWOR (wrapping
around if there are not enough units).
• Dynamic population
– New businesses in the frame (births) receive a new PRN.
– Closed-down businesses (deaths) are withdrawn from
the frame.
Method: Sample co-ordination using
SRS with PRNs (III)
• Positive co-ordination
– Over time: on each occasion a new sequential SRSWOR is
drawn from the updated frame (same starting point).
– Of two surveys: same starting point and direction are
used for both surveys.
• Negative co-ordination
– For two surveys: we must choose properly the starting
points and directions, e.g. different starting points and
the same direction.
Method: Sample co-ordination using
SRS with PRNs (IV)
• SAMU allows for positive or negative coordination
when different stratifications are used.
• SAMU has implemented a system of rotation of
samples:
– Each unit in the frame is randomly designated to one of
five rotation groups.
– Random numbers are shifted only in one rotation group
each year (RRC method of Ohlsson, 1992).
Method: Sample co-ordination using
SRS with PRNs (V)
• A somewhat different method is used in France
(Cotton & Hesse, 1992):
– Each unit in the frame receives a uniform random
number in [0,1].
– Units are ordered in ascending order of their RNs.
– A sequential SRSWOR of size n is drawn in the ordered list.
– Negative co-ordination is obtained by permuting the
random numbers so that selected units receive the
largest RNs and non-selected units receive the smallest.
The rank order of the RNs should be respected.
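The permutation step can be sketched as follows. This is one reading of the description above, assuming that "respecting the rank" means keeping the relative order of units within the selected group and within the non-selected group; the function name and the random numbers are hypothetical, and stratification is not handled.

```python
def permute_after_selection(rns, selected):
    """Redistribute the same set of random numbers: non-selected units get
    the smallest values and selected units the largest, keeping the
    previous relative order inside each group."""
    values = sorted(rns.values())
    ordered_units = sorted(rns, key=rns.get)
    non_selected = [u for u in ordered_units if u not in selected]
    chosen = [u for u in ordered_units if u in selected]
    return dict(zip(non_selected + chosen, values))


# Hypothetical frame of five units with their random numbers.
rns = {"a": 0.12, "b": 0.35, "c": 0.48, "d": 0.71, "e": 0.93}

first_sample = sorted(rns, key=rns.get)[:2]            # sequential SRSWOR, n = 2
new_rns = permute_after_selection(rns, set(first_sample))
second_sample = sorted(new_rns, key=new_rns.get)[:2]   # next draw, n = 2

print(first_sample)   # ['a', 'b']
print(new_rns)        # {'c': 0.12, 'd': 0.35, 'e': 0.48, 'a': 0.71, 'b': 0.93}
print(second_sample)  # ['c', 'd']
```

After the permutation, the units just surveyed sit at the end of the ordered list, so the next sequential draw reaches them only once the other units have been used.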
Method: Sample co-ordination using
SRS with PRNs (VI)
• The Cotton & Hesse method:
– Can be used only for negative co-ordination.
– Is based on permutation of the RNs.
– Allows the use of different stratifications when coordinating stratified samples.
– A minimum of the expected overlap between two
successive stratified samples is guaranteed.
– Can be used to co-ordinate sampling units of different
types, e.g. enterprises and establishments.
Method: Sample co-ordination using
Poisson sampling with PRNs (I)
• Implemented at SFSO (Qualité, 2009).
• Extension of the method of Brewer et al. (1972).
• Algorithm:
– For each survey, one defines for each unit a zone of
selection (can be a union of disjoint intervals).
– The total length of the zone of selection corresponds to
the inclusion probability for that unit.
– A unit is selected if its PRN falls within its zone of
selection.
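The selection rule in the last step can be sketched as a membership test. The zones below are hand-picked for illustration: they show what positive and negative co-ordination look like for one unit, but the procedure that constructs the zones is not reproduced here, and the survey numbers, probabilities and PRN are assumptions.

```python
def zone_length(zone):
    """Total length of a zone given as a list of disjoint (low, high) intervals."""
    return sum(high - low for low, high in zone)


def in_zone(prn, zone):
    """True if the PRN falls in one of the half-open intervals [low, high)."""
    return any(low <= prn < high for low, high in zone)


# Hypothetical zones for ONE unit; each zone's total length is the unit's
# inclusion probability in that survey.
zones = {
    1: [(0.00, 0.30)],                # survey 1, pi = 0.30
    2: [(0.30, 0.50)],                # survey 2, pi = 0.20, disjoint from zone 1 (negative)
    3: [(0.00, 0.30), (0.50, 0.60)],  # survey 3, pi = 0.40, contains zone 1 (positive),
                                      # avoids zone 2 (negative)
}

prn = 0.42  # the unit's permanent random number
selected = {survey: in_zone(prn, zone) for survey, zone in zones.items()}
print(selected)  # {1: False, 2: True, 3: False}
```

Each unit is tested independently of all others, which is why births and deaths are easy to handle and why the sample size is random.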
Method: Sample co-ordination using
Poisson sampling with PRNs (II)
• Advantages:
– Theoretically simple and easy to implement.
– Dynamic populations are easily handled.
• Disadvantages:
– The random sample size.
– Previously, at SFSO stratified sampling was used.
– Optimal allocation procedures do not need to be modified,
except for small sampling strata because of the risk of
selecting an empty sample.
Example of co-ordination (I)
• We consider the selection of a unit in 6 samples
(PRN equal to 0.42). We have:
– The inclusion probability (pi).
– The desired types of coordination: negative (N) or
positive (P).
– Two panels: samples 1, 3 and 6 are three waves of
panel 1 and samples 2 and 5 are two waves of panel 2.
– Sample 4 is for a survey conducted only once.
– Positive coordination in a panel has a higher priority
than negative coordination with the other samples.
Example of co-ordination (II)
Inclusion probabilities and types of coordination
(N = negative, P = positive, - = not applicable)
Sample  pi    Panel  Wave  Coordination with sample
                           1  2  3  4  5
1       0.30  1      1     -  -  -  -  -
2       0.20  2      1     N  -  -  -  -
3       0.40  1      2     P  N  -  -  -
4       0.20  -      -     N  N  N  -  -
5       0.30  2      2     N  P  N  N  -
6       0.45  1      3     P  N  P  N  N
[Figure: Selection zones. Horizontal axis: zone of selection, from 0.0 to 1.0; vertical axis: sample.]
Discussion
• Sample design and selection:
– The sample design determines a survey’s characteristics
such as cost, variance and respondent burden.
• Sample co-ordination:
– An important tool for spreading the response burden.
– Higher precision in estimates over time.
– A co-ordination system provides a common sampling
frame for all surveys.
• Sample rotation:
– Reducing response burden in periodic surveys.
References (I)
• Brewer, K., Early, L., and Joyce, S. (1972). Selecting several samples
from a single population. Australian Journal of Statistics, 3:231-239.
• Cochran, W. G. (1977). Sampling Techniques. Wiley, New York.
• Cotton, F. and Hesse, C. (1992b). Tirages coordonnés d'échantillons.
Technical report, INSEE, Paris.
• Deville, J.-C. and Tillé, Y. (2004). Efficient balanced sampling: the cube
method. Biometrika, 91:893-912.
• Falorsi, P. D. and Righi, P. (2008). A Balanced Sampling Approach for
Multi-way Stratification Designs for Small Area Estimation. Survey
Methodology, 34:223-234.
• Hartley, H. and Rao, J. (1962). Sampling with unequal probabilities and
without replacement. Annals of Mathematical Statistics, 33:350-374.
• Hesse, C. (1999). Sampling co-ordination: A review by country. Technical
Report E9908, Direction des Statistiques d'Entreprises, INSEE, Paris.
References (II)
• Knaub, J. R., Jr. (2008). Cutoff Sampling. In Encyclopedia of Survey
Research Methods (ed. P. J. Lavrakas), Sage, London.
• Lindblom, A. (2003). SAMU - The system for coordination of frame
populations and samples from the Business Register at Statistics Sweden.
Background Facts on Economic Statistics 2003:3, Statistics Sweden.
• Ohlsson, E. (1992). The system for co-ordination of samples from the
business register at Statistics Sweden. R&D report 1992:18, Statistics
Sweden.
• Qualité, L. (2009). Unequal probability sampling and repeated surveys.
Ph.D. thesis, University of Neuchâtel, Switzerland
(http://doc.rero.ch/record/12284).
• Sigman, R. S. and Monsour, N. J. (1995). Selecting Samples from List
Frames of Businesses. In Cox, B. G. et al., editors, Business Survey
Methods, chapter 8, pages 133-152. Wiley, New York, USA.