Data sources and data compilation methods

Workshop for African countries on the Implementation of International
Recommendations for Distributive Trade Statistics
27-30 May 2008, Addis Ababa, Ethiopia
UNITED NATIONS STATISTICS DIVISION
Trade Statistics Branch
Distributive Trade Statistics Section
Outline of the presentation

Data sources for DTS – statistical
surveys, administrative data sources
and frames

Data compilation methods

Data collection strategy
Data sources for compilation of DTS
Generation of DTS - based on data
collected from numerous sources



Statistical data sources – data are collected
specifically for statistical purposes
Administrative data sources - provide data
created originally for purposes other than
the production of statistical data
Statistical data sources
Statistical surveys



Economic censuses – enumeration of all units in the
population; basis for the establishment of BR; sampling frame
for surveys
Sample surveys – collect responses from a few representative
units scientifically selected from the population
Advantages of statistical surveys vs. administrative
data sources



Planning, execution, data collection and processing
procedures are under the control of the statistical office itself
Respondents have less reason to deliberately misreport the
data as the NSO guarantees confidentiality
Disadvantages





Resource intensity (both financial and manpower)
Increased respondent burden
High non-response rates
Sampling errors
Census of trade units (1)
Types



Part of an economy-wide census, including all economic
activities
Independent census for distributive trade sector/activities only
Advantages



Tend to provide a complete enumeration of units engaged
in trade activity, including informal-sector units, at a
particular point in time
Allow collection of DTS in the great detail that is required at
longer intervals of time
Disadvantages



Limited in terms of data content
Census planning and organization and the subsequent
transformation of basic census data into DTS



Time consuming and resource intensive exercise
Costly, imposes a high burden on respondents
Response rates may be reduced thus affecting quality of collected
information
Census of trade units (2)
Recommendations


Conduct of a complete census of trade units is
recommended when:




A particular country does not maintain an up-to-date
business register
There is a significant user interest for detailed statistical
data by geographical area
Censuses should be followed as closely as possible
by periodic (annual, quarterly or monthly) sample
surveys
Censuses of trade units should not be conducted if
there are other ways of collecting and producing
distributive trade statistics of sufficiently high quality
Sample surveys of trade units (1)
Technique for obtaining data about a large
population of statistical units by selecting
and measuring a limited number of units
(sample) from that population





Conclusions about the total population of units
are made on the basis of the estimates obtained
from the sample
Scientific sample designs should be applied
in order to reduce the risk of a distorted
view of the population
Sample survey technique is a less costly way
of data collection as compared to the census
It may be used in conjunction with a cut-off
point or not
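The estimation idea on this slide can be sketched as follows. This is a minimal illustration, not a recommended design: it assumes the population is a plain list of turnover values, that units at or above a hypothetical `cutoff` are all enumerated, and that the remaining units are covered by a simple random sample with an expansion estimator. The function name and parameters are illustrative.

```python
import random


def estimate_total(population, cutoff, sample_size, seed=0):
    """Estimate total turnover from a take-all stratum plus a simple random sample.

    population  -- list of turnover values, one per unit (assumed known here
                   only so that the sketch can be run end to end)
    cutoff      -- hypothetical size threshold; units at or above it are all surveyed
    sample_size -- number of units sampled from below the cut-off
    """
    take_all = [u for u in population if u >= cutoff]
    take_some = [u for u in population if u < cutoff]

    rng = random.Random(seed)
    n = min(sample_size, len(take_some))
    sample = rng.sample(take_some, n)

    # Expansion estimator: each sampled unit represents N/n units of its stratum.
    weight = len(take_some) / n if n else 0.0
    return sum(take_all) + weight * sum(sample)


# Two large units enumerated completely, fifty identical small units sampled.
units = [1000, 800] + [10] * 50
print(estimate_total(units, cutoff=500, sample_size=5))
```

Because the small units in this toy population are identical, the estimate equals the true total. With real data the sampled part carries a sampling error that depends on the design.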
Sample surveys of trade units (2)
Wholesale and retail trade sample
surveys



Rarely restricted to one standard form
Tend to be a combination of forms,
differentiated by periodicity and major
characteristics of trade units


activity, size, legal form, type of operation
and the type of variables
occasionally an extra characteristic, such as
the geographical location of the unit, may
influence the contents of a sample survey
Sample surveys of trade units (3)
Size thresholds



Size of units plays an important role in determining the target
population and, where relevant, the sample population of units
Most of the sample surveys are conducted for units above a certain
size threshold
Reasons for using the threshold




Desire to limit the size of the survey
Reduce the response burden on businesses
Take account of the problems of maintaining registers for smaller
units
Appropriate size threshold




No international recommendation
Decision is left to the judgment of each NSO
May vary between surveys for different trade activities and
periodicity
Countries are encouraged to:



Make periodic assessments of the undercoverage of the surveys
due to the thresholds
Include a description of such thresholds in the country's metadata
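A periodic assessment of undercoverage could look like the sketch below. It assumes that a benchmark list of all units (for example from a census) is available with employment and turnover for each unit, and that the threshold is expressed in number of employees. Both are assumptions made for illustration.

```python
def undercoverage(units, threshold):
    """Share of units and of turnover that falls below a size threshold.

    units     -- list of (employees, turnover) pairs from a benchmark source
    threshold -- hypothetical minimum number of employees for survey coverage
    """
    total_turnover = sum(turnover for _, turnover in units)
    excluded = [(e, t) for e, t in units if e < threshold]
    return {
        "share_of_units": len(excluded) / len(units),
        "share_of_turnover": sum(t for _, t in excluded) / total_turnover,
    }


benchmark = [(1, 10.0), (2, 20.0), (50, 970.0)]
print(undercoverage(benchmark, threshold=5))
```

In this toy benchmark the threshold excludes two thirds of the units but only 3 per cent of turnover, which is the kind of figure a metadata note on thresholds could report.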
Types of DTS surveys (1)
Enterprise surveys



Sampling units comprise trade enterprises (or statistical units
belonging to these enterprises)
Assume availability of a sampling frame of trade units


List-based frame – BR or census list
Area-based frame – a sample of areas is selected first, then the
enterprises in it are enumerated
Recommendations


For surveys of distributive trade enterprises, the list-based
enterprise surveys should be generally preferred to area-based surveys



List-based survey is more efficient from a sampling perspective
in terms of sample size and maintenance of the list
Area-based sampling is inappropriate for large or medium-sized
enterprises that operate in several areas
Area-based enterprise survey approach to be used for
collection of data from small trade enterprises operating in
informal or unorganized segment of the economy
Types of DTS surveys (2)
Household surveys



Households are the sampling, reporting and
observation units – ensures coverage of
production by household enterprises that are too
small to be recorded
Disadvantages of household surveys


Sample is designed on the basis of the distribution of
households, not to provide a representative coverage of
trade activities
Distributions of households and trade activities are
different, as trade activities tend to be concentrated in
commercial and market zones
Recommendations


Recommended for coverage of unincorporated household
enterprises, which are not recognized as legal
entities separate from their owners
Types of DTS surveys (3)
Mixed household-enterprise surveys





A sample of households is selected and each household is
asked whether any of its members own and operate an
unincorporated enterprise
The list of enterprises thus compiled is used as the basis
for selecting the enterprises from which desired data are
finally collected
In contrast to household surveys they collect information
about enterprises per se, not about the persons in a
household, including their contribution to the enterprises
Disadvantages


Inefficiency of the sample design
Difficulties of handling enterprises with production units in
more than one location
Recommendations


Preferred to the household survey or area-based enterprise
survey approach for collecting data and estimating the
output of small trade units that are excluded from list-based enterprise surveys
Administrative data sources (ADS) (1)




Set up in response to legislation and/or
regulation
Each regulation results in a register of the
units
Countries should use ADS for statistical
purposes with caution
Privately controlled ADS



Data obtained from private sector data suppliers
Transfer of data from them to NSOs takes the form
of a contract with a payment of a fee
Recommendations

Compilers of DTS should identify and review the
available ADS in their countries and use the most
appropriate of them for compiling DTS
Administrative data sources (2)
Advantages






Complete coverage of units and low perceived non-response
Avoidance of response burden
Cheaper for NSOs to acquire data from an ADS than to conduct
a survey
Suitable for covering the smallest segment of the population of units,
which contributes relatively little to the estimates but makes up a
substantial percentage of the number of units in the population
Smaller sampling errors than in surveys, better accuracy
Disadvantages







Discrepancy between administrative and statistical concepts
Poor integration with other data of the statistical system
Risks with respect to stability
The level of scrutiny applied to variables that are of statistical
interest may not be satisfactory
Data may become available with unacceptable delay
Legal constraints with respect to access and confidentiality
Business register

Business register (BR) - recommended as the most
appropriate source for deriving the sampling frame for
distributive trade surveys



Organization and conduct of any enterprise survey of
distributive trade units assumes availability of an adequate
sampling frame
Sampling frame - set of units subject to sampling together with
the details about them that will be used for stratification,
sampling and contact purposes
Statistical business register



Comprehensive list of all enterprises and other units together
with their characteristics that are active in a national economy
A tool for the conduct of statistical surveys as well as a source
for statistics in its own right
Operationalises the selected model of statistical units and
facilitates classification of units according to the agreed
conceptual standards for all surveys
Statistical business register (1)
Establishment


Available administrative registers - starting point for
the establishment of Statistical BR



If only one administrative register is used, the resulting
Statistical BR would likely be deficient in terms of
coverage and content and would not provide an adequate
sampling frame for subsequent statistical surveys
Countries are encouraged to work towards improvement of
the coverage and content of their Statistical BR by
incorporating data from several administrative sources
Need for a single business number for all enterprises
Maintenance



Should be up to date and of satisfactory quality
Should be regularly maintained and updated to take
account of changes in enterprise dynamics
Statistical business register (2)
Sources for the establishment and maintenance of
Statistical BR






Economic census - provides the most comprehensive list of
units and links between them in a given country
Administrative data sources - VAT tax and payroll tax
systems, records maintained by the governments for the
administration of unemployment insurance, social security
or other programmes
Feedback from enterprise surveys - provides new information
on contact address changes, closure of business, change in
the economic activity of the unit, etc.
Business register surveys - profiling of enterprises
Other potential sources - information from trade
associations, telephone directories or special listings
prepared by telephone companies, etc.
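Combining several sources into one register is easier when every source carries the same business number. The sketch below assumes exactly that: each source is a list of records with a hypothetical `business_number` field, and missing characteristics are stored as `None`. Real register maintenance also needs record linkage for units without a common identifier, which is not shown.

```python
def build_register(*sources):
    """Merge records from several sources into one register keyed by business number."""
    register = {}
    for source in sources:
        for record in source:
            entry = register.setdefault(record["business_number"], {})
            for key, value in record.items():
                # Keep the first non-missing value seen for each characteristic,
                # so sources listed earlier take priority over later ones.
                if value is not None and entry.get(key) is None:
                    entry[key] = value
    return register


vat_records = [{"business_number": "A1", "turnover": 500, "employees": None}]
payroll_records = [
    {"business_number": "A1", "employees": 12},
    {"business_number": "B2", "employees": 3},
]
register = build_register(vat_records, payroll_records)
print(register)
```

Unit A1 ends up with turnover from the first source and employment from the second, while B2, which appears in only one source, still enters the register.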
Profiling of enterprises
Group of enterprises (holding company)
   One-establishment enterprise
      Local unit: establishment
   Holding enterprise/establishment serving mainly as control investment unit
   Multi-establishment enterprise
      Local unit 1: establishment
      Local unit 2: establishment
      Local unit 3: ancillary establishment
Data compilation methods
Process of data compilation





Comprises more than just aggregating the
questionnaire items
Statistical offices perform a number of checks,
validation and statistical procedures to bring the
collected data to the level of the intended
statistical output
DTS respondents - prone to commit errors
while completing a statistical questionnaire
DTS data collected through statistical surveys
- affected by response and non-response
errors of different kinds
Data validation and editing (1)
Integral part of all types of statistical surveys data
processing operations
Needed to solve problems of missing, invalid or
inconsistent responses
Editing





Systematic examination of collected data for the purpose
of identifying and eventually modifying the inadmissible,
inconsistent and highly questionable or improbable values,
according to predetermined rules
Essential process for assuring quality of the collected
information
Types of editing



Micro editing (input editing) - focuses on the editing of an
individual record or a questionnaire
Macro editing (output editing) – checks are performed on
aggregated data
Data validation and editing (2)
Selective editing




Approach for prioritizing and further reducing
costs of editing
Targets only those of the micro data items or
records that would have a significant impact on
the distributive trade surveys results
Recommended for editing of distributive trade
data
Influential observations



Particular data item responses that have most
significant impact upon the main estimates
Editing efforts should be focused on them
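One way to find influential observations is to score each record by how much a suspicious value could move the estimate. The sketch below is only an illustration of that idea: it assumes each record has a sampling weight, a reported value and an expected value (for example the previous period's value), and it flags records whose weighted deviation exceeds a hypothetical share of the estimated total. The score and the threshold are assumptions, not a prescribed method.

```python
def influential(records, threshold=0.01):
    """Return (id, score) pairs for records that deserve editing priority.

    score = weight * |reported - expected| / estimated total
    """
    estimated_total = sum(r["weight"] * r["reported"] for r in records)
    if estimated_total == 0:
        return []
    flagged = []
    for r in records:
        score = r["weight"] * abs(r["reported"] - r["expected"]) / estimated_total
        if score > threshold:
            flagged.append((r["id"], score))
    # Largest potential impact first, so that editing effort goes there first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


responses = [
    {"id": "a", "weight": 1, "reported": 1000, "expected": 990},
    {"id": "b", "weight": 10, "reported": 50, "expected": 5},
    {"id": "c", "weight": 10, "reported": 20, "expected": 20},
]
print(influential(responses))
```

Only unit b is flagged: its deviation is small in absolute terms but carries a large weight, whereas the large unit a deviates too little to matter for the total.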
Data validation and editing (3)
Edit checks for detecting errors in
distributive trade data





Routine checks - test whether all
questions have been answered
Validation checks - test whether answers
are permissible
Rational checks - set of checks based on
the statistical analysis of respondent data
Plausibility checks – used to pick up large
random errors
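Three of the checks listed above can be sketched as simple rules applied to one questionnaire. The item names, the non-negativity rule and the `max_ratio` bound are assumptions chosen for illustration. Rational checks, which rely on statistical analysis of all respondent data, are not shown.

```python
REQUIRED_ITEMS = ("turnover", "employees", "purchases")  # hypothetical questionnaire items


def edit_checks(record, previous_turnover=None, max_ratio=5.0):
    """Return a list of failed checks for one questionnaire record."""
    failures = []

    # Routine check: has every question been answered?
    for item in REQUIRED_ITEMS:
        if record.get(item) is None:
            failures.append(f"routine: {item} missing")

    # Validation check: is the answer permissible (here: non-negative)?
    for item in REQUIRED_ITEMS:
        value = record.get(item)
        if value is not None and value < 0:
            failures.append(f"validation: {item} negative")

    # Plausibility check: a very large change from the previous period
    # may indicate a large random error such as a wrong unit of measure.
    turnover = record.get("turnover")
    if turnover is not None and turnover >= 0 and previous_turnover:
        ratio = turnover / previous_turnover
        if ratio > max_ratio or ratio < 1 / max_ratio:
            failures.append("plausibility: turnover change out of range")

    return failures


print(edit_checks({"turnover": 10000, "employees": 5, "purchases": 60}, previous_turnover=100))
```

A record that fails a check is not necessarily wrong. The list of failures only tells the editor where to look.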
Imputations (1)
Missing data



Encountered in most of the trade surveys
Create problems for data editing
Types of missing data



Item non-response - data for a particular data
item of the questionnaire is missing
Unit non-response - selected unit has not
returned the filled-in questionnaire
Techniques for dealing with missing data


Imputations

Re-weighting
Imputations (2)
Replace one or more erroneous responses
or non-responses in a record with plausible
and internally consistent values
Process of filling gaps and eliminating
inconsistencies
Means of producing a complete and
consistent file containing imputed data





Used mainly for estimating missing data in case
of item non-response
Substitution - used in the case of unit non-response


Data from previous available periods of that unit
Data available for that unit from administrative
information
Imputations (3)
Commonly used imputation methods










Subjective treatment
Mean/modal value imputation
Post stratification
Substitution
Cold deck - makes use of a fixed set of values,
which covers all of the data items
Hot deck - replaces each missing value by the
available value from a 'donor', i.e. a similar
participant in the same survey
Nearest-neighbour imputation or distance
function matching
Sequential hot deck imputation
Regression (model based) imputation
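Two of the listed methods, mean value imputation and nearest-neighbour (hot deck) imputation, can be sketched as below. The sketch assumes missing values are stored as `None` and that one auxiliary variable (here the number of employees) is known for every unit and measures similarity between units. Field names are illustrative.

```python
def mean_impute(values):
    """Replace missing values by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def nearest_neighbour_impute(records, target, auxiliary):
    """Fill a missing target item with the value of the most similar donor.

    Similarity is the absolute difference on the auxiliary variable.
    Imputed records are marked so that they can be identified later.
    """
    donors = [r for r in records if r[target] is not None]
    completed = []
    for r in records:
        if r[target] is None:
            donor = min(donors, key=lambda d: abs(d[auxiliary] - r[auxiliary]))
            r = {**r, target: donor[target], "imputed": True}
        completed.append(r)
    return completed


shops = [
    {"employees": 2, "turnover": 100},
    {"employees": 50, "turnover": 5000},
    {"employees": 3, "turnover": None},
]
print(mean_impute([10, None, 30]))
print(nearest_neighbour_impute(shops, "turnover", "employees"))
```

The third shop receives the turnover of the two-employee shop, its nearest neighbour in size, rather than a mean that the large shop would pull upwards.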
Item non-response
Strategies for dealing with item non-response




The analysis is confined to the fully
completed forms only as all forms with
missing values are ignored
Not recommended because even the
valid data contained in the partially
complete forms are discarded
Missing data are imputed so that the
data matrix is complete
Unit non-response
Non-response may occur due to:







Non-existence of the unit included in the survey
Lack of appreciation of the importance of the data on the part of
the respondents
Refusal to respond
Lack of knowledge of how to respond
Lack of resources
Non-availability of the desired information
Ways to minimize the non-response




Increase awareness among respondents of the importance of
surveys
Appeal to respondents to cooperate with the statistical
authorities
Send reminders to non-respondents and resort to the
enforcement measures laid down in national legislation
Strategies for dealing with unit non-response



Re-weighting - the sample is re-weighted so as to include only the
responding sample units
Various forms of imputation – similar to those used for item
non-response
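Re-weighting can be sketched as a weight adjustment within strata. The sketch assumes that each sampled unit carries a design weight, a stratum label and a response flag, and that non-respondents resemble respondents within the same stratum. That assumption is the weak point of the method and should be checked against real data.

```python
def reweight(sample):
    """Return responding units with weights inflated to represent non-respondents.

    Within each stratum the adjustment factor is
    (sum of weights of all selected units) / (sum of weights of respondents).
    Strata without any respondent contribute nothing and need separate treatment.
    """
    selected, responding = {}, {}
    for unit in sample:
        s = unit["stratum"]
        selected[s] = selected.get(s, 0.0) + unit["weight"]
        if unit["responded"]:
            responding[s] = responding.get(s, 0.0) + unit["weight"]

    adjusted = []
    for unit in sample:
        if unit["responded"]:
            factor = selected[unit["stratum"]] / responding[unit["stratum"]]
            adjusted.append({**unit, "weight": unit["weight"] * factor})
    return adjusted


sample = [
    {"stratum": "small", "weight": 10.0, "responded": True},
    {"stratum": "small", "weight": 10.0, "responded": True},
    {"stratum": "small", "weight": 10.0, "responded": False},
    {"stratum": "small", "weight": 10.0, "responded": False},
    {"stratum": "large", "weight": 1.0, "responded": True},
]
print(reweight(sample))
```

The two responding small units now carry a weight of 20 each, so the stratum still represents 40 units in the estimates.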
Data collection strategy (1)
DTS surveys and/or administrative data sources should
cover all units in the economy engaged in economic
activities within the scope of the distributive trade
sector (Section G of ISIC, Rev.4)


Units of all sizes and types including corporations and
unincorporated (household) units
Data collection strategy


NSOs should develop their own data collection strategy



Ensures a complete coverage of distributive trade activity
Based on an integrated approach covering in principle all
trade units across all size classes of enterprises
Commensurate with their specific statistical and
organizational circumstances
Data collection strategy (2)
Public incorporated enterprises

A directory of such units is available in most of the cases
To be covered on a complete enumeration basis


Private and foreign controlled incorporated
enterprises

Large-scale units


To be covered on a complete enumeration basis if possible
Others



Tend to be significant in numbers but relatively homogenous
To be covered through a sample survey
Small enterprises



Sample surveys - if these are on the statistical BR or
through the use of administrative data (tax returns of small
enterprises)
Fully Integrated Rational Survey Technique (FIRST) - if the
register of unincorporated enterprises is not available
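The assignment of units to collection methods on this slide can be read as a small decision rule. The sketch below is one possible reading, not an official algorithm: the boolean fields `public`, `incorporated`, `large` and `in_register` are hypothetical, and small enterprises are treated here as the units that are neither public nor incorporated.

```python
def collection_mode(unit):
    """Suggest a data collection method for one trade unit."""
    if unit["public"]:
        # Public incorporated enterprises: a directory is usually available.
        return "complete enumeration"
    if unit["incorporated"]:
        # Private and foreign controlled incorporated enterprises.
        return "complete enumeration" if unit["large"] else "sample survey"
    if unit["in_register"]:
        # Small enterprises that are on the statistical BR.
        return "sample survey or administrative data"
    # No register of unincorporated enterprises is available.
    return "FIRST"


print(collection_mode({"public": False, "incorporated": True, "large": False, "in_register": True}))
```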
Data collection strategy (3)
Total population of units engaged in trade activities
   In the Business Register (list-frame segment)
      Large units
         Public sector - should be covered on a complete enumeration basis
         Private sector
            Segment 1: large units should be covered on a complete enumeration basis
            Segment 2: remaining units should be covered through sample survey
      Small units - covered either through sample surveys or administrative data sources
   Not in the Business Register (non-list-frame segment)
      With fixed premises
      Without fixed premises
      Area frame - should be covered through sample surveys
FIRST (1)
Survey programme that efficiently captures
comprehensive statistical information from all
distributive trade enterprises operating in an
economy
Application



Requires two basic statistical sets of information

Census enumeration, preferably an economic census - to
establish the complete statistical population of units for
construction of sampling frame and sample selection



Population census – alternative in the absence of economic
census
Supporting documentation on sample areas/enumeration
blocks for the benchmark enumeration
Divides the units into two segments


List-frame segment – comprises a relatively small number of
large units
Non-list-frame segment – includes all remaining units that can
be covered only by a (geographical) area-frame approach
FIRST (2)
List-frame segment



Population of units tends to be very heterogeneous in its
size and characteristics
Surveys are drawn from a BR or a directory of units
Non-list-frame segment



First stage - a sample of area units is selected
Second stage - a list of all establishments operating in
each area unit selected in the first stage is compiled



Establishments falling in the scope of DTS are classified by
kind-of-activity
Sample of units is drawn from the listed establishments
Mobile units

All identifiable establishments outside the owners’ home
located in the selected area unit as well as household-based
enterprises located within home - listed by a house-to-house
visit
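The two-stage selection for the non-list-frame segment can be sketched as follows. The sketch assumes simple random sampling at both stages and takes the establishment listings as given input. In practice a listing is compiled only for the selected areas, by a house-to-house visit, and establishments outside the scope of DTS are removed before the second stage. Names and parameters are illustrative.

```python
import random


def two_stage_sample(areas, n_areas, n_per_area, seed=0):
    """Select areas first, then establishments within each selected area.

    areas -- dict mapping an area identifier to the list of establishments
             found there (supplied for all areas only to keep the sketch runnable)
    """
    rng = random.Random(seed)
    selected_areas = rng.sample(sorted(areas), n_areas)
    first_stage_weight = len(areas) / n_areas

    sample = []
    for area in selected_areas:
        listing = areas[area]
        m = min(n_per_area, len(listing))
        for establishment in rng.sample(listing, m):
            sample.append({
                "area": area,
                "establishment": establishment,
                # Weight = inverse of the overall selection probability.
                "weight": first_stage_weight * len(listing) / m,
            })
    return sample


areas = {f"A{i}": [f"A{i}-E{j}" for j in range(10)] for i in range(6)}
selected = two_stage_sample(areas, n_areas=3, n_per_area=5)
print(len(selected), sum(u["weight"] for u in selected))
```

With three of six areas and five of ten establishments per area, every sampled establishment has a weight of 4, and the weights add up to the 60 establishments in the toy population.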
Distributive trade surveys

Annual surveys


Should provide estimates that cover all wholesale and retail trade
establishments
Comprehensive surveys are not always necessary




Infra-annual surveys (quarterly or monthly)



All establishments above a given cut-off point may be completely
enumerated, while the others may be sampled
All units may receive a survey form, but an abbreviated version may
be used for the small establishments
Estimates for the small establishments may be made from
administrative data or from other statistical inquiries such as mixed
household-enterprise surveys
More restricted coverage than annual surveys
Small establishments - coverage is subject to their significance
and availability of reliable administrative data source
Infrequent surveys (5-10 years)


Used for collection of data on specialised topics or in greater
detail
Not appropriate for the purpose of collecting and compiling
structural type DTS
Reference period
Reference period for annual surveys




Data compiled in annual surveys should relate to a 12-month
period
12-month period should preferably be the calendar year
Other options

Data are more readily available on a different fiscal-year basis
for some establishments


Some data items (wages and salaries) have to be collected on both
a fiscal-year and calendar-year basis to facilitate building up
calendar year aggregates
Data for all establishments are available on a fiscal year basis
which becomes the normal accounting period
Reference period for infra-annual surveys




Corresponding calendar month/quarter - recommended as
the reference period for infra-annual surveys
Other options
Some establishments work in quarterly periods of four, four
and five weeks

NSO should make every effort to standardize the information
provided in the monthly returns by means of estimation procedures
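One simple estimation procedure for returns that follow four- or five-week periods is to allocate each reported value to calendar months in proportion to the number of days. The sketch below assumes that activity is spread evenly over the days of each reporting period, which is a strong simplification since trade usually varies by weekday and season. It is an illustration, not a recommended standard.

```python
from datetime import date, timedelta


def to_calendar_months(periods):
    """Allocate values reported for arbitrary periods to calendar months.

    periods -- list of (start, end, value) tuples; end date is inclusive
    Returns a dict mapping (year, month) to the allocated value.
    """
    monthly = {}
    for start, end, value in periods:
        days = (end - start).days + 1
        daily = value / days  # assumption: equal activity on every day
        day = start
        while day <= end:
            key = (day.year, day.month)
            monthly[key] = monthly.get(key, 0.0) + daily
            day += timedelta(days=1)
    return monthly


four_week_returns = [
    (date(2008, 1, 1), date(2008, 1, 28), 280.0),
    (date(2008, 1, 29), date(2008, 2, 25), 280.0),
]
print(to_calendar_months(four_week_returns))
```

The second four-week return contributes three days to January and twenty-five days to February, so January is estimated at 310 and February, so far, at 250.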
Thank You