
Check, edit and imputation of the variable turnover of the Italian Business Register

NTTS 2009
New Techniques and Technologies for Statistics
18-20 February 2009
Speaker: Viviana De Giorgi (Istat)
Contents
Keywords
Background and motivations
The procedure
Conclusions
Viviana De Giorgi – NTTS 2009 – Brussels, 18-20 February 2009
Keywords
administrative sources
outlier detection
data editing
Keywords
ASIA
(Statistical Archive of Active Enterprises) Italian Business
Register.
VAT returns
Administrative source of annual statements of VAT payments,
provided by the Italian Tax Authority.
Sector Studies
Administrative source of the annual survey on business
economic sectors carried out by the Tax Authority.
Istat Business Surveys
1) Annual Survey on Large Enterprises Accounts System
(total)
2) Annual Survey on Small and Medium-sized Enterprises
(sample)
Background
A new need to improve the quality of the business register
In 2005, while working on the data for the taxation year 2003, a new need
arose: making the variable turnover available for dissemination.
Until then, data from the VAT returns had been used for
the ASIA units without any processing, and only for internal use.
The study involves:
• analysis of the available sources and data,
• analysis of the available categories of enterprises, including
missing value analysis,
• analysis of the method for imputing missing and incorrect
values.
Missing, zero and incorrect values are to be checked and
imputed in order to obtain comparable and consistent data.
Background
Available procedures for editing and imputation of
numerical data
Automatic procedures for identifying invalid data, such as the
Hidiroglou-Berthelot algorithm, which did not give good
results in the case of the ASIA turnover.
Software for editing and imputation of numerical data, such as
the Banff SAS application. Its implementation requires edits
based on numerical variables, and in the case of ASIA we
have only a few variables with which to build effective rules for the turnover.
An ad hoc procedure, which can fit the data better and is the
solution proposed here.
Background
Available sources in Istat
1. Sector Studies
2. Annual VAT returns
3. Istat Business Surveys
Background
Sector Studies
Instituted by the Tax Authority in 1996 in order to evaluate the
capacity of enterprises to produce income and to know
whether they pay taxes, they are a source of great interest as
they gather a lot of information on about 3.5 million enterprises. The
subjects liable to Sector Studies are holders of industrial, commercial and
manufacturing income and VAT number holders, subject to
some exclusion criteria (e.g. the turnover threshold and the
type and period of activity).
Financial information retrievable from the Sector Studies:
• Volume of Business
• Proceeds
• Agios and Income from the Sales of Goods Subject to Fixed Revenue
• Other Proceeds
• Other Positive Income Components
• Other Operations that Produce Income
Background
Annual VAT returns
Only the variable called Volume of Business is available. Its
definition is the same as in the Sector Studies. The VAT
statement is filed by (almost) all Italian enterprises
(excluding some activities that are not subject to declaration).
Istat Business Surveys
These surveys provide the variable named Proceeds, whose
definition is the same as that of the Sector Studies variable. Apart from
the survey on large enterprises, whose values are integrated in
the imputation phase, Istat has only sample data on enterprise
turnover, and they are used here mainly in the outcomes
assessment phase.
Other Istat turnover sample surveys are not taken into
consideration.
Background
Which source and variable?
Although both the VAT returns and the Sector Studies
provide turnover time series, we have chosen the variable
called Volume of Business from the VAT returns to be used for
imputing values, and the same variable from the Sector Studies to
be used as a cross-check variable.
Why?
• The VAT statements are filed by (almost) all Italian
enterprises, whereas the Sector Studies are a partial survey.
• The same variable is also in the Sector Studies source and it
can be used for cross-checks.
• The VAT returns are less subject to laws and regulations than
the Sector Studies.
• From a statistical point of view, the VAT returns missing data
can be considered missing at random (MAR).
The ASIA turnover imputation procedure
Characteristics
• It is an ad hoc procedure.
• It has been running since the taxation period 2004 after an
experimental period of one year.
• It combines information from administrative and statistical
sources to provide non-zero values for (almost) all the
ASIA units.
• It consists of 5 steps, strictly in sequence.
The ASIA turnover, available for all the ASIA units, is
disseminated by turnover classes.
The procedure
1. identification of the enterprises set to be checked, imputed
and validated (reference set)
2. identification of the existence set of turnover permitted
values (turnover range)
3. data editing through 3-year time series consistency (outlier
detection)
4. imputation of missing, zero and outlier values
5. data validation using external indicators (outcomes
assessment)
Step 1: Reference set
The procedure
All the following steps work on the enterprises that were active for at least
six months, in all the economic activity sectors covered by the Istat business
surveys, with the exception of:
• divisions 65* (Financial intermediation) and 66* (Insurance and
pension funding): not in Istat surveys
• divisions 85* (Health and Social Work) and 67* (Activities auxiliary to
financial intermediation): not liable to VAT
• divisions with very few enterprises (Mining): characteristic values
are not effective
• enterprises that have undergone a merger/acquisition event with a
non-zero turnover value
Excluded enterprises keep the turnover value (including zero
and missing values) declared as Volume of Business in the
VAT statement.
*NACE rev. 1.1
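The exclusion rules of Step 1 can be read as a simple filter. The following is a minimal sketch under stated assumptions, not the Istat implementation: the record layout (`Enterprise` and its fields) is hypothetical, and the set of sparse divisions is only illustrative (divisions 10 to 14 are the Mining and quarrying divisions of NACE Rev. 1.1; the slides do not say which of them are excluded).

```python
from dataclasses import dataclass
from typing import Optional

# Divisions excluded on the slide (NACE Rev. 1.1 two-digit codes).
EXCLUDED_DIVISIONS = {"65", "66", "67", "85"}
# Assumption: illustrative stand-in for "divisions with very few enterprises (Mining)".
SPARSE_DIVISIONS = {"10", "11", "12", "13", "14"}


@dataclass
class Enterprise:
    """Hypothetical record layout, for illustration only."""
    unit_id: str
    division: str                   # two-digit NACE Rev. 1.1 division
    months_active: int              # months of activity in the reference year
    merger_event: bool              # merger/acquisition in the reference year
    vat_turnover: Optional[float]   # Volume of Business from the VAT return


def in_reference_set(e: Enterprise) -> bool:
    """Return True if the unit should be checked, imputed and validated."""
    if e.months_active < 6:
        return False
    if e.division in EXCLUDED_DIVISIONS or e.division in SPARSE_DIVISIONS:
        return False
    # Units with a merger/acquisition event and a non-zero declared turnover
    # are left out; they keep the declared value.
    has_nonzero_turnover = e.vat_turnover is not None and e.vat_turnover != 0
    if e.merger_event and has_nonzero_turnover:
        return False
    return True
```

Units for which `in_reference_set` returns False would simply keep the Volume of Business declared in the VAT statement.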
Step 2: Turnover range
The procedure
The turnover bounds from the previous year's ASIA data are used: the minimum and
maximum values by economic activity division of the previous
year data serve as a band-pass filter, determining the following
sets:
• set P (permitted values): contains all the values within the band-pass
filter
• set NP (non-permitted values): consists of all the values outside the
bounds
Previous year data (checked and imputed)
• are considered correct
• can accommodate the short-term variability of turnover
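As an illustration of the band-pass filter, the sketch below derives the bounds per division from previous year data and assigns a current year value to set P or NP. The function names and the input layout are hypothetical, and treating the bounds as inclusive is an assumption; the slides do not specify either.

```python
def division_bounds(previous_year):
    """Minimum and maximum turnover per division.

    previous_year: iterable of (division, turnover) pairs taken from the
    previous year data, already checked and imputed.
    """
    bounds = {}
    for division, turnover in previous_year:
        low, high = bounds.get(division, (turnover, turnover))
        bounds[division] = (min(low, turnover), max(high, turnover))
    return bounds


def classify_value(division, turnover, bounds):
    """Return "P" if the value lies within the division bounds, else "NP".

    Assumption: the bounds themselves count as permitted values.
    """
    low, high = bounds[division]
    return "P" if low <= turnover <= high else "NP"
```

For example, with previous year values of 10 and 500 in a division, a current year value of 100 falls in set P and a value of 900 falls in set NP.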
The procedure
Step 3: Data editing for outlier detection
Use of the Sector Studies current year data as a cross-check
variable
Use of changes in value in the last three years of
• turnover
• turnover per capita
Different rules (bounds) for set P and set NP to classify
changes in value as consistent or inconsistent, by economic
activity group (3 digits)
Use of a three-element binary vector to identify
• current year incorrect values (to be imputed)
• previous year incorrect values (to be excluded in the imputation
step)
Missing and zero values are considered incorrect (outliers)
beforehand.
Definition from now on:
an outlier is a zero, missing or incorrect value
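The slides do not give the actual editing rules, so the following is only one possible reading of the three-element binary vector, sketched under loud assumptions: one flag per year (k-2, k-1, k), hypothetical bounds on the ratio between a value and the most recent unflagged value, tighter bounds for set NP than for set P, and a value flagged when either its turnover or its turnover per capita changes inconsistently. The Sector Studies cross-check and the dependence of the bounds on the 3-digit activity group are left out.

```python
# Assumption: hypothetical bounds on the ratio new/old, tighter for set NP.
RATIO_BOUNDS = {"P": (0.25, 4.0), "NP": (0.5, 2.0)}


def per_capita(turnover, employed):
    """Turnover per person employed, or None when it cannot be computed."""
    return turnover / employed if turnover and employed else None


def is_consistent(new, old, bounds):
    low, high = bounds
    return low <= new / old <= high


def flag_series(turnover, employed, value_set):
    """Return a tuple of three 0/1 flags for years (k-2, k-1, k); 1 = outlier.

    turnover, employed: three-element sequences; None marks a missing value.
    value_set: "P" or "NP", the set the current year value belongs to.
    """
    bounds = RATIO_BOUNDS[value_set]
    flags = []
    ref = None  # index of the most recent year not flagged as an outlier
    for i in range(3):
        t = turnover[i]
        if t is None or t == 0:
            flags.append(1)  # missing and zero values are outliers beforehand
            continue
        bad = False
        if ref is not None:
            bad = not is_consistent(t, turnover[ref], bounds)
            pc_new = per_capita(t, employed[i])
            pc_old = per_capita(turnover[ref], employed[ref])
            if pc_new is not None and pc_old is not None:
                bad = bad or not is_consistent(pc_new, pc_old, bounds)
        flags.append(1 if bad else 0)
        if not bad:
            ref = i
    return tuple(flags)
```

In this reading, the last flag marks a current year value to be imputed and the middle flag marks a previous year value that should not feed the imputation. A known limitation of the sketch is that a non-missing but wrong value in year k-2 is taken as the reference.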
Step 4: Outlier imputation
The procedure
(The values of large enterprises are automatically replaced by those from the Istat
enterprise survey, without any checks: survey data are assumed to
be correct, as they have already been checked and revised in the survey.)
The enterprises with an incorrect current year value are divided
into:
1. enterprises whose previous year value is incorrect: their
values are imputed with a mean value imputation
2. enterprises whose previous year value is correct: their values are
imputed with a previous year imputation
The procedure
Mean value imputation: multiplication of the number of employed
persons by the average turnover per capita of the group to which the
enterprise belongs (all data refer to the current year).
$$it_k = tpc^{xy}_k \times e_k$$
Previous year imputation: multiplication of the number of
employed persons by the sum of the previous year turnover per
capita and the previous year turnover per capita multiplied by
the change in value between the last two years.
$$it_k = (tpc_{k-1} + tpc_{k-1} \times \Delta tpc^{xy}) \times e_k$$
1) $it_k$ is the imputed turnover in the current year k
2) $tpc^{xy}_k$ is the current year turnover per capita of the activity group (3 digits) or division (2 digits) x and the class of employed persons y (group xy)
3) $e_k$ is the number of persons the enterprise employs in the year k
4) $t_{k-1}$ is the turnover the enterprise obtained in the previous year, from which its previous year turnover per capita $tpc_{k-1}$ is derived
5) $\Delta tpc^{xy}$ is the percentage change in value of the turnover per capita that occurs in the group xy between
the previous and the current years
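The two imputation formulas can be written down directly. This is a minimal sketch, not the production code: the function and parameter names are hypothetical, the group change is assumed to be expressed as a fraction (0.05 for +5%), and the previous year turnover per capita is assumed to be the enterprise's previous year turnover divided by its previous year number of persons employed.

```python
def mean_value_imputation(group_tpc_current, employed_current):
    """it_k = tpc^xy_k * e_k"""
    return group_tpc_current * employed_current


def previous_year_imputation(prev_turnover, prev_employed,
                             group_tpc_change, employed_current):
    """it_k = (tpc_{k-1} + tpc_{k-1} * delta_tpc^xy) * e_k

    Assumption: tpc_{k-1} = t_{k-1} / e_{k-1}, and group_tpc_change is a
    fraction rather than a percentage.
    """
    tpc_prev = prev_turnover / prev_employed
    return (tpc_prev + tpc_prev * group_tpc_change) * employed_current


def impute_turnover(previous_year_correct, employed_current,
                    group_tpc_current, group_tpc_change,
                    prev_turnover=None, prev_employed=None):
    """Choose the method according to the status of the previous year value."""
    if previous_year_correct and prev_turnover and prev_employed:
        return previous_year_imputation(prev_turnover, prev_employed,
                                        group_tpc_change, employed_current)
    # Previous year value incorrect or unusable: fall back on the group mean.
    return mean_value_imputation(group_tpc_current, employed_current)
```

As a worked case with made-up figures: an enterprise with 4 persons employed in a group whose average turnover per capita is 50,000 receives 50,000 × 4 = 200,000 under mean value imputation. If instead its previous year turnover was 120,000 with 3 persons employed, and the group changed by +5%, previous year imputation gives (40,000 + 40,000 × 0.05) × 4 = 168,000.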
The procedure
Step 5: Outcomes assessment
Comparing 1) Istat surveys and ASIA data
2) ASIA previous and current year data
by economic activity and class of persons employed, through the
following indices, all computed by group:
1. absolute differences and/or ratios of the number of
enterprises
2. absolute differences and/or ratios of the number of
persons employed
3. differences in the amount of turnover between ASIA and
the Istat surveys, before and after the imputation process
4. amount of imputed turnover and percentage of imputed
enterprises, both for all the values and for missing and zero
values only
5. average values, before and after data imputation
Indices 1 and 2 could be quite different from zero, as the enterprise
survey universe is an expansion by means of sample coefficients;
indices 3, 4 and 5 are useful for understanding the improvement in
data quality.
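Some of the indices above can be sketched as group-level aggregations. The record layout, the group key and the function names below are hypothetical; the survey totals are taken as given, since they are expanded estimates, and only the difference form of the indices is shown.

```python
from collections import defaultdict


def group_totals(records):
    """Aggregate ASIA units by group.

    records: iterable of dicts with keys "group", "employed", "turnover"
    and "imputed" (True if the turnover value was imputed).
    """
    totals = defaultdict(lambda: {"enterprises": 0, "employed": 0,
                                  "turnover": 0.0, "imputed": 0})
    for r in records:
        g = totals[r["group"]]
        g["enterprises"] += 1
        g["employed"] += r["employed"]
        g["turnover"] += r["turnover"] or 0.0
        g["imputed"] += 1 if r["imputed"] else 0
    return dict(totals)


def compare_groups(asia, survey):
    """Differences between ASIA and survey totals for the groups in both."""
    result = {}
    for group in asia.keys() & survey.keys():
        a, s = asia[group], survey[group]
        result[group] = {
            "enterprises_diff": a["enterprises"] - s["enterprises"],
            "employed_diff": a["employed"] - s["employed"],
            "turnover_diff": a["turnover"] - s["turnover"],
            "imputed_pct": 100.0 * a["imputed"] / a["enterprises"],
        }
    return result
```

Running `group_totals` once on the data before imputation and once after it gives the two sides of the before and after comparisons.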
Conclusions
Every year:
• about 3% of data are missing or zero values (the percentage
decreased drastically in 1999)
• about 2% of data are incorrect values
• only very few enterprises have an incorrect previous year value
(not to be used in the previous year imputation)
• a few ad hoc studies are necessary, depending on the available
data and the reference year (e.g. wind power enterprises in the year
2004)
The procedure preserves time series consistency and
comparability; unit-level data are available for internal uses, e.g.
National Accounts and surveys; macro-data dissemination is
possible by classes of turnover.
Future developments
As the activities liable to the Sector Studies have been increasing
year by year, above all due to the inclusion of more activities and
the raising of turnover thresholds, we are thinking about
rebuilding the process, replacing the Volume of Business
with the Proceeds coming from the Sector Studies, in order to:
• make the use of Sector Studies information more effective
• improve integration of the VAT returns and the Sector Studies
• make the ASIA turnover more comparable with Istat surveys
data
• build up an effective imputation for economic activities not
liable to VAT
REFERENCES
Bernardi A., Cerroni F., De Giorgi V. (2008). The Tax Authority Source as an example
of the use of an administrative source as a statistical one. Contributed Paper of
IAOS Conference on Reshaping Official Statistics. Shanghai, 14-16 October 2008.
Hidiroglou, M.A., Berthelot, J.-M. (1986). Statistical Editing and Imputation for
Periodic Business Surveys. Survey Methodology, vol. 12, n. 1, 73-83.
Italian DPR n. 633/72 on VAT returns (Italian).
Sector Studies filling in instructions. Document retrievable on the Italian Tax
Authority Internet site: http://www.agenziaentrate.gov.it (Italian).
Statistics Canada - Banff Support Team (2005). Functional Description of the Banff
System for Edit and Imputation. Quality Assurance and Generalized
Systems Section Technical Report.
TUIR Italian consolidated act on income taxes (Italian).
VAT returns filling in instructions. Document retrievable on the Italian Tax Authority
Internet site: http://www.agenziaentrate.gov.it (Italian).
Thank you for your attention!
for further information: viviana.degiorgi@istat.it