NTTS 2009
New Techniques and Technologies for Statistics
18-20 February 2009

Check, edit and imputation of the variable turnover of the Italian Business Register
Speaker: Viviana De Giorgi (Istat)

Viviana De Giorgi – NTTS 2009 – Brussels, 18-20 February 2009

Contents
• Keywords
• Background and motivations
• The procedure
• Conclusions

Keywords
• administrative sources
• outlier detection
• data editing

ASIA (Statistical Archive of Active Enterprises): the Italian Business Register.
VAT returns: administrative source of annual statements of VAT payments, provided by the Italian Tax Authority.
Sector Studies: administrative source of the annual survey on business economic sectors carried out by the Tax Authority.
Istat Business Surveys:
1) Annual Survey on Large Enterprises Accounts System (total)
2) Annual Survey on Small and Medium-sized Enterprises (sample)

Background: A new need to improve the quality of the business register
In 2005, while working on the data for the taxation year 2003, a new need arose: making the variable turnover available for dissemination. Until then, data from the VAT returns had been used for the ASIA units without any processing and only for internal use.
The study involves:
• analysis of the available sources and data
• analysis of the available categories of enterprises, including missing value analyses
• analysis of the method for imputing missing and non-correct values
Missing, zero and non-correct values must be checked and imputed in order to obtain comparable and consistent data.

Background: Available procedures for editing and imputation of numerical data
• Automatic procedures for identifying invalid data, such as the Hidiroglou-Berthelot algorithm, which did not give good results for the ASIA turnover.
• Software for editing and imputation of numerical data, such as the BANFF SAS application. Its implementation requires edits based on numerical variables, and for ASIA there are only a few variables from which to build effective rules for the turnover.
• An ad hoc procedure, which can fit the data better and is the solution proposed here.

Background: Available sources in Istat
1. Sector Studies
2. Annual VAT returns
3. Istat Business Surveys

Background: Sector Studies
Instituted by the Tax Authority in 1996 to evaluate the capacity of enterprises to produce income and to establish whether they pay taxes, the Sector Studies are a source of great interest because they gather a lot of information on about 3.5 million enterprises.
The subjects liable to the Sector Studies are owners of industrial, commercial and manufacturing income and holders of a VAT number, with some exclusion criteria (e.g. the turnover threshold and the activity type and time).
Financial information retrievable from the Sector Studies:
• Volume of Business
• Proceeds
• Agios and Income from the Sales of Goods Subject to Fixed Revenue
• Other Proceeds
• Other Positive Income Components
• Other Operations that Produce Income

Background: Annual VAT returns
Only the variable called Volume of Business is available. Its definition is the same as in the Sector Studies. The VAT statement is filed by (almost) all Italian enterprises (excluding some activities that are not subject to declaration).

Background: Istat Business Surveys
These surveys provide the variable named Proceeds, whose definition is the same as that of the Sector Studies variable. Apart from the survey on large enterprises, whose values are integrated in the imputation phase, Istat has only sample data on enterprise turnover, which are used here mainly in the outcomes assessment phase. Other Istat turnover sample surveys are not taken into consideration.

Background: Which source and variable?
Although both the VAT returns and the Sector Studies provide turnover time series, we chose the variable Volume of Business from the VAT returns for imputing values, and the same variable from the Sector Studies as a cross-check variable.
Why?
• The VAT statements are filed by (almost) all Italian enterprises, whereas the Sector Studies are a partial survey.
• The same variable is also in the Sector Studies source and can be used for cross-checks.
• The VAT returns are less subject to laws and directions than the Sector Studies.
• From a statistical point of view, the missing data in the VAT returns can be considered MAR (missing at random).

The ASIA turnover imputation procedure: Characteristics
• It is an ad hoc procedure.
• It has been running since the taxation period 2004, after an experimental period of one year.
• It combines information from administrative and statistical sources to provide non-zero values for (almost) all the ASIA units.
• It consists of 5 steps, strictly in sequence.
The ASIA turnover, available for all the ASIA units, is disseminated by classes.

The procedure
1. identification of the set of enterprises to be checked, imputed and validated (reference set)
2. identification of the set of permitted turnover values (turnover range)
3. data editing through 3-year time series consistency (outlier detection)
4. imputation of missing values, zeros and outliers
5. data validation using external indicators (outcomes assessment)

Step 1: Reference set
All the following steps work on the enterprises that have been active for at least six months, in all the economic activity sectors covered by the Istat business surveys, with the exception of:
• divisions 65* (Financial intermediation) and 66* (Insurance and pension funding): not in Istat surveys
• divisions 85* (Health and social work) and 67* (Activities auxiliary to financial intermediation): not liable to VAT
• divisions with very few enterprises and very characteristic values (Mining): not effective
• enterprises that have undergone a merger/acquisition event with a non-zero turnover value
Excluded enterprises keep the turnover value (including zero and missing values) declared as Volume of Business in the VAT statement.
*NACE Rev. 1.1

Step 2: Turnover range
The bounds of the previous year's ASIA turnover are used: the minimum and maximum values of the previous year's data, by economic activity division, serve as a band-pass filter that determines the following sets:
• set P (permitted values): all the values within the band-pass filter
• set NP (non-permitted values): all the values outside the bounds
Previous year data (checked and imputed)
• are considered correct
• are capable of handling the short-term variability of turnover

Step 3: Data editing for outlier detection
• Use of the Sector Studies current year data as a cross-check variable
• Use of the changes in value over the last three years of
  • turnover
  • turnover per capita
• Different rules (bounds) for set P and set NP to determine consistent and non-consistent changes in value, by economic activity group (3 digits)
• Use of a three-element binary vector to identify
  • current year non-correct values (to be imputed)
  • previous year non-correct values (to be excluded in the imputation step)
Missing and zero values are considered non-correct (outliers) in advance.
Definition: from now on, an outlier is a zero, missing or non-correct value.

Step 4: Outlier imputation
(The values of large enterprises are automatically replaced by those from the Istat enterprise survey, without any checks. Survey data are assumed to be correct, since they have already been checked and revised in the survey.)
The enterprises with a non-correct current year value are divided into:
1. enterprises whose previous year value is non-correct: their values are imputed with a mean value imputation
2. enterprises whose previous year value is correct: their values are imputed with a previous year imputation

Mean value imputation: the number of employed persons multiplied by the average turnover per capita of the group the enterprise belongs to (all data refer to the current year).

$$it_k = tpc^{k}_{xy} \times e_k$$

Previous year imputation: the number of employed persons multiplied by the sum of the previous year turnover per capita and the previous year turnover per capita multiplied by the change in value between the last two years.

$$it_k = (tpc_{k-1} + tpc_{k-1} \cdot \Delta tpc_{xy}) \times e_k$$

where
1) $it_k$ is the imputed turnover in the current year
2) $tpc^{k}_{xy}$ is the current year turnover per capita of the activity group (3 digits) or division (2 digits) x and the class of employed persons y (group xy)
3) $e_k$ is the number of persons an enterprise employs in the year k
4) $t_{k-1}$ is the turnover an enterprise obtained in the previous year, on which its previous year turnover per capita $tpc_{k-1}$ is based
5) $\Delta tpc_{xy}$ is the percentage change in turnover value that occurs in the group xy between the previous and the current year

Step 5: Outcomes assessment
Comparing
1) Istat surveys and ASIA data
2) ASIA previous and current year data
by economic activity and class of persons employed, through:
1. by-group absolute differences and/or ratios of the number of enterprises
2. by-group absolute differences and/or ratios of the number of persons employed
3. by-group differences in the amount of turnover between ASIA and the Istat surveys, before and after the imputation process
4. by-group amount of imputed turnover and percentages of imputed enterprises, both for all the values and for missing and zero values only
5. by-group average values, before and after data imputation
Indices 1 and 2 can be quite different from zero, because the enterprise survey universe is an expansion by means of sample coefficients; indices 3, 4 and 5 are useful for understanding the improvement in data quality.
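
As an illustration of Steps 2 and 4, the range filter and the two imputation rules can be sketched in Python. This is only a minimal sketch under stated assumptions, not the production ASIA procedure: all function and parameter names are hypothetical, the bounds are assumed to be inclusive, the previous year turnover per capita is assumed to be already computed, and the change in value is assumed to be expressed as a fraction (0.25 for +25%).

```python
from typing import Optional


def classify_by_range(turnover: float, lower: float, upper: float) -> str:
    """Step 2: 'P' if the value lies within the previous-year bounds, else 'NP'.

    Assumption: the bounds themselves count as permitted values.
    """
    return "P" if lower <= turnover <= upper else "NP"


def mean_value_imputation(tpc_group_current: float, employed_current: float) -> float:
    """Mean value imputation: it_k = tpc^k_xy * e_k."""
    return tpc_group_current * employed_current


def previous_year_imputation(
    tpc_prev: float, delta_tpc_group: float, employed_current: float
) -> float:
    """Previous year imputation: it_k = (tpc_{k-1} + tpc_{k-1} * delta_tpc_xy) * e_k.

    Assumption: delta_tpc_group is a fraction, e.g. 0.25 for a 25% increase.
    """
    return (tpc_prev + tpc_prev * delta_tpc_group) * employed_current


def impute_turnover(
    previous_year_correct: bool,
    employed_current: float,
    tpc_group_current: float,
    tpc_prev: Optional[float] = None,
    delta_tpc_group: Optional[float] = None,
) -> float:
    """Step 4: pick the imputation rule for a non-correct current year value."""
    if previous_year_correct:
        if tpc_prev is None or delta_tpc_group is None:
            raise ValueError("previous year imputation needs tpc_prev and delta_tpc_group")
        return previous_year_imputation(tpc_prev, delta_tpc_group, employed_current)
    # Previous year value is non-correct: fall back to the group mean.
    return mean_value_imputation(tpc_group_current, employed_current)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from ASIA data.
    print(classify_by_range(500.0, 100.0, 1000.0))
    print(impute_turnover(False, 4, 50_000.0))
    print(impute_turnover(True, 4, 50_000.0, tpc_prev=40_000.0, delta_tpc_group=0.25))
```

Keeping the two rules as separate functions mirrors the slide's split between enterprises with a correct and a non-correct previous year value; the group-level inputs (average turnover per capita and its change) would have to be computed beforehand from the values judged correct in Step 3.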

Conclusions
Every year:
• about 3% of data are missing or zero values (the percentage decreased drastically in 1999)
• about 2% of data are non-correct values
• only very few enterprises have a non-correct previous year value (which must not be used in the previous year imputation)
• a few ad hoc studies are necessary, depending on the available data and the reference year (e.g. wind power enterprises in the year 2004)
The procedure preserves time series consistency and comparability; unit-level data are available for internal uses (e.g. National Accounts and surveys); macro-data dissemination is possible by classes of turnover.

Future developments
Since the activities liable to the Sector Studies have been increasing year by year, above all because more activities have been included and the turnover thresholds have been raised, we are considering rebuilding the process by replacing the Volume of Business with the Proceeds from the Sector Studies, in order to:
• make the use of Sector Studies information more effective
• improve the integration of the VAT returns and the Sector Studies
• make the ASIA turnover more comparable with Istat survey data
• build an effective imputation for economic activities not liable to VAT

References
Bernardi A., Cerroni F., De Giorgi V. (2008). The Tax Authority Source as an example of the use of an administrative source as a statistical one. Contributed paper, IAOS Conference on Reshaping Official Statistics, Shanghai, 14-16 October 2008.
Hidiroglou M.A., Berthelot J.-M. (1986). Statistical Editing and Imputation for Periodic Business Surveys. Survey Methodology, vol. 12, n. 1, 73-83.
Italian DPR n. 633/72 on VAT returns (in Italian).
Sector Studies filling-in instructions. Document retrievable from the Italian Tax Authority website: http://www.agenziaentrate.gov.it (in Italian).
Statistics Canada - BANFF Support Team (2005). Functional Description of the Banff System for Edit and Imputation System. Quality Assurance and Generalized Systems Section Technical Report.
TUIR, Italian consolidated act on income taxes (in Italian).
VAT returns filling-in instructions. Document retrievable from the Italian Tax Authority website: http://www.agenziaentrate.gov.it (in Italian).

Thank you for your attention!
For further information: viviana.degiorgi@istat.it