Hollenbeck_Johnson_Ludlum

Improving the Quality of Tax Statistics:
Recent Innovations in Editing and
Imputation Techniques at the Statistics of
Income Division of the U.S. Internal
Revenue Service
Scott Hollenbeck – Scott.M.Hollenbeck@irs.gov
Barry Johnson – Barry.W.Johnson@irs.gov
Melissa Ludlum – Melissa.R.Ludlum@irs.gov
Today’s Presentation
 Overview of Statistics of Income (SOI)
 Dealing with Missing Data
 Recent Innovations
 Future Plans
What Does SOI Do?
 Primary source of U.S. tax data
 Data from 110 tax returns and information documents
 Test and correct data collected during administrative
processing (IRS Masterfile)
 Collect extensive additional data from forms, schedules
and attachments
 Most projects collect data from samples
 Products
    - Micro data files for U.S. Treasury Department & Congress
    - Public-use files
    - Tables and analysis (www.irs.gov/taxstats)
SOI Data Collection Systems
 Maintains computer network separate from
main IRS processing
 Data collection takes place in IRS
Submissions Processing Centers
 Graphical User Interface (GUI) systems
based in ORACLE
 Data tested for internal consistency
 Post-edit processing overseen by
headquarters’ staff
Three Major SOI Programs
 Individual Income Tax
    - Filed by individuals and married couples to report most forms of personal income
    - 133 million returns filed in 2006
 Corporation Income Tax
    - Filed by incorporated businesses to report income from parent corporation and subsidiaries
    - 2.5 million returns filed in 2006
 Tax-exempt Organizations
    - Annual information returns report assets, income, expenses
    - 833,000 returns filed in 2006
Missing Data – Unit Nonresponse
 Causes
    - Extensions/late-filed returns
    - Tax evasion
 Strategies
    - Update values from prior year using survey responses
    - Utilize records for recent prior years filed during the selection period
Missing Data – Item Nonresponse
 Causes
    - Taxpayer neglects to provide attachments
    - Paper return is being used by another IRS function
 Strategies
    - Use IRS Masterfile data for key values
    - Impute values based on existing data and information provided on prior and/or subsequent return
    - Surveys and direct contact with preparers
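
The item-nonresponse strategies above can be read as a fallback chain. The following is a minimal sketch only: the function name, the argument names and the priority order (reported value, then IRS Masterfile value, then prior-year value) are assumptions made for illustration, not SOI's documented rules.

```python
from typing import Optional, Tuple

def impute_item(reported: Optional[float],
                masterfile: Optional[float],
                prior_year: Optional[float]) -> Tuple[Optional[float], str]:
    """Return (value, source), taking the first source that has a value.

    Assumed priority (hypothetical): taxpayer-reported value, then the
    IRS Masterfile value, then the value carried over from a prior return.
    """
    candidates = (
        (reported, "reported"),
        (masterfile, "masterfile"),
        (prior_year, "prior_year"),
    )
    for value, source in candidates:
        if value is not None:
            return value, source
    # Nothing available: leave the item for a survey or preparer contact.
    return None, "unresolved"
```

Recording the source alongside the value keeps imputed items distinguishable from reported ones in later analysis.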
What’s New?
 Digital images of tax returns
 Electronic filing
 Automated error correction/imputation routines
Digital Return Images
 In 1998 SOI began scanning operations
 Images stored in Tagged Image File Format (TIFF)
 In 2006, imaged more than 71.5 million pages
from 30 different tax and information returns
 Many users:
    - SOI headquarters staff
    - SOI edit operations
    - IRS Functions
    - General Public (tax-exempt organizations only)
Split-Screen Edit Systems
 Combines scanned image and GUI edit
system on a single 24 inch wide-aspect
monitor
 Image displayed using Adobe Acrobat or
specially adapted ORACLE programs
 Image and edit systems are synchronized
 Online access to instructions, dictionaries,
other tools
Split-Screen Edit Systems
 Positive feedback from editors
 Slight overall improvement in productivity and
quality
 Images available to geographically dispersed
work force
 Reduced storage of paper documents
 Reduced impact on other IRS functions
Electronic Filing of Tax Returns
 2004 Modernized electronic filing (MeF) began
 Uses Extensible Markup Language (XML) to capture:
    - Numeric and character strings supplied by taxpayer
    - Information tags
 2005 mandatory e-file for large business and tax-exempt organizations
    - 20.5% SOI sample of corporate income taxes
    - 13.5% SOI sample of tax-exempt organizations
SOI Use of MeF Data
 In 2006, SOI developed programs to render
digital images from XML data
 Edit returns using split-screen applications
 In 2007, SOI will populate ORACLE data tables directly with XML data
    - Editors will validate data, supply codes and allocate certain data items
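
Populating data tables directly from tagged XML amounts to flattening each return into a row keyed by tag name. The sketch below uses Python's standard `xml.etree.ElementTree`. The element names (`Return`, `TaxYear`, `TotalAssets`, `TotalIncome`) and the set of numeric fields are invented for illustration and are not the MeF schema. No database is touched here.

```python
import xml.etree.ElementTree as ET

# Hypothetical return document; real MeF XML uses its own schema.
SAMPLE = """<Return>
  <TaxYear>2006</TaxYear>
  <TotalAssets>1250000</TotalAssets>
  <TotalIncome>310000</TotalIncome>
</Return>"""

# Assumed list of tags that should be stored as integers.
NUMERIC_FIELDS = {"TotalAssets", "TotalIncome"}

def xml_to_row(xml_text):
    """Flatten the direct children of the root element into a dict."""
    root = ET.fromstring(xml_text)
    row = {}
    for child in root:
        text = (child.text or "").strip()
        if child.tag in NUMERIC_FIELDS:
            # An empty numeric tag becomes None so an editor can review it.
            row[child.tag] = int(text) if text else None
        else:
            row[child.tag] = text
    return row

row = xml_to_row(SAMPLE)
```

A row in this shape could then be handed to whatever loader writes the data tables, with editors reviewing any `None` values.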
Electronic Filing of Tax Returns
 Individual income tax returns
    - 1986 – E-file through paid preparers
    - 1992 – E-file from home computers allowed
    - 1994 – 98% of all filers eligible to e-file
    - 2006 – 73 million returns, or 54%, e-filed
    - Data stored in Tax Return Database (TRDB)
        - ASCII data, not tagged XML
        - 2010 – Scheduled for conversion to MeF
SOI Individual Income Tax Program
 Sample of returns processed differently depending on certain criteria
    - Edited returns
    - “Missing returns”
    - Forced closed returns
Individual Processing Programs
 Online editing system – editors transcribe,
code and review any potential data
discrepancies
 Post Edit Reconciliation Process (PERP) –
automated computer program which validates
and adjusts data
Edited Returns
 Edited returns are processed through the online
editing system by an editor, then reviewed
using the PERP program
 Prior to Tax Year 2004, all sampled returns
which were not “missing” were manually edited
 Currently only paper returns and electronically
filed returns with specific characteristics are
edited through online system
“Missing Returns”
 Each year, approximately 250 paper returns
selected for the sample are not located
 Limited IRS Masterfile data available
 PERP program used to impute missing
details of forms and schedules
Forced Closed Returns
 Automated processing of certain E-filed
returns in the SOI sample
 Bypass the online editing system and are
processed through the PERP program
 Returns with possible discrepancies are
reviewed by National Office analyst
 Returns that pass all tests are considered
“forced closed” and added to final data file
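
The routing described on this slide can be sketched as a set of consistency tests applied to each e-filed return: a return that passes every test is forced closed, and any failure sends it to an analyst. The test names, field names and rules below are hypothetical, since the actual PERP tests are not listed in this presentation.

```python
def route_return(record, tests):
    """Return (status, failed_test_names) for one sampled return."""
    failures = [name for name, check in tests.items() if not check(record)]
    if failures:
        return "analyst_review", failures
    return "forced_closed", []

# Hypothetical consistency tests on hypothetical fields.
TESTS = {
    "income_adds_up": lambda r: r["wages"] + r["interest"] == r["total_income"],
    "tax_nonnegative": lambda r: r["tax"] >= 0,
}

clean = {"wages": 50_000, "interest": 200, "total_income": 50_200, "tax": 6_000}
suspect = {"wages": 50_000, "interest": 200, "total_income": 52_000, "tax": 6_000}
```

Here `route_return(clean, TESTS)` yields a forced-closed status, while `suspect` is sent to review with the name of the failing test attached.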
Results from Forced Closing Returns
 Tax Year 2004 – First year using automated
closing of selected electronically filed returns
 Total sample size – 200,295 returns
 Electronically filed – 64,670 returns
 “Forced Closed” – 18,193 returns
 Editing hours saved – 1,400 hours
Results from Forced Closing Returns
 Tax Year 2005 – Second year of program,
expanded criteria for returns eligible to be
“forced closed”
 Total sample size – 292,837 returns
 Electronically filed – 114,897 returns
 “Forced Closed” – 47,753 returns
 Editing hours saved – 4,100 hours
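
The figures on the two results slides can be turned into rates for easier comparison. The short calculation below uses only the numbers reported above; the derived measures (share of e-filed returns forced closed, share of the total sample, minutes of editing saved per forced-closed return) are an added summary and were not part of the original slides.

```python
# Figures as reported for Tax Years 2004 and 2005.
results = {
    2004: {"sample": 200_295, "efiled": 64_670,
           "forced_closed": 18_193, "hours_saved": 1_400},
    2005: {"sample": 292_837, "efiled": 114_897,
           "forced_closed": 47_753, "hours_saved": 4_100},
}

def summarize(r):
    """Derive simple rates from one year's reported counts."""
    return {
        "share_of_efiled": r["forced_closed"] / r["efiled"],
        "share_of_sample": r["forced_closed"] / r["sample"],
        "minutes_saved_per_return": 60 * r["hours_saved"] / r["forced_closed"],
    }

summary = {year: summarize(r) for year, r in results.items()}

for year, s in summary.items():
    print(year,
          f"{s['share_of_efiled']:.1%} of e-filed,",
          f"{s['share_of_sample']:.1%} of sample,",
          f"{s['minutes_saved_per_return']:.1f} min saved per return")
```

On these figures, roughly 28% of sampled e-filed returns were forced closed in Tax Year 2004 and roughly 42% in Tax Year 2005, with about 5 minutes of editing time saved per forced-closed return in each year.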
The Future – Data
 More returns and information documents will
be filed electronically
 Optical Character Recognition or Intelligent
Character Recognition will be used to capture
data from paper-filed returns
 Data will be available in real time
 Enable larger sample sizes and increased
use of population files
The Future – Field Operations
 Increased resources dedicated to resolving
data inconsistencies as opposed to data
transcription
 Paperless environment – use of electronic
data or digital images created from paper
returns
 Increased use of prior year data to identify
and correct data anomalies
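
One simple way prior-year data could be used to identify anomalies is to flag items whose value changes by an implausibly large factor from one year to the next, as happens when extra digits are keyed in error. This is an illustrative sketch: the function name and the factor-of-ten threshold are assumptions, not an SOI rule.

```python
def flag_anomaly(current, prior, max_ratio=10.0):
    """Flag a value whose magnitude differs from the prior year's
    by more than max_ratio in either direction (assumed threshold)."""
    if current == 0 or prior == 0:
        # No ratio can be formed; flag only if exactly one value is zero.
        return current != prior
    ratio = abs(current) / abs(prior)
    return ratio > max_ratio or ratio < 1.0 / max_ratio
```

For example, a reported 1,000,000 against a prior-year 10,000 would be flagged for review, while 12,000 against 10,000 would not.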
The Future – Products
 Improvements in technology and increased
use of electronic filing will allow SOI to
produce more data, more quickly and more
efficiently
 Increased sample sizes will allow small area
estimates
 Population files will allow for creation of ad
hoc panels, linkage of data items across tax
form types and research on infrequent data
items