IRS XML Standards & Tax Return Data Strategy
For External Discussion
June 30, 2010

Why has the IRS decided to make the change now?

1. When MeF was first deployed, industry standards and best practices were in the early stages of development. Over time, stable and well-adopted common practices for XML reference models and NDR have evolved across numerous industry sectors and government agencies based on the Core Component Technical Specification (ISO 15000-5).
2. The IRS standards are aligned with the National Information Exchange Model (NIEM), which is now recognized and promoted by the Federal CIO’s Council as an excellent framework for XML data exchange. These standards will allow the IRS to ensure consistency and rapidly integrate published forms, business services, and compliance systems across all data sources, including MeF.

What we hope to accomplish

[Diagram: vocabulary layers, top to bottom]
• Applications
• Interfaces (Services, API, GUI, batch, etc.)
• Messages (Request/Response/Error)
• Components (aggregated Terms): logical chunks of data (e.g., Address); Class/Complex Data Type
• Terms (e.g., City): Attribute/Element

TODAY
Large vocabulary with numerous redundant terms, message components, messages, and interfaces.
WHY: The vocabulary is distributed across project models.
A small set of business applications results in a large distributed vocabulary.

TOMORROW
Each layer is assembled from components at the lower levels by following the Naming and Design Rules and discovering re-usable components.
The vocabulary is shared across project models.
A controlled vocabulary supports a broad portfolio of business applications with a shorter time to market.

How this is accomplished

[Diagram labels: Extension & Revision; IRS XML Registry; Registry stored in Repository; IRS XML Vocabulary; EDMO Governance & Harmonization Process; Metadata Repository; Service Registry; User Interface; XML Editor; Quality of Design tool; Register; Import; Industry Schema; Project XML Development; EDMO Action; Guidance; Project Action; Edit/Compose; IRS XML NDR]

Scope

There are 2 significant topics:
1. IRS XML Standards: Conformance to IRS XML standards in the development and deployment of the Phase II Form 4868 and Phase III 1040 schema
2. Integration of IRS Forms and Schema

This briefing is intended to provide a synopsis of planned changes:
• MeF schema (XSD) and instance documents (XML)
• Use of an IRS Common XML Vocabulary
• Documentation
• The IRS Tax Return Data Consolidation Strategy
• XML Publishing (integration of IRS forms and schema)
• Integration of MeF schema with IRS back end systems

Topic #1: IRS XML Standards

• The Good news: MeF Vocabulary and XML document structure are aligned with IRS XML Standards in current Phase II development
  – 95% of MeF Phase II 1040 XML elements will be in the IRS Common vocabulary. Resolving the remaining 5% would be beneficial but is not necessary to move forward
  – Prior year and shared form schema are interoperable with IRS XML Standards compliant schema, allowing for transition through annual maintenance (eFile Types were promoted into the common vocabulary)
  – The IRS has worked to minimize impact by building from existing MeF practices (e.g. single namespace, version management at the file directory level, and use of eFile Types)
• So, what changes? Schema will be composed of global elements defined in a reference library referred to as the IRS XML Common Vocabulary.
As MeF expands across multiple form families and tax years, these standards become essential to our ability to perform data analysis:
  – Produce a single data dictionary across all form schema
  – Search, discover, and reuse authoritative terms
  – Integrate published form components with XML vocabulary components

Summary of Changes

• Schema (XSD)
  – Locally defined elements convert to Globally defined elements and types (eFile Types on steroids)
  – Statement and attachment attributes convert to elements
  – Some element names change for harmonization into IRS XML Common Vocabulary and consistency across forms
• Instance Documents (XML)
  – Some element names change for harmonization into IRS XML Common Vocabulary and consistency across forms
  – Statement and attachment attributes convert to elements
• Documentation
  – Form level documentation will need to be maintained separate from the form schema to allow for the reuse of global elements
  – The integration of the schema and PDF will provide an authoritative mapping of schema elements to the published Tax Form

What Are The Benefits?

• Integrated form and schema design
• Consistent use of terms
• Quicker deployment of schema
• Improved accuracy for data (form) requirements

Example: Form 4868 (Current Practice vs Planned)

Form 4868 Schema Composition (Current Practice vs Planned)

The composition of the schema will change with the implementation of the IRS XML Standards.

Example: Form 8812 (Current Practice vs Planned)

In many cases the IRS XML Standards will result in few, if any, changes to the instance documents.

Example: Schedule C (Current Practice vs Planned)

Some instance documents differ because the IRS XML Standards restrict the use of attributes and require consistent use of terms. This may lead to the addition of complex types or name changes for harmonization with the IRS Common XML Vocabulary.
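The "attributes convert to elements" change can be illustrated with a small Python sketch. It is only an assumption about what such a conversion might look like: the tag and attribute names (`OtherExpense`, `statementId`, `Value`) are hypothetical and are not taken from any MeF schema, and the actual IRS conversion rules may differ.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem: ET.Element) -> ET.Element:
    """Return a copy of elem in which every attribute becomes a child element."""
    converted = ET.Element(elem.tag)
    # Each former attribute becomes a child element (name capitalized, sorted order).
    for name, value in sorted(elem.attrib.items()):
        child = ET.SubElement(converted, name[0].upper() + name[1:])
        child.text = value
    # Text content moves into an explicit child; "Value" is a hypothetical tag name.
    if elem.text and elem.text.strip():
        ET.SubElement(converted, "Value").text = elem.text.strip()
    # Existing child elements are converted recursively.
    for sub in elem:
        converted.append(attributes_to_elements(sub))
    return converted


# Hypothetical "current practice" fragment: data carried in an attribute.
current = ET.fromstring('<OtherExpense statementId="STMT01">250</OtherExpense>')
planned = attributes_to_elements(current)
planned_xml = ET.tostring(planned, encoding="unicode")
print(planned_xml)
```

Because the element now holds child elements instead of plain text, its schema type has to become a complex type, which matches the slide's remark that the change may lead to the addition of complex types.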
Example: Form 2106 (Current Practice vs Planned)

In this example the terms for the Form 2106 were harmonized with the Schedule C. The “planned” schema for the Schedule C and the 2106 now use the same tag names.

Topic #2: XML Publishing Integration

Current Practice
• Tax law specialists (TLS) embed data requirements in forms, instructions & publications
• The Published form is designed for a paper filer
• Data requirements are mined from forms, instructions and publications then translated to a record layout
• The record layout is translated to an XML schema
• The XML is translated back to the published form
• ETA publishes schema for eFile stakeholders
• Internal and external systems custom design presentation
• MeF stylesheet development is designed based on the published form

Planned XML Publishing Practice
• Tax law specialists (TLS) document data requirements
• Data requirements mapped to common vocabulary and the schema is composed
• Form updated and schema bound to form
• IRS publishes schema and form
• Internal and external systems reuse published forms
• MeF stylesheet development is streamlined due to the integration of the form and schema with authoritative binding of data elements
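The planned step "data requirements mapped to common vocabulary and the schema is composed" might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the IRS tooling: the vocabulary entries, the form names, and the `compose_form_schema` helper are all hypothetical, and for brevity the global elements are declared in the same schema rather than in a separate reference library.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical common vocabulary: global element name -> XSD built-in type.
COMMON_VOCABULARY = {
    "CityNm": "xs:string",
    "StateAbbreviationCd": "xs:string",
    "TotalExpensesAmt": "xs:integer",
}


def compose_form_schema(form_name, term_names, vocabulary):
    """Build a schema whose form element only references global vocabulary terms."""
    missing = [t for t in term_names if t not in vocabulary]
    if missing:
        # A term outside the vocabulary would need harmonization first.
        raise KeyError(f"terms not in common vocabulary: {missing}")
    schema = ET.Element(f"{{{XS}}}schema")
    # Global element declarations (kept inline here for simplicity).
    for term in term_names:
        ET.SubElement(schema, f"{{{XS}}}element", name=term, type=vocabulary[term])
    # The form itself is a sequence of references to those global elements.
    form = ET.SubElement(schema, f"{{{XS}}}element", name=form_name)
    complex_type = ET.SubElement(form, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for term in term_names:
        ET.SubElement(sequence, f"{{{XS}}}element", ref=term)
    return schema


# Two hypothetical forms reuse the same term, so they share the same tag name.
form_a = compose_form_schema("HypotheticalFormA", ["CityNm", "TotalExpensesAmt"], COMMON_VOCABULARY)
form_b = compose_form_schema("HypotheticalFormB", ["TotalExpensesAmt"], COMMON_VOCABULARY)
print(ET.tostring(form_a, encoding="unicode"))
```

Because both forms reference `TotalExpensesAmt` from one vocabulary, a single data dictionary entry covers both, which is the kind of reuse the Form 2106 and Schedule C harmonization example describes.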