Real-time Data Quality for SAP
Dietrich O. Banschbach
Manager, R&D EMEA
SAS International
Copyright © 2005, SAS Institute Inc. All rights reserved.

Agenda
• Overview
• dfConnector for SAP
• Scenarios
• Technology
• Additional Information

Overview: Companies
Companies involved:
• SAP AG: the world’s largest Enterprise Resource Planning (ERP) software company
• DataFlux Corporation (a SAS company): a leading provider of data management solutions consisting of data quality, data profiling, data integration, data augmentation and data monitoring

Overview: SAP partnership
• SAS is an SAP Software Partner with several SAP-certified interfaces
• DataFlux, an SAP Software Partner in its own right, has attained SAP interface certification for its DataFlux dfConnector for SAP product

dfConnector for SAP
• DataFlux dfConnector for SAP enhances data quality in SAP systems in real time
• Facilitates communication between SAP applications and DataFlux dfIntelliServer
• Offers transparent access from SAP applications to DataFlux dfIntelliServer services for data validation, standardization, deduplication, error-tolerant search, etc.
• Provides a remote function call (RFC) server that channels function calls from within SAP systems to dfIntelliServer and returns the results to SAP
• Provides a framework consisting of a set of DataFlux-supplied ABAP functions that map to dfIntelliServer functions. These can be called by any SAP application
• The functions can be used to build new data quality solutions in SAP, or extend existing ones, using DataFlux methods
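The RFC server's role, as described above, is to take a function call by name, hand it to the matching service, and return the result. The following is a minimal sketch of that routing idea only. It does not use the SAP Java Connector or the dfIntelliServer client API; the class FunctionDispatcher, the parameter names INPUT and RESULT, and the uppercase stand-in handler are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: routes a named function call to a registered handler. */
final class FunctionDispatcher {
    private final Map<String, Function<Map<String, String>, Map<String, String>>> handlers =
            new HashMap<>();

    /** Registers the handler that serves calls to the given function name. */
    void register(String functionName,
                  Function<Map<String, String>, Map<String, String>> handler) {
        handlers.put(functionName, handler);
    }

    /** Looks up the handler by name, passes it the parameters, returns its result. */
    Map<String, String> dispatch(String functionName, Map<String, String> params) {
        Function<Map<String, String>, Map<String, String>> handler = handlers.get(functionName);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for " + functionName);
        }
        return handler.apply(params);
    }

    /** Builds a dispatcher with one stand-in handler (assumed behavior: trim and uppercase). */
    static FunctionDispatcher demo() {
        FunctionDispatcher dispatcher = new FunctionDispatcher();
        dispatcher.register("/DATAFLUX/STANDARDIZE", params -> {
            Map<String, String> result = new HashMap<>();
            // INPUT and RESULT are hypothetical parameter names, not the real interface.
            result.put("RESULT", params.getOrDefault("INPUT", "").trim().toUpperCase(Locale.ROOT));
            return result;
        });
        return dispatcher;
    }
}
```

In a real deployment the handler body would be replaced by a call into the data quality server; the lookup-by-name step is the part this sketch is meant to show.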
dfConnector for SAP: Architecture
[Architecture diagram. Components shown: SAP Web Application Server with Business Add-In (BAdI, ABAP) API; RFC server, based on SAP Java Connector; dfIntelliServer (data quality algorithms, reference database); JDBC; search index (SAS, Oracle, MySQL, DB2, MS SQL)]

dfConnector for SAP: Framework
• Function modules written in ABAP use a standard “call function destination” to invoke a method that is not part of the current SAP system
• The “call function destination” invokes dfConnector listening at the specified destination
• dfConnector gathers all parameters and initiates the appropriate call into dfIntelliServer using its Java client API

dfConnector for SAP: Postal Address Validation
• ABAP programmers can use the framework functions in any SAP application
• As an example application that uses this framework, dfConnector for SAP supports postal address validation as defined in SAP’s BC-BASPV certification scenario
• Enhances SAP’s Business Address Services (formerly Central Address Management)
• dfConnector is “Certified for SAP NetWeaver”. Formally tested with R/3 Enterprise (4.7)
• Customer, vendor and other addresses in SAP are checked in real time for correct city names, street names, house numbers and zip codes
• Missing information is auto-completed from a reference database
• Quarterly adjustment process keeps addresses up to date via a batch run
  − Reports which addresses are correct and which ones could not be validated (stating the reason)
  − Process can be used to do initial validation of all addresses in SAP
dfConnector for SAP: Deduplication
• In addition to postal address validation, a duplicate check is carried out before a new entry can be saved in SAP
• Avoids multiple entries of the same customer or vendor name with slight differences in spelling
• Offers error-tolerant (fuzzy) search

Scenarios: Postal Address Validation
• This scenario enhances data quality within SAP in real time as address data is entered interactively
• Addresses are checked for correct:
  − city names
  − street names
  − house numbers
  − zip codes
• Input is standardized according to postal authority requirements (e.g. USPS rules)
• Missing information can be auto-completed

Scenario 1: Create new customer
• Create new customer in SAPGUI using standard SAP transaction XD01
• Fill in data:
  − Company name
  − City
  − Country
  − (No street)
• Required entry
• The field with missing information is colored and the cursor is positioned in that field
• Error message in status line
• Click on the “Check” button when all data has been entered
• Street name entered incorrectly (“Street” instead of “Drive”)
• Region required to resolve the address
• Address is validated by dfIntelliServer
  − City name converted to uppercase
  − Postal code (ZIP) added
  − Street name uppercased and standardized (DR=Drive)
  − District added automatically
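The standardization step in Scenario 1 (street name uppercased, “Drive” written as DR) can be illustrated with a small sketch. This is not how dfIntelliServer works internally; it is a toy version that uppercases the input and abbreviates the final word using a three-entry table. The class name StreetStandardizer and the table contents are assumptions. Correcting a wrong suffix (“Street” entered instead of “Drive”) needs reference address data and is outside what this sketch does.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: uppercases a street line and abbreviates its trailing suffix. */
final class StreetStandardizer {
    // Tiny illustrative table; a real postal rule set is far larger.
    private static final Map<String, String> SUFFIXES = new HashMap<>();
    static {
        SUFFIXES.put("DRIVE", "DR");
        SUFFIXES.put("PARKWAY", "PKWY");
        SUFFIXES.put("STREET", "ST");
    }

    static String standardize(String street) {
        // Normalize case and split on any run of whitespace.
        String[] tokens = street.trim().toUpperCase(Locale.ROOT).split("\\s+");
        if (tokens[0].isEmpty()) {
            return ""; // blank input: nothing to standardize
        }
        // Only the last token is treated as a suffix candidate.
        int last = tokens.length - 1;
        tokens[last] = SUFFIXES.getOrDefault(tokens[last], tokens[last]);
        return String.join(" ", tokens);
    }
}
```

Restricting the replacement to the last token keeps a street such as “Drive Street” from being abbreviated twice; whether that matches a given postal authority's rules would need checking against its published standard.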
Scenario 2: Creating a customer with minimal data entry
• Data entered in SAP:
  − Part of a street name with a spelling mistake
  − Postal code
  − Country (required by SAP)
• Partial street name with spelling mistake
• Basic postal code
• No region specified
• Address is validated by dfIntelliServer
  − City name uppercased
  − Postal code added (ZIP+4)
  − Street name uppercased and standardized (PKWY=Parkway)
  − Spelling mistake corrected
  − District added automatically
  − Region added automatically

Scenario 3: Inconsistent or unresolvable addresses
• Neither postal code nor city is specified
• The user insists on saving a record even though the entry could not be validated
• To ensure high availability of the SAP system, address data can still be entered and saved if dfConnector and/or dfIntelliServer are temporarily unavailable. Such entries are marked as not having been checked against official address reference data
• Those addresses can be corrected in the dfConnector Quarterly Address Adjustment process, which checks and updates addresses in batch mode
• Error message: No zip code and/or city specified
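The availability behavior in Scenario 3 (save the address anyway when the validation service cannot be reached, but mark it as unchecked) follows a common fail-open pattern. The sketch below shows that pattern in isolation. The names FailOpenAddressCheck, Validator, Status and Result are hypothetical, and the way the real product records the unchecked marker is not described in the slides.

```java
/** Hypothetical sketch: validate if possible, otherwise keep the input and flag it. */
final class FailOpenAddressCheck {
    enum Status { VALIDATED, NOT_CHECKED }

    /** Stand-in for the remote validation call; may fail when the service is down. */
    interface Validator {
        String validate(String address) throws Exception;
    }

    static final class Result {
        final String address;
        final Status status;

        Result(String address, Status status) {
            this.address = address;
            this.status = status;
        }
    }

    static Result check(String address, Validator validator) {
        try {
            // Normal path: store the validated form returned by the service.
            return new Result(validator.validate(address), Status.VALIDATED);
        } catch (Exception serviceUnavailable) {
            // Fail open: keep the user's input, flagged for a later batch run.
            return new Result(address, Status.NOT_CHECKED);
        }
    }
}
```

A later batch job can then select every record with status NOT_CHECKED and run it through validation once the service is back.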
Scenario 4: Duplicate search
• The following scenario shows the duplicate search and elimination capabilities of DataFlux dfConnector for SAP
• The scenario first shows how easily a small typo can create a duplicate customer record in the SAP database without dfConnector
• For comparison, the same process is then performed using dfConnector for SAP to identify potential duplicates and resolve the situation
• Using the standard SAP search, the user first checks whether the customer to be created already exists in SAP, but accidentally mistypes the street name (Wesston instead of Weston)
• The search returns no hits and the user proceeds under the assumption that the new customer will be unique
• The user creates and saves a new customer entry, thus creating a duplicate
• After that, the duplicate search capabilities of dfConnector are triggered. Based on matchcodes created by dfIntelliServer, potential duplicates are detected

Scenario 4: Duplicate search, transaction flow
• Address data is entered in SAPGUI. Postal address validation executes
• The /DATAFLUX/ADDR_SEARCH implementation of the BAdI “ADDRESS_SEARCH” is invoked
• Function module /DATAFLUX/DUPLICATE_CHECK searches for duplicates
• /DATAFLUX/DUPLICATE_CHECK calls dfConnector, which gathers the entered SAP data.
• Matchcodes are generated dynamically and a JDBC call is made to retrieve results from the external RDBMS.
• The results of the search are returned to dfConnector, which passes them to SAP to display a list of potential duplicates

Scenario 5: Quarterly adjustment process
• Quarterly Adjustment is a batch process that ensures address data stays up to date
• When new address data becomes available (e.g. from USPS), it can be activated in the system in three steps by running:
  − SAP report to get all addresses
  − DataFlux-provided report to check, standardize and auto-complete addresses
  − SAP report to write the updated addresses back to the SAP database
• The RSADRQU1 report scans all addresses for a certain country and inserts them into an index table
• /DATAFLUX/RSADRQU2 reads all SAP addresses from the index table and validates each address. Addresses are checked, auto-completed and standardized. If an address cannot be validated, it is flagged for later reporting purposes. The report indicates the level of address quality, i.e. how many addresses are correct and how many are incorrect
• RSADRQU3 writes validated and corrected addresses back to the operational SAP database. Alternatively, it reports the reason for not being able to write them back
• Checked addresses: + = ok, - = failed
• Summary
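The three-step quarterly adjustment above (collect addresses into an index table, validate and flag each one, write back the good ones and report the rest) can be sketched as a small in-memory pipeline. This is only an illustration of the control flow. It does not reproduce the RSADRQU1, /DATAFLUX/RSADRQU2 or RSADRQU3 reports; the class QuarterlyAdjustmentSketch, its IndexEntry type, and the convention that the validator signals failure with an IllegalArgumentException are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a collect / validate / write-back batch run. */
final class QuarterlyAdjustmentSketch {
    static final class IndexEntry {
        final String original;
        String corrected;      // set when validation succeeds
        String failureReason;  // set when validation fails

        IndexEntry(String original) {
            this.original = original;
        }
    }

    /** Step 1: copy all addresses into an index list. */
    static List<IndexEntry> buildIndex(List<String> addresses) {
        List<IndexEntry> index = new ArrayList<>();
        for (String address : addresses) {
            index.add(new IndexEntry(address));
        }
        return index;
    }

    /** Step 2: validate each entry; failures are flagged, not dropped. */
    static void validate(List<IndexEntry> index, UnaryOperator<String> validator) {
        for (IndexEntry entry : index) {
            try {
                String result = validator.apply(entry.original);
                if (result == null) {
                    entry.failureReason = "no result";
                } else {
                    entry.corrected = result;
                }
            } catch (IllegalArgumentException e) {
                entry.failureReason = e.getMessage();
            }
        }
    }

    /** Step 3: return addresses to write back; add one "+" or "-" line per entry to the report. */
    static List<String> writeBack(List<IndexEntry> index, List<String> report) {
        List<String> updated = new ArrayList<>();
        for (IndexEntry entry : index) {
            if (entry.corrected != null) {
                updated.add(entry.corrected);
                report.add("+ " + entry.corrected);
            } else {
                report.add("- " + entry.original + " (" + entry.failureReason + ")");
            }
        }
        return updated;
    }
}
```

Keeping failed entries in the index with a reason attached, rather than discarding them, is what makes the “+ = ok, - = failed” summary possible at the end of the run.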
Technology
• Java 1.4.x/1.5 to interface SAP with DataFlux dfIntelliServer 6 using SAP Java Connector 2.1.3
• ABAP programming to hook into the predefined interfaces (SAP Business Add-In) for address validation and deduplication
• SAP Add-on Assembly Kit (AAK) to allow for SAP certification (e.g. namespaces, installation, deployment, upgrade, etc.)
• Search index creation in SAS data sets or in any external JDBC-compliant RDBMS

Technology: dfConnector Framework Functions
• /DATAFLUX/AREA_CODE
• /DATAFLUX/DETERMINE_GENDER
• /DATAFLUX/DETERMINE_LOCALE
• /DATAFLUX/DETERMINE_ENTITY
• /DATAFLUX/DIRECTORY_SEARCH
• /DATAFLUX/DUPLICATE_CHECK
• /DATAFLUX/GENERATE_MATCHCODE
• /DATAFLUX/GEN_MATCHCODE_PARSED
• /DATAFLUX/GEOCODE
• /DATAFLUX/LOOKUP_COUNTY
• /DATAFLUX/LOOKUP_PHONE
• /DATAFLUX/PARSE
• /DATAFLUX/QUERY_SERVER
• /DATAFLUX/STANDARDIZE
• /DATAFLUX/STANDARDIZE_PARSED
• /DATAFLUX/STANDARDIZE_SCHEME
• /DATAFLUX/DELETE_INDEX_ENTRY
• /DATAFLUX/VERIFY_ADDRESS
• /DATAFLUX/MAINTAIN_INDEX_ENTRY

Technology: /DATAFLUX/VERIFY_ADDRESS
• Input data
• Results

Technology: External Search Index
• The external search index can be stored in any RDBMS that supports the JDBC interface
• Examples:
  − SAS data sets
  − MySQL
  − Microsoft SQL Server
  − MaxDB (formerly known as SAP DB)
  − Oracle
  − ...
• Example: Stored in SAS
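The search index stores matchcodes so that slightly different spellings of the same value resolve to the same key. The sketch below illustrates that idea with an in-memory map standing in for the external RDBMS table. The matchcode function here is a deliberately simple toy (uppercase, drop punctuation and spaces, collapse repeated letters) and is not the DataFlux matchcode algorithm; the class MatchcodeIndexSketch and its method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: a toy matchcode plus an in-memory index keyed by it. */
final class MatchcodeIndexSketch {
    // Stand-in for the external index table: matchcode -> record ids.
    private final Map<String, List<String>> index = new HashMap<>();

    /** Toy matchcode: uppercase, keep letters and digits, collapse doubled letters. */
    static String matchcode(String value) {
        StringBuilder code = new StringBuilder();
        char previous = 0;
        for (char c : value.toUpperCase(Locale.ROOT).toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                continue; // skip spaces and punctuation
            }
            if (!(Character.isLetter(c) && c == previous)) {
                code.append(c);
            }
            previous = c;
        }
        return code.toString();
    }

    /** Adds a record id under the matchcode of the given value. */
    void add(String recordId, String value) {
        index.computeIfAbsent(matchcode(value), key -> new ArrayList<>()).add(recordId);
    }

    /** Returns ids of stored records whose matchcode equals that of the input. */
    List<String> findCandidates(String value) {
        return index.getOrDefault(matchcode(value), Collections.<String>emptyList());
    }
}
```

With this toy rule, “Wesston Parkway” and “Weston Parkway” produce the same key, so the typo from Scenario 4 would surface the existing record as a candidate. A production matchcode has to tolerate many more kinds of variation than doubled letters.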
Technology: RFC server platforms
SAP-supported Java Connector (“JCo”) platforms (used by the RFC server component of dfConnector):
• Windows NT SP4 or later, Win 2000, XP, Win 2003 Server
• Sun Solaris/SPARC 8 or later
• IBM AIX 4.3 or later
• HP-UX 11.0 or later (PA-RISC processors only)
• OS/400 V5R1 or later (not for SAP JCo 2.0.5)
• COMPAQ Tru64 5.0 or later (not for SAP JCo 2.1.x)
• Z/Linux on S/390 (Linux / Z-series GLIBC 2.2.4 or later)
• Linux Kernel 2.2.14 or later (Intel-compatible processors)

Additional Information
• SUGI Birds-of-a-Feather (BoF) session “Enhancing SAP with SAS”, room 107, Tuesday at 6 p.m.
• www.dataflux.com