Real-time Data Quality
for SAP
Dietrich O. Banschbach
Manager, R&D EMEA
SAS International
Copyright © 2005, SAS Institute Inc. All rights reserved.
Agenda
• Overview
• dfConnector for SAP
• Scenarios
• Technology
• Additional Information
Overview: Companies
• Companies involved:
  − SAP AG: the world’s largest Enterprise Resource Planning (ERP) software company
  − DataFlux Corporation (a SAS company): a leading provider of data management solutions covering data quality, data profiling, data integration, data augmentation and data monitoring
Overview: SAP partnership
• SAS is an SAP Software Partner with several SAP-certified interfaces
• DataFlux, an SAP Software Partner in its own right, has attained SAP interface certification for its DataFlux dfConnector for SAP product
dfConnector for SAP
• DataFlux dfConnector for SAP enhances data quality in SAP systems in real time
• Facilitates communication between SAP applications and DataFlux dfIntelliServer
• Offers transparent access from SAP applications to DataFlux dfIntelliServer services for data validation, standardization, deduplication, error-tolerant search, etc.
dfConnector for SAP
• Provides a remote function call (RFC) server that channels function calls from within SAP systems to dfIntelliServer and returns the results to SAP
• Includes a framework of DataFlux-supplied ABAP functions that map to dfIntelliServer functions and can be called by any SAP application
• These functions can be used to build new data quality solutions in SAP, or extend existing ones, using DataFlux methods
dfConnector for SAP: Architecture
[Architecture diagram. Components shown: SAP Web Application Server with a Business Add-In (BAdI, ABAP); API; RFC server, based on SAP Java Connector; dfIntelliServer (data quality algorithms, reference database); search index accessed via JDBC in SAS, Oracle, MySQL, DB2 or MS SQL]
dfConnector for SAP: Framework
• Function modules written in ABAP use a standard “call function destination” to invoke a method that is not part of the current SAP system
• The “call function destination” invokes dfConnector, which listens at the specified destination
• dfConnector gathers all parameters and initiates the appropriate call into dfIntelliServer using its Java client API
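The call path on this slide can be pictured with a small sketch. It is illustrative Python rather than the product's Java code: the `dispatch` function, the `INPUT`, `RESULT`, `MATCHCODE` and `ERROR` parameter names, and the two stand-in handlers are all hypothetical, and the handlers only simulate what dfIntelliServer might return. Only the two function module names are taken from the framework list in this deck.

```python
from typing import Callable, Dict

Params = Dict[str, str]


# Hypothetical stand-ins for dfIntelliServer services. In the product these
# would be reached through the dfIntelliServer Java client API.
def standardize(params: Params) -> Params:
    return {"RESULT": params["INPUT"].strip().upper()}


def generate_matchcode(params: Params) -> Params:
    # Toy matchcode: keep letters and digits only, uppercased.
    code = "".join(c for c in params["INPUT"].upper() if c.isalnum())
    return {"MATCHCODE": code}


# Map ABAP function module names to the handler that serves them.
HANDLERS: Dict[str, Callable[[Params], Params]] = {
    "/DATAFLUX/STANDARDIZE": standardize,
    "/DATAFLUX/GENERATE_MATCHCODE": generate_matchcode,
}


def dispatch(function_name: str, params: Params) -> Params:
    """Route one incoming remote function call to its handler and return
    the parameters that would be passed back to SAP."""
    handler = HANDLERS.get(function_name)
    if handler is None:
        return {"ERROR": f"unknown function {function_name}"}
    return handler(params)


if __name__ == "__main__":
    print(dispatch("/DATAFLUX/STANDARDIZE", {"INPUT": " main street "}))
```

A table of handlers keyed by function name keeps the routing step separate from the services themselves, which mirrors the split between the RFC server and dfIntelliServer.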
dfConnector for SAP: Postal Address Validation
• ABAP programmers can use the framework functions in any SAP application
• As an example application of this framework, dfConnector for SAP supports postal address validation as defined in SAP’s BC-BAS-PV certification scenario
• Enhances SAP’s Business Address Services (formerly Central Address Management)
• dfConnector is “Certified for SAP NetWeaver” and was formally tested with R/3 Enterprise (4.7)
dfConnector for SAP: Postal Address Validation
• Customer, vendor and other addresses in SAP are checked in real time for correct city names, street names, house numbers and ZIP codes
• Missing information is auto-completed from a reference database
• A quarterly adjustment process keeps addresses up to date via a batch run
  − Reports which addresses are correct and which could not be validated (stating the reason)
  − Can also be used for the initial validation of all addresses in SAP
dfConnector for SAP: Deduplication
• In addition to postal address validation, a duplicate check is carried out before a new entry can be saved in SAP
• Avoids multiple entries for the same customer or vendor whose names differ only slightly in spelling
• Offers error-tolerant (fuzzy) search
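To give a feel for what an error-tolerant search does, here is a minimal Python sketch built on the standard library's `difflib`. It is not the DataFlux method: the similarity measure, the `fuzzy_candidates` helper, the 0.8 cutoff and the company names are all assumptions chosen for illustration.

```python
from difflib import get_close_matches
from typing import List


def fuzzy_candidates(name: str, existing: List[str],
                     cutoff: float = 0.8) -> List[str]:
    """Return existing names that resemble `name` closely enough to be
    worth showing to the user as possible duplicates."""
    # Compare case-insensitively, but return the names as stored.
    by_lowercase = {entry.lower(): entry for entry in existing}
    hits = get_close_matches(name.lower(), list(by_lowercase),
                             n=5, cutoff=cutoff)
    return [by_lowercase[hit] for hit in hits]


if __name__ == "__main__":
    stored = ["Acme Corporation", "Globex Inc"]
    # The transposed letters in the query do not prevent a match.
    print(fuzzy_candidates("Acme Corporatoin", stored))
```

An exact-match lookup would miss the misspelled query entirely, which is the situation the duplicate check is meant to catch.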
Scenarios: Postal Address Validation
• This scenario enhances data quality within SAP in real time, as address data is entered interactively
• Addresses are checked for correct:
  − city names
  − street names
  − house numbers
  − ZIP codes
• Input is standardized according to postal authority requirements (e.g. USPS rules)
• Missing information can be auto-completed
Scenario 1: Create new customer
• Create a new customer in SAP GUI using the standard SAP transaction XD01
• Fill in data:
  − Company name
  − City
  − Country
  − (No street)
Scenario 1: Create new customer
[Screenshots of the SAP GUI customer entry screen, with these callouts:]
• Required entry
• The field with missing information is colored and the cursor is positioned in that field; an error message appears in the status line
• Click on the “Check” button when all data has been entered
• Street name entered incorrectly (“Street” instead of “Drive”)
• Region required to resolve the address
Scenario 1: Create new customer
• Address is validated by dfIntelliServer:
  − City name converted to uppercase
  − Postal code (ZIP) added
  − Street name uppercased and standardized (DR = Drive)
  − District added automatically
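The corrections listed on this slide can be mimicked with a toy lookup against reference data. The Python sketch below is an assumption-laden illustration, not the dfIntelliServer logic: the one-row `REFERENCE` table, its fictional address, the field names and the matching rule (same city and region, same street name regardless of street type) are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical one-row reference table with a fictional address. Real
# reference data would come from the postal authority.
REFERENCE = [
    {"street": "EXAMPLE DR", "city": "SAMPLETOWN", "region": "XX",
     "postal_code": "00000", "district": "SAMPLE COUNTY"},
]

# Street-type abbreviations. DR and PKWY appear on the slides; ST is added
# here for the example.
STREET_TYPES = {"DRIVE": "DR", "STREET": "ST", "PARKWAY": "PKWY"}


def standardize_street(street: str) -> str:
    """Uppercase the street and abbreviate a trailing street type."""
    words = street.upper().split()
    if words and words[-1] in STREET_TYPES:
        words[-1] = STREET_TYPES[words[-1]]
    return " ".join(words)


def validate(address: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the completed reference address, or None if nothing matches."""
    street = standardize_street(address.get("street", ""))
    city = address.get("city", "").upper()
    region = address.get("region", "").upper()
    name = street.rsplit(" ", 1)[0]  # street name without its type word
    for ref in REFERENCE:
        same_place = ref["city"] == city and ref["region"] == region
        same_name = ref["street"].rsplit(" ", 1)[0] == name
        if same_place and same_name:
            # The reference row supplies the corrected street type,
            # the postal code and the district.
            return dict(ref)
    return None


if __name__ == "__main__":
    entered = {"street": "Example Street", "city": "Sampletown", "region": "xx"}
    print(validate(entered))
```

Here the wrong street type ("Street" for "Drive") is replaced by the reference value, and the postal code and district are filled in from the same row.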
Scenario 2: Creating a customer with minimal data entry
• Data entered in SAP:
  − Part of a street name, with a spelling mistake
  − Postal code
  − Country (required by SAP)
Scenario 2: Creating a customer with minimal data entry
[Screenshot of the entry screen, with these callouts:]
• Partial street name with spelling mistake
• Basic postal code
• No region specified
Scenario 2: Creating a customer with minimal data entry
• Address is validated by dfIntelliServer:
  − City name uppercased
  − Postal code added (ZIP+4)
  − Street name uppercased and standardized (PKWY = Parkway), with the spelling mistake corrected
  − District added automatically
  − Region added automatically
Scenario 3: Inconsistent or unresolvable addresses
• Neither postal code nor city is specified
• The user insists on saving a record even though the entry could not be validated
• To ensure high availability of the SAP system, address data can still be entered and saved if dfConnector and/or dfIntelliServer are temporarily unavailable. Such entries are marked as not having been checked against official address reference data, and can be corrected later by the dfConnector Quarterly Address Adjustment process, which checks and updates addresses in batch mode
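The availability rule in the last bullet amounts to a "save anyway, but mark as unchecked" fallback. A minimal Python sketch of that idea follows; the `ServiceUnavailable` exception, the `save_address` helper and the `checked` field are hypothetical names, not part of dfConnector.

```python
from typing import Callable, Dict

Address = Dict[str, str]


class ServiceUnavailable(Exception):
    """Raised by the (hypothetical) validation client when it cannot
    reach the validation service."""


def save_address(address: Address,
                 validate: Callable[[Address], Address]) -> Address:
    """Validate if possible; otherwise keep the entry and mark it unchecked."""
    try:
        record = dict(validate(address))
        record["checked"] = "yes"
    except ServiceUnavailable:
        record = dict(address)
        # Left for a later batch run to check and correct.
        record["checked"] = "no"
    return record


def offline_validator(address: Address) -> Address:
    """Stand-in for a validator whose backend is temporarily down."""
    raise ServiceUnavailable("validation service not reachable")


if __name__ == "__main__":
    print(save_address({"city": "Sampletown"}, offline_validator))
```

Data entry is never blocked by the outage, and the marker makes the unchecked records easy to find again in batch mode.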
Scenario 3: Inconsistent or unresolvable addresses
[Screenshot callout: error message “No zip code and/or city specified”]
Scenario 4: Duplicate search
• This scenario shows the duplicate search and elimination capabilities of DataFlux dfConnector for SAP
• It first shows how easily a small typo can create a duplicate customer record in the SAP database without dfConnector
• For comparison, the same process is then performed with dfConnector for SAP, which identifies potential duplicates so the situation can be resolved
Scenario 4: Duplicate search
• Using the standard SAP search, the user first checks whether the customer to be created already exists in SAP, but accidentally mistypes the street name (Wesston instead of Weston)
Scenario 4: Duplicate search
• The search returns no hits, so the user proceeds on the assumption that the customer does not yet exist
• The user creates and saves a new customer entry, thus creating a duplicate
Scenario 4: Duplicate search
• With dfConnector in place, the duplicate search is then triggered and detects potential duplicates based on matchcodes created by dfIntelliServer
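A matchcode reduces a value to a simplified key so that near-identical spellings compare as equal. The Python sketch below uses a deliberately crude rule (collapse repeated letters, drop vowels after the first character) to show the idea. It is an assumption for illustration only and is not the algorithm dfIntelliServer uses; `toy_matchcode` is a hypothetical name.

```python
import re


def toy_matchcode(text: str) -> str:
    """Toy matchcode, NOT the DataFlux algorithm.

    For each word: uppercase it, collapse runs of the same letter, then
    drop vowels after the first character. The word codes are concatenated.
    """
    parts = []
    for word in re.findall(r"[A-Z0-9]+", text.upper()):
        collapsed = re.sub(r"([A-Z])\1+", r"\1", word)  # SS becomes S
        parts.append(collapsed[0] + re.sub(r"[AEIOU]", "", collapsed[1:]))
    return "".join(parts)


if __name__ == "__main__":
    # The typo from this scenario and the correct spelling share one code.
    print(toy_matchcode("Wesston") == toy_matchcode("Weston"))
    # A genuinely different street name gets a different code.
    print(toy_matchcode("Western") == toy_matchcode("Weston"))
```

Because both spellings map to the same key, a search on the key finds the existing record even though an exact search on the street name would not.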
Scenario 4: Duplicate search
Transaction flow
• Address data is entered in SAP GUI and postal address validation executes
• The /DATAFLUX/ADDR_SEARCH implementation of the BAdI “ADDRESS_SEARCH” is invoked
• The function module /DATAFLUX/DUPLICATE_CHECK searches for duplicates
• /DATAFLUX/DUPLICATE_CHECK calls dfConnector, which gathers the entered SAP data
• Matchcodes are generated dynamically and a JDBC call retrieves results from the external RDBMS. The search results are returned to dfConnector, which passes them to SAP to display a list of potential duplicates
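The last step of this flow, looking up a generated matchcode in an external index, can be sketched as follows. This is a hedged Python illustration: `sqlite3` stands in for the JDBC-accessible RDBMS, and the `search_index` table layout, the crude `matchcode` function and the customer data are all hypothetical.

```python
import sqlite3
from typing import List, Tuple


def matchcode(text: str) -> str:
    """Crude stand-in for a matchcode: uppercase, letters and digits only,
    with immediately repeated characters dropped."""
    out: List[str] = []
    for ch in text.upper():
        if ch.isalnum() and (not out or out[-1] != ch):
            out.append(ch)
    return "".join(out)


def build_index(conn: sqlite3.Connection,
                customers: List[Tuple[str, str, str]]) -> None:
    """Create the index table and store one matchcode per customer."""
    conn.execute(
        "CREATE TABLE search_index "
        "(customer_id TEXT, name TEXT, street TEXT, code TEXT)"
    )
    conn.executemany(
        "INSERT INTO search_index VALUES (?, ?, ?, ?)",
        [(cid, name, street, matchcode(name + street))
         for cid, name, street in customers],
    )


def find_duplicates(conn: sqlite3.Connection, name: str,
                    street: str) -> List[Tuple[str, str, str]]:
    """Generate the matchcode for the new entry and look it up."""
    rows = conn.execute(
        "SELECT customer_id, name, street FROM search_index WHERE code = ?",
        (matchcode(name + street),),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, [("1000", "Example Corp", "Weston"),
                       ("1001", "Other Ltd", "Main")])
    # The mistyped street still finds the existing customer.
    print(find_duplicates(conn, "Example Corp", "Wesston"))
```

Storing the matchcode as an ordinary column means the lookup is a plain equality query, which any relational database can serve.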
Scenario 5: Quarterly adjustment process
• Quarterly Adjustment is a batch process that ensures address data stays up to date
• When new address data becomes available (e.g. from USPS), it can be activated in the system in three steps, by running:
  − An SAP report to get all addresses
  − A DataFlux-provided report to check, standardize and auto-complete addresses
  − An SAP report to write the updated addresses back to the SAP database
Scenario 5: Quarterly adjustment process
• The RSADRQU1 report scans all addresses for a given country and inserts them into an index table
• /DATAFLUX/RSADRQU2 reads all SAP addresses from the index table and validates each one: addresses are checked, auto-completed and standardized. An address that cannot be validated is flagged for later reporting. The report indicates the level of address quality, i.e. how many addresses are correct and how many are incorrect
• RSADRQU3 writes validated and corrected addresses back to the operational SAP database, or reports the reason why they could not be written back
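The three reports form a collect, check, write-back pipeline. The Python sketch below shows that shape under stated assumptions: the `collect`, `check` and `write_back` functions, the dictionary used as a database and the injected `validate` callable are hypothetical and only loosely correspond to RSADRQU1, /DATAFLUX/RSADRQU2 and RSADRQU3.

```python
from typing import Callable, Dict, List, Optional, Tuple

Address = Dict[str, str]


def collect(database: Dict[str, Address],
            country: str) -> List[Tuple[str, Address]]:
    """Step 1: copy all addresses of one country into an index list."""
    return [(key, dict(addr)) for key, addr in database.items()
            if addr.get("country") == country]


def check(index: List[Tuple[str, Address]],
          validate: Callable[[Address], Optional[Address]]
          ) -> Tuple[Dict[str, Address], List[str]]:
    """Step 2: validate each entry, keeping corrections and flagging
    the keys of addresses that could not be validated."""
    corrected: Dict[str, Address] = {}
    failed: List[str] = []
    for key, addr in index:
        result = validate(addr)
        if result is None:
            failed.append(key)
        else:
            corrected[key] = result
    return corrected, failed


def write_back(database: Dict[str, Address],
               corrected: Dict[str, Address]) -> int:
    """Step 3: write corrected addresses back; return how many were written."""
    database.update(corrected)
    return len(corrected)


if __name__ == "__main__":
    db = {"A": {"country": "US", "city": "sampletown", "postal_code": "00000"},
          "B": {"country": "US", "city": "nowhere"}}

    def toy_validate(addr: Address) -> Optional[Address]:
        # Toy rule: an address is valid only if it has a postal code.
        if not addr.get("postal_code"):
            return None
        return {**addr, "city": addr["city"].upper()}

    fixed, flagged = check(collect(db, "US"), toy_validate)
    print(write_back(db, fixed), "written;", len(flagged), "flagged")
```

Keeping the steps separate means the operational data is only touched in the final step, after the quality summary from the check step is known.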
Scenario 5: Quarterly adjustment process
[Screenshots of the report output. Callouts: checked addresses are marked “+” (ok) or “-” (failed); summary]
Technology
• Java 1.4.x/1.5 to interface SAP with DataFlux dfIntelliServer 6, using SAP Java Connector 2.1.3
• ABAP programming to hook into the predefined interfaces (SAP Business Add-Ins) for address validation and deduplication
• SAP Add-on Assembly Kit (AAK) to allow for SAP certification (e.g. namespaces, installation, deployment, upgrade)
• Search index creation in SAS data sets or in any external JDBC-compliant RDBMS
Technology: dfConnector Framework Functions
/DATAFLUX/AREA_CODE
/DATAFLUX/DETERMINE_GENDER
/DATAFLUX/DETERMINE_LOCALE
/DATAFLUX/DETERMINE_ENTITY
/DATAFLUX/DIRECTORY_SEARCH
/DATAFLUX/DUPLICATE_CHECK
/DATAFLUX/GENERATE_MATCHCODE
/DATAFLUX/GEN_MATCHCODE_PARSED
/DATAFLUX/GEOCODE
/DATAFLUX/LOOKUP_COUNTY
/DATAFLUX/LOOKUP_PHONE
/DATAFLUX/PARSE
/DATAFLUX/QUERY_SERVER
/DATAFLUX/STANDARDIZE
/DATAFLUX/STANDARDIZE_PARSED
/DATAFLUX/STANDARDIZE_SCHEME
/DATAFLUX/DELETE_INDEX_ENTRY
/DATAFLUX/VERIFY_ADDRESS
/DATAFLUX/MAINTAIN_INDEX_ENTRY
Technology: /DATAFLUX/VERIFY_ADDRESS
[Screenshots of /DATAFLUX/VERIFY_ADDRESS, with callouts: input data, results]
Technology: External Search Index
• The external search index can be stored in any RDBMS that supports the JDBC interface
• Examples:
  − SAS data sets
  − MySQL
  − Microsoft SQL Server
  − MaxDB (formerly known as SAP DB)
  − Oracle
  − ...
Technology: External Search Index
[Screenshots; example: search index stored in SAS]
Technology: RFC server platforms
• Platforms supported by the SAP Java Connector (“JCo”), which is used by the RFC server component of dfConnector:
  − Windows NT SP4 or later, Windows 2000, XP, Windows Server 2003
  − Sun Solaris/SPARC 8 or later
  − IBM AIX 4.3 or later
  − HP-UX 11.0 or later (PA-RISC processors only)
  − OS/400 V5R1 or later (not for SAP JCo 2.0.5)
  − Compaq Tru64 5.0 or later (not for SAP JCo 2.1.x)
  − z/Linux on S/390 (Linux on zSeries, glibc 2.2.4 or later)
  − Linux kernel 2.2.14 or later (Intel-compatible processors)
Additional Information
• SUGI Birds-of-a-Feather (BoF) session “Enhancing SAP with SAS”, room 107, Tuesday at 6 p.m.
• www.dataflux.com