ACL – New Functionality Presentation

Fuzzy Duplicates Analysis
with ACL
Prepared by: Kevin Legere
Date: April 3rd, 2013
ACL | Transforming Audit and Risk
Agenda
 Overview
 Example
 FUZZYDUP command
 OMIT() Function
 Script Editor and RECOFFSET
 Q&A
© 2012 ACL Services Ltd.
Overview
 What is a "Fuzzy Duplicate"?
– Match based on criteria where the values are not exact but very close
» EX: "ACL Services" and "ACL Service"
 Typically used for:
» Keyword matching
» Invoice Number matching
» Vendor Name matching*
» Employee Name matching
 Can be simple or complex
» Completely depends on your approach and desired accuracy
* focus for this presentation
Overview
 Simple Match Examples:
– Exact or 100% match
» "ACL" = "ACL"
– Force Upper or Lower case
» "ACL" = UPPER("acl")
» "acl" = LOWER("ACL")
– Removal of special characters
» "ACL" = EXCLUDE("*ACL." "!@#$%^&*().")
– Only compare numbers or letters
» "ACL" = INCLUDE(UPPER("ACL123") "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
» "123" = INCLUDE("ACL123" "1234567890")
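The simple matches above all normalize a value before comparing it. As a rough illustration outside ACL, the Python sketch below emulates the character filtering; the helper names `exclude` and `include` are assumptions chosen to mirror the ACL functions, not ACL's own implementation.

```python
import string


def exclude(value: str, chars: str) -> str:
    """Drop every character of `value` that appears in `chars`."""
    return "".join(c for c in value if c not in chars)


def include(value: str, chars: str) -> str:
    """Keep only the characters of `value` that appear in `chars`."""
    return "".join(c for c in value if c in chars)


# The three slide examples, rewritten against the emulated helpers.
print(exclude("*ACL.", "!@#$%^&*()."))                    # ACL
print(include("ACL123".upper(), string.ascii_uppercase))  # ACL
print(include("ACL123", string.digits))                   # 123
```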
Overview
 Complex Match Examples:
– Removal of company type indicators (LLC, INC, LTD, etc.)
» "ACL Services Ltd." = "ACL Services"
– Percent of word match, AKA letter by letter
» "ACL Services" vs. "ACL Service"
• 11/12 character match, or about 91.7% match
– Word by Word*
» "ACL Services" vs. "ACL Champions"
• "ACL" = "ACL"
• "Services" ≠ "Champions"
• = 50% match
– Levenshtein distance
– Sounds like
– NYSIIS
*Most used by ACL Consultants
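The letter-by-letter and word-by-word percentages can be reproduced with a short Python sketch. It assumes a simple position-by-position comparison scored against the longer of the two values; the function names are hypothetical, and a real analysis might align the values differently.

```python
def letter_match_pct(a: str, b: str) -> float:
    """Percent of character positions, over the longer string, that agree."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / longest


def word_match_pct(a: str, b: str) -> float:
    """Percent of word positions, over the longer name, that agree (case ignored)."""
    words_a, words_b = a.upper().split(), b.upper().split()
    longest = max(len(words_a), len(words_b))
    if longest == 0:
        return 100.0
    same = sum(1 for x, y in zip(words_a, words_b) if x == y)
    return 100.0 * same / longest


print(round(letter_match_pct("ACL Services", "ACL Service"), 1))  # 91.7
print(word_match_pct("ACL Services", "ACL Champions"))            # 50.0
```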
Vendor Master Analysis
Vendor Master Analysis
 Fuzzy Duplicates on Vendor Name
– Possible Risk
» Payments are being sent to the same vendor under more than one vendor record
– May not involve risk: the goal may simply be to normalize the vendor master list so that
duplicates do not exist.
» Ideally, one unique vendor should exist in your vendor master list with one or more
address records in your vendor address table
Vendor Master Analysis
 Sample file contains 75 vendors
– Only Vendor Code and Vendor Name
 Where do you start for Vendor Name matching?
– Look for exact duplicates
– Focus on Simple matching
– Sort or Summarize!
Vendor Master Analysis
 Step 1: Summarize your Vendor Master File
» Choose Vendor Name as your key field
» Add Vendor Code as the Other Fields for Summarizing
» Be sure to check "Presort"
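For readers without ACL at hand, Step 1 amounts to counting records per Vendor Name in sorted order. The Python sketch below shows that idea only; the `vendors` records and the `summarize` helper are invented for illustration and are not taken from the 75-vendor sample file.

```python
from collections import Counter

# Hypothetical (vendor_code, vendor_name) records, made up for this sketch.
vendors = [
    ("V001", "ACL Services"),
    ("V002", "ACL Services"),
    ("V003", "Acme Supply"),
]


def summarize(records):
    """Count records per vendor name, returned in sorted name order."""
    counts = Counter(name for _code, name in records)
    return sorted(counts.items())


for name, count in summarize(vendors):
    print(name, count)
```

Any name with a count above 1 is an exact duplicate worth reviewing before moving on to fuzzier matching.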
Vendor Master Analysis
 Step 2: Quickly comb over the data to identify a common trend.
» We will focus on this issue in the sample data:
» Create a computed field that corrects the trend (or cleans the data).
Vendor Master Analysis
 Functions used in Default Value text box:
INCLUDE(UPPER(ALLTRIM(Vendor_Name)) 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
 Within ACL, the computed field will return the following:
Vendor Master Analysis
 Step 3: Perform a Duplicates Command on the computed field
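Steps 2 and 3 together mean: clean each name, then look for cleaned values that occur more than once. A minimal Python sketch of that workflow follows. It assumes the same cleaning rule as the computed field (trim, upper-case, keep letters only), and the vendor records and function names are invented for illustration.

```python
import string
from collections import defaultdict


def clean_name(name: str) -> str:
    """Trim, upper-case, and keep only the letters A-Z."""
    return "".join(c for c in name.strip().upper() if c in string.ascii_uppercase)


def find_duplicates(records):
    """Map each cleaned name to its vendor codes, keeping only repeated names."""
    groups = defaultdict(list)
    for code, name in records:
        groups[clean_name(name)].append(code)
    return {key: codes for key, codes in groups.items() if len(codes) > 1}


# Hypothetical records: two spellings of one vendor plus an unrelated vendor.
vendors = [
    ("V001", "ACL Services Ltd."),
    ("V002", "acl services ltd"),
    ("V003", "Acme Supply"),
]

print(find_duplicates(vendors))  # {'ACLSERVICESLTD': ['V001', 'V002']}
```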
Vendor Master Analysis
 Results are as follows:
FUZZYDUP command
 ACL 9.3 has new features that make Fuzzy Duplicate analysis easier
– FUZZYDUP command
– OMIT() function
– ISFUZZYDUP() function
– LEVDIST() function
 Important parameters to understand
– Levenshtein Distance
– Difference Percentage
FUZZYDUP command
 Syntax
– FUZZYDUP ON {key_field} <OTHER fields> {LEVDISTANCE value} <DIFFPCT value> <RESULTSIZE value> <EXACT> TO table_name
 Example
– FUZZYDUP ON Vendor_Name OTHER ALL LEVDISTANCE 2 DIFFPCT 50 TO My_Results
 Levenshtein Distance (LEVDISTANCE)
» The number of single-character edits (insertions, deletions, or substitutions) required to make the strings equal
• EX: "Smith" and "Smythe" have a Levenshtein Distance of 2
 Difference Percentage (DIFFPCT)
» The threshold for percentage difference between two strings
• EX: "Smith" and "Smythe" have a Percentage Difference of 40%
• (2/5) * 100%: the Levenshtein Distance divided by the length of the shorter string
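Both parameters can be checked by hand with a small Python sketch. The functions below are an illustrative emulation, not ACL's implementation. In particular, `is_fuzzy_dup` is a hypothetical helper that assumes a pair qualifies when both values are at or below their thresholds.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def diff_pct(a: str, b: str) -> float:
    """Edit distance as a percentage of the shorter string's length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0 if a == b else 100.0
    return 100.0 * levenshtein(a, b) / shorter


def is_fuzzy_dup(a: str, b: str, levdistance: int, diffpct: float) -> bool:
    """Assumed rule: both thresholds must be satisfied for a pair to match."""
    return levenshtein(a, b) <= levdistance and diff_pct(a, b) <= diffpct


print(levenshtein("Smith", "Smythe"))          # 2
print(diff_pct("Smith", "Smythe"))             # 40.0
print(is_fuzzy_dup("Smith", "Smythe", 2, 50))  # True
```

Tightening either threshold, for example a distance of 1, would exclude this pair.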
OMIT() Function
 When Do I use OMIT()?
– When you want to refine fuzzy duplicate analysis
– Look for repeating strings you want to remove from your Vendor Name field
 Syntax
– OMIT(string1, string2 <,case_sensitive>)
– Specify T to make substrings specified for removal case-sensitive, or F to ignore case
 Example
– OMIT(Vendor_Name, " Corporation, Corp, Inc, Ltd", F)
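The effect of a substring list is easiest to see with a small emulation. The Python sketch below is not ACL's code. It assumes the substrings are comma-separated, that spaces inside the list are significant, and that substrings are removed one after another in the order listed. The `omit` function and the vendor names are hypothetical.

```python
import re


def omit(value: str, substrings: str, case_sensitive: bool = True) -> str:
    """Remove each comma-separated substring from `value`, in listed order."""
    flags = 0 if case_sensitive else re.IGNORECASE
    for sub in substrings.split(","):
        if sub:  # skip empty entries such as a trailing comma
            value = re.sub(re.escape(sub), "", value, flags=flags)
    return value


print(omit("ACL Services LTD", " Ltd, Inc, Corp", False))  # ACL Services

# Under the sequential-removal assumption, order matters:
print(omit("Acme Corporation", " Corp, Corporation", False))  # Acmeoration
print(omit("Acme Corporation", " Corporation, Corp", False))  # Acme
```

Under these assumptions, listing the longer indicator first avoids leaving fragments of it behind.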
Script Editor and RECOFFSET
Contact Information
Kevin Legere
Implementation Consultant
ACL Services Ltd.
1550 Alberni Street, Vancouver, BC, Canada V6G 1A5
kevin_legere@acl.com | @aclkevin
www.acl.com/linkedin | www.acl.com/twitter | www.acl.com/facebook