Understanding Address Verification Codes (AVCs)

Why isn’t my address being verified?
• Verification is a spectrum, not a black/white answer
• Addresses can remain uncorrected or unverified beyond a certain point due to a variety of reasons
• All of those reasons can be figured out and possibly corrected by looking at the AVCs
• We will explain each component of the AVC in detail

What are AVCs?
• AVCs are codes that measure the validity of a cleaned address
• They are a comparison between the state of the address before and after the Loqate engine is done with it
• AVCs help you understand two things:
  – The level to which the address has been verified
  – The reasons for reaching or stopping at that level
• AVCs are what set Loqate apart from the competition: We don’t just try to fix addresses, we tell you exactly what we’ve done with them, how, and why.

Why Address Verification Codes Are Important
• Measurement of data quality before and after processing
• Provide feedback to the application using address verification
• Enable decisions to be made for each address
• Filtering results gives focus on efforts to enhance addresses
• Used to generate data quality reports

Address Verification Code Components
1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
V44 = Verification, I44 = Parsing, P6 = Postcode, 100 = Matchscore

Verification Status
V – Verified: Complete match with one record in GKR
U – Unverified: Unable to verify
P – Partially Verified: Part of the input is matched with one record in GKR
A – Ambiguous: Input matches more than one record in GKR
R – Revert to Input: No processing performed; input returned

Levels
5 – Delivery Point: Post office box, apartment number, suite number
4 – Premise: Building number, house number
3 – Thoroughfare: Street, avenue, boulevard, alley
2 – Locality: City, town, village
1 – Administrative Area: County, state, department
Terms based on OASIS Standards

Verification Results
• Status: How did input match reference data in GKR?
• Post-processed match level: Level match after all changes and additions are performed
• Pre-processed match level: Level match before any changes or additions

1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
• Input is correct
• Input has been correctly parsed
• Input has been correctly formatted
• V means that there is a single exact match in the reference data.

1111 Bayhill Drive, Suite 290, San Bruno, CA 94066 USA
V55-I55-P7-100
• With the addition of the suite number, this verifies that an actual delivery point exists in the GKR
• This is actually a separate reference from the previous address, due to the addition of the delivery point

11 GUANG HUA LU BEIJING 100600 China
V33-I44-P6-100
• V33 means we could only verify down to the street level, because reference data for China only goes down to the street level
• Loqate will never return a verification level higher than what’s available in the reference data

38 Rue de Cracovie Dijon France 21044
P44-I44-P6-100
• P stands for a partial verification, instead of V
• This is because reference data exists down to the delivery point level (5) but the input only goes down to the premise level (4)
• This means that the engine could assign a higher verification level with more input

Woking Business Park, Albert Drive, Woking, UK
P33-I33-P0-100
• Another partial verification
• Here the input only goes down to the street level, so that’s the highest verification level given
• P basically means that Loqate can do a better verification, but not with the given input

13 Yunosti Str, Moscow, Austria
U00-I04-P6-100
• This has been parsed as an address, but the engine quickly discovered that there is no Moscow in Austria
• Either the country is wrong, or the rest of the address is. The engine will treat this as unverifiable.
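
The component layout described above lends itself to a small decoder. The following Python sketch is not part of any Loqate SDK: the names `AVC`, `parse_avc`, and `describe` are hypothetical, the pattern is inferred from the example codes in this deck rather than from an official grammar, and the label for level 0 is an assumption, since the level table only lists 1 to 5.

```python
import re
from dataclasses import dataclass

# Status and level labels copied from the tables in this deck.
VERIFICATION_STATUS = {
    "V": "Verified",
    "P": "Partially Verified",
    "U": "Unverified",
    "A": "Ambiguous",
    "R": "Revert to Input",
}
LEVELS = {
    5: "Delivery Point",
    4: "Premise",
    3: "Thoroughfare",
    2: "Locality",
    1: "Administrative Area",
    0: "None",  # assumption: level 0 is not named in the level table
}

# Pattern inferred from examples such as "V44-I44-P6-100" (an assumption):
# status letter + two levels, parsing letter + two levels,
# "P" + postcode status, three-digit matchscore.
AVC_PATTERN = re.compile(
    r"^([VPUAR])([0-9])([0-9])-([A-Z])([0-9])([0-9])-P([0-9])-([0-9]{3})$"
)


@dataclass(frozen=True)
class AVC:
    verification_status: str
    post_level: int      # post-processed match level
    pre_level: int       # pre-processed match level
    parsing_status: str
    lexicon_level: int   # first number after the parsing letter
    context_level: int   # second number after the parsing letter
    postcode_status: int
    matchscore: int


def parse_avc(code: str) -> AVC:
    """Split an AVC string into its four components."""
    m = AVC_PATTERN.match(code.strip())
    if m is None:
        raise ValueError(f"not a recognizable AVC: {code!r}")
    v, post, pre, p, lex, ctx, pc, score = m.groups()
    return AVC(v, int(post), int(pre), p, int(lex), int(ctx), int(pc), int(score))


def describe(avc: AVC) -> str:
    """Summarize the verification status, post-processed level, and matchscore."""
    status = VERIFICATION_STATUS[avc.verification_status]
    level = LEVELS.get(avc.post_level, "unknown level")
    return f"{status} at level {avc.post_level} ({level}), matchscore {avc.matchscore}"


for code in ("V44-I44-P6-100", "P33-I33-P0-100", "U00-I04-P6-100"):
    print(code, "->", describe(parse_avc(code)))
```

Keeping the pre-processed and post-processed levels as separate fields makes it easy to see how much the engine improved an address, for example by comparing `post_level` with `pre_level`.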
I don’t know what my address is
U00-I00-P0-000
• The bane of all address verification
• The field is empty, or so completely nonsensical that the system doesn’t even recognize it as an address
• Nothing will be parsed or verified

Ambiguity
• Ambiguity is defined as having more than one match in the reference data.
• Ambiguity in the address does not imply the address is undeliverable
• It means the input address could refer to two or more distinct addresses
• We will examine ambiguity at the premise, thoroughfare, and locality levels

6th Street Austin TX USA
A22-I44-P3-100
• Ambiguity at the thoroughfare level
• There are both a 6th Street and an East 6th Street in Austin, Texas, and the engine can’t determine which one applies.
• The engine will only verify down to the locality level, since it knows Austin, Texas is a real locality

rua francisco bicalho 2400/301 belo horizonte brazil
A44-I44-P6-100
Candidate localities: Caiçara Adeláide, Padre Eustáquio
• Ambiguity at the locality level
• Two premises with the same name exist in two different localities
• The engine verifies down to the premise level, but flags it as ambiguous to signify more than one such premise exists

12/788 beach rd browns bay auckland 0630 new zealand
A33-I44-P6-067
• Ambiguity at the premise level
• We cannot determine if the input contains a delivery point match or a premise match
• This shows how close the P and A statuses are to each other

Search vs Verify AVCs
Verify: 1111 Bayhill Dr San Bruno CA 94066-3027
V44-I44-P7-067
• Sometimes Search and Verify will return different AVCs for the same address
• This is because they process the address using different methods
• Verify will return the single address in the lexicon that’s an exact match for the input address
• It will not try to complete the address by adding a delivery point to it

Search: 1111 Bayhill Dr Ste 290 San Bruno CA 94066-3027
A54-I55-P8-067
• Search will return all addresses in the lexicon that match the input, including all those with an added delivery point. This is intentional, as Search suggests multiple matches to the user while they are typing the input
• The A means there is ambiguity due to multiple possible matches
• The match level is upgraded to 5 by the addition of a delivery point

Parsing Status
• Same levels as the Verification Status
• Lexicon Match (first number after I): Rules and data to guide parsing exist in the Global Knowledge Repository
• Context Match (second number after I): Address components are parsed based on their position and other information relevant to the country

Why Parsing?
• The parsing level is the extent to which the engine can recognize the input
• The inability to parse an input address can explain a low verification level
• Two kinds of parsing:
  – Lexicon match attempts to parse the address by using country-specific rules
  – Context match attempts to parse the address by assigning possible meaning to elements depending on the context in which they appear

11 GUANG HUA LU BEIJING 100600 China
V33-I44-P6-100
• Lexicon match: uses pattern matching (numeric values = premise number) and lexicon matching specific to each country (‘BEIJING’ matches a locality in the lexicon for China)

6th Street Austin TX USA
A22-I44-P3-100
• Context match: the least accurate form of matching. It identifies a word as, for instance, a Thoroughfare because it is preceded by something that could be a Premise and followed by something that could be a Locality; the latter items are identified through a match against the reference data or the lexicon

13 Yunosti Str, Moscow, Austria
U00-I04-P6-100
• The lexicon matching rules are dependent on the country, and this does not follow Austrian standards. Therefore, the lexicon score is 0
• However, the context match recognizes it as an address based on the position and nature of the different fields.
• Therefore, it receives an I parsing status

Postcode Status
8 – PostalCodePrimary and PostalCodeSecondary verified
7 – PostalCodePrimary verified and PostalCodeSecondary added or changed
6 – PostalCodePrimary verified
5 – PostalCodePrimary verified with small change
4 – PostalCodePrimary verified with large change
3 – PostalCodePrimary added
2 – PostalCodePrimary identified by lexicon
1 – PostalCodePrimary identified by context
0 – PostalCodePrimary not processed

1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
Output: 1111 Bayhill Dr San Bruno CA 94066

1111 Bayhill Drive, Suite 290, San Bruno, CA 94066 USA
V55-I55-P7-100
Output: 1111 Bayhill Dr Ste 290 San Bruno CA 94066-3053
• The secondary postcode can verify down to the delivery point level
• By adding the suite number, we gave the engine enough input to determine the associated secondary delivery point
• This increases the Postcode Status from 6 to 7

6th Street Austin TX USA
A22-I44-P3-100
• We didn’t provide any postcode as input, so the engine had to supply the primary postcode itself, thus the low postcode status of 3
• With no premise or delivery point to specify a secondary postcode, the postcode status cannot be increased beyond its previous level. It will remain 3

Matchscore
• The accuracy matchscore gives the similarity between the input data and the closest reference data match
• Percentage between 0 and 100
• 100% means complete similarity
• A lower matchscore means more changes were made during the verification process

1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
• This will be our reference address. The engine didn’t make any changes before verifying it, hence a matchscore of 100

1111 Bayhil, San Bruno, CA USA
V42-I44-P3-092
• Compared to the previous address, Bayhil is misspelled, Drive is missing and so is the postcode.
• The input verification level is 2, meaning that upon parsing the engine could only verify down to the locality. However, the engine was able to correct all this, resulting in a V4.
• It decreased the matchscore from 100 to 92 to signify the changes made. Note the lower postcode status.

12/788 beach rd browns bay auckland 0630 new zealand
A33-I44-P6-067
Output: 12/788 Beach Road Murrays Bay Auckland 0630
• The engine had to change the locality name from browns bay to Murrays Bay.
• This is a significant change, thus the much lower matchscore

More Important Information

R – Revert to Input verification status
• Users can specify the minimum verification level for an address
• Addresses that do not pass this test post-verification will be reverted to their input values, and marked as such

Default country
• Tells the engine to use a specific country’s rules when no country is specified in the input

Force country
• Tells the engine to use a specific country’s rules even when a different country is specified in the input

Native language
• Transliterate addresses between Roman characters and other character sets
• This is not translation; there is no attempt to preserve the meaning
• Transliteration matches corresponding characters from different alphabets

GeoAccuracy Code (GAC)
P4

1. Verification Status (first character)
P: Point – A single geocode was found matching the input address
I: Interpolated – A geocode was able to be interpolated from the input address’s location in a range
A: Average – Multiple candidate geocodes were found to match the input address, and an average of these was returned
U: Unable to geocode – A geocode was not able to be generated for the input address

2. Geocoding Level (second character)
4: Premise
3: Thoroughfare
2: Locality
1: AdministrativeArea
0: None

GeoAccuracy Codes: Visualized

Point:
• The unique address 11 Main Street is confirmed to exist
• It is within 21 meters (68 feet) of a point in the global knowledge repository

Interpolated:
• 13 Main Street does not have a match in the reference DB.
• However, two addresses on the same street exist, on either side of it
• The engine assumes that 13 lies on a vector between them, less than 21 meters away from one or the other

Average:
• 12 Main Street does not have a match in the reference DB.
• There are multiple reference points around it, but it is not assumed to lie on a direct vector between two adjacent points. 12 Main Street is assumed to be in the resulting polygon
• The engine returns the point but decreases its geocoding level, and the polygon’s diagonal distance

999 BAKER WAY STE 320 SAN MATEO CA 94404-1566
loqate.com | everythinglocation.com | lqt.me
AVCDKV1061814
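
The two-character GeoAccuracy Code described in the GAC slides can be decoded with a simple lookup. This is a minimal Python sketch under stated assumptions: `parse_gac` is a hypothetical helper rather than a Loqate API, and it assumes a GAC is always exactly one status letter followed by one level digit, as in the P4 example.

```python
from typing import Tuple

# Labels copied from the GeoAccuracy Code slide in this deck.
GEOCODE_STATUS = {
    "P": "Point",
    "I": "Interpolated",
    "A": "Average",
    "U": "Unable to geocode",
}
GEOCODE_LEVEL = {
    "4": "Premise",
    "3": "Thoroughfare",
    "2": "Locality",
    "1": "AdministrativeArea",
    "0": "None",
}


def parse_gac(code: str) -> Tuple[str, str]:
    """Return (status label, level label) for a two-character GAC.

    Assumes the layout "<status letter><level digit>", e.g. "P4".
    """
    code = code.strip()
    if len(code) != 2 or code[0] not in GEOCODE_STATUS or code[1] not in GEOCODE_LEVEL:
        raise ValueError(f"not a recognizable GAC: {code!r}")
    return GEOCODE_STATUS[code[0]], GEOCODE_LEVEL[code[1]]


for gac in ("P4", "I3", "A2"):
    status, level = parse_gac(gac)
    print(f"{gac}: {status} geocode at {level} level")
```

A plain dictionary lookup is enough here because both positions draw from small, fixed sets of values; anything outside those sets is rejected instead of being guessed at.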