Understanding Address Verification Codes (AVCs)
Why isn’t my address being verified?
Verification is a spectrum, not a black/white answer
Addresses can remain uncorrected or unverified
beyond a certain point for a variety of reasons.
Those reasons can be identified, and possibly
corrected, by looking at the AVCs.
We will explain each component of the AVC in detail.
What are AVCs?
AVCs are codes that measure the validity of a cleaned address
They are a comparison between the state of the address
before and after the Loqate engine is done with it
AVCs help you understand two things:
The level to which the address has been verified
The reasons for reaching or stopping at that level
AVCs are what set Loqate apart from the competition:
We don’t just try to fix addresses, we tell you exactly what
we’ve done with them, how, and why.
Why Address Verification Codes Are Important
Measure data quality before and after processing
Provide feedback to the application using address verification
Enable decisions to be made for each address
Focus enhancement efforts by filtering results
Generate data quality reports
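As a rough illustration of the reporting use case, the sketch below tallies the verification status letter across a batch of AVCs. The function name `status_report` and the sample batch are assumptions made for this example; they are not part of any Loqate API.

```python
import re
from collections import Counter

# Shape of an AVC as shown in this deck, e.g. "V44-I44-P6-100".
AVC_PATTERN = re.compile(r"^([VPAUR])(\d)(\d)-([A-Z])(\d)(\d)-P(\d)-(\d{3})$")

def status_report(avcs):
    """Count how many codes carry each verification status letter."""
    counts = Counter()
    for avc in avcs:
        match = AVC_PATTERN.match(avc)
        # Codes that do not fit the expected shape are counted separately.
        counts[match.group(1) if match else "invalid"] += 1
    return counts

# Hypothetical batch built from example codes used in this deck.
sample = [
    "V44-I44-P6-100",
    "V55-I55-P7-100",
    "P44-I44-P6-100",
    "A22-I44-P3-100",
    "U00-I00-P0-000",
]
report = status_report(sample)
```

A report like this can then be filtered, for example to list only the P and A results that might improve with better input.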
Address Verification Code Components
1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
V44: Verification
I44: Parsing
P6: Postcode
100: Matchscore
Verification Status
V – Verified
Complete match with one record in GKR
U – Unverified
Unable to verify
P – Partially Verified
Part of the input is matched with one record in GKR
A – Ambiguous
Input matches more than one record in GKR
R – Revert to Input
No processing performed – input returned
Levels
5 – Delivery Point
Post office box, apartment number, suite number
4 – Premise
Building number, house number
3 – Thoroughfare
Street, avenue, boulevard, alley
2 – Locality
City, town, village
1 – Administrative Area
County, state, department
0 – None
No level matched (as in U00)
Terms based on OASIS Standards
Verification Results
Status
How did input match reference data in GKR?
Post-processed match level
Level match after all changes and additions are performed
Pre-processed match level
Level match before any changes or additions
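The layout described so far can be sketched as a small parser. This is a minimal sketch, not an official implementation: the names `Avc`, `parse_avc` and `describe` are invented for this example, the label "None" for level 0 is assumed, and the parsing status is accepted as any capital letter because this deck only shows `I`. The lexicon and context fields are named after the Parsing Status section.

```python
import re
from dataclasses import dataclass

STATUS = {
    "V": "Verified",
    "U": "Unverified",
    "P": "Partially Verified",
    "A": "Ambiguous",
    "R": "Revert to Input",
}
LEVELS = {
    5: "Delivery Point",
    4: "Premise",
    3: "Thoroughfare",
    2: "Locality",
    1: "Administrative Area",
    0: "None",  # assumed label; 0 appears in codes such as U00
}

AVC_PATTERN = re.compile(
    r"^(?P<status>[VPAUR])(?P<post>[0-5])(?P<pre>[0-5])"
    r"-(?P<parse>[A-Z])(?P<lexicon>[0-5])(?P<context>[0-5])"
    r"-P(?P<postcode>[0-8])"
    r"-(?P<score>\d{3})$"
)

@dataclass(frozen=True)
class Avc:
    status: str           # V, U, P, A or R
    post_level: int       # level match after processing
    pre_level: int        # level match before processing
    parsing_status: str   # letter opening the parsing block
    lexicon_level: int    # first number after the parsing letter
    context_level: int    # second number after the parsing letter
    postcode_status: int  # 0 to 8
    matchscore: int       # 0 to 100

def parse_avc(code: str) -> Avc:
    """Split an AVC such as 'V44-I44-P6-100' into its components."""
    m = AVC_PATTERN.match(code.strip())
    if m is None:
        raise ValueError(f"not a recognizable AVC: {code!r}")
    return Avc(
        status=m["status"],
        post_level=int(m["post"]),
        pre_level=int(m["pre"]),
        parsing_status=m["parse"],
        lexicon_level=int(m["lexicon"]),
        context_level=int(m["context"]),
        postcode_status=int(m["postcode"]),
        matchscore=int(m["score"]),
    )

def describe(avc: Avc) -> str:
    """One-line summary of the verification part of the code."""
    return (f"{STATUS[avc.status]} to {LEVELS[avc.post_level]} level "
            f"(input matched to {LEVELS[avc.pre_level]} level)")

example = parse_avc("V42-I44-P3-092")
summary = describe(example)
```

Keeping the post-processed and pre-processed levels as separate fields makes it easy to see how much the engine improved on the input.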
1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
• Input is correct
• Input has been correctly parsed
• Input has been correctly formatted
• V means that there is a single exact match in the reference data.
1111 Bayhill Drive, Suite 290, San Bruno, CA 94066 USA
V55-I55-P7-100
• With the addition of the suite number, this verifies that an actual
delivery point exists in the GKR
• This is actually a separate reference from the previous address, due
to the addition of the delivery point
11 GUANG HUA LU BEIJING 100600 China
V33-I44-P6-100
• V33 means we could only verify down to the street level, because
reference data for China only goes down to the street level
• Loqate will never return a verification level higher than what’s
available in the reference data
38 Rue de Cracovie Dijon France 21044
P44-I44-P6-100
• P stands for a partial verification, instead of V
• This is because reference data exists down to the delivery point
level (5) but the input only goes down to the premise level (4)
• This means that the engine could assign a higher verification level
with more input
Woking Business Park, Albert Drive, Woking, UK
P33-I33-P0-100
• Another partial verification
• Here the input only goes down to the street level, so that’s the
highest verification level given
• P basically means that Loqate can do a better verification, but not
with the given input
13 Yunosti Str, Moscow, Austria
U00-I04-P6-100
• This has been parsed as an address, but the engine quickly
discovered that there is no Moscow in Austria
• Either the country is wrong, or the rest of the address is. The engine
will treat this as unverifiable.
I don’t know what my address is
U00-I00-P0-000
• The bane of all address verification
• The field is empty, or so completely nonsensical that the system
doesn’t even recognize it as an address
• Nothing will be parsed or verified
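The examples above suggest how an application might act on the verification status. The sketch below is one possible policy, not a recommendation from Loqate: the function name `triage`, the outcome labels, and the default minimum level of 4 are all assumptions for illustration.

```python
import re

# Only the verification block (status letter + two levels) is needed here.
VERIFICATION_PATTERN = re.compile(r"^([VPAUR])([0-5])([0-5])-")

def triage(avc: str, min_level: int = 4) -> str:
    """Return 'accept', 'review', 'reject' or 'invalid' for one AVC."""
    m = VERIFICATION_PATTERN.match(avc)
    if m is None:
        return "invalid"
    status, post_level = m.group(1), int(m.group(2))
    if status == "V" and post_level >= min_level:
        return "accept"   # single exact match at a useful level
    if status in ("V", "P", "A"):
        return "review"   # some match exists but may need more input
    return "reject"       # U or R: nothing usable was verified

decisions = {
    code: triage(code)
    for code in ("V44-I44-P6-100", "V33-I44-P6-100",
                 "P44-I44-P6-100", "U00-I04-P6-100")
}
```

Note that V33 is the best possible result for the China example, because the reference data stops at street level, so a real policy would probably set `min_level` per country.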
Ambiguity
Ambiguity is defined as having more than one match
in the reference data.
Ambiguity in the address does not imply the address
is undeliverable
It means two or more distinct addresses can be
classed as part of the input address
We will examine ambiguity at the premise,
thoroughfare, and locality levels
6th Street Austin TX USA
A22-I44-P3-100
• Ambiguity at the thoroughfare level
• There are both a 6th Street and an East 6th Street in Austin, Texas,
and the engine can’t determine which one applies.
• The engine will only verify down to the locality level, since it knows
Austin, Texas is a real locality
rua francisco bicalho 2400/301 belo horizonte brazil
A44-I44-P6-100
Caiçara Adeláide
Padre Eustáquio
• Ambiguity at the locality level
• Two premises with the same name exist in two different localities
• The engine verifies down to the premise level, but flags it as
ambiguous to signify more than one such premise exists
12/788 beach rd browns bay auckland 0630 new zealand
A33-I44-P6-067
• Ambiguity at the premise level
• We cannot determine if the input contains a delivery point match or
a premise match
• This shows how close the P and A levels are to each other
Search vs Verify AVCs
Verify: 1111 Bayhill Dr San Bruno CA 94066-3027
V44-I44-P7-067
Sometimes Search and Verify will return different AVCs for the same
address
This is because they process the address using different methods
Verify will return the single address in the lexicon that’s an exact match
for the input address
It will not try to complete the address by adding a delivery point to it
Search: 1111 Bayhill Dr Ste 290 San Bruno CA 94066-3027
A54-I55-P8-067
Search will return all addresses in the lexicon that match the input,
including all those with an added delivery point. This is intentional, as
Search suggests multiple matches to the user while they are typing the
input
The A means there is ambiguity due to multiple possible matches
The match level is upgraded to 5 by the addition of a delivery point
Parsing Status
Same levels as the Verification Status
Lexicon Match (first number after I)
Rules and data to guide parsing exist in the
Global Knowledge Repository
Context Match (second number after I)
Address components are parsed based on their position
and other information relevant to the country
Why Parsing?
The parsing level is the extent to which the engine can
recognize the input
The inability to parse an input address can explain a low
verification level
Two kinds of parsing:
Lexicon match attempts to parse the address by using
country specific rules
Context match attempts to parse the address by assigning
possible meaning to elements depending on the context in
which they appear
11 GUANG HUA LU BEIJING 100600 China
V33-I44-P6-100
• Lexicon matching uses pattern matching (numeric values = premise number) and
a lexicon specific to each country (‘BEIJING’ matches a
locality in the lexicon for China)
6th Street Austin TX USA
A22-I44-P3-100
• Context matching is the least accurate form of matching. It identifies a word
as, for instance, a Thoroughfare because it is preceded by
something that could be a Premise and followed by something that
could be a Locality. Those neighbouring items are identified through a match
against the reference data or the lexicon
13 Yunosti Str, Moscow, Austria
U00-I04-P6-100
• The lexicon matching rules are dependent on the country, and this
does not follow Austrian standards. Therefore, the lexicon score is 0
• However, the context match recognizes it as an address based on the
position and nature of the different fields.
• Therefore, it receives an I parsing status
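To make the two parsing numbers concrete, the sketch below pulls the lexicon and context levels out of an AVC and flags inputs that were recognized by context alone, as in the Moscow, Austria example. The helper names are assumptions for this example, and the parsing status is accepted as any capital letter because only `I` appears in this deck.

```python
import re

PARSING_PATTERN = re.compile(r"^[VPAUR][0-5][0-5]-([A-Z])([0-5])([0-5])-")

def parsing_levels(avc: str) -> dict:
    """Return the parsing status letter and its two levels."""
    m = PARSING_PATTERN.match(avc)
    if m is None:
        raise ValueError(f"not a recognizable AVC: {avc!r}")
    return {
        "parsing_status": m.group(1),
        "lexicon": int(m.group(2)),   # first number after the letter
        "context": int(m.group(3)),   # second number after the letter
    }

def parsed_by_context_only(avc: str) -> bool:
    """True when the lexicon found nothing but context still parsed the input."""
    levels = parsing_levels(avc)
    return levels["lexicon"] == 0 and levels["context"] > 0

moscow = parsing_levels("U00-I04-P6-100")
```

A context-only parse next to a U verification status is a hint that the country and the rest of the address disagree.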
Postcode Status
8 – PostalCodePrimary and PostalCodeSecondary verified
7 – PostalCodePrimary verified and PostalCodeSecondary added or changed
6 – PostalCodePrimary verified
5 – PostalCodePrimary verified with small change
4 – PostalCodePrimary verified with large change
3 – PostalCodePrimary added
2 – PostalCodePrimary identified by lexicon
1 – PostalCodePrimary identified by context
0 – PostalCodePrimary not processed
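The table above maps directly onto a lookup. The sketch below reads the postcode block of an AVC and returns the matching description; the function name `postcode_status` is an assumption for this example.

```python
POSTCODE_STATUS = {
    8: "PostalCodePrimary and PostalCodeSecondary verified",
    7: "PostalCodePrimary verified and PostalCodeSecondary added or changed",
    6: "PostalCodePrimary verified",
    5: "PostalCodePrimary verified with small change",
    4: "PostalCodePrimary verified with large change",
    3: "PostalCodePrimary added",
    2: "PostalCodePrimary identified by lexicon",
    1: "PostalCodePrimary identified by context",
    0: "PostalCodePrimary not processed",
}

def postcode_status(avc: str) -> str:
    """Describe the postcode block (third part) of an AVC."""
    parts = avc.split("-")
    if len(parts) != 4 or not parts[2].startswith("P"):
        raise ValueError(f"not a recognizable AVC: {avc!r}")
    return POSTCODE_STATUS[int(parts[2][1:])]

verified = postcode_status("V44-I44-P6-100")
added = postcode_status("A22-I44-P3-100")
```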
1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
1111 Bayhill Dr
San Bruno CA 94066
1111 Bayhill Drive, Suite 290, San Bruno, CA 94066 USA
V55-I55-P7-100
1111 Bayhill Dr Ste 290
San Bruno CA 94066-3053
• The secondary postcode can verify down to the delivery point level
• By adding the suite number, we gave the engine enough input to determine the
associated secondary delivery point
• This increases the Postcode Status from 6 to 7
6th Street Austin TX USA
A22-I44-P3-100
• We didn’t provide any postcode as input, so the engine had to supply the primary
postcode itself, thus the low postcode status of 3
• With no premise or delivery point to specify a secondary postcode, the postcode
status cannot be increased beyond its previous level. It will remain 3
Matchscore
The accuracy matchscore gives the similarity between the input
data and the closest reference data match
Percentage between 0 and 100
100% means complete similarity
A lower matchscore means more changes were made during the
verification process
1111 Bayhill Drive, San Bruno, CA 94066 USA
V44-I44-P6-100
This will be our reference address. The engine made no
changes before verifying it, hence a matchscore of 100
1111 Bayhil, San Bruno, CA USA
V42-I44-P3-092
• Compared to the previous address, Bayhil is misspelled, Drive is
missing and so is the postcode.
• The input verification level is 2, meaning that upon parsing the
engine could only verify down to the locality.
• However, the engine was able to correct all this, resulting in a V4. It
decreased the matchscore to 92 from 100 to signify the changes
made. Note the lower postcode status.
12/788 beach rd browns bay auckland 0630 new zealand
A33-I44-P6-067
12/788 Beach Road
Murrays Bay
Auckland 0630
• The engine had to change the locality name from browns bay
to Murrays Bay.
• This is a significant change, thus the much lower Match Score
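The matchscore examples can be read alongside the two verification levels. The sketch below reports how many levels the engine gained during processing and the matchscore for one AVC; the function name `summarize_changes` and the returned keys are assumptions for this example.

```python
def summarize_changes(avc: str) -> dict:
    """Compare pre- and post-processed levels and report the matchscore."""
    verification, _parsing, _postcode, score = avc.split("-")
    post_level = int(verification[1])  # first digit: after processing
    pre_level = int(verification[2])   # second digit: before processing
    return {
        "levels_gained": post_level - pre_level,
        "matchscore": int(score),
    }

reference = summarize_changes("V44-I44-P6-100")
corrected = summarize_changes("V42-I44-P3-092")
relocated = summarize_changes("A33-I44-P6-067")
```

The Auckland code gains no levels yet has the lowest matchscore, which shows that the matchscore captures changes the level digits alone do not.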
More Important Information
R – Revert to Input verification status
Users can specify the minimum verification
level for an address
Addresses that do not pass this test post-verification will be reverted to their input
values, and marked as such
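The revert rule can be sketched as a simple threshold check. This is an illustration only: the function name `apply_minimum_level`, the returned dictionary, and the sample addresses are assumptions, and the deck does not state what the full AVC of a reverted address looks like, so only the status letter is modelled.

```python
def apply_minimum_level(input_address: str, cleaned_address: str,
                        avc: str, min_level: int) -> dict:
    """Return the cleaned address, or the input marked R if below min_level."""
    post_level = int(avc[1])  # post-processed verification level
    if post_level < min_level:
        # Below the user's threshold: hand back the input, flagged as reverted.
        return {"address": input_address, "status": "R"}
    return {"address": cleaned_address, "status": avc[0]}

kept = apply_minimum_level(
    "1111 Bayhil, San Bruno, CA USA",
    "1111 Bayhill Dr, San Bruno CA 94066",
    "V42-I44-P3-092",
    min_level=4,
)
reverted = apply_minimum_level(
    "6th Street Austin TX USA",
    "Austin TX",  # hypothetical cleaned output
    "A22-I44-P3-100",
    min_level=4,
)
```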
Default country
Tells the engine to use a specific country’s
rules when no country is specified in the input
Force country
Tells the engine to use a specific country’s
rules even when a different country is
specified in the input
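The precedence between the two options can be sketched as follows. The function and parameter names are assumptions for this example and do not correspond to actual Loqate option names.

```python
from typing import Optional

def resolve_country(input_country: Optional[str],
                    default_country: Optional[str] = None,
                    force_country: Optional[str] = None) -> Optional[str]:
    """Pick the country whose rules should be used for one address."""
    if force_country:
        return force_country    # wins even over a country in the input
    if input_country:
        return input_country    # the input's own country, when present
    return default_country      # fallback when the input names no country

forced = resolve_country("Austria", force_country="Russia")
defaulted = resolve_country(None, default_country="USA")
unchanged = resolve_country("France", default_country="USA")
```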
Native language
Transliterate addresses between Roman
characters and other character sets
This is not translation; there is no attempt to
preserve the meaning
Transliteration matches corresponding characters
from different alphabets
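A toy example may help separate transliteration from translation. The mapping below covers only the handful of Cyrillic letters needed for one word and is an assumption for illustration; it is not the scheme the Loqate engine uses.

```python
# Tiny, illustrative character table (not a complete transliteration scheme).
CYRILLIC_TO_LATIN = {
    "М": "M", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a",
}

def transliterate(text: str, table: dict = CYRILLIC_TO_LATIN) -> str:
    """Replace each character with its counterpart; keep unknown ones as-is."""
    return "".join(table.get(ch, ch) for ch in text)

result = transliterate("Москва")
```

The result is "Moskva", a character-by-character rendering, whereas a translation would produce the English name "Moscow".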
GeoAccuracy Code (GAC)
Example: P4, where P is part 1 and 4 is part 2
1. Verification Status
P: Point
A single geocode was found matching the input address
I: Interpolated
A geocode was interpolated from the input address’s
location in a range
A: Average
Multiple candidate geocodes were found to match the input
address, and an average of these was returned
U: Unable to geocode
A geocode was not able to be generated for the input address
2. Geocoding Level
4: Premise
3: Thoroughfare
2: Locality
1: AdministrativeArea
0: None
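The two-character GAC can be decoded with the lists above. A minimal sketch, with the function name `parse_gac` assumed for this example:

```python
GAC_STATUS = {
    "P": "Point",
    "I": "Interpolated",
    "A": "Average",
    "U": "Unable to geocode",
}
GAC_LEVEL = {
    4: "Premise",
    3: "Thoroughfare",
    2: "Locality",
    1: "AdministrativeArea",
    0: "None",
}

def parse_gac(code: str) -> tuple:
    """Split a GAC such as 'P4' into (status name, level name)."""
    if len(code) != 2 or code[0] not in GAC_STATUS or not code[1].isdigit():
        raise ValueError(f"not a recognizable GAC: {code!r}")
    level = int(code[1])
    if level not in GAC_LEVEL:
        raise ValueError(f"unknown geocoding level in {code!r}")
    return GAC_STATUS[code[0]], GAC_LEVEL[level]

decoded = parse_gac("P4")
```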
GeoAccuracy Codes: Visualized
Point:
• The unique address 11 Main Street is confirmed to exist
• It is within 21 meters (68 feet) of a point in the global knowledge
repository
Interpolated:
• 13 Main Street does not have a match in the reference DB.
• However, two addresses on the same street exist, on either side of it
• The engine assumes that 13 lies on a vector between them, less than 21
meters away from one or the other
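The idea of placing an unknown house number on the line between two known neighbours can be sketched with linear interpolation. This is an assumption-laden illustration: the deck does not say how the engine positions the point along the vector, so spacing by house number is a guess, and the coordinates and the function name `interpolate` are invented.

```python
def interpolate(number: int, low: tuple, high: tuple) -> tuple:
    """Estimate (lat, lon) for a house number between two known neighbours.

    low and high are (house_number, lat, lon) tuples for the known addresses.
    """
    n0, lat0, lon0 = low
    n1, lat1, lon1 = high
    if not n0 < number < n1:
        raise ValueError("number must lie strictly between the two neighbours")
    # Fraction of the way from the lower neighbour to the higher one.
    t = (number - n0) / (n1 - n0)
    return (lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0))

# Hypothetical coordinates for 11 and 15 Main Street.
estimate = interpolate(13, (11, 10.0, 20.0), (15, 10.0004, 20.0008))
```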
Average:
• 12 Main Street does not have a match in the reference DB.
• There are multiple reference points around it, but it is not assumed to lie on a
direct vector between two adjacent points. 12 Main Street is assumed to be in
the resulting polygon
• The engine returns the point with a decreased geocoding level, along with the
polygon’s diagonal distance
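Averaging several candidate geocodes can be sketched as a plain mean of their coordinates. The function name `average_point` and the sample points are assumptions, and the engine's actual averaging method is not described in this deck.

```python
def average_point(points: list) -> tuple:
    """Return the mean (lat, lon) of a non-empty list of (lat, lon) pairs."""
    if not points:
        raise ValueError("at least one candidate point is required")
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return (lat, lon)

# Hypothetical reference points surrounding 12 Main Street.
centre = average_point([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```

A simple mean is reasonable for points a few metres apart; it would need care near the antimeridian or the poles.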
999 BAKER WAY STE 320
SAN MATEO CA 94404-1566
loqate.com | everythinglocation.com | lqt.me
AVCDKV1061814