Introduction to Geographic Information Systems
Fall 2013 (INF 385T-28620)
Dr. David Arctur
Research Fellow, Adjunct Faculty
University of Texas at Austin
Lecture 8
October 17, 2013
Geocoding
Outline






Geocoding overview
Polygon geocoding
Linear (street) geocoding
Problems and solutions
Geocoding layer sources
Geocoding in ArcGIS
INF385T(28620) – Fall 2013 – Lecture 8
Overview


Process of creating geometric representations for locations (such as points) from descriptions of locations (such as street addresses)
Uses a computer program called a geocoding engine that employs code tables and rules to standardize address components
Examples

City’s economic development department
 Maps technology businesses by street address to determine technology-rich areas in a city

Hospital
 Maps patients to determine where to open a satellite clinic

Emergency dispatch
 Maps callers’ addresses to determine who should respond to an emergency

Retail store chain
 Maps store and customer locations, and compares them to mapped competitor locations

Others?
Tabular data

Text file or database
 Street addresses
 ZIP Codes
Geocoding reference layers

Street centerlines

ZIP Code polygons
POLYGON GEOCODING
ZIP Code geocoding

Method to map data whose geocode identifies a polygon (such as a ZIP Code)
 Assign each record to its polygon
 Count the records for each polygon
 Join the table of counts to the corresponding polygon layer
 Symbolize using a choropleth map or graduated point symbols
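The assign, count, and join steps can be sketched in a few lines of Python. This is only an illustration, assuming each record has already been assigned a ZIP Code and that records and polygon attribute rows are plain dictionaries with a hypothetical `zip` field; in ArcGIS the same steps are done with summary tables and joins rather than code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_by_zip(records: Iterable[Dict[str, str]], zip_field: str = "zip") -> Counter:
    """Count step: number of records assigned to each ZIP Code polygon."""
    return Counter(record[zip_field] for record in records)


def join_counts(polygons: List[Dict[str, str]], counts: Counter,
                zip_field: str = "zip") -> List[Dict[str, object]]:
    """Join step: copy each polygon's attribute row and add its count (0 when no records)."""
    return [{**row, "count": counts.get(row[zip_field], 0)} for row in polygons]


# Illustrative data: three attendee records and three ZIP Code polygons.
attendees = [{"zip": "15213"}, {"zip": "15213"}, {"zip": "15217"}]
zip_polygons = [{"zip": "15213"}, {"zip": "15217"}, {"zip": "15232"}]
for row in join_counts(zip_polygons, count_by_zip(attendees)):
    print(row["zip"], row["count"])  # the "count" field is what the choropleth map symbolizes
```

Keeping zero-count polygons in the joined table means a ZIP Code with no records still gets a value to symbolize instead of appearing as a gap.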
ZIP Code geocoding
ZIP Code geocoding
Points created at ZIP Code centroids
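Placing one point per ZIP Code needs a representative point for each polygon. A common choice is the area-weighted centroid from the shoelace formula; the sketch below assumes each polygon is a single simple ring of (x, y) vertices in a projected coordinate system. For strongly concave polygons this centroid can fall outside the polygon itself.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring (shoelace formula).

    The ring may be open or closed; a repeated closing vertex adds a zero-length edge.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # signed area contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon has zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)


print(polygon_centroid([(0, 0), (4, 0), (4, 4), (0, 4)]))  # (2.0, 2.0)
```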
ZIP Code geocoding
Points (attendees) spatially joined to ZIP Code polygons
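Spatially joining points to ZIP Code polygons amounts to a point-in-polygon test for each point. The sketch below uses the standard ray-casting (even-odd) test on simple rings of projected (x, y) coordinates; the square "ZIP Code" polygons and point IDs are made up, and a real GIS would use a spatial index rather than testing every polygon.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: toggle 'inside' for each polygon edge crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's horizontal line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: Dict[str, Point], zips: Dict[str, List[Point]]) -> Dict[str, str]:
    """Map each point ID to the first ZIP Code polygon containing it; unmatched points are omitted."""
    joined = {}
    for point_id, point in points.items():
        for zip_code, ring in zips.items():
            if point_in_polygon(point, ring):
                joined[point_id] = zip_code
                break
    return joined


zips = {
    "15213": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "15217": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
attendees = {"a": (5, 5), "b": (15, 2), "c": (25, 5)}
print(spatial_join(attendees, zips))  # {'a': '15213', 'b': '15217'}
```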
ZIP Code geocoding

Choropleth map created
LINEAR (STREET) GEOCODING
Linear geocoding (streets)

TIGER (Census Bureau) street maps
 Four street address numbers per street segment: a low-to-high range for each side of the segment
[Figure: a segment of Oak Street with addresses 100 to 198 on one side and 101 to 199 on the other]
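The four-number convention can be held in a small record per segment. This sketch uses hypothetical field names and assumes, as in the Oak Street example, that the odd range (101 to 199) is on the left and the even range (100 to 198) on the right; which side counts as "left" depends on the direction in which the segment was digitized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreetSegment:
    """One street segment with a low-to-high address range for each side (hypothetical schema)."""
    name: str
    left_from: int
    left_to: int
    right_from: int
    right_to: int

    def side_for(self, house_number: int) -> Optional[str]:
        """Return 'L' or 'R' for the side whose range and odd/even parity contain the number."""
        sides = (("L", self.left_from, self.left_to), ("R", self.right_from, self.right_to))
        for side, low, high in sides:
            in_range = min(low, high) <= house_number <= max(low, high)
            same_parity = house_number % 2 == low % 2
            if in_range and same_parity:
                return side
        return None  # the number is not on this segment


oak = StreetSegment("Oak Street", left_from=101, left_to=199, right_from=100, right_to=198)
print(oak.side_for(125))  # L
print(oak.side_for(150))  # R
print(oak.side_for(250))  # None
```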
Address components (example: 125 Oak St E, Apt. 2, Pittsburgh, PA 15213)
Number: 125
Street name: Oak
Street type: St
Direction, suffix: E (as in 125 Oak St E)
Direction, prefix: E (as in 125 E Oak St)
Unit number: Apt. 2
Zone, city: Pittsburgh
Zone, ZIP Code: 15213
Items for single-number street address:
Address: 125 Oak St E
Unit: Apt. 2
City: Pittsburgh
ZIP Code: 15213
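Splitting a single-line address into these items can be sketched with a regular expression. This assumes US-style input with comma separators, a two-letter state, and a short illustrative list of unit designators (Apt, Unit, Ste, #); real input is far messier, which is why geocoding engines rely on extensive parsing rules.

```python
import re
from typing import Dict, Optional

# Assumed layout: "<address>, [<unit>,] <city>, <ST> <ZIP>"; unit designators are an illustrative subset.
SINGLE_LINE = re.compile(
    r"^(?P<address>[^,]+),\s*"
    r"(?:(?P<unit>(?:Apt|Unit|Ste|#)\.?\s*\w+),\s*)?"
    r"(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5})(?:-\d{4})?$"
)


def split_single_line(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the address, unit, city, state, and ZIP items, or None if the text does not fit."""
    match = SINGLE_LINE.match(text.strip())
    return match.groupdict() if match else None


print(split_single_line("125 Oak St E, Apt. 2, Pittsburgh, PA 15213"))
print(split_single_line("320 Madison St, Pittsburgh, PA 15213"))  # unit is None
```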
Street Intersections

Put intersections in the address field
Forbes AV & Craig ST
Grant ST & 5th AV
E North Star RD & Duncan AV

Do not include street numbers
Incorrect: 3999 Forbes Ave & 100 Craig ST

Connectors
Any unusual character (e.g., &, @, |)
Just be consistent
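A minimal sketch of checking intersection entries before geocoding, assuming this dataset uses &, @, or | as the connector. It splits on the connector and rejects entries whose parts start with a house number; note that "5th AV" is a valid street name, so only a purely numeric first word counts as a street number.

```python
import re
from typing import Tuple

# Assumption: &, @ or | is the connector for this dataset; whichever is chosen, use it consistently.
CONNECTOR = re.compile(r"\s*[&@|]\s*")


def split_intersection(text: str) -> Tuple[str, str]:
    """Split 'Street A & Street B' into its two street names, rejecting house numbers."""
    parts = CONNECTOR.split(text.strip())
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected exactly two streets: {text!r}")
    for part in parts:
        if part.split()[0].isdigit():  # '3999 Forbes Ave' starts with a number; '5th AV' does not
            raise ValueError(f"intersections should not include street numbers: {text!r}")
    return parts[0], parts[1]


print(split_intersection("Grant ST & 5th AV"))  # ('Grant ST', '5th AV')
```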
Geocoding Flowchart
Input Address → Parse Address → Generate Soundex Key → Find Candidates (house number range & Soundex key) → Matches?
Matches? No → Output: No match
Matches? Yes → Score Matches → Best match >= 90?
Best match >= 90? Yes → Output: Address
Best match >= 90? No → Output: No match
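The flowchart's control flow can be sketched as one function. Each step (parsing, Soundex key, candidate search, scoring) is passed in as a hypothetical function so the decision points stay visible; the threshold of 90 comes from the flowchart, while everything else is an assumption for illustration.

```python
from typing import Any, Callable, Optional, Sequence


def geocode(
    address: str,
    parse: Callable[[str], Any],
    soundex_key: Callable[[Any], str],
    find_candidates: Callable[[Any, str], Sequence[Any]],
    score: Callable[[Any, Any], int],
    min_score: int = 90,
) -> Optional[Any]:
    """Follow the flowchart: return the best candidate, or None for 'No match'."""
    parsed = parse(address)                    # Parse Address
    key = soundex_key(parsed)                  # Generate Soundex Key
    candidates = find_candidates(parsed, key)  # Find Candidates (house number range & Soundex key)
    if not candidates:                         # Matches? No
        return None
    best_score, best = max(((score(parsed, c), c) for c in candidates), key=lambda pair: pair[0])
    return best if best_score >= min_score else None  # Best match >= 90?


# Toy usage with stand-in steps (all hypothetical): two candidate segment IDs with fixed scores.
toy_scores = {"4346": 80, "4357": 98}
print(geocode("125 East Oak Street 15213", parse=str.split, soundex_key=lambda p: "O-200",
              find_candidates=lambda p, k: list(toy_scores), score=lambda p, c: toy_scores[c]))  # 4357
```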
Geocoding steps
Original address: 125 East Oak Street 15213
Address parsed: |125|East|Oak|Street|15213
Abbreviations standardized: |125|E|Oak|St|15213
Elements assigned to match keys (HN = house number, SN = street name, ST = street type, SD = street direction, ZP = ZIP Code):
[HN]:125 [SN]:Oak [ST]:St [SD]:E [ZP]:15213
Index values calculated:
[HN]:125 [SN]:Oak (Soundex #) [ST]:St [SD]:E [ZP]:15213 (Index #)
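The parse, standardize, and assign steps can be sketched as follows. The abbreviation tables are small illustrative subsets (production standardization uses full tables such as those in USPS Publication 28), and the sketch does not handle streets named after a direction, such as "North Ave".

```python
from typing import Dict, List

# Illustrative subsets only.
DIRECTIONS = {"east": "E", "e": "E", "west": "W", "w": "W",
              "north": "N", "n": "N", "south": "S", "s": "S"}
STREET_TYPES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave", "av": "Ave",
                "road": "Rd", "rd": "Rd"}


def to_match_keys(address: str) -> Dict[str, str]:
    """Parse, standardize abbreviations, and assign elements to match keys (HN, SD, SN, ST, ZP)."""
    tokens: List[str] = address.replace(",", " ").split()
    keys: Dict[str, str] = {}
    if tokens and tokens[0].isdigit():
        keys["HN"] = tokens.pop(0)                                # house number
    if tokens and len(tokens[-1]) == 5 and tokens[-1].isdigit():
        keys["ZP"] = tokens.pop()                                 # ZIP Code
    if len(tokens) > 1 and tokens[0].lower() in DIRECTIONS:       # prefix direction
        keys["SD"] = DIRECTIONS[tokens.pop(0).lower()]
    elif len(tokens) > 1 and tokens[-1].lower() in DIRECTIONS:    # suffix direction
        keys["SD"] = DIRECTIONS[tokens.pop().lower()]
    if len(tokens) > 1 and tokens[-1].lower().rstrip(".") in STREET_TYPES:
        keys["ST"] = STREET_TYPES[tokens.pop().lower().rstrip(".")]
    keys["SN"] = " ".join(tokens)                                 # whatever remains is the name
    return keys


print(to_match_keys("125 East Oak Street 15213"))
```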
Soundex index

Matches names based on how they sound (names match if their indices match)
 Translates a name to a 4-character index of 1 letter and 3 numbers
 First character of name remains unchanged
 Adjacent letters in the name which have the same Soundex key are assigned a single digit
 If the end of the name is reached before filling 3 digits, use zeros to complete the code
Oake = O-200, Oak = O-200
Smith = S-530, Smythe = S-530
Paine = P-500, Payne = P-500
Callahan = C-450, Calahan = C-450
Key        Letters
1          b f p v
2          c g j k q s x z
3          d t
4          l
5          m n
6          r
disregard  a e h i o u y w
Beadles = B-342, Beattles = B-342
Schultz = S-432, Shults = S-432
http://www.sconsig.com/sastips/soundex-01.htm
http://www.archives.gov/research/census/soundex.html
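The coding rules can be written as a short Python function. This sketch follows the American Soundex rules published by the National Archives (linked above), which add one detail not spelled out on the slide: vowels and y separate letters with the same key, while h and w do not (so the s and c in Ashcraft share one digit). Other Soundex variants handle these edge cases differently, so treat it as an illustration.

```python
# Soundex key table from the slide: letter -> digit; letters not listed are disregarded.
SOUNDEX_DIGITS = {
    letter: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                           "4": "l", "5": "mn", "6": "r"}.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return a code such as 'O-200': first letter kept, then three digits padded with zeros."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    previous = SOUNDEX_DIGITS.get(letters[0], "")  # a digit equal to the first letter's key is dropped
    for ch in letters[1:]:
        code = SOUNDEX_DIGITS.get(ch, "")
        if code and code != previous:
            digits.append(code)
        if code or ch not in "hw":
            previous = code  # vowels and y reset 'previous'; h and w leave it unchanged
        if len(digits) == 3:
            break
    return letters[0].upper() + "-" + "".join(digits).ljust(3, "0")


for name in ("Oake", "Oak", "Schultz", "Shults"):
    print(name, soundex(name))
```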
Scoring candidates

Use a rule base to score each match between a source address and a reference feature
 Start with a score of 100
 Subtract points for each mismatch
 Examples from rule base
 Soundex indices match but street names do not (-2)
 Street type missing in source (-1)
 Street types do not match (-2)
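The three example rules can be applied as a simple subtraction from 100. This is a sketch covering only these rules, assuming the reference candidate was already found through its Soundex index (so the indices match); a real rule base has many more rules and weights.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreetName:
    name: str
    street_type: Optional[str]  # None when the source address omits the type


def score_match(source: StreetName, reference: StreetName) -> int:
    """Start at 100 and subtract the example penalties from the rule base."""
    score = 100
    if source.name.lower() != reference.name.lower():
        score -= 2  # Soundex indices match but the street names do not
    if source.street_type is None:
        score -= 1  # street type missing in source
    elif reference.street_type is None or source.street_type.lower() != reference.street_type.lower():
        score -= 2  # street types do not match
    return score


oak = StreetName("Oak", "St")
print(score_match(StreetName("Oake", "Ave"), oak))  # 96
```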
Candidate streets
Candidates identified: 125 East Oak Street 15213
From  To   Street  Type  Side  Parity  Direction  Street_
2     98   Oak     St    R     E       W          4344
1     99   Oak     St    L     O       W          4345
100   198  Oak     St    R     E       E          4346
101   199  Oak     St    L     O       E          4357
Candidates scored and filtered:
From  To   Street  Type  Side  Parity  Direction  Street_
100   198  Oak     St    R     E       E          4346
101   199  Oak     St    L     O       E          4357
(Side: L = left, R = right; Parity: E = even, O = odd; Direction: E = East, W = West)
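The filtering step can be sketched on the candidate rows above. This simplified version keeps rows that agree exactly on street name, type, and direction rather than scoring them, and the dictionary keys are hypothetical names for the table's columns (Street_ is stored as "id").

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, str]]

# Candidate rows from the table above.
CANDIDATES: List[Row] = [
    {"from": 2, "to": 98, "street": "Oak", "type": "St", "side": "R", "parity": "E", "direction": "W", "id": 4344},
    {"from": 1, "to": 99, "street": "Oak", "type": "St", "side": "L", "parity": "O", "direction": "W", "id": 4345},
    {"from": 100, "to": 198, "street": "Oak", "type": "St", "side": "R", "parity": "E", "direction": "E", "id": 4346},
    {"from": 101, "to": 199, "street": "Oak", "type": "St", "side": "L", "parity": "O", "direction": "E", "id": 4357},
]


def filter_candidates(rows: List[Row], keys: Dict[str, str]) -> List[Row]:
    """Keep rows whose street name, type, and direction agree with the parsed match keys."""
    return [
        row for row in rows
        if row["street"] == keys["SN"] and row["type"] == keys["ST"] and row["direction"] == keys["SD"]
    ]


kept = filter_candidates(CANDIDATES, {"SN": "Oak", "ST": "St", "SD": "E"})
print([row["id"] for row in kept])  # [4346, 4357]
```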
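Address matched as point
Best candidate matched:
From  To   Street  Type  Side  Parity  Direction  Street_
101   199  Oak     St    L     O       E          4357
[Figure: Oak St segments labeled with address ranges 2-98, 1-99, 100-198, and 101-199, with Pine Ave; address 125 is placed along the 101-199 side]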
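Placing 125 within the 101 to 199 range is typically done by linear interpolation along the segment. This sketch assumes a straight segment with made-up coordinates; real geocoding engines interpolate along the line's full geometry and may also offset the point toward the correct side of the street.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_address(house_number: int, range_from: int, range_to: int,
                        start: Point, end: Point) -> Point:
    """Place a house number proportionally along a straight segment from start to end."""
    if range_to == range_from:
        return start  # a single-address range maps to the segment start
    x0, y0 = start
    x1, y1 = end
    offset = house_number - range_from
    span = range_to - range_from
    return (x0 + (x1 - x0) * offset / span, y0 + (y1 - y0) * offset / span)


# 125 lies 24/98 of the way from 101 to 199 (hypothetical coordinates in meters).
print(interpolate_address(125, 101, 199, (0.0, 0.0), (98.0, 0.0)))  # (24.0, 0.0)
```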
PROBLEMS AND SOLUTIONS
Possible problems

Variations in street names
 Fifth Avenue, Fifth Ave., 5th AV
 Saw Mill Run Blvd, Route 51

Data entry errors
 Fidth Avenue
 Sawmill Run

Place names
 White House, Heinz Field, Empire State Building

Intersections
 Fifth Avenue and Craig Street
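Some street-name variations can be reduced by normalizing spelled-out ordinals, punctuation, and street-type spellings before matching. The lookup tables below are small illustrative subsets; true aliases such as Saw Mill Run Blvd and Route 51 still need an alias table.

```python
import re

# Illustrative lookup tables only (small subsets).
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th", "fifth": "5th"}
TYPES = {"avenue": "Ave", "ave": "Ave", "av": "Ave", "street": "St", "st": "St",
         "boulevard": "Blvd", "blvd": "Blvd"}


def normalize_street(name: str) -> str:
    """Map spelled-out ordinals and street-type variants to one standard form."""
    words = re.sub(r"[.,]", "", name).lower().split()
    out = []
    for word in words:
        if word in ORDINALS:
            out.append(ORDINALS[word])
        elif word in TYPES:
            out.append(TYPES[word])
        else:
            out.append(word.capitalize())
    return " ".join(out)


for variant in ("Fifth Avenue", "Fifth Ave.", "5th AV"):
    print(variant, "->", normalize_street(variant))  # all become "5th Ave"
```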
Possible problems

Zones
 The same street address can occur in different ZIP Codes: 100 Main ST 15101, 100 Main ST 16202

P.O. boxes
 P.O. Box 125

Missing street data
Solutions





Clean data before geocoding
Purchase or build high-quality maps (field verification)
Use postal address standards
Assign house numbers in rural areas
Use alias tables
Alias                  Address
White House            1600 Pennsylvania Avenue
Heinz Field            100 Art Rooney Avenue
Empire State Building  350 5th Ave
Alias table
Alias                       Address
CMU                         5000 Forbes Av
Carnegie Mellon             5000 Forbes Av
Carnegie Mellon U           5000 Forbes Av
Carnegie Mellon Univ        5000 Forbes Av
Carnegie Mellon University  5000 Forbes Av
Etc.
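An alias table is essentially a lookup applied before geocoding. This sketch assumes case and extra spaces should be ignored when comparing aliases; input that is not a known alias passes through unchanged so it can be geocoded normally.

```python
from typing import Dict

# Alias table entries from the slides: alias -> street address.
ALIASES: Dict[str, str] = {
    "CMU": "5000 Forbes Av",
    "Carnegie Mellon": "5000 Forbes Av",
    "Carnegie Mellon U": "5000 Forbes Av",
    "Carnegie Mellon Univ": "5000 Forbes Av",
    "Carnegie Mellon University": "5000 Forbes Av",
    "Heinz Field": "100 Art Rooney Avenue",
}

# Normalized keys: case-folded, with runs of whitespace collapsed.
_LOOKUP = {alias.casefold(): address for alias, address in ALIASES.items()}


def resolve_alias(text: str) -> str:
    """Replace a known place name with its street address; other input is returned unchanged."""
    return _LOOKUP.get(" ".join(text.split()).casefold(), text)


print(resolve_alias("carnegie  mellon univ"))  # 5000 Forbes Av
```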
GEOCODING LAYER SOURCES
US Census TIGER files


Digitized from 1:100,000 scale maps
Pros:
 Free and easy to download
 Uniform across jurisdictional lines (nationally)
 Street address formatting works well with standard GIS geocoding capabilities

Cons:
 Incomplete data
 Placement of address point is approximate
TIGER line attribute table

Census street centerlines extracted from
lines that make up census boundaries
 tl_2009_04013_edges.shp
 "FEATCAT" = 'S'
MAF/TIGER

Master Address File / Topologically Integrated Geographic Encoding and Referencing
 MAF is a complete inventory of housing units and businesses in the United States and its territories
 TIGER is the familiar collection of line features (streets, boundaries, and others)

MAF is used to produce mail-out census forms and American Community Survey (ACS) random samples

MAF/TIGER produces maps for on-the-ground census takers
 MAF is confidential
 TIGER 2009 and newer have much improved positional accuracy
US Census ZIP Codes



ZIP Code Tabulation Areas (ZCTAs)
Approximations of ZIP Code areas for census purposes
Do not reflect actual ZIP Code areas and are not kept up to date
Local jurisdictions

Parcel address points
 Pros: Accurate placement of residential location (parcel positional data is often very good; e.g., +/- 5 meters or less)
 Cons:
 May need to contact individuals within agencies to get most up-to-date data
 May not be available, or may cost a substantial amount of money
 Data ends at jurisdictional boundaries
 Data files tend to be very large
Local jurisdictions

Street centerlines
 Pros:
 Potential to be more up to date (often yearly updates, sometimes quarterly)
 Accuracy often adequate to meet city infrastructure needs (typically +/- 10 meters or less)
 Cons:
 May need to contact individuals within agencies to get most up-to-date data
 Data ends at jurisdictional boundaries
Private vendors

StreetMap USA
 National dataset (US and Canada)
 Address locators prebuilt; can geocode across the United States

GDT Dynamap/2000 US street data
 Small fee for individual ZIP Code layers
 Map layers are the highest quality street map layers in terms of appearance, completeness, and accuracy
 More than one million changes every quarter
 Maps include more than 14 million US street segments, plus postal boundaries, landmarks, water features, and other features
Online geocoding


ArcGIS.com, Google, GeoCommons,
Maptive, etc.
Pros:
 Fast and easy to access
 Free or inexpensive

Cons:
 Loss of privacy/confidentiality
 Accuracy
 Usability in desktop GIS
GEOCODING IN ARCGIS
Create address locator

ArcCatalog
Choose address locator style


Skeleton of the address locator
Based on data tables and reference layer
Address locator styles
US Address - Dual Ranges
 Reference dataset geometry: Lines
 Reference dataset representation: Address range for both sides of street segment
 Address search parameters: All address elements in a single field
 Examples: 320 Madison St.; N2W1700 County Rd.; 105-30 Union St.
 Applications: Finding a house on a specific side of the street
US Address - Single House
 Reference dataset geometry: Points or polygons
 Reference dataset representation: Each feature represents an address
 Address search parameters: All address elements in a single field
 Examples: 71 Cherry Ln.; W1700 Rock Rd.; 38-76 Carson Rd.
 Applications: Finding parcels, buildings, or address points
Note: there are other styles…
Other styles… (build custom locators)

Queens, NY

Salt Lake City, UT


Regions of Illinois & Wisconsin
Germany
… and many others!
Choose reference layer

Streets, ZIP Codes
ArcGIS locator parameters
Geocode in ArcMap





Add tabular data and streets layer
Add address locator
Geocode addresses
View geocoding results
Interactively rematch addresses
Address rematching

Investigate unmatched addresses
 Generally requires expertise and knowledge of local streets
 Compare the street name in the address table with the street names in the streets attribute table
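Part of that comparison can be automated with Python's standard `difflib` module, which ranks strings by a sequence-similarity ratio (a different technique from Soundex). The street list and the 0.6 cutoff below are illustrative assumptions; the suggestions only narrow the search, and the analyst still makes the final call.

```python
import difflib
from typing import List


def suggest_streets(name: str, street_names: List[str], n: int = 3, cutoff: float = 0.6) -> List[str]:
    """Return up to n reference street names most similar to an unmatched name (case-insensitive)."""
    lookup = {street.lower(): street for street in street_names}
    matches = difflib.get_close_matches(name.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[match] for match in matches]


# Hypothetical reference street names from the streets attribute table.
streets = ["Liberty Ave", "Penn Ave", "Smallman St"]
print(suggest_streets("Lib Ave", streets))  # ['Liberty Ave']
```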
Prepare log file


Log file includes reasons why addresses did not get geocoded
Useful for future work on cleaning addresses or repairing street maps
Incorrect address    Possible reason/solution
490 Penn Avenue      Missing ZIP Code
111 Hawksworth       Spelled incorrectly
900 Smallman Street  TIGER street missing
900 Lib Ave          Spelled incorrectly
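Keeping the log as a CSV file makes it easy to sort and summarize later. This sketch assumes the reasons are recorded by the analyst during rematching (they are not inferred automatically) and tallies them, which shows whether address cleaning or street map repair deserves more attention. When writing to disk, open the file with `newline=""` as the `csv` module documentation recommends.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO, Tuple


def write_geocoding_log(entries: Iterable[Tuple[str, str]], out: TextIO) -> Counter:
    """Write (address, reason) pairs as CSV and return how often each reason occurred."""
    writer = csv.writer(out)
    writer.writerow(["Incorrect address", "Possible reason/solution"])
    reasons = Counter()
    for address, reason in entries:
        writer.writerow([address, reason])
        reasons[reason] += 1
    return reasons


log = io.StringIO()  # in practice: open("geocoding_log.csv", "w", newline="")
summary = write_geocoding_log(
    [("490 Penn Avenue", "Missing ZIP Code"),
     ("111 Hawksworth", "Spelled incorrectly"),
     ("900 Smallman Street", "TIGER street missing"),
     ("900 Lib Ave", "Spelled incorrectly")],
    log,
)
print(summary.most_common(1))  # [('Spelled incorrectly', 2)]
```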
Summary






Geocoding overview
Polygon geocoding
Linear (street) geocoding
Problems and solutions
Geocoding layer sources
Geocoding in ArcGIS
Next week: Tutorial chapter 9, and discussion of term projects; see iSchool syllabus links:
http://courses.ischool.utexas.edu/Arctur_David/2013/fall/385T/schedule.php