DOH_Geocoding_Protocol_June_1_2012 - NM-IBIS

Drafted June 1, 2012
NMDOH Geocoding Protocol
Processes outlined below are represented in the attached flowchart. The geocoder should obtain the most
up-to-date versions of the reference datasets (such as parcel files and ancillary lists) and other tools referenced
by name in the protocol. See Appendix 1: List of Reference Datasets. Please note that Appendix 1 will continue to
be updated.
BLUE PROCESS: EVALUATE ADDRESS QUALITY (‘AUDIT’)
Audit the addresses in the database, classify and summarize address types according to a standardized coding
scheme (see Address Type Table in process flow chart), and add an address type variable to each record. Flag
records with discordance among city, county, and ZIP code using a file that crosswalks cities, counties, and ZIP
codes. (Note: This is a replicable manual or batched step performed by DOH Data Stewards or GIS specialists prior
to delivery to the geocoding contractor.)
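The discordance flag described above can be sketched as a lookup against the crosswalk. This is a minimal illustration, not the DOH tool: the field names (city, county, zip), the in-memory crosswalk set, and the sample rows are assumptions made for the example.

```python
# Hypothetical crosswalk of valid (city, county, ZIP) combinations.
# In practice these rows would be loaded from the DOH crosswalk file.
CROSSWALK = {
    ("SANTA FE", "SANTA FE", "87501"),
    ("ALBUQUERQUE", "BERNALILLO", "87102"),
}


def normalize(value):
    """Upper-case and trim a field so comparisons ignore case and padding."""
    return (value or "").strip().upper()


def is_discordant(record, crosswalk=CROSSWALK):
    """Return True when the city/county/ZIP combination is not in the crosswalk."""
    key = (
        normalize(record.get("city")),
        normalize(record.get("county")),
        normalize(record.get("zip"))[:5],  # compare on the 5-digit ZIP only
    )
    return key not in crosswalk


records = [
    {"city": "santa fe", "county": "Santa Fe", "zip": "87501-1234"},
    {"city": "Santa Fe", "county": "Bernalillo", "zip": "87501"},
]
for record in records:
    record["discordant"] = is_discordant(record)
```

A record flagged this way is only a candidate for review; the flag does not say which of the three fields is wrong.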
GREEN PROCESS:
Green Step 1. STANDARDIZE COMPLETE ADDRESSES
Submit records with complete addresses to address clean-up software such as Semaphore ZP4. This also
provides some measure of evaluating and/or correcting city-county-ZIP discordance. Preference: data
stewards perform this step before county estimates are generated or county-specific resources are used.
Green Step 2. GEOCODE RECORDS FROM STEP 1 WITH COMPLETE ADDRESSES USING CURRENT BEST
PRACTICES.
Points are matched with a 20-foot offset. Use the most appropriate and available geo-referenced
databases, such as the NAVTEQ NM Road Centerline network, TomTom Multinet street network, ESRI street
network, or local, high-quality geocoding reference files (e.g., parcel files and other locally available files).
Add match type (manual vs. automated), match tool and version used, and match score (0 to 100) to the
geocoded records.
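One way to carry the match fields listed above on each geocoded record is a small validated structure. The sketch below is an assumption about layout only: the class name, field names, and the tool name in the usage line are hypothetical, and the protocol does not prescribe a storage format.

```python
from dataclasses import dataclass

# The two match types named in the protocol.
MATCH_TYPES = {"manual", "automated"}


@dataclass
class MatchInfo:
    """Match metadata attached to one geocoded record (hypothetical layout)."""
    match_type: str    # "manual" or "automated"
    match_tool: str    # name of the geocoding tool used
    tool_version: str  # version of that tool
    match_score: float  # 0 to 100

    def __post_init__(self):
        # Reject values outside what the protocol allows.
        if self.match_type not in MATCH_TYPES:
            raise ValueError(f"unknown match type: {self.match_type!r}")
        if not 0 <= self.match_score <= 100:
            raise ValueError(f"match score out of range: {self.match_score}")


info = MatchInfo("automated", "ExampleGeocoder", "1.0", 92)
```

Validating at construction time means an out-of-range score or a misspelled match type is caught when the record is written rather than during later analysis.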
GOLD PROCESS: PROCESS ALL UNMATCHED RECORDS FROM GREEN STEP 2 AND THOSE WITH INCOMPLETE OR
NON-STANDARDIZED ADDRESSES AS DETERMINED BY AUDIT.
Visually check address fields and MANUALLY or BATCH correct: misspellings, incomplete addresses, address
abbreviations, misspelled town names, and data input errors such as an incorrect city (address and ZIP do not
match city). Geocode records with incomplete addresses that match an ancillary list (see Address Types Table).
Follow Appendix 2: Recommended Manual Geocoding Methods to code incomplete addresses (street name only,
intersection, rural or highway route, location only, etc.). Add match type (manual vs. automated) and match
score (0 to 100) to the geocoded records.
Gold Toolbox: DOH maintains tools for the Gold Process in a central location and improves reference datasets
(address directories) over time with addresses matched. DOH will maintain a catalog of reference
datasets, parcel files, ancillary lists, and other tools (Appendix 1: List of Reference Datasets).
ORANGE PROCESS: PROCESS LOCATIONAL TYPE ADDRESSES AND RE-PROCESS REJECTED RECORDS FROM GOLD
PROCESS AUTOMATICALLY AND USING INTERACTIVE REMATCH.
‘Interactive Match’ is defined as visual checking of rejected records, manual correction, manual position location
or approximation with a live map, and application of special GIS tools such as parcel files, group quarters lists,
alternate or specialized address directories, etc. Maintain tools for Gold Process and improve reference datasets
(address directories) over time with addresses matched. Add match type (manual vs. automated) and match
score (0 to 100) to the geocoded records.
RED PROCESS: PROCESS RECORDS STILL UNMATCHED RESULTING FROM ALL PROCESSES
Match each unmatched record to a ZIP code or populated place using the ESRI ZIP code centroid file or the GNIS
populated place file. Use GNIS FIRST for rural and frontier areas; use ZIP CODE FIRST for large cities (for
example, population greater than 50,000 or more than one ZIP code). Add city population size, the match type
(manual vs. automated), and match score (0 to 100) to the geocoded records. Check for multiple cities or towns
with the same name, and use county and other information on the record to determine the correct location. Each
record should end up with one set of XY coordinates rather than multiple (e.g., one for GNIS centroid, one for
ZIP code centroid, one for address). For small area analysis, records matched only to ZIP code centroids are
commonly removed.
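The GNIS-first versus ZIP-first rule can be written as a small decision function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the 50,000 cutoff is the example threshold from the text rather than a fixed rule, and every place that does not qualify as a large city is treated here as rural or frontier.

```python
def fallback_order(city_population, zip_count, population_cutoff=50_000):
    """Return the centroid sources to try, in order, for an unmatched record.

    city_population: population of the city on the record.
    zip_count: number of ZIP codes serving that city.
    population_cutoff: example threshold from the protocol text; adjust as needed.
    """
    # A city counts as "large" if it exceeds the cutoff or has more than one ZIP code.
    large_city = city_population > population_cutoff or zip_count > 1
    if large_city:
        return ["ZIP", "GNIS"]
    # Assumption: everything else is handled as a rural or frontier area.
    return ["GNIS", "ZIP"]


order_small_town = fallback_order(city_population=1_200, zip_count=1)
order_large_city = fallback_order(city_population=80_000, zip_count=3)
```

Whichever source matches first supplies the record's single set of XY coordinates; the second source is only consulted if the first fails.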
OUTPUT DATASET: REVIEW SUPPLEMENTARY FIELDS TO ALL MATCHED RECORDS AND PRODUCE THE FINAL
DELIVERABLE FILE.
a) Conduct a final review of any records that lack XY coordinates.
b) Add FIPS codes for the census block and tract of each match location.
c) Ensure all records include indicator(s) for the quality of the match (match type, match score,
Address, GNIS, Unmatched, etc.).
d) Produce GIS shapefiles including all relevant fields.
e) Provide a narrative about the geocoding process which may include any deviations from the
protocol and why these were necessary, match tools and versions used, and barriers experienced.
POST-GEO-CODING: SUMMARY STATISTICS ON MATCH (THIS IS THE ROLE OF THE DATA STEWARD).
a) Compare results to original Address Audit to determine if expected results were achieved.
b) Produce aggregate files with key counts and rates at census tract and small area levels.
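As an illustration of step (a), the share of records at each match level can be tabulated and set beside the proportions expected from the Address Audit. The sketch assumes each record carries a hypothetical match_level field with values such as Address, GNIS, ZIP, or Unmatched; the sample records are invented for the example.

```python
from collections import Counter


def match_rates(records):
    """Return the proportion of records at each match level."""
    counts = Counter(record["match_level"] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}  # no records, so no rates to report
    return {level: count / total for level, count in counts.items()}


# Invented sample: two address matches, one ZIP centroid match, one unmatched.
sample = [
    {"match_level": "Address"},
    {"match_level": "Address"},
    {"match_level": "ZIP"},
    {"match_level": "Unmatched"},
]
rates = match_rates(sample)
```

The same grouping applied to a tract or small-area identifier instead of match level would give the counts needed for step (b).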