The Census Bureau's Geographic Support System Initiative – An Update
Council of Professional Associations on Federal Statistics
September 21, 2012
Tim Trainor
Chief, Geography Division
U.S. Census Bureau

Census Geographic Support – Major Initiatives Over Time
• For the 1990 Census – Introduced TIGER
• For the 2000 Census – Introduced the Master Address File (MAF)
• For the 2010 Census – Realigned the street network through the MAF/TIGER Enhancement Program
• For the 2020 Census – The GSS Initiative

A Change in Methodology in Taking a Census
Prior to 1960
• Door-to-door enumeration
1960
• First mail-out census
• USPS delivered a questionnaire to every household on their routes
• Enumerators collected the completed forms
1970
• Census created an address register for densely populated USPS routes
• First mail-out/mail-back census
• Urban areas mailed back their forms; rural area forms were collected by enumerators
1980
• ~95% of the U.S. population is now included in the mail-out/mail-back census
• Address list created from the ground up
1990
• First use of TIGER
• Address list created from the ground up
2000
• Birth of the MAF
• 1990 address list was used as a starting point
• Began receiving the DSF from the USPS
2010
• Continuous update of the MAF to support the ACS
• MAF/TIGER Enhancement Project
• Address canvassing covered the entirety of the U.S. prior to Census Day
2020
• Introduction of Targeted Address Canvassing and the Geographic Support System Initiative (GSS-I)

Improving Data Quality
1: Establish quantitative measures of address and spatial data quality
2: Assign Quality Indicators to MAF/TIGER data
3: Monitor and improve the quality of the:
  – Existing MAF/TIGER data
  – New incoming data
  – IT processes for updating the MAF/TIGER System
  – Geographic products output from the MAF/TIGER System

Improved Partnerships
• Enhanced collaboration with partners: expand existing partnerships
• New tools: TIGERweb, Community TIGER, Web-based Address Tools, Crowd Sourcing, Volunteered Geographic Information (VGI)
• Utilize new tools and programs to acquire address and spatial data in the most efficient and least intrusive ways

New and Enhanced Programs
• Engage new partners
• Enhanced feedback
  – Address feedback adhering to Title 13 confidentiality laws
  – Build on and expand feedback for spatial features

Research Activities
• GSS-I Working Groups
• Address Summit
  – Address Pilots
• External Expert Reports
• Research Project Examples
  – iSIMPLE
  – GSS Lab Data Viewer
  – Quality Indicators
  – Census 2010 Road Update Operations Evaluation
  – Targeted Address Canvassing Continuum
• Project/Contract Management Quality Assessments (FY2011)

GSS-I Working Groups (Policy; Research and Development)
• To date, 11 IPTs formed
• Working group topics:
  – Highway Median "Flag" Improvements
  – Parcel Data and Centroid Use
  – Feature Coverage and Sources
  – Address Coverage and Sources
  – Partnerships
  – MAF/TIGER Integration/Linkage
  – Geocoding
  – Global Positioning Systems (GPS)
  – Problem Capture Tool
  – Quality Indicators
  – Improving Group Quarters Data
  – iSIMPLE
  – Metadata Improvements
  – Features Source Evaluation
  – The CATT
  – Better Meeting User Needs
  – MAF "Facility" Data
  – MTAG

Census Address Summit Goals
• Educate our partners about the Geographic Support System Initiative (GSS-I) and the benefits of targeted address canvassing
• Gain a common understanding regarding the definition of an address
• Learn how our partners are collecting, using, and maintaining address data

Address Summit
Participants
(Chart: Participants – All Levels of Government)

Observations
• Continuous partnerships are needed and welcome
• Public safety is a driving factor for local governments
• Urban and rural areas will pose different challenges
• Address coverage varies and is sometimes not known or quantifiable
• Communication and engagement are key

Results of the Address Summit
• Five Pilot Projects
  – Address Authority Outreach and Support for Data Sharing Efforts
  – FGDC Address Standard and Implementation
  – Federal/State/Tribal/Local Address Management Coordination
  – Data Sharing – Local/State/USPS/Census
  – Hidden/Hard to Capture Addresses

(Chart: 2012 Address Pilot Schedule)

Moving Forward
These pilots will provide the Census Bureau with:
• A testing ground for future geographic partnership programs
• An opportunity to identify the best methods for the continual update of the MAF/TIGER System
More information: www.census.gov/geo/www/gss/address_summit/

Benefits of Establishing a Census Address Ontology
• Establishing an ontology allows for:
  – Effective communication
  – A common language
  – A reduced burden in data sharing
  – Explicit terminology, concepts, and relationships

Expert Research at Census
• Five reports created by outside experts:
  – The State and Anticipated Future of Addresses and Addressing
  – Identifying the Current State and Anticipated Future Direction of Potentially Useful Developing Technologies
  – Measuring Data Quality
  – Use of Handheld Computers and the Display/Capture of Geospatial Data
  – Researching Address and Spatial Data Digital Exchange
  – http://www.census.gov/geo/www/gss/reports.html
• Summer at Census:
  – Steve Guptill, USGS Chief Scientist (Retired): Quantifying the Quality of the MAF/TIGER Database
  – David Cowen, Distinguished Professor Emeritus: Use of Parcel Data to Update and Enhance Census Bureau Geospatial Data
  – http://www.census.gov/geo/www/gss/qaewg.html
• 2 in-progress reports:
  – Change Detection
  – Master Address File (MAF) Evaluation

Analysis of the MAF/TIGER System
• iSIMPLE
  – Evaluation of road features in TIGER
  – Is TIGER consistent with imagery?
  – 852,090 grid cells reviewed:
    • 94% had NO missing features
    • 5% had 4 or fewer missing features
    • 70% had NO misaligned features
    • 26% had 4 or fewer misaligned features
  – First web-service-based review
  – Research will assist with targeting efforts

(Map: iSIMPLE Missing Road Features)

GSS Lab Data Viewer
• An online, interactive mapping tool to facilitate visualization of data and information
• Examples include:
  – 2010 Census data
    • Address Canvassing adds
    • Type A adds
    • Undeliverable as Addressed (UAA)
  – Delivery Sequence File (DSF) statistics
  – Natural disaster information

Quality Indicators
• Evaluate the current quality of the MTDB:
  – Addresses
  – Features
  – Geographic areas
  – Geocodes
• Only the MTDB is evaluated
• The unit of work is the (current) census tract

Address Indicators
• Overall address QIs:
  – Address consistency
  – Mailability
  – Deliverability
  – Locatability
  – Geocode accuracy
  – Tests for "other"

Feature Indicators
• Overall feature QIs:
  – Spatial accuracy
  – Feature naming
  – Address ranges
  – Feature classification

Geographic Area Indicators
• For each geographic area, four major tests or sub-indicators:
  – Local review/approval of areas
  – Regional review/approval of areas
  – Program review/approval of areas
  – Independent subject matter review/approval of areas
• Additional tests for statistical criteria, attributes, type of submission, contiguity, etc.
• Also tests for geographic interaction (slivers), and block size and shape

Geocode Indicators
• Combines specific sub-indicators from each of the other categories:
  – Locatability and geocode accuracy (Address)
  – Spatial accuracy and address ranges (Feature)
  – Block size and shape (Geography)

Overall Indicators and Weighting
• Address, Feature, Geographic Area, and Geocode QIs are then aggregated according to subject matter formulas
• Each census tract will receive a single overall score, and category
  scores where relevant
• History and trends will be tracked

External Sources
• Quality Indicators are MTDB only
• In the future, external sources may also help determine MTDB quality, such as:
  – Population estimates
  – Building permits (new development)
  – Comparison to imagery
• Additional tests to check the completeness of the MTDB (omission/commission)

Tract Profiles
• Additional ability to adjust Quality Indicators based on profile elements of the tract, such as:
  – Natural disaster
  – Unique address types
  – Rapidly changing development
  – Special land use areas

Rapid Landscape Change: Picher, OK
• Census 2000: 708 housing units
  – 621 occupied
  – 87 vacant
• 2010 Census: 30 housing units
  – 10 occupied
  – 20 vacant

The Result
• All census tracts will be tested and ranked
• Work and updates can then be targeted to the specific areas most in need of update:
  – Prioritization of internal work
  – Prioritization of partner contact and file ingestion
  – Improved resource allocation

2010 Road Update Operations Evaluation
Project Scope: The project evaluated the spatial accuracy of new road edges added to the MAF/TIGER database (MTDB) by 2010 Decennial Update Operations.
The Decennial Operations in the study:
• Address Canvassing
• Update Leave
• Update Enumerate
• Enumeration at Transitory Locations
• Group Quarters Enumeration
• Group Quarters Validation

Hypothesis (1): By using imagery to systematically assess the spatial accuracy of road edges added by different operations, we can choose update methods that consistently produce higher-quality linear features.

Hypothesis (2): Road updates made with GPS were more spatially accurate than paper-based road updates.

Project Phases:
1. SQL Metrics – Queried the MTDB for counts of new road edges added during 2010 operations, by county.
2. Sample Design – Worked with DSSD to design a sample of counties, as random as possible, that would include all Operations and all Regions.
3. Spatial Evaluation – Assessed selected edges overlaid on imagery; tested the imagery's spatial accuracy against a CE95 of 5 meters or less.
4. Data Analysis – With DSSD, obtained metrics from the observations.
5. Conclusions

We looked at over 42,000 edges in 72 counties.

Conclusions
• Road updates made with GPS were more spatially accurate than paper-based road updates.
  – An estimated 90% of road edges added with GPS were spatially accurate.
  – An estimated 67% of road edges digitized from paper-based operations were spatially accurate.
• By using imagery to systematically assess the spatial accuracy of road edges added by different operations, we can choose update methods that consistently produce higher-quality linear features.

Suggestions for Further Study
• Find other ways to glean what contributes to spatial quality using the data obtained in this review.
• Are edges with SMIDs (Spatial Metadata IDs) more likely to be spatially accurate than edges without SMIDs?
• Are roads that were named more likely to be spatially accurate than those not named?
• Why was the incidence of roads with no name information 38%? Were road names not collected?
• Is there a correlation between the use of NAIP imagery and the number of edges not visible in the imagery, given that NAIP is collected leaf-on?
• Is it possible to operationalize or automate this review so that it can be applied at a larger scale?
• What other operations that add linear features would benefit from the use of imagery for quality control?
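Phase 3 checks the imagery against a CE95 of 5 meters. One common empirical reading of CE95 is the radius within which 95% of horizontal checkpoint errors fall. Below is a minimal Python sketch, assuming each checkpoint is given as a (dx, dy) offset in meters between the imagery and a higher-accuracy reference point. The function names and the nearest-rank percentile are illustrative choices, not the Census Bureau's documented procedure; standards such as the FGDC NSSDA instead derive a 95% accuracy figure from RMSE.

```python
import math
from typing import Iterable, List, Tuple

Offset = Tuple[float, float]  # (dx, dy) in meters; hypothetical input format


def radial_errors(offsets: Iterable[Offset]) -> List[float]:
    """Horizontal (radial) error of each checkpoint from its (dx, dy) offset."""
    return [math.hypot(dx, dy) for dx, dy in offsets]


def empirical_ce95(offsets: Iterable[Offset]) -> float:
    """Smallest radial error that at least 95% of checkpoints do not exceed.

    Uses the nearest-rank percentile: rank = ceil(0.95 * n), 1-based.
    """
    errors = sorted(radial_errors(offsets))
    if not errors:
        raise ValueError("at least one checkpoint is required")
    rank = (95 * len(errors) + 99) // 100  # integer ceil(0.95 * n)
    return errors[rank - 1]


def meets_ce95(offsets: Iterable[Offset], threshold_m: float = 5.0) -> bool:
    """True if the empirical CE95 is within the threshold (5 m in the evaluation)."""
    return empirical_ce95(offsets) <= threshold_m


# Hypothetical checkpoints: 19 small offsets and one gross outlier.
checkpoints = [(0.0, 1.5)] * 19 + [(30.0, 40.0)]
print(empirical_ce95(checkpoints), meets_ce95(checkpoints))
```

With 20 checkpoints the nearest-rank 95th percentile is the 19th-smallest error, so a single outlier is tolerated. With only 10 checkpoints it is the largest error, so the same outlier would fail the test. The number of checkpoints therefore matters as much as the threshold.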
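The Conclusions report estimated shares of spatially accurate edges by update method: 90% for GPS and 67% for paper-based operations. The slides do not state which estimator was used with DSSD's county sample design. The sketch below shows one standard choice, a weighted proportion sum(w·y)/sum(w) per method. It assumes each reviewed edge carries a method label, an accurate/not-accurate judgment from the imagery review, and a sampling weight. The `EdgeReview` type, the method labels, and the weights are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class EdgeReview:
    method: str          # hypothetical label, e.g. "GPS" or "paper"
    accurate: bool       # reviewer judged the edge spatially accurate against imagery
    weight: float = 1.0  # hypothetical sampling weight (inverse selection probability)


def share_accurate_by_method(reviews: Iterable[EdgeReview]) -> Dict[str, float]:
    """Weighted proportion of accurate edges for each update method."""
    accurate_weight: Dict[str, float] = defaultdict(float)
    total_weight: Dict[str, float] = defaultdict(float)
    for review in reviews:
        total_weight[review.method] += review.weight
        if review.accurate:
            accurate_weight[review.method] += review.weight
    return {
        method: accurate_weight[method] / total
        for method, total in total_weight.items()
        if total > 0
    }


# Hypothetical reviews, not data from the evaluation.
sample = [
    EdgeReview("GPS", True), EdgeReview("GPS", True), EdgeReview("GPS", False),
    EdgeReview("paper", True, weight=2.0), EdgeReview("paper", False, weight=2.0),
]
print(share_accurate_by_method(sample))
```

A formal comparison of the two methods would also need variance estimates that reflect the county sample design, which this sketch omits.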
Targeted Address Canvassing Continuum

Targeted Address Canvassing Continuum Scores, Census Tract 6069.04, Howard County, Maryland
2010 Base Overall Score = 93.7
2010 Base (category: score)
• Ratio of 2010 HU counts to 2010 MAF units: 81.5
• Percentage of area governments participating in LUCA (one local government): 100.0
• Type A non-ID adds as percent of total housing units (Score = 100 – percent Type A non-ID adds): 99.5
• Mail back rate: 81.2
• No successful CQR cases (no cases = score of 100): 100.0
• Undeliverable as Addressed (UAA) as a percentage of total housing units (Score = 100 – percent UAA): 91.4
• DSF Stability Index: 97.3
• Ratio of Spring 2010 DSF to 2010 Census housing unit count: 98.5
Total Points: 749.4
Overall Score: 93.7

Targeted Address Canvassing Continuum Scores, Census Tract 6069.04, Howard County, Maryland
Current State Overall Score = 96.9
Current State: Quality Indicator categories
• Percent city-style addresses
• Lack of/presence of hidden units
• Lack of/presence of informal or unique housing situations
• Lack of/presence of seasonal housing (Score = 100 – pct seasonal vacant HUs)
• Conversion from single to multi-unit or multi-unit to single (Score = 100 – conversions as pct of all housing units)
• Area not known to subdivide single housing units into multi-unit structures
• Lack of/presence of hard to count populations
• Percent MAF/TIGER agreement on geocodes
• Percent MAF address confirmation rate (matching rate) with admin records
• Percentage of city-style address MAF units preferred MSPs
• Area classified/not classified as "needs to be canvassed" in field survey staff feedback
• DSF Stability Index
• GEO change detection processes indicate no changes have occurred
Category scores reported on the slide: 100.0, 99.0, 100.0, 99.5, 100.0, 99.0, 98.0, 76.3, 100.0, 97.3
Overall Score: 96.9

High Stability Census Tracts
Tract 3406, Harris County, TX
• 00-10 DSF Stability: 1.0; Ratio Fall 09 DSF to 2010 Census HU Count: 1.0
• Type A adds: 3; UAA: 7; 2010 HU: 988; DSF Spring 11: 988; DSF Fall 10: 988; Census 2000 HU: 989; Ad Can True Adds: 0; Ad Can Deletes: 3

Tract 4302.03, Fairfax County, VA
• 00-10 DSF Stability: 1.0; Ratio Fall 09 DSF to 2010 Census HU Count: 1.0; Ratio Spring 09 DSF to 2010 Census HU Count: 1.0
• Type A adds: 5; UAA: 14; 2010 HU: 910; DSF Spring 11: 910; DSF Fall 10: 910; Census 2000 HU: 911; Ad Can True Adds: 1; Ad Can Deletes: 17
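In the Howard County example, the eight 2010 Base category scores sum to the reported 749.4 total points, and 749.4 / 8 = 93.7, the reported overall score. The ten Current State scores listed likewise average 96.91. The overall score therefore appears to be an unweighted mean of the category scores. That is an inference from the slide's numbers, not a documented formula; the Quality Indicator slides say tract scores are aggregated "according to subject matter formulas". The minimal Python sketch below reproduces the arithmetic, together with the "100 – percent" rule used by the Type A and UAA rows. The dictionary keys and function names are illustrative.

```python
from statistics import fmean  # Python 3.8+
from typing import Dict

# 2010 Base category scores for Census Tract 6069.04, Howard County, Maryland (from the slide).
BASE_2010_SCORES: Dict[str, float] = {
    "Ratio of 2010 HU counts to 2010 MAF units": 81.5,
    "Area governments participating in LUCA": 100.0,
    "Type A non-ID adds (100 - percent of HUs)": 99.5,
    "Mail back rate": 81.2,
    "No successful CQR cases": 100.0,
    "UAA (100 - percent of HUs)": 91.4,
    "DSF Stability Index": 97.3,
    "Ratio of Spring 2010 DSF to 2010 Census HU count": 98.5,
}


def complement_score(count: int, total_housing_units: int) -> float:
    """Score = 100 - (count as a percent of total housing units)."""
    if total_housing_units <= 0:
        raise ValueError("total_housing_units must be positive")
    return 100.0 - 100.0 * count / total_housing_units


def overall_score(category_scores: Dict[str, float]) -> float:
    """Unweighted mean of category scores (assumed, inferred from the slide's totals)."""
    return fmean(category_scores.values())


total_points = sum(BASE_2010_SCORES.values())
print(f"Total points: {total_points:.1f}")                      # Total points: 749.4
print(f"Overall score: {overall_score(BASE_2010_SCORES):.1f}")  # Overall score: 93.7
```

If the production formulas weight categories differently, `overall_score` would become a weighted mean. The unweighted version is simply the one that matches the published example.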
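The Result slide says all census tracts will be tested and ranked so that work can be targeted to the areas most in need of update. Below is a minimal sketch of that ranking step. It assumes that a lower overall score means a tract needs attention sooner and breaks ties by tract ID. The tract IDs and scores in the example are hypothetical. Real prioritization could also apply the tract-profile adjustments (natural disasters, rapidly changing development, and so on) before ranking.

```python
from typing import Dict, List


def rank_tracts_for_targeting(overall_scores: Dict[str, float], how_many: int) -> List[str]:
    """Return up to `how_many` tract IDs, lowest overall score first.

    Assumes a lower score means the tract is more in need of update;
    ties are broken by tract ID so the ordering is deterministic.
    """
    ranked = sorted(overall_scores.items(), key=lambda item: (item[1], item[0]))
    return [tract_id for tract_id, _ in ranked[:how_many]]


# Hypothetical tract IDs and scores.
scores = {"tract-A": 93.7, "tract-B": 71.2, "tract-C": 88.0, "tract-D": 71.2}
print(rank_tracts_for_targeting(scores, 3))  # ['tract-B', 'tract-D', 'tract-C']
```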