National Tracking Layer
Technical Description Document

12-22-2014
Document Version: Version 3.0

Document History

ID     Changes      Date Delivered    Author
1.0    Draft 1.0    06-13-2014        RAMPP
2.0    Draft 2.0    10-31-2014        RAMPP
3.0    Draft 3.0    12-22-2014        RAMPP

1 Introduction
   1.1 Background
2 Business Requirements
   2.1 Description of Business Requirements
3 Methodology and Technical Specification
   3.1 Base Layers as Input
   3.2 Methodology of Data Processing
      3.2.1 Generate “Temp_1” by Performing Spatial Union on Special Land Use Authority Areas and Tribal Land
      3.2.2 Generate “Temp_2” by Performing Spatial Union on Temp_1 and Jurisdiction
      3.2.3 Generate “Temp_3” by Performing Spatial Union on Temp_2 and HUC12
      3.2.4 Generate “Temp_4” by Performing Spatial Union on Temp_3 and Census Block 2010. Perform Area-weighted Population calculation.
      3.2.5 Generate EHID for each record in Temp_4
      3.2.6 Generate the Final Output, the National Tracking Layer. Spatially Dissolve Temp_4 by EHIDs.
      3.2.7 Add and calculate additional fields in the National Tracking Layer
      3.2.8 Manual post-processing of the National Tracking Layer
4 Data Structure of the National Tracking Layer
   4.1 General Information
   4.2 Key Attributes
5 Known Issues and Other Notes
   5.1 Artefacts from Union Operation
   5.2 Topology
   5.3 Other Territories of US Not Included
   5.4 XY Tolerance
   5.5 Automating the Tracking Layer Generation
   5.6 Using the Comparison Utility
1 Introduction

The purpose of this document, the National Tracking Layer Technical Description Document, is to describe the high-level business requirements and the technical approaches adopted to create FEMA’s National Tracking Layer. This document includes the background information, business requirements, data sources, technical methodology, and a description of the resulting layer.

1.1 Background

A number of Flood Insurance and Mitigation Administration systems require accurate community boundaries and population estimates for program metric and tracking purposes. These systems require reporting at various levels of resolution: community, county, watershed, State, FEMA region, and National. During Risk MAP, many of these systems used a community boundary dataset that was developed circa 2003. FEMA recognized the need to update this information with respect to Census 2010 jurisdictional boundary information while also incorporating Tribal lands and other special jurisdictional boundaries to meet current and evolving business needs. This update was completed in November 2013 and delivered as the community layer.

In May 2014, FEMA tasked RAMPP to produce both a National Tracking Layer and an automated process for updating the National Tracking Layer as the constituent components of the layer change. This document describes the output of this effort: the process and the resulting layer, developed with collaborative PTS support. (PTS contractors formed the “Workgroup of National Tracking Layer” in September 2014 to discuss technical topics.)

In November 2014, STARR updated the community layer to remove any overlaps in the special land use authority areas. The updated community layer was used as an input to Version 3 of the National Tracking Layer. This version of the tracking layer lends itself well to maintaining the spatial relationship with the community layer.
For this version of the tracking layer, efforts were made to include jurisdictions in US territories such as American Samoa, Guam, the Virgin Islands, and the Northern Marianas. However, the US Census Bureau does not distribute population for these territories at the census block level, so they were not included in this version. This version also addresses gaps and missing watershed IDs by providing input ranks during the spatial union process.

2 Business Requirements

2.1 Description of Business Requirements

The following high-level business requirements were captured:

ID: NTL-001
Business Requirement Statement: Define a standard methodology for creating a National Tracking Layer, with the specification listed in #NTL-002.
Description and Notes:
- The complexity of continuously maintaining, updating, and versioning the dataset requires a well-defined methodology and process to ensure its efficiency as well as its effectiveness. The resulting methodology and process would produce accurate and consistent results, be easy to carry out through automation, and be easy for stakeholders to understand.

ID: NTL-002
Business Requirement Statement: Create a National Tracking Layer, a flat layer in which all polygons are spatially unioned and derived from the following 5 base layers:
- HUC12
- Census Blocks 2010
- Special Land Use Authority Areas
- Jurisdiction
- Tribal Lands
The extents of the National Tracking Layer shall include the 50 States, District of Columbia, Puerto Rico, and Virgin Islands. A unique EHID is assigned to each record; it is fixed in length and contains information for the county FIPS code, HUC12 ID, Jurisdiction, Special Land Use Authority Area ID, and Tribal Entity ID. The National Tracking Layer shall be topologically sound, without overlapping polygons. Areal interpolation of population counts shall only be performed at an entity equal to or smaller than a Census block. The National Tracking Layer shall be in North America Albers Equal Area Conic projection.
Description and Notes:
- The National Tracking Layer will be in FGDB format for easy data exchange and sharing.
- The Spatial Union operation on the dataset follows ESRI’s definition.
- The unique Enhanced Hybrid Identifier (EHID) can be used to quickly “summarize” data based on county FIPS, Community (Jurisdiction) Area ID, Tribal Area ID, HUC12 ID, and Special Authority Area ID. It can be extended beyond these five, should needs arise in the future.
- Topological issues, such as overlaps and gaps, in the “Tracking Layer” could cause problems such as double counting of population. The proposed automated process would minimize topological issues in the output layer to a great extent by conducting clean spatial operations.
- When a boundary, such as a watershed or community boundary, cuts through Census Block polygons, these blocks would be split on each side of the cutting boundary. The total population count for those blocks would be split as well, based on the ratio of the areas on each side.
- The resulting National Tracking Layer shall be in Albers Projection, which allows accurate areal calculation and can manage the databases for the entire extent of the dataset.

ID: NTL-003
Business Requirement Statement: Produce a script to automate the data processing in the future.
Description and Notes:
- Once the methodology and the automated process are standardized and put in place, incorporating additional base layers and incorporating changes would become much easier compared with past experience.
- Through the execution of this model script, which automates both spatial and non-spatial manipulation of data, the National Tracking Layer can be regenerated with less effort to accommodate future changes, such as population count, community boundary change, and watershed boundary shift.
- Whenever a new boundary layer is to be used, or an existing boundary changes, the new process would update information stored in the database by performing extensive spatial and relational data manipulations on a nation-wide basis.
- The automation would not only increase efficiency greatly, but also prevent human errors, resulting in better consistency, accuracy, topological correctness, and overall quality.

ID: NTL-004
Business Requirement Statement: Source for population shall be solely Census 2010.
Description and Notes:
- The highest resolution of population data should be the Census Block data from Census 2010. It is authoritative and offers sufficient granularity for FIMA program needs. At the same time, while still a large dataset with more than 10 million records, it is manageable through a well-defined process.

3 Methodology and Technical Specification

3.1 Base Layers as Input

The following base layers are used as the input for the creation of the National Tracking Layer.

Base Layer Name: Census blocks 2010
Description: A dataset, provided by the US Census, containing the smallest geographic unit.
Purpose: Serving as the sole source of population data.
Source: http://www.Census.gov

Base Layer Name: HUC 12
Description: Hydrologic Unit layer with 12-digit HUC code.
Purpose: Serving as the lowest level of hydrologic unit.
Source: http://www.usgs.gov

Base Layer Name: Special Land Use Authority Areas
Description: A GIS layer for local political land use authority areas to enforce NFIP regulations.
Purpose: Serving as the authoritative layer for Special Land Use Authority Areas.
Source: FEMA 2014 Community Layer

Base Layer Name: Tribal Boundaries 2010
Description: Areas with tribal land use authority.
Purpose: Serving as the authoritative layer for tribal land boundaries.
Source: FEMA 2014 Community Layer

Base Layer Name: Jurisdiction Boundaries 2010
Description: A GIS data layer that contains all jurisdictions in the US, Puerto Rico, and other territories. This layer has been updated through March 2014.
Purpose: Serving as the authoritative layer for Jurisdiction boundaries.
Source: FEMA 2014 Community Layer

3.2 Methodology of Data Processing

This document provides a high-level description of the methodology that RAMPP designed and implemented. The PTS Workgroup on this subject provided valuable input during the process. For details of the input data layers, please refer to the documentation of each input, such as the Community Layer.

3.2.1 Generate “Temp_1” by Performing Spatial Union on Special Land Use Authority Areas and Tribal Land

This operation is performed on ESRI’s ArcGIS platform. It is illustrated in the figure below.

[Figure: ArcGIS Spatial Union Function]

The resulting layer, Temp_1, contains information for both Special Land Use Authority Areas and Tribal Land, including the unique Tribal Area ID (T_AREA_ID) and Special Land Use Authority Area ID (S_AREA_ID) attributes for each polygon. The layer is used as input for further spatial manipulation of the data, and the two Area IDs are concatenated with other IDs in the following steps.

Because spatial accuracy varies between the input layers, ranks were assigned to them to indicate priority during the union process. During the union operation, the feature class with the higher ranking takes precedence in resolving spatial issues at the edges. This was included to alleviate the gaps generated by the spatial union. For this step, the tribal areas were given the higher ranking of 1 and the special land use authority areas were given a ranking of 2.

3.2.2 Generate “Temp_2” by Performing Spatial Union on Temp_1 and Jurisdiction

In a similar fashion as in Step 3.2.1, the Jurisdiction layer is “unioned” with the output of the previous step. The resulting layer, Temp_2, now contains information and Area IDs for Tribal Land (T_AREA_ID), Jurisdiction (J_AREA_ID), and Special Land Use Authority Areas (S_AREA_ID). The Area IDs will be concatenated at the end of the process to generate the unique EHID.
For this step, the jurisdiction areas were given the higher ranking of 1 and the union output from the previous step was given a ranking of 2.

3.2.3 Generate “Temp_3” by Performing Spatial Union on Temp_2 and HUC12

In a similar fashion as in Step 3.2.2, the HUC12 layer is “unioned” with the output of the previous step. The output, Temp_3, now contains information and Area IDs for Tribal Land, Jurisdiction, Special Land Use Authority Areas, and HUC12. The Area IDs will be concatenated at the end of the process to generate the unique EHID. For this step, the HUC12 feature class was given the lower ranking of 2 and the union output from the previous step was given the higher ranking of 1.

3.2.4 Generate “Temp_4” by Performing Spatial Union on Temp_3 and Census Block 2010. Perform Area-weighted Population calculation.

In a similar fashion as in Step 3.2.3, the Census Blocks from Census 2010 are “unioned” with the output of the previous step. The output, Temp_4, now contains information and Area IDs for Jurisdictions, Tribal Lands, Special Land Use Authority Areas, and HUC12, as well as the Census Block GEOID. The census block GEOID is a 15-character identifier used by the Census Bureau to represent a census block. These unique IDs will be concatenated at the end of the process to generate the unique EHID.

Based on the population count in the Census Block layer, population is calculated for each resulting polygon after the spatial union operation. For Census blocks that are split into multiple polygons by the union, the original population count is divided among those polygons using the area-weighted algorithm.

For example, assume a Census block with a population of 10,000 is split into two polygons after the union, one covering 70% of the area of the original block and the other 30%. The larger polygon would carry 7,000 as its population, and the smaller one would carry 3,000.
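The area-weighted split described above can be sketched in a few lines of Python. This is a minimal illustration, not the RAMPP script: the function name is hypothetical, and it assumes the weighting divides each piece's area by the original block area (the CensusPop and CBArea fields described in Section 5.5).

```python
def area_weighted_population(census_pop, cb_area, piece_area):
    """Return the share of a block's population assigned to one union piece.

    census_pop: population of the original census block (CensusPop)
    cb_area:    area of the original census block (CBArea)
    piece_area: area of the piece produced by the union
    Assumption: the share is proportional to piece_area / cb_area.
    """
    if cb_area <= 0:
        raise ValueError("census block area must be positive")
    return census_pop * piece_area / cb_area


# Worked example from the text: a block of 10,000 people split 70% / 30%.
block_pop = 10000
block_area = 2500.0            # hypothetical block area in square meters
piece_areas = [1750.0, 750.0]  # 70% and 30% of the block area
shares = [area_weighted_population(block_pop, block_area, a) for a in piece_areas]
print(shares)  # [7000.0, 3000.0]
```

Dividing by the stored block area, rather than by the sum of the piece areas, means the shares add up to the block total only when the pieces cover the whole block.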
The area ratio method is commonly used in the GIS community, especially when other viable approaches are limited. Other products or approaches that better reflect the actual population distribution should be considered for adoption.

3.2.5 Generate EHID for each record in Temp_4

A new field, “EHID,” is then added to the Temp_4 layer from the previous step. All the previously captured IDs for each polygon, originating from the different input layers, are concatenated into a unique ID, the EHID. The EHID is fixed in length, 44 characters in total, and has the following format:

C23017_H010600010101_23J0430_23S8120_23T0005

- “C” indicates the beginning of a 5-digit County FIPS Code. The county FIPS code is extracted from the GEOID of the census blocks.
- “H” indicates the beginning of a 12-digit HUC code.
- “J” indicates the Jurisdiction Area ID component.
- “S” indicates the Special Authority Area ID component.
- “T” indicates the Tribal Entity Area ID component. Alternatively, the tribal entity component in an EHID can represent a tribal reservation by substituting “T” with “R” as shown below. The “R” in the EHID indicates that the polygon is part of tribal reservation land.

C23017_H010600010101_23J0430_23S8120_23R0004

3.2.6 Generate the Final Output, the National Tracking Layer. Spatially Dissolve Temp_4 by EHIDs.

At this point, the Temp_4 database contains more than 13 million records. To reduce the number of records for maximum performance and efficiency, a spatial dissolve operation is performed on Temp_4. All polygons with the same EHID are aggregated into one, as illustrated in the figure below.

During the aggregation (spatial dissolve) process, the populations of the aggregated polygons are summed to produce the final population of the aggregated area.

The resulting layer becomes the final output: the National Tracking Layer. Its data structure is described in Section 4.
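The attribute side of Steps 3.2.5 and 3.2.6 can be sketched as follows. This is a minimal illustration in plain Python rather than the ArcGIS-based script: the function names, the sample GEOID, and the population values are hypothetical, and the geometry merge performed by the spatial dissolve is not shown. The only facts taken from the text are the EHID layout, its 44-character length, and that the county FIPS code comes from the first five characters of the census block GEOID.

```python
from collections import defaultdict


def build_ehid(geoid, huc12, j_area_id, s_area_id, t_area_id):
    """Concatenate the component IDs into a 44-character EHID."""
    county_fips = geoid[:5]  # state (2 digits) + county (3 digits)
    ehid = "C{0}_H{1}_{2}_{3}_{4}".format(
        county_fips, huc12, j_area_id, s_area_id, t_area_id)
    if len(ehid) != 44:
        raise ValueError("EHID must be 44 characters, got {0}".format(len(ehid)))
    return ehid


def dissolve_population(records):
    """Sum population over (ehid, population) pairs that share an EHID."""
    totals = defaultdict(float)
    for ehid, population in records:
        totals[ehid] += population
    return dict(totals)


# Hypothetical census block GEOID; only its first five digits (23017) are used.
geoid = "230170000000000"
ehid_t = build_ehid(geoid, "010600010101", "23J0430", "23S8120", "23T0005")
ehid_r = build_ehid(geoid, "010600010101", "23J0430", "23S8120", "23R0004")

# Hypothetical Temp_4 records: two pieces share ehid_t, one has ehid_r.
temp_4 = [(ehid_t, 7000.0), (ehid_t, 3000.0), (ehid_r, 120.0)]
tracking_layer = dissolve_population(temp_4)
print(ehid_t)                  # C23017_H010600010101_23J0430_23S8120_23T0005
print(tracking_layer[ehid_t])  # 10000.0
```

The length check is a cheap guard against a missing or malformed component, which would otherwise produce an EHID that no longer groups correctly during the dissolve.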
3.2.7 Add and calculate additional fields in the National Tracking Layer

Additional fields for the individual components and the Community Identifiers (CIDs) are added to the National Tracking Layer. The individual components (Area IDs) are calculated as extracts from the EHID, and the CIDs are calculated, wherever available, by joining the layer to the individual STARR layers. The main purpose of including these CIDs is to maintain the linkages between the National Tracking Layer and the Community Layer. This may facilitate using the National Tracking Layer for other FEMA programs such as the Mitigation Planning Portal (MPP).

3.2.8 Manual post-processing of the National Tracking Layer

The union operation creates some gaps inland, which result in EHIDs that lack the other spatial components. The EHIDs of union polygons along the coast are expected to lack the other components. For inland areas, such polygons need to be identified, their attributes updated, and the layer re-dissolved to update the final tracking layer.

The tracking layer uses STARR’s December 2014 Community Layer database “as is” without any modification. The attribution on the tracking layer is contingent upon the attribution in STARR’s community layer database. For further information on the fields, refer to the Community Layer guidance previously developed by STARR.

4 Data Structure of the National Tracking Layer

4.1 General Information

Name                     National Tracking Layer
Type                     Vector, in FGDB format
Date of Creation         12/22/2014
Version                  Version 3.0
Creator                  RAMPP, tasked by FEMA
Geographic Coordinate    GCS_North_American_1983
Projection               North_America_Albers_Equal_Area_Conic
Spatial Extent           US and Puerto Rico
Number of Records        226,251

4.2 Key Attributes

The tracking layer inputs and outputs were set at an XY tolerance of 0.5m and an XY resolution of 0.05m.
Attribute     Description                                  Data Type
Shape         Polygon Geometry                             Geometry
Shape_Area    Area of a polygon                            Double
EHID          The Enhanced Hybrid ID                       String
ADJ_POP       Population Count                             Double
J_AREA_ID     Identifier from Jurisdictional Boundaries    String
S_AREA_ID     Identifier from Special Land Use             String
T_AREA_ID     Identifier from Tribal_Boundaries            String
J_CID         CID from Jurisdictional Boundaries           String
S_CID         CID from Special Land Use                    String
T_CID         CID from Tribal_Boundaries                   String

5 Known Issues and Other Notes

5.1 Artefacts from Union Operation

The spatial union operation sometimes causes a small portion of the population of one census block to “shift” into other census blocks. As shown in the screenshot below, the census block with GEOID ‘040250006071062’ (red boundary) is partitioned into the ‘040250006071062’, ‘04025000671061’ and ‘040250006102024’ census blocks after the union. As a result, the area-weighted population calculation under-estimates the population for census block ‘040250006071062’ and over-estimates it for census blocks ‘04025000671061’ and ‘040250006102024’.

Another type of artefact noticed is the “gaps” generated in the final union with census blocks. As shown in the screen capture below, Census Block ‘450790028001065’ has gaps on both sides, resulting in a population difference of 56 people when compared to the Census count.

5.2 Topology

The spatial union operation could produce overlapping polygons that cause population to be assigned multiple times in the tracking layer. These overlapping polygons should be deleted before the final Dissolve operation is performed. Post-processing procedures such as topology validation and population checks should be implemented to minimize these artefacts.

There are multipart geometry features in the input layers as well as in the final tracking layer. These multipart features may return erroneous population metrics when queried.
Sliver polygons are also generated. They do not cover a significant area, but they are undesirable by-products of the complexity of the spatial operations and the multipart geometry features.

5.3 Other Territories of US Not Included

For this delivery, the National Tracking Layer does not include Guam, American Samoa, the Northern Marianas, and the Virgin Islands. The Census Bureau does not distribute population at the census block level for these territories.

5.4 XY Tolerance

Version 3.0 of the National Tracking Layer is produced with an XY tolerance of 0.5m. A feature dataset is created with a 0.5m XY tolerance (and 0.05m XY resolution). All the inputs are imported into the feature dataset before the tracking layer creation script is run.

5.5 Automating the Tracking Layer Generation

o Set up the census blocks as an input layer
   - It is assumed that the census blocks data have been downloaded from the Census Bureau website and compiled following the instructions provided on their site.
   - Join the population attribute from the non-spatial tables (StateName_GEO) into the spatial census blocks files via the GEOID10 field.
   - Merge all the census blocks into a single feature class.
   - Re-project the layer into USA Albers Equal Area Conic projection.
   - Add two fields: CensusPop (long integer type) and CBArea (double type).
   - Calculate those fields. The CensusPop field holds the population and the CBArea field holds the area of each census block in square meters.

o Set up the Geodatabase, XY tolerance, and other input layers
   - Create an empty file geodatabase.
   - Create a feature dataset called “Inputs” at 0.5m XY tolerance and 0.05m XY resolution. A different name may be used.
   - Import all the inputs into the feature dataset.

o Set up and run the Python script
   - Place the Python script in a suitable folder location.
   - In ArcCatalog, add a new toolbox in the same folder as the Python script.
   - Right-click on the toolbox and add a new script.
   - Select the Python file as the source of the script.
   - Add the 5 parameters in the order given below:
        Display name: Tribal Areas; Data type: Feature Class
        Display name: Special Land Use Authority Areas; Data type: Feature Class
        Display name: Jurisdictions; Data type: Feature Class
        Display name: HUC12 Watersheds; Data type: Feature Class
        Display name: Census blocks; Data type: Feature Class
     It is imperative that the input parameters are given in this order for the Python script to run correctly.
   - Run the Python script.

5.6 Using the Comparison Utility

A comparison utility has been developed to facilitate easy comparison between two versions of the tracking layer. The user interface is as shown below. It requires the user to input the locations of the two versions of the tracking layer. The output is a table in the same geodatabase as the tracking layer, together with a summary log containing the major changes identified between the two versions. A screenshot of the output table is shown below.

Notes:
o The comparison utility works on ArcGIS 10 systems only.
o The comparison utility assumes that the old and new versions of the tracking layer have different names. This is not a limitation of the tool; rather, ESRI’s software does not handle joins well when the input layers have the same name.
o The comparison utility requires that the feature classes do not reside inside a feature dataset.

The output table contains the following fields:

EHID: Primary Key Field
VERS1_POP: Population field populated from the old tracking layer
VERS2_POP: Population field populated from the new tracking layer
POP_DIFFERENCE: Population difference field (VERS1_POP - VERS2_POP)
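The logic of the output table can be sketched as a join of two versions on EHID. This is a minimal illustration, not the comparison utility itself: the function name and the population values are hypothetical, and the handling of EHIDs that exist in only one version (reported here with an empty difference) is an assumption, since the text does not describe it.

```python
def compare_versions(old_pop, new_pop):
    """Join two {EHID: population} mappings and compute POP_DIFFERENCE."""
    rows = []
    for ehid in sorted(set(old_pop) | set(new_pop)):
        vers1 = old_pop.get(ehid)
        vers2 = new_pop.get(ehid)
        if vers1 is None or vers2 is None:
            # Assumption: an EHID missing from one version gets no difference.
            difference = None
        else:
            difference = vers1 - vers2  # VERS1_POP - VERS2_POP
        rows.append({
            "EHID": ehid,
            "VERS1_POP": vers1,
            "VERS2_POP": vers2,
            "POP_DIFFERENCE": difference,
        })
    return rows


# Hypothetical populations for the two EHIDs shown in Section 3.2.5.
old_layer = {
    "C23017_H010600010101_23J0430_23S8120_23T0005": 10000.0,
    "C23017_H010600010101_23J0430_23S8120_23R0004": 120.0,
}
new_layer = {
    "C23017_H010600010101_23J0430_23S8120_23T0005": 9944.0,
}

rows = compare_versions(old_layer, new_layer)
for row in rows:
    print(row["EHID"], row["POP_DIFFERENCE"])
```

Keeping unmatched EHIDs in the result, instead of dropping them as an inner join would, makes records that appeared or disappeared between versions visible alongside population changes.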