NationalTrackingLayer_TDD_v3

National Tracking Layer
Technical Description Document
12-22-2014
Document Version: Version 3.0
Document History
ID    Changes      Date Delivered    Author
1.0   Draft 1.0    06-13-2014        RAMPP
2.0   Draft 2.0    10-31-2014        RAMPP
3.0   Draft 3.0    12-22-2014        RAMPP

1 Introduction
1.1 Background
2 Business Requirements
2.1 Description of Business Requirements
3 Methodology and Technical Specification
3.1 Base Layers as Input
3.2 Methodology of Data Processing
3.2.1 Generate “Temp_1” by Performing Spatial Union on Special Land Use Authority Areas and Tribal Land
3.2.2 Generate “Temp_2” by Performing Spatial Union on Temp_1 and Jurisdiction
3.2.3 Generate “Temp_3” by Performing Spatial Union on Temp_2 and HUC12
3.2.4 Generate “Temp_4” by Performing Spatial Union on Temp_3 and Census Block 2010. Perform Area-weighted Population calculation.
3.2.5 Generate EHID for each record in Temp_4
3.2.6 Generate the Final Output, the National Tracking Layer. Spatially Dissolve Temp_4 by EHIDs.
3.2.7 Add and calculate additional fields in the National Tracking Layer
3.2.8 Manual post-processing of the National Tracking Layer
4 Data Structure of the National Tracking Layer
4.1 General Information
4.2 Key Attributes
5 Known Issues and Other Notes
5.1 Artefacts from Union Operation
5.2 Topology
5.3 Other Territories of US Not Included
5.4 XY Tolerance
5.5 Automating the Tracking Layer Generation
5.6 Using the Comparison Utility

1 Introduction
The purpose of this document, National Tracking Layer – Technical Description
Document, is to describe the high-level business requirements and technical approaches
adopted to create FEMA’s National Tracking Layer. This document includes the
background information, business requirements, data sources, technical methodology,
and a description of the resulting layer.
1.1 Background
A number of Federal Insurance and Mitigation Administration (FIMA) systems require accurate
community boundaries and population estimates for program metric and tracking
purposes. These systems require reporting at various levels of resolution: community,
county, watershed, State, FEMA region, and National. During Risk MAP, many of these
systems used a community boundary dataset that was developed circa 2003. FEMA
recognized the need to update this information with respect to Census 2010 jurisdictional
boundary information while also incorporating Tribal lands and other special
jurisdictional boundaries to meet current and evolving business needs. This update was
completed in November 2013 and delivered as the community layer.
In May 2014, FEMA tasked RAMPP to produce both a National Tracking Layer and an
automated process for updating the National Tracking Layer as the constituent
components of the layer change. This document describes the output of this effort: the
process and the resulting layer, developed with collaborative PTS support. (PTS contractors formed the
“Workgroup of National Tracking Layer” in September 2014 to discuss technical topics.)
In November 2014, STARR updated the community layer to remove any overlaps in the
special land use authority areas. The updated community layer was used as an input to
Version 3 of the National Tracking Layer, which helps this version of the tracking layer
maintain its spatial relationship with the community layer.
For this version of the tracking layer, efforts were made to include jurisdictions in the US
territories such as American Samoa, Guam, the Virgin Islands, and the Northern Marianas.
However, the US Census Bureau does not distribute population for these territories at the
census block level. Hence, they were not included in this version. This version also
addresses gaps and missing watershed IDs by assigning input ranks during the spatial union
process.
2 Business Requirements
2.1 Description of Business Requirements
The following high-level business requirements were captured:

ID: NTL-001
Business Requirement Statement:
Define a standard methodology for creating a National Tracking Layer, with
the specification listed in NTL-002.
Description and Notes:
- The complexity of continuously maintaining, updating, and versioning the dataset
requires a well-defined methodology and process to ensure its efficiency as well as
its effectiveness. The resulting methodology and process would produce accurate and
consistent results, be easy to carry out through automation, and be easy for
stakeholders to understand.

ID: NTL-002
Business Requirement Statement:
Create a National Tracking Layer, a flat layer in which all polygons are spatially
unioned and derived from the following 5 base layers:
- HUC12
- Census Blocks 2010
- Special Land Use Authority Areas
- Jurisdiction
- Tribal Lands
The extents of the National Tracking Layer shall include the 50 States, District of
Columbia, Puerto Rico, and Virgin Islands.
A unique EHID is assigned to each record that is fixed length and contains information
for the county FIPS code, HUC12 ID, Jurisdiction, Special Land Use Authority
Area ID, and Tribal Entity ID.
The National Tracking Layer shall be topologically sound, without overlapping
polygons.
Areal interpolation of population counts shall only be performed at an entity equal to
or smaller than a Census block.
The National Tracking Layer shall be in North America Albers Equal Area Conic
projection.
Description and Notes:
- The National Tracking Layer will be in FGDB format for the purpose of
easy data exchange and sharing.
- The Spatial Union operation on the dataset follows ESRI’s definition.
- The unique Enhanced Hybrid Identifier (EHID) can be used to quickly “summarize”
data based on county FIPS, Community (Jurisdiction) Area ID, Tribal Area ID,
HUC12 ID, and Special Authority Area ID. It can be extended beyond these
five, should needs arise in the future.
- Topological issues, such as overlaps and gaps, in the “Tracking Layer” could cause
problems such as double counting of population. The proposed automated
process would minimize topological issues in the output layer to a great extent by
conducting clean spatial operations.
- When a boundary, such as a watershed or community boundary, cuts through Census
Block polygons, these blocks are split on each side of the cutting boundary. The
total population count for those blocks is split as well, based on the ratio of the
areas on each side.
- The resulting National Tracking Layer shall be in Albers Projection, which
allows accurate areal calculation and can handle data for the entire extent
of the dataset.

ID: NTL-003
Business Requirement Statement:
Produce a script to automate the data processing in the future.
Description and Notes:
- Once the methodology and the automated process are standardized and put in place,
incorporating additional base layers and incorporating changes would become much
easier compared with past experience.
- Through the execution of this model script, which automates both spatial and
non-spatial manipulation of data, the National Tracking Layer can be regenerated
with less effort to accommodate future changes, such as changes in population count,
community boundaries, and watershed boundaries. Whenever a new boundary layer is to
be used, or a change happens to any existing boundary, the new process would update
the information stored in the database by performing extensive spatial and relational
data manipulations on a nationwide basis.
- The automation would not only increase efficiency greatly, but also prevent human
errors, resulting in better consistency, accuracy, topological correctness, and overall
quality.

ID: NTL-004
Business Requirement Statement:
Source for population shall be solely Census 2010.
Description and Notes:
- The highest resolution of population data should be the Census Block data from
Census 2010. It is authoritative and offers sufficient granularity for FIMA program
needs; at the same time, while still a large dataset with more than 10 million records,
it is manageable through a well-defined process.

3 Methodology and Technical Specification
3.1 Base Layers as Input
The following base layers are used as the input for the creation of the National
Tracking Layer.

Base Layer Name: Census blocks 2010
Description: A dataset, provided by US Census, containing the smallest geographic unit.
Purpose: Serving as the sole source of population data.
Source: http://www.Census.gov

Base Layer Name: HUC 12
Description: Hydrologic Unit layer with 12-digit HUC code.
Purpose: Serving as the lowest level of hydrologic unit.
Source: http://www.usgs.gov (this layer has been updated through March 2014)

Base Layer Name: Special Land Use Authority Areas
Description: A GIS layer for local political land use authority areas to enforce NFIP regulations.
Purpose: Serving as the authoritative layer for Special Land Use Authority Areas.
Source: FEMA 2014 Community Layer

Base Layer Name: Tribal Boundaries 2010
Description: Areas with tribal land use authority.
Purpose: Serving as the authoritative layer for tribal land boundaries.
Source: FEMA 2014 Community Layer

Base Layer Name: Jurisdiction Boundaries 2010
Description: A GIS data layer that contains all jurisdictions in the US, Puerto Rico, and other territories.
Purpose: Serving as the authoritative layer for Jurisdiction boundaries.
Source: FEMA 2014 Community Layer

3.2 Methodology of Data Processing
This document provides a high-level description of the methodology that RAMPP
designed and implemented. The PTS Workgroup on this subject provided valuable input
during the process. For details of the input data layers, please refer to the documentation
of each input, such as the Community Layer.
3.2.1 Generate “Temp_1” by Performing Spatial Union on Special Land Use Authority
Areas and Tribal Land
This operation is performed on ESRI’s ArcGIS platform. The figure below illustrates
the operation:
[Figure: ArcGIS Spatial Union Function]
The resulting layer, Temp_1, contains information for both Special Land Use
Authority Areas and Tribal Land, including the unique Tribal Area ID
(T_AREA_ID) and Special Land Use Authority Area ID (S_AREA_ID) for each
polygon. The layer is used as input for further spatial manipulation of the
data, and the two Area IDs mentioned above are concatenated with other
IDs in the following steps. Because spatial accuracy varies between the input
layers, each input was assigned a rank to indicate its priority during the union process.
During the union operation, the feature class with the higher rank takes
precedence in resolving spatial issues at the edges. Ranking was included to reduce
the gaps generated by the spatial union. For this step, the tribal areas were given the
higher rank of 1 and the special land use authority areas were given a rank
of 2.
3.2.2 Generate “Temp_2” by Performing Spatial Union on Temp_1 and Jurisdiction
In a similar fashion to Step 3.2.1, the Jurisdiction layer is “unioned” with the
output of the previous step. The resulting layer, Temp_2, now contains
information and Area IDs for Tribal Land (T_AREA_ID), Jurisdiction
(J_AREA_ID), and Special Land Use Authority Areas (S_AREA_ID). The
Area IDs will be concatenated at the end of the process to generate the unique
EHID. For this step, the jurisdiction areas were given the higher rank of 1 and
the union output from the previous step was given a rank of 2.
3.2.3 Generate “Temp_3” by Performing Spatial Union on Temp_2 and HUC12
In a similar fashion to Step 3.2.2, the HUC12 layer is “unioned” with the
output of the previous step. The output, Temp_3, now contains information and
Area IDs for Tribal Land, Jurisdiction, Special Land Use Authority Areas, and
HUC12. The Area IDs will be concatenated at the end of the process to generate
the unique EHID. For this step, the HUC12 feature class was given the lower
rank of 2 and the union output from the previous step was given the higher rank
of 1.
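The three ranked unions described in Steps 3.2.1 to 3.2.3 can be sketched as follows. This is a minimal illustration, not the RAMPP script: the feature class names are hypothetical placeholders, and the sketch assumes the ArcGIS arcpy.Union_analysis tool, whose input list accepts [feature class, rank] pairs in which rank 1 has the highest priority. The arcpy call is kept inside a function so that the step table can be inspected without an ArcGIS installation.

```python
# Ranked union chain for Steps 3.2.1 to 3.2.3 (illustrative sketch).
# Feature class names below are hypothetical placeholders.
UNION_STEPS = [
    # (output name, [(input feature class, rank), ...]); rank 1 = highest priority
    ("Temp_1", [("TribalAreas", 1), ("SpecialLandUseAuthorityAreas", 2)]),
    ("Temp_2", [("Jurisdictions", 1), ("Temp_1", 2)]),
    ("Temp_3", [("Temp_2", 1), ("HUC12", 2)]),
]


def ranked_inputs(step_inputs):
    """Convert (name, rank) tuples to the [[name, rank], ...] form used by Union."""
    return [[name, rank] for name, rank in step_inputs]


def run_unions(workspace):
    """Run the three unions in order. Requires ArcGIS (arcpy) to be installed."""
    import arcpy  # imported here so the rest of the sketch works without ArcGIS

    arcpy.env.workspace = workspace
    for output_name, step_inputs in UNION_STEPS:
        arcpy.Union_analysis(ranked_inputs(step_inputs), output_name)
```

Each step after the first consumes the output of the previous step, so the order of the entries in UNION_STEPS matters.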
3.2.4 Generate “Temp_4” by Performing Spatial Union on Temp_3 and Census Block
2010. Perform Area-weighted Population calculation.
In a similar fashion to Step 3.2.3, the Census Blocks from Census 2010 are
“unioned” with the output of the previous step. The output, Temp_4, now
contains information and Area IDs for Jurisdictions, Tribal Lands, Special
Land Use Authority Areas, and HUC12, as well as the Census Block GEOID. The census
block GEOID is a 15-character identifier used by the Census Bureau to
represent a census block. These unique IDs will be concatenated at the end of
the process to generate the unique EHID.
Based on the population count in the Census Block layer, population is calculated
for each polygon resulting from the spatial union operation. For Census
blocks that the union splits into multiple polygons, the original population
count is divided among the polygons using an area-weighted algorithm. For
example, assume a Census block with a population of 10,000 is split into two
polygons after the union, one with 70% of the area of the original block and the
other with 30%. The bigger polygon would then carry 7,000 as its population and
the other would carry 3,000. The area ratio method is commonly used in the GIS
community, especially when other viable approaches are limited. Other products or
approaches that provide a better reflection of population distribution should be
considered in the future.
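The area-weighted split can be illustrated with a short sketch. It assumes that each piece receives the block population multiplied by the ratio of the piece area to the original block area; the function and argument names are hypothetical and are not taken from the RAMPP script.

```python
def apportion_population(census_pop, block_area, piece_areas):
    """Split a block's population among its pieces in proportion to area.

    census_pop  -- population of the original census block
    block_area  -- area of the original census block (same units as piece_areas)
    piece_areas -- areas of the polygons the block was split into
    """
    if block_area <= 0:
        raise ValueError("block_area must be positive")
    return [census_pop * area / block_area for area in piece_areas]


# The example from the text: a block of 10,000 people split 70% / 30%.
shares = apportion_population(10000, 100.0, [70.0, 30.0])
print(shares)
```

Because the denominator in this sketch is the original block area, any part of a block that falls into a gap receives no population, so the pieces can sum to less than the Census count.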
3.2.5 Generate EHID for each record in Temp_4.
A new field, “EHID,” is then added to the Temp_4 layer from the previous step.
All the previously captured IDs for each polygon, which originate from different input
layers, are concatenated into a unique ID, the EHID.
The EHID is fixed in length (44 characters in total) and has the following format:
C23017_H010600010101_23J0430_23S8120_23T0005
- “C” indicates the beginning of a 5-digit County FIPS Code. The county FIPS
code is extracted from the GEOID of the census blocks.
- “H” indicates the beginning of a 12-digit HUC code.
- “J” indicates the Jurisdiction Area ID component.
- “S” indicates the Special Authority Area ID component.
- “T” indicates the Tribal Entity Area ID component.
Alternatively, the tribal entity component in an EHID can represent a tribal
reservation by substituting “R” for “T”, as shown below. The “R” in the
EHID indicates that the polygon is part of tribal reservation land.
C23017_H010600010101_23J0430_23S8120_23R0004
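The concatenation can be sketched as below. The function name is hypothetical, and the block GEOID in the usage line is a made-up value chosen only so that its first five characters match the county FIPS code in the example above. The sketch assumes the county FIPS code is the first five characters of the 15-character block GEOID (2-digit state code plus 3-digit county code) and that each Area ID is already a 7-character string.

```python
def build_ehid(block_geoid, huc12, j_area_id, s_area_id, t_area_id):
    """Concatenate the component IDs into a 44-character EHID."""
    county_fips = block_geoid[:5]  # state (2 digits) + county (3 digits)
    ehid = "C{}_H{}_{}_{}_{}".format(
        county_fips, huc12, j_area_id, s_area_id, t_area_id
    )
    if len(ehid) != 44:
        raise ValueError("EHID must be 44 characters, got %d" % len(ehid))
    return ehid


# Hypothetical block GEOID; the other components come from the example above.
example = build_ehid(
    "230170001001000", "010600010101", "23J0430", "23S8120", "23T0005"
)
print(example)
```

The length check is a simple guard: a missing or malformed component changes the total length and is rejected rather than silently producing a non-standard EHID.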
3.2.6 Generate the Final Output, the National Tracking Layer. Spatially Dissolve
Temp_4 by EHIDs.
At this point, the Temp_4 database contains more than 13 million records. To
reduce the number of records for maximum performance and efficiency, a spatial
dissolve operation is performed on Temp_4. All polygons with the same EHID
are aggregated into one, as illustrated below:
During the aggregation (spatial dissolve) process, the populations of the merged
polygons are summed to produce the final population of the aggregated area. The
resulting layer becomes the final output: the National Tracking Layer. Its data
structure is described in Section 4.
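The attribute side of the dissolve, summing population over all polygons that share an EHID, can be sketched in plain Python. The geometry merge itself is performed by the GIS and is not shown; the function name and the shortened EHID values in the usage lines are hypothetical.

```python
from collections import defaultdict


def dissolve_population(records):
    """Sum population per EHID.

    records -- iterable of (ehid, adj_pop) pairs, one per Temp_4 polygon
    Returns a dict mapping each EHID to its total population.
    """
    totals = defaultdict(float)
    for ehid, adj_pop in records:
        totals[ehid] += adj_pop
    return dict(totals)


# Three Temp_4 polygons, two of which share an EHID (shortened for readability).
temp_4 = [("EHID_A", 120.0), ("EHID_B", 30.5), ("EHID_A", 80.0)]
dissolved = dissolve_population(temp_4)
print(dissolved)
```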
3.2.7 Add and calculate additional fields in the National Tracking Layer
Additional fields for the individual components and the Community Identifiers (CIDs)
are added to the national tracking layer. The individual components (Area IDs) are
calculated as extracts from the EHID and the CIDs are calculated, wherever available,
by joining the layer to the individual STARR layers. The main purpose of including
these CIDs is to maintain the linkages between the National Tracking Layer and the
Community Layer. This may facilitate using the National Tracking Layer for other
FEMA programs like the Mitigation Planning Portal (MPP).
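Extracting the individual components from an EHID can be sketched with fixed-position slicing, which relies on the 44-character format described in Step 3.2.5. The function name and the dictionary keys other than the J_AREA_ID, S_AREA_ID, and T_AREA_ID field names are hypothetical, and the CID join to the STARR layers is not shown.

```python
def parse_ehid(ehid):
    """Split a 44-character EHID into its components by position."""
    separators_ok = all(ehid[i] == "_" for i in (6, 20, 28, 36)) if len(ehid) == 44 else False
    if not (separators_ok and ehid[0] == "C" and ehid[7] == "H"):
        raise ValueError("not a valid EHID: %r" % ehid)
    t_area_id = ehid[37:44]
    return {
        "COUNTY_FIPS": ehid[1:6],
        "HUC12": ehid[8:20],
        "J_AREA_ID": ehid[21:28],
        "S_AREA_ID": ehid[29:36],
        "T_AREA_ID": t_area_id,
        # "R" in place of "T" marks tribal reservation land.
        "IS_RESERVATION": t_area_id[2] == "R",
    }


parts = parse_ehid("C23017_H010600010101_23J0430_23S8120_23R0004")
print(parts)
```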
3.2.8 Manual post-processing of the National Tracking Layer
The union operation creates some inland gaps, which result in polygons whose EHID
lacks one or more of the other spatial components. EHIDs of union polygons along the
coast are expected to lack some components. For inland areas, such polygons need to be
identified, their attributes updated, and the layer re-dissolved to update the final
tracking layer.
The tracking layer uses STARR’s December 2014 Community Layer database “as is”
without any modification. The attribution on the tracking layer is contingent upon the
attribution in STARR’s community layer database. For further information on the fields,
refer to the Community Layer guidance previously developed by STARR.
4 Data Structure of the National Tracking Layer
4.1 General Information
Name: National Tracking Layer
Type: Vector, in FGDB format
Date of Creation: 12/22/2014
Version: Version 3.0
Creator: RAMPP, tasked by FEMA
Geographic Coordinate: GCS_North_American_1983
Projection: North_America_Albers_Equal_Area_Conic
Spatial Extent: US and Puerto Rico
Number of Records: 226,251
4.2 Key Attributes
The tracking layer inputs and outputs were set at an XY tolerance of 0.5m and an XY
resolution of 0.05m.
Attribute     Description                                 Data Type
Shape         Polygon Geometry                            Geometry
Shape_Area    Area of a polygon                           Double
EHID          The Enhanced Hybrid ID                      String
ADJ_POP       Population Count                            Double
J_AREA_ID     Identifier from Jurisdictional Boundaries   String
S_AREA_ID     Identifier from Special Land Use            String
T_AREA_ID     Identifier from Tribal_Boundaries           String
J_CID         CID from Jurisdictional Boundaries          String
S_CID         CID from Special Land Use                   String
T_CID         CID from Tribal_Boundaries                  String
5 Known Issues and Other Notes
5.1 Artefacts from Union Operation
The spatial union operation sometimes causes a small portion of the population of one
census block to “shift” into other census blocks. As shown in the screenshot below, the
Census block with GEOID ‘040250006071062’ (red boundary) is partitioned into the
‘040250006071062’, ‘04025000671061’ and ‘040250006102024’ census blocks after
the union. As a result, the area-weighted population calculation underestimates the
population for census block ‘040250006071062’ and overestimates it for census blocks
‘04025000671061’ and ‘040250006102024’.
Another type of artefact is the “gaps” generated in the final union with census
blocks. As shown in the screen capture below, Census Block ‘450790028001065’ has
gaps on both sides, resulting in a population difference of 56 people when compared to
the Census count.
5.2 Topology
The spatial union operation can produce overlapping polygons that cause population to be
counted multiple times in the tracking layer. These overlapping polygons should be
deleted before the final Dissolve operation is performed. Post-processing procedures such as
topology validation and population checks should be implemented to minimize these
artefacts. There are multipart geometry features in the input layers as well as in the final
tracking layer. These multipart features may return erroneous population metrics when
queried. Sliver polygons are also generated; they do not cover a significant area but
are undesirable by-products of the complexity of the spatial operations and the
multipart geometry features.
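A population check of the kind mentioned above could look like the following sketch. It is an assumption about how such a check might be written, not a description of an existing tool: it sums the apportioned population of all pieces that came from one census block and reports blocks whose total differs from the Census count. The function name, the tolerance, and all population values in the usage lines are hypothetical; only the GEOID ‘450790028001065’ and the difference of 56 come from Section 5.1.

```python
from collections import defaultdict


def population_discrepancies(census_pop_by_geoid, pieces, tolerance=0.5):
    """Report blocks whose apportioned pieces do not add up to the Census count.

    census_pop_by_geoid -- dict of block GEOID -> Census population
    pieces              -- iterable of (block GEOID, apportioned population)
    Returns a dict of GEOID -> (summed pieces - Census population) for
    every block whose absolute difference exceeds the tolerance.
    """
    summed = defaultdict(float)
    for geoid, adj_pop in pieces:
        summed[geoid] += adj_pop
    discrepancies = {}
    for geoid, census_pop in census_pop_by_geoid.items():
        difference = summed.get(geoid, 0.0) - census_pop
        if abs(difference) > tolerance:
            discrepancies[geoid] = difference
    return discrepancies


# Hypothetical counts: one block loses 56 people to gaps, the other is intact.
census = {"450790028001065": 100, "450790028001066": 10}
pieces = [
    ("450790028001065", 30.0),
    ("450790028001065", 14.0),
    ("450790028001066", 10.0),
]
report = population_discrepancies(census, pieces)
print(report)
```

A negative difference points to population lost in gaps, while a positive difference points to overlapping polygons that count population more than once.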
5.3 Other Territories of US Not Included
For this delivery, the National Tracking Layer does not include Guam, American
Samoa, Northern Marianas and Virgin Islands. The Census Bureau does not
distribute population at a census block level for these territories.
5.4 XY Tolerance
Version 3.0 of the National Tracking Layer is produced with an XY tolerance of
0.5m. A feature dataset is created with a 0.5m XY tolerance (and 0.05m XY
resolution). All the inputs are imported into the feature dataset before the
tracking layer creation script is run.
5.5 Automating the Tracking Layer Generation
o Set up the census blocks as an input layer
It is assumed that the census blocks data have been downloaded from the Census
Bureau website and compiled following the instructions provided on their site.
- Join the population attribute from the non-spatial tables (StateName_GEO)
to the spatial census blocks files via the GEOID10 field.
- Merge all the census blocks into a single feature class.
- Re-project the layer into USA Albers Equal Area Conic projection.
- Add two fields: CensusPop (long integer type) and CBArea (double type).
- Calculate those fields. The CensusPop field holds the population and the
CBArea field holds the area of each census block in square meters.
o Set up the geodatabase, XY tolerance, and other input layers
- Create an empty file geodatabase.
- Create a feature dataset called “Inputs” with a 0.5m XY tolerance and 0.05m
XY resolution. A different name may be used.
- Import all the inputs into the feature dataset.
o Set up and run the Python script
- Place the Python script in a suitable folder location.
- In ArcCatalog, add a new toolbox in the same folder as the Python script.
- Right-click on the toolbox and add a new script. Select the Python script as the
source of the script.
- Add the 5 parameters in the order given below:
  - Display name: Tribal Areas; Data type: Feature Class
  - Display name: Special Land Use Authority Areas; Data type: Feature Class
  - Display name: Jurisdictions; Data type: Feature Class
  - Display name: HUC12 Watersheds; Data type: Feature Class
  - Display name: Census blocks; Data type: Feature Class
It is imperative that the input parameters are defined in this order for the Python
script to run correctly.
- Run the Python script.
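The reason the parameter order matters is that a script tool receives its inputs by position. The sketch below shows one way a script could map the five positional values to named inputs; the key names are hypothetical and this is not the delivered script. In an ArcGIS script tool the values would typically be read with arcpy.GetParameterAsText(0) through arcpy.GetParameterAsText(4).

```python
# Positional order of the script tool parameters, as listed above.
PARAMETER_ORDER = (
    "tribal_areas",
    "special_land_use_authority_areas",
    "jurisdictions",
    "huc12_watersheds",
    "census_blocks",
)


def map_parameters(values):
    """Pair the positional parameter values with their names."""
    if len(values) != len(PARAMETER_ORDER):
        raise ValueError(
            "expected %d parameters, got %d" % (len(PARAMETER_ORDER), len(values))
        )
    return dict(zip(PARAMETER_ORDER, values))


# Hypothetical feature class paths, given in the required order.
inputs = map_parameters(
    ["Inputs/Tribal", "Inputs/Special", "Inputs/Juris", "Inputs/HUC12", "Inputs/Blocks"]
)
print(inputs["census_blocks"])
```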
5.6 Using the Comparison Utility
A comparison utility has been developed to facilitate easy comparison between
two versions of the tracking layer. The user interface is as shown below. It
requires the user to input the locations of the two versions of the tracking layer.
The output is a table in the same feature geodatabase as the tracking layer and also a
summary log containing the major changes identified between the two. A screenshot
of the output table is shown below.
Notes:
o The comparison utility works on ArcGIS 10 systems only.
o The comparison utility assumes that the old and new versions of the tracking
layer have different names. This is not a limitation of the tool itself; ESRI
software does not handle joins well if the input layers have the same name.
o The comparison utility requires that the feature classes do not reside inside a
feature dataset.
EHID: Primary Key Field
VERS1_POP: Population Field populated from the old tracking layer
VERS2_POP: Population Field populated from the new tracking layer
POP_DIFFERENCE: Population difference field (VERS1_POP-VERS2_POP)
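The structure of the output table can be illustrated with a small sketch that joins two versions on EHID and computes POP_DIFFERENCE as VERS1_POP minus VERS2_POP. This is not the comparison utility itself. The function name is hypothetical, and treating an EHID that is missing from one version as a population of 0 is an assumption made for this sketch; the document does not state how the utility handles that case.

```python
def compare_versions(old_pop, new_pop):
    """Join two tracking layer versions on EHID.

    old_pop, new_pop -- dicts of EHID -> population for each version
    Returns rows of (EHID, VERS1_POP, VERS2_POP, POP_DIFFERENCE), sorted by EHID.
    """
    rows = []
    for ehid in sorted(set(old_pop) | set(new_pop)):
        vers1 = old_pop.get(ehid, 0)  # assumption: missing EHID counts as 0
        vers2 = new_pop.get(ehid, 0)
        rows.append((ehid, vers1, vers2, vers1 - vers2))
    return rows


# Shortened, hypothetical EHIDs and populations.
old_version = {"EHID_A": 200, "EHID_B": 30}
new_version = {"EHID_A": 180, "EHID_C": 45}
for row in compare_versions(old_version, new_version):
    print(row)
```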