MassGIS, 20110615, Christian.jacqz@state.ma.us
The Data Model for Address Points and Master Address List provides a single integrated schema for GIS and tabular data in which address records are linked to geographic points representing their best available location. Address records can include all levels of geographic specification, not only street address but also building, floor, unit and so on as specified in the FGDC address standard. The data model is intended to support the migration from managing multiple address listings in different locations to a more desirable configuration in which there is a single shared, authoritative listing. That process involves merging and scrubbing address databases, comparing different geographic source datasets and doing field data collection. The data model provides the input elements of the single repository plus metadata and other elements used to create the repository; the reality is that in many communities there will be a prolonged transition to the ideal state.
In part to support a transition from many disparate sources of address information, and also to implement best practices for data management in an ESRI environment, the model is relational, with one-to-one and many-to-one relationships managed through primary and foreign keys. The model also incorporates topological relationships; in particular, point-in-polygon processing is critical to establishing identity or foreign key relationships between polygon features and point features.
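As a minimal sketch of how point-in-polygon processing can populate a foreign key, the following uses a plain ray-casting test to find the parcel that contains each point. The function names, the dictionary layouts and the sample coordinates are assumptions made for illustration; in an ESRI environment the same result would normally come from a spatial join or overlay tool rather than hand-written code.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    ring is a list of (x, y) vertices; the closing edge is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_parcel_ids(points, parcels):
    """Return {pt_id: loc_id of the first containing parcel, or None}.

    points  : {pt_id: (x, y)}          (hypothetical layout)
    parcels : {loc_id: [(x, y), ...]}  (hypothetical layout)
    """
    result = {}
    for pt_id, (x, y) in points.items():
        result[pt_id] = next(
            (loc_id for loc_id, ring in parcels.items()
             if point_in_polygon(x, y, ring)),
            None,
        )
    return result


# Made-up sample data: two adjacent square parcels and three points.
parcels = {
    "P1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "P2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"A": (5, 5), "B": (15, 5), "C": (25, 5)}
parcel_fk = assign_parcel_ids(points, parcels)
```

Point C falls outside both parcels, so its foreign key stays empty; in practice such records would be candidates for further research.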
The overall concept of the workflow is simple. Input addresses are parsed or aggregated as necessary into three parts – number, street and location. Location, which USPS calls sub-address, includes content that is more detailed than the numbered address, content like “East Side”, “Hurley Building” or “Unit 6.”
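The three-part split can be sketched as below. This is only an illustration under simple assumptions: the address number is taken to be a leading token of digits with an optional single-letter suffix, and the location is taken to start at the first token found in a short keyword list. The function name, the keyword list and the sample addresses are hypothetical; real input needs many more rules and manual review.

```python
import re

# Hypothetical keywords that mark the start of the location (sub-address) part.
LOCATION_KEYWORDS = {"UNIT", "APT", "FLOOR", "BLDG", "REAR"}


def split_address(full_addr):
    """Split a raw address string into (number, street, location)."""
    tokens = full_addr.split()
    number = ""
    # A leading token such as "100" or "12A" is treated as the address number.
    if tokens and re.fullmatch(r"\d+[A-Za-z]?", tokens[0]):
        number = tokens.pop(0)
    # Everything from the first location keyword onward is the location.
    cut = len(tokens)
    for i, tok in enumerate(tokens):
        if tok.upper() in LOCATION_KEYWORDS:
            cut = i
            break
    street = " ".join(tokens[:cut])
    location = " ".join(tokens[cut:])
    return number, street, location


parts = split_address("100 Cambridge St Unit 6")
# parts == ("100", "Cambridge St", "Unit 6")
```

Nothing is standardized at this stage; the street and location strings are kept as found so that later steps can edit them in place.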
Unique street names are extracted from the street field and standardized. The remaining content for each record is then parsed and standardized, and through a series of automated and manual editing steps each address is linked to a single address point, generally starting with linking address records to parcels. An address point can be one of the following: a parcel polygon centroid, a building centroid, or a building entry point. There is no ‘stacking’ of points; instead, multiple addresses may be represented by a single point if they cannot be resolved to more accurate geographic locations. This avoids misleading geographic representations and awkwardness in editing.
The data model also includes two kinds of polygon features, parcels and building footprints, both of which are being developed as statewide layers, and stores the relationships between the points and the “containing” polygons based on point-in-polygon topology as described above. The model recognizes the dynamic and evolving nature of address data in that each address is linked to a point which represents the most current and best available geographic knowledge of its location. The type of location is stored as an attribute of the point, so that a data user can clearly differentiate which points represent more accurate locations, and not all addresses are necessarily represented by the same kind of point. This allows for making the best quality data continuously available in a production environment as the data are being improved over time. Also, to support public safety applications, the model includes the representation of “access” to an address, defined as the point where a driveway or other vehicular route internal to a parcel intersects a mapped and named street. This access point is not required, and will usually only be shown when the location of an address point (usually a structure) is not readily visible from the named street and/or multiple access possibilities are mutually exclusive.
Master Street List
The Master Street List, as the name implies, is a master list of standardized street names in the community, with the elements of each street name parsed and standardized using the FGDC schema and then concatenated into a single field, the full_str_std field. It also serves as a lookup or alias table, with a record for each variant of the street name listed in its original form along with the standardized version. This lets disparate datasets with no other common identifier be linked by street name without pushing the standardized name back into the source dataset. It also allows the standardized street name to be readily compared and edited as different address records are found to refer to the same geographic feature. The original name of the street from the source dataset is carried in the full_str field. Any location information (e.g., “rear”) that was carried in the original name is retained in the full_str field and also parsed out into a location field. Location information may indicate that further research is needed. For example, “off” sometimes means that the address in question is on a short driveway directly off the street listed, and sometimes that it is on a longer, un-named, shared private road off the street listed, with multiple structures that would need to be researched. Finally, complete metadata about the source of the original street name is part of the record, including when the name was added to the list, any information about edits and a free-form field for notes.
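The alias-table role can be illustrated with a small sketch in which two source datasets are linked through the standardized name while their own street strings stay untouched. The full_str, full_str_std and location field names come from the model; the sample rows, the dataset names and the helper functions are hypothetical.

```python
# A few made-up Master Street List rows: one record per name variant.
master_street_list = [
    {"full_str": "MAIN ST", "full_str_std": "MAIN STREET", "location": ""},
    {"full_str": "MAIN STREET", "full_str_std": "MAIN STREET", "location": ""},
    {"full_str": "MAIN ST REAR", "full_str_std": "MAIN STREET", "location": "REAR"},
    {"full_str": "ELM AVE", "full_str_std": "ELM AVENUE", "location": ""},
]

# Lookup from the original variant to its standardized form.
alias = {row["full_str"]: row["full_str_std"] for row in master_street_list}


def standardize(name):
    """Return the standardized street name, or None if the variant is unknown."""
    return alias.get(name.strip().upper())


def link(left, right):
    """Pair record ids from two datasets whose street names share a standard form."""
    by_std = {}
    for rid, street in right:
        std = standardize(street)
        if std is not None:
            by_std.setdefault(std, []).append(rid)
    return [
        (lid, rid)
        for lid, street in left
        for rid in by_std.get(standardize(street), [])
    ]


# Hypothetical source datasets as (record id, street name as found).
assessor = [("A1", "Main St"), ("A2", "Elm Ave")]
utility = [("U1", "MAIN STREET"), ("U2", "Oak Rd")]
pairs = link(assessor, utility)
# pairs == [("A1", "U1")]
```

"Oak Rd" has no record in the list yet, so it links to nothing; adding a variant row for it is an edit to the street list only, not to either source dataset.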
Master Address List
This table is analogous to the street listing – all addresses from various sources are listed and can be readily compared. Like the street table, the master address list stores metadata for each address record. The most important field from a model perspective is the pt_id field, which stores the link to the point representing the location of the address. Additional links to an access point and to other geographic features are stored in the acc_id, basestr_id, misc_data and misc_id fields. The original full address, including number and location, is retained both in the full_addr field and, after an initial parsing, in three fields containing the number, street name and location components of the original address (i.e., not standardized). Again, this allows for editing in place and for running a series of scripts to achieve the desired standardization. The master address list also contains a number of fields that are designed to store input from just one type of address source. For example, the master address list accepts street segment ranges for the left or right side of a street segment – the fields side and parity store information which is only relevant to these records. Similarly, there are fields that store information from the level III parcel layer. The purpose of putting all this information in one table is to allow for queries based on self-joins and non-equi-joins.
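A non-equi self-join of the kind mentioned above might look like the following sketch, which matches numbered address records to street segment range records held in the same table. SQLite is used here only so the example is self-contained. The side and parity fields come from the model; the table name, the src, addr_num, range_low and range_high columns, the placement of full_str_std in this table, and the 'E'/'O' parity codes are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE master_address (
        rec_id       INTEGER PRIMARY KEY,
        src          TEXT,     -- hypothetical: source type of the record
        full_str_std TEXT,     -- standardized street name
        addr_num     INTEGER,  -- hypothetical: single address number
        range_low    INTEGER,  -- hypothetical: low end of a segment range
        range_high   INTEGER,  -- hypothetical: high end of a segment range
        side         TEXT,     -- side of the street segment
        parity       TEXT      -- assumed codes: 'E' even, 'O' odd
    )""")
conn.executemany(
    "INSERT INTO master_address VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "parcel", "MAIN STREET", 12, None, None, None, None),
        (2, "parcel", "MAIN STREET", 15, None, None, None, None),
        (3, "segment", "MAIN STREET", None, 2, 98, "L", "E"),
        (4, "segment", "MAIN STREET", None, 1, 99, "R", "O"),
    ],
)

# Self-join: the table is joined to itself, and the number-in-range
# condition (BETWEEN) makes it a non-equi-join.
query = """
    SELECT a.rec_id, s.rec_id, s.side
    FROM master_address AS a
    JOIN master_address AS s
      ON a.full_str_std = s.full_str_std
     AND a.addr_num BETWEEN s.range_low AND s.range_high
     AND ((s.parity = 'E' AND a.addr_num % 2 = 0)
       OR (s.parity = 'O' AND a.addr_num % 2 = 1))
    WHERE a.src = 'parcel' AND s.src = 'segment'
    ORDER BY a.rec_id
"""
matches = conn.execute(query).fetchall()
# matches == [(1, 3, 'L'), (2, 4, 'R')]
```

Because both record types live in one table, the comparison needs no intermediate export; a numbered address that matches no range would simply be absent from the result and could be flagged for review.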
The full breakdown of the address number is contained in this table, such that the address prefix (yes, these do exist), number and suffix are parsed into separate fields. For the location info, key/value pairs of the FGDC standard are numbered and should be populated in order from least to most geographic specificity. Thus, key/value pairs (type=BUILDING, value=HURLEY), (type=FLOOR, value=6) and (type=ROOM, value=601) would be stored in BFU fields 1, 2 and 3 respectively.
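The ordering rule for the numbered key/value fields can be sketched as follows. The specificity ranking, the assumption of three slots and the bfu_type_N / bfu_value_N field names are hypothetical; only the BUILDING, FLOOR and ROOM example values are taken from the text above.

```python
# Assumed ranking from least to most geographically specific.
SPECIFICITY = {"BUILDING": 0, "FLOOR": 1, "UNIT": 2, "ROOM": 2}


def to_bfu_fields(pairs, slots=3):
    """Place (type, value) pairs into numbered fields, least specific first."""
    ordered = sorted(pairs, key=lambda pair: SPECIFICITY[pair[0]])
    if len(ordered) > slots:
        raise ValueError("more key/value pairs than numbered fields")
    record = {}
    for i, (kind, value) in enumerate(ordered, start=1):
        record[f"bfu_type_{i}"] = kind    # hypothetical field name
        record[f"bfu_value_{i}"] = value  # hypothetical field name
    return record


# Pairs may arrive in any order from the parsing step.
record = to_bfu_fields([("ROOM", "601"), ("BUILDING", "HURLEY"), ("FLOOR", "6")])
# record["bfu_type_1"] == "BUILDING" and record["bfu_value_3"] == "601"
```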
Address Point
The address point dataset, as explained in the concepts section above, stores a single point for each address that has a point location. The primary key for this point is generated from the x and y coordinate values. There are three types of address points – parcel centroids, building centroids and building entry points. Note that for U-shaped parcels and buildings, the centroid must fall within the polygon. As illustrated in the data model, the geographic relationships between building entry points, building footprints and parcels are managed in this table using primary/foreign keys whose assignment depends on the sub-type of the individual feature. Thus a building entry point record will store a reference to the parent building footprint and to the parent parcel. The building footprint ID field would be null for a parcel centroid.
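A minimal sketch of a coordinate-derived key and the sub-type rules described above is given below. The key format (a prefix plus rounded x and y joined by underscores), the point type codes and the parcel_id and bldg_id attribute names are assumptions for illustration and are not the documented format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical codes for the three address point types.
POINT_TYPES = ("PARCEL_CENTROID", "BUILDING_CENTROID", "BUILDING_ENTRY")


def make_point_id(x, y, prefix="M"):
    """Build a key from coordinates rounded to whole units (assumed format)."""
    return f"{prefix}_{round(x)}_{round(y)}"


@dataclass
class AddressPoint:
    x: float
    y: float
    point_type: str
    parcel_id: Optional[str] = None  # reference to the parent parcel
    bldg_id: Optional[str] = None    # reference to the parent building footprint

    def __post_init__(self):
        if self.point_type not in POINT_TYPES:
            raise ValueError(f"unknown point type: {self.point_type}")
        # A parcel centroid carries no building footprint reference.
        if self.point_type == "PARCEL_CENTROID" and self.bldg_id is not None:
            raise ValueError("parcel centroid must have a null building id")
        # A building entry point references both its building and its parcel.
        if self.point_type == "BUILDING_ENTRY" and (
            self.parcel_id is None or self.bldg_id is None
        ):
            raise ValueError("entry point needs parcel and building ids")

    @property
    def pt_id(self):
        return make_point_id(self.x, self.y)


entry = AddressPoint(235000.4, 900000.6, "BUILDING_ENTRY",
                     parcel_id="P1", bldg_id="B7")
# entry.pt_id == "M_235000_900001"
```

Deriving the key from the coordinates means two records at the same location produce the same key, which is consistent with the rule that points are not stacked.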
In addition, in line with the objective of supporting a workflow of building the single address data repository, there are fields to record the status of research or data collection on individual records. These do not have domains assigned as yet.
Building Footprints
The statewide building footprint layer is one which we will initiate this year and will hopefully complete early in FY13. The source is current orthophoto imagery, possibly licensed imagery from Digital Globe from spring 2011, possibly the 2008-9 imagery that we already have. The primary key for this layer is also a centroid loc_id with the same caveat that for U-shaped buildings the centroid must lie within the polygon (tools exist to guarantee this). Metadata fields are provided to allow for use with local building footprint layers.
Tax Parcels are also included in the schema – the field layout is documented in the parcel data standard v.2 on the MassGIS web site. The coordinates for the loc_id field, as with the building polygons, are usually the same as the parcel centroid except as noted for U-shaped parcels.