Emily Wier
Assignment 5: data quality
1. Children today are spending more of their time in front of a computer or television instead of playing
outside. Does this trend reflect the increasing urbanization of our society and the lack of open spaces
for children to play? In this project, I will research the accessibility that children have to parks in the city
of Somerville. This will be done by investigating the distance between elementary schools and open
spaces. Which areas of Somerville lack access to open spaces? Which children are most likely to be
found at home after school or playing in the park? A reasonable walking distance for a child is estimated
to be 0.25 miles (1,320 feet). Because this distance is short, positional accuracy matters; I consider an
accuracy of 300 feet reasonable for this project.
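The quarter-mile walking threshold and the 300-foot accuracy target can be put on the same footing with a little arithmetic. The sketch below is illustrative only: the function name and the sample coordinates are hypothetical, and it assumes coordinates in a projected system measured in feet, with straight-line distance standing in for the actual walking route.

```python
import math

FEET_PER_MILE = 5280
WALK_LIMIT_FT = 0.25 * FEET_PER_MILE  # quarter-mile walking distance, in feet
ACCURACY_FT = 300                     # accuracy target stated in the text


def within_walking_distance(school_xy, park_xy, limit_ft=WALK_LIMIT_FT):
    """Return True if the straight-line distance between two points
    (projected coordinates, in feet) is no more than limit_ft."""
    dx = park_xy[0] - school_xy[0]
    dy = park_xy[1] - school_xy[1]
    return math.hypot(dx, dy) <= limit_ft


# Hypothetical coordinates, for illustration only (not real locations).
school = (0.0, 0.0)
park = (900.0, 800.0)

print(WALK_LIMIT_FT)                          # 1320.0
print(round(ACCURACY_FT / WALK_LIMIT_FT, 3))  # 0.227
print(within_walking_distance(school, park))  # True
```

A 300-foot error is roughly 23 percent of the walking threshold, so a park that appears just inside the limit could in fact lie outside it.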
2. To best illustrate the discrepancies in accuracy among the various data sets that could be used for
this project, I am highlighting an area of interest near my house. The image shows three different road
data sets along Broadway St in Somerville between Powderhouse and Teele Squares: Somerville roads are in
red, 2000 census TIGER roads are in yellow, and StreetMap USA roads are in green. Data from
Somerville have the highest positional accuracy when compared to world imagery taken from ArcGIS
Online. Data from StreetMap USA are not complete for some minor roads (e.g., Corinthian Road).
Because this project will be investigating walkability, it is necessary to have data for all roads, major and
minor, so I would not use the StreetMap USA data set in my analysis. The 2000 census TIGER roads do not
follow the orthophoto and cut through houses. The encircled region in this image represents the greatest
discrepancy between the census and Somerville data; the distance between the two lines is 45 feet.
Given the accuracy requirements of my project, all three data sets would be suitable on positional
accuracy alone. However, the Somerville data set should be used because it has the highest positional
accuracy.
3. To illustrate the discrepancies between different hydrology data sets, I am comparing data from 2000
census TIGER and MassGIS for Spy Pond in Arlington, MA. Although this region is not in my specified
survey area, the data is still applicable. There are many parks and schools in Arlington and the same
analysis could be performed on this city.
The first photo is the orthophoto from ArcGIS Online. The following two photos show 2000 census TIGER
hydrology data and MassGIS hydrology data. Note that the island in Spy Pond is completely absent from the
TIGER data set. The MassGIS data set also has higher positional accuracy than TIGER. The discrepancies
between the two sets of lines range from -173 to +203 feet. Data from MassGIS most closely fit the
orthophoto and therefore would be the better data set to use in this project. In addition, the attribute
table for MassGIS is more complete than that of the 2000 census TIGER data; MassGIS has information on
type of habitat (e.g., wetland, surface water).
4. Somerville road data come with no metadata, so their source, scale, and currency cannot be checked.
This calls into question the validity of using them in a project.
StreetMap USA data contains source data from 2000 census TIGER. It was mapped using a scale of
1:50,000 but there is no mention of the source maps used in the process. This scale gives the map a
positional accuracy of +/- 75 feet.
The data from 2000 census TIGER were scanned from source maps, which were usually USGS 1:100,000
scale digital line graphs, USGS 1:24,000 scale quadrangles, and the US Census Bureau’s 1980 geographic
base files. However, the handbook Technical Documentation of 2000 TIGER line files also states that
additional source maps were used. It would be helpful to know what these additional sources were.
The goal of the TIGER files is not accuracy; “[the mission of the US Census Bureau] does not require very
high levels of positional accuracy in its geographic products. Its files and maps are designed to show only
the relative positions of elements.” As such, TIGER data should be used with this in mind. Positional
accuracy is stated as +/- 167 feet.
MassGIS hydrology data were digitized from USGS 1:25,000 topographic quadrangle maps. This
translates to a map accuracy of approximately +/- 40 feet. Data are also from USGS digital line graphs,
scanned mylar separates from USGS, and the MassDEP wetlands datalayer. Data from
MassDEP are from 1:12,000 scale, stereo color-infrared photography from staff at UMass Amherst from
1990-2000. In addition, MassGIS gives information on how the data were edited as well as recent
updates to the data set.
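The text does not say how the scale-based accuracy figures above were derived. One common rule is the US National Map Accuracy Standards (NMAS) horizontal tolerance: 1/30 inch at map scale for scales larger than 1:20,000, and 1/50 inch for scales of 1:20,000 or smaller. The sketch below assumes that rule applies here; the function name is hypothetical.

```python
def nmas_horizontal_tolerance_ft(scale_denominator):
    """Ground distance, in feet, of the NMAS horizontal tolerance for a
    map at scale 1:scale_denominator. Assumes NMAS is the rule in use."""
    # 1/30 inch on the map for scales larger than 1:20,000,
    # 1/50 inch for scales of 1:20,000 or smaller.
    divisor = 30 if scale_denominator < 20000 else 50
    ground_inches = scale_denominator / divisor
    return ground_inches / 12  # inches to feet


for denominator in (12000, 24000, 25000, 50000, 100000):
    tolerance = nmas_horizontal_tolerance_ft(denominator)
    print(f"1:{denominator:,} -> +/- {tolerance:.1f} ft")
```

Under this assumption, 1:100,000 gives about 167 feet and 1:25,000 about 42 feet, in line with the TIGER and MassGIS figures quoted above. The same rule gives about 83 feet for 1:50,000, so the 75-foot figure quoted for StreetMap USA may come from a different source, such as the data set's own metadata.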
5. Open space near Powderhouse Circle is not consistent between data sources. There are minor
discrepancies in the shape of the park borders (the parks labeled with stars). However, the
largest park in this photo, Tufts field, is completely absent from the Somerville open space data set. I
speculate that Somerville did not include it because most of the field is used for athletics, although the
lower part of the field is open to the public. In addition, other parks and open spaces are digitized in
MassGIS but not in the Somerville data. If there were metadata for Somerville parks, then perhaps this
discrepancy could be resolved.
Data on school locations obtained from MassGIS generally align closely with the orthophoto. For example,
Winter Hill Community School and the Lincoln School at Thurston St are both coded to the center of their
respective buildings with respect to both the orthophoto and parcel data obtained from MassGIS.
When I checked the names of these schools online, I could not locate a Lincoln School in
Somerville, MA. Conversely, the data contain no record for the Dr. Albert F. Argenziano School at Lincoln
Park (located on Washington St); I am therefore assuming that the record for Lincoln School should
actually be the Dr. Argenziano School.
6. In terms of positional accuracy, these layers would be suitable for my project. My project does not
require high accuracy, and all layers appear to be within my benchmark accuracy of 300 feet. The
biggest issue will be the data on schools. I have found that some schools are incorrectly
labeled, do not have the correct address, or are completely absent from the data.
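The positional-accuracy claim can be checked by collecting the figures reported in this write-up and comparing each to the 300-foot benchmark. This is a minimal sketch: the dictionary labels and the function name are hypothetical, and the values are simply the measured discrepancies and stated accuracies quoted in the sections above.

```python
BENCHMARK_FT = 300  # project accuracy target stated in the text

# Largest absolute error reported for each layer in this write-up, in feet.
reported_error_ft = {
    "TIGER roads vs. Somerville roads (measured)": 45,
    "TIGER vs. MassGIS hydrology (measured)": max(abs(-173), abs(203)),
    "StreetMap USA (stated)": 75,
    "TIGER (stated)": 167,
    "MassGIS hydrology (stated)": 40,
}


def layers_exceeding(errors, benchmark_ft=BENCHMARK_FT):
    """Return the names of layers whose reported error exceeds the benchmark."""
    return [name for name, err in errors.items() if err > benchmark_ft]


print(layers_exceeding(reported_error_ft))  # []
```

None of the reported figures exceeds the benchmark. The Somerville layers have no stated accuracy, so they cannot be included in a check like this.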
7. Most of the data sets used appear to be complete, with two exceptions. As shown above, the MassGIS
school data are not complete. The 2000 census TIGER data are also not complete: part of the Mystic River
is completely absent from this data layer. The Charles River and the headwaters of the Mystic are on the
map, but the actual water body is absent.
8. For data on hydrology and street locations, currency is not of critical importance. All TIGER data are
from 2000, the year of the most recent census. Data from Somerville have no metadata, so there
is no way of knowing how current the data are. MassGIS open space data are from 2008. MassGIS school
data are from 2009. MassGIS building footprints are from 2002. MassGIS hydrography data are from 2009.
9. There is a large difference in the level of detail of the attribute tables. The TIGER attribute table
has few fields, whereas the MassGIS hydrography data contain more attributes. Somerville parks data
list only the name of each park, but MassGIS open space data give much more information on management,
public access, etc. Data on schools contain school name (which may or may not be accurate, as we
have already seen), principal name, grades offered, etc. It is important to consider the attributes that
are included in the attribute table when determining whether or not to use a particular data set.