Emily Wier
Assignment 5: data quality

1. Children today are spending more of their time in front of a computer or television instead of playing outside. Does this trend reflect the increasing urbanization of our society and the lack of open spaces for children to play? In this project, I will research the access that children have to parks in the city of Somerville by investigating the distance between elementary schools and open spaces. Which areas of Somerville lack access to open spaces? Which children are most likely to be found at home after school or playing in the park? A reasonable walking distance for a child is estimated to be 0.25 miles (1,320 feet). As such, I would like to maximize accuracy and think that an accuracy of 300 feet is reasonable for this project.

2. To illustrate the differences in accuracy among the data sets that could be used for this project, I am highlighting an area of interest near my house. The map shows three road data sets along Broadway St in Somerville between Powderhouse and Teele Squares: Somerville roads are in red, 2000 census TIGER roads are in yellow, and StreetMap USA roads are in green. The Somerville data have the highest positional accuracy when compared with world imagery from ArcGIS Online. The StreetMap USA data are missing some minor roads (e.g., Corinthian Road). Because this project investigates walkability, it needs data for all roads, major and minor, so I would not use this data set in my analysis. The 2000 census TIGER roads do not follow the orthophoto and cut through houses. The encircled region in this image marks the greatest discrepancy between the census and Somerville data; the distance between the two lines is 45 feet. Given the 300-foot accuracy requirement of my project, all three data sets would be suitable in terms of positional accuracy. However, the Somerville data set should be used because it has the highest positional accuracy.

3. 
To illustrate the discrepancies between hydrology data sets, I am comparing 2000 census TIGER and MassGIS data for Spy Pond in Arlington, MA. Although this area is outside my specified study area, the comparison is still applicable: Arlington has many parks and schools, and the same analysis could be performed for that city. The first image is the orthophoto from ArcGIS Online. The following two images show the 2000 census TIGER hydrology data and the MassGIS hydrology data. Note that the island in Spy Pond is completely absent from the TIGER data set. The MassGIS data set also has higher positional accuracy than TIGER. The discrepancies between the lines of the two data sets range from -173 to +203 feet. The MassGIS data fit the orthophoto most closely and would therefore be the better data set to use in this project. In addition, the attribute table for MassGIS is more complete than that of the 2000 census TIGER data; MassGIS includes information on habitat type (e.g., wetland, surface water).

4. The Somerville road data have no metadata, which calls into question the validity of using them in a project. The StreetMap USA data are derived from 2000 census TIGER. They were mapped at a scale of 1:50,000, but the source maps used in the process are not mentioned. This scale gives the map a positional accuracy of +/- 75 feet. The 2000 census TIGER data were scanned from source maps, which were usually USGS 1:100,000 scale digital line graphs, USGS 1:24,000 scale quadrangles, and the US Census Bureau’s 1980 geographic base files. However, the handbook Technical Documentation of 2000 TIGER line files also states that additional source maps were used; it would be helpful to know what these additional sources were. The goal of the TIGER files is not positional accuracy: “[the mission of the US Census Bureau] does not require very high levels of positional accuracy in its geographic products.
Its files and maps are designed to show only the relative positions of elements.” TIGER data should be used with this in mind. Positional accuracy is stated as +/- 167 feet. MassGIS hydrology data were digitized from USGS 1:25,000 topographic quadrangle maps, which translates to a map accuracy of approximately +/- 40 ft. Additional data come from USGS digital line graphs, scanned mylar separates from USGS, and the MassDEP wetlands datalayer. The MassDEP data are from 1:12,000 scale, stereo color-infrared photography from staff at UMass Amherst from 1990-2000. In addition, MassGIS documents how the data were edited and lists recent updates to the data set.

5. Open space near Powderhouse circle is not consistent between data sources. There are minor discrepancies in the shapes of the park borders (the parks labeled with stars). However, the largest park in this image, Tufts field, is completely absent from the Somerville open space data set. I speculate that Somerville did not include it because most of the field is used for athletics, although the lower part of the field is open to the public. In addition, other parks and open spaces are digitized in the MassGIS data but not in the Somerville data. If there were metadata for Somerville parks, then perhaps this discrepancy could be resolved. School locations obtained from MassGIS generally agree closely with the orthophoto. For example, Winter Hill Community School and the Lincoln School at Thurston St are both coded to the center of their respective buildings with respect to both the orthophoto and the parcel data obtained from MassGIS. However, an online search for these school names finds no Lincoln School in Somerville, MA. The school data also contain no record for the Dr. Albert F. Argenziano School at Lincoln Park (located on Washington St); I am therefore assuming that the record for Lincoln School should actually be the Dr. Argenziano School.

6. 
In terms of positional accuracy, these layers would be suitable for my project. My project does not require high accuracy, and all layers appear to be within my benchmark accuracy of 300 feet. The biggest issue will be the school data: some schools are incorrectly labeled, do not have the correct address, or are completely absent from the data.

7. Most of the data sets used appear to be complete, with two exceptions. As shown above, the MassGIS school data are not complete. The 2000 census TIGER data are also incomplete: part of the Mystic River is completely absent from this data layer. The Charles River and the headwaters of the Mystic are on the map, but the water body itself is absent.

8. For data on hydrology and street locations, currency is not of critical importance. All TIGER data are from 2000, the year of the most recent census. The Somerville data have no metadata, so there is no way of knowing how current they are. MassGIS open space data are from 2008, school data from 2009, building footprints from 2002, and hydrography data from 2009.

9. The attribute tables differ greatly in detail. The TIGER attribute table is sparse, whereas the MassGIS hydrography data contain more attributes. The Somerville parks data list only the name of each park, but the MassGIS open space data give much more information on management, public access, etc. The school data contain school name (which may or may not be accurate, as we have already seen), principal name, grades offered, etc. It is important to consider the attributes included in the attribute table when determining whether or not to use a particular data set.
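
As a rough check on the scale-based accuracy figures in section 4 and the 300-foot benchmark in section 6, the sketch below converts a map scale into a ground tolerance. It assumes the U.S. National Map Accuracy Standards rule (1/30 inch on the map for scales larger than 1:20,000, 1/50 inch otherwise). The metadata cited above do not name this standard, so that choice is an assumption, and the function names are hypothetical. Under this rule 1:100,000 gives about 167 feet and 1:25,000 about 42 feet, in line with the figures quoted, while 1:50,000 gives about 83 feet rather than 75, so the StreetMap USA figure may come from a different source.

```python
# Sketch only: horizontal accuracy implied by map scale, ASSUMING the
# U.S. National Map Accuracy Standards apply to the source maps.
# The data sets' own metadata may define accuracy differently.

INCHES_PER_FOOT = 12
BENCHMARK_FT = 300  # project accuracy benchmark stated in section 1


def nmas_accuracy_ft(scale_denominator: int) -> float:
    """Return the assumed NMAS horizontal tolerance, in ground feet, for a 1:N map."""
    # Tolerance on the map sheet: 1/30 inch for scales larger than 1:20,000,
    # 1/50 inch for scales of 1:20,000 or smaller.
    divisor = 30 if scale_denominator < 20_000 else 50
    ground_inches = scale_denominator / divisor
    return ground_inches / INCHES_PER_FOOT


def meets_benchmark(scale_denominator: int, benchmark_ft: float = BENCHMARK_FT) -> bool:
    """True if the assumed tolerance for a 1:N map is within the project benchmark."""
    return nmas_accuracy_ft(scale_denominator) <= benchmark_ft


if __name__ == "__main__":
    # Scales mentioned in section 4.
    for n in (12_000, 24_000, 25_000, 50_000, 100_000):
        tolerance = nmas_accuracy_ft(n)
        print(f"1:{n:,}  +/- {tolerance:.0f} ft  within benchmark: {meets_benchmark(n)}")
```

Every scale mentioned in section 4 falls within the 300-foot benchmark under this assumption, which is consistent with the conclusion in section 6 that positional accuracy is not the limiting factor for this project.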