GIS Assignment 4: GIS Data Quality Assessment
Tai Ullmann

For this assignment, my project is to identify neighborhoods in Boston that my cousin might want to live in. He recently had a baby and is looking to move into Boston. His wife is a doctor and he is a professor, so I am looking at several factors, including proximity to schools, libraries, hospitals and open space.

I would like to be able to recommend specific neighborhoods, so positional accuracy is important, although an error of up to several blocks is acceptable. I would like the data to be accurate enough to produce large-scale maps. Required attributes include address information for schools, libraries and hospitals. Open space data needs to indicate whether the area is open to the public. It is important that the data is complete for open space, hospitals, libraries and schools. For schools, I need the data to include both public and private schools through high school. Hydrology and streets can be less complete and can exclude fine detail such as alleys and small streams. The more recent the data the better; however, this project can include data from previous years, as many of the establishments being measured do not change greatly over a couple of years. It is important that streets and hydrology are connected.

Compared to the aerial imagery, both the TIGER and MassGIS road data sets have errors. The most visible are roads in the data that do not exist in the imagery (figure 1) and roads visible in the imagery that are missing from the data (figure 2).

Figure 1.

Figure 2.

The MassGIS data set is based on 1:5,000-scale sources, with a stated accuracy of around +/- 13.33 feet. Looking at the map, however, the observed positional error ranges from 5 feet to 150 feet.

The TIGER data set's quantitative positional accuracy is based on 1:100,000 and 1:24,000 scales, which amounts to an accuracy of +/- 40 to 166.67 feet. It is important to note that the actual error is probably higher, as more error can occur when the data is digitized.
Looking at the map, the TIGER data seems to be off by an estimated 30 feet to 100 feet.

In terms of currency, the MassGIS data set is better for the project because it was updated in April 2012, while the TIGER data is from 2010. In terms of positional accuracy, however, the TIGER data is better for the project: its observed error is smaller at the high end (about 100 feet, versus up to 150 feet for MassGIS), and it includes more of the roads visible in the aerial imagery than the MassGIS data set does (figure 3). In terms of the attribute information required, either data set would work, as I don't need any additional attribute information, although the TIGER data does carry more. The TIGER data set also seems to be the more complete of the two, as it includes roads that are missing from MassGIS.

Figure 3.

Figure 4.

Similar to the road data, both the MassGIS and TIGER hydrology data have some errors. The MassGIS hydrology data's scale is 1:25,000, with an error of around +/- 40 feet. The TIGER data is based on 1:100,000 and 1:24,000 scales, which amounts to an accuracy of +/- 40 to 166.67 feet; it is unclear which scale applies to a given feature. When compared to the imagery, both data sets are off by around 80 feet on average (figure 4).

In terms of currency, the TIGER data is more current, as it is based on the 2010 census, while the MassGIS hydrology data was updated in 2009. In terms of positional accuracy, both data sets are off by the same amount. However, the MassGIS data outlines more water bodies than the TIGER data. Furthermore, the MassGIS data has better connectivity, as the TIGER data has many rivers that do not connect with each other. No attribute information is required beyond the location of the river, but MassGIS has more attribute information, as it classifies water type. This information might be useful in further analysis.

The school location data is from MassGIS.
The data was refined from either 1:5,000 or 1:25,000 scale ortho imagery, so the positional accuracy ranges from around +/- 13 to 40 feet. I cannot provide an accurate quantitative assessment of positional accuracy because I don't know which scale was used for each school, and I don't know how much additional error occurred during the digitization process. The data set seems complete: according to its metadata file, it includes public, private, charter, collaborative, and special education schools from preschool to high school for all of Boston. This covers the school grades I am interested in, as well as both public and private schools. The data was last updated February 12, 2012, which is current enough for my project. The attribute information seems quite accurate and useful. It includes the name, grades and type of school for each point, as well as contact information for the school and the current principal as of February 2012. This is more information than I need, but it is useful for further analysis of a neighborhood.

The hospital information is also from MassGIS. The data scale is 1:25,000, with a corresponding error of around +/- 40 feet. I assume more error occurred during the digitization process. To my knowledge, this data is only somewhat complete. The metadata explains that the data only contains "acute care" hospitals, which account for "the majority" of medical-surgical, pediatric, obstetric and maternity beds. This makes me concerned that some hospitals that do not meet the acute care guidelines are missing. The data does, however, seem to cover the entire area I am interested in. It was last updated in August 2009, which is a little out of date but current enough for this project. The attribute information is less robust than the school data but does include the hospital name, address and whether it has an emergency room.
This is helpful, but it would be more useful if the attribute information said more about the hospital types.

The library information is also from MassGIS. The metadata does not include a scale, so I cannot provide the quantitative positional accuracy of the data. The data also seems incomplete. The metadata says the data set only includes libraries that meet the Massachusetts Board of Library Commissioners requirements, so libraries that do not meet these requirements are not included. However, libraries that are part of schools are included, which is useful. The data was last updated October 2004, which is not current enough for this project. The attribute information is useful. It includes the basic necessary information (name, address and type), but it also indicates whether the library is part of a larger institution, as well as the method used to determine the location of the library. This is very useful for judging how accurate each library location is. I especially like the breakdown of library types, as it tells me which ones are open to the public and accessible to my cousin.

The open space data is from MassGIS. The data uses a 1:25,000 scale, which corresponds to an error of around +/- 40 feet. The data seems quite complete; however, the metadata warns that open space changes frequently and that keeping it up to date requires a collaborative effort. Therefore, the data may be incomplete in places and may require more extensive research and knowledge of the area. This data was last updated on September 21, 2012, which makes it the most current data used in the project. This is especially helpful given the incomplete data warning. The attribute information is very robust. The most useful field is the public access code, which shows whether the area is open to the public. It is also helpful to see what the primary purpose of the land is. The attribute information provides more than enough information for my project needs.
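
The scale-based error figures quoted throughout this assessment (for example, +/- 40 feet for 1:24,000 and +/- 166.67 feet for 1:100,000) can be reproduced with a short calculation. The sketch below assumes those figures come from the US National Map Accuracy Standards rule of thumb: 1/30 inch on the map for scales larger than 1:20,000, and 1/50 inch for scales of 1:20,000 or smaller. The assessment does not name the rule it used, so this is an assumption, and the function name is only illustrative.

```python
# Minimal sketch: convert a map scale to an approximate horizontal error
# in feet. ASSUMPTION: the error figures in this assessment follow the
# US National Map Accuracy Standards (NMAS) tolerances of 1/30 inch
# (scales larger than 1:20,000) and 1/50 inch (1:20,000 or smaller),
# both measured at map scale.

INCHES_PER_FOOT = 12
NMAS_SCALE_BREAK = 20_000  # denominator separating the two tolerances


def nmas_horizontal_error_feet(scale_denominator: float) -> float:
    """Return the ground distance in feet for the NMAS map tolerance.

    scale_denominator is the N in a 1:N map scale, e.g. 24000 for 1:24,000.
    """
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    # A smaller denominator means a larger-scale (more detailed) map.
    tolerance_divisor = 30 if scale_denominator < NMAS_SCALE_BREAK else 50
    error_inches = scale_denominator / tolerance_divisor
    return error_inches / INCHES_PER_FOOT


if __name__ == "__main__":
    # Scales mentioned in the assessment.
    for denominator in (5_000, 24_000, 25_000, 100_000):
        error = nmas_horizontal_error_feet(denominator)
        print(f"1:{denominator:,} -> +/- {error:.2f} feet")
```

Under this assumption, 1:24,000 gives exactly 40 feet and 1:100,000 gives about 166.67 feet, matching the TIGER figures above. The 1:5,000 and 1:25,000 scales come out to roughly 13.9 and 41.7 feet, close to the rounded values quoted for the MassGIS layers. These are nominal tolerances only; as noted above, digitizing can add further error.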