Georeferencing Introduction: Collaboration to Automation
Michelle Koo, Carol Spencer, David Bloom, Nelson Rios
Museum of Vertebrate Zoology (UC Berkeley), VertNet, & Tulane University

1. Georeferencing
2. Collaborations
3. Automation

What is a georeference?
A numerical description of a place in a coordinate system.

What we have: Localities we can read

ID   Species          Locality
1    Lynx rufus       Dawson Rd. N Whitehorse
2    Pudu puda        cerca de Valdivia
3    Canis lupus      20 mi NW Duluth
4    Felis concolor   Pichi Trafúl
5    Lama alpaca      near Cuzco
6    Panthera leo     San Diego Zoo
7    Sorex lyelli     Lyell Canyon, Yosemite
8    Orcinus orca     1 mi W San Juan Island
9    Ursus arctos     Bear Flat, Haines Junction

What we want: Localities we can map

Darwin Core
• Metadata standards initially built on the Dublin Core Metadata Standards
• Covers collections of any kind of biological objects or data
• Defines terminology associated with biological collection data
• Strives for compatibility with other biodiversity-related standards
• Facilitates the addition of components and attributes of biological data

Darwin Core Location Terms
higherGeography
waterBody, island, islandGroup
continent, country, countryCode, stateProvince, county, municipality
locality
minimumElevationInMeters, maximumElevationInMeters, minimumDepthInMeters, maximumDepthInMeters

Darwin Core Georeference Terms
decimalLatitude, decimalLongitude
geodeticDatum
coordinateUncertaintyInMeters
coordinatePrecision
pointRadiusSpatialFit
footprintWKT, footprintSRS, footprintSpatialFit
georeferencedBy, georeferenceProtocol
georeferenceSources
georeferenceVerificationStatus
georeferenceRemarks

What is a georeference?
A numerical description of a place that can be mapped.
“Davis, Yolo County, California”
Coordinates: 38.5463 -121.7425
Horizontal Geodetic Datum: NAD27
“point method”

Data Quality
• Data have the potential to be used in ways unforeseen when collected.
• The value of the data is directly related to their fitness for a variety of uses.
• “As data become more accessible many more uses become apparent” (Chapman 2005)
• The MaNIS/HerpNET/ORNIS guidelines follow best practices (Chapman and Wieczorek 2006) to enhance data quality and value.

What is an acceptable georeference?
A numerical description of a place that can be mapped and that describes the spatial extent of a locality and its associated uncertainties.

“Davis, Yolo County, California”
Coordinates:
38.5486 -121.7542
38.545 -121.7394
Horizontal Geodetic Datum: NAD27
“bounding-box method”

“Davis, Yolo County, California”
Coordinates: 38.5468 -121.7469
Horizontal Geodetic Datum: NAD27
Maximum Uncertainty: 8325 m
“point-radius method”

What is an ideal georeference?
A numerical description of a place that can be mapped and that describes the spatial extent of a locality and its associated uncertainties as well as all possibilities.
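As a rough illustration of how the bounding-box and point-radius representations above might be held in code, the Python sketch below stores the two “Davis, Yolo County, California” examples and tests whether a point falls inside each. The class names, the `haversine_m` helper, and the spherical Earth radius are assumptions made for this sketch and are not part of the MHO guidelines. A spherical distance also ignores the datum (NAD27 here), so the results should be treated as approximate.

```python
import math
from dataclasses import dataclass

# Mean Earth radius in meters; a spherical approximation (assumption of this sketch).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class PointRadius:
    """Hypothetical container for a point-radius georeference."""
    lat: float
    lon: float
    datum: str
    uncertainty_m: float

    def contains(self, lat, lon):
        # A point is inside if it lies within the uncertainty radius of the center.
        return haversine_m(self.lat, self.lon, lat, lon) <= self.uncertainty_m


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical container for a bounding-box georeference (no antimeridian handling)."""
    north: float
    west: float
    south: float
    east: float
    datum: str

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Values taken from the "Davis, Yolo County, California" examples.
davis_circle = PointRadius(38.5468, -121.7469, "NAD27", 8325.0)
davis_box = BoundingBox(north=38.5486, west=-121.7542,
                        south=38.545, east=-121.7394, datum="NAD27")

if __name__ == "__main__":
    print(davis_box.contains(davis_circle.lat, davis_circle.lon))
    print(davis_circle.contains(davis_box.north, davis_box.west))
```

The box test is a pair of range comparisons, while the circle test needs a distance computation for every candidate point, which is one way to see why the deck rates spatial queries as simple for bounding boxes and difficult for point-radius georeferences.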
“Davis, Yolo County, California”
“shape method”

“20 mi E Hayfork, California”
“probability method”

Method Comparison
point          easy to produce           no data quality
bounding-box   simple spatial queries    difficult quality assessment
point-radius   easy quality assessment   difficult spatial queries
shape          accurate representation   complex, uniform
probability    accurate representation   complex, non-uniform

Georeferencing Using MaNIS/HerpNET/ORNIS Guidelines
Figure labels: Parallels of Latitude; Meridians of Longitude; Graticular Network

MaNIS/HerpNET/ORNIS (MHO) Guidelines
http://manisnet.org/GeorefGuide.html
► Uses a point-radius representation of georeferences
► The circle encompasses all sources of uncertainty about the location
► The methodology formalizes assumptions, algorithms, and documentation standards that promote reproducible results
► The methods are universally applicable

MHO Guidelines
► Think of georeferencing as a “many-stepped process”: the MHO projects produced a first pass. Validation and refinement should then be done using itineraries, field notes, and collector verification, and by mapping the localities and making these maps available online.

Data Quality
► “Fitness for use” of the data
► As a collector, you may have an intended use for the data you collect, but data have the potential to be used in unforeseen ways. The value of your data is directly related to their fitness for a variety of uses.
► As data become more accessible many more uses become apparent.
(Chapman 2005; Chapman and Wieczorek 2006)
► We are using the MHO methods as a tool to enhance data quality.

Maximum Error Distance from Uncertainties
► Uncertainty is a “measure of the incompleteness of one’s knowledge or information about an unknown quantity whose true value can be established if a perfect measuring device were available.” (Cullen & Frey 1999)
► In the MHO Guidelines, the maximum error distance is defined as the numerical value for the upper limit of the distance from the coordinates of a locality to the outer extremity of the area within which the whole of the described locality must lie (i.e., everything that could be mistaken for that locality based on the description given).

Extents
► Extent: the geographic range, magnitude, or distance that a location may actually represent. (For a town, the extent is the polygon that encompasses the area inside the town’s boundaries.)
► Linear extent: what we use for the Point-Radius Method. Defined as the distance from the geographic center of the location to the furthest point of the geographic extent of the location.

Precision and Accuracy
► Always use as many decimal places as given by the coordinate source.
► A measurement in decimal degrees given to five decimal places is more precise than a measurement in degrees, minutes, seconds given to the nearest second.
► False precision will result if data are recorded with more decimal places than the source supports (e.g., when converting from DMS to decimal degrees).
► Always record the accuracy of your GPS readings (how well the GPS measures the true value of the location). The accuracy is given at the same time as the coordinate, but on recreational GPS units it usually is not recorded with the coordinates when you output them. If no accuracy is stated, a default of 30 m is assumed, so stating your accuracy is better.

Geographical Concepts
Geodetic Datum: defines the position of the origin, scale, shape, and orientation of a 3-dimensional model of the earth. Example: WGS84.
Coordinate System: defines the “units of measure” of position with respect to the datum. Example: latitude and longitude in degrees, minutes, seconds.
Map Projections: mathematical transformations of the 3-D model of the surface of the earth onto a 2-D map. There are many different kinds (e.g., conical, cylindrical, azimuthal). All are compromises among distortions of area, shape, distance, or direction, but some preserve areas or distances. When measuring distances on paper maps, use an equidistant projection if available; otherwise, understand the implications.

Georeferencing Concepts
► Named place: a place of reference in a locality description. Example: “Davis” in “5 mi N of Davis”.
► Areal extent: the geographic area covered by a named place (feature). Example: the area inside the boundaries of a town.
► Linear extent: the distance from the geographic center to the furthest point of the areal extent of a named place.
► Offset: the distance from a named place. Example: “5 mi” in “5 mi NE of Beatty”.
► Heading: the direction from a named place. Example: “NE” in “5 mi NE of Beatty”.
► coordinateUncertaintyInMeters: “The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (i.e., there are no coordinates). Zero is not a valid value for this term.” (from Darwin Core)
► Maximum Error Distance: same as coordinateUncertaintyInMeters, except the units are the same as in the locality description, not necessarily meters.

Sources of uncertainty:
► Coordinate uncertainty (e.g., 20° 30’ N 112° 36’ W)
► Map scale
► The extent of the locality
► GPS accuracy
► Unknown datum
► Imprecision in direction measurements
► Imprecision in distance measurements (e.g., 1 km vs.
1.1 km)

Scale        Uncertainty (ft)   Uncertainty (m)
1:1,200      3.3 ft             1.0 m
1:2,400      6.7 ft             2.0 m
1:4,800      13.3 ft            4.1 m
1:10,000     27.8 ft            8.5 m
1:12,000     33.3 ft            10.2 m
1:24,000     40.0 ft            12.2 m
1:25,000     41.8 ft            12.8 m
1:63,360     106 ft             32.2 m
1:100,000    167 ft             50.9 m
1:250,000    417 ft             127 m

2. Collaborations

Collaborative Distributed Database Portals for Vertebrates

Collaborations

MaNIS Localities Georeferenced
n = 326k localities (1.4M specimens)
r = 14 localities/hr (point-radius method)
t = 3 yrs (~40 georeferencers)

ORNIS Localities Georeferenced
n = 267k localities (1.4M specimens)
r = 30 localities/hr (point-radius method)
t = 2 yrs (~30 georeferencers)

HerpNET Localities Georeferenced
n = 646k localities (1.8M specimens)
r = 15 localities/hr (point-radius method)
t = 5 yrs (111 georeferencers)

Scope of the Problem for Natural History Collections
~2.5 billion (10^9) records
~6 records per locality*
~14 (30) localities per hour*
~15,500 (7,233) years
* based on the MaNIS (ORNIS) Project

The Collaboration continues…

3. Automation

Automation Tools for Georeferencing
BioGeomancer Classic
GeoLocate
Georeferencing Calculator
DIVA-GIS
http://www.biogeomancer.org
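
The “Scope of the Problem” figures above, which motivate the automation tools, can be reproduced with a short calculation. The Python sketch below is illustrative only: the function name is hypothetical, and the 1,920 working hours per year (40 hours a week for 48 weeks) is an assumption chosen because it recovers the slide's estimates of roughly 15,500 and 7,233 years, not a value stated in the source.

```python
def georeferencing_person_years(records, records_per_locality,
                                localities_per_hour, hours_per_year=1920):
    """Estimate the person-years needed to georeference a body of records by hand.

    hours_per_year=1920 is an assumed working year (40 h/week x 48 weeks).
    """
    localities = records / records_per_locality   # distinct localities to georeference
    hours = localities / localities_per_hour      # total georeferencing hours
    return hours / hours_per_year


# Slide inputs: ~2.5 billion records, ~6 records per locality,
# 14 localities/hr (MaNIS rate) or 30 localities/hr (ORNIS rate).
manis_years = georeferencing_person_years(2.5e9, 6, 14)
ornis_years = georeferencing_person_years(2.5e9, 6, 30)

if __name__ == "__main__":
    print(f"At the MaNIS rate: about {manis_years:,.0f} person-years")
    print(f"At the ORNIS rate: about {ornis_years:,.0f} person-years")
```

Under these assumptions, even the faster ORNIS rate leaves thousands of person-years of manual work.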