Map Portals and Geoarchiving: New Opportunities in Geospatial Information Services
Steve Morris
Head of Digital Library Initiatives
NCSU Libraries

GIS Technology: Sustaining the Future & Understanding the Past
Case Western Reserve University
October 13, 2005

Overview
- Brief overview of library roles in digital geographic information services
- Geospatial web services: opportunities and challenges for libraries
- Long-term preservation of digital geospatial data

Note: Percentages based on the actual number of respondents to each question

Library Geospatial Data Services: Data Collections
- Acquire data (licensed and public domain)
- License data for in-library or campus use
- Provide networked access
- Acquire or create value-added derivatives

Library Geospatial Data Services: Discovery Tools
- Web documentation
- Author and publish metadata
- Searchable metadata catalogs
- Integrate data into library catalog

Library Geospatial Data Services: Reference and Technical Support
- Assistance with finding and selecting data
- GIS “reference interview”
- Line between reference support and technical support is extremely fuzzy
- Support or administration of campus GIS software licenses
- Reference support for locating software tools (e.g.
scripts for ArcView and ArcGIS)

Library Geospatial Data Services: Workshops and Outreach
- In-library workshops and class visits
- Online workshops (Virtual Campus)
- Marketing and outreach
- Work to engage a broader number of academic departments in GIS activity
- Work to lower the barrier to entry in GIS work (access to software, data, training, support)
- Library as ‘neutral ground’ well suited to coordinate with campus GIS infrastructure

Library Geospatial Data Services Timeline
- Map Collections: paper maps
- Data Collections: CD-ROMs, file server & FTP access
- Map Servers: integrate collected data, Web-based mapping
- Map Portals: integrate distributed, streaming data

NC Local Government Map Services
[Map of North Carolina; legend: City Map Services, County Map Services]

County Government Map Server

State Government Map Server

Federal Government Map Server

Open Geospatial Consortium (OGC) Technology Overview
- The Open Geospatial Consortium (OGC) is a not-for-profit, international consortium: focus on data interoperability
- Operates a Specification Development Program that is similar to other industry consortia (W3C, etc.)
- Also operates an Interoperability Program (IP), a partnership-driven engineering and testing program designed to deliver proven specifications into the Specification Development Program.
- OGC used to talk about “web-enabling GIS”; now they talk about “geo-enabling the web.”

National Approaches
- USGS National Map
  - Integrated WMS services
  - Services catalog
- Geospatial One-Stop
  - Searchable services
- Specialized portals
  - FEMA Mapping
  - Katrina Portal
  - HUD E-Maps

State Approach: NC OneMap
- Data integration through OGC specifications (currently just WMS)
- Data sharing agreements
- Metadata outreach
- Ongoing data inventories
- Practices and guidelines vis-à-vis map service configuration

Geospatial Web Service Types
- Image services
  - Deliver image resulting from query against underlying data
  - Limited opportunity for analysis
- Feature services
  - Stream actual feature data, greater opportunity for data analysis
- Other
  - Geocoding services
  - Routing
  - etc.

Geospatial Web Services: Advantages
- Time- and location-independent access
- Access to extremely large datasets
- Access to most current data
- Ad hoc access to data for which there is typically low demand
- Reduce barriers imposed by differences in formats, coordinate systems, etc.
- Access to geoprocessing functionality

Geospatial Web Services: Shortcomings
- Application performance will frequently not match that of locally loaded data
- Up-time reliability issues
- Many demonstration services, persistence is open to question
- Dynamically changing content can lead to analysis surprises
- Does not replace aesthetic value of paper map

Geospatial Web Services: When Most Useful?
- User needs most current data
- Data is subject to frequent change & update
- User needs access to extremely large datasets
- User wishes to preview data prior to use
- User just needs background display
- Need to integrate data into portable devices
- Data not otherwise available

Geospatial Web Services: Integration Challenges for Libraries
- Services difficult to discover and select from
- In case of commercial services, campus licensing models not well evolved
- Linking data objects with services that act upon them is not well supported by existing metadata and catalog schemes
- Ambiguous rights issues
- How to integrate into the physical browse environment of the map library?
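
As a rough illustration of the discovery problem noted above, the sketch below lists the layers a WMS server advertises by reading its capabilities document. It is only a sketch under stated assumptions: the capabilities XML is a hypothetical, heavily trimmed WMS 1.1.1 example (not taken from any real service), and the function name `list_named_layers` is introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WMS 1.1.1 capabilities document.
# Real documents are much larger; WMS 1.3.0 documents also use an XML
# namespace, which this sketch does not handle.
SAMPLE_CAPABILITIES = """<WMT_MS_Capabilities version="1.1.1">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>Example County Map Service</Title>
  </Service>
  <Capability>
    <Layer>
      <Title>Example County</Title>
      <Layer queryable="1">
        <Name>parcels</Name>
        <Title>Property Boundaries</Title>
      </Layer>
      <Layer queryable="0">
        <Name>ortho2002</Name>
        <Title>Orthophotos 2002</Title>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def list_named_layers(capabilities_xml):
    """Return (name, title) pairs for every layer that carries a <Name>.

    Layers without a <Name> are grouping containers and cannot be
    requested directly, so they are skipped.
    """
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")  # direct child only
        if name is not None:
            layers.append((name, layer.findtext("Title", default="")))
    return layers


if __name__ == "__main__":
    for name, title in list_named_layers(SAMPLE_CAPABILITIES):
        print(name, "-", title)
```

A directory of services could run something like this against each known server to keep a searchable list of layer names and titles, though layer naming differs from one service to the next.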
Geospatial Web Services Rights Issues
Example: Desktop GIS-accessible ArcIMS
- 39 of 100 NC counties have desktop GIS-accessible ArcIMS services
- It is difficult to know how many of these counties actually expect users to either:
  A) access data through desktop GIS for viewing only, or
  B) extract and download data
[Map legend: Accessible ArcXML Services]

Geospatial Data: Discovery and Selection Issues
- Data extent
- Thematic content & attributes
- Currency
- Format, coordinate system, datum, etc.
- Licensing restrictions
- Ease of access
- Metadata availability
- More …

Geospatial Web Services: Discovery and Selection Issues
- Inherits many data selection issues such as coordinate system, etc.
- Service type: image, feature, geocoding, …
- Access protocol: OGC specs (WMS, WFS, WCS …), SOAP, ArcXML (ArcIMS image and feature services), specialized APIs (e.g. Google Maps)
- Reliability, up-time performance, speed
- Licensing scheme
- Functions: annotation, saved maps, etc.
- Image services: image formats

Facilitating Discovery of Services
Example: Directory of County Map Services
- Among top 15 most used resources on library web site
- 99.5% of directory users from outside ncsu.edu

Library Opportunities to Provide Geospatial Web Services
- Publish WMS servers from public domain content not already available
  - Fill holes in service availability
  - Publish archival content to counter bias towards current content in the industry
- Publish cascading map services
- Create specialized front-ends to existing, distributed services

Cascading Map Services: Problems
- Different versions of OGC standards
  - e.g., WMS 1.1.0, WMS 1.1.1 …
- Differences in layer naming
  - ‘cadastral’ vs. ‘parcels’ vs. ‘property boundaries’
- Differences in classification schemes
  - e.g., inconsistent land use, zoning schemes
- Service reliability, addressing stability, uptime
- On top of standards & specifications, need community overlay of best practices

Community Practices in Cascading Map Services
Example: Layer Names, Symbology, Classification

“Web mash-ups” and the New Mainstream Geospatial Web Services
- New services such as Google Maps, MSN Virtual Earth, Yahoo Maps
- Static, tiled images for efficient access
- APIs for developer access
- Positioning for mobile device-oriented application development
- Engaging mainstream IT and general public
- AJAX: Asynchronous JavaScript and XML
- New forms of map and service publishing

Integrating Traditional Geospatial Data and Services with New Services
- But who preserves the data …?

Today’s geospatial data as tomorrow’s cultural heritage

Time series: vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC

Time series: ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002

Risks to Digital Geospatial Data
- Producer focus on current data
  - “Kill and fill”, absence of time-versioned content
- Future support of data formats in question
  - Vast range of data formats in use, many of them complex
- Shift to “streaming data” for access
  - Archives have been a by-product of providing access
- Preservation metadata requirements
  - Descriptive, administrative, technical, DRM
- Geodatabases
  - Complex functionality

NC Geospatial Data Archiving Project (NCGDAP)
- Partnership between university library (NCSU) and state agency (NCCGIA)
- Focus on state and local geospatial content in North Carolina (state demonstration)
- Tied to NC OneMap initiative
- Part of Library of Congress National Digital Information Infrastructure & Preservation Program (NDIIPP)
- Objective: engage existing state/federal geospatial data infrastructures in preservation

NCGDAP Philosophy of Engagement
- Take the data in the manner in which it can be obtained
- Wrangle and archive data
- Provide feedback to producer organizations / inform state geospatial infrastructure
- Note the ‘Project’ in ‘North Carolina Geospatial Data Archiving Project’: the process, the learning experience, and the engagement
with geospatial data infrastructures are more important than the archive

Earlier NCSU Acquisition Efforts
- NCSU University Extension project 2000-2001
  - Target: County/city data in eastern NC
  - “Digital rescue” not “digital preservation”
  - Hurricane Floyd flood response
- Project learning outcomes
  - Confirmed concerns about long term access
  - Need for efficient inventory/acquisition
  - Wide range in rights/licensing
  - Need to work within statewide infrastructure

Big Geoarchiving Challenges
- Format migration paths
- Management of data versions over time
- Preservation metadata
- Harnessing geospatial web services
- Preserving cartographic representation
- Keeping content repository-agnostic
- Preserving geodatabases
- More …

Vector Data Format Issues
- Vector data much more complicated than image data
- ‘Archiving’ vs. ‘Permanent access’
  - An ‘open’ pile of XML might make an archive, but if using it requires a team of programmers to do digital archaeology then it does not provide permanent access
  - Piles of XML need to be widely understood piles
  - GML: need widely accepted application schemas (like OSMM?)
- The Geodatabase conundrum
  - Export feature classes, and lose topology, annotation, relationships, etc.
  - … or use the Geodatabase as the primary archival platform (some are now thinking this way)

Managing Time-versioned Content
- Many local agency data layers continuously updated
  - E.g., some county cadastral data updated daily; older versions not generally available
- Individual versioned datasets will wander off from the archive
  - How do users “get current metadata/DRM/object” from a versioned dataset found “in the wild”?
  - How do we certify concurrency and agreement between the metadata and the data?
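
One possible way to approach the last question above is to record a checksum of each dataset version alongside a checksum of the metadata that described it at archive time. The sketch below is an illustration only, not a description of how NCGDAP handles this; the record layout and the function names `make_fixity_record` and `matches_record` are assumptions introduced here.

```python
import hashlib


def sha256_of_bytes(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def make_fixity_record(dataset_bytes, metadata_bytes, version_label):
    """Pair a dataset version with the metadata that described it.

    The record layout is hypothetical: a version label plus one digest
    for the data and one for the metadata.
    """
    return {
        "version": version_label,
        "dataset_sha256": sha256_of_bytes(dataset_bytes),
        "metadata_sha256": sha256_of_bytes(metadata_bytes),
    }


def matches_record(record, dataset_bytes, metadata_bytes):
    """Check that both the data and the metadata are the archived pair."""
    return (
        record["dataset_sha256"] == sha256_of_bytes(dataset_bytes)
        and record["metadata_sha256"] == sha256_of_bytes(metadata_bytes)
    )


if __name__ == "__main__":
    data = b"parcel boundaries, 2004 snapshot"
    meta = b"<metadata>FGDC record for the 2004 snapshot</metadata>"
    record = make_fixity_record(data, meta, "2004")
    print(matches_record(record, data, meta))         # same pair
    print(matches_record(record, data + b"!", meta))  # data has changed
```

A copy of a dataset found “in the wild” could then be hashed and looked up against the archive's records to find the version and metadata it belongs to. This only detects byte-level identity; it does not address datasets that have been reformatted or reprojected.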
Preservation Metadata Issues
- FGDC Metadata
  - Many flavors, incoming metadata needs processing
  - Cross-walk elements to PREMIS, MODS?
- Metadata wrapper
  - METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
  - Need a geospatial industry solution for the ‘METS-like problem’
  - GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGIS Web Services 3)

Preserving Cartographic Representation
- The true counterpart of the old map is not the GIS dataset, but rather the cartographic representation that builds on that data:
  - Intellectual choices about symbolization, layer combinations
  - Data models, analysis, annotations
- Cartographic representation typically encoded in proprietary files (.avl, .lyr, .apr, .mxd) that do not lend themselves well to migration
- Symbologies have meaning to particular communities at particular points in time; preserving information about symbol sets and their meaning is a different problem

Repository Architecture Issues
- Interest in how geospatial content interacts with widely available digital repository software
  - Focus on salient, domain-specific issues
- Challenge: remain repository agnostic
  - Avoid “imprinting” on repository software environment
  - Preservation package should not be the same as the ingest object of the first environment
  - Tension between exploiting repository software features vs. becoming software dependent

Preserving Geodatabases
- Spatial databases in general vs.
ESRI Geodatabase “format”
  - Not just data layers and attributes; also topology, annotation, relationships, behaviors
- ESRI Geodatabase archival issues
  - XML Export, Geodatabase History, File Geodatabase, Geodatabase Replication
- Growing use of geodatabases by municipal, county agencies
  - Some looking to Geodatabase as archival platform (in addition to feature class export)

Geodatabase Availability
- According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in geodatabase format.
[Charts: Cities: Street Centerline Formats; Counties: Street Centerline Formats. Categories: Geodatabase, Shapefile, Coverage, Other]

Harnessing Geospatial Web Services
- Automated content identification
  - ‘capabilities files,’ registries, catalog services
- WMS (Web Map Service) for batch extraction of image atlases
  - last ditch capture option
  - preserve cartographic representation
  - retain records of decision-making process
- … feature services (WFS) later.
- Rights issues in the web services space are ambiguous

Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
Steven_Morris@ncsu.edu
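
The batch extraction of image atlases mentioned under “Harnessing Geospatial Web Services” could be approached by splitting an area of interest into a grid and issuing one WMS GetMap request per cell. The sketch below only builds the request URLs and does not contact any server. It assumes WMS 1.1.1 parameter names; the base URL, layer name, and function names are hypothetical placeholders introduced for illustration.

```python
from urllib.parse import urlencode


def tile_bboxes(minx, miny, maxx, maxy, cols, rows):
    """Yield (minx, miny, maxx, maxy) for each cell of a cols x rows grid."""
    width = (maxx - minx) / cols
    height = (maxy - miny) / rows
    for r in range(rows):
        for c in range(cols):
            yield (
                minx + c * width,
                miny + r * height,
                minx + (c + 1) * width,
                miny + (r + 1) * height,
            )


def getmap_url(base_url, layers, bbox, size=(512, 512),
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical service URL and layer name.
    base = "http://example.org/wms"
    for bbox in tile_bboxes(-80.0, 35.0, -78.0, 36.0, cols=2, rows=1):
        print(getmap_url(base, ["parcels"], bbox))
```

Each saved image would need its bounding box, layer list, and capture date recorded with it, and, as the slides note, whether a given service permits this kind of harvesting is a rights question that the code cannot answer.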