Collection and Preservation of At-Risk Digital Geospatial Data:
the North Carolina NDIIPP Project
Partners:
NCSU Libraries
Project Lead: Steve Morris
NC Center for Geographic Information & Analysis
Project Lead: Zsolt Nagy
LCFS Database Group
May 30, 2005
Project Context
Partnership between university library (NCSU)
and state agency (NCCGIA)
Focus on state and local geospatial content in
North Carolina (state demonstration)
Tied to NC OneMap initiative, which provides
for seamless access to data, metadata, and
inventory information
Objective: engage existing state/federal
geospatial data infrastructures in preservation
Note: Percentages based on the actual number of
respondents to each question
Targeted Content
Resource Types
GIS “vector” (point/line/polygon) data
Digital orthophotography
Digital maps
Tabular data (e.g. assessment data)
Content Producers
Mostly state, local, regional agencies
Some university, not-for-profit, commercial
Selected local federal projects
NC Local GIS Landscape
100 counties, 92 with GIS
80 counties with high resolution orthophotography
65+ counties with unique map servers.
Growing number of municipal systems
Value: $162 million plus investment
NC OneMap Initial Data Layers
Produced by Cities and Counties
[Bar chart; vertical axis 0% to 80%. Layers shown: Ortho, County Bnd., Land Use, Hospitals, Landfills, Building Footprints, Cadastral, ETJs, Airports, Storm Surge, Watersheds, Future Land Use, Roads, Surface Waters, Schools, Police Stations, Wetlands, Water Lines, Municipal Bnd., Elevation, Universities, Fire Stations, Hazardous Disposal Sites, Sewer Lines]
Vector data (scale, accuracy, currency, etc.)
Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC
Aerial imagery (image resolution, etc.)
Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002
Tabular data (combined with vector data)
Today’s geospatial data as tomorrow’s cultural heritage
Risks to Digital Geospatial Data
Producer focus on current data
Time-versioned content generally not archived
Future support of data formats in question
Vast range of data formats in use--complex
Shift to “streaming data” for access
Archives have been a by-product of providing access
Preservation metadata requirements
Descriptive, administrative, technical, DRM
Geodatabases
Complex functionality
GIS Software Used – Local Agencies
[Bar chart; vertical axis 0% to 70%. Software listed: ArcGIS (ESRI), ArcView 3.x (ESRI), ArcInfo (ESRI), ArcIMS (ESRI), ArcView 8.x (ESRI), GenaMap, IMAGINE, Intergraph, MapInfo, Understanding Systems, Other, Not Sure]
Note: Percentages based on the actual number of respondents to each question
Source: NC OneMap Data Inventory 2004
Earlier NCSU Acquisition Efforts
NCSU University Extension project 2000-2001
Target: County/city data in eastern NC
“Digital rescue” not “digital preservation”
Project learning outcomes
Confirmed concerns about long term access
Need for efficient inventory/acquisition
Wide range in rights/licensing
Need to work within statewide infrastructure
Acquired experience; unanticipated collaboration
Exploring Approaches to Sharing Data
County and City GIS Directories
Processing Ingested Data
e.g. Testing for data gaps in county orthophoto sets
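A gap test of this kind can be sketched as a comparison between the tiles a county's index grid says should exist and the tiles actually delivered. The following is a minimal sketch, assuming tiles are named by row and column; the `r{row}_c{col}` naming scheme and the function names are hypothetical, and real county tiling schemes vary.

```python
from itertools import product


def expected_tiles(rows, cols):
    """Return the tile names a complete rows x cols index grid would contain.

    The r{row}_c{col} naming scheme is hypothetical, for illustration only.
    """
    return {f"r{r}_c{c}" for r, c in product(range(rows), range(cols))}


def find_gaps(delivered, rows, cols):
    """Return the sorted names of grid tiles that were not delivered."""
    return sorted(expected_tiles(rows, cols) - set(delivered))


# Example: a 2 x 3 grid with one tile missing from the delivery.
delivered = ["r0_c0", "r0_c1", "r0_c2", "r1_c0", "r1_c2"]
print(find_gaps(delivered, rows=2, cols=3))  # prints ['r1_c1']
```

Comparing names only catches missing files; checking that each delivered tile actually covers its expected extent would need the imagery's georeferencing and is left out here.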
Content Identification and Selection
Work from NC OneMap Data Inventory
Combine with inventory information from
various state agencies and from previous
NCSU efforts
Develop methodology for selecting from among
“early,” “middle,” and “late” stage products
Develop criteria for time series development
Investigate use of emerging Open Geospatial
Consortium technologies in data identification
Content Acquisition
Work from NC OneMap Data Sharing
Agreements as a starting point (the “blanket”)
Secure individual agreements (the “quilt”)
Investigate use of OGC technologies in capture
Explore use of METS as a metadata wrapper
Ingest FGDC metadata; crosswalk to MODS? PREMIS?
Maybe METS DRM short term; GeoDRM long term
Consider links to services; version management
Get the geospatial community to tackle the content
packaging problem (maybe MPEG 21?)
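To make the crosswalk idea concrete, here is a minimal sketch that pulls a few elements out of an FGDC metadata record and relabels them with MODS-style field names. The choice of fields and the mapping table are assumptions for illustration, not a published crosswalk, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping from FGDC CSDGM element paths to MODS-style field
# names. The selection of fields is an assumption, not a published crosswalk.
FGDC_TO_MODS = {
    "idinfo/citation/citeinfo/title": "titleInfo/title",
    "idinfo/citation/citeinfo/origin": "name/namePart",
    "idinfo/citation/citeinfo/pubdate": "originInfo/dateIssued",
    "idinfo/descript/abstract": "abstract",
}


def crosswalk(fgdc_xml):
    """Map selected FGDC elements to MODS-style keys, skipping absent ones."""
    root = ET.fromstring(fgdc_xml)
    record = {}
    for fgdc_path, mods_field in FGDC_TO_MODS.items():
        value = root.findtext(fgdc_path)
        if value is not None and value.strip():
            record[mods_field] = value.strip()
    return record


# Made-up sample record for illustration.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example County GIS</origin>
      <pubdate>2004</pubdate>
      <title>Example County Parcels</title>
    </citeinfo></citation>
    <descript><abstract>Parcel boundaries.</abstract></descript>
  </idinfo>
</metadata>"""

print(crosswalk(SAMPLE))
```

Skipping absent elements rather than failing reflects the "many flavors" of incoming FGDC records; a production crosswalk would also need to log what was missing.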
Partnership Building
Work within context of the NC OneMap initiative
State, local, federal partnership
State expression of the National Map
Defined characteristic: “Historic and temporal data
will be maintained and available”
Advisory Committee drawn from the NC Geographic
Information Coordinating Council subcommittees
Seek external partners
National States Geographic Information Council
FGDC Historical Data Committee
… more
Content Retention and Transfer
Ingest into DSpace
Explore how geospatial content interacts with existing
digital repository software environments
Investigate re-ingest into a second platform
Challenge: keep the collection repository-agnostic
Start to define format migration paths
Special problem: geodatabases
Pursue long-term solution
Roles of data producing agencies, state agencies; NC
OneMap; NCSU
Rights Issues
Various interpretations of public records law
53.9% of local NC agencies charge for data
43.7% of local NC agencies restrict redistribution
Desire for downstream control of data
Disclaimer clickthrough; liability concerns
Filtered locations/individuals; post 9/11 issues
Restrictions on redistribution; commercial resale
Web services area in “Wild West” stage
Both content and technical agreements
GeoDRM initiative in the works
Big Challenges
Format migration paths
Management of data versions over time
Preservation metadata
Harnessing geospatial web services
Preserving cartographic representation
Keeping content repository-agnostic
Preserving geodatabases
More …
Vector Data Format Issues
Vector data much more complicated than image data
‘Archiving’ vs. ‘Permanent access’
An ‘open’ pile of XML might make an archive, but if using it
requires a team of programmers to do digital archaeology then it
does not provide permanent access
Piles of XML need to be widely understood piles
GML: need widely accepted application schemas (like OSMM?)
The Geodatabase conundrum
Export feature classes, and lose topology, annotation,
relationships, etc.
… or use the Geodatabase as the primary archival platform
(some are now thinking this way)
Vector Data Format Options
Option A: use an open format and have a really
unfortunate transformation and limited vendor support
for the output object
Option B: use closed format but retain the original
content and count on short- and medium-term vendor
support.
Option C: do both to buy time and look for an open,
ASCII solution. (watch GML activity)
No sweet spot, just an evolving and changing mix of
flawed options that are used in combination.
Managing Time-versioned Content
Many local agency data layers continuously
updated
E.g., some county cadastral data updated daily—
older versions not generally available
Individual versioned datasets will wander off
from the archive
How do users “get current metadata/DRM/object”
from a versioned dataset found “in the wild”?
How do we certify concurrency and agreement
between the metadata and the data?
Can we manage the relationship loosely using a
persistent identifier link to a parent object?
[Diagram: a Persistent ID Resolver and a Parent Object Manager, each linked to individual dataset versions]
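One way to picture the loose relationship asked about above is a resolver that maps a persistent identifier to a parent record, which in turn knows every version. This is a minimal sketch under stated assumptions: the class names, the `pid:example/...` identifier syntax, and the file paths are all hypothetical, not part of the project's design.

```python
from dataclasses import dataclass, field


@dataclass
class ParentObject:
    """A parent record that tracks every known version of one dataset."""
    title: str
    versions: dict = field(default_factory=dict)  # version date -> location

    def current(self):
        """Return (date, location) of the most recent version."""
        latest = max(self.versions)  # ISO dates sort chronologically as strings
        return latest, self.versions[latest]


class Resolver:
    """Maps persistent identifiers to parent objects (a hypothetical design)."""

    def __init__(self):
        self._parents = {}

    def register(self, pid, parent):
        self._parents[pid] = parent

    def resolve(self, pid):
        return self._parents[pid]


resolver = Resolver()
parcels = ParentObject("Example County parcels")
parcels.versions["2001-07-01"] = "archive/parcels_2001.zip"
parcels.versions["2004-07-01"] = "archive/parcels_2004.zip"
resolver.register("pid:example/parcels", parcels)

# A copy of the 2001 version found "in the wild" carries only the PID;
# resolving it leads back to the parent and so to the current version.
print(resolver.resolve("pid:example/parcels").current())
```

Because each stray copy needs to carry only the identifier, the metadata and rights information can be updated at the parent without touching copies that have already left the archive.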
Preservation Metadata Issues
FGDC Metadata
Many flavors, incoming metadata needs processing
Cross-walk elements to PREMIS, MODS?
Metadata wrapper
METS (Metadata Encoding and Transmission
Standard) vs. other industry solutions
Need a geospatial industry solution for the ‘METS-like problem’
GeoDRM a likely trigger: a wrapper to enforce
licensing (MPEG 21 references in OGC Web
Services 3)
Harnessing Geospatial Web Services
Automated content identification
‘capabilities files,’ registries, catalog services
WMS (Web Map Service) for batch extraction of
image atlases
last ditch capture option
preserve cartographic representation
retain records of decision-making process
… feature services (WFS) later.
Rights issues in the web services space are
ambiguous
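Batch extraction from a WMS server amounts to issuing one GetMap request per tile of a grid laid over the area of interest. The sketch below only builds the request URLs, using the standard WMS 1.1.1 GetMap parameters; the server URL, layer name, tile size, and grid dimensions are placeholders, and nothing is fetched.

```python
from urllib.parse import urlencode


def getmap_urls(base_url, layer, bbox, nx, ny, size=512, srs="EPSG:4326"):
    """Build WMS 1.1.1 GetMap URLs covering bbox with an nx-by-ny tile grid.

    bbox is (minx, miny, maxx, maxy). The server URL and layer name passed
    in are placeholders; nothing is fetched here.
    """
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / nx
    dy = (maxy - miny) / ny
    urls = []
    for j in range(ny):
        for i in range(nx):
            tile = (minx + i * dx, miny + j * dy,
                    minx + (i + 1) * dx, miny + (j + 1) * dy)
            params = {
                "SERVICE": "WMS",
                "VERSION": "1.1.1",
                "REQUEST": "GetMap",
                "LAYERS": layer,
                "STYLES": "",
                "SRS": srs,
                "BBOX": ",".join(f"{v:.6f}" for v in tile),
                "WIDTH": size,
                "HEIGHT": size,
                "FORMAT": "image/png",
            }
            urls.append(f"{base_url}?{urlencode(params)}")
    return urls


urls = getmap_urls("http://example.org/wms", "parcels",
                   bbox=(-79.0, 35.5, -78.0, 36.5), nx=2, ny=2)
print(len(urls))  # prints 4
print(urls[0])
```

Keeping the generated URL list alongside the harvested images gives a simple record of how each atlas image was produced, in line with retaining records of the decision-making process.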
Preserving Cartographic Representation
The true counterpart of the old map is not the GIS
dataset, but rather the cartographic representation that
builds on that data:
Intellectual choices about symbolization, layer combinations
Data models, analysis, annotations
Cartographic representation typically encoded in
proprietary files (.avl, .lyr, .apr, .mxd) that do not lend
themselves well to migration
Symbologies have meaning to particular communities at
particular points in time; preserving information about
symbol sets and their meaning is a different problem
Image-based approaches
Generate images using Map Book or similar tools
Harvest existing atlas images
Capture atlases from WMS servers
Export ‘layouts’ or ‘maps’ to image
Vector-based approaches
Store explicitly in the data format (e.g. Feature Class
Representation in ArcGIS 9.2)
Archive and upward-migrate existing files .avl, .apr, .lyr,
.mxd, etc.
SVG, VML or other XML approaches
Other?
Repository Architecture Issues
Interest in how geospatial content interacts with
widely available digital repository software
Focus on salient, domain-specific issues
Challenge: remain repository agnostic
Avoid “imprinting” on repository software environment
Preservation package should not be the same as the
ingest object of the first environment
Tension between exploiting repository software
features vs. becoming software dependent
Preserving Geodatabases
Spatial databases in general vs. ESRI Geodatabase
“format”
Not just data layers and attributes—also topology,
annotation, relationships, behaviors
ESRI Geodatabase archival issues
XML Export, Geodatabase History, File Geodatabase,
Geodatabase Replication
Growing use of geodatabases by municipal, county
agencies
Some looking to Geodatabase as archival platform
(in addition to feature class export)
Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
Steven_Morris@ncsu.edu