Project Prism

Virtual Remote Control:
Preservation Risk Management for Web Resources
Nancy Y. McGovern, ECURE 2002
The Project
• Part of a 4-year NSF-funded project
– supported by the Digital Libraries Initiative, Phase 2
(Grant No. IIS-9905955, the Prism Project)
• An umbrella project that includes
– Digital Libraries research team (Computer Science)
– Human Computer Interface (HCI)
– Cornell University Library (CUL)
• For updates:
– http://www.library.cornell.edu/iris/research/prism/index.html
The Team
Anne R. Kenney
Nancy Y. McGovern
Peter Botticelli
Richard Entlich
William R. Kehoe
Carl Lagoze
Sandra Payette
Preservation Risk Management
• Research libraries increasingly rely on Web resources they do not own or control
• Need to monitor and evaluate resources
• Identify risks to resources and appropriate responses
• Technology introduces new threats and enables new solutions
The Research Agenda
See "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism,"
by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette,
in D-Lib Magazine, January 2002
http://www.dlib.org/dlib/january02/kenney/01kenney.html
The Approach
1. Process
2. Identification
3. Analysis
4. Appraisal
5. Strategy
6. Detection
7. Response
Process
Adapt the Risk Management Model stages:
Identification
Establish boundary; Characterize content:
example: parse the URL
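As an illustration of the URL-parsing example, the sketch below splits a URL into parts that hint at a resource's boundary and provenance (host, top-level domain, directory path, file name). It is a minimal sketch using Python's standard library, not the Prism team's tool; the function name `characterize_url` and the choice of fields are assumptions made for this example.

```python
from urllib.parse import urlsplit


def characterize_url(url):
    """Split a URL into parts that hint at boundary and provenance (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Path segments, ignoring empty strings from leading or trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the last segment is a file name only if it contains a dot.
    filename = segments[-1] if segments and "." in segments[-1] else ""
    directories = segments[:-1] if filename else segments
    return {
        "scheme": parts.scheme,
        "host": host,
        "top_level_domain": host.rsplit(".", 1)[-1] if "." in host else "",
        "directories": directories,
        "filename": filename,
        "extension": filename.rsplit(".", 1)[-1] if filename else "",
    }


info = characterize_url(
    "http://www.library.cornell.edu/iris/research/prism/index.html"
)
print(info)
```

The directory path gives a rough first guess at where a resource sits within its site; deciding the actual boundary would still need the link and site analysis described in the following slides.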
Analysis
Define risks associated with:
• A Web page:
– as a stand-alone object, ignoring its hyperlinks
– in local context, considering the internal and
external links
• A Web site:
– as a semantically coherent set of linked Web
pages
– as an entity in a broader technical and
organizational context
Contextual Layers
Page-level Monitoring
• Formatting: TIDY
• Standards compliance
• Document structure
• Metadata:
– HTTP headers
– HTML headers
• Changes
– Content
– Location
• Links
– Out-link structure
– In-link structure
– Intra-site
– Hub
– Volatility
• Page provenance
– URL parsing
• Log analysis
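One of the page-level checks listed above, the out-link structure, could be sketched as follows: extract the anchors from a page and separate intra-site links from external ones. This is a minimal sketch, assuming that "intra-site" means "same host name as the page"; the class `LinkCollector` and the function `classify_out_links` are hypothetical names, and the sample HTML is made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_out_links(page_url, html):
    """Group a page's out-links into intra-site and external (by host name)."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    result = {"intra_site": [], "external": []}
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        key = "intra_site" if parts.hostname == page_host else "external"
        result[key].append(absolute)
    return result


sample_html = (
    '<a href="/iris/">IRIS</a> '
    '<a href="http://www.dlib.org/">D-Lib</a> '
    '<a href="mailto:someone@example.org">Contact</a>'
)
links = classify_out_links(
    "http://www.library.cornell.edu/iris/research/prism/index.html", sample_html
)
print(links)
```

Repeating the classification over time and comparing the results would give a simple measure of link volatility.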
Site-level Monitoring
• Graph analysis
• Static site analysis and Longitudinal study
• Aggregate page analyses
• Site maintenance indicators
– Backup and archiving policies and procedures
– Hardware and software environment
– Network configuration and maintenance
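To make "graph analysis" concrete, the sketch below treats a site as a directed graph of pages and counts in-links and out-links per page; a page with many out-links is a candidate hub. It is only an illustration under stated assumptions: the adjacency-dictionary input, the function name `link_degrees`, and the tiny sample site are all hypothetical.

```python
from collections import Counter


def link_degrees(site_graph):
    """Return in-link and out-link counts for each page.

    site_graph maps a page to the list of pages it links to (assumed format).
    Duplicate links from one page to the same target are counted once.
    """
    out_degree = {page: len(set(targets)) for page, targets in site_graph.items()}
    in_degree = Counter()
    for targets in site_graph.values():
        for target in set(targets):
            in_degree[target] += 1
    pages = set(site_graph) | set(in_degree)
    return {
        page: {"in": in_degree.get(page, 0), "out": out_degree.get(page, 0)}
        for page in pages
    }


# A made-up four-page site.
site = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/"],
    "/b": ["/", "/a"],
    "/c": [],
}
degrees = link_degrees(site)
hub = max(degrees, key=lambda page: degrees[page]["out"])
print(degrees, hub)
```

Aggregating such per-page counts across repeated crawls is one way a static analysis could feed a longitudinal study.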
Appraisal
Enable portfolio management:
Hypothetical appraisal of a Web resource:
Scope: highly relevant
Value: high value, not essential; numerous links to page
Relationship: secondary archives; informal agreement
Maintenance: key indicators of good management
Redundancy: captured by more than one archive
Risk response: very responsive to risk notifications
Capture: complex structure; cyclical updates; formats
Size: medium-sized; 3-level crawl
Portfolio Management
Strategy
Develop an organization-specific program:
Detection
Monitor change; initiate response:
Track indicators of management practices:
- markup language: version, formatting, compliance
- HTTP: status codes, header content
- changes: content, location
- links: internal, external, volatility
- server: security, version, upgrades, responsiveness
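Two of the indicators above, HTTP status codes and changes in content or location, can be tracked by comparing successive snapshots of a resource. The sketch below assumes each snapshot is a dictionary with `status`, `final_url`, and `body` keys (a made-up format for this example) and uses a SHA-256 digest to detect content change; how snapshots are fetched is left out.

```python
import hashlib


def detect_changes(previous, current):
    """Compare two snapshots of one resource and list what changed.

    Each snapshot is assumed to be a dict with:
      status    - HTTP status code (int)
      final_url - URL after any redirects (str)
      body      - response body (bytes)
    """
    findings = []
    if current["status"] >= 400:
        findings.append("http_error")
    if current["final_url"] != previous["final_url"]:
        findings.append("location_changed")
    old_digest = hashlib.sha256(previous["body"]).hexdigest()
    new_digest = hashlib.sha256(current["body"]).hexdigest()
    if old_digest != new_digest:
        findings.append("content_changed")
    return findings


# Made-up snapshots for illustration.
first = {"status": 200, "final_url": "http://example.org/report.html", "body": b"<p>v1</p>"}
second = {"status": 200, "final_url": "http://example.org/archive/report.html", "body": b"<p>v2</p>"}
changes = detect_changes(first, second)
print(changes)
```

A byte-level digest flags any change, including trivial ones such as a rotating date stamp, so a real monitor would likely need a more tolerant comparison.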
Detection (cont.)
Monitor change; initiate response:
Identify potential risks
- probable occurrence
- frequency of occurrence
- degree of impact
Correlate to program-defined response levels
Identify appropriate risk/response scenario(s)
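The correlation of a risk to a response level might look like the sketch below. Everything specific in it is an assumption made for illustration: the 1 to 3 rating scale, multiplying the three ratings into a score, the thresholds, and the level names "monitor", "notify", and "intervene". A real program would define its own scales and levels.

```python
def response_level(probability, frequency, impact):
    """Map three risk ratings (each 1 = low, 2 = medium, 3 = high) to a response level.

    The scale, the multiplicative score, and the thresholds are hypothetical.
    """
    ratings = (("probability", probability), ("frequency", frequency), ("impact", impact))
    for name, value in ratings:
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3")
    score = probability * frequency * impact  # ranges from 1 to 27
    if score >= 18:
        return "intervene"
    if score >= 6:
        return "notify"
    return "monitor"


# A likely, occasional, high-impact risk under the assumed scale.
level = response_level(probability=3, frequency=2, impact=3)
print(level)
```

Keeping the thresholds in one small function makes them easy for an organization to replace with its own program-defined levels.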
Response
Develop a toolkit:
Inventory and evaluate existing tools
Assess functionality for Prism stages
Adopt/adapt existing tools
Develop new tools
Apply to appropriate contextual layers
Integrate tools into customizable toolkit
Types of Tools
• link analyzers
• log analyzers
• Web crawlers
• Web visualization programs
• Web site management utilities
Future Directions
• Preservation Risk Management Program:
– Develop program using Prism framework
– Provide organizational scenarios
• Toolkit:
– Complete inventory of tools
– Build toolkit demonstrator
• Applications:
– Develop presentation techniques for stored resources
– Enable risk/response scenario development