Project Prism
Virtual Remote Control: Preservation Risk Management for Web Resources
Nancy Y. McGovern, ECURE 2002

The Project
• Part of a 4-year NSF-funded project
  – supported by the Digital Libraries Initiative, Phase 2 (Grant No. IIS-9905955, the Prism Project)
• An umbrella project that includes
  – Digital Libraries research team (Computer Science)
  – Human Computer Interface (HCI)
  – Cornell University Library (CUL)
• For updates:
  – http://www.library.cornell.edu/iris/research/prism/index.html

The Team
Anne R. Kenney
Nancy Y. McGovern
Peter Botticelli
Richard Entlich
William R. Kehoe
Carl Lagoze
Sandra Payette

Preservation Risk Management
• Increased reliance by research libraries on Web resources not owned or controlled
• Need to monitor and evaluate resources
• Identify risks to resources and appropriate responses
• Technology introduces new threats, enables new solutions

The Research Agenda
See "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism," by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette, in D-Lib Magazine, January 2002:
http://www.dlib.org/dlib/january02/kenney/01kenney.html

The Approach
1. Process
2. Identification
3. Analysis
4. Appraisal
5. Strategy
6. Detection
7. Response

Process
Adapt the Risk Management Model stages:

Identification
Establish boundary; characterize content
Example: parse the URL

Analysis
Define risks associated with:
• A Web page:
  – as a stand-alone object, ignoring its hyperlinks
  – in local context, considering the internal and external links
• A Web site:
  – as a semantically coherent set of linked Web pages
  – as an entity in a broader technical and organizational context

Contextual Layers

Page-level Monitoring
• Formatting: TIDY
• Standards compliance
• Document structure
• Metadata:
  – HTTP headers
  – HTML headers
• Changes
  – Content
  – Location
• Links
  – Out-link structure
  – In-link structure
  – Intra-site
  – Hub
  – Volatility
• Page provenance
  – URL parsing
• Log analysis

Site-level Monitoring
• Graph analysis
• Static site analysis and longitudinal study
• Aggregate page analyses
• Site maintenance indicators
  – Backup and archiving policies and procedures
  – Hardware and software environment
  – Network configuration and maintenance

Appraisal
Enable portfolio management:
Hypothetical appraisal of a Web resource:
  – Scope: highly relevant
  – Value: high value, not essential; numerous links to page
  – Relationship: secondary archives; informal agreement
  – Maintenance: key indicators of good management
  – Redundancy: captured by more than one archive
  – Risk response: very responsive to risk notifications
  – Capture: complex structure; cyclical updates; formats
  – Size: medium-sized; 3-level crawl

Portfolio Management

Strategy
Develop an organization-specific program:

Detection
Monitor change; initiate response:
Track indicators of management practices:
  – markup language: version, formatting, compliance
  – HTTP: status codes, header content
  – changes: content, location
  – links: internal, external, volatility
  – server: security, version, upgrades, responsiveness

Detection (cont.)
Monitor change; initiate response:
• Identify potential risks
  – probable occurrence
  – frequency of occurrence
  – degree of impact
• Correlate to program-defined response levels
• Identify appropriate risk/response scenario(s)

Response
Develop a toolkit:
• Inventory and evaluate existing tools
• Assess functionality for Prism stages
• Adopt/adapt existing tools
• Develop new tools
• Apply to appropriate contextual layers
• Integrate tools into a customizable toolkit

Types of Tools
• link analyzers
• log analyzers
• Web crawlers
• Web visualization programs
• Web site management utilities

Future Directions
• Preservation Risk Management Program:
  – Develop program using Prism framework
  – Provide organizational scenarios
• Toolkit:
  – Complete inventory of tools
  – Build toolkit demonstrator
• Applications:
  – Develop presentation techniques for stored resources
  – Enable risk/response scenario development
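
Illustration: parsing the URL (Identification, Page provenance)
The Identification slide gives "parse the URL" as its example, and Page-level Monitoring lists URL parsing under page provenance. The sketch below shows one way this might look using Python's standard library. The function names (characterize_url, is_intra_site), the returned fields, and the same-host rule for "intra-site" are assumptions made for illustration; the slides do not specify how Prism parses URLs or draws site boundaries.

```python
from urllib.parse import urlsplit


def characterize_url(url):
    """Split a URL into parts that may hint at a page's provenance.

    The chosen fields are illustrative, not a Prism specification.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Path segments, ignoring empty pieces from leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "scheme": parts.scheme,
        "host": host,
        # Last host label as a rough domain-type indicator (e.g. "edu").
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        # Number of path segments: a crude measure of depth within the site.
        "depth": len(segments),
        "segments": segments,
    }


def is_intra_site(page_url, link_url):
    """Assumed boundary rule: a link is intra-site if both hosts match."""
    return urlsplit(page_url).hostname == urlsplit(link_url).hostname


if __name__ == "__main__":
    info = characterize_url(
        "http://www.library.cornell.edu/iris/research/prism/index.html"
    )
    print(info["host"], info["tld"], info["depth"])
```

A same-host rule is the simplest possible boundary; a real program might instead define the boundary by path prefix or by organizational ownership.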
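
Illustration: tracking change indicators (Detection)
The Detection slide lists HTTP status codes and changes in content and location among the indicators to track. The following is a minimal sketch of how two observations of the same resource might be compared. The snapshot dictionary layout, the signal labels, and the use of a SHA-256 digest to detect content change are assumptions for illustration, not a description of the Prism tools.

```python
import hashlib


def classify_status(code):
    """Map an HTTP status code to a coarse monitoring signal.

    The labels are illustrative; only the status-code classes are standard.
    """
    if 200 <= code < 300:
        return "ok"
    if code in (301, 308):
        return "moved"  # permanent redirect: the resource has a new location
    if 300 <= code < 400:
        return "redirect"
    if code in (404, 410):
        return "missing"
    if 400 <= code < 500:
        return "client-error"
    if 500 <= code < 600:
        return "server-error"
    return "unknown"


def fingerprint(body):
    """Digest of the page bytes; any byte-level difference changes it."""
    return hashlib.sha256(body).hexdigest()


def compare_snapshots(old, new):
    """List which indicators differ between two observations.

    Each snapshot is a dict with "url", "status", and "digest" keys
    (a hypothetical layout chosen for this sketch).
    """
    changes = []
    if old["url"] != new["url"]:
        changes.append("location")
    if old["digest"] != new["digest"]:
        changes.append("content")
    if classify_status(old["status"]) != classify_status(new["status"]):
        changes.append("status")
    return changes
```

A byte-level digest flags every change, including trivial ones such as an updated timestamp; a monitoring program would likely need a more tolerant comparison for pages with cyclical updates.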
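
Illustration: correlating risks to response levels (Detection, cont.)
The second Detection slide characterizes a risk by probable occurrence, frequency of occurrence, and degree of impact, then correlates it to program-defined response levels. The slides give no formula, so everything below is a hypothetical sketch: the Risk fields and their scales, the multiplicative score, the thresholds, and the level names ("monitor", "notify", "intervene") are all assumptions standing in for whatever an organization's own program would define.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # probable occurrence, 0.0 to 1.0 (assumed scale)
    frequency: int      # expected occurrences per year (assumed unit)
    impact: int         # degree of impact, 1 (minor) to 5 (severe), assumed


def score(risk):
    """Hypothetical score: product of the three factors from the slide."""
    return risk.probability * risk.frequency * risk.impact


def response_level(risk, thresholds=((10.0, "intervene"), (2.0, "notify"))):
    """Return the first level whose minimum score the risk reaches.

    Thresholds must be ordered from highest to lowest minimum; both the
    cut-offs and the level names are placeholders for program-defined values.
    """
    s = score(risk)
    for minimum, level in thresholds:
        if s >= minimum:
            return level
    return "monitor"


if __name__ == "__main__":
    risks = [
        Risk("broken out-links", probability=0.5, frequency=12, impact=1),
        Risk("site becomes unavailable", probability=0.1, frequency=1, impact=5),
    ]
    for r in risks:
        print(r.name, "->", response_level(r))
```

Passing the thresholds as a parameter reflects the slide's point that response levels are program-defined: each organization would supply its own table rather than rely on fixed values.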