Arizona 'Lectronic Records Taskforce

An Arizona Model for Capturing and
Describing Documents on the Web
Richard Pearce-Moses
Director of Digital Government Information
Arizona State Library, Archives and Public Records
rpm at lib.az.us
What Does WWW Stand For?
[Collage of Robert Conrad as James West in
the Wild, Wild West removed to avoid
violation of copyright.]
They both abbreviate to WWW
Rugged Individualism
Lack of standards ~ Lawlessness
The Dream
To collect, manage, preserve, and make useful the
enormous amount of digital information
our culture is now producing
The Reality
Two Approaches
Bibliocentric (Item-by-Item)
Tech-centric (Capture-It-All)
Emphasis on Software Tools and Technology
Limited Assistance from Content Providers
Library of Congress & NDIIPP
University of Illinois at Urbana-Champaign
School of Library and Information Science
OCLC
Content Providers
Tufts University Perseus Project • Michigan State University Library
• State libraries: Arizona, Connecticut, Illinois, North Carolina, Wisconsin
• UIUC partners: NCSA • WILL AM/FM/TV • Information Management Services
Digital Archives
Libraries
Artificial Collections • Item-Level Control
Archives
Provenance • Original Order • Hierarchy • Aggregate Control
Websites as Archival Collections
Documents of Common Provenance
Organized into Directories (Archival Series)
Publications v. Records
The Art and Craft of Building a Collection
What we do remains the same
How we do it will change
Identification/Selection
Acquisition
Description
Reference
Preservation
Identification: Where Do We Look?
Finding the Forest
az.gov • state.az.us
Domain Tool
Identifies all distinct domains
Reports new sites since previous spider
Reports when sites disappear
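The Domain Tool's core comparison can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: it assumes each spider run yields a list of absolute URLs, keeps only hosts under az.gov or state.az.us, and treats "new" and "disappeared" as simple set differences between two runs. The function names and sample host names are hypothetical.

```python
from urllib.parse import urlparse


def distinct_domains(urls, suffixes=("az.gov", "state.az.us")):
    """Return the distinct host names in `urls` that fall under `suffixes`."""
    hosts = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the suffix itself or any subdomain of it (not e.g. "azwater.gov").
        if any(host == s or host.endswith("." + s) for s in suffixes):
            hosts.add(host)
    return hosts


def compare_spiders(previous, current):
    """Compare the domain sets from two spider runs."""
    return {
        "new": sorted(current - previous),          # sites that appeared
        "disappeared": sorted(previous - current),  # sites that vanished
    }


prev = distinct_domains(["http://water.az.gov/index.html",
                         "https://old.state.az.us/x",
                         "http://example.com/"])
cur = distinct_domains(["http://water.az.gov/reports/",
                        "http://drought.az.gov/"])
print(compare_spiders(prev, cur))
```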
Selection: Which Collections Do We Harvest?
Collection-Level Analysis
Macro appraisal sets priorities
Materials appraised as series
Content Providers Taxonomy Tool
Names • Administrative history
Relationships • Subjects • Functions
Selection: Which Documents Do We Harvest?
Identify Series
Aggregate selection
Set frequency of harvests
Site Analysis Tool
Display structure
Harmonize physical, intellectual structure
Identify inaccessible content
Show what’s new
Show significant changes
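A rough sketch of two Site Analysis Tool functions, assuming that a document's directory stands for its archival series and that each harvest is stored as a mapping from URL to extracted text. "Significant change" is approximated here with a difflib similarity ratio below a hypothetical threshold of 0.9; the real tool may measure change quite differently.

```python
import difflib
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlparse


def series_of(url):
    """Assumption: a document's directory is its archival series."""
    parsed = urlparse(url)
    return (parsed.hostname or "") + (dirname(parsed.path) or "/")


def group_by_series(urls):
    """Group document URLs by series so they can be appraised in aggregate."""
    series = defaultdict(list)
    for url in urls:
        series[series_of(url)].append(url)
    return dict(series)


def compare_harvests(old, new, threshold=0.9):
    """old and new map URL -> extracted text from two harvests.

    Returns (new documents, significantly changed documents). The similarity
    threshold is a hypothetical stand-in for "significant change".
    """
    added = sorted(set(new) - set(old))
    changed = []
    for url in sorted(set(old) & set(new)):
        ratio = difflib.SequenceMatcher(None, old[url], new[url]).ratio()
        if ratio < threshold:
            changed.append(url)
    return added, changed
```

Grouping by directory is what lets selection and harvest frequency be set once per series rather than once per document.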
Description
To be able to locate documents
• when the creator or provenance is known
• when the subject is known
• and to aid in selection as to character
Series Description
• Make directory name a meaningful title
• Scope and contents note
• High-level subject headings
• Recorded in site analysis tool database
Document Description
• Creator: taxonomy, internal metadata
• Title: from internal metadata, noun phrases
• Subject: from series metadata, internal metadata
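Document-level description might combine internal metadata with what is inherited from the series, roughly as follows. This sketch assumes HTML documents with optional <title> and <meta name="creator|subject|keywords"> elements, and a series record (a hypothetical dict standing in for the taxonomy and site analysis databases) that supplies a default creator and subject headings. Extracting noun phrases for untitled documents is not attempted.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta.setdefault(attrs["name"].lower(), []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def describe(html, series):
    """Merge internal metadata with inherited series metadata (hypothetical schema)."""
    parser = MetaExtractor()
    parser.feed(html)
    # Start from the series-level subject headings, then add internal terms.
    subjects = list(series.get("subjects", []))
    for value in parser.meta.get("subject", []) + parser.meta.get("keywords", []):
        for term in value.split(","):
            term = term.strip().lower()
            if term and term not in subjects:
                subjects.append(term)
    # Prefer an internal creator; otherwise fall back to the series creator.
    creator = (parser.meta.get("creator") or [series.get("creator", "")])[0]
    return {"creator": creator, "title": parser.title.strip(), "subjects": subjects}
```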
Access
Finding Aids
A valuable bird’s-eye view for archivists
Of limited value to patrons . . .
Unless they’re transformed into topic maps
Full Text Search Engines
Ranking Algorithms
Categorization / Packaging Results
Based on series-level metadata
Based on autoclassification
Description and Access
Series-Level Description
name=“Creator”
Governor’s Drought Task Force
Rural Watershed Alliance
name=“Subject”
reservoirs
ground water
name=“Subject”
drought
water conservation
name=“Subject”
potable water
agriculture
name=“Type”
planning
reports
Categorized Results
Your search for water, Phoenix
Found documents in the following categories
water (500+)
water conservation (357)
drought (110)
flood control (98)
Found documents from the following agencies
Water Resources (135)
Governor's Drought Task Force (102)
Maricopa County (84)
Corporation Commission (35)
Salt River Project (210)
xeriscape (25)
Phoenix (87)
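Categorized results like the mockup above could be produced by counting the series-level metadata attached to each hit. In this sketch each search hit is assumed to be a dict carrying the subjects and creator inherited from its series; the field names are hypothetical, and ranking and the "500+" display cap are left out.

```python
from collections import Counter


def categorize(hits):
    """hits: list of dicts with 'subjects' (list of str) and 'creator' (str).

    Returns (category counts, agency counts), most frequent first.
    """
    categories = Counter()
    agencies = Counter()
    for hit in hits:
        categories.update(set(hit["subjects"]))  # count each subject once per hit
        agencies[hit["creator"]] += 1
    return categories.most_common(), agencies.most_common()


hits = [
    {"subjects": ["water", "drought"], "creator": "Governor's Drought Task Force"},
    {"subjects": ["water", "water conservation"], "creator": "Water Resources"},
    {"subjects": ["water"], "creator": "Water Resources"},
]
cats, ags = categorize(hits)
print(cats)
print(ags)
```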
Administration / Curation / Stewardship
Systematic
Regular Workflows
Not Idiosyncratic
Collaborative
Consensual, Not Idiosyncratic
Avoid Redundant Efforts
Quality Control
Need for Good Metrics
Need for Regular Audits
Stay Tuned . . . .