Ithaka Sustainable Scholarship Presentation

Stephen Rhind-Tutt, President
ITHAKA Sustainable Scholarship Conference
September 2011
The Challenge
By 2020 the Web will contain…???
• 90% of works published before 1923
• The majority of works published up to 2020
• > 20 billion pages of e-mail, phone logs, databases, blogs and websites (currently 12 billion)
• > 10 billion photographs
• > 40 million pages of facsimiles of manuscripts
• > 50 million audio files
• > 500 million video files
A Darwinian environment
What I remember of the environment in the early 1990s
– SilverPlatter MEDLINE (>$10m in sales)
– Royalties to the NLM (<$200k)
– Seven other vendors also making $$$
– SilverPlatter ERIC ($1.5m in sales)
– Royalties to Dept. of Education (<$100k)
– Many other vendors
– SilverPlatter SEC Online
– No royalties going back to the SEC
Environment in 2011
– PubMed provides free access to the world
– ERIC offered free to the world
– SEC filings offered free to the world
– What’s happened to the vendors?
Environment in 2011
– Ovid and others continue to profit from public domain MEDLINE
– New entrants – Silverchair, Collexis…
– SEC filings continue to sell – Bloomberg, Yahoo and many new entrants
– Aries Systems moved into publisher services
– CSC provides free access to ERIC for all under a five-year, $29m contract
What’s going on?
This is a commodity…
This is not a commodity
Information isn’t a commodity!
[Diagram labels: black & white, grayscale, 24-bit color, 48-bit color; 100 dpi, 600 dpi; JPG, TIFF; dirty OCR, 99.995% rekeying, transcriptions; page, letter, collection; citation, MARC record, EAD finding aid; thumbnails, facsimiles; TCP-IP, mobile web; semantic indexing, repository]
Information isn’t a commodity
[Diagram labels: Who, What, When, Where?; Why?; Therefore]
Source: "Data, Information, Knowledge, and Wisdom," Gene Bellinger, Durval Castro, Anthony Mills. http://www.systemsthinking.org/
Evolution of tasks
[Diagram: tasks placed on a scale running from "Fading" to "Growing". Tasks shown: process integration; workflow tools; semantic indexing; commissioning?; linking; community building; asset management; free materials; rare and unpublished material; editorial?; licensing; speed?; unified search software; simple, one-database search; public domain reprints; warehousing; quality?; print directory; selection?; print monograph; printing; typesetting]
With literally billions of pages…
What tools will we need?
• Beyond paper
• Higher editorial value
• High functionality
• Semantically organized
• More comprehensive
• Individually customizable
• Discipline- and community-centric
• Web- and network-centric
Alexander Street Press (ASP) experience…
• Add value to the public domain
– Rare, hard-to-find materials
– Contextual essays and supporting material
– Semantic indexing
– Unique functionality
• Go beyond the public domain
– Publish copyrighted material
– Persuade publishers to release key content for electronic publication
– Commission new material ourselves
The American Civil War Research Database
Great functionality
Women and Social Movements
• Collaboration between the Center for the Historical Study of Women and Gender at SUNY Binghamton and ASP
• The original site is free; new content is fee-based.
• Usage of the free site dipped only slightly, with more usage following the commercial launch.
• Added video, audio, more than 200,000 pages and new functionality.
Be of the web
[Diagram labels: websites, music, newspapers, primary works, monographs, journals]
Building the network…
Unhelpful
• Legal warnings not to link
• Changing links constantly
• Disabling links
• No permanent URLs
• No crawling
• Randomly changing URLs
• Insisting on one interface and one access point
• Unattached pages
Helpful
• Visibility
• Permanent URLs
• RSS feeds
• OpenURL
• Design for multiple interfaces
• Open to crawling
• Published open APIs
• Welcome linking
• Ask others to do the same
A Darwinian environment