Stephen Rhind-Tutt, President
ITHAKA Sustainable Scholarship Conference
September 2011

The Challenge
By 2020 the Web will contain…???
• 90% of published works prior to 1923
• Majority of works published to 2020
• > 20 billion pages of e-mail, phone logs, databases, blogs and websites (currently 12 billion)
• > 10 billion photographs
• > 40 million pages of facsimiles of manuscripts
• > 50 million audio files
• > 500 million video files

A Darwinian environment

What I remember of the environment in the early 1990s
– SilverPlatter MEDLINE (>$10m in sales)
– Royalties to the NLM (<$200k)
– Seven other vendors also making $$$
– SilverPlatter ERIC ($1.5m in sales)
– Royalties to Dept. of Education (<$100k)
– Many other vendors
– SilverPlatter SEC Online
– No royalties going back to the SEC

Environment in 2011
– PubMed provides free access to the world
– ERIC offered free to the world
– SEC filings offered free to the world
– What’s happened to the vendors?

Environment in 2011
– Ovid and others continue to profit from public domain MEDLINE
– New entrants – SilverChair, Collexis…
– SEC filings continue to sell
– Bloomberg, Yahoo and many new entrants
– Aries Systems moved into publisher services
– CSC provides free access to all for ERIC with a 5 year contract for $29m

What’s going on?
This is a commodity…
This is not a commodity

Information isn’t a commodity!
Black & White Dirty OCR 100 dpi Page Letter JPG 99.995% rekeying Grayscale Transcriptions 100 dpi 600 dpi Citation TIFF MARC Record EAD Finding Aid 24 bit color Thumbnails 48 bit color Facsimiles Collection TCP-IP Mobile Web Semantic Indexing Repository

Information isn’t a commodity
Why? Therefore Who, What, When, Where?
Source: Data, Information, Knowledge, and Wisdom, Gene Bellinger, Durval Castro, Anthony Mills. http://www.systemsthinking.org/

Evolution of tasks
• Process integration
• Workflow tools
• Semantic indexing
• Commissioning?
• Linking
• Community building
• Asset management
• Free materials
• Rare and unpublished material
• Editorial?
• Licensing
• Speed?
• Unified search software
• Simple, one database search
• Public domain reprints
• Warehousing
• Quality?
• Print directory
• Selection?
• Print monograph
• Printing
• Typesetting
Axis: Fading, Growing

With literally billions of pages…
What tools will we need?
• Beyond paper
• Higher editorial value
• High functionality
• Semantically organized
• More comprehensive
• Individually customizable
• Discipline, community centric
• Web/network centric

ASP experience…
• Add value to public domain
– Rare, hard to find materials
– Contextual essays and supporting material
– Semantic Indexing
– Unique functionality
• Go beyond public domain
– Publish copyright material
– Persuade publishers to release key content for electronic publication
– Commission new material ourselves

The American Civil War Research Database
Great functionality

Women and Social Movements
• Collaboration between the Center for the Historical Study of Women and Gender at SUNY Binghamton and ASP
• Original site is free; new content is for fee.
• Usage across the free site dipped only slightly; more usage following commercial launch.
• Added video, audio, > 200k pages, new functionality.

Be of the web
Websites, Music, Newspapers, Primary Works, Monographs, Journals

Building the network…

Unhelpful
• Legal warnings not to link
• Changing links constantly
• Disabling links
• No permanent URLs
• No crawling
• Randomly changing URLs
• Insisting on one interface and one access point
• Unattached pages

Helpful
• Visibility
• Permanent URLs
• RSS feeds
• OpenURL
• Design for multiple interfaces
• Open to crawling
• Published open APIs
• Welcome linking
• Ask others to do the same

A Darwinian environment