The Impact of the Internet on Research Universities
Examples from Distance Education & Digital Libraries

William Y. Arms
Department of Computer Science
Cornell University

Universities and Cost

In 1978, a Cornell education cost one Chevrolet per year.
In 2001, a Cornell education costs one BMW per year.
Every year, costs have gone up faster than average income.
The costs of research universities are dominated by personnel.
Major reductions in unit costs require different use of personnel.

Technology in Education and Distance Education

By creative use of technology:
Can we teach more students, to a high level, with less faculty per student?

Technology in Education

Technology            Example                        Date
Time sharing          Dartmouth Basic                1964
Television            Open University                1972
Personal computers    Apple University Consortium    1984
Campus networks       Carnegie Mellon Andrew         1986
Internet              Digital libraries              1991+
Web                   Distance learning              2000+

[Side labels on the slide: History (earlier rows), Current (latest rows)]

Course Web Sites

One third of Cornell courses have web sites

eCornell

For profit, non-degree executive and professional courses

Technology in Education and Distance Education

Question 1: Quality
Is it good education?

Skepticism

In a recent survey by JSTOR of faculty in social sciences and humanities, only 17% thought that distance education was as good as conventional campus-based education.
(Preliminary data; please do not quote)

What is the evidence?

The British Open University

Distance education: Students at home, with limited access to tutors, summer schools.
Technology used as appropriate: Printed materials, home experimental kits, videos, computing, etc.
Academic standards: Full degree programs, external control of quality.
Longevity: First students in 1972.

The Open University

• Currently 215,000 students.
• Over 2 million students since 1972.
• Ranked in the top 10% of all UK universities, for teaching quality.
[Ranked after Cambridge, York, Oxford, Imperial College, London School of Economics, Warwick, University College London, Durham and Sheffield.]
Higher Education Funding Council, 1997

Technology in Education and Distance Education

Question 2: Capital Intensive Education
What are the organizational options?

Capital Intensive Education

Conventional course:
• Major cost is faculty time.
• Costs are repeated every year.

Technology in education and distance education:
• Course materials are a major expense.
• Marginal cost of delivering course is low.

Consequences:
• Economies of scale
• Universities need access to capital
• Course materials are an asset

[Untitled slide listing organizations]
Columbia University
Cambridge University Press
London School of Economics
New York Public Library
University of Chicago
University of Michigan
British Library
American Film Institute
RAND
Woods Hole
Victoria and Albert Museum
Science Museum
Natural History Museum

Technology in Education and Distance Education

Question 3: Ownership and Intellectual Property
If course materials are assets, who owns them?

Recommendations of a Cornell Committee

1. The university policies on intellectual property should be independent of the media in which ideas are expressed.
2. Creators of works should have control over the intellectual output resulting from their research, teaching, and writing.
3. When there are multiple creators of an individual work, the control should be shared among the creators.
4. When the university contributes substantial resources to the development of specific materials, it has a right to share in the control and returns.

MIT to make nearly all course materials available free on the World Wide Web
Unprecedented step challenges 'privatization of knowledge'

CAMBRIDGE, Mass. -- MIT President Charles M. Vest has announced that the Massachusetts Institute of Technology will make the materials for nearly all its courses freely available on the Internet over the next ten years.
He made the announcement about the new program, known as MIT OpenCourseWare (MITOCW), at a press conference at MIT on Wednesday, April 4th.
MIT Press Release, April 4, 2001

Digital Libraries

By creative use of technology:
Can we build libraries that are of high quality at much lower costs?

Research Libraries are Expensive

[Diagram labels: library materials; buildings & facilities; staff]

The Open Access Web

Before the web
• Few people had access to scientific, medical, legal information

With the web
• Much high quality information is available with open access
• Free services organize this information and provide access to it

"Please can I use the web? I don't do libraries."
Anonymous Cornell student, circa 1996.

The Potential of Digital Libraries

[Diagram labels: open access ? materials; computers & networks; staff]

Digital Libraries

Question 1: Economic Models for Open Access
Who pays for open access?

A False Assumption

Incorrect thinking
The only incentive for creating information is to make money -- royalties to authors and profits for publishers

Correct thinking
Many creators do not require revenue
• Marketing and promotion
• Government information
• Academic research
They want their materials to be used

Examples

Old                               New
Books in Print (subscription)     Amazon.com (advertising)
Medline (pay-by-use)              Grateful Med (external)
Journal (subscription)            ePrint archives (external)
Westlaw (pay-by-use)              Legal Information Institute (external)
Inspec (subscription)             Google (advertising)

Before You Ask ...

• The open access information is sometimes a poor substitute
• Much good information is not available with open access

But every year the proportion of important information that is available with open access increases

Open Letter

We support the establishment of an online public library that would provide the full contents of the published record of research and scholarly discourse in medicine and the life sciences in a freely accessible, fully searchable, interlinked form.
Establishment of this public library would vastly increase the accessibility and utility of the scientific literature, enhance scientific productivity, and catalyze integration of the disparate communities of knowledge and ideas in biomedical sciences.

Hypotheses for Scholarly Information

The dominant force is author pressure, which emphasizes open access rather than closed access.

1. A mixture of economic models will coexist.
2. Eventually, we will have open access to most scientific and professional information.
3. The most common economic model will be that information is published by the producing organization.

Digital Libraries

Question 2: Quality
What are the alternatives to peer review?

Observations about Peer Review

At its best, it is superb.
At its worst, it validates junk.
Some topics can be reviewed from a paper, e.g., mathematics.
Some topics cannot be reviewed from a paper, e.g., computer systems.

"Whatever you do, write a paper. Some journal will publish it."
Advice to young faculty member, University of Sussex, 1969.

Quality without Peer Review

How can readers recognize good quality materials?
How can publishers maintain high standards and let readers know?
How can a scientist build a reputation outside the traditional peer-reviewed journals?

A sample of one: William Y. Arms

Digital Libraries

Question 3: Brute Force Computing
How far can computers be used for the skilled tasks of professional librarianship?

Brute Force Computing

Few people really understand Moore's Law
-- Computing power doubles every 18 months
-- Increases 100 times in 10 years
-- Increases 10,000 times in 20 years

Simple algorithms + immense computing power may outperform human intelligence

Brute Force Computing Example

Creators of the world champion chess program (Deep Thought, later Deep Blue)
-- moderate chess players
-- simple tree-search algorithm
-- very, very fast computer hardware

Example: Catalogs and Indexes

Catalog, index and abstracting records are very expensive when created by skilled professionals
-- only available for certain categories of material (e.g., monographs, scientific journals)
-- contain limited fields of information (e.g., no contents page)
-- restricted to static information

Equivalent Services

Information discovery
I used to be a heavy user of Inspec. Now I use Google instead.

Why are web search services the most widely used information discovery tools in universities today?

Thinking out of the Box

For information discovery, particularly with untrained users:
automated indexing of full text is at least as effective as manually produced indexes and catalogs

[Demonstrated repeatedly in experiments going back to the original Cranfield experiments.]

Digital Libraries

Question 4: Automated Digital Libraries
What is the state of the art in automated digital libraries?

Automated Digital Libraries: Examples

Automatic indexing        Lycos, Infoseek, Altavista, Google, ...
Query matching            Vector methods (Salton)
Ranking importance        Google (Page and Brin)
Archiving                 Internet Archive (Kahle)
Collection development    ResearchIndex (Lawrence)
Metadata extraction       Informedia (Wactlar)

Digital Libraries

Question 5: A National Science Library (NSDL)
Can we build a very low cost national science library using the methods of automated digital libraries?

One of Six Core Integration Demonstration Projects for the NSDL

How Big might the NSDL be?
The NSDL aims to be comprehensive -- all branches of science, all levels of education, very broadly defined.

Five year targets:
1,000,000 different users
10,000,000 digital objects
100,000 independent sites

Requires:
low-cost, scalable technology
automated collection building and maintenance

Levels of Interoperability: Metadata Harvesting

Agreements on simple protocol and metadata standard(s)

Example: Metadata harvesting protocol of the Open Archives Initiative (MHP)
• Moderate-quality services
• Low cost of entry to participating sites

Moderately large numbers of loosely collaborating sites
Promising but still an emerging approach

Levels of Interoperability: Gathering

Robots gather collections automatically with no participation from individual sites

Examples: Web search services (e.g., Google), CiteSeer (a.k.a. ResearchIndex)
• Restricted but useful services
• Zero cost of entry to gathered sites

Very large numbers of independent sites
Only suitable for open access collections

Technology Demonstrations

1. One Library, Many Portals
2. Coherent Services across Heterogeneous Collections
3. Easy Integration of Participating Collections
4. Variable Levels for Integrating Collections
5. Tools to Create New Collections

Some Light Reading

William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August 2000. http://www.dlib.org/dlib/july20/07contents.html

William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm
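
Worked check of the Moore's Law figures

The Brute Force Computing slide quotes growth of 100 times in 10 years and 10,000 times in 20 years for a doubling period of 18 months. The short sketch below reproduces that arithmetic. It is only an illustration: the function name growth_factor and its parameters are assumptions introduced here and do not come from the slides.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return how many times larger capacity is after `years`,
    assuming it doubles once every `doubling_months` months.

    Hypothetical helper for illustration; not taken from the slides.
    """
    # Number of complete and partial doubling periods in the interval.
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # The two horizons quoted on the slide.
    for years in (10, 20):
        print(f"{years} years: about {growth_factor(years):,.0f} times")
```

Under these assumptions the results come out at roughly 100 and roughly 10,000, in line with the rounded figures on the slide.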