
The Impact of the Internet
on Research Universities
Examples from Distance Education
& Digital Libraries
William Y. Arms
Department of Computer Science
Cornell University
1
Universities and Cost
In 1978, a Cornell education cost one Chevrolet per year.
In 2001, a Cornell education costs one BMW per year.
Every year, costs have gone up faster than average income.
The costs of research universities are dominated by
personnel.
Major reductions in unit costs require different use of
personnel.
2
Technology in Education
and Distance Education
By creative use of technology:
Can we teach more students, to a high level, with fewer faculty per
student?
3
Technology in Education
           Technology          Example                      Date
History    Time sharing        Dartmouth Basic              1964
           Television          Open University              1972
           Personal computers  Apple University Consortium  1984
           Campus networks     Carnegie Mellon Andrew       1986
Current    Internet            Digital libraries            1991+
           Web                 Distance learning            2000+
4
Course Web Sites
One third of Cornell courses have web sites
5
eCornell
For-profit, non-degree executive and professional courses
6
Technology in Education
and Distance Education
Question 1: Quality
Is it good education?
7
Skepticism
In a recent survey by JSTOR of faculty in social sciences
and humanities, only 17% thought that distance education
was as good as conventional campus-based education.
(Preliminary data; please do not quote)
What is the evidence?
8
The British Open University
Distance education: Students at home, with limited access
to tutors, summer schools.
Technology used as appropriate: Printed materials, home
experimental kits, videos, computing, etc.
Academic standards: Full degree programs, external control
of quality.
Longevity: First students in 1972.
9
The Open University
• Currently 215,000 students.
• Over 2 million students since 1972.
• Ranked in the top 10% of all UK universities for teaching quality.
[Ranked after Cambridge, York, Oxford, Imperial College,
London School of Economics, Warwick, University
College London, Durham and Sheffield.]
Higher Education Funding Council, 1997
10
Technology in Education
and Distance Education
Question 2: Capital Intensive Education
What are the organizational options?
11
Capital Intensive Education
Conventional course:
• Major cost is faculty time.
• Costs are repeated every year.
Technology in education and distance education:
• Course materials are a major expense.
• Marginal cost of delivering course is low.
Consequences:
• Economies of scale
• Universities need access to capital
• Course materials are an asset
12
Columbia University
Cambridge University Press
London School of Economics
New York Public Library
University of Chicago
University of Michigan
British Library
American Film Institute
RAND
Woods Hole
Victoria and Albert Museum
Science Museum
Natural History Museum
13
14
Technology in Education
and Distance Education
Question 3: Ownership and Intellectual Property
If course materials are assets, who owns them?
15
Recommendations of a Cornell
Committee
1. The university policies on intellectual property should
be independent of the media in which ideas are
expressed.
2. Creators of works should have control over the
intellectual output resulting from their research,
teaching, and writing.
3. When there are multiple creators of an individual
work, the control should be shared among the creators.
4. When the university contributes substantial resources
to the development of specific materials, it has a right to
share in the control and returns.
16
MIT to make nearly all course materials available free
on the World Wide Web
Unprecedented step challenges 'privatization of knowledge'
CAMBRIDGE, Mass. -- MIT President Charles M. Vest has
announced that the Massachusetts Institute of Technology will
make the materials for nearly all its courses freely available on the
Internet over the next ten years. He made the announcement about
the new program, known as MIT OpenCourseWare (MITOCW), at
a press conference at MIT on Wednesday, April 4th.
MIT Press Release, April 4, 2001
17
Digital Libraries
By creative use of technology:
Can we build libraries that are of high quality at much lower costs?
18
Research Libraries
are Expensive
[Figure labels: library materials; buildings & facilities; staff]
19
The Open Access Web
Before the web
• Few people had access to scientific, medical, legal information
With the web
• Much high quality information is available with open access
• Free services organize this information and provide access to it
"Please can I use the web? I don't do libraries."
Anonymous Cornell student, circa 1996.
20
The Potential of Digital Libraries
[Figure labels: open access, ?, materials; computers & networks; staff]
21
Digital Libraries
Question 1: Economic Models for Open Access
Who pays for open access?
22
A False Assumption
Incorrect thinking
The only incentive for creating information is to make money
-- royalties to authors and profits for publishers
Correct thinking
Many creators do not require revenue
• Marketing and promotion
• Government information
• Academic research
They want their materials to be used
23
Examples
Old                            New
Books in Print (subscription)  Amazon.com (advertising)
Medline (pay-by-use)           Grateful Med (external)
Journal (subscription)         ePrint archives (external)
Westlaw (pay-by-use)           Legal Information Institute (external)
Inspec (subscription)          Google (advertising)
24
Before You Ask ...
• The open access information is sometimes a poor
substitute
• Much good information is not available with open
access
But every year the proportion of important
information that is available with open access
increases
25
Open Letter
We support the establishment of an online public library that
would provide the full contents of the published record of
research and scholarly discourse in medicine and the life
sciences in a freely accessible, fully searchable, interlinked
form. Establishment of this public library would vastly
increase the accessibility and utility of the scientific
literature, enhance scientific productivity, and catalyze
integration of the disparate communities of knowledge and
ideas in biomedical sciences.
26
Hypotheses for Scholarly Information
The dominant force is author pressure, which emphasizes open
access rather than closed access.
1. A mixture of economic models will coexist.
2. Eventually, we will have open access to most scientific and
professional information.
3. The most common economic model will be that information is
published by the producing organization.
27
Digital Libraries
Question 2: Quality
What are the alternatives to peer review?
28
29
Observations about Peer Review
At its best, it is superb.
At its worst, it validates junk.
Some topics can be reviewed from a
paper, e.g., mathematics.
Some topics cannot be reviewed from a
paper, e.g., computer systems.
"Whatever you do, write a paper. Some
journal will publish it." Advice to young
faculty member, University of Sussex, 1969.
30
Quality without Peer Review
How can readers recognize good quality
materials?
How can publishers maintain high standards
and let readers know?
How can a scientist build a reputation outside
the traditional peer-reviewed journals?
A sample of one: William Y. Arms
31
Digital Libraries
Question 3: Brute Force Computing
How far can computers be used for the skilled tasks of professional
librarianship?
32
Brute Force Computing
Few people really understand Moore's Law
-- Computing power doubles every 18 months
-- Increases 100 times in 10 years
-- Increases 10,000 times in 20 years
Simple algorithms + immense computing power
may outperform human intelligence
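The growth figures above follow directly from the 18-month doubling period. A minimal sketch of the arithmetic, assuming steady doubling over the whole period (the function name `growth_factor` is illustrative, not from the slides):

```python
def growth_factor(years, doubling_months=18):
    """Growth multiple after `years`, given one doubling every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


for years in (10, 20):
    # 10 years is about 6.7 doublings; 20 years is about 13.3 doublings.
    print(f"{years} years: about {growth_factor(years):,.0f} times")
```

The results land close to the rounded figures on the slide: roughly 100 times after 10 years and roughly 10,000 times after 20 years.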
33
Brute Force Computing
Example
Creators of the world champion chess program
(Deep Thought, later Deep Blue)
-- moderate chess players
-- simple tree-search algorithm
-- very, very fast computer hardware
34
Example: Catalogs and Indexes
Catalog, index and abstracting records are very
expensive when created by skilled professionals
-- only available for certain categories of material
(e.g., monographs, scientific journals)
-- contain limited fields of information
(e.g., no contents page)
-- restricted to static information
35
Equivalent Services
Information discovery
I used to be a heavy user of Inspec. Now I use Google
instead.
Why are web search services the most widely used
information discovery tools in universities today?
36
Thinking out of the Box
For information discovery, particularly with untrained
users:
automated indexing of full text
is at least as effective as
manually produced indexes and catalogs
[Demonstrated repeatedly in experiments going back to the
original Cranfield experiments.]
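To make "automated indexing of full text" concrete, here is a minimal sketch of an inverted index with simple all-terms matching. The sample documents, the tokenizer, and the function names are assumptions for illustration; real search services add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every term to the set of document ids whose full text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample collection, not taken from the slides.
documents = {
    "d1": "Automated digital libraries and brute force computing",
    "d2": "Economic models for open access publishing",
    "d3": "Digital libraries need open access materials",
}
index = build_index(documents)
print(sorted(search(index, "digital libraries")))
print(sorted(search(index, "open access")))
```

No human cataloger is involved: every word of every document becomes an access point, which is why the approach scales with computing power rather than with staff.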
37
Digital Libraries
Question 4: Automated Digital Libraries
What is the state of the art in automated digital libraries?
38
Automated Digital Libraries: Examples
Automatic indexing       Lycos, Infoseek, Altavista, Google, ...
Query matching           Vector methods (Salton)
Ranking importance       Google (Page and Brin)
Archiving                Internet Archive (Kahle)
Collection development   ResearchIndex (Lawrence)
Metadata extraction      Informedia (Wactlar)
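As an illustration of query matching by vector methods, the sketch below scores documents against a query by the cosine of the angle between their term vectors. It is a simplification: it uses raw term counts rather than any particular weighting scheme, and the sample documents and function names are assumptions for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lowercase terms with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, doc_id) pairs, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return sorted(scored, reverse=True)


# Hypothetical sample collection, not taken from the slides.
documents = {
    "d1": "Automated digital libraries and brute force computing",
    "d2": "Economic models for open access publishing",
    "d3": "Digital libraries need open access materials",
}
for score, doc_id in rank("open access publishing", documents):
    print(f"{doc_id}: {score:.3f}")
```

Unlike exact matching, this ranks partial matches, so a document that shares only some query terms still appears, lower in the list.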
39
Digital Libraries
Question 5: A National Science Library (NSDL)
Can we build a very low cost national science library using the
methods of automated digital libraries?
40
One of Six Core Integration
Demonstration Projects
for the NSDL
41
How Big might the NSDL be?
The NSDL aims to be comprehensive -- all branches of science,
all levels of education, very broadly defined.
Five-year targets:
1,000,000 different users
10,000,000 digital objects
100,000 independent sites
Requires: low-cost, scalable technology; automated collection building and maintenance
42
Levels of Interoperability:
Metadata Harvesting
Agreements on simple protocol and metadata standard(s)
Example:
Metadata harvesting protocol of
the Open Archives Initiative (MHP)
• Moderate-quality services
• Low cost of entry to participating sites
Moderately large numbers of loosely collaborating sites
Promising but still an emerging approach
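A rough sketch of what a metadata harvester does: build a request for a site's records, then pull identifiers and titles out of the XML reply. It assumes the request form and XML namespaces of the later OAI-PMH 2.0 specification, which may differ from the early protocol version the slide refers to. The base URL and the sample response are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces as defined by OAI-PMH 2.0 and unqualified Dublin Core (assumed here).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Hypothetical response, trimmed to the elements this sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Automated digital libraries</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(extract_titles(SAMPLE))
```

The low cost of entry shows in how little a participating site must provide: a single URL that answers a handful of simple requests with metadata in an agreed format.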
43
Levels of Interoperability:
Gathering
Robots gather collections automatically with no participation
from individual sites
Examples:
Web search services (e.g., Google)
CiteSeer (a.k.a. ResearchIndex)
• Restricted but useful services
• Zero cost of entry to gathered sites
Very large numbers of independent sites
Only suitable for open access collections
44
Technology Demonstrations
1. One Library, Many Portals
2. Coherent Services across Heterogeneous Collections
3. Easy Integration of Participating Collections
4. Variable Levels for Integrating Collections
5. Tools to Create New Collections
45
Some Light Reading
William Y. Arms, "Automated digital libraries." D-Lib Magazine,
July/August 2000.
http://www.dlib.org/dlib/july00/07contents.html
William Y. Arms, "Economic models for open-access
publishing." iMP, March 2000.
http://www.cisp.org/imp/march_2000/03_00arms.htm
46