The 5S Framework for
Digital Libraries and
Two Case Studies:
NDLTD and CSTC
NIT’99 - Taipei
August 18-20, 1999
Edward A. Fox
fox@vt.edu
(Computer Science + Digital Library Research Lab
+ Internet Technology Innovation Center)
Virginia Tech, Blacksburg, VA, USA
OUTLINE
Introduction
5S Framework (toward a DL theory)
NDLTD case study
CSTC/CRIM case study
Conclusions/invitations
Acknowledgements (Selected)
 Sponsors: ACM, Adobe,
IBM, Microsoft,
NSF, OCLC, US Dept. of Education, …
 Co-PIs: Marc Abrams, Robert Akscyn,
John Eaton, Brian Kleiner, Gail McMillan
 Students:
Fernando Das Neves, Robert
France, Neill Kipp, Paul Mather,
Constantinos Phanouriou, James Powell,
Ohm Sornil, David Watkins, Chang Zhang,
Jianxin Zhao
Digital Libraries --- Virginia Tech
 MARIAN (NLM)
 CS DL Prototype - ENVISION (NSF, ACM)
 TULIP (Elsevier, OCLC)
 BEV History Base (NSF, Blacksburg)
 DL for CS Education - EI (NSF, ACM)
 WATERS, NCSTRL (NSF)
 NDLTD (SURA, US Dept. of Education)
 CSTC (NSF, ACM), CRIM (NSF, SIGMM)
 WCA (Log) Repository (W3C)
 VT-PetaPlex-1 (Knowledge Systems)
DLs Shorten the Chain from
Editor
Reviewer
Publisher
A&I
Consolidator
Library
DLs Shorten the Chain to
Editor
Digital
Reviewer
A&I
Library
DLs Shorten the Chain to
Author
Teacher
Digital
Reader
Editor
Reviewer
Learner
Librarian
Library
SMETE Library
(from www.dlib.org)
 Context: Global movement toward Digital
Libraries (see April 1998 Commun. of ACM)
 NSF effort: Science, Mathematics,
Engineering, and Technology Education
Digital Library (focussed on undergraduates)
– 3 workshops, yearly increasing funds / new calls
 SMETE Library likely to operate as distributed
federation, with separate parts for each key
discipline, and to lead to a global effort
Enhancing
Learning
Digital
Libraries
Student Portfolios
Self-Archiving
Gray Literature
(Dept. of Educ.)
NDLTD
Networked DL
of Theses &
Dissertations
Interactive
Experiences
Computer
Science
(with NSF
and ACM)
CSTC
CS
Teaching
Center
CRIM
Curriculum
Resources
Inter. MM
OUTLINE
Introduction
5S Framework (toward a DL theory)
NDLTD case study
CSTC/CRIM case study
Conclusions/invitations
Neill Kipp Dissertation
 Training interested groups about 5S and the Star
Methodology, refining the Framework to have a
solid mathematical & pattern language foundation
 Case studies of projects at Virginia Tech or
involving VT staff/students: CSTC, NDLTD,
National Archives (with SAIC), Lexis, ...
 Open also to study DL projects elsewhere
 Focusing too on the design artifacts developed and
related issues of patterns, efficient description and
representation (esp. with markup, hypermedia)
How to Build a Digital Library
 Understand the problem (using the 5S Framework)
 Solve the problem (using the Star Methodology)
– design, develop, evaluate,
– refine, operate
Star Methodology
Definition: Digital Libraries
are complex systems that help
 satisfy info needs of users (societies)
 provide info services (scenarios)
 organize info in usable ways (structures)
 manage the location of info (spaces)
 communicate info with users (streams)
Note: “info” stands for data/information/knowledge;
“users” may include their agents too.
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
5S Layers
Chinese 5 Element System
Societies
Scenarios
Spaces
Structures
Streams
Definition: 5S Framework
 Societies: interacting people (and computers)
 Scenarios: services, functions, operations,
methods
 Spaces: domains + constraints (e.g., distance,
adjacency): 2D, vector, probability
 Structures: relations, trees, nodes and arcs
 Streams: sequences of items (text, audio,
video, network traffic)
5S: Components
 Societies: roles, rituals, reasons, relationships,
artifacts
 Scenarios: acquire, index, consult, administer,
preserve
 Spaces: physical, temporal, functional,
presentational, conceptual
 Structures: architectures, taxonomies, schema,
grammars, links, objects
 Streams: granularities, protocols, paths, flows,
turbulences
OUTLINE
Introduction
5S Framework (toward a DL theory)
NDLTD case study
CSTC/CRIM case study
Conclusions/invitations
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses &
Dissertations (NDLTD) http://www.ndltd.org
Key Ideas:
Scalability
Networked infrastructure
University collaboration
Workflow, automation
Maximal access
Education is the rationale
8th graders vs. grads
Authors must submit
Standards
PDF, SGML, MM
MARC, DC, URNs
Federated search
NDLTD Layers
Societies
Scenarios
Spaces
Structures
Streams
ETDs Got Your Interest?
ETD Web Site
http://www.ndltd.org/
Graduate Students
U. Laval
Media
Singapore AM
Chronicle of Higher Ed.
National Public Radio
NY Times, Nature, Science,...
Who has helped / when?
 1987 mtg in Ann Arbor: UMI, VT, …
 1992 mtg in Washington: CNI, CGS, UMI, VT
and 10 universities with 3 reps each
 1993 mtg in Atlanta to start Monticello
Electronic Library (MEL): SURA, SOLINET
 1994 mtg in Blacksburg re ETD project: std of
PDF + SGML + multimedia objects
 1996 funding by SURA, US Dept. of Education
(FIPSE) for regional, national projects
 1997 meetings in UK, Germany, ...
Status of the Local Project
 Approved by university governance
Spring 1996; required starting 1/1/97
 Submission & access software in place
 Submission workshops for students
(and faculty) occur often: beginner/adv.
 Faculty training as part of Faculty
Development Initiative
 Over 2000 ETDs in collection
Who are sponsors / cooperators?

 Funding, Donations of hardware/software
– SURA
– US Dept. of Education (FIPSE)
– Adobe Systems
– IBM
– Microsoft
– OCLC
 Others Serving on Steering Committee
– National/Regional Projects: Australia, French
speaking group, Germany, IberoAmerica
(ISTEC), UK (UTOG)
– CGS, National Lib. Canada, NSF, OAS,
SOLINET, UMI, UNESCO, ...
Institutional Members
Coalition for Networked Information (CNI)
Committee on Inst. Coop. (CIC)
Diplomica.com
Dissertation.com
National Library of Portugal
UNESCO
US University Members
Air University (Alabama)
Cal Tech
Clemson University
College of William & Mary
Concordia University (Illinois)
East Tenn. State University
Florida Institute of Tech.
Florida International University
Michigan Tech
Naval Postgraduate School (CA)
North Carolina State U.
Penn. State University
Rochester Institute of Tech.
U. of Florida
U. of Georgia
University of Hawaii, Manoa
U. of Iowa
U. of Maine
U. of Oklahoma
U. of South Florida
U. of Tennessee, Knoxville
U. of Tennessee, Memphis
U. of Texas at Austin
U. of Virginia
U. Wisconsin - Madison
Vanderbilt U.
Virginia Tech - required since 1/97
West Virginia U. - required beginning fall 1998
Worcester Polytechnic Inst.
Australian Project Members
U. New South Wales (lead institution)
U. of Melbourne
U. of Queensland
U. of Sydney
Australian National University
Curtin U. of Technology
Griffith U.
German Project Members
Humboldt University (lead institution)
3 other universities
5 learned societies
1 computing center
2 major libraries
Other International Members
Chinese University of Hong Kong
Chungnam National U., Dept of CS (S. Korea)
City University, London (UK)
Darmstadt U. of Tech. (Germany)
Free University of Berlin (Germany - Vet. Med.)
Gyeongsang National U. (Korea)
Indian Institute of Technology, Bombay (India)
Nanyang Technological U. (Singapore, part)
National U. of Singapore (Singapore, part)
*National Library of Portugal
Polytechnic University of Valencia (Spain)
Rhodes U. (South Africa)
St. Petersburg St. Tech.U (Russia)
Univ. de las Américas Puebla (Mexico)
University of Padua (Italy)
U. Laval; U. of Guelph; U. Waterloo; Wilfrid Laurier U. (Canada)
Type 1 Members
University Requires ETDs
 Adobe Acrobat and/or XML/SGML tools
 Automated submission & processing
 Archive/access through UMI, (OCLC,)
Virginia Tech, ...
 (Local) WWW site, publicity
 (Local) Assistance provided as requested:
email, phone, listserv(s)
Type 2 Members
University Agrees to Require ETDs
 Like Type 1 but set date not yet reached
 Usually has an option or pilot
 May: wait for new AY; start with all who
enter after; …
 Build grass roots support
– Advisory committee: representative? expert?
– Champions to spread by word of mouth
– Approval: Senates, Commissions, Deans, Students
– Publicity to reach community
NDLTD Members, Types 3-7
3.
Part of university requires ETDs
4. University allows ETDs
5. University investigating, has pilot
6. University consortium joins:
– (Canadian group of 3 universities)
7.
Non-university organization joins
– CNI (Coalition for Networked Info.)
NUDL
 Proposal to NSF under DLI-2 international
program
– VT: Library, Grad School, Industrial & Systems Eng.
– Partners: UK (2), Singapore, Russia, Korea, Greece,
Germany, plus Iberoamerican group (Spain,
Portugal, Argentina, Brazil, Chile, Mexico)
– Problems: Multilingual search, multimedia
submissions, requirements/usability, …
 Start with ETDs, then expand to other student
works, portfolios, data sets, (CS) courseware, ...
National Coverage (red/white)
Relationship with publishers
 Concern of faculty and students that still
wish to publish books or journal articles,
voiced: campus, Chronicle, NPR, Times
 Solution: Approval Form gives students,
faculty choices on access, when to change
access condition; use IPR controls in DL
 Solution: by case, work with publishers and
publisher associations to increase access
– AAP, AAUP
– AAAS, ACM, ACS, Elsevier, ...
Some responses from publishers
 ACM: need to acknowledge copyright
 Elsevier: need to acknowledge copyright
 IEEE-CS: endorse initiative
 ACS: After first publication, can release
 Textbook publishers: different market,
manuscript significantly reworked
 General: restricting access to local campus
will not cause any problems
ETD Initiative (and UMI)
Students
Learn about
DL, EPub
TDs
become more
expressive
Global TDs
become more
accessible,
archived
Universities
UMI
N. Amer. (T)Ds are
accessible, archived
NDLTD Layers
Societies
Scenarios
Spaces
Structures
Streams
What are we doing?
 Aiding universities to enhance grad educ.,
publishing and IPR efforts
 Helping improve the availability and
content of theses and dissertations
 Educating ALL future scholars so they can
publish electronically and effectively use
digital libraries (i.e., are Information
Literate and can be more expressive)
What are the long term goals?
 400K US students / year getting grad
degrees are exposed / involved
 200K/yr rich hypermedia ETDs that
may turn into electronic portfolios
 Dramatic increase in knowledge
sharing: lit. reviews, bibliographies, …
 Services providing lifelong access for
students: browse, search, prior
searches, citation links
Student Prepares Thesis or Dissertation
NDLTD
Literature
Computer Resources
Research
Student Defends and Finalizes ETD
My Thesis
ETD
Student Gets Committee Signatures
and Submits ETD
Signed
Grad School
Graduate School Approves ETD
Student is Graduated
Ph.D.
Library Catalogs ETD and New Students
Have Access to the New Research
WWW
NDLTD
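The workflow slides above (prepare, defend, submit, approve, graduate, catalog) can be read as a simple linear state machine. The following is a minimal sketch only: the names `EtdState` and `advance` are hypothetical and are not taken from the NDLTD submission software.

```python
from enum import Enum


class EtdState(Enum):
    """Stages of the ETD workflow; labels paraphrase the slide titles."""
    PREPARING = "student prepares thesis or dissertation"
    DEFENDED = "student defends and finalizes ETD"
    SUBMITTED = "committee signs, student submits ETD"
    APPROVED = "graduate school approves ETD"
    GRADUATED = "student is graduated"
    CATALOGED = "library catalogs ETD"


# Enum members iterate in definition order, which is the workflow order.
ORDER = list(EtdState)


def advance(state: EtdState) -> EtdState:
    """Return the next workflow stage, or raise if the ETD is already cataloged."""
    index = ORDER.index(state)
    if index == len(ORDER) - 1:
        raise ValueError("ETD is already cataloged; no further stage")
    return ORDER[index + 1]
```

Calling `advance` five times from `EtdState.PREPARING` walks an ETD through to `EtdState.CATALOGED`; a real system would also need branches for revisions and rejected submissions, which this sketch leaves out.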
Access Statistics
                                  1996      1997      1998
Total successful requests:      37,171   247,573   628,401
Av. successful requests/day:       102       685     1,690
Requests for .PDF files:         4,600    72,854   343,236
Requests for .HTML files:       28,225   129,831   215,896
Distinct hosts served:           9,015    22,725    36,724
Total data transferred:         3,229M   25,953M   74,051M
Av. data transferred/day:           9M       73M      222M
International Use
 1996
 850
 608
 346
 713
 387
 463
 250
 191
 183


22
83
1997
2992
2,501
2378
2367
1264
1161
725
867
1130
967
958
1998
8170 United Kingdom
4223 Australia
7373 Germany
3970 Canada
2201 South Korea
4431 France
2553 Italy
2781 Netherlands
1449 Brazil
1089 Thailand
1414 Greece
How can a university get
involved?
 Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
 Send us letter, give us contact names
 Adapt Virginia Tech solution
– Build interest and consensus
– Start trial / allow optional submission
Build Local ETD Site
ETD
Workshop/Training
Digital Library
Policies
Inspection/Approval
Support Offered
 Software, documentation, tech support
 Email, listservs (etd-l@listserv.vt.edu,
eval, -grad, -library, -technical)
 Donations: Adobe, Microsoft
 Evaluation: instruments, analysis
http://scholar.lib.vt.edu - solutions/statistics
 (Temporary storage / archiving; aid - in
setting up an int’l service & archive)
Enhancements
 Dublin Core spec, MARC crosswalk
 DTDs for SGML, XML(+ <discipline>ML)
 Annotation system (author, friends, notes)
 Routing system (based on Sift)
 Multilingual WWW site, training materials
(Spanish recently done in Valencia)
 Better federated search (w. Z39.50, planned
with Dienst and Harvest - maybe MARIAN)
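A Dublin Core to MARC crosswalk, as listed among the enhancements above, is at heart a field-by-field mapping. The sketch below is illustrative only: the tag and subfield pairs loosely follow the Library of Congress crosswalk for unqualified Dublin Core and should be treated as assumptions, since the slides do not give NDLTD's actual mapping. The names `DC_TO_MARC` and `crosswalk` are hypothetical.

```python
# Illustrative mapping from unqualified Dublin Core elements to MARC (tag, subfield).
# Assumption: these pairs are for illustration, not NDLTD's specification.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
    "language": ("546", "a"),
    "rights": ("540", "a"),
}


def crosswalk(dc_record: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Convert a Dublin Core record into (tag, subfield, value) triples.

    Elements without a mapping are skipped rather than guessed.
    """
    fields = []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            continue
        tag, subfield = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, subfield, value))
    return fields
```

For a made-up record such as `{"title": ["A Sample Thesis"], "creator": ["Doe, Jane"]}`, the function yields one 245 $a field and one 720 $a field. A production crosswalk would also handle indicators, qualified elements, and repeated subfields within one field.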
Further Services
 Adding services currently prototyped
– support with IBM DL, OCLC SiteSearch
 Adding other services planned
– building and using citation database (SFX)
– implementing plagiarism check (SCAM)
 Developing NUDL as a sustainable self
governing global institution (w. committees)
Everyone Learns
 Students become “info literate”
 Students learn about discovery, search,
categorization/classification, e-pub,
preservation, helping others find/reuse
 Campus starts to think about IPR
– e.g., Virginia Tech symposium
 Faculty and students improve quality as
reader base expands
NDLTD Layers
Societies
Scenarios
Spaces
Structures
Streams
Accessibility Activities / Plans
 Interface design (simple, 3D, VR)
 Usability studies
 Generic multi-lingual support
 Support for those with disabilities
 Hybrid collection (paper, MARC,
abstracts, full-text, multimedia)
 Disciplinary classifications, tools
 Visualization of results, collection
SPIRE Visualization
NDLTD Layers
Societies
Scenarios
Spaces
Structures
Streams
Convene Local Planning Group
ETD
Build Local ETD Site
ETD
Workshop/Training
Digital Library
Policies
Inspection/Approval
Support Structures Developed
 WWW site with > 300 Mb, CD, videotape
 Automated submission system (MySQL,
UNIX, WWW scripts - grad school/library)
 Student guidelines, style sheets, multimedia
training materials, FAQs, press info
 SGML and XML DTDs for ETDs
 SGML to HTML (web generator)
 LaTeX, Word templates
PetaPlex
 Digital Library Machine (“super” object store)
 Parallel computer / storage utility for scale of 1000
to 100,000,000 gigabytes (1 Tbyte - 100 Pbyte)
 Knowledge Systems Incorporated is supplying
VT-PetaPlex-1 for $250,000 with
– high speed backbone connection (OC-12)
– 2.5 terabytes through 100 “Nanoservers”:
– Each = Network connection + IBM 25GB disk +
233 MHz Pentium II + Linux
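The capacity figures on this slide can be checked with simple arithmetic. The sketch below uses only the numbers given above (100 nanoservers, 25 GB per disk, decimal units); the function names are hypothetical, and the result is raw capacity with no allowance for redundancy or filesystem overhead.

```python
import math

NANOSERVERS = 100          # from the slide
DISK_GB_PER_SERVER = 25    # from the slide (IBM 25GB disk)
GB_PER_TB = 1000           # decimal units, matching "1000 gigabytes = 1 Tbyte"


def total_capacity_tb(servers: int, disk_gb: int) -> float:
    """Raw capacity in terabytes for a given number of one-disk nanoservers."""
    return servers * disk_gb / GB_PER_TB


def servers_needed(target_tb: float, disk_gb: int) -> int:
    """Number of one-disk nanoservers needed to reach a target raw capacity."""
    return math.ceil(target_tb * GB_PER_TB / disk_gb)


print(total_capacity_tb(NANOSERVERS, DISK_GB_PER_SERVER))  # 2.5
```

With the same 25 GB disks, `servers_needed(100_000, 25)` shows that the 100 Pbyte upper end of the stated range would take four million nanoservers, so reaching that scale presumably depends on much larger disks per node.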
PetaPlex Complex
[Diagram: front end machine (RS/6000, 1G RAM, 4 Proc.) with Service Machines 1-4 and an array of Nanoservers]
PetaPlex Top View
[Diagram: 4 ft. per side]
PetaPlex Side View
[Diagram: many shelves; roles: support, cooling, power; 8 ft. high, 4 ft. wide]
NDLTD Layers
Societies
Scenarios
Spaces
Structures
Streams
DL Submission Software
Similar software developed for WCA,
CSTC, and NDLTD
CSTC version field-tested to manage
papers for ACM Digital Libraries ‘99
May generalize for
– conferences
– electronic journals
– resource description (e.g., courses, Web
content)
User Search Support
(multilingual, XML)
NDLTD World Federated
Search
User
Interface
Virginia Tech ... (univ)
Germany (Dissertations Online)
Portuguese NL ... (national library)
Australia (national project)
Note: All groups shown are connected with NDLTD.
www.theses.org
 James Powell student project, D-Lib
Magazine description in Sept. 1998
 XML description of each site
– type of search engine / service
– language
– coverage (for resource discovery)
 Adding Z39.50 gateway capability
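To illustrate how an XML description of each site could drive resource discovery, here is a minimal sketch that picks the sites to query by language. The element names (`site`, `engine`, `language`, `coverage`), the sample sites, and the function `sites_for_language` are all assumptions made for this example; the slides do not show the actual www.theses.org schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical site descriptions; element names and values are assumptions.
SITES_XML = """
<sites>
  <site name="Example University A">
    <engine>z39.50</engine>
    <language>en</language>
    <coverage>theses and dissertations</coverage>
  </site>
  <site name="Example University B">
    <engine>web-form</engine>
    <language>de</language>
    <coverage>dissertations</coverage>
  </site>
</sites>
"""


def sites_for_language(xml_text: str, language: str) -> list[str]:
    """Return the names of sites whose description lists the given language."""
    root = ET.fromstring(xml_text)
    return [
        site.get("name")
        for site in root.findall("site")
        if site.findtext("language") == language
    ]
```

A federated search front end could use a filter like this to decide which sites to forward a query to, then use the `engine` field to choose how to talk to each one.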
Interoperability Testing
 IBM DL: donated equipment, technical
support, powerful IPR (see TOIS, D-Lib)
 Z39.50: OCLC SiteSearch / VT tailored s/w
– university libraries w. catalogs of freely shared
MARC records pointing to archival copies
– via URNs: handles & PURLs
 Dienst / NCSTRL - www.ncstrl.org: CS depts.,
DARPA, NSF, CNRI, Cornell - UVA is
working on extensions for ETDs - Portugal is
studying use for Europe
OUTLINE
Introduction
5S Framework (toward a DL theory)
NDLTD case study
CSTC/CRIM case study
Conclusions/invitations
CSTC/CRIM Layers
Societies
Scenarios
Spaces
Structures
Streams
CS -> CSTC -> CRIM
 NSF and ACM Education Committee are funding
a 2 year project “A Computer Science Teaching
Center” - CSTC - http://www.cstc.org/
 College of NJ, U. Ill. Springfield, Virginia Tech
 Focus initially on labs, visualization, multimedia
 Multimedia part is also supported by “Curriculum
Resources in Interactive Multimedia” - CRIM grant to Virginia Tech and George Washington
University: http://www.cstc.org/~crim/
CS Teaching Center (CSTC)
 Instead of building large, expensive
multimedia packages that become obsolete
and are difficult to re-use, concentrate on
small knowledge units.
 Learners benefit from having well-crafted
modules that have been reviewed and tested.
 Use DLs as a powerful base of support for
learners - for a variety of courses, self-study
tutorials & reference resources.
Solutions, Plans
 CSTC will have a variety of focused centers
so that different types of resources can be
collected, tested, and suitably packaged:
– laboratory exercises, activities, assignments
– visualizations and visualization tools
– interactive multimedia resources (CRIM)
 ACM may launch a “Transactions in
Courseware and Education in Computing” to
provide an ongoing infrastructure for CSTC.
CRIM Rationale
 MM field needs properly trained personnel
 Support this with resources + curricula
 Together these help us move toward a DL
for Interactive MM -> CS -> SMETE
 Benefits will go to teachers (who have more
to build upon) and students (who will have a
richer environment for learning)
Concerns, Problems
 Motivating educators to create modules that can
be used elsewhere is difficult without a suitable
reward structure and an infrastructure of testing,
packaging, discovery, reuse, and evaluation.
 There is an unnecessary disconnect between
researchers (e.g., in laboratories) preparing
exciting demonstrations for conferences and
instructors interested in helping students grasp
underlying concepts and innovations in their area.
CRIM Project Activities
 Workshops etc. to involve community
 WWW site including DL in CSTC re MM
– Cataloging schema, user interface
– Refers to MM syllabi and curriculum
– Inviting learning resources for the CRIM
DL, with reviews, reuse certifications
 Publish report on MM curriculum
Dimensions / Categories (matrix+)
 Level: K-12, ugrad (low, upper), grad (MS, PhD), prof.
 Length: reference, short course, course unit, course, …
 Academic orientation: science, engineering, art, communications,
multidisciplinary, marketing
 Pedagogical orientation: mm use, survey/hands-on,
traditional/constructivist (design, develop)
 Tool connection: course on <Photoshop>, compare animation
tools, assume know or can learn, use tool as example
 (Target) audience (and background): culture
 Learning style: visual/auditory, indiv/group
 Relation: closed/complete, part of some structure (e.g., course,
program)
Virginia Tech Courses
 Art: Digital Art and Design course (Photoshop)
 CS: 1604 Introduction to the Internet (1 cr.)
 CS: 3604 Professionalism in Computing
 CS: 4624 Multimedia, Hypertext and Information
Access (3 cr.)
 CS: 5604 Information Storage & Retrieval (3 cr.)
 CS: 6604 Digital Libraries (3 cr.)
CS4624
Units
–Applications&Authoring
–Capture&Representation
–Compression&Models
–Presentation&Interaction
–Communication&Networking
OUTLINE
Introduction
5S Framework (toward a DL theory)
NDLTD case study
CSTC/CRIM case study
Conclusions/invitations
Future Work - 1 of 2
 Working with universities, authors,
publishers to increase level of use
 Interoperability tests of integrated services
 Study with testbed that emerges, to improve:
submission, information retrieval, browsing,
interface, and other types of user support
 Evaluation, improving learning experience,
spread to worldwide initiative, sustainable
support and coordination
Future Work - 2 of 2
 NDLTD services currently prototyped
– annotation and SDI (routing) capabilities
– Dublin Core metadata, crosswalk to MARC
– support with IBM DL, OCLC SiteSearch
 Adding other services planned
– building and using citation database (w. SFX)
– implementing plagiarism check (like “SCAM”)
 Developing NUDL as a sustainable self
governing global institution (w. committees)
including active participation in Japan
Invitations!
Use 5S (to understand, build DLs)
Add to, use CSTC, CRIM
Join NDLTD and NUDL