Final Report: Project to Implement a Scholarly Information

Implementation of a Scholarly Information Portal Using the
Open Archives Initiative Protocol for Metadata Harvesting
University of Illinois at Urbana-Champaign
Final Report to the Andrew W. Mellon Foundation
25 July 2003
Timothy W. Cole, Principal Investigator
Thomas G. Habing, Co-Principal Investigator
William H. Mischo, Co-Principal Investigator
Christopher Prom, Co-Principal Investigator
Beth Sandore, Co-Principal Investigator
Joanne Kaczmarek, Visiting Project Coordinator, September 2001-August 2002
Sarah Shreeves, Visiting Project Coordinator, August 2002-October 2002
A Scholarly Information Portal using OAI, University of Illinois - Final Report, July 2003
Executive Summary
In June 2001, the University of Illinois at Urbana-Champaign was awarded funding from the
Andrew W. Mellon Foundation to create a portal to facilitate access to scholarly cultural heritage
information. The project was designed to investigate the efficacy of harvesting and aggregating
metadata using the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH).
Between its start date and its conclusion in May 2003, the project accomplished the following:

- Developed robust, scalable tools supporting the harvest and aggregation of metadata.
- Investigated issues relating to value-added metadata normalization, portal search
  interface design, presentation of search results, and integration of Encoded Archival
  Description (EAD) metadata with Dublin Core (DC) metadata.
- Sampled a range of metadata authoring practices in the domain of cultural heritage.
- Demonstrated technical viability of search and retrieval across an aggregation of
  descriptive metadata harvested using the OAI-PMH.
- Identified critical issues relating to use of OAI-PMH in the domain of cultural heritage.
- Tested usefulness and usability of search portal with one target user population.
- Implemented infrastructure to maintain metadata search portal after end of project.

This final report provides a summary of activities and results over the entirety of the project,
focusing especially on: technical implementation of metadata harvesting; efforts to integrate
EAD and DC metadata; creation, development, and testing of the prototype search portal; and
sustainability activities. While this report stands on its own as an overview of the project,
emphasis is on activities since publication of interim project report in August 2002. The reader is
referred to that report and to papers and presentations published over the entire course of the
project for further details of accomplishments. Project-related papers and presentations through
31 August 2002 were included in the interim report supplement volume. Papers and
presentations from 1 September 2002 are included in the supplement volume of this report.
Technical Results: In total, we collected metadata from thirty-nine repositories describing more
than 2.5 million discrete items held by approximately 500 institutions worldwide. These
institutions included museums, archives, historical societies, and academic and public libraries.
Consistent with other OAI-PMH projects, we found the protocol robust and scalable for use with
metadata collections of cultural heritage institutions. We demonstrated a capability to harvest
hundreds of thousands of metadata records per hour. Harvest rates were found to be determined
by capacity of metadata provider software. Optimally a single harvesting workstation runs
multiple harvests concurrently. Range of technical capability of those participating in OAI-PMH
supports claim that OAI-PMH is a low-barrier way to implement interoperability; however, we
did note a segment of potential content providers not yet technically able to implement
OAI-PMH. The soon-to-be-released OAI Static Repository Protocol for Metadata Harvesting promises
to lower technical barrier even further.
Research Investigations: As described in interim project report, scalability and robustness of the
University of Michigan's DLXS / XPat search engine proved sufficient to search metadata
aggregated for this project; however, the heterogeneity of metadata records harvested (in spite of
the fact they all described cultural heritage information resources) impacted adversely on utility
of search portal. Quality and consistency of harvested metadata varied widely and in surprising
ways. Efforts to mitigate adverse impact of this heterogeneity on utility of search and discovery
services achieved only mixed results. We had success normalizing descriptive attributes such as
date and resource content type, but were generally unable to normalize or collate described
resources by subject or topic. Attempts to use automated subject analysis tools developed for
full-text records (e.g., NCSA's Themeweaver tool) were made difficult by the sparseness of
metadata records. Only 15% of the metadata records provided by academic libraries included one
or more instances of DC subject or description fields. Even when information was included in
these fields, it was generally brief (as would be expected for metadata records). Because the
NCSA tool was optimized for longer texts, effective clustering could not be achieved. Additional
customization of this tool might yield better outcomes, but generally our results support
hypothesis that human-generated descriptive metadata alone may not be enough for optimum
search and discovery. Clearly also, while some enhancements can be made post-harvest, better
metadata to start, created with interoperability in mind, would reduce heterogeneity of metadata
and facilitate aggregation. We also had mixed results in trying to aggregate metadata derived
from EAD finding aids with DC metadata. While we demonstrated the viability of deriving
multiple item-level metadata descriptive records from individual EAD finding aid files (while
still retaining ability to show hits retrieved in context of full EAD when desired), EAD-derived
records were generally sparser and different in character from natively harvested DC records, or
DC records derived from MARC catalog records. Repetition of information within EAD
description of subordinate component nodes tended to clutter results returned for naïve searches.
EAD authoring practices and traditions are unique. EAD and OAI-PMH are by no means
antithetical, but further work is needed to more effectively integrate EAD and DC metadata.
Utility, Usability, & Sustainability: OAI-PMH has lived up to billing as a low-barrier, low-cost
way to harvest and aggregate descriptive metadata. While sufficient for the job, the protocol is
focused in scope and rigorous and unambiguous in requirements. This makes it easy and
inexpensive to implement, and cheap to maintain. (The harvesting service implementation
developed for this project will require 0.25 FTE recurring staff and amortization of IT hardware to
maintain. We have undertaken to maintain our implementation indefinitely following end of this
project.) However, OAI-PMH by itself is not a magic bullet. OAI-PMH cannot compensate for
poor quality metadata. Limited testing with upperclass undergraduates who will soon be teaching
high school social studies showed that many search interface and aggregation issues remain. The
tendency to associate OAI-PMH almost exclusively with lowest common denominator simple
DC makes it difficult to implement more advanced search interface features. Content providers
should prefer more expressive metadata schema like MARC or qualified DC. Our testing also
suggests that mixing metadata describing analog resources with metadata describing online
resources is often undesirable. Additional effort is still needed to improve metadata quality and
consistency and to find ways to augment human-generated descriptive metadata. In sum, this
project has contributed to our collective understanding of what is required to implement and
sustain metadata harvesting services based on the OAI-PMH and has been instrumental in
shaping and directing the evolution of the protocol and its community of use. OAI-PMH has
proven to be technically viable and of good utility. But OAI-PMH is ultimately only as useful as
the metadata it transports. The harvesting service and portal created for this project will be
continued to provide a testbed for further research. The University of Illinois also has undertaken
a similar project in domain of engineering and science. New, related work has been undertaken
at Illinois under the sponsorship of IMLS, NSF / NSDL, and the CIC. There is good opportunity
during the next few years for follow-on research that will build on the results of this project.
Table of Contents
Executive Summary
Summary of Project Activities
    A. Implementation of metadata harvesting service
    B. Efforts to integrate EAD metadata with DC metadata
    C. Creation, development, and testing of search portal
    D. Sustainability
Accomplishments and Activities Listed by Month
Proposal Objectives Accomplished
Metadata Providers Organized by Type of Institution (through March 2003)
Metadata Providers Organized by Type of Institution (April 2003- )
Bibliography of Publications & Selected Presentations
    A. Publications
    B. Presentations
Supplemental Materials (Included in separate volume)
Appendix A: Project proposal (as submitted April 2001)
Appendix B: Papers published since release of interim report
Appendix C: Presentations given since release of interim report
Appendix D: White paper on the usefulness of OAI-PMH based search portal for K-12 teachers
Appendix E: OAI Metadata Harvesting Service Workshop for CIC member libraries
Appendix F: Introduction to OAI-PMH, half-day tutorial given at JCDL 2003
Appendix G: Project software tools available on SourceForge.Net
    OAI-PMH harvesting & metadata provider tools
    XSLT stylesheets for EAD to DC transformations
    ZMARCO tool
Appendix H: Screenshots of Project Website
Appendix I: Illustrative Metadata Search
Summary of Project Activities
A. Implementation of metadata harvesting service
OAI-PMH was designed to facilitate the sharing and discovery of scholarly information
resources. Descriptive information (metadata) about many of these resources is contained in
databases and XML documents not readily available to or easily indexed by current Web search
engines. OAI-PMH can be used to convey metadata describing resources that are analog as well
as digital, though the initial impetus for development of the protocol was to convey metadata
describing born-digital primary sources (e.g., e-prints). As a starting point for our project we
sought metadata describing materials that are culturally significant, such as rare books,
manuscripts, and personal papers held by library archives and special collections or in museums
and historical societies. We included metadata describing born-digital resources, resources
having digital representations (e.g., scanned images and pages of printed or handwritten texts),
and resources only available in analog format (e.g., only available as hardcopy). We sought
metadata in simple DC, DC variants, MARC, and EAD formats. To assess efficacy of OAI-PMH
in domain of cultural heritage we focused also on the development of a Web portal through
which end-users could search aggregated metadata to discover resources of interest.
Characteristics of metadata aggregation: The metadata aggregated for our project came from 39
metadata providers. However, the number of providers does not offer a full picture of the
heterogeneous nature of the aggregation. Figures 1 and 2 show the percentage breakdown of
metadata harvested by metadata provider institution type and by type of resource described.
Several of the data providers we harvested are aggregators themselves, that is, they collect the
metadata they provide from multiple institutions. Three of the repositories harvested (CIMI, the
Online Archives of California, and the Colorado Digitization Project) are large-scale aggregators
of metadata. Including them meant that our aggregation contained metadata
describing content held in approximately 580 institutions worldwide. In addition to these
aggregators, several metadata providers have made available several distinct and separately
maintained collections of metadata using the "sets" concept inherent in the OAI-PMH. For
example, the University of Tennessee Libraries makes available eleven distinct collections,
ranging from an Appalachian photograph collection to scanned images of an emancipation
newspaper to electronic theses. A harvesting service may choose to harvest all or just some of these
collections. Where appropriate, we harvested from a given provider only those sets of metadata
describing cultural heritage resources (broadly defined). The issue remains that each of these
collections may use metadata differently than another collection. Variations are most notable
institution to institution, but are sometimes present even within a single institution's metadata.
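
The set-based selective harvesting described above corresponds to a small number of OAI-PMH request arguments. The following is a minimal sketch in Python and is not the project's harvester: the base URL and the set name are hypothetical placeholders, and the helper function is introduced only for illustration. The verb and argument names (ListSets, ListRecords, metadataPrefix, set, from) are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# Hypothetical provider endpoint, used only for illustration.
BASE_URL = "http://provider.example.org/oai"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a verb and optional arguments."""
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass it as 'from_'.
            query[name.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# Ask the provider which sets (collections) it exposes.
list_sets = oai_request_url(BASE_URL, "ListSets")

# Harvest one set only, in simple Dublin Core, changed since a given date.
list_records = oai_request_url(
    BASE_URL,
    "ListRecords",
    metadataPrefix="oai_dc",
    set="photographs",   # hypothetical setSpec
    from_="2003-01-01",
)

print(list_sets)
print(list_records)
```

A harvester that wants every collection from a provider simply omits the set argument; one that wants only cultural heritage material issues one ListRecords request sequence per relevant set.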
Not all the metadata harvested was harvested directly from OAI-PMH compliant sites. EAD files
(approximately 8,700), for instance, were obtained by FTP or were captured directly (with owner
permission) from archive Web servers. Before indexing such 'captured' metadata records, we
transformed them to simple DC as appropriate for the native schema in which we obtained the
metadata. Transformed metadata records were then made available from surrogate OAI-PMH
metadata provider sites running on University of Illinois servers. Generally metadata obtained in
this way was not updated over the course of the project; however, the use of these surrogate
providers allowed us to better test the robustness and scalability of the protocol.
[Figure 1 – Breakdown of metadata providers by type of institution. Categories: Academic Libraries; Digital Libraries; Museums/Cultural/Historical Orgs; Public Libraries.]
[Figure 2 – Breakdown of resource types described. Categories: Text & Sheet Music; Images; Artifacts; Other.]
The full aggregation for this project contained 1,101,523 original item-level metadata records. Of
these records, 339,331, or approximately 30%, provided a direct link to an online resource, e.g.,
digitized image, scanned page of text, etc., via a hyperlink. We also obtained 8,730 EAD finding
aid files. Each of these EAD files described a collection of items (e.g., a personal manuscript
archive) rather than an individual item. Because metadata provided in Dublin Core format
describes individual items, it was necessary to develop automated algorithms to tease out from
collection-level EAD descriptions item-level metadata records describing individual collection
components (see further discussion below). This process added another 1,524,325 item-level
records to our aggregation. Almost none of the primary resources described by these EAD-derived
item-level records had digital representations available online.
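
As a quick check on the figures in this paragraph, the short calculation below recomputes the share of original records with an online link and the combined number of item-level records. It uses only the counts reported above; the variable names are illustrative.

```python
# Counts reported in the paragraph above.
original_item_records = 1_101_523
records_with_online_link = 339_331
ead_derived_records = 1_524_325

# Share of original item-level records that link directly to an online resource.
share_online = records_with_online_link / original_item_records

# Original records plus item-level records derived from EAD finding aids.
total_item_records = original_item_records + ead_derived_records

print(f"Share with online link: {share_online:.1%}")
print(f"Total item-level records: {total_item_records:,}")
```

The combined total of 2,625,848 item-level records is consistent with the "more than 2.5 million discrete items" cited in the Executive Summary.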
Metadata harvesting and provider software (see also Appendix G): To acquire metadata
records, we developed software tools for harvesting metadata using OAI-PMH. As a way to
facilitate participation in our project, we also enhanced tools previously created to help providers
make metadata available via OAI-PMH. All software and middleware developed during this
project has been released under an approved OpenSource software license and is available for
download from the SourceForge.Net OpenSource software repository
<http://sourceforge.net/projects/uilib-oai/> (see also Appendix G of this report). Our work
spanned two versions of the OAI protocol: version 1.1 (released in July 2001) and version 2.0
(released in June 2002). Harvester and provider tools were built for and tested on both the
Microsoft Windows and RedHat Linux operating system platforms, though the greatest range of
development was done for the Microsoft Windows platform. The Microsoft Windows
implementations rely on standard proprietary components from Microsoft (e.g., Microsoft
Internet Information Server, Active Server Pages, and VBScript) while the Linux
implementations rely entirely on freely available components (e.g., Apache Web server, Tomcat
Java servlet host, and Java).
Provider tools were developed and modularized to support various metadata storage
architectures. Our Linux metadata provider assumes metadata items are stored in a JDBC-accessible
relational database (e.g., MySQL). Our Microsoft Windows provider tools support
three storage architectures: one where metadata resides in an ODBC-accessible relational
database like Microsoft Access or SQL Server, one where metadata resides in XML files on the
server's file system, and one using a hybrid database-file system approach. For Microsoft
Windows we also developed tools that extract metadata from <meta> elements embedded in
HTML files and from Z39.50 servers compliant with certain Z39.50 application profiles. This
latter tool, called ZMARCO, is a distinct project on SourceForge.net. While ZMARCO
demonstrates that an OAI-PMH front-end can be implemented for some Z39.50 applications, the
potential of this approach was found to be limited. Only Z39.50 implementations fully compliant
with the Bath or similar Z39.50 application profile expose enough information and
functionality to allow the add-on of an OAI-PMH gateway. Specifically the Z39.50
implementation must index and make searchable through Z39.50 a unique, persistent record
identifier (a surprisingly large number of Z39.50 implementations do not), must return MARC
format (or simple DC), must include in each returned record a date field indicating when the catalog record
was last touched (i.e., created or modified), and must allow searches that can be used to
systematically extract all records in the catalog (e.g., allow search by publication year, assign a
publication year to all records, and not have a governor on maximum number of records that will
be returned per query that is smaller than largest return for largest single publication year
search).
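
The last requirement can be stated as a simple feasibility check: partitioning the catalog by publication year only works if no single year returns more records than the server's per-query limit. The sketch below illustrates that check under stated assumptions. The function name, the per-year counts, and the limits are all hypothetical, and it does not reflect how ZMARCO is actually implemented.

```python
def plan_year_partitioned_extract(counts_by_year, max_records_per_query):
    """Return the publication years to query, one search per year.

    counts_by_year: mapping of publication year -> number of catalog records
    max_records_per_query: the server's cap on records returned per search
    Raises ValueError if any single year exceeds the cap, because records
    beyond the cap for that year could never be retrieved.
    """
    too_large = {
        year: count
        for year, count in counts_by_year.items()
        if count > max_records_per_query
    }
    if too_large:
        raise ValueError(f"Years exceeding the per-query limit: {sorted(too_large)}")
    return sorted(counts_by_year)


# Hypothetical catalog in which every record carries a publication year.
catalog_counts = {1998: 4200, 1999: 5100, 2000: 4800}

years = plan_year_partitioned_extract(catalog_counts, max_records_per_query=10_000)
print(years)
```

With a cap of 5,000 records per query the same catalog would fail the check, since the 1999 search alone would return 5,100 records.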
The baseline harvesters developed (one for each operating system platform) include extensive
feature sets that support (a) full and incremental harvests of complete and set-specific metadata;
(b) selective filtering of harvested metadata records (i.e., before saving for purposes of indexing)
based on field-specific regular expression pattern matches; and (c) harvests of records in specific
metadata schemas. Harvesting schedules and most configuration parameters are controlled
through a Web-based interface written in Java. Information about the harvesting activity and
records harvested is maintained in a relational database. The harvesters tolerate individual XML
metadata records that fail to validate during harvesting, and both can recover from short-duration,
non-repeating network and provider service failures. We also have made available
XSLT stylesheets for transforming Encoded Archival Description (EAD) metadata into OAI-compatible,
simple DC metadata. Our Linux harvesting tool was adopted for a related Mellon-funded
OAI harvesting project at the University of Michigan, and a number of optimization and reliability
improvements have been made to the harvesters in response to initial experiences and feedback
from staff involved in that project. The tool also served as a starting point for similar tools
that have been developed elsewhere (e.g., the
UCLA–Indiana University–Johns Hopkins OAI sheet music project).
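
The harvesting features listed above (batch-wise ListRecords harvesting, incremental harvests, and regular-expression filtering on a specific field) can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the project's Windows or Linux harvester: the `fetch` callable and the canned responses are hypothetical stand-ins for HTTP requests, and the sample XML is abbreviated (record headers and the oai_dc wrapper element are omitted). The resumptionToken mechanism and the `from` argument are part of OAI-PMH 2.0.

```python
import re
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def matches(record, field, regex):
    """True if any DC element named `field` in the record matches the regex."""
    return any(regex.search(el.text or "") for el in record.iter(DC + field))


def harvest(fetch, set_spec=None, from_date=None, field=None, pattern=None):
    """Yield <record> elements, following resumption tokens until exhausted."""
    args = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        args["set"] = set_spec
    if from_date:
        args["from"] = from_date  # incremental harvest: only changes since this date
    regex = re.compile(pattern) if field and pattern else None

    while True:
        root = ET.fromstring(fetch(args))
        for record in root.iter(OAI + "record"):
            if regex is None or matches(record, field, regex):
                yield record
        token = root.findtext(f".//{OAI}resumptionToken", default="").strip()
        if not token:
            break  # absent or empty token: the list is complete
        # Follow-up requests carry only the verb and the token.
        args = {"verb": "ListRecords", "resumptionToken": token}


# Canned two-batch response standing in for a provider (hypothetical data).
HEAD = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/"><ListRecords>')
TAIL = "</ListRecords></OAI-PMH>"
PAGES = {
    "first": HEAD
    + "<record><metadata><dc:title>Map of Illinois</dc:title><dc:type>image</dc:type></metadata></record>"
    + "<record><metadata><dc:title>Letter</dc:title><dc:type>text</dc:type></metadata></record>"
    + "<resumptionToken>page2</resumptionToken>" + TAIL,
    "page2": HEAD
    + "<record><metadata><dc:title>Campus photograph</dc:title><dc:type>Image</dc:type></metadata></record>"
    + "<resumptionToken/>" + TAIL,
}


def fake_fetch(args):
    return PAGES[args.get("resumptionToken", "first")]


image_titles = [
    rec.findtext(f".//{DC}title")
    for rec in harvest(fake_fetch, field="type", pattern=r"(?i)^image$")
]
print(image_titles)
```

Filtering here happens before a record is handed on for saving or indexing, which mirrors the "before saving for purposes of indexing" behavior described in item (b).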
Additional harvesting tool components were created and made available for the Microsoft
Windows platform. A general purpose OAI-PMH harvesting DLL (dynamic link library) with
API (application programming interface) was made available to facilitate creation of customized
harvesting implementations by third parties. In addition to the more complex, baseline Microsoft
Windows harvester described above, a simple command-line harvesting tool (relying on the
same DLL) was created to facilitate quick testing and harvesting of OAI-PMH provider services.
A companion command-line utility for indexing harvested metadata in Microsoft SQL Server
was also created.
Harvesting performance: Testing has shown that harvest times vary according to a few specific
parameters. Harvesting time was consistently provider- or network-limited rather than harvester-limited,
even when relatively modest harvesting hardware was used (e.g., a Pentium IV
Windows 2000 workstation). Assuming no filtering at time of harvesting, up to 10 simultaneous
harvests can be conducted from a single workstation without significant impact on performance
of any individual harvesting thread. Moderate to large blocks of records (e.g., 1,000 to 10,000
records transmitted at once) tend to reduce the time needed to harvest a collection. Thus, harvests
performed using the OAI ListRecords command (i.e., multiple records are delivered by metadata
provider in response to each harvester request) are typically an order of magnitude or more faster
than those using the OAI GetRecord command (i.e., each record being harvested is requested and
delivered individually). Our harvester can be configured to use either method. Filtering of
records at time of harvesting (i.e., selectively saving certain records returned by the provider)
tends to slow harvest times, sometimes by as much as an order of magnitude.
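
One way to see why batch size matters is to count request/response round trips. The sketch below compares the number of requests needed for a collection of hypothetical size when records are fetched one at a time versus in blocks of 1,000. It counts requests only and makes no claim about elapsed time; it also ignores the additional ListIdentifiers requests that a GetRecord-based harvest needs in order to learn record identifiers.

```python
import math


def requests_needed(total_records, records_per_response):
    """Number of request/response round trips to transfer all records."""
    return math.ceil(total_records / records_per_response)


collection_size = 100_000  # hypothetical collection size

get_record_requests = requests_needed(collection_size, 1)        # one record per request
list_records_requests = requests_needed(collection_size, 1_000)  # 1,000 records per batch

print(get_record_requests, list_records_requests)
```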
Though harvest times vary due to variations in provider-side performance, typically more than
100,000 records can be harvested in an hour. Assuming five simultaneous harvest threads, this
suggests that one workstation could easily harvest 20 million new records daily. Since few
available providers have this many records available it was not possible to verify this number
directly; however, a single-threaded harvest of the OCLC theses provider site (more than 4
million records) was accomplished in less than 24 hours. This capacity is encouraging and
implies fairly aggressive harvesting schedules, even from multiple repositories. It also suggests
considerable excess capacity for harvesting the metadata currently available in the cultural
heritage domain (at most a few million records distributed across fewer than 100 repositories).
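
The capacity estimate above can be reproduced with simple arithmetic. The calculation below is one reading of the figures, under the assumption that concurrent harvest threads scale linearly (the text reports no significant per-thread impact for up to 10 simultaneous harvests). It takes the single-threaded OCLC harvest as a lower bound on the hourly rate.

```python
HOURS_PER_DAY = 24

# Single-threaded OCLC theses harvest: more than 4 million records in under 24 hours,
# so this hourly rate is a lower bound.
single_thread_rate = 4_000_000 / HOURS_PER_DAY

threads = 5  # simultaneous harvest threads assumed in the text
daily_capacity = single_thread_rate * threads * HOURS_PER_DAY

# For comparison: the same five threads at exactly 100,000 records per hour.
conservative_daily = 100_000 * threads * HOURS_PER_DAY

print(f"{single_thread_rate:,.0f} records/hour per thread")
print(f"{daily_capacity:,.0f} records/day with {threads} threads")
print(f"{conservative_daily:,} records/day at 100,000 records/hour")
```

At a flat 100,000 records per hour, five threads yield 12 million records per day; the 20 million figure corresponds to the rate of roughly 167,000 records per hour implied by the OCLC harvest.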
Managing OAI metadata harvesting does require ongoing attention and human intervention,
although the amount of time required is decreasing as more experience with the protocol is
gained and providers become more robust and reliable. Once initial test harvest of a site has been
performed and the site has been incorporated into the established harvesting schedule, we've found that
less than one day of staff time per week is needed to deal with anomalies and other problems that
arise even for reasonably ambitious operations harvesting 30 to 50 sites. Some ongoing effort
will always be required to deal with failed harvest jobs, update harvesting schedules and
parameters over time, and identify new sites that should be harvested. The last task is currently
the most open-ended since the OAI-PMH community has not yet identified a reliable and systematic
way to locate new, potentially relevant provider services.
Indexing and search system technologies: While preliminary estimates of harvesting scalability
are encouraging, there remain scalability issues associated with trying to index and effectively
search very large collections of aggregated metadata. We are using DLXS/XPAT software
(version 10) created and maintained by the University of Michigan, currently installed on our
dual Pentium IV Linux server, as the indexing and search engine for our aggregated metadata.
Though some compromises have been required, this indexing and search application has proven
generally very robust and capable. For our application, the only serious limitation discovered to
date has been the limitation on result set size that the application will sort (currently 2,000
records). Improvement in this functionality over time is anticipated. For comparison and in order
to explore other indexing and search features, we have also indexed harvested metadata in
Microsoft SQL Server. This application has certain advantages, especially in ranking and certain
full-text truncation and adjacency search features, but single-server implementations can't search
large sets (e.g., over a million metadata items) as fast. Our judgment is that while very large-scale
OAI-PMH metadata harvesting services today are straining the limits of current state-of-the-art
indexing and search tools, continued progress and improvement in index capacity seem
likely. For some OAI-PMH implementations search-engine limits will be a consideration, but
over time this should be less of an issue, and is already not an issue for more modest OAI-PMH
projects.
B. Efforts to integrate EAD metadata with DC metadata
As noted above, we obtained 8,730 EAD finding aid files describing cultural heritage collections
held by 57 different institutions. Though the EAD schema supports linkages to digitized content,
the vast majority of the resources described by finding aids available today are accessible in
analog format only – i.e., are available only in hardcopy formats at the holding institutions. In
order to index EAD metadata alongside metadata harvested in simple DC and other item-level
metadata schemas like MARC, we developed algorithms to tease out from collection-level EAD
finding aids item-level metadata records describing attributes of individual items in the
collections defined. These item-level metadata records were then made available through
surrogate OAI-PMH provider services on University of Illinois servers, harvested, and
incorporated in our baseline metadata aggregation and search service.
Analysis of EAD schema and implications for use with OAI-PMH: The Encoded Archival
Description (EAD) schema is one of the most widely used metadata formats employed by
cultural heritage and archives projects. It is used to encode finding aid level information about a
wide range of collective resources (e.g., archival manuscript holdings). EAD acts as a wrapper
for collective archival description, but it does so with differing levels of specificity depending on
the nature of the collection and the finding aid. Archival practice in constructing finding aids
varies widely from institution to institution, and EAD was designed to accommodate differences
while encouraging as much uniformity as possible by standardizing commonly used data
elements. There may be hundreds or even thousands of individual item-level information
resources described in a single EAD finding aid. These aspects of the EAD metadata schema
present challenges when aggregating EAD metadata with DC and MARC metadata as in an OAI-PMH
context. We explored several options for transforming EAD records for use with OAI-PMH
and developed and performed proof-of-concept testing of a procedure for creating useful,
item-level simple DC metadata records from native EAD records.
An EAD record has two main components: (1) metadata about the finding aid (i.e., the electronic
document describing the collection), and (2) metadata about the collection described in the
finding aid. Metadata about the finding aid is contained within required elements nested within
the high-level element <eadheader>. The information encoded in <eadheader> provides
summary information about the finding aid including title, author, and date of creation. Metadata
about the collection described in the finding aid is encoded in the high-level element <archdesc>.
This element usually comprises the bulk of the finding aid and may include numerous sub-elements,
most of which are repeatable. In general, however, <archdesc> uses two types of
elements—those that describe the collection as a whole (generally immediate children of
<archdesc>), and those that describe subordinate components of the collection, such as a series
of files, an individual file, or an item (generally children, often multiple times removed, of the
<dsc> element, which is itself an immediate child of <archdesc>).
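
The structure just described can be made concrete with a small example. The sketch below parses a tiny, made-up finding aid (real EAD files are far larger and richer) and separates the three kinds of information: metadata about the finding aid, collection-level description, and subordinate components. The sample content is invented for illustration; the element names used (eadheader, archdesc, dsc, did, unittitle, and numbered components such as c01 and c02) are standard EAD elements.

```python
import re
import xml.etree.ElementTree as ET

# Tiny, made-up finding aid used only to illustrate the structure.
SAMPLE_EAD = """<ead>
  <eadheader>
    <eadid>1205</eadid>
    <filedesc><titlestmt><titleproper>Guide to the Smith Papers</titleproper></titlestmt></filedesc>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Smith Family Papers</unittitle></did>
    <scopecontent><p>Correspondence and diaries.</p></scopecontent>
    <dsc>
      <c01 level="series"><did><unittitle>Correspondence</unittitle></did>
        <c02 level="file"><did><unittitle>Letters, 1901</unittitle></did></c02>
      </c01>
      <c01 level="series"><did><unittitle>Diaries</unittitle></did></c01>
    </dsc>
  </archdesc>
</ead>"""

root = ET.fromstring(SAMPLE_EAD)

# (1) Metadata about the finding aid itself, from <eadheader>.
finding_aid_title = root.findtext("eadheader/filedesc/titlestmt/titleproper")

# (2a) Collection-level description: immediate children of <archdesc> other than <dsc>.
archdesc = root.find("archdesc")
collection_level = [child.tag for child in archdesc if child.tag != "dsc"]

# (2b) Subordinate components: <c> or <c01>..<c12> descendants of <dsc>, in document order.
component_tag = re.compile(r"c(0[1-9]|1[0-2])?")
components = [
    el.findtext("did/unittitle")
    for el in archdesc.find("dsc").iter()
    if component_tag.fullmatch(el.tag)
]

print(finding_aid_title)
print(collection_level)
print(components)
```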
The EAD Application Guidelines include two recommended mappings to Dublin Core, one for
the finding aid and another for the resources described in the finding aid. In EAD, metadata
about the finding aid is encoded in the <eadheader>, but while mapping <eadheader> attributes
to DC can be useful, attributes of the individual resources contained in the collection described
by the finding aid generally are not included in this mapping. Researchers searching a broad
item-level metadata aggregation resource are likely to be more interested in the individual
information resources described in the finding aid than in the finding aid itself. Mapping
attributes from the <archdesc> is necessary to capture this information, but the EAD Application
Guidelines recommended mapping, which is a one-to-one mapping only, focuses entirely on the
top-level elements of the <archdesc> node. As a result, most pertinent information about the
individual items in the collection described is not included in the simple DC record created by
this mapping. Neither mapping given by the EAD Application Guidelines will generate records
adequate for use in a metadata aggregation containing a large number of DC and MARC
item-level metadata records. We therefore chose to create an alternative, one-to-many mapping,
that mapped both top-level and <dsc> child elements of <archdesc> to DC.
The issue of where (provider or harvester side) to transform XML metadata in EAD into Dublin
Core remains unresolved. An advantage of mapping between XML metadata schemas on the
harvesting side is that the harvester would have the option to display metadata records in native
context. Thus, while item-level records generated from an EAD finding aid might be the most
useful for indexing, search, and discovery, it can be desirable for the end user to be able to view
(on the harvesting service site) item-level records found in the context of the original EAD
finding aid from which they were derived. This is the model we investigated.
Implementation Details: In essence, we decided that in order to manipulate and search the
components of an EAD file, it would be necessary to produce many simple DC item-level
records from each EAD source. We did these transformations using XSLT (available from
http://uilib-ead.sourceforge.net/). For each EAD file, one XSLT stylesheet produced a "top level"
DC record containing the collection level description, and a second XSLT stylesheet generated a record for
each child node of the EAD description of subordinate components (<dsc>) element. Such an
approach runs the risk of losing context for the information drawn from the source EAD file. To
mitigate this potential problem, we provided enough information to allow our search and
discovery service to reconstruct and display hits in the context of the source EAD files. Each
item-level DC file also includes a relation element providing a link to the parent node of the derived
record in the original EAD file. Location of the hit in the original EAD file is retained implicitly,
rather than explicitly, using XPointer syntax. XPointer is a current recommendation of the World
Wide Web Consortium, and provides syntax for identifying XML fragments using a superset
of the XPath syntax. The identifier of each Dublin Core encoded metadata record produced
from a subordinate component node within an EAD file points to the exact node of the EAD file
which is being described by the DC record. For example, an identifier that reads:
<dc:identifier>http://xxx/1205.xml#xpointer(//dsc[1]/c01[7])</dc:identifier>
points to the seventh <c01> element within the first <dsc> element in the file located at
http://xxx/1205.xml.
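As an illustration only, the one-to-many derivation described above can be sketched in Python. This is not the project's XSLT. It assumes a non-namespaced EAD file with a single <dsc> element and handles only first-level <c01> components; the function name, the dictionary keys, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature EAD file, used only to exercise the sketch.
SAMPLE_EAD = (
    "<ead><archdesc><did><unittitle>Smith Family Papers</unittitle></did>"
    "<dsc>"
    "<c01><did><unittitle>Box 1</unittitle></did></c01>"
    "<c01><did><unittitle>Box 2</unittitle></did></c01>"
    "</dsc></archdesc></ead>"
)


def derive_records(ead_xml: str, base_url: str) -> list[dict]:
    """Return one top-level record plus one record per <c01> under each <dsc>.

    Assumption: the document order of <dsc> elements matches the XPath
    position used in the XPointer, which holds when there is a single <dsc>.
    """
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")

    # Top-level (collection-level) record.
    records = [{
        "identifier": base_url,
        "title": archdesc.findtext("did/unittitle", default="").strip(),
        "relation": None,
    }]

    # One record per first-level component, identified by an XPointer.
    for d, dsc in enumerate(root.iter("dsc"), start=1):
        for c, comp in enumerate(dsc.findall("c01"), start=1):
            records.append({
                "identifier": f"{base_url}#xpointer(//dsc[{d}]/c01[{c}])",
                "title": comp.findtext("did/unittitle", default="").strip(),
                # Link back to the parent node in the source EAD file.
                "relation": f"{base_url}#xpointer(//dsc[{d}])",
            })
    return records
```

Calling derive_records(SAMPLE_EAD, "http://xxx/1205.xml") yields three records: one for the collection and one for each box, each carrying an identifier in the form shown in the example above.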
The file mapping that we developed, including the calculation of XPointers, was applied
to 8,730 EAD finding aids contributed by 57 institutions. To allow for testing, these finding aids
were aggregated into multiple surrogate OAI repositories (by institution) which were then
harvested by our OAI harvester. OAI records for both the top level and the component levels of
the finding aid were harvested by our OAI harvester. The procedure produced 8,730 top level
records and 1,515,595 component level records.
The XPointer created is only useful assuming the original EAD file is an accessible XML
resource with a persistent identifier, and assuming scripts are in place that can use the XPointer
provided to locate the correct spot in the EAD finding aid file. Though XPointer is now a
recommendation of the W3C, there is little in the way of off-the-shelf software that understands
XPointer syntax. We created our own server-side scripts (see Figure 3) to utilize XPointer strings
generated in deriving item-level metadata records from EAD.
Figure 3: In-context result of search for "gun control"
(For more details on how this process is currently implemented for use in our search portal, see
the JCDL 2002 paper, available at http://dli.grainger.uiuc.edu/publications/jcdl2002/p14-prom.pdf,
which was also included in the supplemental materials to the interim report.)
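A minimal sketch of how a script might use such an identifier to locate the described node is given below. It is not the project's server-side code. It assumes the simple xpointer(//dsc[n]/c01[m]) form shown earlier and a non-namespaced EAD file, and it relies on the limited XPath subset supported by Python's xml.etree.ElementTree. The function names and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature EAD file, used only to exercise the sketch.
SAMPLE_EAD = (
    "<ead><archdesc><dsc>"
    "<c01><did><unittitle>Box 1</unittitle></did></c01>"
    "<c01><did><unittitle>Box 2</unittitle></did></c01>"
    "</dsc></archdesc></ead>"
)

# Matches identifiers of the form  <url>#xpointer(<path>)
XPOINTER_RE = re.compile(r"^(?P<url>[^#]+)#xpointer\((?P<path>.+)\)$")


def split_identifier(identifier: str) -> tuple[str, str]:
    """Split a dc:identifier into the EAD file URL and the XPath expression."""
    match = XPOINTER_RE.match(identifier)
    if match is None:
        raise ValueError(f"not an xpointer identifier: {identifier!r}")
    return match.group("url"), match.group("path")


def resolve_in_document(ead_xml: str, path: str):
    """Return the element addressed by a '//'-style path, or None if absent."""
    if not path.startswith("//"):
        raise ValueError("this sketch only handles paths beginning with '//'")
    root = ET.fromstring(ead_xml)
    # ElementTree rejects absolute paths, so '//dsc[1]' becomes './/dsc[1]'.
    return root.find("." + path)
```

For example, resolve_in_document(SAMPLE_EAD, "//dsc[1]/c01[2]") returns the second <c01> element, from which the surrounding context can then be rendered.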
In general, the simple DC metadata records we derived from the child nodes of EAD finding aid
<dsc> elements are very brief, typically containing only short, descriptive title and/or name
information and little if anything else additional. DC type information may or may not be present
(either explicitly or implicitly), depending (apparently) on whether the EAD author thought it
necessary to label the content type of subordinate components. In the context of an overall
finding aid, brief descriptive nodes within the <dsc> element make sense, but when indexed and
searched alongside richer records, such brief records may be under-represented in search results.
Conversely, each EAD finding aid, on average, generated large numbers of simple DC metadata
records. Depending on how the EAD finding aid was created, many of the simple DC records
derived from a given EAD finding aid may be redundant to one degree or another (e.g., many
derived DC records from a given finding aid may repeat the same name). Since the nodes from
which they are derived were intended to be interpreted in context, the descriptive titles assigned
to subordinate component nodes may contain mostly common words. For these reasons, some
searches will retrieve a disproportionate number of metadata records derived from a single EAD
finding aid. Potentially this difficulty could be obviated by enhancements to the search and
retrieval system that merge for presentation purposes those retrieved metadata records coming
from a single EAD finding aid file. We did not have sufficient resources to pursue this approach.
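The merging idea mentioned above, which was not implemented in this project, could take roughly the following form. The sketch assumes that each hit is represented by its identifier string and that records derived from the same finding aid share the URL portion before the "#" character; the function name is hypothetical.

```python
def group_hits_by_source(identifiers: list[str]) -> dict[str, list[str]]:
    """Group hit identifiers by the source file they were derived from.

    Keys are source URLs in first-seen order; values are the fragment
    parts (for example 'xpointer(//dsc[1]/c01[7])'), or '' when the
    identifier has no fragment.
    """
    groups: dict[str, list[str]] = {}
    for identifier in identifiers:
        source, _, fragment = identifier.partition("#")
        groups.setdefault(source, []).append(fragment)
    return groups
```

A results page could then show one entry per key, with a count of matching components, instead of one entry per derived record.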
While further refinements to the transformation stylesheets developed for this project remain to
be made, work done to date demonstrates an ability to transform between XML metadata
formats, such as EAD and DC, and the potential to utilize a variety of metadata formats with
OAI, at least from a technical perspective. However, for more effective interoperability across
fundamentally different metadata schemas such as EAD and DC, communities of practice need
to become more cognizant of the differences between them, and adopt best practices that better
facilitate interoperability where possible to do so.
C. Creation, development, and testing of search portal
The metadata records harvested for this project were extensive in breadth, featuring highly
heterogeneous content that originated in different communities. These traits allowed us to
examine selected questions about how a harvesting service can best process metadata and present
it to the end user. Generally this work suggests that metadata authors need to be more cognizant
of best practices for interoperability when creating metadata. While a few DC elements (e.g.,
date and type) lend themselves to normalization that can improve discoverability for end users,
normalization of more complex DC elements (e.g., subject and description) is not practical.
Better consensus in the community on how to create metadata is needed.
Design of a portal for effectively searching harvested metadata in this heterogeneous domain is a
challenge. We approached this as an iterative process. Initial search portal design was repeatedly
refined based on feedback from librarians and end users. Optimal indexing and search portal
design features vary to a significant extent with type of end-user. We undertook limited in-depth
testing of our aggregated metadata search service with one user population (middle and high
school student teachers) and identified several desirable interface features in that context.
Metadata Authoring: Metadata authoring practices, including decisions on how to map into DC,
which DC elements to use and how, which controlled or local vocabularies to implement, and
how deeply to describe resources, have an impact on both the discoverability and usefulness of
the metadata within an aggregated resource like ours. In a paper presented at the 2002 Museums
and the Web Conference ("Now That We’ve Found the ‘Hidden Web,’ What Can We Do With
It? The Illinois Open Archives Initiative Metadata Harvesting Experience"), we documented the
range of variations in the metadata we harvested. A more complete tabulation of the frequency
with which individual Dublin Core fields are used in each participating repository’s metadata,
and the frequency with which fields are repeated within records (on average), is discussed in a
project white paper entitled “Analysis of Dublin Core Field Use by Repository Harvested.” This
report was included as Appendix H of this project's interim report and is available on the project
Website <http://oai.grainger.uiuc.edu/projectinfo.htm>. DC element usage varied greatly, even
within communities that had similar cataloging traditions (e.g., from library to library). Many
libraries in particular provided metadata records that were much less detailed and informative
than the records these same institutions create when cataloging books or other traditional analog
information resources. This variability in depth of description and in use of DC elements can
create problems when searching and when determining how best to display metadata records.
Similar variations in controlled vocabularies used to describe resources make consistent and
thorough collocation of like items difficult. We experimented with normalization techniques for
specific fields to minimize the impact of these variations on searching and presentation (see
following section) which partially (but only partially) offset some of these limitations.
In addition to issues that arose because of differences in the ways communities of practice used
(or failed to use) certain DC elements, differences in exactly what was being
described by the metadata records harvested also affected the searchability of the metadata aggregation.
As noted above, we harvested metadata describing both analog and digital resources. What to
describe when creating metadata records for resources that only exist in analog format or only
exist in digital format is relatively straightforward. Less clear cut is what to describe when
creating metadata for an analog resource that has also a digital representation – e.g., an artifact
for which a digital image exists. Practice is inconsistent in such cases. Some institutions provide
metadata records that describe the analog object and make only passing reference to the digital
surrogate (i.e., by providing a URL to the image of the artifact). Other institutions provide
metadata records that describe primarily the digital representations of the artifact and only briefly
mention the artifact (e.g., in the source field of the records). Figures 4 and 5 show excerpts from
metadata records provided by two different museums.
Description: Digital image of a single-sized cotton coverlet for a bed with
embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in.
Markings: top right hand corner has 1 ½ in. x ½ in. label cut outs at upper left and
right hand side for head board; fabric is woven in a variation of a rib weave; color
each of yellow and gray; hand-embroidered cotton butterflies and flowers from
two shades of each color of embroidery floss – blue, pink, green and purple and
single top 20 in. bordered with blue and black cotton embroidery thread; stitches
used for embroidery: running stitch, chain stitch, French knot and back stitches;
selvage edges left unfinished; lower edges turned under and finished with large
gray running stitches made with embroidery floss.
Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5;
300 dpi; 21-53K bytes. Available via the World Wide Web.
Coverage: —
Date Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 200104-05; Created: 1912-1920?
Type: Image
Figure 4: Excerpt of record describing image of a quilt
Description: Materials: Textile--Multi, Pigment—Dye; Manufacturing Process:
Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen
coverlet, worked in overshot weave in plain geometric variant of a checkerboard
pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed
linen, woven with considerable skill. Although the pattern is simpler, the overall
craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99. This
coverlet is an example of early "overshot" weaving construction, probably dating
to the 1820's and is not attributable to any particular weaver. -- Georgette
Meredith, 10/9/1973
Source: —
Format: 228 x 169 x 1.2 cm (1,629 g)
Coverage: Euro-American; America, North; United States; Indiana? Illinois?
Date: Early 19th c. CE
Type: cultural; physical object; original
Figure 5: Excerpt of record describing a quilt for which digital image exists
Metadata Normalization: Normalization can prove an effective means to provide context
internally and to enhance discoverability of metadata records in a cross-collection repository;
however, it is hard to do in automated fashion for some metadata fields. After considering all the
DC elements, we investigated the potential to normalize the content of the Type, Date, Coverage,
and Format DC elements as most amenable to normalization. For our metadata aggregation we
found that normalization of Type and the temporal aspects of Date and Coverage was beneficial.
Format was not normalized, for reasons discussed in the white papers on normalization included
in Appendix D of this project's interim report.
To effectively normalize metadata, it was necessary to:
• Understand how the element was interpreted by metadata providers and which elements
in other metadata formats were mapped to the Dublin Core element.
• Identify which, if any, vocabularies were used by data providers for these fields.
• Determine whether there was an existing controlled vocabulary that could be successfully
applied to all metadata providers, or, if not, create a vocabulary specific to our repository.
• Apply normalized vocabulary to the metadata to augment the ‘native’ vocabulary.
• Build mechanisms into the search interface that would take advantage of the
normalization.
• Measure how well normalization improved the end user’s ability to discover resources.
These goals translate into a five-step process:
1. Extract and analyze the element values (content).
2. Determine how each element is interpreted and what controlled vocabulary, if any, is
used.
3. Determine focus and vocabulary for normalization.
4. Normalize the data.
5. Provide services in the portal based on the normalization process.
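To make step 4 concrete for the temporal aspect of the Date element, the following sketch maps a few free-text date values to a (start year, end year) pair. The rules here are illustrative assumptions, not the project's actual normalization rules, which are documented in the white papers. In particular, treating the "19th c." as 1801 to 1900 is one convention among several, and the function name is hypothetical.

```python
import re

# A range such as "1912-1920?"
YEAR_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")
# A century such as "Early 19th c. CE" or "19th century"
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\s+c(?:\.|entury)", re.IGNORECASE)
# A single four-digit year, possibly inside a longer timestamp
YEAR = re.compile(r"\b(\d{4})\b")


def normalize_date(value: str):
    """Return (start_year, end_year) for a free-text date, or None."""
    match = YEAR_RANGE.search(value)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = CENTURY.search(value)
    if match:
        century = int(match.group(1))
        return (century - 1) * 100 + 1, century * 100
    match = YEAR.search(value)
    if match:
        year = int(match.group(1))
        return year, year
    # Values such as "20011107162451" or "n.d." are left unnormalized.
    return None
```

A normalized pair of this kind can be stored alongside the provider's original value, so that the portal can offer date-range limits without discarding the ‘native’ form.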
Normalization white papers describing this process for Type element normalization and
Date/Coverage element normalization were included in Appendix D of the interim report
supplemental materials volume and are available on the project Website
<http://oai.grainger.uiuc.edu/projectinfo.htm>. The paper presented at the 2002 Museums and
the Web Conference ("Now That We’ve Found the ‘Hidden Web,’ What Can We Do With It?:
The Illinois Open Archives Initiative Metadata Harvesting Experience") also provides more
information about the normalization procedures we applied.
Currently, the repository contains metadata for which the Date and Coverage elements have been
normalized. The Type normalization process was determined to be redundant with the way we
now arrange our collection of aggregated metadata for searching and browsing. (After the last portal
restructuring, indexes are based on type of material as analyzed top down by collection.) This
approach to grouping like materials together for browse and search (rather than requiring explicit
entry of a type search string) is both more time efficient and more global, as some data providers
do not use the Type element. The degree to which end users will find this a more useful way to
present our metadata aggregation needs further measurement.
Initial Search Portal Interface Design: During the creation and development of our portal
(http://nergal.grainger.uiuc.edu/search/), we focused on how to provide searching capability and
present aggregated and heterogeneous metadata in a useful way. We examined basic interface
features and usability issues that arise when constructing such a portal. Scenario-based design
techniques (from the work of J.M. Carroll and others) were used during the initial planning of the
portal’s interface. Scenario-based design is an iterative process that takes as its starting place an
analysis of the work that targeted end users will conduct using the system. Several possible
scenarios were constructed which focused on a variety of end users. These decisions impacted
both the preliminary display of the aggregated repository, as well as the search results display
options. Basic decisions included the renaming of Dublin Core elements to more commonly used
terms ("Creator" to "Author/Artist") and providing a Google-like simple search entry option.
Recognizing that the portal provides a gateway to selected resources owned by a large variety of
institutions, we were concerned with how to best present the metadata within the context of its
owning institution. We have attempted to allow end users to easily move from the metadata to
the owning institution or to the digital object itself (if available) by providing hyperlinks to
collections as well as online access links in the metadata. In addition we have added an About
Collections page that describes each collection and links to the contributing institution's Website.
We initially provided a list of data providers as part of the initial results list (on the left-hand side
of the screen), but found that this had two problems. The first was that the screen became
cluttered and confusing to the end user. We conducted a series of preliminary usability tests with
librarians at the University of Illinois and with interface design students in the Graduate School
of Library and Information Science (GSLIS). The second group emphasized in particular their
preference for a clean and uncluttered display.
The second interface problem was that users had difficulty interpreting what the list of
institutions meant. Users did not intuit that the repository aggregated resources from many
different institutions. As a result of these preliminary usability tests, we removed the list of
contributing data providers from the left-hand side and moved instead to an optional grouping
(pull down menu) of search results by types of resource described (e.g., text, image, audio,
archival, physical objects).
Targeted Usability Testing: We followed up initial usability testing with librarians, library
school faculty, and library science students with more in-depth usability testing with a group of
23 college students training in an honors-level curriculum and instruction course to become
middle school and high school social studies teachers. We chose this group because (1) we did
not have the resources to identify and study working educators; (2) the professor and students of
this class were willing and in fact eager to participate; and (3) these users were already
comfortable using the Internet. During the fall of 2002, students were assigned to use the UIUC
Digital Gateway to Cultural Heritage Materials to find primary sources for a lesson plan on a
specific social sciences topic, and then to submit short papers about their experience.
For purposes of this test, we created a duplicate portal for use by these students and we provided
them with a unique URL. This enabled us to conduct a transaction log analysis after the test.
Before beginning the assignment, users were introduced to the concept of metadata aggregation
and were informed that the search portal would provide pointers to digital content held
elsewhere. They were also told that some records referred to analog resources. After the students
completed the assignment, we conducted focus group interviews. These interviews were taped,
transcribed, and coded. We also received copies of the students’ papers (with names removed);
however, because the papers reiterated comments made during focus groups, we did not code
them.
We found that, despite their prior introduction to the nature of the portal, in practice the test
group expected all records to point directly to corresponding digital objects. They reported
feelings of frustration in finding analog resources when they expected digital resources. This was
exacerbated by the large number of item-level records derived from EAD files that described
analog resources. Thus, a user who selected a result for “letters from a WWI soldier” might find
that the record referred to the holding institution’s finding aid instead of to the letters themselves.
Likewise, they reported a significant slowing of their efforts when the pointers (the URLs within
the record) went to a top-level or intermediate page, where they might have to resubmit their
request using the institution’s own search engine.
The lack of a ranking facility in our portal resulted in the test group feeling overwhelmed by the
quantity of unsorted results. Because of the lack of consistent metadata caused by variations in
controlled vocabularies and disparities in the use of DC, we had enabled greater recall by
designing the default search screen as a keyword search on all elements. This exacerbated the
lack of a ranking facility. In an attempt to address these known limitations we provided an
advanced search screen, which included standard methods for refining a search, such as
restricting searches to specific groups of fields and setting limits. However, the test group seldom
used the advanced search tools, and the few users who did attempt to refine their searches were
unfamiliar with the types of entries required by metadata fields like “Format.” This suggests that
a robust ranking facility is of great importance.
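To illustrate what even a very simple ranking facility might look like, the sketch below orders keyword hits by how often the query terms occur, counting matches in the title more heavily. The portal described here had no such facility. The record structure (a dictionary mapping field names to lists of strings), the weight of 3 for the title field, and the function names are all assumptions made for the example.

```python
def score(record: dict, query_terms: list[str], title_weight: int = 3) -> int:
    """Count query-term occurrences across all fields, weighting the title."""
    total = 0
    for field, values in record.items():
        weight = title_weight if field == "title" else 1
        for value in values:
            words = value.lower().split()
            total += weight * sum(words.count(term) for term in query_terms)
    return total


def rank(records: list[dict], query: str) -> list[dict]:
    """Return matching records, highest score first; non-matches are dropped."""
    terms = query.lower().split()
    scored = [(score(record, terms), record) for record in records]
    hits = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so ties keep their original order.
    hits = sorted(hits, key=lambda pair: pair[0], reverse=True)
    return [record for _, record in hits]
```

With this scheme a record titled "Letters from a soldier" would appear ahead of a record titled "Box 23" whose description merely mentions letters.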
We also found that the test group accorded equal credibility to all contributing collections. They
reported that they made no decisions about which items to examine based on the name of the
holding institutions. Feelings of frustration around failed searches were directed at the search
portal rather than at individual institutions. Thus, users held the portal responsible for the
usability of its aggregated metadata, even when that metadata originated elsewhere and remained
outside the control of the Illinois project.
As a result of these findings, we eliminated EAD records from the UIUC Gateway to Cultural
Heritage Resources. We are currently investigating creating a portal that is specific to EAD
finding aids that will enable further research into the use of EAD with OAI-PMH. In addition,
the tests led to several changes in the interface. We combined the simple and advanced search
screens, improved labeling, and combined several resource-type categories into a simpler set of
options (see Figure 6).
Figure 6 – Revised search screen
The single Online Access Available link in search results was replaced by two more specifically worded links. (1) View Item was applied to resources that are directly viewable online from the
search result. (2) Learn more about this item was applied to results that would lead the user to a
collection’s web site or to descriptive information about the resource. We also attempted to
clarify for users which resources offered direct online access and which did not (see Figure 7).
Unfortunately, time did not allow us to conduct a second assessment of the portal’s usefulness
once these changes were made. However, we continue to work with OAI-enabled aggregations
of metadata, and we expect that future research will build on the baseline work done here.
Figure 7 - Revised wording in search results
Implications for Best Practices: Based on our experience, we feel able to make some
observations about what would constitute best practices for both data providers and harvester
services.
Quality and consistency of metadata provided by metadata providers is key. We suggest that
providers tend carefully to the task of assigning metadata to resources. Community metadata
standards and controlled vocabularies should be adhered to. In addition, metadata providers need
to have a clear understanding of the purpose of the metadata so that it can be valuable to both the
local user community and to a wider audience.
We also suggest that metadata providers utilize the option to divide their metadata into sets, as
provided by the OAI-PMH. While there are any number of logical sets, we have found the most
useful to be by subject area, sub-collection, or type of material. Metadata aggregators may also
want to use sets to indicate the institutions included within their collection. However, this
division of the collection may not be as useful for the end user.
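For readers unfamiliar with how sets appear on the wire, the sketch below builds the OAI-PMH request URLs a harvester would issue to discover a provider's sets and then to harvest one of them selectively. The verbs and the metadataPrefix, set, and from arguments are defined by the protocol; the base URL and the set name used in the usage note are hypothetical, as are the function names.

```python
from urllib.parse import urlencode


def list_sets_url(base_url: str) -> str:
    """URL asking a provider which sets it exposes."""
    return f"{base_url}?{urlencode({'verb': 'ListSets'})}"


def list_records_url(base_url: str, set_spec: str = "", from_date: str = "",
                     metadata_prefix: str = "oai_dc") -> str:
    """URL for harvesting records, optionally limited to one set and/or to
    records changed on or after from_date (YYYY-MM-DD)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"
```

For a hypothetical provider at http://example.org/oai that exposes a set named images, list_records_url("http://example.org/oai", set_spec="images") requests only the records in that set.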
OAI harvesting services can be used for resources intended to be shared among narrowly defined
community groups. They can also enable a more broad-reaching portal that serves a variety of
users. In either case, strategies for scheduling regular harvests and indexing of metadata need to
be established. These strategies should take into consideration the frequency of updates made to
the metadata provider sites as well as the amount of post-harvesting processing the data will
require. While OAI minimizes the need for communications between metadata provider and
harvesting service, discussions of scheduling issues with providers are desirable.
A clear and obvious finding of our work is that, while the OAI-PMH itself is readily
implemented, the challenges posed by large amounts of heterogeneous metadata are significant.
Certainly the application of more sophisticated pre-processing tools as well as robust, scalable
search tools and ranking of results would make the portal a more effective tool for users. Other
options include the development of thematic exhibits (based on human and/or machine analysis
of metadata) that would offer glimpses into the range and type of materials available, and
offering users the ability to annotate individual records to highlight particularly useful resources.
Providing a quick-browse feature to give users a preview of what is — and is not — available in
the portal would make it more useful to educators. In general, the interface challenge extends to
any tools that help adjust user expectations.
The inclusion of EAD finding aids and their decomposed item-level records was an obstacle for
these users. They did not understand why the records were included and were confused by
opaque labels, such as “Box 23.” Several members of the test group commented that finding aids
may be useful for researchers or scholars but not for educators. As a result of the test, we
eliminated EAD records from the UIUC Gateway to Cultural Heritage Resources. We are
currently investigating creating a portal that is specific to EAD finding aids that will enable
further research into the use of EAD with OAI-PMH.
D. Sustainability
Research conducted during the course of this project has informed multiple follow-on activities
at Illinois. Two aspects of sustainability are considered here: i) maintenance and continued
development of the prototype portal for searching aggregated metadata describing cultural
heritage resources; and ii) identification and exploration of additional, more in-depth research
relating to aspects of OAI-PMH.
Continuation of prototype search portal: Consistent with results from several other OAI-PMH
projects, our research confirmed that an OAI-PMH based harvesting service is relatively easy
and inexpensive to implement. Metadata provider services are within the means of many if not
most libraries and museums to implement as an adjunct to the Web services they already offer.
As more and more OAI-PMH freeware becomes available and as the protocol is integrated into
commercial products, basic metadata harvesting services also will be increasingly within the
means of many academic libraries to implement and maintain. As this project concluded, we made
changes in our prototype portal interface design, scope, coverage, and contents to facilitate
sustainability. Most notably, we decided to focus attention for the version of the portal that will
be continued on resources and a search interface design appropriate for supporting curricular
needs (middle school through undergraduate). Also, the prototype portal developed as part of this
project now focuses exclusively on metadata describing readily available digital cultural heritage
resources and aggregates only metadata harvestable via OAI-PMH. This reduced the total number of
metadata providers to 25 and the total number of metadata records indexed to approximately
412,000 as of June 2003.
Incremental harvesting of provider services is done every three weeks, and full harvesting of
each provider service is done quarterly. Scripts have been written to make routine the steps of
harvesting, limited refinement of harvested metadata (e.g., date normalization), and metadata
indexing. The Grainger Engineering Library Information Center has dedicated server space and
software and ongoing resources equivalent to approximately 0.25 FTE for staff who will monitor
harvesting processes (resolving issues as they arise and adding additional provider services that
come on line as able).
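The harvesting schedule described above can be expressed as a small decision function. This is a sketch of the scheduling logic only, not the project's scripts. It assumes that "quarterly" means 91 days, that an incremental harvest asks for records changed since the most recent harvest of either kind, and that the harvester keeps the dates of its last incremental and last full harvest for each provider; all names are hypothetical.

```python
from datetime import date, timedelta

INCREMENTAL_INTERVAL = timedelta(weeks=3)   # "every three weeks"
FULL_INTERVAL = timedelta(days=91)          # assumed meaning of "quarterly"


def plan_harvest(today: date, last_incremental: date, last_full: date):
    """Decide what, if anything, to harvest from one provider today.

    Returns {'kind': 'full', 'from': None}, or
            {'kind': 'incremental', 'from': 'YYYY-MM-DD'}, or None.
    """
    if today - last_full >= FULL_INTERVAL:
        return {"kind": "full", "from": None}
    last_any = max(last_incremental, last_full)
    if today - last_any >= INCREMENTAL_INTERVAL:
        # The 'from' value would be passed as the OAI-PMH from argument.
        return {"kind": "incremental", "from": last_any.isoformat()}
    return None
```

In practice the intervals would be tuned per provider, depending on how often the provider updates its metadata and how much post-harvest processing its records need.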
This maintained, live testbed of aggregated metadata, regularly refreshed, will be available to
support ongoing research into metadata and metadata harvesting services. For instance, metadata
from this harvesting service will be among the resources used by faculty at the Graduate School
of Library and Information Science at Illinois to study to what extent and in what ways
metadata providers modify their metadata records over time. Though not yet containing a
sufficient critical mass of metadata content to qualify as a major search utility for the casual
searcher, our portal will be a resource for those interested in more comprehensively identifying
online cultural heritage resources. As additional content repositories come online as OAI-PMH
metadata providers, the portal's importance in this regard will increase.
Related new research: Several internally and externally funded follow-on research activities
have been undertaken at Illinois or are planned for the near future.
We continue to study the integration of EAD and OAI. We are maintaining and doing further
research to upgrade and extend the stylesheets developed for this project to transform EAD
finding aids into multiple item-level records more suitable for use with OAI-PMH. We are also
looking at how proposed changes in EAD authoring practice and in the EAD standard itself
might affect the ability to integrate EAD metadata with item-level metadata records such as those
harvested using OAI-PMH. We will continue to provide feedback to the archival community
regarding these issues. This work is centered within our University Archives unit, which also has
ongoing research efforts looking at, among other things, the potential uses of OAI-PMH in the
context of statewide and national information resource archiving and aggregating initiatives.
We have migrated technologies and operational lessons from our Mellon-funded research into
other subject domains of interest. Using the harvesting tools we developed to harvest metadata
describing cultural heritage resources, the Grainger Engineering Library Information Center has
created and is maintaining an OAI-PMH based portal for searching aggregated metadata
describing physical science and engineering academic and research information resources. This
project has provided an opportunity to experiment with Microsoft SQL Server as an alternative
tool for searching metadata harvested using OAI-PMH. (See
http://g118.grainger.uiuc.edu/engroai/)
Later this summer we will begin a three-year collaboration with the research libraries of 9 other
Committee on Institutional Cooperation (CIC) institutions to study the potential of OAI-PMH to
facilitate resource sharing within an academic consortium like CIC. The University of Illinois
Library at Urbana-Champaign will host the harvest and search and discovery services for this
project. This work will allow us to explore aspects of using OAI-PMH in a consortial context
and will offer a way to explore reinventing the CIC's Virtual Electronic Library in order to better
unlock the hidden Web of resources that are available at CIC institutions.
Among other things, this collaboration will allow the consortium to:
 explore ways to improve access to selected resources at CIC member libraries;
 develop methods to better advertise these resources to end-users both within and external to
the consortium;
 test approaches for using OAI-PMH with licensed and restricted access content and
metadata;
 prepare member institutions for future grant-mandated OAI-based resource sharing; and,
 create a unique metadata testbed and aggregation, useful for a range of more in-depth,
fundamental metadata and OAI-PMH research, funded both internally and externally.
From October 2002 the University of Illinois Library at Urbana-Champaign has also undertaken
to build a collection registry and OAI-PMH-based item-level metadata repository describing
digital collections created under the auspices of the Institute of Museum and Library Services
National Leadership Grant (NLG) program since its inception in 1998. Given the range of
content created under the NLG program (though still heavily biased towards the domain of
cultural heritage), this project will allow us to greatly expand the scope of our OAI-PMH
research and test preliminary models developed in the course of our Mellon-funded project. (See
http://imlsdcc.grainger.uiuc.edu/)
These and other projects (e.g., our 2nd Generation Digital Mathematics Resources project being
conducted under the auspices of the National Science Digital Library program, see
http://nsdl.grainger.uiuc.edu/) suggest that OAI-PMH has reached a maturity such that it is
taking its place as one among several essential tools with the potential to enhance discoverability
of digital content. The challenge has moved from focused proof-of-concept to one of learning
how best to integrate and use OAI-PMH in conjunction with other tools and protocols – e.g.,
advanced search architectures, data warehousing models, data mining tools. Considerable work
remains to be done in this regard, but a foundation of practice and success has now been laid and
the indications are that OAI-PMH will in fact play an important role in future digital library
interoperability implementations.
Accomplishments and Activities Listed by Month
July 2001
 Project Start
 Project Research Programmer hired (YuPing Tseng)
 Work begins on OAI Harvesting tools
 Work begins on updating of OAI Provider tools developed during Alpha Test
August 2001
 Project Website established (http://oai.grainger.uiuc.edu/)
 Meeting in Ann Arbor with Michigan Project Team
 Test harvesting of Illinois & selected OAI-Registered Provider sites
 Project Research Assistant hired (Sarah Shreeves 50%)
September 2001
 Project Coordinator hired (Joanne Kaczmarek)
 Michigan’s DLXS /XPAT indexing software installed
 Initial interface for searching harvested metadata
(http://oai.grainger.uiuc.edu/cgi/b/bib/bib-idx) available to project team.
 Updated OAI Provider Tools made publicly available
 OAI Workshop at ACM SIGIR
(http://bolder.grainger.uiuc.edu/AMSIGIR2001/UIUC_OAIExperiences_files/frame.htm)
 UIUC and Michigan Host OAI Provider workshop
 Meeting of Steering Committee for Illinois & Michigan projects
October 2001
 Letter to CIC Library Directors from Paula Kaufman & Bill Gosling
 Began setting up surrogate OAI Provider Sites (see narrative)
 Began acquiring representative EAD finding aids from sites nationwide
 Created XSLT stylesheet (simplistic version) to transform EAD to DC
 Began production harvesting of OAI-registered & surrogate Providers
 Test harvest of relevant sites registered with http://www.openarchives.org.
November 2001
 Alpha version of Harvester tools made available to Michigan
 Preliminary mock-ups of public search interface screens developed
 Preliminary Harvester Tools made available to Michigan
 Project update & demonstration at DLF Forum
(http://oai.grainger.uiuc.edu/Events/IllinoisOAIMetadataHarvestingService.ppt)
 Began analysis of advanced EAD to Dublin Core transformations
December 2001
 Enhancements to Harvester tools for history and job tracking
 Updated releases of Data Provider tools (http://oai.grainger.uiuc.edu/ProviderTools/)
 Paper accepted for Museums and the Web 2002 conference
January 2002
 Released white paper describing harvester architecture:
http://oai.grainger.uiuc.edu/Papers/Harvester_Architecture/Harvester_Architecture.htm.
 Released simple search interface for the UIUC Cultural Heritage Repository:
http://oai.grainger.uiuc.edu/search.
 Developed UIUC normalization vocabulary for Dublin Core (DC) Type element
http://oai.grainger.uiuc.edu/projectinfo.htm.
 Developed usability test for repository interface.
 Converted 900,000 Illinois State Library MARC records to DC.
February 2002
 Provided feedback on design of Michigan online survey directed at end users:
http://oaister.umdl.umich.edu/surveyreport.html.
 Created stylesheet to convert Harvard VIA record format to DC.
 Began usability testing for the repository interface.
 Released advanced search interface for the repository.
 Ported VisualBasic OAI harvester to Java and delivered to Michigan.
March 2002
 Added normalized DC Type element content to harvested records.
 Applied changes to end user interface as suggested by the usability test results.
 Updated Java harvester tools and delivered to Michigan.
 Presented as expert practitioner at the OCLC "Steering by Standards" videoconference on
OAI: http://oai.grainger.uiuc.edu/Events/Kaczmarek_OAI_OCLC.ppt
 Presented to Information Systems Research Lab of the Graduate School of Library and
Information Science (GSLIS) and the OAI Sheet Music Planning meeting:
http://dli.grainger.uiuc.edu/publications/twcole/OAISheetMusic/Cole_OAITools.ppt.
April 2002
 Project highlighted in the In Brief Column of D-Lib Magazine:
http://www.dlib.org/dlib/april02/04inbrief.html#KACZMAREK.
 Presented to the CNI Spring Task Force Meeting and to the CIC Library Tech Directors:
http://oai.grainger.uiuc.edu/oaicnibeth.ppt.
 Presented paper to the Museums and the Web Conference 2002. Paper is published in the
print proceedings. Presentation: http://oai.grainger.uiuc.edu/MWOAIblue.ppt. Paper:
http://www.archimuse.com/mw2002/papers/cole/cole.html.
 Repository interface design reviewed by a MLS class in Interfaces to Information
Systems at GSLIS. Interface updated based on suggestions.
 Finalized Open Source license for UIUC.
 Released OAI 1.0 Harvester on SourceForge: http://www.sourceforge.net/projects/uiliboai/.
 Developed alpha version of OAI 2.0 Metadata Provider tools for both VisualBasic and
Java.
 Implemented OAI 2.0 alpha Metadata Provider service for Illinois (for alpha testing by
harvesting services).
 Developed EAD stylesheet for viewing top level and item level records in context of the
full finding aid.
 Modified Java harvester tools to handle redirect responses.
May 2002
 Presented project overview at DLF Spring Forum.
 Developed alpha version 2.0 of OAI Harvester tools for both VisualBasic and Java.
 Provided initial data dump for data mining research to NCSA.
 Added annotation box to end-user interface.
June 2002
 Presented at ALA Etext Discussion Group in Atlanta, GA:
http://oai.grainger.uiuc.edu/ALA.ppt.
 Released OAI 2.0 of Metadata Provider and Harvester tools on SourceForge.
 Released Z39.50 OAI Metadata Provider tool.
 Developed and applied Date and Coverage normalization scripts as a post-harvesting
process.
 Developed "exhibits" search interface model.
 Began scalability testing.
 Added item level records from the EAD files to the repository. As a result, the number of
records in the repository more than tripled.
July 2002
 Presented at JCDL in Portland, Oregon:
http://dli.grainger.uiuc.edu/publications/jcdl2002/p14-prom_files/frame.htm
 Presented at two Illinois Digitization Institute Workshops at Grainger Library at UIUC.
 Delivered the one-year report to the Mellon Foundation.
August 2002
 Met with staff at NCSA to discuss preliminary results of data mining and to gain better
understanding of tools.
 NCSA data mining tools delivered to the Illinois project.
 Presented at the Society of American Archivists in Birmingham, AL:
 Met with Professor Brenda Trofanenko of the Education Department at UIUC to plan
work with her Social Studies Education class.
 Met with Tom Peters of the CIC to plan a half-day conference on the Illinois and
Michigan OAI projects.
 Sarah Shreeves, former graduate research assistant for project, joined as visiting project
coordinator until October 2002.
 Christine Kirkham joined project as graduate research assistant.
September 2002
 Presented an overview of search portal to students in honors-level Curriculum &
Instruction course.
 Suspended ongoing harvesting in order to assure static testbed during testing.
 Curriculum and Instruction students began testing usability of portal and utility of
aggregated metadata.
October 2002
 With researchers from the University of Michigan’s OAIster project, presented
preliminary findings to members of the Committee on Institutional Cooperation (CIC)
Digital Library Initiatives Overview Committee. (See Appendix E)
 Presented XML tutorial to the Colorado Digitization Program and Colorado Alliance of
Research Libraries at University of Denver:
http://oai.grainger.uiuc.edu/Western_Trails.ppt.
 Conducted focus groups with Curriculum and Instruction students.
 Project Coordinator Sarah Shreeves left project.
November 2002
 As part of project sustainability effort, submitted proposal for OAI-based CIC metadata
harvesting, aggregation, and search service to CIC Library Directors
 Per usability tests, fixed bugs and identified interface changes for search portal.
December 2002
 Transcribed tapes of usability focus groups and analyzed transaction logs. Implemented
new OAI search portal: http://nergal.grainger.uiuc.edu/search/
 Submitted two articles on OAI PMH to Library Hi Tech.
 Demonstrated redesigned search portal to Curriculum and Instruction students
January 2003
 Attended the Open Forum on Metadata Registries, Santa Fe, NM.
 Coded transcriptions of focus groups and read student papers for usability testing.
February 2003
 Reinstated harvesting and identified collections for continued harvesting after May 31.
 Submitted short paper, “Utility of an OAI Service Provider Search Portal,” to the 2003 Joint
Conference on Digital Libraries (JCDL). (See Appendix B for published paper.)
March 2003
 Continued to make improvements to OAI search portal.
April 2003
 Attended the 5th Annual Illinois State GILS Conference, Lisle, IL.
May 2003
 Presented tutorial, “Introduction to the Open Archives Initiative Protocol for Metadata
Harvesting” at the 2003 JCDL in Houston, TX. (See Appendix F.)
 Presented paper, “Utility of an OAI Service Provider Search Portal,” at 2003 JCDL:
http://dli.grainger.uiuc.edu/Publications/JCDL2003/ShreevesJCDL.ppt.
Proposal Objectives Accomplished
Our proposal narrative as submitted to the Andrew W. Mellon Foundation in May 2001 (see
Appendix A in the supplemental materials volume) laid out several project objectives. Below is a
brief summary highlighting the work done to complete each objective.
Objective: Construct and implement a standalone metadata middleware application
("spider") for harvesting metadata using the OAI protocols.

Harvesters have been implemented at University of Illinois and University of Michigan.
Objective: Demonstrate viability of search and retrieval of metadata harvested using OAI.

We have successfully demonstrated the technical ability to search and retrieve metadata
harvested using OAI. However, the heterogeneity of the harvested metadata records adversely
affected the utility of the search portal, and efforts to mitigate this effect on search
and discovery services achieved only mixed results. We had success normalizing
descriptive attributes such as date and resource content type, but were generally unable to
normalize or collate described resources by subject or topic.
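To make the kind of post-harvest normalization described above concrete, here is a minimal sketch, not the project's actual scripts. It assumes a simple rule (pull four-digit years out of free-text dates) and a small, hypothetical type vocabulary; the names `normalize_date`, `normalize_type`, and `TYPE_MAP` are illustrative only.

```python
import re

# Matches a standalone four-digit year from 1000 to 2099 (an assumed rule).
YEAR = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")

# Hypothetical mapping from provider terms to a small controlled vocabulary.
TYPE_MAP = {
    "photograph": "image",
    "painting": "image",
    "map": "image",
    "letter": "text",
    "monograph": "text",
}


def normalize_date(raw):
    """Return (earliest_year, latest_year) found in a free-text date, or None."""
    years = [int(y) for y in YEAR.findall(raw)]
    if not years:
        return None
    return (min(years), max(years))


def normalize_type(raw):
    """Map a free-text type value to the controlled vocabulary, else 'unknown'."""
    key = raw.strip().lower().rstrip("s")  # crude singularization
    return TYPE_MAP.get(key, "unknown")
```

A value such as "ca. 1912" yields a single-year range, while a value with no recognizable year (for example "n.d.") yields no range and would be left out of date-limited searches. Subject terms do not reduce to a pattern like this, which is consistent with the mixed results reported for subject normalization.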
Objective: Investigate feasibility of implementing a variety of basic and advanced search
interface & value-added indexing features in the context of OAI metadata harvesting.

Our repository presents both basic and advanced search functions. XPat, the index and search
engine system for the repository, allows truncation, Boolean, and exact-phrase searching. In
addition users can limit searches by specific date ranges or by type of material. Attempts to
use automated subject analysis tools developed for full-text records (e.g., NCSA's
Themeweaver tool) were made difficult by the sparseness of metadata records.
Objective: Investigate effective methods of presenting harvested metadata records and the
linkages from those records to full-content and related external information resources.

Clear linkages are provided from metadata records to both collection-level and, when
available, item-level information online. Transformations developed for this project allow
end users to effectively view metadata describing components contained in EAD finding aids
in the context of parent finding aids.
Objective: Identify critical concerns and issues that arise when using OAI to reveal items
that are part of scholarly manuscript archives and digitized collections of cultural heritage
information.

Through our work with a number of different cultural heritage communities we have been
able to identify some key issues. The quality and consistency of metadata and the adherence
to community metadata and vocabulary standards (whether museum, archive, library, etc.) is
key to facilitating the creation of a useful aggregated database. Strategies to map other
metadata schemas to the requisite Dublin Core are essential to cleanly and accurately present
metadata from a variety of communities. Issues remain regarding how best to ensure
presentation of metadata records in accord with providers’ wishes and how best to deal with
providers’ rights and permissions.
Objective: Define areas where "best practice" covenants and conventions could beneficially
supplement OAI protocols.

As outlined above the use of standards in metadata authoring can only benefit
interoperability efforts. In addition we have found that the use of sets – particularly subject
sets - within the OAI protocol aided in our indexing efforts.
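As an illustration of how a harvester can use sets, the sketch below builds OAI-PMH 2.0 `ListRecords` requests restricted to one set and reads the resumption token from a response. The base URL and the set name `subject:history` are placeholders, and the helper names are hypothetical; this is not the project's harvester code.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL, optionally limited to one set."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: no other arguments
        # except the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token in a response, or None if the list is complete."""
    root = ET.fromstring(response_xml)
    node = root.find(f".//{OAI_NS}resumptionToken")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


# Placeholder repository address and set name, for illustration only.
first_request = list_records_url("http://example.org/oai", set_spec="subject:history")
```

A harvester would fetch `first_request`, index the returned records under the requested set, and keep issuing requests with the token from `next_token` until it returns `None`.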
Objective: Document usage patterns and benefits of OAI metadata harvesting approach
through evaluation, interactions with end-users, and analysis of detailed transaction logs.

Conducted usability and utility study with targeted group of end users through focus groups
and transaction log analysis. See Appendix D for full discussion.
Metadata Providers Organized by Type of Institution
through March 2003
1,126,789 records (excluding DC records derived from EADs) from 39 metadata providers (both OAI-compliant and not)
Museums
American Museum of Natural History
 2004 records
 Photographs
Consortium of Museum Intelligence (CIMI) Demonstration Repository
 197,233 records from 479 institutions (museums and historical societies)
 Artifacts, paintings
 OAI compliant data provider
Spurlock Museum (University of Illinois at Urbana-Champaign)
 46,612 records
 artifacts, images
Academic Libraries
Bentley Historical Library (University of Michigan)
 412 EAD collection level records
 187,977 EAD collection and item level records
 EAD finding aids
Cornell University Library Rare & Manuscript Collections
 88 EAD collection level records
 34,341 EAD collection and item level records
 EAD finding aids
Harry Ransom Humanities Center (University of Texas)
 91 EAD collection level records
 13,574 EAD collection and item level records
 EAD finding aids
Harvard University Libraries
 150,421 records
 418 EAD collection level records
o 190,540 EAD collection and item level records
 150,003 Visual Information Access records
 EAD finding aids, paintings, photographs, slides, and other images
Indiana University Digital Library Project
 938 records
 Photographs
Iowa Women's Archives (University of Iowa)
 118 EAD collection level records
 3,695 EAD collection and item level records
 EAD finding aids
Michigan State University Libraries
 934 EAD collection level records
 8,075 EAD collection and item level records
 EAD finding aids
Northwestern University Library
 3,458 records
 Posters from various collections
Pennsylvania State University Libraries
 40 records
 Top level information from finding aids
University of Chicago Library
 100 EAD collection level records
 27,047 EAD collection and item level records
 EAD finding aids
University of Illinois Library
 97,927 records
 95,712 records - sheet music
 1,141 records - Teaching with Digital Content
 1,025 records - aerial photographs
 49 EAD collection level records
o 27,564 EAD collection and item level records
 Sheet music collection, photographs, artifacts, EAD finding aids
 OAI compliant data provider
University of Michigan Digital Library Text Collections
 114,212 records
 Text
 OAI compliant data provider
University of Minnesota Libraries
 14,409 records
 14,307 records - IMAGES database
 102 EAD collection level records
o 10,437 EAD collection and item level records
 Paintings, photographs, and other images, EAD finding aids
 OAI compliant data provider - IMAGES collection only
University of Tennessee Special Collections
 379 records
 Photographs, text, finding aids (not in EAD)
 OAI compliant data provider
University of Wisconsin-Madison Library
 4,244 records
 Photographs, text, EAD finding aids
 OAI compliant data provider
Washington State University
 1,600 records
 Photographs
Cultural and Historical Societies
American Numismatic Society
 2,799 records
 Artifacts - coins
 OAI compliant data provider
American Philosophical Society
 6576 records
 Text (letters, monographs, etc.)
 OAI compliant data provider
Minnesota Historical Society
 487 EAD collection level records
 100,079 EAD collection and item level records
 EAD finding aids
Ohio Historical Society
 768 records
 Photographs from the OhioPix collection
Public Libraries
Illinois Library System Records provided by the Illinois State Library
 321,028 records (filtered from 1.1 million metadata records)
 252,271 records from the Alliance Library System
 68,757 records from the Lincoln Trails Library System
 Monographs
Tacoma Public Library Photograph Collection
 24,200 records
 Images (photograph collection of Pacific Northwest history)
Digital Collections
Ackerman Archives
 151 records
 Letters, photographs from a personal archive
 OAI compliant data provider
AIM25 - Archives in London and the M25 Area
 4474 records
 Collection level descriptions of archival collections
 OAI compliant data provider
Celebration of Women Writers
 189 records
 Texts
 OAI compliant data provider
Colorado Digitization Project
 12,631 records from 17 different institutions
 Auraria Library
 Canon City Public Library
 Colorado College, Tutt Library
 Colorado State Archives
 Colorado Historical Society
 Colorado Springs Pioneers Museum
 Crow Canyon Archaeological Center
 Colorado State University Libraries
 Denver Museum of Nature and Science
 Lafayette Public Library and Lafayette Miners Museum
 Larimer County Digitization Project
 Pikes Peak Library District
 Pueblo City-County Library District
 Pueblo Weisbrod Aircraft Museum
 University of Colorado, Boulder
 University of Denver
 University of Northern Colorado
 Photographs
David Rumsey Map Collection
 6375 records
 Maps
 OAI compliant data provider
Formations
 23 records
 Articles
 OAI compliant data provider
ibiblio
 262 records
 Descriptions of web resources
Illinois Alive!
 186 records
 Descriptions of web resources
Library of Congress American Memory Project
 103,077 records
 Photographs, images, text
 OAI compliant data provider
National Library of Australia
 235 records
 Descriptions of online collections
Online Archive of California
 5,931 EAD collection level records from 47 different institutions
 2,265,692 EAD collection and item level records
 Berkeley Art Museum/Pacific Film Archive
 California Historical Society
 California Institute of Technology
 California State Archives
 California State Library
 California State Railroad Museum Library
 Cal State Chico
 Cal State Dominguez Hills
 Cal State Fresno
 Center for Mennonite Brethren Studies
 Fowler Museum of Cultural History
 Fresno City and County Historical Society
 Gay, Lesbian, Bisexual, Transgender Historical Society
 Getty Research Institute
 Graduate Theological Union
 Grunwald Center for Graphic Arts
 Historical Sites Society of Arcata
 Holocaust Center of No. California
 Hoover Institution
 Humboldt State University
 Huntington Library
 Japanese American National Museum
 Labor Archives and Research Center
 Mills College
 NASA Ames History
 Oakland Museum of California
 Pasadena Historical Museum
 Sacramento Archives and Museum Collection Center
 San Diego State Univ.
 San Francisco Maritime National Historical Park
 San Francisco Public Library
 San Joaquin County Historical Society and Museum
 Santa Clara Univ.
 Sonoma State Univ.
 Southern California Library for Social Studies
 Unemployment Insurance Division Library
 UC Berkeley
 UC Davis
 UC Irvine
 UCLA
 UC Riverside
 UC Riverside Museum of Photography
 UC San Diego
 UCSF
 UC Santa Barbara
 UC Santa Cruz
 USC
 Western Jewish History Center
 EAD finding aids
Open Video Project
 1,654 records
 Moving images from several special collections
 OAI compliant data provider
Perseus Digital Library
 1,407 records
 Text
 OAI compliant data provider
Schoenberg Center for Electronic Text/Image (University of Pennsylvania)
 54 records
 Text
 OAI compliant data provider
Metadata Providers Organized by Type of Institution
April 2003 and forward
413,563 records from 25 OAI-compliant metadata providers
Academic Libraries
Auburn University Digital Library
 14 records
 Archival finding aids (collection level)
Indiana University Digital Library Project
 21,271 records
 Images, sheet music
Michigan State University Libraries
 1,409 records
 Text
University of Illinois Library
 99,130 records
 Sheet music collection, images, artifacts, archival finding aids (collection level)
University of Michigan Digital Library Text Collections
 112,665 records
 Text, images
University of Minnesota Libraries
 1786 records
 Paintings, photographs, and other images
University of North Carolina at Chapel Hill Manuscripts
 4791 records
 Archival finding aids (collection level)
University of Tennessee Special Collections
 2148 records
 Images, text, archival finding aids (collection level)
University of Wisconsin at Madison Library
 4151 records
 Images
Cultural and Historical Societies
American Numismatic Society
 3127 records
 Artifacts – coins, text
Indiana Historical Society
 961 records
 Images and artifacts (ephemera)
Digital Collections
Ackerman Archives
 151 records
 Text, images from a personal archive
AIM25 - Archives in London and the M25 Area
 5701 records
 Collection level descriptions of archival collections
Alex Catalogue of Electronic Texts
 677 records
 Text
Celebration of Women Writers
 416 records
 Text
Heritage Colorado
 18,813 records from 17 institutions (academic and public libraries, historical
societies)
 Images
Documenting the American South
 3161 records
 Images, text
David Rumsey Map Collection
 2820 records
 Maps
Formations
 23 records
 Text
ibiblio
 418 records
 Descriptions of web resources
Library of Congress American Memory Project
 126,045 records
 Images, text
Mundus: Gateway to Missionary Collections in the UK
 447 records
 Archival finding aids (collection level)
Open Video Project
 1938 records
 Moving images
Perseus Digital Library
 1,446 records
 Text
Schoenberg Center for Electronic Text/Image (University of Pennsylvania)
 54 records
 Text
Bibliography of Publications & Selected Presentations
A. Publications
Cole, T.W., Kaczmarek, J., Marty, P.F., Prom, C.J., Sandore, B. and Shreeves, S.L. 2002. Now
that we've found the ‘hidden web’ what can we do with it? The Illinois Open Archives Initiative
Metadata Harvesting experience. In D. Bearman and J. Trant (eds.), Museums and the Web 2002:
selected papers from an international conference. Pittsburgh, PA: Archives & Museum
Informatics, 63-72. Available: http://www.archimuse.com/mw2002/papers/cole/cole.html
(accessed, 20 July 2003).
Prom, C. J. and Habing, T. G. 2002. Using the Open Archives Initiative Protocols with EAD. In
G. Marchionini & W. Hersch (eds.), JCDL 2002: Proceedings of the Second ACM/IEEE-CS
Joint Conference on Digital Libraries, July 14-18, 2002. New York: Association for Computing
Machinery, 171-180. Available: http://dli.grainger.uiuc.edu/publications/jcdl2002/p14-prom.pdf
(accessed, 20 July 2003).
Shreeves, Sarah L., Kirkham, Christine, Kaczmarek, Joanne, and Cole, Timothy W. 2003. Utility
of an OAI service provider search portal. In Catherine C. Marshall, Geneva Henry, and Lois
Delcambre (eds.), Proceedings of the 2003 Joint Conference on Digital Libraries May 27-31,
2003. Los Alamitos, CA: Institute of Electrical and Electronics Engineers, Inc., 306-308.
Cole, Timothy W. 2003. OAI: Innovations in the Sharing of Scholarly Information. Library Hi
Tech 21, no. 2: 115-117.
Prom, Christopher J. 2003. Reengineering archival access through the OAI protocols. Library Hi
Tech 21, no. 2: 199-209.
Shreeves, Sarah L., Kaczmarek, Joanne S., and Cole, Timothy W. 2003. Harvesting cultural
heritage metadata using the OAI protocol. Library Hi Tech 21, no. 2: 159-169.
Prom, Christopher J. Forthcoming. Does EAD play well with other metadata standards?
Searching and retrieving EAD using the OAI protocols. Journal of Archival Organization 1, no.
3: 51-72.
Cole, Timothy W. and Shreeves, Sarah L. Forthcoming. Lessons learned from the Illinois OAI
Metadata Harvesting Project. In Diane Hillman and Elaine Westbrooks (eds.), Metadata in
Practice. Chicago, IL: ALA Editions.
B. Presentations
Cole, T.W. and Habing, T.G. September 13, 2001. Experiences Implementing OAI Provider
Services. ACM SIGIR: Open Archives: Community, Interoperability, and Services, New
Orleans, LA.
Cole, T.W. November 18, 2001. University of Illinois OAI Metadata Harvesting Service. Digital
Library Federation Fall Forum, Pittsburgh, PA.
Kaczmarek, J. March 26, 2002. Can OAI Enhance the Quality of Everyday Life? Steering by
Standards: OCLC Videoconference: "A New Harvest: Revealing Hidden Resources with the
Open Archives Metadata Harvesting Protocol," Dublin, OH.
Cole, T.W. March 28, 2002. OAI Tools and OAI Protocol Version 2. OAI Standards for Sheet
Music - Planning Meeting, Bloomington, IN.
Sandore, B., Cole, T.W., Kaczmarek, J., Mischo, B., Prom, C.J., and Habing, T.G. April 16,
2002. Developing a Domain Specific Metadata Search & Retrieval System Using OAI-PMH.
CNI Spring 2002 Task Force Meeting, Washington, D.C.
Cole, T.W., Kaczmarek, J., Sandore, B., Marty, P., Prom, C.J., and Shreeves, S.L. April 18,
2002. Now That We've Found the 'Hidden Web' What Can We Do With It?: The Illinois Open
Archives Initiative Metadata Harvesting Experience. Museums and the Web 2002. Boston, MA.
Kaczmarek, J. June 15, 2002. University of Illinois Experiences Using OAI Protocol for
Metadata Harvesting. E-Text Discussion Group, American Library Association Annual Meeting.
Atlanta, GA.
Prom, C.J. and Habing, T.G. July 16, 2002. Using the Open Archives Initiatives Protocols with
EAD. 2nd Annual Joint Conference on Digital Libraries, Portland, OR.
Prom, C.J. August 22, 2002. Does EAD Play Well with Other Metadata Standards? Searching
and Retrieving EAD Using the OAI Protocols. Society of American Archivists Annual
Conference. Birmingham, AL.
Cole, T.W. April 9, 2003. OAI: What it is & what it could mean for GILS projects. 5th Annual
States Government Information Locator Service Conference. Lisle, IL.
Shreeves, S.L. May 15, 2003. Green flags and yellow flags: opportunities and challenges of
implementing OAI services. Ohio Valley Group of Technical Services Librarians Conference,
Terre Haute, IN. (Invited Presentation)
Kirkham, C. and Shreeves, S.L. May 29, 2003. The utility of an OAI service provider search
portal. 3rd Joint Conference on Digital Libraries, Houston, TX.
Cole, T.W. June 22, 2003. Using OAI-PMH to aggregate metadata describing cultural heritage
resources. Combined American Library Association / Canadian Library Association Annual
Conference, Toronto, Canada.