Long-term Digital Metadata Curation
Arif Shaon
The Centre for Advanced Computing and Emerging Technologies (ACET)
The University of Reading, Reading, UK
Abstract
The rapid increase in data volume and availability, together with the need for continual, quality-assured searching and indexing of such data, requires efficient and effective metadata management strategies. From this perspective, adequate, well-managed and high-quality metadata is becoming increasingly essential for successful long-term, high-quality data preservation. Metadata's role in the reconstruction and accessibility of preserved data, however, faces the same predicament as the efficient use of digital information over time: metadata quality and integrity must be assured in the long term despite the rapid evolution of metadata formats and related technology. Therefore, in order to maintain the overall quality and integrity of metadata over a sustained period of time, and thereby assist successful long-term digital preservation, effective long-term metadata curation is indispensable. This paper presents an approach to long-term metadata curation, which includes a provisional specification of its core requirements. In addition, the paper introduces the "Metadata Curation Record", expressed in the form of an XML Schema, which captures additional statements about both data objects and their associated metadata to aid long-term digital curation. The paper also presents a metadata schema mapping/migration tool, which may be regarded as a fundamental stride towards an efficient, curation-aware metadata migration strategy.
1. Introduction
Owing to an exponential increase in computing power and communication bandwidth, the past decade has witnessed a spectacular growth in the volume of generated scientific data. Major contributors to this phenomenal data deluge are the new avenues of research and experiment opened up by e-Science, which enables increasingly global, interdisciplinary collaborations of people and of the shared resources needed to solve new problems in science, engineering and the humanities. In fact, the e-Science data generated thus far from different sources, such as sensors, satellites and high-performance computer simulations, has been measured in excess of terabytes every year and is expected to grow significantly over the next decade. The several terabytes of scientific data, generated from research and experiments conducted over the past 20 years and hosted by the Atlas Datastore of CCLRC's e-Science Centre (http://www.escience.clrc.ac.uk/web), provide an ideal example of this data deluge.
This increasingly large volume of scientific data needs to be maintained (i.e. preserved) and kept highly available (i.e. published) over substantially long periods of time in order to serve future generations of
scientists and researchers. This will, amongst other things, assist in avoiding the high cost of replicating data that would be expensive to regenerate, thereby aiding related experiments and research in the foreseeable and distant future. Understandably, failure to ensure continued access to good-quality data could lead to its considerable under-utilisation. Conversely, efficiently managed and preserved information ensures its proper discovery and re-use. This has the desirable potential to assist high-quality future research and experiments in both same-discipline and cross-discipline environments, as well as other productive uses.
Evidently, one of the major challenges in achieving efficient and continued use of valuable data resources is to ensure that their quality and integrity remain intact over time. The challenge arises because changes in technologies, and increased flexibility in their use, end up transforming the very data they create and putting its integrity in jeopardy. Therefore, with the rapid evolution and enhancement of related technologies and data formats, the task of ensuring data quality for long periods of time, i.e. successful long-term data preservation (where 'long-term' denotes the period over which continued access to digital resources at an accepted level of quality, despite the impacts of technological change, is beneficial to both the user base and the curatorial organisation(s) of the resources), may seem incredibly challenging.
Under the challenges set by the task of successful long-term data preservation, the word 'metadata' is becoming increasingly prevalent, with a growing awareness of the role that it can play in accomplishing such a task.
In fact, the digital preservation community has already recognised the need for good-quality, well-managed metadata to reduce the likelihood of important data becoming unusable over substantially long periods of time (Calanag, Sugimoto, & Tabata, 2001). Furthermore, it has been recognised that metadata can be used to record the information required to reconstruct, or at the very least to understand the reconstruction process of, digital resources on future technological platforms (Day, 1999).
Metadata's role in the reconstruction and accessibility of preserved data, however, faces the same predicament as the efficient use of digital information over time: metadata quality and integrity must be assured in the long term despite the rapid evolution of metadata formats and related technology. The only solution to this problem is the adoption of a well-conceived, efficient and scalable long-term curation plan or strategy for metadata. In effect, curation can prevent metadata from falling out of step with the original data, or from undergoing additions, deletions or transformations that change its meaning invalidly. In other words, in order to maintain the overall quality and integrity of metadata over a sustained period of time, and thereby help ensure continued access to high-quality data (particularly within the e-Science context), effective long-term metadata curation is indispensable. Given the evident necessity of long-term preservation of scientific data, the task of long-term curation of the metadata associated with that data is therefore equally important and beneficial in the context of e-Science.
This paper endeavors to provide a concise discussion of the main requirements of long-term metadata curation. In addition, the paper introduces the "Metadata Curation Record", expressed in the form of an XML Schema, which captures additional statements about both data objects and their associated metadata to aid long-term digital curation. The paper also presents a metadata schema mapping/migration tool, which may be regarded as a fundamental stride towards an efficient, curation-aware metadata migration strategy.
2. Motivation
Over the past few years, several organized and arguably successful endeavors (e.g. the NEDLIB project, the Networked European Deposit Library, http://www.kb.nl/coop/nedlib/) have been made to find an effective solution for successful long-term data preservation. The territory of long-term metadata curation, however, although increasingly acknowledged, remains largely unexplored, let alone conquered. In fact, in most digital preservation or curation motivated workgroups and projects, the necessity of long-term metadata curation is relegated to the back seat and deemed secondary, mainly owing to a lack of awareness of the criticality of the problem. As a result, no accepted methods exist to date for the effective management and preservation of metadata over long periods of time. The Digital Curation Centre (DCC), UK (http://www.dcc.ac.uk), in conjunction with the e-Science Centre of the CCLRC, however, aims to address this rather cultural issue. The work presented in this paper contributes to the long-term metadata curation activity of the DCC.
3. Metadata Defined
The word "metadata" was invented on the model of meta-philosophy (the philosophy of philosophy), meta-language (a language to talk about language) and so on, in which the prefix meta expresses the reflexive application of a concept to itself (Kent and Schuerhoff, 1997). At its most basic, therefore, metadata can be considered as data about data. However, because a common understanding of metadata hinges on the middle term ("about") of this definition, this classical and simple definition, although ubiquitous, is understood in different ways by many different professional communities (Gorman, 2004). For example, from the bibliographic control outlook, the focus of "aboutness" is on the classification of the source data, for identifying the location of information objects and facilitating the collocation of subject content. Conversely, from the computer-science-oriented data management perspective, "aboutness" may well emphasise the enhancement of use in relation to the source data. Moreover, this metadata or "aboutness" is synonymous with its context in the sense of contextual information.
Nevertheless, in light of its acknowledged role in the organisation of and access to networked information, and its importance in long-term digital preservation, metadata may be defined as structured, standardised information that is crafted specifically to describe a digital resource, in order to aid the intelligent, efficient and enhanced discovery, retrieval, use and preservation of that resource over time. In the context of digital preservation, information about the technical processes associated with preservation is an ideal example of metadata.
4. Metadata Curation
In effect, the phrase "metadata curation" is an integral part of the phrase "digital (or data) curation", which has different interpretations within different information domains. From the museum perspective, data curation covers three core concepts: data conservation, data preservation and data access. Access to data or digital information in this sense implies preserving the data and making sure that the people to whom the data is relevant can locate it, i.e. that access is possible and useful. Another interpretation of "data curation" is the active, planned management of information, where re-use of the data is the core issue (Macdonald and Lord, 2002).
Therefore, in essence, long-term data or digital curation is the continuous activity of managing, improving and enhancing the use of data or other digital materials over their life cycle, for current and future generations of users, in order to ensure that their suitability for the intended purpose or range of purposes is sustained and that they remain available for discovery and re-use.
In light of the above construal of digital curation, metadata curation may be defined as an inherent part of a digital curation process: the continuous management of metadata (involving its creation and/or capture as well as assurance of its overall integrity) over the life cycle of the digital materials that it describes, in order to ensure its suitability for facilitating the intelligent, efficient and enhanced discovery, retrieval, use and preservation of those digital materials over time.
5. Requirements of Metadata Curation
The efficacy of metadata curation largely relies upon the successful implementation of a number of requirements. Although these requirements may differ considerably according to the type of data described, the information outlined below attempts to provide a general overview of the main requirements.
5.1 Metadata Standards
Digital preservation professionals have already perceived the necessity of metadata formats/standards (fundamentally, a metadata standard or specification is a set of specified metadata elements or attributes, mandatory or optional, based on rules and guidance provided by a governing body or organisation) in forestalling the obsolescence of metadata, and hence of the actual data or resource, in the face of dynamic technological change. In the context of long-term data curation, it is essential that the structure, semantics and syntax of metadata conform to widely supported standard(s), so that the metadata is effective for the widest possible constituency, maximises its longevity and facilitates automated processing.
As it would be impractical even to attempt to determine unequivocally what will be essential in order to curate metadata in the future, the metadata elements should reflect (along with other relevant information such as metadata creator, creation date, version, etc.) necessary assumptions about future requirements in that regard. Furthermore, the metadata elements should be interchangeable with the elements of other approved standards across other systems with minimal manipulation, in order to ensure metadata interoperability. This will consequently help minimise overall metadata creation and maintenance costs. It may also be advantageous to define specific metadata elements that portray metadata quality.
5.2 Metadata Preservation
Metadata curation requires metadata to be preserved along with the data it describes, in order to ensure that its descriptions remain accurate over time. To date, the dominant approach to long-term digital preservation has been migration. Unfortunately, migration poses a notable danger of data loss, or in some cases loss of the original appearance and structure (i.e. 'look and feel') of the data, as well as being highly labour intensive. In the context of metadata preservation, however, the 'look and feel' of metadata (e.g. differing date/time formats) is not as important as that of the original data, as long as the metadata maintains its aptness for describing the original data accurately over time. Therefore, despite the existence and availability of Emulation (which seeks to solve the problem of losing the data's 'look and feel' by mimicking the hardware/software of the original data analysis environment), Migration would appear to be the better solution for long-term metadata preservation. Having said that, if a superior or alternative preservation strategy were proposed, it would also be worth considering, as both Emulation and Migration have been criticised for being costly, highly technical and labour intensive.
However, a classic unresolved metadata migration issue is that of tracking and migrating changes to the metadata itself. This issue is likely to arise when currently used metadata standards/formats change and/or evolve in the future. For example, an element contained within a contemporary metadata format might be replaced in, or even excluded from, newer versions of that format, thus creating the problem of migrating the information under that element to its corresponding element(s) (if any) in the new format. Therefore, in order to curate metadata successfully, a curation-aware migration strategy must facilitate both the migration of metadata (ideally from old formats to new formats) and the tracking/checking of changes (i.e. from new formats back to old formats) between different versions of its format, while remaining flexible enough to accommodate further requirements.
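As a concrete illustration of this requirement, the following minimal Java sketch (not part of the tool described later in this paper) migrates a metadata record between two hypothetical format versions using explicit element-mapping rules, and flags any element that has no counterpart in the new format so that a curator can deal with it rather than losing it silently. All element and field names are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch of a curation-aware element migration step:
 * elements of an old metadata format are copied into a new format
 * according to explicit mapping rules, and anything that cannot be
 * mapped is reported rather than silently dropped.
 * All element names are hypothetical.
 */
public class MetadataMigrationSketch {

    public static void main(String[] args) {
        // Mapping rules from old-format element names to new-format names.
        Map<String, String> mappingRules = new HashMap<>();
        mappingRules.put("creator", "dc:creator");
        mappingRules.put("date_created", "dcterms:created");
        mappingRules.put("update_time", "dcterms:modified"); // renamed element

        // A metadata record expressed in the old format.
        Map<String, String> oldRecord = new LinkedHashMap<>();
        oldRecord.put("creator", "A. Shaon");
        oldRecord.put("update_time", "2005-08-01");
        oldRecord.put("local_notes", "internal remark"); // no counterpart in new format

        Map<String, String> newRecord = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : oldRecord.entrySet()) {
            String target = mappingRules.get(e.getKey());
            if (target != null) {
                newRecord.put(target, e.getValue());
            } else {
                // Curation-aware behaviour: flag unmapped elements for review
                // instead of losing them during migration.
                System.out.println("Unmapped element needs curator attention: " + e.getKey());
            }
        }
        System.out.println("Migrated record: " + newRecord);
    }
}
```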
5.3 Metadata Quality Assurance
As highlighted earlier in this paper, quality assurance of metadata is an integral part of long-term metadata curation. Appropriate quality assurance procedures or mechanisms need to be in place to eliminate any quality flaws in a metadata record and thereby ascertain its suitability for its intended purpose(s). As identified in (JISC, 2003), incorrect and inconsistent metadata content significantly lowers the overall quality of metadata. Such inaccuracy often stems from a lack of validation and sanity checking at the time of metadata entry, and such inconsistency from a lack of metadata creation guidelines. Non-interoperable metadata formats and erroneous metadata creation and/or management tools are also considerable contributors to such quality flaws.
In general, the quality of a metadata record is measured by the degree of its consistency with and/or accuracy in reference to the actual dataset, and by its conformance to some agreed standard(s). Therefore, the digital metadata curation process, at least the part or module contributing to metadata quality assurance, should operate on the following two levels:
• Semantic curation - organizing and managing the meaning of metadata, i.e. ensuring the semantic validity of metadata so that it provides a meaningful description of the dataset.
• Representational curation - organizing and managing the formal representation (and utilization) of metadata, i.e. ascertaining structural validity against metadata schema(s) and the re-usability of metadata.
While representational curation (i.e. structural validation) typically begins during or after the creation of metadata records, semantic curation can in fact start even before metadata records come into existence. The importance of semantic curation is easily deducible from the fact that, in order for metadata to assist effectively in long-term digital curation, metadata curation (i.e. semantic curation) should begin at the very outset of the metadata life cycle, by developing or adopting a curation-aware metadata framework or ontology. It is also useful to have a rich set of metadata validation functionality incorporated within metadata creation and management tools.
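For the representational level, a minimal sketch of structural validation is given below, using the standard Java XML validation API (javax.xml.validation); the schema and record file names are placeholders and do not refer to actual project files.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/**
 * Sketch of representational curation: structural validation of a
 * metadata record against its XML Schema using the standard JAXP API.
 * The file names are placeholders.
 */
public class StructuralValidationSketch {

    public static void main(String[] args) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("metadata-curation-record.xsd"));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new File("metadata-record.xml")));
            System.out.println("Record is structurally valid against the schema.");
        } catch (SAXException e) {
            // A structural flaw would be reported here for correction
            // before the record enters long-term curation.
            System.out.println("Structural validation failed: " + e.getMessage());
        }
    }
}
```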
5.4 Metadata Versioning
Throughout the dynamic process of long-term metadata curation, metadata is prone to be volatile. This volatility may well be caused by the updating of metadata, which can involve amendments to or deletions from existing metadata records. Previous versions of metadata may nevertheless need to be retrieved in order to obtain vital information about the associated preserved information (e.g. in the case of an annotation, who made the annotation and which version(s) of a value it applies to). It is therefore essential to be able to discriminate between the different states of metadata that arise and co-exist over time, by versioning the metadata information.
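The following short Java sketch illustrates one possible (assumed, not prescribed by this paper) way of meeting this requirement: every update is stored as a new immutable version with its author and timestamp, so that earlier states remain retrievable.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of metadata versioning: each change to a metadata record is
 * stored as a new immutable version rather than overwriting the previous
 * state, so that earlier descriptions (and who changed them) remain
 * retrievable. The fields shown are illustrative.
 */
public class MetadataVersioningSketch {

    // One immutable version of a metadata record.
    record MetadataVersion(int versionNumber, String content, String changedBy, Instant changedAt) { }

    private final List<MetadataVersion> history = new ArrayList<>();

    public void update(String newContent, String changedBy) {
        history.add(new MetadataVersion(history.size() + 1, newContent, changedBy, Instant.now()));
    }

    public MetadataVersion latest() {
        return history.get(history.size() - 1);
    }

    public MetadataVersion version(int number) {
        return history.get(number - 1); // retrieve an earlier state, e.g. for annotation checks
    }

    public static void main(String[] args) {
        MetadataVersioningSketch record = new MetadataVersioningSketch();
        record.update("title=Raw neutron data; units=counts", "curator-a");
        record.update("title=Raw neutron data; units=counts/sec", "curator-b");
        System.out.println("Latest: " + record.latest());
        System.out.println("Previous: " + record.version(1));
    }
}
```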
5.5 Other Requirements
Aside from the requirements outlined above, long-term metadata curation needs to take the following additional issues into account:
• Metadata Policy: A set of broad, high-level principles that form the guiding framework within which metadata curation can operate must be defined. The metadata policy would normally be a subsidiary policy of the organisational data policy statement and should reference the rules concerning legal and other related issues regarding the use and preservation of data and metadata, as governed by that data policy statement.
• Audit Trail & Provenance Tracking: The metadata curation process should ensure the recording of information at the required granularity and provide the means to track any significant changes (e.g. provenance changes) to both data and metadata over their life cycles. This will, amongst other things, help provide assurance as to the reliability of the content information of both data and associated metadata.
• Access Constraints & Control: Appropriate security measures should be adopted to ensure that metadata records have not been compromised by unauthorised sources, thereby ensuring the overall consistency of the metadata records. Furthermore, verifying the authenticity of metadata records before they are ingested into the system for long-term curation is also a crucial requirement. Techniques such as digital signatures and checksums may be employed for this function (a minimal checksum sketch follows this list). Fixity information, as defined within the OAIS framework (OAIS, 2002), is an ideal example that advocates the use of such techniques or mechanisms.
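As a hedged illustration of the checksum technique mentioned above, the Java sketch below recomputes a SHA-256 digest and compares it with one recorded at ingest; the file name and workflow are assumptions, not a description of an existing DCC or CCLRC system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/**
 * Sketch of a checksum-based fixity check: a digest recorded at ingest
 * is recomputed later and compared, so that unauthorised or accidental
 * changes to a metadata record can be detected. The file name is a placeholder.
 */
public class FixityCheckSketch {

    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(Files.readAllBytes(file));
        return HexFormat.of().formatHex(hash);
    }

    public static void main(String[] args) throws Exception {
        Path record = Path.of("metadata-record.xml");
        String storedAtIngest = sha256(record);   // fixity information kept alongside the record
        // ... later in the curation life cycle ...
        String recomputed = sha256(record);
        System.out.println(storedAtIngest.equals(recomputed)
                ? "Fixity verified: record unchanged."
                : "Fixity check failed: record may have been altered.");
    }
}
```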
6. Metadata Curation Record
It would not be an overstatement to regard information as both crucial and instrumental in the context of long-term digital curation: the success of a long-term curation strategy predominantly relies on sufficient and accurate information about the resources being curated. Motivated by this observation, the "Metadata Curation Record" (hereafter referred to as the MCR) has been constructed in the form of an XML Schema, which aims to record additional statements about both data objects and associated metadata to aid long-term digital curation.
In other words, the MCR pursues two primary objectives. The first is to capture as much information about a digital information object as possible, in order to assist its long-term preservation, curation and accessibility. This objective may well echo the objectives of many widely used preservation-motivated XML schemas, e.g. PREMIS (PREservation Metadata: Implementation Strategies, http://www.oclc.org/research/projects/pmwg/). The second objective, on the other hand, is a feature that may not be discerned in most contemporary metadata standards: to provide additional statements about the metadata record itself, thereby supporting long-term curation of that record. In a digital curation system, the metadata ingest interface and/or metadata extraction tools/services can be developed based on the MCR to ensure that appropriate and sufficient metadata is acquired to aid in the curation of both data and metadata.
The approach employed to construct the curation record involved examining a range of existing, well-known metadata schemas, such as Dublin Core (http://dublincore.org/), the Directory Interchange Format (DIF) (http://gcmd.gsfc.nasa.gov/User/difguide/difman.html), the DCC RI Label (http://dev.dcc.ac.uk/twiki/bin/view/Main/DCCInfoLabelReport), the CCLRC Scientific Metadata Model Version 2 (Sufi & Matthews, 2004) and IEEE Learning Object Metadata (LOM) (http://www.imsproject.org/metadata/mdv1p3pd/imsmd_bestv1p3pd.html), and importing the most relevant elements (in terms of curation, preservation and accessibility) from them. The rationale for this approach was to utilise existing resources and thereby avoid reinventing the wheel as much as possible.
6.1. Overview
In general, as depicted in figure 1, the elements contained within the Metadata Curation Record are divided into four categories: General, Availability, Preservation and Curation.
Firstly, the "General" category represents all generic information (e.g. Creator, Publisher, Keywords) about a data object. This category of elements is primarily required for presenting an overview of the digital object to its potential users. In addition, the elements that record keyword-related information (e.g. keywords, subject) may well be used to aid keyword-based searching for scientific data across disparate sources. Secondly, the "Availability" elements provide information regarding access to the data object, checking its integrity, and any access or use constraints associated with it.
The "Preservation" category presents information that will assist in the long-term preservation and accessibility of digital objects. Of particular mention is the OAIS-compliant (OAIS, 2002) "Representation Information Label", which captures the information required to enable access to the preserved digital object in a meaningful way. The use of the RI Label can be recursive, especially in cases where the meaningful interpretation of one RI element requires further RI. This recursion continues until there is sufficient information available to render the digital object in a form the OAIS Designated Community can understand.
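To make the recursion concrete, the following Java sketch follows a chain of (entirely hypothetical) RI dependencies until no further Representation Information is required; it illustrates the OAIS idea only, not the MCR implementation itself.

```java
import java.util.List;
import java.util.Map;

/**
 * Sketch of the recursive use of Representation Information (RI):
 * an RI label may itself require further RI to be interpreted, and the
 * chain is followed until no further RI is needed. The labels and
 * dependencies below are purely illustrative.
 */
public class RepresentationInfoSketch {

    // Hypothetical RI network: each label maps to the further RI it depends on.
    static final Map<String, List<String>> FURTHER_RI = Map.of(
            "netCDF data file", List.of("netCDF format specification"),
            "netCDF format specification", List.of("PDF 1.4 specification"),
            "PDF 1.4 specification", List.of()   // assumed understandable by the Designated Community
    );

    static void resolve(String label, int depth) {
        System.out.println("  ".repeat(depth) + label);
        for (String dependency : FURTHER_RI.getOrDefault(label, List.of())) {
            resolve(dependency, depth + 1);   // recursion ends where no further RI is required
        }
    }

    public static void main(String[] args) {
        resolve("netCDF data file", 0);
    }
}
```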
Figure 1: Abstract level view of the Metadata Curation Record
Finally, the "Curation" category elements provide information about the life cycle and other related aspects (e.g. version, annotation) of the digital object and of the metadata record itself, which may be used in the efficient, long-term curation of both the digital object and its metadata. Of particular significance are the "LifeCycle" elements, which are defined to record all changes that a digital object undergoes throughout its life cycle, along with information about who or what (e.g. factors, people) conducted or was responsible for those changes. This type of information is vital for implementing some crucial curation-related functions, such as provenance tracking (as specified in the OAIS model) and audit trailing, that are essential for checking and ensuring the quality and integrity of data objects.
Furthermore, the "MetaMetadata" element in this category is dedicated to capturing the information required to curate the MCR itself efficiently. It has its own "General" (e.g. Creator, Indexing), "Preservation" (e.g. Identifier, Representation Information), "Availability" (e.g. Metadata Quality, Access, Rights) and "Curation" (e.g. Event) category elements. In a metadata curation system, the "MetaMetadata" elements would be implemented as a separate schema complementing the MCR, which consists of the remaining elements.
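The nesting described above can be pictured with the following simplified Java sketch; the handful of fields shown are illustrative stand-ins and do not reproduce the actual element names of the MCR schema.

```java
/**
 * Simplified object view of the MCR's element categories, for illustration
 * only; the actual record is defined as an XML Schema and contains many
 * more elements than the few fields shown here.
 */
public class CurationRecordSketch {

    static class General { String creator; String publisher; String keywords; }
    static class Availability { String accessUrl; String checksum; String useConstraints; }
    static class Preservation { String identifier; String representationInformationLabel; }
    static class Curation { String version; String annotation; String lifeCycleEvents; }

    // Statements about the metadata record itself, mirroring the main categories.
    static class MetaMetadata {
        General general; Availability availability; Preservation preservation; Curation curation;
    }

    General general;
    Availability availability;
    Preservation preservation;
    Curation curation;
    MetaMetadata metaMetadata;
}
```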
7. Metadata Schema Mapping Tool
Successful long-term metadata curation demands a curation-aware migration strategy in order to cope with the metadata migration issue (see 5.2) that arises from metadata schema/format evolution. The metadata schema mapping tool, which has been developed using Java technologies, aims to resolve this issue by facilitating easy, semi-automatic migration of metadata between two co-existing versions of its format. The tool employs an efficient regular-expression-driven matching algorithm to determine all possible matches (direct or indirect) between two versions of a metadata schema irrespective of their type (i.e. XML or relational), calculates mapping rules based on the matches, and finally migrates metadata from the source schema to the destination schema.
7.1 Rationale
There are numerous commercial and non-commercial database schema mapping or migration tools available at present. Most of these tools enable users, to a varying extent, to find matches automatically between two database and/or XML schemas and to migrate and/or copy data across based on those matches. Examples of such tools include Altova MapForce (http://www.altova.com/download_mapforce.html) and SwisSQL (http://www.swissql.com/). Many of these tools also facilitate interoperability between different databases by allowing users to perform cross-database schema migration, such as migration from Oracle to DB2 or from MS SQL to Oracle. Naturally, the existence of these tools may somewhat question the necessity of, and the motivation for, the Metadata Schema Mapping Tool.
Figure 2: Direct & indirect matches between two versions of DATAFILE table
In response to this question, it would not be an overstatement to say that the inability of currently available tools to find indirect or non-obvious matches between two schemas essentially justifies the necessity, and confirms the uniqueness, of the Metadata Schema Mapping tool. To illustrate, consider two database tables from two versions of CCLRC's ICAT schema, as shown in figure 2 (the ISIS ICAT is a metadata catalogue of the previous 20 years of data collected at the ISIS facility; its database schema is based heavily on the CCLRC Scientific Metadata Model Version 2 (Sufi & Matthews, 2004)). Commercial tools will only be able to determine the direct matches between these two tables (as indicated by the thin arrows from the source table to the destination table). Here, the term "direct match" refers to an exact duplicate or replica of a field or column in a database table, in terms of the name and type of that particular field or column.
These two database tables together provide an ideal example of the aforementioned metadata migration issue, i.e. migrating changes to the metadata itself, especially in cases where one or more crucial elements in the old format have no directly corresponding elements in the new format. For example, the field "DATAFILE_UPDATE_TIME" in version 1 of the DATAFILE table has been changed to "DATAFILE_MODIFY_TIME" in version 2 (indicated by a solid arrow in the diagram). Currently available tools will not be able to establish any link between these two fields, as they only search for direct matches, as shown in figure 2.
However, a look in any English dictionary readily shows that the word "UPDATE" in the former field is in fact synonymous with the word "MODIFY" in the latter, and hence that the two fields constitute an "indirect match". The Metadata Schema Mapping tool is capable of doing exactly that, along with offering other features, such as determining matches between two tables based on their relationships (i.e. foreign-key relationships) with other tables in the corresponding schemas, in both cross-database and cross-schema-format settings.
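The gist of such indirect matching can be conveyed with the small Java sketch below, which normalises column names into tokens and consults a synonym table; the synonym table and the matching heuristic are illustrative assumptions and do not reproduce the tool's actual regular-expression-driven algorithm.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the idea behind indirect matching between two schema versions:
 * column names are split into tokens and compared token by token, with a
 * small synonym table catching renames such as UPDATE -> MODIFY.
 * The synonym table and heuristic are illustrative only.
 */
public class IndirectMatchSketch {

    static final Map<String, Set<String>> SYNONYMS = Map.of(
            "UPDATE", Set.of("MODIFY", "CHANGE"),
            "CREATE", Set.of("GENERATE"));

    static boolean tokensMatch(String a, String b) {
        return a.equals(b)
                || SYNONYMS.getOrDefault(a, Set.of()).contains(b)
                || SYNONYMS.getOrDefault(b, Set.of()).contains(a);
    }

    /** Two column names match indirectly if their tokens match pairwise. */
    static boolean indirectMatch(String source, String target) {
        String[] s = source.split("_");
        String[] t = target.split("_");
        if (s.length != t.length) return false;
        for (int i = 0; i < s.length; i++) {
            if (!tokensMatch(s[i], t[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("DATAFILE_NAME", "DATAFILE_UPDATE_TIME");
        List<String> v2 = List.of("DATAFILE_NAME", "DATAFILE_MODIFY_TIME");
        for (String src : v1) {
            for (String dst : v2) {
                if (!src.equals(dst) && indirectMatch(src, dst)) {
                    System.out.println("Indirect match: " + src + " -> " + dst);
                }
            }
        }
    }
}
```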
In addition to its role in metadata curation, the mapping tool can play a significant role in any data management environment administering considerably large and evolving datasets. In such a dynamic environment, data evolution often results in the evolution of the underlying schema(s), as well as changes to the databases employed (e.g. from MS Access to Oracle), in order to accommodate and maintain the changes to the datasets. The schema mapping tool provides an easier and relatively less labour-intensive means (than the commercial tools) of identifying and reconciling complex and "non-obvious" differences between schemas. The tool thereby facilitates more accurate migration of data from an old version of a schema to its newer version, irrespective of the schema type and/or the database they reside in, while enabling the use of both versions. This effectively makes the datasets more accessible to their users, who are able to pose queries based on whichever version of the schema they are familiar with. From this perspective, the tool is indeed beneficial to the efficient management of the ever-expanding scientific data generated by various e-Science activities.
8. Future Work
There is certainly a great deal of scope for further advancement, as well as innovation, in the development of an efficient metadata curation strategy, concrete tool sets and so on. However, as the domain of long-term metadata curation has yet to be fully explored, it is difficult to set an unequivocal limit on the work required for a fully potent long-term metadata curation strategy. Nevertheless, key future activities of the project will include the development of a metadata curation model, which will address the core requirements of long-term metadata curation. The model will essentially encompass a curation-aware metadata framework based on the MCR, efficient post-creation metadata quality assurance mechanisms and suitable metadata versioning techniques, amongst other things.
The first draft of the model has already been designed as an extension to the OAIS reference model and is currently being assessed for possible improvements. The OAIS-defined archival information preservation functions are not, however, within the scope of the Metadata Curation Model, although the model makes explicit reference to some of those functions. Furthermore, the model focuses only on the curation of metadata and does not assume responsibility for the curation of the data that the metadata describes.
9. Conclusions
Efficient and effective long-term metadata curation is a key component of the successful preservation, enrichment and accessibility of digital information in the long term. Preliminary research for this project (Shaon, 2005) revealed that the majority of current metadata standards, systems and approaches relevant to the context of metadata curation do not address the full set of metadata curation requirements outlined in this paper. This underlines the necessity of curation-aware metadata standards, metadata management standards and systems, which would effectively aid in developing a viable strategy for long-term metadata curation. Developing new standards for both the metadata and the metadata management realm, however, would not be an efficient strategy. Therefore, a specification of the extensions needed to aid metadata curation in existing standards and systems was recommended and seen as a fruitful area for both the work presented in this paper and the future work of the project. The Metadata Curation Record and the Metadata Schema Mapping tool may therefore be seen as a union of the best features of existing metadata standards and metadata mapping tools respectively. Nevertheless, this work should be regarded as an initial step towards developing an efficient strategy for long-term metadata curation that would benefit any discipline concerned with long-term data preservation, such as e-Science.
References & Bibliography
Calanag, M.L., Sugimoto, S., & Tabata, K. (2001): A metadata approach to digital preservation. In: Proceedings of the International Conference on Dublin Core and Metadata Applications 2001 (pp. 143-150). Tokyo: National Institute of Informatics - http://www.nii.ac.jp/dc2001/proceedings/product/paper-24.pdf
Day, M. (1999): Metadata for digital preservation: an update, Ariadne Issue 22, 1999 - http://www.ariadne.ac.uk/issue22/metadata/intro.html
Gorman, G. E. (2004): International Yearbook of Library and Information Management, 2003-2004: metadata applications and management, Facet Publishing, 2004, part 1, pages 1-17.
JISC (2003): Quality Assurance For Metadata, QA Focus Document, QA Focus, a JISC-funded advisory service supporting JISC 5/99 projects, 2003 - http://www.ukoln.ac.uk/qafocus/documents/briefings/briefing-43/briefing-43A5.doc
Kent, J. and Schuerhoff, M. (1997): Some Thoughts About a Metadata Management System, Statistics Netherlands, 1997 - http://www.vldb.org/archive/vldb2000/presentations/jarke.pdf
Macdonald, A. and Lord, P. (2002): Digital Data Curation Task Force: Report of the Task Force Strategy Discussion Day, November 2002 - http://www.jisc.ac.uk/uploaded_documents/CurationTaskForceFinal1
OAIS (2002): Reference Model for an Open Archival Information System (OAIS), CCSDS Blue Book, Issue 1, January 2002 - http://public.ccsds.org/publications/archive/650x0b1.pdf
Shaon, A. (2005): Long-term Metadata Management & Quality Assurance in Digital Curation, MSc Dissertation, CCLRC ePublications Archive, 2005 - http://epubs.cclrc.ac.uk/bitstream/897/MSc_Dissertation.pdf
Sufi, S. and Matthews, B. (2004): The CLRC Scientific Metadata Model Version 2, CCLRC ePublications Archive, August 2004 - http://epubs.cclrc.ac.uk/bitstream/485/csmdm.version-2.pdf