Long-term Digital Metadata Curation

Arif Shaon
The Centre for Advanced Computing and Emerging Technologies (ACET), The University of Reading, Reading, UK

Abstract

The rapid increase in data volume and availability, along with the need for continual, quality-assured searching and indexing of such data, requires efficient and effective metadata management strategies. From this perspective, adequate, well-managed and high-quality metadata is becoming increasingly essential for successful long-term data preservation. Metadata's assistance in the reconstruction or accessibility of preserved data, however, bears the same predicament as the efficient use of digital information over time: assuring long-term metadata quality and integrity despite the rapid evolution of metadata formats and related technology. Therefore, in order to ascertain the overall quality and integrity of metadata over a sustained period of time, thereby assisting in successful long-term digital preservation, effective long-term metadata curation is indispensable. This paper presents an approach to long-term metadata curation, which involves a provisional specification of its core requirements. In addition, the paper introduces the “Metadata Curation Record”, expressed in the form of an XML Schema, which captures additional statements about both data objects and associated metadata to aid long-term digital curation. The paper also presents a metadata schema mapping/migration tool, which may be regarded as a fundamental stride towards an efficient and curation-aware metadata migration strategy.

1. Introduction

Owing to an exponential increase in computing power and communication bandwidth, the past decade has witnessed a spectacular growth in the volume of generated scientific data.
Major contributors to this data deluge include new avenues of research and experimentation facilitated by eScience, which enables increasingly global, interdisciplinary collaborations of people and of the shared resources needed to solve new problems in science, engineering and the humanities. In fact, eScience data generated thus far from sources such as sensors, satellites and high-performance computer simulations has been measured in excess of terabytes every year and is expected to grow significantly over the next decade. Several terabytes of scientific data, generated from research and experiments conducted over the past 20 years and hosted by the Atlas Datastore of CCLRC’s e-Science Centre [1], provide an ideal example of this data deluge.

[1] E-Science Centre, CCLRC - http://www.escience.clrc.ac.uk/web

This increasingly large volume of scientific data needs to be maintained (i.e. preserved) and kept highly available (i.e. published) over substantially long periods of time in order to serve future generations of scientists and researchers. This will, amongst other things, help to avoid the high cost of replicating data that would be expensive to regenerate, thereby aiding related experiments and research in the foreseeable and distant future. Understandably, failure to ensure continued access to good quality data could lead to its considerable under-utilisation. Conversely, efficiently managed and preserved information enables its proper discovery and re-use. This has the potential to assist high quality future research and experiments in both single-discipline and cross-discipline environments, as well as other productive uses. Evidently, one of the major challenges in achieving efficient and continued use of valuable data resources is to ensure that their quality and integrity remain intact over time.
This challenge is driven by changes in technologies, and by increased flexibility in their use, which transform the very data they create and put its integrity in jeopardy. Therefore, with the rapid evolution and enhancement of related technologies and data formats, the task of ensuring data quality for long periods of time, i.e. successful long-term [2] data preservation, may seem incredibly challenging.

Under the challenges set by the task of successful long-term data preservation, the word ‘Metadata’ is becoming increasingly prevalent, with a growing awareness of the role that it can play in accomplishing such a task. In fact, the digital preservation community has already envisaged the need for good quality, well-managed metadata to reduce the likelihood of important data becoming unusable over substantially long periods of time (Calanag, Sugimoto, & Tabata, 2001). Furthermore, it has been recognized that metadata can be used to record the information required to reconstruct, or at the very least understand the reconstruction process of, digital resources on future technological platforms (Day, 1999). Metadata’s assistance in the reconstruction or accessibility of preserved data, however, bears the same predicament as the efficient use of digital information over time: assuring long-term metadata quality and integrity despite the rapid evolution of metadata formats and related technology. The only solution to this problem is the employment of a well-conceived, efficient and scalable long-term curation plan or strategy for metadata. In effect, curation can prevent metadata from becoming out of step with the original data, or from undergoing additions, deletions or transformations that change its meaning without being valid.
In other words, in order to ascertain the overall quality and integrity of metadata over a sustained period of time, thereby assisting in ensuring continued access to high quality data (particularly within the e-Science context), effective long-term metadata curation is indispensable. With the evident necessity of long-term preservation of scientific data, the task of long-term curation of the metadata associated with that data is therefore equally important and beneficial in the context of e-Science.

This paper endeavors to provide a concise discussion of the main requirements of long-term metadata curation. In addition, the paper introduces the “Metadata Curation Record”, expressed in the form of an XML Schema, which captures additional statements about both data objects and associated metadata to aid long-term digital curation. The paper also presents a metadata schema mapping/migration tool, which may be regarded as a fundamental stride towards an efficient and curation-aware metadata migration strategy.

[2] The time period over which continued access to digital resources, with an accepted level of quality despite the impacts of technological changes, is beneficial to both the user base and the curatorial organization(s) of the resources.

2. Motivation

Over the past few years, several organized and arguably successful endeavors (e.g. the NEDLIB project [3]) have been made to find an effective solution for successful long-term data preservation. However, the territory of long-term metadata curation, although increasingly acknowledged, is thus far largely unexplored, let alone conquered. In fact, in most workgroups and projects motivated by digital preservation or curation, the necessity of long-term metadata curation is relegated to the back seat and deemed secondary, mainly due to a lack of awareness of the criticality of the problem.
As a result, no acceptable methods exist to date for the effective management and preservation of metadata over long periods of time. The Digital Curation Centre (DCC), UK [4], in conjunction with the e-Science Centre of the CCLRC, however, aims to solve this rather cultural issue. The work presented in this paper contributes to the long-term Metadata Curation activity of the DCC.

3. Metadata Defined

The word “metadata” was invented on the model of meta-philosophy (the philosophy of philosophy), meta-language (a language to talk about language) etc., in which the prefix meta expresses the reflexive application of a concept to itself (Kent and Schuerhoff, 1997). Therefore, at its most basic, metadata can be considered as data about data. However, as agreement on the middle term ("about") of this definition is crucial to a common understanding of metadata, this classical and simple definition has become ubiquitous and is understood in different ways by many different professional communities (Gorman, 2004). For example, from the bibliographic control outlook, the focus of "aboutness" is on the classification of the source data for identifying the location of information objects and facilitating the collocation of subject content. Conversely, from the computer science oriented data management perspective, "aboutness" may well emphasize the enhancement of use in relation to the source data. Moreover, this metadata or "aboutness" is synonymous with its context in the sense of contextual information.
Nevertheless, in light of its acknowledged role in the organisation of and access to networked information, and its importance in long-term digital preservation, metadata may be defined as structured, standardized information that is crafted specifically to describe a digital resource, in order to aid in the intelligent, efficient and enhanced discovery, retrieval, use and preservation of that resource over time. In the context of digital preservation, information about the technical processes associated with preservation is an ideal example of metadata.

[3] Networked European Deposit Library - http://www.kb.nl/coop/nedlib/
[4] Digital Curation Centre, UK - http://www.dcc.ac.uk

4. Metadata Curation

In effect, “Metadata Curation” is an integral part of “Digital or Data Curation”, a phrase that has different interpretations within different information domains. From the museum perspective, data curation covers three core concepts: data conservation, data preservation and data access. Access to data or digital information in this sense may imply preserving data and making sure that the people to whom the data is relevant can locate it, i.e. that access is possible and useful. Another interpretation of the phrase “Data Curation” may be the active management of information, involving planning, where re-use of the data is the core issue (Macdonald and Lord, 2002). Therefore, in essence, long-term data or digital curation is the continuous activity of managing, improving and enhancing the use of data or other digital materials over their life-cycle and over time for current and future generations of users, in order to ensure that they remain suitable for their intended purpose or range of purposes, and are available for discovery and re-use.
In light of the above construal of digital curation, metadata curation may be defined as an inherent part of a digital curation process for the continuous management of metadata (which involves its creation and/or capture as well as assuring its overall integrity) over the life-cycle of the digital materials that it describes, in order to ascertain its suitability for facilitating the intelligent, efficient and enhanced discovery, retrieval, use and preservation of those digital materials over time.

5. Requirements of Metadata Curation

The efficacy of metadata curation largely relies upon the successful implementation of a number of requirements. Although metadata curation requirements may differ considerably according to the type of data described, the information outlined below attempts to provide a general overview of the main requirements.

5.1 Metadata Standards

Digital preservation professionals have already perceived the necessity of metadata formats/standards [5] in forestalling the obsolescence of metadata (and hence of the actual data or resource) due to dynamic technological changes. In the context of long-term data curation, it is essential that the structure, semantics and syntax of metadata conform to one or more widely supported standards, so that the metadata is effective for the widest possible constituency, its longevity is maximised and automated processing is facilitated. As it would be impractical even to attempt to determine unequivocally what will be essential for curating metadata in the future, the metadata elements should reflect (along with other relevant information such as metadata creator, creation date, version etc.) the necessary assumptions about future requirements in that regard. Furthermore, the metadata elements should be interchangeable with the elements of other approved standards across other systems with minimal manipulation in order to ensure metadata interoperability.
This will consequently help to minimize overall metadata creation and maintenance costs. It may also be advantageous to define specific metadata elements that portray metadata quality.

5.2 Metadata Preservation

Metadata curation requires metadata to be preserved along with the data in order to ensure that its descriptions remain accurate over time. To date, the dominant approach to long-term digital preservation has been migration. Unfortunately, migration poses a notable danger of data loss or, in some cases, loss of the original appearance and structure (i.e. ‘look and feel’) of the data, as well as being highly labour intensive. However, in the context of metadata preservation, the ‘look and feel’ of metadata (e.g. differing date/time formats) is not as imperative as that of the original data, as long as the metadata remains apt for describing the original data accurately over time. Therefore, despite the existence and availability of Emulation (which seeks to solve the problem of ‘look and feel’ loss by mimicking the hardware/software of the original data analysis environment), Migration would appear to be the better solution for long-term metadata preservation. Having said that, if a superior or alternative preservation strategy were proposed, it would also be worth considering, as both Emulation and Migration have received criticism for being costly, highly technical and labour intensive.

[5] Fundamentally, a metadata standard or specification is a set of specified metadata elements or attributes (mandatory or optional) based on rules and guidance provided by the governing body or organisation(s).

However, a classic unresolved metadata migration issue is that of tracking and migrating changes to the metadata itself. This issue is likely to arise when currently used metadata standards/formats change and/or evolve in the future.
For example, an element contained within a contemporary metadata format might be replaced in, or even excluded from, newer versions of that format, thus incurring the problem of migrating the information under that element to its corresponding element(s) (if any) in the new format. Therefore, in order to curate metadata successfully, a curation-aware migration strategy needs to facilitate both the migration of metadata (ideally from old formats to new formats) and the tracking/checking of changes to metadata (i.e. from new formats back to old formats) between different versions of its format, while remaining flexible enough to accommodate further requirements.

5.3 Metadata Quality Assurance

As highlighted earlier in this paper, quality assurance of metadata is an integral part of long-term metadata curation. Appropriate quality assurance procedures or mechanisms need to be in place to eliminate any quality flaws in a metadata record and thereby ascertain its suitability for its intended purpose(s). As identified in (JISC, 2003), incorrect and inconsistent metadata content significantly lowers the overall quality of metadata. Inaccuracy often results from a lack of validation and sanity checking at the time of metadata entry, and inconsistency from a lack of metadata creation guidelines. Non-interoperable metadata formats and erroneous metadata creation and/or management tools are also considerable contributors to such quality flaws.

In general, the quality of a metadata record is measured by its degree of consistency with, and/or accuracy in reference to, the actual dataset and its conformance to some agreed standard(s). Therefore, the digital metadata curation process, or at least the part or module contributing to metadata quality assurance, should be executed on the following two levels:

Semantic curation - organizing and managing the meaning of metadata, i.e. ensuring the semantic validity of metadata so that it describes the dataset meaningfully.

Representational curation - organizing and managing the formal representation (and utilization) of metadata, i.e. ascertaining structural validity against metadata schema(s) and reusability of metadata.

While representational curation (i.e. structural validation) typically begins during or after the creation of metadata records, semantic curation can in fact start even before metadata records come into existence. The importance of semantic curation follows from the fact that, in order for metadata to assist effectively in long-term digital curation, metadata curation (i.e. semantic curation) should begin at the very outset of the metadata lifecycle by developing or adopting a curation-aware metadata framework or ontology. It is also useful to have a rich set of metadata validation functionality incorporated within metadata creation and management tools.

5.4 Metadata Versioning

Throughout the process of long-term metadata curation, metadata is prone to be volatile. This volatility may well be caused by updates to metadata, which can involve amendments to or deletions from existing metadata records. However, previous versions of metadata may need to be retrieved in order to obtain vital information about the associated preserved information (e.g. in the case of an annotation, who made it and which version(s) of a value it applies to). It is therefore essential to be able to discriminate between metadata in the different states that arise and co-exist over time, by versioning metadata information.

5.5 Other Requirements

Aside from the requirements outlined above, long-term metadata curation needs to take the following additional issues into account:

Metadata Policy: A set of broad, high-level principles that form the guiding framework within which metadata curation can operate must be defined. The Metadata Policy would normally be a subsidiary policy of the organizational data policy statement and should reference the rules on legal and other related issues regarding the use and preservation of data and metadata, as governed by the data policy statement.

Audit Trail & Provenance Tracking: The metadata curation process should ensure that information is recorded with the required granularity and should provide the necessary means to track any significant changes (e.g. provenance change) to both data and metadata over their life cycles. This will, amongst other things, help provide assurance as to the reliability of the content information of both data and associated metadata.

Access Constraints & Control: Appropriate security measures should be adopted to ensure that the metadata records have not been compromised by unauthorized sources, thereby ensuring the overall consistency of the metadata records. Furthermore, verification of the metadata records’ authenticity before they are ingested into the system for long-term curation is also a crucial requirement. Techniques such as digital signatures and checksums may be employed for this function. Fixity information, as defined within the OAIS framework (OAIS, 2002), is an ideal example that advocates the use of such techniques or mechanisms.

6. Metadata Curation Record

It would not be an overstatement to regard “information” as crucial and instrumental in the context of long-term digital curation. To elaborate, the success of a long-term curation strategy predominantly relies on sufficient and accurate information about the resources being curated. Following this observation, the “Metadata Curation Record” (hereafter referred to as the MCR) has been constructed in the form of an XML Schema, which aims to record additional statements about both data objects and associated metadata to aid long-term digital curation. In other words, the MCR essentially pursues two primary objectives.
The first objective is to capture as much information about a digital information object as possible in order to assist its long-term preservation, curation and accessibility. This objective may well echo the objectives of many widely used XML schemas motivated by long-term preservation (e.g. PREMIS [6]). The second objective, on the other hand, is a feature that may not be discerned in most contemporary metadata standards: to provide additional statements about the metadata record itself, thereby supporting long-term curation of that record. In a digital curation system, the metadata ingest interface and/or metadata extraction tools/services can be developed based on the MCR to ensure that appropriate and sufficient metadata is acquired to aid in the curation of both data and metadata.

[6] PREservation Metadata: Implementation Strategies - http://www.oclc.org/research/projects/pmwg/

The approach employed to construct the curation record involved examining a range of existing well-known metadata schemas, such as Dublin Core [7], Directory Interchange Format (DIF) [8], DCC RI Label [9], CCLRC Scientific Metadata Model Version 2 (Sufi & Matthews, 2004) and IEEE Learning Object Metadata (LOM) [10], and importing the most relevant elements (in terms of curation, preservation and accessibility) from them. The rationale for this approach was to utilise existing resources and thereby avoid reinventing the wheel as far as possible.

6.1. Overview

In general, as depicted in figure 1, the elements contained within the Metadata Curation Record are divided into four different categories: General, Availability, Preservation and Curation. Firstly, the “General” category represents all generic information (e.g. Creator, Publisher, Keywords, etc.) about a data object. This category of elements is primarily required for presenting an overview of the digital object to its potential users. In addition, the elements (e.g.
keywords, subject) that record keyword-related information may well be used to aid keyword-based searching for scientific data across disparate sources. Secondly, the “Availability” elements provide information about accessing the data object, checking its integrity and any access or use constraints associated with it. The “Preservation” category presents information that will assist in the long-term preservation and accessibility of digital objects. Of particular mention is the OAIS compliant (OAIS, 2002) “Representation Information Label”, which captures the information required to enable access to the preserved digital object in a meaningful way. The use of the RI label can be recursive, especially in cases where meaningful interpretation of one RI element requires further RI. This recursion continues until there is sufficient information available to render the digital object in a form the OAIS Designated Community can understand.

Figure 1: Abstract level view of the Metadata Curation Record

[7] Dublin Core Metadata Initiative - http://dublincore.org/
[8] DIF Writer’s Guide - http://gcmd.gsfc.nasa.gov/User/difguide/difman.html
[9] DCC Info Label Report - http://dev.dcc.ac.uk/twiki/bin/view/Main/DCCInfoLabelReport
[10] IMS LOM - http://www.imsproject.org/metadata/mdv1p3pd/imsmd_bestv1p3pd.html

Finally, the “Curation” category elements provide information about the life cycle and other related aspects (e.g. version, annotation) of the digital object and of the metadata record itself, which may be used in the efficient and long-term curation of both the digital object and its metadata. Of particular significance are the “LifeCycle” elements, which are defined to record all changes that a digital object undergoes throughout its life cycle, along with information about who or what (e.g. factors, people etc.) conducted or was responsible for those changes.
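To make the idea of life-cycle recording concrete, the following is a minimal Python sketch of an append-only change history. It is not taken from the MCR schema: the class and field names (`LifeCycleEvent`, `CuratedObject`, `agent`, `action`, `note`) are hypothetical and chosen only for illustration, on the assumption that each change is stored together with the party responsible for it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class LifeCycleEvent:
    """One change to a digital object (all field names are hypothetical)."""
    when: datetime   # when the change was recorded
    agent: str       # who or what was responsible for the change
    action: str      # e.g. "created", "migrated", "annotated"
    note: str = ""   # optional free-text description


@dataclass
class CuratedObject:
    """A digital object carrying an append-only life-cycle history."""
    identifier: str
    life_cycle: List[LifeCycleEvent] = field(default_factory=list)

    def record(self, agent: str, action: str, note: str = "") -> LifeCycleEvent:
        """Append a new event; earlier events are never modified."""
        event = LifeCycleEvent(datetime.now(timezone.utc), agent, action, note)
        self.life_cycle.append(event)
        return event

    def audit_trail(self) -> List[str]:
        """Return the history as readable lines, oldest first."""
        return [
            f"{e.when.isoformat()} {e.agent}: {e.action} {e.note}".rstrip()
            for e in self.life_cycle
        ]
```

For example, calling `record("ingest-service", "created")` and then `record("a.curator", "migrated", "schema v1 -> v2")` on a `CuratedObject` yields a two-line audit trail in the order the changes occurred. Keeping events immutable and only ever appending is one simple way to keep the history trustworthy.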
This type of information is vital for implementing some crucial curation-related functionalities, such as provenance tracking (as specified in the OAIS model) and audit trailing, which are essential for checking and ensuring the quality and integrity of data objects. Furthermore, the “MetaMetadata” element in this category is dedicated to capturing the information required to curate the MCR itself efficiently. It has its own “General” (e.g. Creator, Indexing), “Preservation” (e.g. Identifier, Representation Information), “Availability” (e.g. Metadata Quality, Access, Rights etc.) and “Curation” (e.g. Event) category elements. In a Metadata Curation system, the “MetaMetadata” elements would be implemented as a separate schema complementing the MCR, which would consist of the rest of the elements.

7. Metadata Schema Mapping Tool

Successful long-term metadata curation demands a curation-aware migration strategy in order to cope with the metadata migration issue (see 5.2) that arises from metadata schema/format evolution. The metadata schema mapping tool, which has been developed using Java technologies, aims to resolve this issue by facilitating easy, semi-automatic migration of metadata between two co-existing versions of its format. The tool employs an efficient regular expression driven matching algorithm to determine all possible matches (direct or indirect) between two versions of a metadata schema irrespective of their types (i.e. XML or Relational), calculates mapping rules based on the matches and finally migrates metadata from the source schema to the destination schema.

7.1 Rationale

There are numerous commercial and non-commercial database schema mapping or migration tools available at present. Most of these tools enable users, to varying extents, to automatically find matches between two database and/or XML schemas and to migrate and/or copy data across based on the matches. Examples of such tools include Altova MapForce [11] and SwisSQL [12]. Many of these tools also facilitate interoperability between different databases by allowing users to perform cross-database schema migration, such as migration from Oracle to DB2 or from MS SQL to Oracle. Naturally, the existence of these tools may call into question the necessity of, and the motivation for, the Metadata Schema Mapping Tool.

[11] Altova MapForce - http://www.altova.com/download_mapforce.html
[12] SwisSQL - www.swissql.com/

Figure 2: Direct & indirect matches between two versions of DATAFILE table

In response to this question, it would not be an overstatement to say that the inability of currently available tools to find indirect or non-obvious matches between two schemas justifies the necessity, and confirms the uniqueness, of the Metadata Schema Mapping tool. To illustrate, two database tables from two versions of the ICAT schema of CCLRC [13], as shown in figure 2, may be considered. Commercial tools will only be able to determine the direct matches (as indicated by the thin arrows from source table to destination table) between these two tables. Here, the term “Direct Match” refers to an exact duplicate or replica of a field or column in a database table in terms of the name and type of that particular field or column. These two database tables together provide an ideal example of the aforementioned metadata migration issue, i.e. migrating changes to the metadata itself, especially in cases where one or more crucial elements in the old format do not have any directly corresponding elements in the new format. For example, the field “DATAFILE_UPDATE_TIME” in version 1 of the DATAFILE table has been changed to “DATAFILE_MODIFY_TIME” in version 2 (indicated by a solid arrow in the diagram). Currently available tools will not be able to establish any link between these two fields, as they only search for direct matches as shown in figure 2.
However, consulting any English dictionary readily shows that the word “UPDATE” in the former field is in fact synonymous with the word “MODIFY” in the latter, and hence that the two fields can be declared an “indirect match”. The Metadata Schema Mapping tool is capable of doing exactly that, along with other features such as determining matches between two tables based on their relationships (i.e. foreign key based relationships) with other tables in the corresponding schemas, in both cross-database and cross-schema format oriented settings.

[13] The ISIS ICAT is a metadata catalog of the previous 20 years of data collected at the ISIS facility. The database schema it uses is based heavily on the CCLRC Scientific Metadata Model version 2 (Sufi & Matthews, 2004).

In addition to its role in metadata curation, the mapping tool can play a significant role in any data management environment that administers considerably large and evolving data sets. In such a dynamic environment, data evolution often results in evolution of the underlying schema(s) as well as a change of the databases employed (e.g. MS Access to Oracle) in order to accommodate and maintain the changes to the datasets. The schema mapping tool provides an easier and less labour-intensive means (than the commercial tools) of identifying and reconciling complex and “non-obvious” differences between schemas. The tool thereby facilitates more accurate migration of data from an old version of a schema to its newer version, irrespective of the schema type and/or the database in which the schemas reside, while enabling the use of both versions. This will make access to the datasets more declarative for users, as they will be able to pose queries to the datasets based on the version of the schema they are familiar with.
From this perspective, the use of the tool is indeed beneficial to the efficient management of the ever-expanding scientific data generated from various e-Science activities.

8. Future Work

There is certainly a great deal of scope for further advancement and innovation in the development of an efficient metadata curation strategy, concrete tool sets and so on. However, as the domain of long-term metadata curation has yet to be completely explored, it is difficult to set an unequivocal limit on the work to be done for a fully effective long-term metadata curation strategy. Nevertheless, key future activities of the project will include the development of a metadata curation model, which will address the core requirements of long-term metadata curation. The model will encompass a curation-aware metadata framework based on the MCR, efficient post-creation metadata quality assurance mechanisms and suitable metadata versioning techniques, amongst other things. The first draft of the model has already been designed as an extension to the OAIS reference model and is currently being assessed for possible improvements. The archival information preservation functions defined by OAIS are not, however, within the scope of the Metadata Curation Model, although the model makes explicit reference to some of those functions. Furthermore, the model is focused only on the curation of metadata and does not assume responsibility for curation of the data that the metadata describes.

9. Conclusions

Efficient and effective long-term metadata curation is a key component of successful preservation and enrichment of, and access to, digital information in the long term. Preliminary research (Shaon, 2005) for this project revealed that the majority of current metadata standards, systems and approaches relevant to metadata curation do not address the full set of metadata curation requirements as outlined in this paper.
This underlines the necessity of curation-aware metadata standards, metadata management standards and systems, which would aid in developing a viable strategy for long-term metadata curation. Developing new standards for both the metadata and metadata management realms, however, would not be an efficient strategy. Therefore, a specification of the extensions that existing standards and systems need in order to aid metadata curation was recommended and seen as a fruitful area of both the work presented in this paper and the future work of the project. The Metadata Curation Record and the Metadata Schema Mapping tool may therefore be seen as a union of the best features of existing metadata standards and metadata mapping tools respectively. Nevertheless, the work should certainly be regarded as initial steps towards developing an efficient strategy for long-term metadata curation that would benefit any discipline concerned with long-term data preservation, such as eScience.

References & Bibliography

Calanag, M.L., Sugimoto, S., & Tabata, K. (2001): A metadata approach to digital preservation. In: Proceedings of the International Conference on Dublin Core and Metadata Applications 2001 (pp. 143-150). Tokyo: National Institute of Informatics - http://www.nii.ac.jp/dc2001/proceedings/product/paper-24.pdf

Day, M. (1999): Metadata for digital preservation: an update, Ariadne Issue 22, 1999 - http://www.ariadne.ac.uk/issue22/metadata/intro.html

Gorman, G. E. (2004): International Yearbook of Library and Information Management, 2003-2004, Metadata Applications and Management, Facet Publishing, 2004, part 1, pages 1-17.

JISC (2003): Quality Assurance For Metadata, QA Focus Document, QA Focus, a JISC-funded advisory service supporting JISC 5/99 projects, 2003 - http://www.ukoln.ac.uk/qafocus/documents/briefings/briefing-43/briefing-43A5.doc

Kent, J. and Schuerhoff, M. (1997): Some Thoughts About a Metadata Management System, 1997, Statistics Netherlands - http://www.vldb.org/archive/vldb2000/presentations/jarke.pdf

Macdonald, A. and Lord, P. (2002): Digital Data Curation Task Force, Report of the Task Force Strategy Discussion Day, November 2002 - http://www.jisc.ac.uk/uploaded_documents/CurationTaskForceFinal1

OAIS (2002): Reference Model for an Open Archival Information System (OAIS), CCSDS Blue Book, Issue 1, January 2002 - http://public.ccsds.org/publications/archive/650x0b1.pdf

Shaon, A. (2005): Long-term Metadata Management & Quality Assurance in Digital Curation, MSc Dissertation, CCLRC ePublications Archive, 2005 - http://epubs.cclrc.ac.uk/bitstream/897/MSc_Dissertation.pdf

Sufi, S. and Matthews, B. (2004): The CLRC Scientific Metadata Model Version 2, CCLRC ePublications Archive, August 2004 - http://epubs.cclrc.ac.uk/bitstream/485/csmdm.version-2.pdf