Metadata Standard: TextMD
劉仁翔 (presenter), 王憲章, 林黃瑋
97/11/15

Contents
• What's TextMD
• TextMD Element Sets
• TextMD Attributes
• METS Structures
• METS administrative metadata section
• Instances: LoC
• Tools for Text: JHOVE
• References

What's TextMD (1)
• The name is short for Technical Metadata for Text (Schema for Technical Metadata for Text).
• It was originally created by the New York University Digital Library Team (NYU), which maintained it through the current version (2.2).
• In October 2007, the Library of Congress assumed maintenance of textMD.

What's TextMD (2)
• textMD is an XML Schema that details technical metadata for text-based digital objects.
• It most commonly serves as an extension schema within the administrative metadata section of the Metadata Encoding and Transmission Standard (METS).
• It can also exist as a standalone document.
• In the future, textMD may also be used within a PREMIS (Preservation Metadata: Implementation Strategies) element.
(From: Library of Congress)

TextMD Details
The textMD schema allows for detailing properties such as:
• encoding information (quality, platform, software, agent)
• character information (character set and size, byte order and size, line terminators)
• languages
• fonts
• markup information
• processing and textual notes
• technical requirements for printing and viewing
• page ordering and sequencing
(From: Library of Congress)

TextMD Element Sets (v.3.0 alpha)
Root element in the textMD element set: textMD
Usage: Root element for bundling text technical metadata.
Attributes: none.
Contains: encoding, character_info, language, alt_language, font_script, markup_basis, markup_language, processingNote, printRequirements, viewingRequirements, textNote, pageOrder, pageSequence.
Contained by: none.
Additionally, the 3.0 alpha schema now has a target namespace URI: info:lc/xmlns/textMD-v3.
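The element list above can be sketched as a small standalone instance. This is a minimal illustration in Python, assuming the standard-library ElementTree module; the `textmd` prefix, the helper names `q` and `build_textmd`, and all sample values are assumptions made for this sketch, and the output is not claimed to validate against the textMD schema.

```python
import xml.etree.ElementTree as ET

# Target namespace URI stated for the 3.0 alpha schema.
TEXTMD_NS = "info:lc/xmlns/textMD-v3"
# The "textmd" prefix is an arbitrary choice for this sketch.
ET.register_namespace("textmd", TEXTMD_NS)


def q(name: str) -> str:
    """Return the namespace-qualified tag for a textMD element name."""
    return f"{{{TEXTMD_NS}}}{name}"


def build_textmd() -> ET.Element:
    """Build a small textMD tree using four of the listed child elements."""
    root = ET.Element(q("textMD"))  # root element; it carries no attributes
    # Sample values below are illustrative, not taken from a real object.
    ET.SubElement(root, q("language")).text = "eng"
    ET.SubElement(root, q("markup_basis")).text = "XML"
    ET.SubElement(root, q("markup_language")).text = "TEI"
    ET.SubElement(root, q("textNote")).text = "Transcribed from page scans."
    return root


if __name__ == "__main__":
    print(ET.tostring(build_textmd(), encoding="unicode"))
```

The children are added in the same order in which the root's "Contains" list names them; the remaining nine elements are left out only to keep the sketch short.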

(Figure callouts: namespace URI declaration; the encoding_platform element)

Model of TextMD 3.0 Schema
From: http://www.loc.gov/standards/textMD/images/textMD-v3.0amodel.png
13 root elements

Encoding Elements
Contains: encoding_platform, encoding_software, encoding_agent

TextMD Attributes (v.3.0 alpha)
These attributes may appear on given elements within textMD (listed alphabetically).

authority
Usage: A string used to record the source of the non-ISO 639-2 language code (e.g., Ethnologue).
Contained by: alt_language.

encoding
Usage: Used to identify a specific variable character set (as a string), such as UTF-8.
Contained by: character_size.

linebreak
Usage: Used to indicate the type of linebreak that a system uses. Enumerated values are CR, LF, or CR/LF.
Contained by: encoding.

QUALITY
Usage: Used to record a quality measure (as a string) for the output of the encoding process (OCR quality, transcription quality, etc.).
Contained by: encoding.

role
Usage: Used to indicate the role of an agent. Enumerated values are OCR, TRANSCRIBER, MARKUP, and EDITOR.
Contained by: encoding_agent.

version
Usage: Used to record the version number (as a string) for a given piece of software, a markup language, or a schema version.
Contained by: encoding_software, markup_basis, or markup_language.

Sample instance (standalone document)

Review: METS Structures
1. METS Header
2. Descriptive Metadata *
3. Administrative Metadata *
4. File Section *
5. Structural Map
6. Structural Links *
7. Behavior *
(* = optional)

METS administrative metadata section
Administrative metadata provides information regarding:
• how the files were created and stored
• intellectual property rights
• the original source object from which the digital library object derives
• the provenance of the files comprising the digital library object (i.e., master/derivative file relationships, and migration/transformation information)
As with descriptive metadata, administrative metadata may be either external to the METS document (standalone) or encoded internally.
From: http://www.loc.gov/standards/mets/METSOverview.v2.html#admMD

Administrative metadata <amdSec> (1)
<amdSec> elements contain the administrative metadata pertaining to the files comprising a digital library object, as well as that pertaining to the original source material used to create the object. There are four main forms of administrative metadata provided for in a METS document:
1. Technical Metadata <techMD>: information regarding files' creation, format, and use characteristics.
2. Intellectual Property Rights Metadata <rightsMD>: copyright and license information.
3. Source Metadata <sourceMD>: descriptive and administrative metadata regarding the analog source from which a digital library object derives.
4. Digital Provenance Metadata <digiprovMD>: information regarding source/destination relationships between files, including master/derivative relationships between files and information regarding migrations/transformations employed on files between original digitization of an artifact and its current incarnation as a digital library object.
Each of these four types of administrative metadata has its own subelement within the <amdSec> portion of a METS document in which that form of metadata can be embedded: <techMD>, <rightsMD>, <sourceMD>, and <digiprovMD>. Each of these four elements may occur more than once in any METS document.
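The four-part layout of <amdSec> described above can be sketched as follows. This is a minimal illustration in Python using the standard-library ElementTree module; the helper names `m` and `build_amd_sec`, the `mets` prefix, and the sample ID values are assumptions made for this sketch, and each subelement is left empty rather than filled with real metadata.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# The "mets" prefix is a conventional choice, used here for readability.
ET.register_namespace("mets", METS_NS)

# The four administrative metadata subelements, in the order listed above.
AMD_KINDS = ["techMD", "rightsMD", "sourceMD", "digiprovMD"]


def m(name: str) -> str:
    """Return the namespace-qualified tag for a METS element name."""
    return f"{{{METS_NS}}}{name}"


def build_amd_sec(sections):
    """Build an amdSec from (element name, ID) pairs.

    Any of the four kinds may be given more than once. Entries are grouped
    by kind in the listed order; entries of one kind keep their input order.
    """
    for kind, _ in sections:
        if kind not in AMD_KINDS:
            raise ValueError(f"not an amdSec subelement: {kind}")
    amd_sec = ET.Element(m("amdSec"))
    # sorted() is stable, so repeated kinds stay in the order supplied.
    for kind, section_id in sorted(sections, key=lambda s: AMD_KINDS.index(s[0])):
        ET.SubElement(amd_sec, m(kind), {"ID": section_id})
    return amd_sec


if __name__ == "__main__":
    demo = build_amd_sec([
        ("techMD", "TECH001"),   # hypothetical IDs
        ("techMD", "TECH002"),   # a second techMD: repetition is allowed
        ("rightsMD", "RIGHTS001"),
        ("sourceMD", "SRC001"),
        ("digiprovMD", "PROV001"),
    ])
    print(ET.tostring(demo, encoding="unicode"))
```

Giving every subelement its own ID is what lets other parts of a METS document point at the specific metadata section that describes them.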

Administrative metadata <amdSec> (2)
The <techMD>, <rightsMD>, <sourceMD> and <digiprovMD> elements employ the same content model as <dmdSec>: they may contain an <mdRef> element to point to external administrative metadata, an <mdWrap> element to embed administrative metadata within a METS document, or both.
Note: Descriptive Metadata <dmdSec>
• External Descriptive Metadata (mdRef)
• Internal Descriptive Metadata (mdWrap)
Multiple instances of these elements may occur within a METS document, and all of them must carry an ID attribute so that other elements within the METS document (such as divisions within the structural map or <file> elements) may be linked to the <amdSec> subelements which describe them.

Example: a <techMD> element which includes technical metadata regarding a file's preparation
The mdWrap tag can bring different metadata schemas into a METS document (NISOIMG in this example); the content of <xmlData> is the embedded XML from that schema.

  <techMD ID="AMD001">
    <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img. Data">
      <xmlData>
        <niso:MIMEtype>image/tiff</niso:MIMEtype>
        <niso:Compression>LZW</niso:Compression>
        <niso:PhotometricInterpretation>8</niso:PhotometricInterpretation>
        <niso:Orientation>1</niso:Orientation>
        <niso:ScanningAgency>NYU Press</niso:ScanningAgency>
      </xmlData>
    </mdWrap>
  </techMD>

LoC's application of METS: AV Prototyping Project
http://www.loc.gov/rr/mopic/avprot/metsmenu2.html

TextMD
From: http://www.loc.gov/rr/mopic/avprot/metsmenu2.html

Suggested Extension Schemas (LoC: AV Prototyping Project)
Listings of the proposed extension schemas can be found on the Prototyping Project web site at http://lcweb.loc.gov/rr/mopic/avprot/avlcdocs.html#md.
• GDM
• AllFilesMD
• ImageMD
• TextMD
• AudioMD
• VideoMD
• RightsMD
• SourceMD
• ProcessMD
http://www.loc.gov/rr/mopic/avprot/metsmenu2.html

Another Example: AIP
Archival Information Package (LoC)
The AIP preliminary design is based upon the architectural concepts of the OAIS Reference Model and the metadata model specified in METS.
www.loc.gov/rr/mopic/avprot/AIP-Study_v19.pdf

Findings: Technical Metadata for Files
Alternative technical metadata schemas for different media types are encouraged:
• MIX for images: http://www.loc.gov/standards/mix/mix.xsd
• TextMD for text: http://dlib.nyu.edu/METS/textmd.xsd
• AUDIOMD for audio: http://lcweb2.loc.gov/mets/Schemas/AMD.xsd
• VIDEOMD for video: http://lcweb2.loc.gov/mets/Schemas/VMD.xsd
Where possible we are using JHOVE to derive all of these (http://hul.harvard.edu/jhove/).
From: Thomas Habing

Tools for Text: JHOVE (JSTOR/Harvard Object Validation Environment)
• JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. It is an automatic detection and validation tool.
• JHOVE is a format-specific digital object validation API written in Java.
Use cases
http://hul.harvard.edu/jhove/

Tools for Text: ASCII & UTF-8 modules
• ASCII (American Standard Code for Information Interchange). Coverage: ASCII (ANSI X3.4-1986, ECMA-6, ISO 646:1991) [ANSI X3.4, ECMA-6, ISO 646]
• UTF-8 (8-bit UCS/Unicode Transformation Format). Coverage: UTF-8 encoded content streams [Unicode]
http://hul.harvard.edu/jhove/

References
• TextMD Official Site: http://www.loc.gov/standards/textMD/
• TextMD Element Set: http://www.loc.gov/standards/textMD/elementSet/index.html
• Model of TextMD 3.0 Schema: http://www.loc.gov/standards/textMD/images/textMD-v3.0amodel.png
• AV Prototype Project Working Documents: http://www.loc.gov/rr/mopic/avprot/metsmenu2.html
• JHOVE: JSTOR/Harvard Object Validation Environment, 2008: http://hul.harvard.edu/jhove/
• METS: An Overview & Tutorial: http://www.loc.gov/standards/mets/METSOverview.v2.html
• Archival Information Package (AIP) Design Study: www.loc.gov/rr/mopic/avprot/AIP-Study_v19.pdf
• METS, MODS and PREMIS, Oh My!, Thomas Habing: http://www.loc.gov/standards/mods/presentations/habing-ala07/
• MODS, METS, and other metadata standards, Jerome McDonough: www.kansalliskirjasto.fi/attachments/5m4XaGYjD/5y09H7Dbx/Files/Current File/Finland-7-modsmets.ppt

End of presentation. Comments are welcome.