Metadata Standard: TextMD
劉仁翔 (presenter)
王憲章
林黃瑋
97/11/15
1
Table of Contents








What’s TextMD
TextMD Element Sets
TextMD Attributes
METS Structures
METS-administrative metadata section
Instances: LoC
Tools for Text: JHOVE
References
2
What’s TextMD(1)



Abbreviation of: Technical Metadata for Text
(Schema for Technical Metadata for Text)
It was originally created by the New York
University Digital Library Team (NYU), which
maintained it through the
current version (2.2).
In October 2007, the Library of Congress
assumed maintenance of textMD.
3
What’s TextMD(2)


TextMD is an XML Schema that details technical
metadata for text-based digital objects. It most
commonly serves as an extension schema used
within the administrative metadata section of the
Metadata Encoding and Transmission Standard (METS).
However, it can also exist as a standalone
document.
In the future, textMD may also be used within a PREMIS
(Preservation Metadata: Implementation Strategies)
element.
(From:Library of Congress)
4
TextMD Details
The textMD schema allows for detailing
properties such as:
• encoding information (quality, platform, software, agent)
• character information (character set and size, byte order and size, line terminators)
• languages
• fonts
• …
5




• markup information
• processing and textual notes
• technical requirements for printing and viewing
• page ordering and sequencing
(From:Library of Congress)
6
TextMD Element Sets (v.3.0 alpha)
Root element in the textMD element set:
• textMD





Usage: Root Element for bundling text technical metadata.
Attributes: none.
Contains: encoding, character_info, language,
alt_language, font_script, markup_basis, markup_language,
processingNote, printRequirements, viewingRequirements,
textNote, pageOrder, pageSequence.
Contained by: none.
Additionally, the 3.0 alpha schema now has a target
namespace URI: info:lc/xmlns/textMD-v3.
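The root element described above can be sketched in Python with the standard-library xml.etree.ElementTree. This is only an illustration under stated assumptions: the helper names (build_textmd, unexpected_children), the "textmd" prefix, and the sample values are hypothetical, and the check compares child names against the list on this slide rather than validating against the real schema.

```python
import xml.etree.ElementTree as ET

# Target namespace of the 3.0 alpha schema, as given on the slide.
TEXTMD_NS = "info:lc/xmlns/textMD-v3"

# Child elements the root <textMD> may contain, per the slide.
ALLOWED_CHILDREN = [
    "encoding", "character_info", "language", "alt_language",
    "font_script", "markup_basis", "markup_language", "processingNote",
    "printRequirements", "viewingRequirements", "textNote",
    "pageOrder", "pageSequence",
]


def build_textmd(children):
    """Build a <textMD> root from (name, text) pairs (hypothetical helper)."""
    root = ET.Element(f"{{{TEXTMD_NS}}}textMD")
    for name, text in children:
        child = ET.SubElement(root, f"{{{TEXTMD_NS}}}{name}")
        child.text = text
    return root


def unexpected_children(root):
    """Return local names of direct children that are not in the slide's list."""
    names = [child.tag.split("}", 1)[-1] for child in root]
    return [n for n in names if n not in ALLOWED_CHILDREN]


# Illustrative values only; "eng" is an assumed language code for the example.
doc = build_textmd([("language", "eng"), ("textNote", "Sample note")])
ET.register_namespace("textmd", TEXTMD_NS)  # hypothetical prefix choice
print(ET.tostring(doc, encoding="unicode"))
print(unexpected_children(doc))
```

A name check like this catches typos in element names early, but it says nothing about ordering or cardinality; those would need the schema itself.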
7
Declaration
Namespace URI
Element: encoding_platform
8
Model of TextMD 3.0 Schema
From: http://www.loc.gov/standards/textMD/images/textMD-v3.0amodel.png
13 elements directly under the root element
9
Encoding Elements
Contains:
encoding_platform,
encoding_software,
encoding_agent
10
TextMD Attributes (v.3.0 alpha)
These attributes may appear on given elements within textMD (listed alphabetically).

authority
Usage: A string used to record the source of a non-ISO 639-2 language code (e.g., Ethnologue).
Contained by: alt_language.

encoding
Usage: Used to identify a specific variable character set (as a string), such as UTF-8.
Contained by: character_size.

linebreak
Usage: Used to indicate the type of linebreak that a system uses. Enumerated values are CR, LF, or CR/LF.
Contained by: encoding.

QUALITY
Usage: Used to record a quality measure (as a string) for the output of the encoding process (OCR quality, transcription quality, etc.).
Contained by: encoding.

role
Usage: Used to indicate the role of an agent. Enumerated values are OCR, TRANSCRIBER, MARKUP, and EDITOR.
Contained by: encoding_agent.

version
Usage: Used to record the version number (as a string) for a given piece of software, a markup language, or a schema version.
Contained by: encoding_software, markup_basis, or markup_language.
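The attribute rules on this slide can be written down as a small lookup, pairing each attribute with the element named in its "Contained by" line and with its enumerated values where the slide gives them. The sketch below is a hypothetical helper (check_attribute and the table names are not part of textMD), and it encodes only what the slide states.

```python
# Enumerated attribute values as listed on the slide.
LINEBREAK_VALUES = {"CR", "LF", "CR/LF"}
ROLE_VALUES = {"OCR", "TRANSCRIBER", "MARKUP", "EDITOR"}

# Which elements each attribute may appear on, per the slide.
ATTRIBUTE_HOSTS = {
    "authority": {"alt_language"},
    "encoding": {"character_size"},
    "linebreak": {"encoding"},
    "QUALITY": {"encoding"},
    "role": {"encoding_agent"},
    "version": {"encoding_software", "markup_basis", "markup_language"},
}


def check_attribute(element, attribute, value):
    """Return a list of problems for one attribute on one element."""
    hosts = ATTRIBUTE_HOSTS.get(attribute)
    if hosts is None:
        return [f"unknown attribute {attribute!r}"]
    problems = []
    if element not in hosts:
        problems.append(f"{attribute!r} is not allowed on <{element}>")
    if attribute == "linebreak" and value not in LINEBREAK_VALUES:
        problems.append(f"linebreak must be one of {sorted(LINEBREAK_VALUES)}")
    if attribute == "role" and value not in ROLE_VALUES:
        problems.append(f"role must be one of {sorted(ROLE_VALUES)}")
    return problems


print(check_attribute("encoding_agent", "role", "OCR"))
print(check_attribute("encoding", "linebreak", "CRLF"))
```

Free-text attributes (authority, encoding, QUALITY, version) are checked only for placement, since the slide describes them as plain strings.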
11
Sample instance (standalone document)
12
Review: METS Structures







1. METS Header*
2. Descriptive Metadata*
3. Administrative Metadata*
4. File Section*
5. Structural Map
6. Structural Links*
7. Behavior*
(* = optional)
13
METS-administrative metadata section

Administrative metadata provides:
• information regarding how the files were created and stored,
• intellectual property rights,
• metadata regarding the original source object from which the digital library object derives, and
• information regarding the provenance of the files comprising the digital library object (i.e., master/derivative file relationships, and migration/transformation information).
As with descriptive metadata, administrative metadata may
be either external to the METS document (standalone) or encoded
internally.
From: http://www.loc.gov/standards/mets/METSOverview.v2.html#admMD
14
Administrative metadata<amdSec>(1)







<amdSec> elements contain the administrative metadata pertaining to the files
comprising a digital library object, as well as that pertaining to the original
source material used to create the object. There are four main forms of
administrative metadata provided for in a METS document:
1. Technical Metadata (information regarding files' creation, format, and use
characteristics), <techMD>,
2. Intellectual Property Rights Metadata (copyright and license information),
<rightsMD>,
3. Source Metadata (descriptive and administrative metadata regarding the
analog source from which a digital library object derives), <sourceMD>,
4. Digital Provenance Metadata (information regarding source/destination
relationships between files, including master/derivative relationships between
files and information regarding migrations/transformations employed on files
between original digitization of an artifact and its current incarnation as a digital
library object), <digiprovMD>.
Each of these four different types of administrative metadata has a unique
subelement within the <amdSec> portion of a METS document in which that
form of metadata can be embedded: <techMD>, <rightsMD>, <sourceMD>, and
<digiprovMD>.
Each of these four elements may occur more than once in any METS document.
15
Administrative metadata<amdSec>(2)


The <techMD>, <rightsMD>, <sourceMD> and <digiprovMD>
elements employ the same content model as <dmdSec>: they
may contain an <mdRef> element to point to external
administrative metadata, an <mdWrap> element to use when
embedding administrative metadata within a METS document,
or both.
Note: Descriptive Metadata <dmdSec>
External Descriptive Metadata (mdRef)
Internal Descriptive Metadata (mdWrap)
Multiple instances of these elements may occur within a
METS document, and all of them must carry an ID attribute so
that other elements within the METS document (such as
divisions within the structural map or <file> elements) may be
linked to the <amdSec> subelements which describe them.
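The ID-based linking described above can be checked mechanically. The following is a minimal Python sketch, assuming the standard METS namespace http://www.loc.gov/METS/ and the ADMID attribute on <file> elements; the sample document and the helper name unresolved_admids are made up for illustration, and the sample omits the structural map for brevity.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Illustrative fragment: FILE002 points at an ID that does not exist.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <amdSec>
    <techMD ID="AMD001"/>
    <rightsMD ID="AMD002"/>
  </amdSec>
  <fileSec>
    <fileGrp>
      <file ID="FILE001" ADMID="AMD001 AMD002"/>
      <file ID="FILE002" ADMID="AMD003"/>
    </fileGrp>
  </fileSec>
</mets>"""


def unresolved_admids(mets_root):
    """Map each file ID to the ADMID values with no matching ID in <amdSec>."""
    ns = {"m": METS_NS}
    known = {el.get("ID") for el in mets_root.findall("m:amdSec/*", ns)}
    missing = {}
    for file_el in mets_root.iter(f"{{{METS_NS}}}file"):
        # ADMID may list several IDs separated by whitespace.
        refs = file_el.get("ADMID", "").split()
        bad = [ref for ref in refs if ref not in known]
        if bad:
            missing[file_el.get("ID")] = bad
    return missing


root = ET.fromstring(SAMPLE)
print(unresolved_admids(root))  # {'FILE002': ['AMD003']}
```

The same idea extends to divisions in the structural map, which can also carry ADMID references.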
16
A <techMD> element that includes technical
metadata regarding a file's preparation.
The mdWrap tag can bring different metadata schemas into a METS
document (NISOIMG in this example).
<techMD ID="AMD001">
  <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img. Data">
    <!-- xmlData holds the embedded XML schema content -->
    <xmlData>
      <niso:MIMEtype>image/tiff</niso:MIMEtype>
      <niso:Compression>LZW</niso:Compression>
      <niso:PhotometricInterpretation>8</niso:PhotometricInterpretation>
      <niso:Orientation>1</niso:Orientation>
      <niso:ScanningAgency>NYU Press</niso:ScanningAgency>
    </xmlData>
  </mdWrap>
</techMD>
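The same wrapping pattern applies to textMD. Below is a hedged Python sketch that places a textMD root inside <techMD>/<mdWrap>/<xmlData>; it assumes the METS namespace http://www.loc.gov/METS/, the textMD 3.0 alpha namespace info:lc/xmlns/textMD-v3, and "TEXTMD" as the MDTYPE value. The helper name wrap_textmd, the ID "AMD002", and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
TEXTMD_NS = "info:lc/xmlns/textMD-v3"


def wrap_textmd(textmd_root, tech_id):
    """Embed a textMD element in a METS <techMD> via <mdWrap>/<xmlData>."""
    tech = ET.Element(f"{{{METS_NS}}}techMD", {"ID": tech_id})
    wrap = ET.SubElement(
        tech,
        f"{{{METS_NS}}}mdWrap",
        {"MIMETYPE": "text/xml", "MDTYPE": "TEXTMD"},  # assumed MDTYPE value
    )
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(textmd_root)
    return tech


# Illustrative textMD content; element values are made up.
textmd = ET.Element(f"{{{TEXTMD_NS}}}textMD")
ET.SubElement(textmd, f"{{{TEXTMD_NS}}}language").text = "eng"

tech_md = wrap_textmd(textmd, "AMD002")
print(ET.tostring(tech_md, encoding="unicode"))
```

Pointing at a standalone textMD file with <mdRef> instead of embedding it would follow the external variant mentioned earlier.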
17
METS application at LoC: AV Prototyping Project
http://www.loc.gov/rr/mopic/avprot/metsmenu2.html
18
TEXTMD
From: http://www.loc.gov/rr/mopic/avprot/metsmenu2.html
19
Suggested Extension Schemas (LoC: AV Prototyping Project)
Extension Schemas
Listings of the proposed extension schemas can be
found on the Prototyping Project web site at
http://lcweb.loc.gov/rr/mopic/avprot/avlcdocs.html#md.
• GDM
• AllFilesMD
• ImageMD
• TextMD
• AudioMD
• VideoMD
• RightsMD
• SourceMD
• ProcessMD
From: http://www.loc.gov/rr/mopic/avprot/metsmenu2.html
20
Another Example: AIP
Archival Information Package (LoC)
The AIP preliminary design is based upon the architectural concepts of the
OAIS Reference Model and the metadata model specified in METS.
www.loc.gov/rr/mopic/avprot/AIP-Study_v19.pdf
21
22
Findings: Technical Metadata for Files
Alternative technical metadata schemas for different
media types are encouraged:
• MIX for images
http://www.loc.gov/standards/mix/mix.xsd
• TextMD for text
http://dlib.nyu.edu/METS/textmd.xsd
• AUDIOMD for audio
http://lcweb2.loc.gov/mets/Schemas/AMD.xsd
• VIDEOMD for video
http://lcweb2.loc.gov/mets/Schemas/VMD.xsd
Where possible we are using JHOVE to derive all of
these (http://hul.harvard.edu/jhove/)
From: Thomas Habing
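The schema-per-media-type recommendation above amounts to a lookup keyed on media type. A minimal Python sketch follows, using the schema names and URLs from this slide; keying on the top-level part of a MIME type and the helper name schema_for are assumptions made for illustration.

```python
# Technical metadata schema per media type, with the URLs from the slide.
SCHEMA_BY_MEDIA = {
    "image": ("MIX", "http://www.loc.gov/standards/mix/mix.xsd"),
    "text": ("TextMD", "http://dlib.nyu.edu/METS/textmd.xsd"),
    "audio": ("AUDIOMD", "http://lcweb2.loc.gov/mets/Schemas/AMD.xsd"),
    "video": ("VIDEOMD", "http://lcweb2.loc.gov/mets/Schemas/VMD.xsd"),
}


def schema_for(mime_type):
    """Return (schema name, schema URL) for a MIME type, or None if unlisted."""
    top_level = mime_type.split("/", 1)[0].lower()
    return SCHEMA_BY_MEDIA.get(top_level)


print(schema_for("text/plain"))
print(schema_for("application/pdf"))
```

Types outside the four listed families (for example application/pdf) return None here, since the slide does not name a schema for them.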
23
Tools for Text: JHOVE (JSTOR/Harvard Object Validation Environment)
• JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects. It is an automated checking and validation tool.
• JHOVE is a format-specific digital object validation API written in Java.
Use Cases
24
http://hul.harvard.edu/jhove/
25
Tools for Text: ASCII & UTF-8 modules
ASCII (American Standard Code for Information Interchange)
• Coverage: ASCII (ANSI X3.4-1986, ECMA-6, ISO 646:1991) [ANSI X3.4, ECMA-6, ISO 646]
UTF-8 (8-bit UCS/Unicode Transformation Format)
• Coverage: UTF-8 encoded content streams [Unicode]
From: http://hul.harvard.edu/jhove/
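To give a feel for what characterizing a text stream involves, here is a small Python sketch that classifies bytes as ASCII or well-formed UTF-8 and reports which line terminators occur (CR, LF, CR/LF, matching the linebreak values in textMD). This is not JHOVE and does not reproduce its modules' algorithms or output; the function name characterize and the returned dictionary layout are hypothetical.

```python
def characterize(data: bytes) -> dict:
    """Return a rough character profile for a byte string."""
    try:
        data.decode("ascii")
        charset = "ASCII"
    except UnicodeDecodeError:
        try:
            data.decode("utf-8")
            charset = "UTF-8"
        except UnicodeDecodeError:
            charset = None  # neither ASCII nor well-formed UTF-8

    # Count line terminators; a CR/LF pair is counted once,
    # not again as a separate CR and a separate LF.
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    counts = {"CR/LF": crlf, "CR": cr, "LF": lf}
    linebreaks = sorted(name for name, n in counts.items() if n > 0)
    return {"charset": charset, "linebreaks": linebreaks}


print(characterize(b"hello\nworld\n"))
print(characterize("劉仁翔\r\n".encode("utf-8")))
```

Every ASCII stream is also valid UTF-8, so the sketch tests ASCII first and reports the narrower label when both apply.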
26
References:









TextMD Official Site
http://www.loc.gov/standards/textMD/
TextMD Element Set
http://www.loc.gov/standards/textMD/elementSet/index.html
Model of TextMD 3.0 Schema
http://www.loc.gov/standards/textMD/images/textMD-v3.0amodel.png
AV Prototype Project Working Documents
http://www.loc.gov/rr/mopic/avprot/metsmenu2.html
JHOVE: JSTOR/Harvard Object Validation Environment, 2008
http://hul.harvard.edu/jhove/
METS: An Overview & Tutorial
http://www.loc.gov/standards/mets/METSOverview.v2.html
Archival Information Package (AIP) Design Study
www.loc.gov/rr/mopic/avprot/AIP-Study_v19.pdf
METS, MODS and PREMIS, Oh My!, Thomas Habing
http://www.loc.gov/standards/mods/presentations/habing-ala07/
MODS, METS, and other metadata standards, Jerome McDonough
www.kansalliskirjasto.fi/attachments/5m4XaGYjD/5y09H7Dbx/Files/CurrentFile/Finland-7-modsmets.ppt
27
End of presentation
Comments are welcome
28