Lifecycle Metadata for Digital Objects October 2, 2006 Implementing Metadata in XML

What is the XML environment?






XML editor (XML editors can’t do anything automatic until you
load a DTD or schema; but you can edit XML in any plain-text
editor)
XML parser (at a minimum; while XML must be well-formed, it
does not have to be validated; available in some browsers)
Display program (e.g. browser)
DTD or schema to define elements
Style sheet for display of elements
XSLT engine to convert to other formats (e.g. database,
webpage)
Review of “orders” of data







First-order: language (segmentation)
Second-order: encoding
Third-order: meaning
Fourth-order: groups of 3 and/or 4
Fifth-order: function
Note that each order is “meta” with respect to the
one below and “data” with respect to the one
above (cf. Gödel)
Hence you “mark up” the order you wish to
objectivize and access (examples: TEI, EAD)
Remember, XML “does” nothing





XML is not procedural
XML structures information
XML can store information (but is not
random-access)
XML can be a package for sending
information
XML can be part of a solution for formatting
information
XML can enable action: RSS




To add an RSS feed:
First create the information you want to feed
in XML
Then get yourself harvested!
Further information and other examples of
XML in action from Libby Peterek
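As a rough sketch of the first step, the feed itself is just a small XML document. The example below builds a minimal RSS 2.0 channel with Python's standard library. The helper name `build_feed`, the feed title, the URLs, and the item are all made-up placeholders, not part of the course materials.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Return a minimal RSS 2.0 document as a string.

    `items` is a list of (title, link) pairs; every value is a placeholder.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires these three elements in every channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "Example metadata news",                 # hypothetical feed title
    "http://example.org/",                   # hypothetical site URL
    "Hypothetical feed for illustration",
    [("New finding aid posted", "http://example.org/aids/1")],
)
print(feed_xml)
```

Getting harvested is then a matter of publishing this file at a stable URL and advertising it; the XML itself still "does" nothing.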
XML in two wrapper modes

The XML document as metadata repository



XML document contains all the metadata
Objects themselves are in separate files pointed
to by the document (XLinks)
The XML document as the whole enchilada



Object is marked up in XML too
Metadata is added as additional elements to the
original object
Is this always a good idea?
Why mark up the object itself?



The object is a text
The text is well-formed as a hierarchical
structure (problem of overlaps not solved in
XML)
Advantage is that the object carries its own
metadata
Why not mark up the object?


The object is not a text!
The object is a text, but the text is too
complex to mark up in XML (hierarchical
model doesn’t suit everything; “overlap”
problem)
Best of both worlds




XML metadata tags for human-readable
packaging
XML metadata loaded into database for
random-access processing
(Text) object version marked up in XML as
relevant
Original (text) object and derivative(s) pointed
to in separate file(s) for preservation
XML Syntax rules for well-formed
XML






An element containing text or elements must have start
and end tags
An empty element’s single tag must have a slash (/)
before the end bracket
All attribute values must be in quotes
Elements may not overlap
Isolated markup characters may not appear in parsed
content
Element names may not use all characters (some are
forbidden), and case is significant
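The rules above can be tried out with any non-validating parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only (no DTD validation); the helper `is_well_formed` and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; labels are only for the printout.
samples = {
    "start and end tags": "<a><b>text</b></a>",
    "missing end tag": "<a><b>text</a>",
    "empty element with slash": "<a><b/></a>",
    "unquoted attribute value": "<a id=1/>",
    "overlapping elements": "<a><b><c></b></c></a>",
    "isolated ampersand": "<a>Q & A</a>",
    "case mismatch in tags": "<a></A>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and third samples should pass; each of the others breaks exactly one rule from the list.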
Structure of the XML Document

Document prologue



XML declaration
Document type declaration
  Points to root element
  Points to external standards (DTDs, namespaces)
  Lists special internally-defined elements and general entities
Document itself



Bracketed by root element tags at beginning and end
Contains elements, attributes, entities
Nested, hierarchical structure
XML Declaration

Gives version of XML
<?xml version="1.0"?>
Defines character encoding (optional)
<?xml version="1.0" encoding="UTF-8"?>
Indicates presence of other needed files (optional)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
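To see why the encoding declaration matters, here is a small sketch using Python's standard library parser. The same text is stored as bytes under two encodings; the declaration tells the parser how to read them. The sample element `name` and its content are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The same text stored under two different encodings, each declared.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\u00e9</name>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><name>Caf\u00e9</name>'
).encode("utf-8")

# The bytes differ, but both documents carry the same text.
print(latin1_doc != utf8_doc)
print(ET.fromstring(latin1_doc).text == ET.fromstring(utf8_doc).text)

# Without a declaration the parser assumes UTF-8, so Latin-1 bytes
# for an accented character are rejected as not well-formed.
undeclared = "<name>Caf\u00e9</name>".encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```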
Document type declaration

Points first to root element
<!DOCTYPE example>
Then points to any external source for definition of
document structure (that is, a DTD), given either as a
system identifier (SYSTEM: a local file pathname or a URL)
or as a public identifier (PUBLIC: a registered name,
followed by a system identifier as a fallback)
<!DOCTYPE example SYSTEM "c:\My Documents\classes\metadata\example.dtd"…>
Then adds any local declarations (internal subset) in
square brackets; entity and attribute declarations made
here take precedence over those in the external DTD
XML document elements


Elements don’t need to be declared in the document itself;
local declarations (internal subset) only supplement the DTD
Elements contain information (element tags
simply bracket information)
<name attribute="value">chardata</name>
Empty elements (no data is contained, begin
and end element tags are collapsed to one)
<name attribute="value" />
Attributes of XML elements



Elements don’t require attributes; some
functions can be achieved by nesting
subelements within elements
Used to provide more details about an
element; used to split off groups of elements
for particular purposes (e.g., layout, search)
<elementname attname="value">
General entities in the XML
document



External entities (e.g., imported text or other object)
must be declared in document prologue
The “entity” behaves something like a “variable”;
once defined, value can be referenced
Within the document, the entity name is used
preceded by an ampersand and followed by a semicolon:
<greeting> Dear &name;, </greeting>
When the document is displayed or used, the entity
value at the time will be substituted for the name
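A minimal sketch of entity substitution, using Python's standard library parser. The entity is declared in the internal subset of the prologue and referenced as `&name;` (note the closing semicolon). The value "Pat" is an invented placeholder.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ENTITY name "Pat">
]>
<greeting>Dear &name;,</greeting>"""

# The parser replaces the reference with the declared value.
greeting = ET.fromstring(doc)
print(greeting.text)

# The same reference with no declaration in the prologue is an error.
try:
    ET.fromstring("<greeting>Dear &name;,</greeting>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Changing the one declaration changes every place the entity is referenced, which is the sense in which it behaves like a variable.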
Miscellaneous markup



<!--Comments-->
<![CDATA[Contains#$*^%*&%otherwise
forbidden]]>
<?processinginstruction data?>
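As a short illustration of two of these constructs, the sketch below parses a document containing a comment and a CDATA section with Python's standard library. The element names `note` and `formula` are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = """<note>
  <!-- This comment is for human readers -->
  <formula><![CDATA[if a < b && b > c]]></formula>
</note>"""

note = ET.fromstring(doc)

# Inside CDATA, < and & are taken literally rather than as markup.
formula_text = note.find("formula").text
print(formula_text)

# By default this parser discards comments, so only <formula> remains.
child_count = len(note)
print(child_count)
```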
Namespaces in the XML
document



Namespaces must be declared before use:
xmlns:name="URI" then elements from
namespace can be used in document as:
<name:element>…</name:element>
Scope is the element within which
namespace is declared, plus descendant
nodes
DTDs are not namespace-aware: a DTD can only
validate prefixed names literally, as written
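A small sketch of declaration and scope, using Python's standard library parser. The URI is the Dublin Core element namespace; the `record` and `note` elements and the sample title are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

doc = f"""<record xmlns:dc="{DC}">
  <dc:title>Sample title</dc:title>
  <note>Local element, no namespace</note>
</record>"""

record = ET.fromstring(doc)

# The parser resolves the prefix: the URI, not "dc", identifies the element.
tags = [child.tag for child in record]
print(tags)

# Searching uses a prefix-to-URI mapping; the prefix chosen here is arbitrary.
title = record.find("t:title", {"t": DC}).text
print(title)

# A prefix used outside the element that declares it is an error.
out_of_scope = '<a><b xmlns:p="urn:example"/><p:c/></a>'
try:
    ET.fromstring(out_of_scope)
    out_of_scope_ok = True
except ET.ParseError:
    out_of_scope_ok = False
print(out_of_scope_ok)
```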
The DTD





Document Type Definition; not actually
expressed in XML
Provides a lexicon of allowed elements and
attributes for the XML document that refers to it
Defines a content model for each element
Like declaration of data types in a programming
language; allows you to define your own types
(a private, or SYSTEM DTD)
Or you can use a preexisting DTD (a PUBLIC
DTD, e.g. EAD, Dublin Core)
Element declarations in the DTD

Occur within the DTD or within the XML
document’s internal subset to add local
declarations (each element may be declared only once)



<!ELEMENT name content-model>
Element declarations need not be ordered
Content-models:


(#PCDATA) for character data
Element content lists subelements (element, element,
element) combined with , (ordered sequence) or | (alternatives)
and modified by ? (optional), + (required, one or more),
* (optional, zero or more)
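One way to read a content model is as a regular expression over the names of an element's children. The sketch below makes that literal in Python. It is an illustration only, not a validator: the helper names are invented, and it handles element content models only (no #PCDATA, mixed content, EMPTY, or ANY).

```python
import re


def content_model_to_regex(model):
    """Translate an element content model into a compiled regex.

    Each child name becomes the token 'name;' so that names cannot
    run together; the DTD operators | ? + * ( ) carry over unchanged.
    """
    model = re.sub(r"\s+", "", model)
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))


def matches_model(model, children):
    """Return True if the list of child names satisfies the model."""
    sequence = "".join(name + ";" for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


book = "(title, author+, date?)"
print(matches_model(book, ["title", "author", "author"]))  # date omitted
print(matches_model(book, ["title"]))                      # no author
print(matches_model(book, ["author", "title"]))            # wrong order
print(matches_model("(para | list)*", []))                 # zero repetitions
```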
Attribute declarations in the DTD


All attributes for one element declared in an
attribute list
Gives attribute name, attribute’s data type,
attribute’s behavior (a default value, #REQUIRED,
#IMPLIED, or #FIXED)

<!ATTLIST elementname
attname1 atttype1 attdesc1
attname2 atttype2 attdesc2
>
Entity declarations in the DTD

General entities are like variables. They assign a
name and define a type. Examples:






quoted text
<!ENTITY title "Temporary crazy title">
text from an external source
other data from an external local file
<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
or data from an external network source indicated as
PUBLIC (although this requires a fallback local source
as well)
Can be inserted thereafter as &title;
External parameter entities can import whole
DTDs
XML Tools for home use





In class we will be using XMetaL Author, but it’s far
from free (there is a trial download if you are a
registered Corel user).
One free XML authoring environment is Amaya from
the W3C: http://www.w3.org/Amaya/
Another is XML Cooktop:
http://www.xmlcooktop.com/
You can also validate individual XML files using
online web services pointed to at:
http://www.cogsci.ed.ac.uk/~richard/xml-check.html
To display XML, you can use IE, Mozilla, Firefox
Amaya screenshot
XML Cooktop editor screenshot
How does all this relate to
databases?





By defining a “language” for markup in XML, you
create categories
Compare to accepted method of placing text in a
relational table in order to process it
Especially useful for regularly-occurring metadata
Even freely-occurring objects can thus be found and
grouped (e.g., TEI grammatical markup)
This is why the structure of a markup scheme is so
important: you get what you pay for
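To make the comparison concrete, here is a minimal sketch that loads XML metadata into a relational table for random-access queries, using only Python's standard library. The two records, the `records`/`record` wrapper elements, and the table layout are invented for illustration; only the Dublin Core namespace URI is real.

```python
import sqlite3
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical metadata records; in practice these would come from a file.
xml_text = f"""<records xmlns:dc="{DC}">
  <record id="r1"><dc:title>Letter to the editor</dc:title><dc:date>1901</dc:date></record>
  <record id="r2"><dc:title>Field notebook</dc:title><dc:date>1899</dc:date></record>
</records>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (record_id TEXT, element TEXT, value TEXT)")

# Each marked-up element becomes one row: the tag name is the category.
for record in ET.fromstring(xml_text).findall("record"):
    for child in record:
        element = child.tag.split("}")[-1]  # drop the {namespace} part
        conn.execute(
            "INSERT INTO metadata VALUES (?, ?, ?)",
            (record.get("id"), element, child.text),
        )

# Random access: find and order records by one category of metadata.
rows = conn.execute(
    "SELECT record_id, value FROM metadata "
    "WHERE element = 'date' ORDER BY value"
).fetchall()
print(rows)
```

The query can only ask about categories the markup scheme distinguished in the first place, which is the point of the last bullet above.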
Exercise 1: Assemble tools



Find and look at XMetaL Author in lab
Go online and download the Dublin Core
DTD into the “My Assets” folder of XMetaL
Author:
http://dublincore.org/documents/2001/04/11/dcmes-xml/dcmes-xml-dtd.dtd
Open a new document in XMetaL and select
the DC DTD
Exercise 2: Mark something up
Download