Lifecycle Metadata for Digital Objects
October 2, 2006
Implementing Metadata in XML

What is the XML environment?
- XML editor (an XML editor can’t do anything automatically until you load a DTD or schema, but you can edit XML in any plain-text editor)
- XML parser (the minimum requirement; XML must be well-formed but does not have to be validated; parsers are available in some browsers)
- Display program (e.g., a browser)
- DTD or schema to define elements
- Style sheet for display of elements
- XSLT engine to convert XML to other formats (e.g., database, web page)

Review of “orders” of data
- First-order: language (segmentation)
- Second-order: encoding
- Third-order: meaning
- Fourth-order: groups of 3 and/or 4
- Fifth-order: function
- Note that each order is “meta” with respect to the one below and “data” with respect to the one above (cf. Goedel)
- Hence you “mark up” the order you wish to objectivize and access (examples: TEI, EAD)

Remember, XML “does” nothing
- XML is not procedural
- XML structures information
- XML can store information (but is not random-access)
- XML can be a package for sending information
- XML can be part of a solution for formatting information
- XML can enable action: RSS

To add an RSS feed:
- First create the information you want to feed in XML
- Then get yourself harvested!
- Further information and other examples of XML in action from Libby Peterek

XML in two wrapper modes
- The XML document as metadata repository
  - The XML document contains all the metadata
  - The objects themselves are in separate files pointed to by the document (XLinks)
- The XML document as the whole enchilada
  - The object is marked up in XML too
  - Metadata is added as additional elements to the original object
  - Is this always a good idea?

Why mark up the object itself?
- The object is a text
- The text is well-formed as a hierarchical structure (XML does not solve the problem of overlapping structures)
- Advantage: the object carries its own metadata

Why not mark up the object?
- The object is not a text!
- The object is a text, but the text is too complex to mark up in XML (the hierarchical model doesn’t suit everything; the “overlap” problem)

Best of both worlds
- XML metadata tags for human-readable packaging
- XML metadata loaded into a database for random-access processing
- (Text) object version marked up in XML as relevant
- Original (text) object and derivative(s) pointed to in separate file(s) for preservation

XML Syntax

Rules for well-formed XML
- An element containing text or elements must have start and end tags
- An empty element’s single tag must have a slash (/) before the closing bracket
- All attribute values must be in quotes
- Elements may not overlap
- Isolated markup characters may not appear in parsed content (a literal < or & must be escaped as &lt; or &amp;)
- Element names may not use all characters (some are forbidden), and case is significant

*Structure of the XML Document*
- Document prologue
  - XML declaration
  - Document type declaration
    - Points to the root element
    - Points to an external DTD, if any (namespaces are declared separately, on elements)
    - Lists special internally defined elements and general entities (the internal subset)
- Document itself
  - Bracketed by the root element tags at beginning and end
  - Contains elements, attributes, entities
  - Nested, hierarchical structure

XML Declaration
- Gives the version of XML
  <?xml version="1.0"?>
- Defines the character encoding (optional)
  <?xml version="1.0" encoding="UTF-8"?>
- Declares whether the document depends on external markup declarations, such as an external DTD (optional)
  <?xml version="1.0" encoding="UTF-8" standalone="no"?>

Document type declaration
- Points first to the root element
- Then points to any external source for the definition of document structure (a DTD; XML Schemas are referenced from the root element instead), given either as SYSTEM followed by a URI (a local file path or a network URL) or as PUBLIC followed by a public identifier and then a fallback system URI
  <!DOCTYPE example>
  <!DOCTYPE example SYSTEM "c:\My Documents\classes\metadata\example.dtd"…>
- Then adds any local element or entity declarations (the internal subset) in square brackets; local entity declarations take precedence over those in the external DTD

XML document elements
- Elements need no declaration in the document itself; the DTD (if any) declares them
- Elements contain information (element tags simply bracket information)
  <name attribute="value">chardata</name>
- Empty elements contain no data, so the start and end tags are collapsed into one
  <name attribute="value" />

Attributes of XML elements
- Elements don’t require attributes; some of the same functions can be achieved by nesting subelements within elements
- Attributes provide more details about an element, or split off groups of elements for particular purposes (e.g., layout, search)
  <elementname attname="value">

General entities in the XML document
- Entities, including external entities (e.g., imported text or another object), must be declared in the DTD or in the internal subset of the document prologue
- The “entity” behaves something like a “variable”; once defined, its value can be referenced
- Within the document, the entity name is used preceded by an ampersand and followed by a semicolon:
  <greeting>Dear &name;,</greeting>
- When the document is displayed or used, the entity’s declared value is substituted for the reference

Miscellaneous markup
- Comments: <!--Comments-->
- CDATA sections: <![CDATA[Can contain < and & and other otherwise forbidden characters]]>
- Processing instructions: <?target data?>

Namespaces in the XML document
- Namespaces must be declared before use:
  xmlns:name="URI"
- Elements from the namespace can then be used in the document as:
  <name:element>….</name:element>
- Scope is the element within which the namespace is declared, plus its descendants
- DTDs are not namespace-aware, so namespaces cannot be validated with a DTD

The DTD
- Document Type Definition; not itself expressed in XML syntax
- Provides a lexicon of allowed elements and attributes for the XML document that refers to it
- Defines a content model for each element
- Like declaring data types in a programming language; allows you to define your own types (a private, or SYSTEM, DTD)
- Or you can use a preexisting DTD (a PUBLIC DTD, e.g., EAD, Dublin Core)

Element declarations in the DTD
- Occur in the external DTD or in the document’s internal subset
  <!ELEMENT name content-model>
- Element declarations need not be ordered
- Content models:
  - (#PCDATA) for character data
  - Element lists of subelements, e.g. (element, element, element), built with , (in sequence) or | (alternatives) and modified by ? (optional), + (one or more), or * (zero or more)

Attribute declarations in the DTD
- All attributes for one element are declared in one attribute list
- Gives each attribute’s name, data type, and behavior (default)
  <!ATTLIST elementname
      attname1 atttype1 attdesc1
      attname2 atttype2 attdesc2 >

Entity declarations in the DTD
- General entities are like variables: they assign a name and define a type. Examples:
  - quoted text
    <!ENTITY title "Temporary crazy title">
  - text from an external source
  - other data from an external local file (the gif notation must itself be declared with <!NOTATION>)
    <!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
  - or data from an external network source indicated as PUBLIC (the declaration must still give a system identifier as a fallback)
- A parsed entity can thereafter be inserted as &title;
- External parameter entities can import whole DTDs

XML Tools for home use
- In class we will be using XMetaL Author, but it’s far from free (there is a trial download if you are a registered Corel user).
- One free XML authoring environment is Amaya from the W3C: http://www.w3.org/Amaya/
- Another is XML Cooktop: http://www.xmlcooktop.com/
- You can also validate individual XML files using the online web services listed at: http://www.cogsci.ed.ac.uk/~richard/xml-check.html
- To display XML, you can use IE, Mozilla, or Firefox
[Screenshot: Amaya]
[Screenshot: XML Cooktop editor]

How does all this relate to databases?
- By defining a “language” for markup in XML, you create categories
- Compare this to the accepted method of placing text in a relational table in order to process it
- Especially useful for regularly occurring metadata
- Even freely occurring objects can thus be found and grouped (e.g., TEI grammatical markup)
- This is why the structure of a markup scheme is so important: you get what you pay for

Exercise 1: Assemble tools
- Find and look at XMetaL Author in the lab
- Go online and download the Dublin Core DTD into the “My Assets” folder of XMetaL Author: http://dublincore.org/documents/2001/04/11/dcmes-xml/dcmes-xml-dtd.dtd
- Open a new document in XMetaL and select the DC DTD

Exercise 2: Mark something up
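For Exercise 2, a parser can confirm that your markup follows the rules for well-formed XML. What follows is a minimal sketch, not part of the class toolset, assuming Python 3 and its standard-library xml.etree.ElementTree module; the sample record, the entity inst, and the function check_well_formed are hypothetical illustrations. ElementTree expands entities declared in the internal subset and resolves namespace prefixes, but it checks well-formedness only: it does not validate against a DTD such as the Dublin Core DTD.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping for find(); the URI is the Dublin Core elements namespace.
DC = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical marked-up record: an XML declaration, a document type
# declaration whose internal subset declares one general entity, and a
# root element that declares and then uses the dc: namespace prefix.
GOOD = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE record [
  <!ENTITY inst "Example University Library">
]>
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Letters home, 1918</dc:title>
  <dc:publisher>&inst;</dc:publisher>
  <note type="scan" />
</record>"""

# Breaks two well-formedness rules: an unquoted attribute value
# and overlapping elements (<a> closes before <b>).
BAD = "<record><a type=scan><b>text</a></b></record>"


def check_well_formed(text):
    """Return (True, root_element) if text is well-formed XML,
    otherwise (False, the parser's error message).

    Well-formedness only; no DTD validation is performed.
    """
    try:
        return True, ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)


if __name__ == "__main__":
    ok, root = check_well_formed(GOOD)
    print(ok)                                  # True
    print(root.find("dc:publisher", DC).text)  # entity expanded: Example University Library
    print(root.find("note").attrib)            # {'type': 'scan'}

    ok, message = check_well_formed(BAD)
    print(ok)                                  # False
    print(message)                             # parser's message, with line and column
```

The parser stops at the first error it meets, so a document with several problems may need several rounds of fixing.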
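The database slide and the “best of both worlds” slide both describe loading XML metadata into a database for random-access processing. Below is a minimal sketch of that step, assuming Python 3's standard-library xml.etree.ElementTree and sqlite3 modules; the collection, its record elements, the table dc_records, and the helpers load_records and titles_by are hypothetical illustrations, not a prescribed schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping for the Dublin Core elements namespace.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical collection of Dublin Core records marked up in XML.
COLLECTION = """<collection xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record id="r1"><dc:title>Letters home, 1918</dc:title><dc:creator>Smith, A.</dc:creator></record>
  <record id="r2"><dc:title>Harbor survey map</dc:title><dc:creator>Jones, B.</dc:creator></record>
  <record id="r3"><dc:title>Diary, vol. 2</dc:title><dc:creator>Smith, A.</dc:creator></record>
</collection>"""


def load_records(xml_text, conn):
    """Copy each record's id, title, and creator into a relational table."""
    conn.execute(
        "CREATE TABLE dc_records (id TEXT PRIMARY KEY, title TEXT, creator TEXT)"
    )
    root = ET.fromstring(xml_text)
    for rec in root.iter("record"):  # <record> itself is in no namespace
        conn.execute(
            "INSERT INTO dc_records VALUES (?, ?, ?)",
            (rec.get("id"), rec.findtext("dc:title", namespaces=NS),
             rec.findtext("dc:creator", namespaces=NS)),
        )
    conn.commit()


def titles_by(conn, creator):
    """Random-access query: all titles by one creator, sorted by title."""
    rows = conn.execute(
        "SELECT title FROM dc_records WHERE creator = ? ORDER BY title", (creator,)
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_records(COLLECTION, conn)
    print(titles_by(conn, "Smith, A."))  # ['Diary, vol. 2', 'Letters home, 1918']
```

Once the records are rows in a table, a query like the one in titles_by can pick out every item by one creator without re-reading the whole XML file, while the XML copy remains the human-readable package.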