lbsc_670_class05_worksheet_092011

Class 5 – Encoding metadata in XML
Exercise overview
Last week we explored the Dublin Core and MARC metadata schemes and tried out some metadata
tools. This week we will expand on our understanding of Dublin Core and MARC as
metadata standards and begin exploring a common encoding standard known as XML. By the end of
this exercise we should be familiar with these two standards and should be ready to complete the first
part of assignment 1 (representation).
Instructions
Working in groups of 3-4, complete the worksheet. Because this worksheet involves technical
exercises, each person should complete the technical portions. As your group works through the
technical elements of the worksheet keep talking and helping each other. Wait for your group
members to catch up or help them over rough spots so that you can discuss the key questions
together. As always, appoint one person to read the text and questions on the worksheet, one
person to record the group answers on the worksheet and one person who is responsible for
reporting back. All members of the group should participate in exploration, discussion and
reading.
Technical note: In the coding instructions below, code is shown on numbered lines and is
enclosed in boxes. The line numbers are there only as a reference during instruction and
should not be entered into your program. For example, a line that reads 56. p { visibility:hidden; }
should simply be typed in as p { visibility:hidden; }
The Exchanger XML Editor
If you have not installed the Exchanger editor yet, go to
http://www.exchangerxml.com/editor/downloads.html?x=53&y=19 and install it. You are free to use
any XML editor you like as long as it supports XML validation and XSL transformations. Before we
get very far into this exercise, we need to recap our tour of the Exchanger XML editor. As a group,
let's make sure that each person can:
Organization of Information
1. Launch Exchanger
2. Create a new XML document and save it
3. Use the editor
4. Check a document's validity and well-formedness
5. View errors in the Errors pane
Review
In our lecture we discussed XML documents and their role in storing metadata for use in information
systems. In previous weeks we have focused on metadata schemas and encoding systems. Before
we proceed, let's make sure we are on the same page with these two concepts:
Metadata schema: A system that defines the intellectual structure of a document. An example of a
metadata schema is Dublin Core. Dublin Core contains elements to describe the title, creator,
publication date and other descriptive and technical information about a resource. HTML is a schema
that is a bit more complex and has a wide range of elements defined (p, a, h1, h2, div, img). The
HTML system has been defined to work with a specific type of application (a web-browser) but is
increasingly seen in a number of other areas as well.
Encoding system: An encoding system is a system whose purpose is to define how a metadata
schema will be implemented. There is a wide range of encoding systems: XML, RDF, OWL and
MARC are just a few that we will look at this semester. For now we are concerned with XML. XML is
an interoperable encoding system that can accommodate any number of different metadata schemas.
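To make the distinction concrete, the short Python sketch below takes one set of Dublin Core values (the schema part) and writes it out in two different encodings: XML elements and HTML meta tags. It is only an illustration under stated assumptions: the record values, the <record> wrapper element and the function names to_xml and to_html_meta are hypothetical and are not part of the worksheet or of any Dublin Core tool.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical record: Dublin Core property names mapped to values.
record = {"title": "Metadata 101", "creator": "Example Author"}


def to_xml(rec):
    """Encode the record as XML elements in the dc namespace."""
    lines = ['<record xmlns:dc="http://purl.org/dc/elements/1.1/">']
    for name, value in rec.items():
        lines.append(f"  <dc:{name}>{escape(value)}</dc:{name}>")
    lines.append("</record>")
    return "\n".join(lines)


def to_html_meta(rec):
    """Encode the same record as HTML meta tags."""
    return "\n".join(
        f'<meta name="DC.{name}" content={quoteattr(value)}>'
        for name, value in rec.items()
    )


print(to_xml(record))
print(to_html_meta(record))
```

The property names (title, creator) are the same in both outputs; only the way they are written down changes.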
XML Encoding models
Before we return to the Dublin Core and Qualified Dublin Core standards we want to get to know a few
key XML encoding system concepts. XML is an encoding standard that allows us to represent
metadata using a specific technical standard that works with a wide variety of applications. In this
class we will be creating XML in the Exchanger program.
Let's begin by reviewing the Document Object Model.
As we recall, the Document Object Model describes a hierarchical structure which begins with a root
element (in HTML the root element is <html>). In the example above the root element is <table>.
The XML encoding standard also uses the DOM but does not pre-define the element names or
document structure. Instead, the XML DOM simply says that

XML documents should begin with an XML declaration

All XML documents have a root element. That root element is determined by the DTD or
Schema to which the document adheres.

All XML documents must be Well Formed

All XML documents should be Validated according to a schema
Let's review a few of these concepts briefly:
XML Declaration – the first line of an XML document. It tells the application reading the document
that it is an XML document and which version of XML it uses.

DTD – Document Type Definition. This is a standardizing document that describes the
metadata schema (e.g. Dublin Core) encoded in the XML document.

Schema – XML Schemas are similar to DTDs but include advanced features that help refine a
metadata schema specification. These features include control over metadata element
content and structure definitions.

Well formed – In order to be well formed, an XML document must conform to the XML
specification. This includes adhering to the DOM.

Validated – Validating an XML document means that it is well formed, has appropriate DTD or
schema definitions included and follows those definitions.

Namespace – A namespace is a container that provides a schema context for the metadata
being represented. For example, when we define a namespace for Dublin Core in the example
below, we provide a means for the XML document to understand when an element
corresponds to the DC specification.
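As a rough illustration of what a namespace does, the Python sketch below parses a tiny document with the standard library's xml.etree.ElementTree module and prints the tag names as the parser sees them. The <record> wrapper and the sample values are made up for this illustration. The point is that the dc: prefix is replaced by the full namespace URI, which is how an application can tell a Dublin Core title apart from any other element that happens to be called title.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical snippet: one title in the dc namespace, one with no namespace.
snippet = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Metadata 101</dc:title>"
    "<title>A title with no namespace</title>"
    "</record>"
)

root = ET.fromstring(snippet)

# ElementTree writes a namespaced tag as "{namespace-uri}localname".
tags = [child.tag for child in root]
print(tags)

# Searching with the full namespace finds only the Dublin Core title.
dc_titles = root.findall(f"{{{DC_NS}}}title")
print([el.text for el in dc_titles])
```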
With these definitions in mind, let's look at a sample XML document that implements the Dublin Core
metadata schema.
1. <?xml version="1.0"?>
2. <!-- Our XML declaration tells our web browser that this is an XML document conforming to
version 1.0 -->
3. <simpledc
4. xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
5. xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/simpledc.xsd"
6. xmlns:dc="http://purl.org/dc/elements/1.1/">
7. <!-- The simpledc element is a wrapper element defined using the xsi attributes on lines 4 and 5. This
wrapper element is a placeholder so that our XML document has a root element. In this element you also
see that we define the xsi namespace to hold the root schema for the document (simpledc.xsd) and
we define a dc namespace that holds the actual Dublin Core elements -->
8. <dc:title>Organization of Information Course Example</dc:title>
9. <!-- Line 8 is a DC property -->
10. <dc:description>This is an example record</dc:description>
11. <dc:publisher>University of Maryland</dc:publisher>
12. <dc:identifier>http://erikmitchell.info/lbsc670_fall2011</dc:identifier>
13. </simpledc>
14. <!-- After we have entered all our DC properties as XML elements we close our simpledc element -->
Review the code above and answer the following questions
The XML declaration on line 1 serves what purpose?
Why do we have a <simpledc> element and why is it necessary for the document to be “well
formed?”
In which element do we define our dc: namespace?
What purpose does the DC namespace serve?
In which element would we add a namespace declaration for another metadata standard?
Working with XML
Let's take a few minutes and get familiar with creating and validating XML in Exchanger. Do this task
individually but help others in your group.
1. Begin by creating a new XML document in Exchanger
a. File >> New >> Default XML Document >> Ok
2. Replace all of the text in the editor window by typing in the code in the example above
3. Check that the document is well formed
a. XML >> Check Well Formedness. Look for errors in the Errors window.
4. Check that the document validates
a. XML >> Validate
5. Save your document!
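If you want to see the same well-formedness check outside Exchanger, the following is a minimal Python sketch using the standard library's xml.etree.ElementTree parser. The function name is_well_formed and the three sample strings are hypothetical. Note that this parser only checks well-formedness; it does not validate a document against a DTD or an XSD, so it is not a substitute for the XML >> Validate step above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


good = (
    '<?xml version="1.0"?>'
    '<simpledc xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example</dc:title>"
    "</simpledc>"
)

# dc:title is never closed, so the closing simpledc tag does not match.
mismatched = (
    '<simpledc xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example"
    "</simpledc>"
)

# Attribute values must be quoted.
unquoted = "<simpledc xmlns:dc=http://purl.org/dc/elements/1.1/></simpledc>"

for label, text in [("good", good), ("mismatched", mismatched), ("unquoted", unquoted)]:
    ok, message = is_well_formed(text)
    print(label, ok, message)
```

The error messages play the same role as the entries in Exchanger's Errors window: they point to the place where the parser gave up.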
Step 1:
Metadata schemas
Metadata schemas are an important part of this class. In the coming exercises we will be spending
more time exploring metadata schemas in more detail. For now, let's understand the elements of the
Dublin Core metadata schema.
Dublin Core comprises properties, classes, vocabulary encoding schemes and syntax encoding schemes. Let's
review these briefly.
Definitions
Properties: In the Dublin Core world, a property is a specific field (e.g. title, creator, date) to which a
value is assigned (e.g. <dc:title>Metadata 101</dc:title>). The term “property” is equivalent to the
term “element” in the XML realm.
Classes: In Dublin Core, a Class is a refining attribute for a property. It tends to provide additional
context to the value assigned to a property (e.g. <dc:type
xsi:type="dcterms:DCMIType">StillImage</dc:type>). Dublin Core classes are comparable to
HTML attributes.
Let's briefly decompose that statement:
<Namespace:Property attributetype="namespace:Value"> </Namespace:Property>
<dc:type xsi:type="dcterms:DCMIType">StillImage</dc:type>
Vocabulary Encoding Schemes: In Dublin Core, a Vocabulary Encoding Scheme is a reference to
a defined vocabulary from which an assigned value is taken (e.g. <dc:subject
xsi:type="dcterms:LCSH">United States – History</dc:subject>).
Syntax Encoding Schemes: In Dublin Core, a Syntax Encoding Scheme is a reference to rules that
define the structure of a value (e.g. <dc:language xsi:type="dcterms:ISO639-2">eng</dc:language>).
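The pieces named in these definitions can be pulled apart programmatically. The Python sketch below, again using the standard library's xml.etree.ElementTree, reads one qualified element and separates the property, the encoding scheme and the value. The <record> wrapper and the variable names are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# The xsi:type attribute, written with its full namespace URI.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical wrapper element around a single qualified dc:subject.
snippet = (
    '<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:dcterms="http://purl.org/dc/terms/">'
    '<dc:subject xsi:type="dcterms:LCSH">United States -- History</dc:subject>'
    "</record>"
)

root = ET.fromstring(snippet)
subject = root[0]

prop = subject.tag.split("}")[1]   # local name of the property
scheme = subject.get(XSI_TYPE)     # encoding scheme, kept as written
value = subject.text               # the assigned value

print("property:", prop)
print("scheme:  ", scheme)
print("value:   ", value)
```

The parser expands the prefix in the element name but leaves the attribute value dcterms:LCSH as plain text, which is one reason the dcterms namespace still has to be declared in the document.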
With these definitions in mind, let's look at the Qualified Dublin Core standard in more depth. Go to a
web browser and open the page http://dublincore.org. Click on Specifications and then on the Dublin
Core Metadata Registry (shortcut is dublincore.org/dcregistry). This page is a good place to start to
get an idea of the fields that a Dublin Core record can contain. Click on the “Browse | Search” link in
the bottom paragraph and select “Summary of all terms” in the dropdown box before clicking browse.
On the page you will see four tables of data – Properties, Classes, Vocabulary Encoding Schemes
and Syntax Encoding Schemes.
Let's use the DC registry and the following example to become familiar with properties, values,
vocabulary encoding schemes and syntax encoding schemes.
16. <?xml version="1.0"?>
17. <qualifieddc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
18. xsi:noNamespaceSchemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"
19. xmlns:dc="http://purl.org/dc/elements/1.1/"
20. xmlns:dcterms="http://purl.org/dc/terms/">
21. <dc:title>Organization of Information Course Example</dc:title>
22. <dc:subject xsi:type="dcterms:DDC">062</dc:subject>
23. <dc:subject xsi:type="dcterms:UDC">061(410)</dc:subject>
24. <dc:description>This is an example record</dc:description>
25. <dc:description xml:lang="fr">Cette classe est magnifique!</dc:description>
26. <dc:publisher>University of Maryland</dc:publisher>
27. <dc:identifier xsi:type="dcterms:URI">http://erikmitchell.info/lbsc670_fall2011</dc:identifier>
28. <dcterms:isPartOf xsi:type="dcterms:URI">http://erikmitchell.info</dcterms:isPartOf>
29. </qualifieddc>
Questions
Which XML element uses a dcterms-namespaced property in this example? What is the
definition of that term?
Find an example of a syntax encoding scheme for language. Which field uses it?
The root element changed for this example. Which line of code contains the changed pointer to
the XSD file required for this?
Practice
Let's get more practice with Exchanger. Open a new file and, using the XML file in example 2 and the
Dublin Core registry for help, create a Qualified Dublin Core record for the following resource in
Exchanger. Make sure the document is both well formed and valid: http://nyti.ms/oRLwU8
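For comparison with typing the record by hand, here is a hedged Python sketch that builds a small Qualified Dublin Core record with the standard library's xml.etree.ElementTree and prints it. It is not a solution to the practice task: every value is a placeholder, and the helper name qname is made up for this illustration. The output would still need to be checked for validity in Exchanger.

```python
import xml.etree.ElementTree as ET

NS = {
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# Ask ElementTree to use the familiar prefixes when it writes the XML.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def qname(prefix, local):
    """Hypothetical helper: build ElementTree's {uri}local form of a name."""
    return f"{{{NS[prefix]}}}{local}"


root = ET.Element("qualifieddc")
root.set(
    qname("xsi", "noNamespaceSchemaLocation"),
    "http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd",
)

title = ET.SubElement(root, qname("dc", "title"))
title.text = "Placeholder title"

identifier = ET.SubElement(root, qname("dc", "identifier"))
identifier.set(qname("xsi", "type"), "dcterms:URI")
identifier.text = "http://example.org/placeholder"

# ElementTree only declares namespaces that appear in element or attribute
# names. This dcterms element is what causes xmlns:dcterms to be declared,
# which the "dcterms:URI" attribute values above also rely on.
part_of = ET.SubElement(root, qname("dcterms", "isPartOf"))
part_of.set(qname("xsi", "type"), "dcterms:URI")
part_of.text = "http://example.org"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Building the record this way guarantees well-formedness, because the library writes the tags, but it says nothing about whether the record follows the Qualified Dublin Core schema.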
Key Questions
What technical challenges did your group run into? Were you able to overcome them?
While we have been working with well-formed and validated documents, we have not discussed
why these are important. Brainstorm three reasons why validation and well-formedness are
important.
Summary
This week we explored Dublin Core and XML. We learned that representing information objects
in XML allows us to leverage IT systems so that information representations can provide services such
as searching for summaries and lists of things, providing user-specific views of information and
storing information for web-scale databases. Next week we will learn more about MARC and about
RSS, a metadata scheme that employs an “application profile” approach to provide syndicated feeds.
RSS is a very powerful platform and we will explore this standard in more detail in the coming weeks.
At the end of each class we will take a moment to reflect on what we learned and what we are still
unclear on. Take a minute to discuss this as a group and then each of you should visit
http://bit.ly/lbsc670_questions. The one-minute survey is completely anonymous. At the beginning of
each class this semester we will review unanswered questions.