assignment (English)

Workshop Interchange Languages
Introduction
This workshop consists of two types of exercises:
1. Exercises on paper. There is space to answer these exercises on this paper (in the bordered area).
2. Electronic exercises.
This assignment is meant to acquaint you with XML and with the software used in the other
practical assignments.
The files that are required for this workshop can be found on the course website at:
http://www.cs.uu.nl/docs/vakken/uwt/homologatie/week1/workshop.zip
0. Walkthrough of XML Spy
This section is meant to familiarize you with the editor XML Spy. We will use the document
collection.xml, a short anthology of Dutch poetry. You may need the following buttons from the
toolbar:
Open file
Check wellformedness (F7): yellow check mark
Validate file (F8): green check mark
XSL Transformation (F10)
Enhanced Grid view
Text view
Browser view
Do the following:
1. Start XML Spy
2. Open document collection.xml
3. View the document in each of the three different views: Enhanced Grid view, Text view and
Browser view. In general, working in Text View is most agreeable.
4. Check the well-formedness of the document (not possible in Browser view)
5. Check the validity of the document (not possible in Browser view)
6. Transform the document (the transformation sheet used is poems.xsl, which is referenced in the
second line of the document). Is there a difference compared to the Browser view?
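A reference to a transformation sheet, as mentioned in step 6, is normally written as an xml-stylesheet processing instruction directly after the XML declaration. The sketch below shows the general shape only; the root element name `anthology` is a made-up placeholder, and the real collection.xml may contain additional lines.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="poems.xsl"?>
<!-- "anthology" is a hypothetical root element, used for illustration only -->
<anthology>
  <!-- the poems would go here -->
</anthology>
```

The href value is resolved relative to the location of the XML document, so this sketch assumes poems.xsl sits in the same folder.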
NB. If you find the font hard to read, you can change the font settings. Choose
Tools > Options > Text fonts
and change the settings to suit your taste.
Workshop Interchange Languages 2003
1. Well-formed XML
The document wellformed.xml contains eight errors, so it is not well-formed. Each error is wrapped
in its own <fout> element (fout is Dutch for "error"). Open the document in XML Spy, check for
well-formedness and correct the errors one at a time. Mark the changes in the document below:
<?xml version="1.0" ?>
<doc>
<fout n="1">
<p>This is a paragraph<p>
</fout>
<fout n="2">
</p>This is a paragraph</p>
</fout>
<fout n="3">
<p>This is a paragraph</para>
</fout>
<fout n="4">
<p>This is a paragraph with an <pb>empty tag</p>
</fout>
<fout n="5">
<chapter n=1>chapter one begins here...</chapter>
</fout>
<fout n="6">
<chapter="1">chapter one begins here...</chapter>
</fout>
<fout n="7">
<chapter>...chapter two ends here</chapter n='2'>
</fout>
<fout n="8">
<p><b>this is bold </b><i>this is italic
<b>this is both</i></b></p>
</fout>
</doc>
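For comparison, here is a small document that is well-formed. It is an illustrative sketch with made-up element names (note, to, br) and is not one of the workshop files. It shows matching start and end tags, a quoted attribute value, an empty-element tag, and correctly nested inline elements.

```xml
<?xml version="1.0"?>
<!-- attribute values are always quoted -->
<note n="1">
  <!-- every start tag has a matching end tag with the same name -->
  <to>Reader</to>
  <!-- an element without content is closed with a trailing slash -->
  <br/>
  <!-- elements are closed in the reverse order in which they were opened -->
  <p><b>bold <i>bold and italic</i></b></p>
</note>
```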
2. Valid XML
2.1 From well-formed to valid XML
The document valid.xml is well-formed but not valid.
1. Open the document in XML Spy and check to see if it is well-formed.
2. Assign the DTD simplepoem.dtd to the document. NB: the Doctype Declaration must be placed
before the <poem> start tag.
3. Validate the document and edit the mark-up if necessary.
4. Mark all the changes in the bordered area on the next page.
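As a sketch of step 2, a Doctype Declaration that points to an external DTD generally looks like the second line below. The sketch assumes that simplepoem.dtd is stored in the same folder as the document; the poem content is left out.

```xml
<?xml version="1.0"?>
<!-- the name after DOCTYPE must be the name of the root element -->
<!DOCTYPE poem SYSTEM "simplepoem.dtd">
<poem>
  <!-- contents of the poem go here -->
</poem>
```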
Do you want to know more about the DTD? You can open and examine it using XML Spy. Below is a
simplified diagram of the DTD:
Elements in closed rectangles are required elements; elements within striped rectangles are not. A
<linegroup> consists of one or more <linegroup>s and/or <line>s.
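The rule for <linegroup> can be written as a DTD element declaration. The sketch below is an assumption about how such a rule might look, not a copy of simplepoem.dtd; in particular, it assumes that a <line> holds plain text only. The declarations are placed inside the document itself so that the example is complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE linegroup [
  <!-- one or more <linegroup> and/or <line> children, in any order -->
  <!ELEMENT linegroup (linegroup | line)+>
  <!-- assumption: a line contains text only -->
  <!ELEMENT line (#PCDATA)>
]>
<linegroup>
  <line>first line</line>
  <linegroup>
    <line>line inside a nested group</line>
  </linegroup>
</linegroup>
```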
<?xml version="1.0"?>
<poem>
<title>Visser van Ma Yuan</title>
<linegroup>
<line>onder wolken vogels varen</line>
<line>onder golven vliegen vissen</line>
<line>maar daartussen rust de visser</line>
</linegroup>
<line>golven worden hoge wolken</line>
<line>wolken worden hoge golven</line>
<regel>maar intussen rust de visser</regel>
</poem>
2.2 Enter a poem
Get a short poem from the materials ZIP file. Currently, there is only one, called gedicht.pdf (in
Dutch); an English example will be added later. Enter the poem in XML Spy and add mark-up
following the model of the (corrected) poem from exercise 2.1.
Hints:
1. First construct a valid structure for (a part of) the poem
2. Add text to the structure
3. Add the remaining stanzas and lines
4. Validate regularly while doing so
3. XSLT
3.1 XSLT given
You can format the poem you just entered using the transformation sheet poems.xsl which we used
earlier. Choose Assign XSL from the XSL menu.
3.2 Making changes
For this assignment we will use the document visser.xml and the poem-wild.xsl transformation sheet.
First examine the latter. It consists largely of a series of templates, almost all of which have the same
construction. Here is one:
1  <xsl:template match="author">
2    <h1>
3      <xsl:apply-templates/>
4    </h1>
5  </xsl:template>
What can we say about this?
• An XSLT sheet is itself an XML file.
• In the first line, the match attribute indicates which elements the template applies to. In this
case the template applies to the <author> element.
• Lines 2 and 4 indicate that an <h1> element will be created.
• Line 3 indicates that the contents of the <author> element must be processed further by the
templates. In this case the effect is that the text from the <author> element is placed
between <h1> tags.
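A template only works as part of a complete transformation sheet. The sketch below wraps the template discussed above in a minimal XSLT 1.0 sheet; it is an illustration, and the real poem-wild.xsl contains more templates and may be organized differently.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- places the contents of every <author> element between <h1> tags -->
  <xsl:template match="author">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
</xsl:stylesheet>
```

Elements for which the sheet has no template are handled by XSLT's built-in rules, which copy their text to the output without adding any tags.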
Adapt the given style sheet in four steps
1. edit the template for ‘author’ so that the ‘author’ name will appear between <h2> tags
2. edit the template for ‘title’ so that the title will appear in italics
3. add a new template for the element <animal>: make the contents appear in bold font
4. add a new template for the element <persoon> and add special mark-up
Extra exercise, optional
Assign a special mark-up to the first line of each stanza. Use a template with the line below as its first
line:
<xsl:template match="line[@n='1']">
This template matches a <line> element whose attribute n has the value "1".
4. Entities
Entities are chunks of data that can be inserted into an XML document by name, like a constant.
During the course you will learn a number of applications of entities. In this workshop you will learn
two:
• inserting an image
• inserting special characters
A few entities are predefined, but usually entities must be declared as part of the Doctype
Declaration. How to do this is explained below.
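To illustrate the idea of an entity as a named constant, the sketch below declares a text entity and uses it once. The names `note` and `course` are invented for this example and do not occur in the workshop files.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- hypothetical entity: the name "course" stands for the quoted text -->
  <!ENTITY course "Workshop Interchange Languages">
]>
<note>This handout belongs to &course;.</note>
```

A parser replaces &course; with the declared text. The document is well-formed, but it is not valid because the sketch declares no elements.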
4.1 Inserting an image
This is an example of an extended Doctype Declaration. The added declarations are placed between
square brackets [ ]:
<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd" [
<!NOTATION jpeg SYSTEM "jpegplaatje">
<!NOTATION gif SYSTEM "gifplaatje">
<!ENTITY plaatje1 SYSTEM "y1.jpg" NDATA jpeg>
<!ENTITY plaatje2 SYSTEM "y2.jpg" NDATA jpeg>
<!ENTITY plaatje3 SYSTEM "y3.gif" NDATA gif>
]>
This may look forbidding, but you will soon understand the system behind it. Let us examine this line:
<!ENTITY plaatje2 SYSTEM "y2.jpg" NDATA jpeg>
We find:
<!ENTITY    Required, not important
plaatje2    The name we will use in the XML file for the external file
SYSTEM      Required, not important
"y2.jpg"    The name of the external file as it is called on the disk, including the file name
            extension ('.jpg').
NDATA       Required, not important
jpeg        The type of file. There must be a 'notation' line in the Doctype Declaration or in the
            DTD which corresponds to this.
>           Required, not important
So, for each external file we must know or assign (1) the name we want to use for the file within our
document, (2) the 'real' file name and (3) the type of file.
Now add an illustration to your poem. Hints:
• a number of GIF images are available
• the 'notation' is declared in the DTD
• first add an 'entity' declaration to the document
• then add an empty <image/> element: find out from the DTD diagram where this may be done
• add to the <image/> element an attribute named entityname, with the internal name of your
entity as its value
• check whether the document is valid
• check whether you can see the image when poems.xsl is applied to the document
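The hints above can be sketched as follows. The entity name `picture1` and the file name `fish.gif` are made up for this example, and the sketch assumes that the poem uses simplepoem.dtd and that this DTD declares the gif notation. Where the <image/> element is actually allowed must still be looked up in the DTD diagram.

```xml
<?xml version="1.0"?>
<!DOCTYPE poem SYSTEM "simplepoem.dtd" [
  <!-- "picture1" and "fish.gif" are hypothetical names -->
  <!ENTITY picture1 SYSTEM "fish.gif" NDATA gif>
]>
<poem>
  <!-- other elements omitted; the position of <image/> depends on the DTD -->
  <image entityname="picture1"/>
</poem>
```

Note that the attribute value is the bare entity name, without & and ;.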
4.2 Special characters
Entities can also be used to insert a special character. To do so we place an “entity reference” for this
character in the document. Such a reference consists of
• the & character
• the name of the entity
• the ; character
An example is &eacute; : the entity reference for the character é.
Because of their special meaning in XML, we cannot simply use the characters &, < and > in an XML
document. Instead we must use the following pre-defined entity references for those characters
whenever we want to add them to a text:
&amp;   &   (ampersand)
&lt;    <   (less than)
&gt;    >   (greater than)
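As a brief illustration of these three references, the sketch below stores a sentence that mentions tags and an ampersand. The element name `remark` is invented for this example.

```xml
<?xml version="1.0"?>
<!-- "remark" is a hypothetical element name -->
<remark>Use &lt;p&gt; for paragraphs &amp; &lt;b&gt; for bold text.</remark>
```

When the document is displayed, the references are replaced by the characters themselves, so the reader sees: Use <p> for paragraphs & <b> for bold text.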
Exercise: Open the document code.xml. In this document the <code> element is still empty. The goal
of the exercise is to display an example of well-formed XML code in Browser View. Here is a simple
example:
(see next page)
Now make your own example by adding content to the <code> element. It should contain the three
characters mentioned above.