Workshop Interchange Languages

Introduction

This workshop consists of two types of exercises:

1. Exercises on paper. There is space to answer these exercises on this paper (in the bordered areas).
2. Electronic exercises.

This assignment is meant to acquaint you with XML and the software used in the other practical assignments. The files required for this workshop can be found on the course website at:

http://www.cs.uu.nl/docs/vakken/uwt/homologatie/week1/workshop.zip

0. Walkthrough of XML Spy

This section is meant to familiarize you with the editor XML Spy. We will use the document collection.xml, a short anthology of Dutch poetry. You may need the following buttons from the tool bar:

- Open file
- Check well-formedness (F7): yellow check mark
- Validate file (F8): green check mark
- XSL Transformation (F10)
- Enhanced Grid view
- Text view
- Browser view

Do the following:

1. Start XML Spy.
2. Open the document collection.xml.
3. View the document in each of the three views: Enhanced Grid view, Text view and Browser view. In general, working in Text view is most agreeable.
4. Check the well-formedness of the document (not possible in Browser view).
5. Check the validity of the document (not possible in Browser view).
6. Transform the document (the transformation sheet used is poems.xsl, which is referenced in the second line of the document). Is there a difference compared to the Browser view?

NB. If you find the font hard to read, you can change the font settings. Choose Tools > Options > Text fonts and change the settings to suit your taste.

Workshop Interchange Languages 2003 p. 2

1. Well-formed XML

The document wellformed.xml contains eight errors ("fouten" in Dutch, hence the <fout> elements), so it is not well-formed. Open the document in XML Spy, check for well-formedness and correct the errors one at a time.
Mark the changes in the document below:

    <?xml version="1.0" ?>
    <doc>
      <fout n="1">
        <p>This is a paragraph<p>
      </fout>
      <fout n="2">
        </p>This is a paragraph</p>
      </fout>
      <fout n="3">
        <p>This is a paragraph</para>
      </fout>
      <fout n="4">
        <p>This is a paragraph with an <pb>empty tag</p>
      </fout>
      <fout n="5">
        <chapter n=1>chapter one begins here...</chapter>
      </fout>
      <fout n="6">
        <chapter="1">chapter one begins here...</chapter>
      </fout>
      <fout n="7">
        <chapter>...chapter two ends here</chapter n='2'>
      </fout>
      <fout n="8">
        <p><b>this is bold </b><i>this is italic <b>this is both</i></b></p>
      </fout>
    </doc>

2. Valid XML

2.1 From well-formed to valid XML

The document valid.xml is well-formed but not valid.

1. Open the document in XML Spy and check whether it is well-formed.
2. Assign the DTD simplepoem.dtd to the document. NB: the so-called Doctype Declaration has to be inserted before the <poem> tag.
3. Validate the document and edit the mark-up if necessary.
4. Mark all the changes in the bordered area below.

Do you want to know more about the DTD? You can open and examine it using XML Spy. Below is a simplified diagram of the DTD:

Elements in closed rectangles are required; elements in striped rectangles are optional. A <linegroup> consists of one or more <linegroup>s and/or <line>s.

    <?xml version="1.0"?>
    <poem>
      <title>Visser van Ma Yuan</title>
      <linegroup>
        <line>onder wolken vogels varen</line>
        <line>onder golven vliegen vissen</line>
        <line>maar daartussen rust de visser</line>
      </linegroup>
      <line>golven worden hoge wolken</line>
      <line>wolken worden hoge golven</line>
      <regel>maar intussen rust de visser</regel>
    </poem>

2.2 Enter a poem

Get a short poem from the materials ZIP file. Currently there is only one, called gedicht.pdf (in Dutch); an English example will be added later. Enter the poem in XML Spy and add mark-up following the model of the (corrected) poem from exercise 2.1.

Hints:

1. First construct a valid structure for (a part of) the poem
2. Add text to the structure
3. Add the remaining stanzas and lines
4. Validate regularly while doing so

3. XSLT

3.1 XSLT given

You can format the poem you just entered using the transformation sheet poems.xsl, which we used earlier. Choose Assign XSL from the XSL menu.

3.2 Making changes

For this assignment we will use the document visser.xml and the transformation sheet poem-wild.xsl. First examine the latter. It consists largely of a series of templates, almost all of which have the same construction. Here is one:

    1  <xsl:template match="author">
    2    <h1>
    3      <xsl:apply-templates/>
    4    </h1>
    5  </xsl:template>

What can we say about this?

- An XSLT sheet is itself an XML file.
- In line 1, the match attribute indicates to which elements the template applies. In this case the template applies to the <author> element.
- Lines 2 and 4 indicate that an <h1> element will be created.
- Line 3 indicates that the contents of the <author> element must be processed further by the templates.
- In this case the effect is that the text from the <author> element is placed between <h1> tags.

Adapt the given style sheet in four steps:

1. Edit the template for 'author' so that the author name appears between <h2> tags.
2. Edit the template for 'title' so that the title appears in italics.
3. Add a new template for the element <animal>: make the contents appear in bold.
4. Add a new template for the element <persoon> and add special mark-up.

Extra exercise, optional

Assign special mark-up to the first line of each stanza. Use a template with the line below as its first line:

    <xsl:template match="line[@n='1']">

This template matches a <line> element whose attribute n has the value "1".

4. Entities

Entities are chunks of data that can be added to an XML document by name, like a constant. During the course you will learn a number of applications of entities.
In this workshop you will learn two:

- inserting an image
- inserting special characters

A number of entities are pre-defined, but usually they must be declared as part of the Doctype Declaration. How to do this is explained below.

4.1 Inserting an image

This is an example of an extended Doctype Declaration. The added declarations are placed between square brackets [ ]:

    <!DOCTYPE TEI.2 SYSTEM "teixlite.dtd" [
      <!NOTATION jpeg SYSTEM "jpegplaatje">
      <!NOTATION gif SYSTEM "gifplaatje">
      <!ENTITY plaatje1 SYSTEM "y1.jpg" NDATA jpeg>
      <!ENTITY plaatje2 SYSTEM "y2.jpg" NDATA jpeg>
      <!ENTITY plaatje3 SYSTEM "y3.gif" NDATA gif>
    ]>

This may look forbidding, but you will soon understand the system behind it. Let us examine this line:

    <!ENTITY plaatje2 SYSTEM "y2.jpg" NDATA jpeg>

We find:

    <!ENTITY    Required, not important
    plaatje2    The name we will use in the XML file for the external file
    SYSTEM      Required, not important
    "y2.jpg"    The name of the external file as it is called on disk, including the file name extension ('.jpg')
    NDATA       Required, not important
    jpeg        The type of file. There must be a corresponding 'notation' line in the Doctype Declaration or in the DTD.
    >           Required, not important

So, for each external file we must know or assign (1) the name we want to use for the file within our document, (2) the 'real' file name and (3) the type of file.

Now add an illustration to your poem. Hints:

- a number of GIF images are available
- the 'notation' is declared in the DTD
- first add an 'entity' declaration to the document
- then add an empty <image/> element: find out from the DTD diagram where this may be done
- add to the <image/> element an attribute named entityname whose value is the internal name of your entity
- check whether the document is valid
- check whether you can see the image when poems.xsl is applied to the document

4.2 Special characters

Entities can also be used to insert a special character.
To do so we place an "entity reference" for this character in the document. Such a reference consists of:

- the & character
- the name of the entity
- the ; character

An example is &eacute;, the entity reference for the character é. Because of their special meaning in XML, we cannot simply use the characters &, < and > in an XML document. Instead we must use the following pre-defined entity references for those characters whenever we want to add them to a text:

    &    &amp;    (ampersand)
    <    &lt;     (less than)
    >    &gt;     (greater than)

Exercise: Open the document code.xml. In this document the <code> element is still empty. The goal of the exercise is to display an example of well-formed XML code in Browser View. Here is a simple example: (see next page)

Now make your own example by adding content to the <code> element. It should contain the three characters mentioned above.
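
To illustrate the pre-defined entity references, here is a minimal sketch of what the content of a <code> element might look like. The <example> root element and the displayed snippet are assumptions made for illustration only; the actual structure of code.xml may differ, and this is not the example from the original handout.

```xml
<?xml version="1.0"?>
<!-- Hypothetical wrapper: the real code.xml may use a different root element. -->
<example>
  <!-- Every markup character of the displayed snippet is written as an entity
       reference, so the parser treats it as text rather than as markup. -->
  <code>&lt;title&gt;Fish &amp;amp; Chips&lt;/title&gt;</code>
</example>
```

After parsing, the text content of <code> is <title>Fish &amp; Chips</title>, which is itself a well-formed piece of XML and contains all three special characters. Note that the ampersand is escaped twice (&amp;amp;) because the displayed snippet must itself show an entity reference.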