03009323 - DE LAZZARI Thomas
C032036 Internet Mark-up Languages – Coursework
Session 2003-2004

Introduction

This coursework was done for the CNDS module Internet Mark-up Languages (C032036) at Napier University, Edinburgh. The files to be processed are marked up in XML. A DTD (Document Type Definition) sets out the rules that a valid XML document must follow. XSL is a family of W3C recommendations for defining the transformation and presentation of XML documents.

In the first part, a DTD is constructed to validate a sample XML file. In the second part, an XSL stylesheet extracts specific data from a file, selecting the information about a given country. Finally, a conclusion guides you through my approach to the different problems and lists the tools and documentation used. All the source code is commented so that the goal of each step is easier to understand.

Task 1 – DTD

Here is the DTD built for the SQL tips. The tip file used (tip194061) is available under the resources for module co32036 on http://www.soc.napier.ac.uk. The comments explain how I achieved this task and, in particular, what each declaration refers to.

TIP.DTD

<!-- tip.dtd: used to validate the tip file tip194061.html -->

<!-- Enhancement of XHTML: link contains plain text. This declaration
     comes before the XHTML DTD is read, and the first declaration of a
     parameter entity is the binding one, so it overrides XHTML's own. -->
<!ENTITY % link.content "(#PCDATA)">

<!-- Call to the XHTML entity, which defines the link element and allows
     XHTML markup -->
<!ENTITY % xhtml SYSTEM "./xhtml11-flat.dtd">
%xhtml;

<!-- tip is the top node: it contains one link, then any number of add
     elements, then any number of sql elements -->
<!ELEMENT tip (link, add*, sql*)>

<!-- Standard XHTML markup is allowed through the Flow.mix entity, which
     contains %Block.class, %Inline.class ...
-->
<!ELEMENT add (#PCDATA|%Flow.mix;)*>
<!ELEMENT sql (#PCDATA)>

<!-- tip has a unique identifier -->
<!ATTLIST tip id ID #REQUIRED>

<!-- Engine-specific variation: the engine attribute is optional -->
<!ATTLIST add engine (access | db2 | mysql | oracle | postgres |
                      sqlserver) #IMPLIED>
<!ATTLIST sql engine (access | db2 | mysql | oracle | postgres |
                      sqlserver) #IMPLIED>

When rxp validates the tip file against tip.dtd, it reports warnings caused by xhtml11-flat.dtd, but no errors, for example:

Warning: Ignoring redefinition of parameter entity head.qname in entity “xhtml” at line 4445 char 32 of file xhtml11-flat.dtd.

Task 2 – XSL

Here are the five hardest problems that I solved: questions 10 to 14. The input files are at http://sqlzoo2.napier.ac.uk/~andrew/cia/.

WORK.XSL

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:htm="http://www.w3.org/1999/xhtml">

  <!-- TOP LEVEL TEMPLATE -->
  <xsl:template match="/">
    <cia><xsl:apply-templates /></cia>
  </xsl:template>

  <!-- QUESTION 10 -->
  <xsl:template match="htm:tr[contains(.,'Highways:')]">
    <xsl:variable name="highways">
      <xsl:value-of select="substring-before(substring-after(.,'paved:'),' km')" />
    </xsl:variable>
    <!-- There was a problem with the decimal format of the number.
         In order to remove the grouping separator (","), I used translate() -->
    <highway>
      <xsl:value-of select="translate($highways,',','') div 1.6" />
    </highway>
  </xsl:template>

  <!-- QUESTIONS 11 and 13 -->
  <xsl:template match="htm:tr[contains(.,'Diplomatic representation in the US:')]">
    <xsl:variable name="fax">
      <xsl:value-of select="substring-after(.,'FAX:')" />
    </xsl:variable>
    <!-- substring-before() cannot be used because, for some countries, the
         words after the fax number are different. A fixed length is taken
         instead: XPath positions start at 1, so substring($fax,0,20)
         returns the first 19 characters -->
    <fax><xsl:value-of select="substring($fax,0,20)" /></fax>

    <xsl:variable name="num">
      <!-- In order to match the right words, I had to remove the blank
           spaces and special characters such as &#xa; (line feed) -->
      <xsl:if test="contains(translate(.,' ',''),'SanFrancisco')">1</xsl:if>
      <xsl:if test="contains(translate(.,'&#xa;',''),'LosAngeles')">1</xsl:if>
    </xsl:variable>
    <xsl:choose>
      <!-- Two west coast consulates give $num = '11', so it must be replaced by 2 -->
      <xsl:when test="$num = 11">
        <west-coast><xsl:attribute name="count">2</xsl:attribute></west-coast>
      </xsl:when>
      <xsl:otherwise>
        <!-- With no consulate, $num is the empty string -->
        <west-coast><xsl:attribute name="count">
          <xsl:value-of select="$num" /></xsl:attribute></west-coast>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- QUESTION 12 -->
  <xsl:template match="htm:tr[contains(.,'Airports - with paved runways:')]">
    <airport>
      <!-- xsl:attribute is used to add attributes to airport -->
      <xsl:attribute name="large">
        <xsl:value-of select="substring-before(substring-after(.,'over 3,047 m:'),'2,438 to 3,047 m:')" />
      </xsl:attribute>
      <xsl:attribute name="medium">
        <xsl:value-of select="substring-before(substring-after(.,'2,438 to 3,047 m:'),'914 to 1,523 m:')" />
      </xsl:attribute>
      <xsl:attribute name="small">
        <xsl:value-of select="substring-before(substring-after(.,'914 to 1,523 m:'),'under 914
m:')" />
      </xsl:attribute>
      <xsl:attribute name="tiny">
        <xsl:value-of select="substring-before(substring-after(.,'under 914 m:'),' (2002)')" />
      </xsl:attribute>
    </airport>
  </xsl:template>

  <!-- QUESTION 14 -->
  <xsl:template match="htm:tr[contains(.,'Exports:')]">
    <xsl:variable name="exports">
      <xsl:value-of select="substring-before(substring-after(.,'$'),' ')" />
    </xsl:variable>
    <xsl:variable name="unit">
      <!-- For some countries, the unit is million rather than billion,
           so it is captured in the variable $unit -->
      <xsl:value-of select="substring-before(substring-after(.,$exports),'f.o.b')" />
    </xsl:variable>
    <xsl:variable name="percentage">
      <!-- The following-sibling axis selects the second row after this one,
           which contains the percentage -->
      <xsl:value-of select="substring-before(substring-after(following-sibling::htm:tr[position()=2],'US'),'%')" />
    </xsl:variable>
    <export>
      <!-- floor() rounds the result down; xsl:text keeps the layout
           whitespace out of the output -->
      <xsl:text>$</xsl:text>
      <xsl:value-of select="floor(($exports*$percentage)*0.01)" />
      <xsl:value-of select="$unit" />
    </export>
  </xsl:template>

  <!-- Empty template: text nodes not handled above are not copied
       to the output -->
  <xsl:template match="text()" />

</xsl:stylesheet>

OUTPUT.XML

<?xml version="1.0" encoding="UTF-16"?>
<cia xmlns:htm="http://www.w3.org/1999/xhtml">
  <fax>[1] (202) 944-6166</fax>
  <west-coast count="2" />
  <export>$26 billion</export>
  <!-- 1 mile = 1.6 km -->
  <highway>558062.5</highway>
  <airport large="13" medium="28" small="80" tiny="57" />
</cia>

This is the output of the transformation of the file fr.html.

Task 3 – Conclusion

Difficulties and approach

TIP.DTD: To test that the <add> element may contain standard XHTML markup, I replaced the <p> tag in the first <add> element with other tags: <i>, <div>, <b>, and so on. I found the solution to this problem, %Flow.mix, in the XHTML recommendation. I did not know that the <link> element was already defined in xhtml11-flat.dtd.
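For illustration, a test instance of this kind could look like the minimal sketch below. It is hypothetical: the id value, the link text, the engine values and the SQL statement are placeholders, not the content of tip194061. It also assumes that tip.dtd and xhtml11-flat.dtd are in the same directory as the file. Since add is declared as (#PCDATA|%Flow.mix;)*, block elements such as <div> and <p> and inline elements such as <b> and <i> are accepted inside it. By contrast, <link>, redeclared as (#PCDATA), accepts text only.

```xml
<?xml version="1.0"?>
<!-- Hypothetical test file: checks that add accepts XHTML block and
     inline markup, and that link accepts plain text -->
<!DOCTYPE tip SYSTEM "tip.dtd">
<tip id="tip194061">
  <link>Placeholder link text</link>
  <add>
    <div>Text with <b>bold</b> and <i>italic</i> markup.</div>
  </add>
  <add engine="mysql">
    <p>An engine-specific remark (placeholder).</p>
  </add>
  <sql engine="mysql">SELECT 1</sql>
</tip>
```

Validating such a file with rxp -V -V, as described under Tools, would be expected to report only the warnings that come from xhtml11-flat.dtd.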
My first attempt had been to add link to the block elements with <!ENTITY % Block.extra "|link">.

WORK.XSL: My main problem was question 10. I first tried to use <xsl:decimal-format name="us" decimal-separator="." grouping-separator=","/> with format-number($highways, 'us'). This did not work because $highways was not a number (NaN). format-number() only formats a value that is already a number, and a string containing a grouping comma converts to NaN. In any case, the name of the decimal format is the third argument of format-number(), after the format pattern. I therefore used the translate() function to remove the “,”.

In question 13, my first test could not find the words “Los Angeles” and “San Francisco”, although it worked with “San”, “Angeles”, “Los” or “Francisco” on their own. So I removed the blank spaces and the special character &#xa; (line feed) with translate().

For question 10, I do not obtain the same number as the question statement. My stylesheet uses 1 mile = 1.6 km (the exact value is 1.609344 km).

In question 14, I did not know how to match two different <tr> elements. I matched the first one and selected the second one (which contains the percentage) with the following-sibling axis and the position() function.

Tools

RXP: a validating XML parser written in C. I used the MS-DOS/Windows executable with the command line: rxp -V -V tip194061.xml. The first -V option turns on validation of the file, and the second makes the program stop if there is an error in the DTD.

MSXSL: the msxsl.exe command-line utility performs Extensible Stylesheet Language (XSL) transformations from the command line, using the Microsoft® XSL processor. The command line used is: msxsl fr.xml work.xsl -o output.xml. The source can be fr.xml or fr.html.

Online help

W3C: http://www.w3.org/Style/XSL/
     http://www.w3.org/TR/xhtml1/
Introduction to XML: http://www.dcs.napier.ac.uk/~andrew/xml/
XSL Tutorial: http://www.w3schools.com/xsl/