03009323 - DE LAZZARI Thomas
C032036 Internet Mark-up
Languages – Coursework
Session 2003-2004
Introduction
This coursework was completed for the CNDS module Internet Mark-up Languages
(C032036) at Napier University (Edinburgh).
The files to be parsed are marked up in XML.
A DTD sets out the rules that a valid XML document must conform to.
XSL is a family of recommendations for defining XML
document transformation and presentation.

In the first part, a DTD has been constructed in order to validate a sample XML file.

In the second part, an XSL stylesheet extracts specific data from a file. It makes it
possible to sort through the information on a specific country.

In the final part, a conclusion walks through my approach to solving the different
problems. The tools and the documentation used are also listed.
All the source code is commented to make the goal of each step easier to
understand.
Task 1 – DTD
Here is the DTD written for the SQL tips.
The tip file used (tip194061) is available under the resources for module
co32036 on http://www.soc.napier.ac.uk.
The comments explain how I achieved this task and, in particular, what each line of
code refers to.
TIP.DTD
<!-- tip.dtd used to validate the tip file tip194061.html -->

<!-- Enhancement of html, link is plain text -->
<!ENTITY % link.content "(#PCDATA)">

<!-- XHTML entity call in order to define the link element and allow
xhtml markup -->
<!ENTITY % xhtml SYSTEM "./xhtml11-flat.dtd">
%xhtml;

<!-- tip is the top node, it can contain link, add and sql elements -->
<!ELEMENT tip (link, add*, sql*)>

<!-- Standard xhtml markup allowed by the Flow.mix entity which contains
%Block.class, %Inline.class ... -->
<!ELEMENT add (#PCDATA|%Flow.mix;)*>
<!ELEMENT sql (#PCDATA)>

<!-- tip has a unique identifier -->
<!ATTLIST tip id ID #REQUIRED>

<!-- Engine specific variation not necessary implied -->
<!ATTLIST add engine (access | db2 | mysql | oracle | postgres |
sqlserver) #IMPLIED>
<!ATTLIST sql engine (access | db2 | mysql | oracle | postgres |
sqlserver) #IMPLIED>
When rxp validates a file against tip.dtd, it reports warnings caused by xhtml11-flat.dtd but no errors. For example:
Warning: Ignoring redefinition of parameter entity head.qname in entity “xhtml” at line
4445 char 32 of file xhtml11-flat.dtd.
Task 2 – XSL
Here are the five hardest problems that I solved: questions
10 to 14. Input files are at http://sqlzoo2.napier.ac.uk/~andrew/cia/.
WORK.XSL
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:htm="http://www.w3.org/1999/xhtml">
<!-- TOP LEVEL TEMPLATE -->
<xsl:template match="/">
<cia><xsl:apply-templates /></cia>
</xsl:template>
<!-- QUESTION 10 -->
<xsl:template match="htm:tr[contains(.,'Highways:')]">
<xsl:variable name="highways">
<xsl:value-of select="
substring-before(substring-after(.,'paved:'),' km')" />
</xsl:variable>
<!-- There was a problem with the decimal format of the number. In
order to remove the grouping separator, I have used translate() -->
<highway>
<xsl:value-of select="translate($highways,',','') div 1.6" />
</highway>
</xsl:template>
<!-- QUESTION 11 and 13 -->
<xsl:template match="
htm:tr[contains(.,'Diplomatic representation in the US:')]">
<xsl:variable name="fax">
<xsl:value-of select="substring-after(.,'FAX:')" /></xsl:variable>
<!-- We can’t use substring-before() because for some countries
the words after the fax number are different -->
<fax><xsl:value-of select="substring($fax,0,20)" /></fax>
<xsl:variable name="num">
<!-- In order to match the right words, I had to remove all the
blank spaces and the special characters like 
 -->
<xsl:if
test="contains(translate(.,' ',''),'SanFrancisco')">1</xsl:if>
<xsl:if
test="contains(translate(.,'
',''),'LosAngeles')">1</xsl:if>
</xsl:variable>
<xsl:choose>
<!-- Two west consulates, $num=11 so we must replace it -->
<xsl:when test="$num = 11">
<west-coast><xsl:attribute name="count">2</xsl:attribute>
</west-coast>
</xsl:when>
<xsl:otherwise>
<!-- If no consulates, $num="" -->
<west-coast><xsl:attribute name="count">
<xsl:value-of select="$num" /></xsl:attribute></west-coast>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- QUESTION 12 -->
<xsl:template match="
htm:tr[contains(.,'Airports - with paved runways:')]">
<airport>
<!-- xsl:attribute is used to add attributes to airport -->
<xsl:attribute name="large">
<xsl:value-of select="substring-before
(substring-after(.,'over 3,047 m:'),'2,438 to 3,047 m:')" />
</xsl:attribute>
<xsl:attribute name="medium">
<xsl:value-of select="substring-before
(substring-after(.,'2,438 to 3,047 m:'),'914 to 1,523 m:')" />
</xsl:attribute>
<xsl:attribute name="small">
<xsl:value-of select="substring-before
(substring-after(.,'914 to 1,523 m:'),'under 914 m:')" />
</xsl:attribute>
<xsl:attribute name="tiny">
<xsl:value-of select="substring-before
(substring-after(.,'under 914 m:'),' (2002)')" />
</xsl:attribute>
</airport>
</xsl:template>
<!-- QUESTION 14 -->
<xsl:template match="htm:tr[contains(.,'Exports:')]">
<xsl:variable name="exports">
<xsl:value-of select="substring-before
(substring-after(.,'$'),' ')" /></xsl:variable>
<xsl:variable name="unit">
<!-- For some countries, the unit is not billion but million so we
have to match it in a variable $unit -->
<xsl:value-of select="substring-before
(substring-after(.,$exports),'f.o.b')" /></xsl:variable>
<xsl:variable name="percentage">
<!-- I have used the axis to select the percentage -->
<xsl:value-of select="substring-before(substring-after
(following-sibling::htm:tr[position()=2],'US'),'%')" />
</xsl:variable>
<export>
<!-- floor() rounds the result down -->
$<xsl:value-of select="floor(($exports*$percentage)*0.01)" />
<xsl:value-of select="$unit" />
</export>
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
OUTPUT.XML
<?xml version="1.0" encoding="UTF-16"?>
<cia xmlns:htm="http://www.w3.org/1999/xhtml">
<fax>[1] (202) 944-6166</fax>
<west-coast count="2" />
<export>$26 billion</export>
<!-- 1 Mile = 1.6 Km -->
<highway>558062.5</highway>
<airport large="13" medium="28" small="80" tiny="57" />
</cia>
This is the output from the transformation of the file fr.html.
Task 3 – Conclusion
Difficulties and approach

TIP.DTD: To test that the <add> element may contain standard XHTML
markup, I replaced the <p> tag in the first <add> element with other tags:
<i>, <div>, <b>, and so on.
In the XHTML recommendation, I found a solution to this problem: %Flow.mix.
I did not know that the <link> element was already defined in xhtml11-flat.dtd, so my
first attempt was: <!ENTITY % Block.extra "|link">.

WORK.XSL: My main problem was with question 10. I first tried to use:
<xsl:decimal-format name="us" decimal-separator="."
grouping-separator=","/> and format-number($highways, 'us').
However, this did not work because $highways was not a number (NaN). I therefore used the
translate() function to remove the “,”.
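
As a rough illustration of the same clean-up outside XSLT, here is a minimal Python sketch. It assumes the value arrives as a string containing a grouping comma. The sample string "892,900" is an assumption, chosen only so that the result matches the <highway> figure in OUTPUT.XML; it is not quoted from fr.html. The helper names are hypothetical.

```python
import math


def xpath_like_number(raw: str) -> float:
    """Mimic the NaN behaviour described above: bad input gives NaN, not an error."""
    try:
        return float(raw)
    except ValueError:
        return math.nan


def km_to_miles(raw: str, km_per_mile: float = 1.6) -> float:
    """Remove the grouping comma, then divide by the km-per-mile factor."""
    cleaned = raw.replace(",", "")  # plays the role of translate($highways, ',', '')
    return float(cleaned) / km_per_mile


sample = "892,900"  # assumed sample value, not taken from fr.html

print(xpath_like_number(sample))  # the comma prevents conversion to a number
print(km_to_miles(sample))        # conversion succeeds once the comma is removed
```

The point of the sketch is only the order of operations: the string must be cleaned before any arithmetic is attempted.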
In question 13, my first test was unable to find the words “Los
Angeles” and “San Francisco”, but I noticed that it worked with “San”, “Angeles”,
“Los” or “Francisco”. So I removed the blank spaces and the special character
with translate().
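
The effect can be sketched in Python under one assumption: that the troublesome character is a line break sitting between the two words of a city name. The sample text below is invented for illustration and is not copied from the input files. The function name is hypothetical.

```python
def count_west_coast(row_text: str) -> int:
    """Count how many of the two west-coast cities appear in the row text."""
    # Drop every whitespace character (spaces, line breaks, tabs),
    # which is the same idea as the translate() calls in the stylesheet.
    squeezed = "".join(row_text.split())
    cities = ("SanFrancisco", "LosAngeles")
    return sum(1 for city in cities if city in squeezed)


# Invented sample: the city names are split by a line break, not a space.
sample = "consulate(s) general: Chicago, Los\nAngeles, New York, San\nFrancisco"

print("Los Angeles" in sample)   # the plain search misses the split name
print(count_west_coast(sample))  # the search on squeezed text finds both
```

Counting the matches directly is a design choice of this sketch: it returns 0, 1 or 2 without the special case that the stylesheet needs for the concatenated value 11.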
For question 10, I do not obtain the same number as the one given in the question
statement, even though 1 mile = 1.6 km.
For question 14, I did not know at first how to match the two different
<tr> elements. I matched the first one and used the position() function to select the second
one (the one containing the percentage).
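
The logic of this template can be sketched in Python as follows. The rows are modelled as a plain list of strings, and all three sample rows are assumptions written for illustration; they are not quoted from fr.html. The two helpers are hypothetical stand-ins for the XPath functions substring-before() and substring-after().

```python
import math


def substring_after(text: str, marker: str) -> str:
    """Text after the first marker; empty string if the marker is absent."""
    if marker == "":
        return text
    _head, sep, tail = text.partition(marker)
    return tail if sep else ""


def substring_before(text: str, marker: str) -> str:
    """Text before the first marker; empty string if the marker is absent."""
    if marker == "":
        return ""
    head, sep, _tail = text.partition(marker)
    return head if sep else ""


def us_exports(rows: list, index: int) -> str:
    """Exports to the US, rounded down, with the unit found in the row."""
    row = rows[index]
    exports = substring_before(substring_after(row, "$"), " ")
    unit = substring_before(substring_after(row, exports), "f.o.b")
    partners = rows[index + 2]  # like following-sibling::htm:tr[position()=2]
    percentage = substring_before(substring_after(partners, "US"), "%")
    amount = math.floor(float(exports) * float(percentage) * 0.01)
    return f"${amount}{unit.rstrip()}"


# Assumed sample rows, for illustration only.
rows = [
    "Exports: $307.8 billion f.o.b. (2002 est.)",
    "Exports - commodities: machinery and transportation equipment",
    "Exports - partners: Germany 14.7%, US 8.7%",
]

print(us_exports(rows, 0))
```

The sketch depends on the partners row being exactly two rows after the exports row, which is the same assumption the position() test makes.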
Tools

RXP: A validating XML parser written in C. I used the MSDOS/Windows
executable with the command line: rxp -V -V tip194061.xml. The first -V option
turns on validation of the file, and the second makes the program stop if there is an
error in the DTD.

MSXSL: The msxsl.exe command line utility performs
Extensible Stylesheet Language (XSL) transformations using the Microsoft® XSL
processor. The command line used is: msxsl fr.xml work.xsl -o output.xml. The
source can be fr.xml or fr.html.
Online help

W3C: http://www.w3.org/Style/XSL/
http://www.w3.org/TR/xhtml1/

Introduction to XML: http://www.dcs.napier.ac.uk/~andrew/xml/

XSL Tutorial: http://www.w3schools.com/xsl/