Introduction to XML

Jussi Pohjolainen
TAMK University of Applied Sciences
What is XML?
• XML (eXtensible Markup Language) is a specification for creating custom markup languages
• W3C Recommendation
• Its primary purpose is to help computers share data
• XML is a meta-language. This means that you use it to create languages.
• XML is a broad concept.
XML Document
• Every XML document is text-based, which enables
– sharing data between different computers
– sharing data on the Internet
– platform independence
Binary vs. Text
• Problems with Binary format
– Platform dependence
– Firewalls
– Hard to debug
– Inspecting the file can be hard
• Since XML is text-based, it does not have the
problems mentioned above.
• What are the disadvantages in text format?
XML Doc Advantages
• Easy data sharing: text documents can be read on any device.
• Documents can be modified with any text editor.
• The contents of an XML document can be understood just by viewing it in a text editor.
• Easy to manipulate with programming languages
• Two levels of correctness: well-formed and valid.
.doc – file format
[Figure: Windows, MS Word 2000 → binary .doc file → Mac OS X]
Since .doc is a closed binary format, there are very few alternative word processors that fully support the .doc file format.
.docx – file format (Office Open XML)
[Figure: Windows, MS Word 2007 → .docx file containing XML (<xml><heading1>title</heading1> ... </xml>) → Mac OS X]
Now the format is open and it's much easier to access.
Hopefully in the future there will be loads of free programs that support this new, open and easily accessible file format.
SGML vs. XML
SGML: Standard Generalized Markup Language
• XML
– OOXML (.docx)
– MathML (.mml)
– XHTML (.xhtml)
• HTML (.html)
XML – Meta Language
• XML is a meta-language, which you can use to create your own markup languages.
• Many XML markup languages have been made for different purposes
• All of these languages share the common XML rules
• Languages: XHTML, OOXML, Open Document, RSS,
SVG, SOAP, SMIL, MathML...
• List:
– http://en.wikipedia.org/wiki/List_of_XML_markup_languages
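Because all XML languages share the same syntax rules, one generic XML parser can read any of them. A minimal sketch in Python using the standard xml.etree.ElementTree module; the three snippets are shortened, hypothetical examples, not complete documents. ElementTree reports a namespaced element name in the form {namespace-URI}localname.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical snippets of three different XML languages.
snippets = {
    "XHTML": '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>',
    "SVG": '<svg xmlns="http://www.w3.org/2000/svg"><circle r="40"/></svg>',
    "RSS": '<rss version="2.0"><channel><title>News</title></channel></rss>',
}

# The same parser handles every language because they all follow XML syntax.
root_tags = {name: ET.fromstring(text).tag for name, text in snippets.items()}

for name, tag in root_tags.items():
    print(f"{name}: {tag}")
```

The parser knows nothing about XHTML, SVG or RSS as such; it only checks the common XML rules and builds a tree.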
XHTML - Example
<?xml version="1.0"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<p>This is a minimal <a
href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</p>
</body>
</html>
SVG - Example
<?xml version="1.0"?>
<!DOCTYPE svg
PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="100%" height="100%" version="1.1"
xmlns="http://www.w3.org/2000/svg">
<circle cx="100" cy="50" r="40" stroke="black"
stroke-width="2" fill="red"/>
</svg>
MathML (Open Office)
<?xml version="1.0"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
<math:semantics>
<math:mrow>
<math:mi>x</math:mi>
<math:mo math:stretchy="false">=</math:mo>
<math:mfrac>
<math:mrow>
...
</math:mrow>
<math:annotation math:encoding="StarMath 5.0">x = {-b +sqrt{b^{2}-4{ac}} } over {2 {a}} </math:annotation>
</math:semantics>
</math:math>
RSS 2.0 - Example
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>W3Schools Home Page</title>
<link>http://www.w3schools.com</link>
<description>Free web building tutorials</description>
<item>
<title>RSS Tutorial</title>
<link>http://www.w3schools.com/rss</link>
<description>New RSS tutorial on W3Schools</description>
</item>
<item>
<title>XML Tutorial</title>
<link>http://www.w3schools.com/xml</link>
<description>New XML tutorial on W3Schools</description>
</item>
</channel>
</rss>
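Reading such a feed from a program takes only a few lines. A minimal sketch in Python with the standard xml.etree.ElementTree module, using the RSS example above as its input.

```python
import xml.etree.ElementTree as ET

# The RSS 2.0 example from the slide.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>W3Schools Home Page</title>
    <link>http://www.w3schools.com</link>
    <description>Free web building tutorials</description>
    <item>
      <title>RSS Tutorial</title>
      <link>http://www.w3schools.com/rss</link>
      <description>New RSS tutorial on W3Schools</description>
    </item>
    <item>
      <title>XML Tutorial</title>
      <link>http://www.w3schools.com/xml</link>
      <description>New XML tutorial on W3Schools</description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(RSS)
channel = root.find("channel")

# findtext() returns the text of the first matching child element.
feed_title = channel.findtext("title")
# findall() returns the <item> children in document order.
items = [(item.findtext("title"), item.findtext("link"))
         for item in channel.findall("item")]

print(feed_title)
for title, link in items:
    print(f"- {title}: {link}")
```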
XML Editors
• XML Spy
• EditiX
• Microsoft XML Notepad
• Visual XML
• XML Viewer
• Xeena
• XML Styler, Morphon, XML Writer…
Rules that Apply to Every XML Document
WELL-FORMED XML DOCUMENT
Correctness
• There are two levels of correctness of an XML
document:
1. Well-formed. A well-formed document conforms
to all of XML's syntax rules.
2. Valid. A valid document additionally conforms to
some semantic rules.
• Let's first look at XML's syntax rules (1).
Simple Generic XML Example
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<presentation>
<slide number="1">
<name>Introduction to XML</name>
<contents>XML is ...</contents>
</slide>
</presentation>
XML-Declaration
• The XML declaration is optional in XML 1.0 and mandatory in XML 1.1.
– Recommendation: use it.
• Version: 1.0 or 1.1
• Encoding: character encoding, default UTF-8 (UTF-16 if the file starts with a byte order mark)
• Standalone: is the XML document linked to external markup declarations?
– yes: no external markup declarations
– no: can have external markup declarations (open issue...)
– default: "no"
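Following the recommendation to always use a declaration, a program that generates XML can emit one automatically. A minimal sketch with Python's xml.etree.ElementTree; the element names are borrowed from the slide example. This particular writer puts single quotes in the declaration and has no option for the standalone pseudo-attribute.

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny tree in code (names follow the slide example).
root = ET.Element("presentation")
slide = ET.SubElement(root, "slide", number="1")
ET.SubElement(slide, "name").text = "Introduction to XML"

# xml_declaration=True forces the <?xml ...?> line to be written.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```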
Comparing Declarations
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<presentation>
<slide>
<name>Introduction to XML</name>
<contents>XML is ...</contents>
</slide>
</presentation>
<?xml version="1.0"?>
<presentation>
<slide>
<name>Introduction to XML</name>
<contents>XML is ...</contents>
</slide>
</presentation>
Same declaration: encoding="utf-8" and standalone="no" are the defaults, so both declarations mean the same thing.
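The two documents above differ only in their declarations. One way to see that they carry the same content is to parse both and compare the resulting trees; a minimal sketch with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Shared document body from the slide.
BODY = (b"<presentation><slide><name>Introduction to XML</name>"
        b"<contents>XML is ...</contents></slide></presentation>")

doc_a = b'<?xml version="1.0" encoding="utf-8" standalone="no"?>' + BODY
doc_b = b'<?xml version="1.0"?>' + BODY

# Serialize both parsed trees and compare them byte for byte.
same = ET.tostring(ET.fromstring(doc_a)) == ET.tostring(ET.fromstring(doc_b))
print(same)  # True
```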
Element vs. Tag vs. Attribute
• An element consists of a start tag, optional content and an end tag:
– <name>Introduction to XML</name>
• Start tag
– <name>
• Content
– Introduction to XML
• End tag
– </name>
• A start tag may have attributes
– <slide number="1">
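A parser exposes exactly these parts. A minimal sketch with Python's xml.etree.ElementTree: the element name comes from the start tag, the content becomes the element's text, and attributes become a dictionary of strings.

```python
import xml.etree.ElementTree as ET

slide = ET.fromstring('<slide number="1"><name>Introduction to XML</name></slide>')
name = slide.find("name")

print(name.tag)             # element name from the start tag: name
print(name.text)            # content between the tags: Introduction to XML
print(slide.get("number"))  # attribute value from the start tag: 1
print(slide.attrib)         # all attributes as a dict: {'number': '1'}
```

Note that the attribute value is the string "1", not a number; XML itself has no data types for attributes.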
Rules about Elements
• Only one root element
• Every element has a start tag and an end tag
• Content is optional: an element without content is an empty element
– <x></x> <!-- same as -->
– <x/>
• Tag names are case-sensitive:
– <X></x> <!-- Error -->
• Elements must be closed with their end tags in the correct order:
– <p><i>problem here</p></i> <!-- Error -->
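Every XML parser enforces these rules. A minimal sketch that feeds the examples above to Python's xml.etree.ElementTree and reports which ones are rejected; is_well_formed is a hypothetical helper, and the two-roots case is added here to illustrate the first rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

cases = {
    "<x></x>": True,                      # start tag + end tag
    "<x/>": True,                         # empty-element tag
    "<X></x>": False,                     # names are case-sensitive
    "<p><i>problem here</p></i>": False,  # wrong closing order
    "<a/><b/>": False,                    # two root elements
}
results = {text: is_well_formed(text) for text in cases}
for text, ok in results.items():
    print(f"{text!r}: {'well-formed' if ok else 'error'}")

# <x></x> and <x/> produce the same tree.
same_tree = ET.tostring(ET.fromstring("<x></x>")) == ET.tostring(ET.fromstring("<x/>"))
```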
Rules about Attributes
• XML elements can have attributes in the
start tag.
• Attributes must be quoted, with either double or single quotes:
– <person sex="female">
– <person sex='female'>
– <gangster name='George "Shotgun" Ziegler'>
– <gangster name="George "Shotgun" Ziegler"> <!-- Error -->
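The quoting rule can be checked the same way. A minimal sketch with Python's xml.etree.ElementTree; each slide example is closed as an empty element so that it forms a complete document, and two extra cases are added as assumptions: an unquoted value, and the &quot; entity as another way to put a double quote inside a double-quoted value.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

cases = [
    ('<person sex="female"/>', True),                                 # double quotes
    ("<person sex='female'/>", True),                                 # single quotes
    ("<gangster name='George \"Shotgun\" Ziegler'/>", True),          # " inside '...'
    ('<gangster name="George "Shotgun" Ziegler"/>', False),           # " inside "..."
    ("<person sex=female/>", False),                                  # unquoted value
    ('<gangster name="George &quot;Shotgun&quot; Ziegler"/>', True),  # escaped "
]
for text, expected in cases:
    print(f"{text}: {'ok' if is_well_formed(text) else 'error'}")

# The entity is replaced by a real double quote when the value is read.
value = ET.fromstring(cases[5][0]).get("name")
print(value)
```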
Naming Tags
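• Names can contain letters, numbers, and a few other characters such as hyphens (-), underscores (_) and periods (.)
• Names must not start with a number or a punctuation character (an underscore is allowed)
• Names starting with the letters xml (or XML, or Xml, etc.) are reserved and should not be used
• Names cannot contain spaces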
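A minimal sketch that tests a few tag names against a parser, using Python's xml.etree.ElementTree; the names are hypothetical examples. The reserved "xml" prefix is a reservation for future standards rather than a syntax check, so it is left out of this test.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

cases = [
    ("<slide_1/>", True),       # letters, underscore, digit
    ("<_note/>", True),         # may start with an underscore
    ("<first-name/>", True),    # hyphen inside the name is fine
    ("<v1.2/>", True),          # period inside the name is fine
    ("<1slide/>", False),       # starts with a number
    ("<-slide/>", False),       # starts with punctuation
    ("<.slide/>", False),       # starts with punctuation
    ("<first name/>", False),   # space: "name" is read as a broken attribute
]
for text, expected in cases:
    print(f"{text}: {'ok' if is_well_formed(text) else 'error'}")
```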
Well-Formed XML
• An XML document is well-formed if it follows the syntax rules.
• An XML document must be well-formed!
– If it does not follow the rules, it is not an XML document.
Is this a Well-Formed XML Document?
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<p>This is a minimal <a
href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</p>
</body>
</html>
Is this a Well-Formed XML Document?
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<jorma>This is a minimal <a
href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</jorma>
</body>
</html>
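Both documents follow the syntax rules, so a non-validating parser accepts both; well-formedness says nothing about whether <jorma> is an XHTML element. A minimal sketch with Python's xml.etree.ElementTree; since the two documents differ only in the paragraph element, a format-string template (an illustrative choice) fills it in.

```python
import xml.etree.ElementTree as ET

# The two slide documents differ only in <p> versus <jorma>; {tag} is filled in below.
TEMPLATE = """<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<{tag}>This is a minimal <a href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</{tag}>
</body>
</html>"""

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

first = TEMPLATE.format(tag="p")
second = TEMPLATE.format(tag="jorma")
print(is_well_formed(first), is_well_formed(second))  # True True
```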
Defining the Structure for XML documents
VALID XML DOCUMENT
Valid XML
• An XML document is valid if
– 1) it is well-formed AND
– 2) it follows some semantic rules
• An XML document is usually linked to an external file that contains the semantic rules for the document.
– The file can be a DTD (.dtd) or an XML Schema (.xsd)
• Semantic rules?
– Names of tags, order of elements
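To make "names of tags, order of elements" concrete, here is a hand-written stand-in for such rules, applied to the presentation example. This is not a DTD or XML Schema processor: the rules are hypothetical and hard-coded in Python, because the standard library's parsers only check well-formedness (third-party libraries such as lxml can validate against real DTD and XML Schema files).

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic rules for the slide vocabulary, hard-coded in Python.
# In practice such rules would live in a DTD (.dtd) or XML Schema (.xsd).
ROOT_TAG = "presentation"
SLIDE_CHILDREN = ["name", "contents"]  # required children of <slide>, in order

def check_rules(text: str) -> list:
    """Return rule violations; an empty list means the rules are satisfied.

    Raises ET.ParseError first if the text is not even well-formed.
    """
    root = ET.fromstring(text)
    errors = []
    if root.tag != ROOT_TAG:
        errors.append(f"root must be <{ROOT_TAG}>, not <{root.tag}>")
    for child in root:
        if child.tag != "slide":
            errors.append(f"unexpected <{child.tag}> inside <{root.tag}>")
            continue
        order = [grandchild.tag for grandchild in child]
        if order != SLIDE_CHILDREN:
            errors.append(f"<slide> needs {SLIDE_CHILDREN}, got {order}")
    return errors

good = ("<presentation><slide><name>Introduction to XML</name>"
        "<contents>XML is ...</contents></slide></presentation>")
wrong_order = ("<presentation><slide><contents>XML is ...</contents>"
               "<name>Introduction to XML</name></slide></presentation>")
unknown_tag = "<presentation><jorma/></presentation>"

for doc in (good, wrong_order, unknown_tag):
    print(check_rules(doc) or "valid under these rules")
```

All three inputs are well-formed; only the semantic check tells them apart.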
DTD Linking
The DOCTYPE links to the rules for XHTML elements (order, names, etc.):
<?xml version="1.0"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<p>This is a minimal <a
href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</p>
</body>
</html>
DTD Linking
The DTD defines the structure, tag names and order for all XHTML documents.
W3C has created the XML language "XHTML" by defining its rules in a DTD.
Is this a valid XML Document?
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<jorma>This is a minimal <a
href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</jorma>
</body>
</html>
1. There is no DTD! What language is this? MathML? SVG? XHTML?
2. Assuming this is XHTML, what version of XHTML? Transitional? Strict?
3. Assuming this is XHTML Strict, does the "jorma" tag belong to the XHTML language?
Invalid XHTML-document
<?xml version="1.0"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Minimal XHTML 1.0 Document</title>
</head>
<body>
<jorma>This is a minimal <a
href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>
document.</jorma>
</body>
</html>
Validating with W3C Service
Invalid XHTML in Browser?
It may work... or not. The browser tries to detect the errors and work around them. If it works with one browser, can you be certain that it works with all other browsers? And with all versions of those browsers? What about browsers on handheld devices?
And it might work now, but what about the future? How will Firefox 5.0 handle incorrect web pages?
Invalid XML in General
• Because of their HTML heritage, browsers try to make sense of invalid XHTML pages
• This is not the case with other XML languages.
• In general, if an XML document is not well-formed, processing of the document stops; applications that validate typically also refuse invalid documents.
Example: MathML and Open Office
1. Open the document in an external editor
2. Modify and save the document
3. Break the XML file
4. Open the document
Result
Nope... It does not try to make sense of the errors in the document; it does not handle the document at all.
Benefits of WF and Valid
• XML has strict rules for well-formedness and validity
• An application that manipulates an XML document does not have to try to make sense of possible errors in the document
• This makes handling XML files with a programming language much easier:
– If the document is correctly formed, manipulate it
– If it isn't, display an error
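The two-branch rule above maps directly onto a try/except around the parser. A minimal sketch with Python's xml.etree.ElementTree; show_slide_names and the sample documents are hypothetical, reusing the presentation vocabulary from earlier slides.

```python
import xml.etree.ElementTree as ET

def show_slide_names(text: str) -> str:
    """Return the text to display: the slide names, or an error message."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Not well-formed: do not guess what was meant, just report it.
        return f"Cannot read document: {err}"
    # Well-formed: manipulate the document freely.
    return ", ".join(slide.findtext("name") for slide in root.iter("slide"))

ok = ("<presentation><slide><name>What is XML?</name></slide>"
      "<slide><name>XML Document</name></slide></presentation>")
broken = "<presentation><slide><name>Introduction to XML</slide></presentation>"

print(show_slide_names(ok))      # What is XML?, XML Document
print(show_slide_names(broken))  # parser's error message, with line and column
```

The application code never has to handle half-broken input: either it gets a complete tree or it gets an error.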
Case: TAMKOtuki
• The TAMKOtuki menu is saved in an XML document:
– http://php.tpu.fi/~pohjus/menu/index.php
• Since you can be sure that the XML document is well-formed, reading its contents is fairly easy
– Showing the contents on a web page:
• http://tamkotuki.tamk.fi/en/ravintolat.php
– Showing the contents in TAMK-intra:
• https://intra.tamk.fi
– Showing the contents on mobile devices...