XML - Gonzaga University

advertisement
XP
New Perspectives on XML
Tutorial 1 – Creating an XML
Document
Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99223 USA
chen@jepson.gonzaga.edu
http://barney.gonzaga.edu/~chen
1
Introducing XML
XP
• XML stands for Extensible Markup Language. A
markup language specifies the structure and
content of a document.
• Because it is extensible, XML can be used to
create a wide variety of document types.
2
Origin of XML
XP
XML
SGML
GML
3
Introducing XML
XP
• XML is a subset of a the Standard Generalized
Markup Language (SGML) which was introduced
in the 1980s. SGML is very complex and can be
costly.
• These reasons led to the creation of Hypertext
Markup Language (HTML), a more easily used
markup language. XML can be seen as sitting
between SGML and HTML – easier to learn than
SGML, but more robust than HTML.
4
Differences to HTML
XP
Sto computer
5
Differences to HTML
XP
(Parser)
6
The Limits of HTML
XP
• HTML was designed for formatting text on a Web page. It
was not designed for dealing with the content of a Web page.
Additional features have been added to HTML, but they do
not solve data description or cataloging issues in an HTML
document.
• Because HTML is not extensible, it cannot be modified to
meet specific needs. Browser developers have added features
making HTML more robust, but this has resulted in a
confusing mix of different HTML standards.
• HTML cannot be applied consistently. Different browsers
require different standards making the final document appear
differently on one browser compared with another.
7
The 10 Primary XML Design Goals
XP
1. XML must be easily usable over the Internet
•
•
XML was developed with the Web in mind.
XML supports major Web protocols such as HTTP and
MIME
2. XML must support a wide variety of applications
•
XML can be used for other applications such as
databases, financial transactions, and voice mail
3. XML must be compatible with SGML
•
because XML is a subset of SGML, many software tools
developed for SGML cab be adapted to XML
8
The 10 Primary XML Design Goals
XP
4. It must be easy to write programs that process XML
documents
•
It is easy for nonprogrammers to write XML code
5. The number of optional features in XML must be
kept small
•
•
SGML supports a wide range of optional features and can
be large and cumbersome.
XML removed this aspect of SGML making it a more
suitable Web-development tool
6. XML documents should be clear and easily
understood
•
•
Like HTML, XML documents are text files.
The contents of an XML document follow a tree-like
structure
9
The 10 Primary XML Design Goals
XP
7. The XML design should be prepared quickly
•
•
If the Web community adopted XML, it was going to
be a viable alternative to HTML
The W3C had to quickly settle on a design for XML
before competing standards emerged
8. The design of XML must be exact and concise
•
XML can be easily processed by computer programs
making it easy for programmers to develop programs
9. XML documents must be easy to create
•
for XML to be practical, XML documents must be as
easy to create as HTML documents
10. Terseness in XML markup is of minimal
importance
10
XML Parsers
XP
• An XML processor (also called XML parser)
evaluates the document to make sure it conforms to
all XML specifications for structure and syntax.
• XML parsers are strict. It is this rigidity built into
XML that ensures XML code accepted by the parser
will work the same everywhere.
• Microsoft’s parser is called MSXML and is built
directly in IE versions 5.0 and above.
• Netscape developed its own parser, called Mozilla,
which is built into version 6.0 and above.
11
The Document Creation Process
XP
This figure shows the document creation process
12
Working with XML Applications
XP
• XML has the ability to create markup languages,
called XML applications. Many have been
developed to work with specific types of
documents.
• Each application uses a defined set of tag names
called a vocabulary. This makes it easier to
exchange information between different
organizations and computer applications.
13
XML Applications
XP
This figure shows some XML applications
14
The Structure of an XML Document
XP
• XML documents consist of three parts
– The prolog
– The document body
– The epilog
• The prolog is optional and provides information
about the document itself
• The document body contains the document’s
content in a hierarchical tree structure.
• The epilog is also optional and contains any final
comments or processing instructions.
15
The Structure of an XML Document:
XP
Creating the Prolog
• The prolog consists of four parts in the following
order:
–
–
–
–
XML declaration
Miscellaneous statements or comments
Document type declaration
Miscellaneous statements or comments
• This order has to be followed or the parser will
generate an error message.
• None of these four parts is required, but it is good
form to include them.
16
Well-Formed and Valid XML
Documents
XP
• There are two categories of XML documents
– Well-formed
– Valid
• An XML document is well-formed if it contains no syntax
errors and fulfills all of the specifications for XML code as
defined by the W3C.
• An XML document is valid if it is well-formed and also
satisfies the rules laid out in the DTD (Document Type
Definition) or schema attached to the document.
17
Well-formedness Constraints
•
XP
Follow the syntax rules setup by W3C
–
XML 1.0 Specification (http://www.w3.org/TR/REC-xml)
• The first line must be a declaration
– <? .. ?> Prolog declaration
18
Well-formedness constraints
for data document
XP
•
A unique Root Element must contain all the
other elements
• Tags must use strict nesting
– Each element nests inside any enclosing
elements properly
– Tags can't overlap
19
Well-formedness Constraints
for data document
•
XP
Naming conventions:
1. Names consist of one or more nonspace
characters. If the name is a single character, it
must be a letter
2. Names can only begin with a letter or underscore.
Beyond the first character, any character can be
used, including Unicode characters
3. Element and attribute names are case-sensitive
(this one trips up a lot of people)
20
The Structure of an XML Document: The
XP
XML Declaration
• The XML declaration is always the first line of code in
an XML document. It tells the processor what follows is
written using XML. It can also provide any information
about how the parser should interpret the code.
• The complete syntax is:
<?xml version=“version number” encoding=“encoding type”
standalone=“yes | no” ?>
• A sample declaration might look like this:
<?xml version=“1.0” encoding=“UTF-8”
standalone=“yes” ?>
21
XML Declaration
XP
Processing Instructions
xml-stylesheet type=“text/xsl” href=“my.xsl”
22
Examples
XP
Well Formed
Not Well Formed
<videocollection>
<title>Tootsie</title>
<title>Jurassic Park</title>
<title>
<title>Tootsie</title>
<title>Jurassic Park</title>
<title>
Mission Impossible
</title>
</videocollection>
Mission Impossible
</title>
Why not?
23
More examples
Well Formed
<videocollection>
<title>Tootsie</title>
<title>Jurassic Park</title>
<title>
Mission Impossible
</title>
<title>Tootsie</title>
</videocollection>
XP
Not Well Formed
<videocollection>
<title>
Tootsie
</videocollection>
</title>
Why not?
24
More examples
Well Formed
<crew>
Sydney Pollak
</crew>
XP
Not Well Formed
<CREW>
Sydney Pollak
</crew>
<crew>
Sydney Pollak
</Crew>
Why not?
25
Well-formedness Constraints
for data document
XP
• Start tag with a mapping end tag
– Also allow EmptyElement
– e.g., <EmptyElement attribute=“ ” />
– <Cover_image Source=“leaves.gif” />
• Attributes are only allowed a single value
enclosed in quotes, in start tag
– e.g., <Weight unit = ‘lb’>165</Weight>
26
More examples
Well Formed
<title id="1">
Tootsie
</title>
XP
Not Well Formed
<title id="1>
Tootsie
</title>
<title id=1>
Tootsie
</title>
Why not?
27
The Structure of an XML Document:
XP
Inserting Comments
• Comments or miscellaneous statements go after
the declaration. Comments may appear anywhere
after the declaration.
• The syntax for comments is:
<!- - comment text - ->
• This is the same syntax for HTML comments
28
Elements and Attributes
XP
• Elements are the basic building blocks of XML
files.
• XML supports two types of elements:
– Closed elements, and
– empty elements
• A closed element, has the following syntax:
<element_name>Content</element_name>
• Example:
<Artist>Miles Davis</Artist>
29
Elements and Attributes
XP
• Element names are case sensitive
• Elements can be nested, as follows:
<CD>Kind of Blue
<TRACK>So What (:22)</TRACK>
<TRACK>Blue in Green (5:37)</TRACK>
</CD>
• Nested elements are called child elements.
• Elements must be nested correctly. Child elements
must be enclosed within their parent elements.
30
Elements and Attributes
XP
• All elements must be nested within a single
document or root element. There can be only one
root element.
• An open or empty element is an element that
contains no content. They can be used to mark
sections of the document for the XML parser.
• An attribute is a feature or characteristic of an
element. Attributes are text strings and must be
placed in single or double quotes. The syntax is:
<element_name attribute=“value”> … </element_name>
31
Elements and Attributes: Adding elements to
XP
the Jazz.XML File
This figure shows the revised document
prolog
document
elements
32
XML Editors
XP
This figure shows available XML editors
33
Your turn …
•
•
•
•
XP
Type in the code using Notepad
Click File then “Save As”
Change “Save as type“ from “*.txt” to “All Files”
Type in Jazz-1.xml on the “File name” prompt
See next slide
34
Use of Notepad as xml editor
XP
35
XP
36
Character References
XP
• Special characters, such as the symbol for the
British pound, can be inserted into your XML
document by using a character reference. The
syntax is:
&#character;
• Character is a entity reference number or name
from the ISO/IEC character set.
• Character references in XML are the same as in
HTML.
37
Character References
XP
This figure shows commonly used character reference numbers
38
Character References
XP
This figure shows the revised Jazz.XML file
character
reference
39
Your turn …
XP
• Modify Jazz-1.xml by adding Character
References
• Save the new version as Jazz-2.xml
40
XP
41
Well-formedness constraints: Character Data
XP
Sections - CDATA
• A CDATA section is a large block of text the XML
processor will interpret only as text.
• Used to include characters that could be confused
with markup delimiters, such as “<“ or “>”
• to be displayed as they are within the inner [ ]
• The syntax to create a CDATA section is:
<! [CDATA [
Text Block
] ]>
42
CDATA Sections: An Example
XP
43
CDATA Sections
XP
• In this example, a CDATA section stores several
HTML tags within an element named
HTMLCODE:
<HTMLCODE>
<![CDATA[
<h1>The Jazz Warehouse</h1>
<h2>Your Online Store for Jazz Music</h2>
] ]>
</HTMLCODE>
44
CDATA Sections
XP
This figure shows the revised Jazz.xml file
CDATA section
45
Your turn …
XP
• Modify Jazz-2.xml by creating CDATA section
• Save the version as Jazz-3.xml
46
XP
47
Displaying an XML Document in a Web
Browser
XP
• XML documents can be opened in Internet
Explorer or in Netscape Navigator.
• If there are no syntax errors. IE will display the
document’s contents in an expandable/collapsible
outline format including all markup tags.
• Netscape will display the contents but neither the
tags nor the nested elements.
48
Displaying an XML Document in a Web
Browser
XP
• To display the Jazz.xml file in a Web
browser:
1. Start the browser and open the Jazz.xml file
located in the Tutorial.01/Tutorial folder of
your Data Disk.
2. Click the minus (-) symbols.
3. Click the resulting plus (+) symbols.
49
Displaying an XML Document in a Web
Browser
XP
This figure shows the revised Jazz.XML file as seen in Internet
Explorer 6.0 and Netscape 6.2
Netscape 7.2 can
display the same
result as I.E.
50
Jazz3.xml
XP
51
XP
Jazz3.xml
52
Homework
XP
• Using notepad create the following three files
stored in the folder of \XML and save them in the
floppy drive (or other source that you can make it
work and show me)
• Test each file and make sure each demonstrate the
required features
• 1. Jazz-1.xml
• 2. Jazz-2.xml
• 3. Jazz-3.xml
53
Linking to a Style Sheet
XP
• The easiest way to turn an XML document into a
formatted document is to link the document to a
style sheet.
• The XML document and the style sheet are
combined by the XML processor to display a
single formatted document.
54
Linking to a Style Sheet
XP
There are two main style sheet languages used with
XML:
– Cascading Style Sheets (CSS) and Extensible Style
Sheets (XSL)
• CSS is supported by most browsers and is
relatively easy to learn and use.
• XSL is more powerful, but not as easy to use as
CSS.
55
Linking to a Style Sheet
XP
• There are some important benefits to using style
sheets:
– By separating content from format, you can concentrate
on the appearance of the document
– Different style sheets can be applied to the same XML
document
– Any style sheet changes will be automatically reflected
in any Web page based upon the style sheet
56
Applying a Style to an Element
XP
• To apply a style sheet to a document, use the
following syntax:
selector {attribute1:value1; attribute2:value2; …}
• selector is an element (or set of elements) from the
XML document.
• attribute and value are the style attributes and
attribute values to be applied to the document.
57
Applying a Style to an Element
XP
• For example:
ARTIST {color:red; font-weight:bold}
• will display the text of the ARTIST element in a
red boldface type.
58
Creating Processing Instructions
XP
• The link from the XML document to a style sheet is
created using a processing statement.
• A processing instruction is a command that gives
instructions to the XML parser.
• For example:
<?xml-stylesheet type=“style” href=“sheet” ?>
• Style is the type of style sheet to access and sheet is
the name and location of the style sheet.
59
Linking to the JW.css Style Sheet
XP
This figure shows how to link the JW.css style sheet to the Jazz.xml file
processing instruction to
access the JW.css style sheet
60
The JW.css Style Sheet
XP
This figure shows the cascading style sheet stored in the JW.css file
61
Final Turn …
XP
Part I.
• Modify Jazz-3.xml by creating Processing
Instruction
• Add style sheet
<?xml-stylesheet type="text/css" href="JW.css" ?>
• Save the final version as Jazz-4.xml
Part II.
• Create the JW.css style sheet (How?)
• Part III.
• Open the Jazz-4.xml from the Web Browser
62
XP
63
The Jazz.xml Document Formatted with the
XP
JW.css Style Sheet
This figure shows
the formatted
Jazz.XML file
64
Another Run
XP
• Jazz-4A.xml
<?xml-stylesheet type="text/css" href="JW-A.css" ?>
• JW-A.css
ARTIST {display:block; font-size: 12pt; color: gold; fontstyle:italic;
font-family:Times New Roman, Serif;
margin-left: 12pt}
TRACK {display:list-item; font-size: 12pt; color:black; list-styletype: disc;
font-family: Aria;
margin-left: 25pt}
65
Last Run
XP
• Jazz-4B.xml
<?xml-stylesheet type="text/css" href="JW-B.css" ?>
• JW-B.css
66
Summary
XP
• Separate content from presentation style.
• Data communication between applications.
Style 1
XML
FILE
Style 2
…
Style N
Presentation 1
Presentation 1
…
Presentation N
67
Summary
XP
• To be well formed
– the first requirement for a XML document
• Conditions
–
–
–
–
–
Only one root element
Elements are well nested and mapped
Naming conventions
Use of Character Data Section
Use of Comments
68
Download