learnxml2 - Radford University

Chapter 2 - Markup and Core Concepts
Learning XML
by
Erik T. Ray
Slides were developed by
Jack Davis
College of Information Science
and Technology
Radford University
August 2006
XML Syntax
• “Syntax” refers to the rules of a language
• Syntax is needed with any language so that
the documents created with that language are
consistent
• Programs that process documents expect the
syntax rules to be followed, otherwise the
document may not be interpreted correctly
Components of an XML Document
• XML Declaration
• Elements
• Attributes
• Entities
• Comments
Components: The XML Declaration
• The XML Declaration:
– Tells the processing program that the document is an
XML document, along with other optional information
– When present, the declaration must be the first thing in an XML
document, with nothing before it
– Attributes that can be used in the declaration:
• version
• encoding
• standalone
– Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
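A minimal sketch of how a processing program can read the declaration, assuming Python and its standard `xml.parsers.expat` module. The function name `read_declaration` and the `<doc/>` element are made up for illustration.

```python
import xml.parsers.expat


def read_declaration(xml_bytes):
    """Return the version, encoding and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no") or -1 (omitted).
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)  # True marks the end of the input
    return found


# Hypothetical one-element document that starts with the declaration.
info = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>'
)
print(info)
```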
Document Type Declaration
• A document type declaration serves two purposes:
– It defines entities and default attribute values
– It supports validation, a special mode of parsing that checks
the grammar and vocabulary of the markup. A validating parser
must read the declarations that give the element rules before
it can begin to parse
• In both cases, the declarations are placed in the document type
declaration section
• A document type declaration consists of:
– the opening delimiter: <!DOCTYPE
– an element name, which identifies the root element of the document
– a DTD identifier: a local path or URL
– an optional list of entity declarations
• The DTD identifier supports two methods of
identification: system-specific (SYSTEM) and public (PUBLIC)
<!DOCTYPE doc SYSTEM "/usr/simple.dtd">
<!DOCTYPE html PUBLIC "-//w3c//DTD HTML 3.2//EN" "http://www.w3.org/TR … >
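The parts of a document type declaration can be seen by asking a parser to report them. This is a rough sketch, assuming Python's standard `xml.parsers.expat` module; `read_doctype` is a made-up helper name. The parser only reports the identifier here and does not fetch the DTD file.

```python
import xml.parsers.expat


def read_doctype(xml_text):
    """Return the pieces of the document type declaration, if one is present."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["name"] = name                # element name after <!DOCTYPE
        found["system_id"] = system_id      # local path or URL of the DTD
        found["public_id"] = public_id      # only set for PUBLIC identifiers
        found["has_internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found


doctype = read_doctype('<!DOCTYPE doc SYSTEM "/usr/simple.dtd"><doc/>')
print(doctype["name"], doctype["system_id"])
```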
Components: XML Elements
• Elements:
– Used to describe the data. Consist of:
• A start tag
• Content
• An end tag
– Example:
<element>Content</element>
– The “root” element of a document is the outermost element,
and contains all of the other elements in the document. There
can be only one root element in a single document
• An element that does not contain any content is
known as an “empty element”
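As a small illustration, assuming Python's standard `xml.etree.ElementTree` module, the start tag, content, and end tag of the example element map onto a parsed object as follows. The empty-element form `<element/>` is shown for comparison.

```python
import xml.etree.ElementTree as ET

# An element with a start tag, content, and an end tag.
element = ET.fromstring("<element>Content</element>")
print(element.tag)   # the name used in the start and end tags
print(element.text)  # the content between the tags

# An empty element has no content and no child elements.
empty = ET.fromstring("<element/>")
print(empty.text)
print(len(empty))
```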
Element Nesting
• The term “nesting” refers to the process of
containing elements within other elements
• Terminology:
– Child elements – elements that are contained
within other elements
– Parent elements – elements that contain other
elements
– Sibling elements – elements that share the
same parent element
Nesting Example
<family_tree>
  <mother>Sally</mother>
  <father>Joe</father>
  <children>
    <child>Larry</child>
    <child>Curly</child>
    <child>Mo</child>
  </children>
</family_tree>
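The parent, child, and sibling relationships in the family tree can be explored in code. This is a sketch only, assuming Python's standard `xml.etree.ElementTree` module. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

FAMILY = """
<family_tree>
  <mother>Sally</mother>
  <father>Joe</father>
  <children>
    <child>Larry</child>
    <child>Curly</child>
    <child>Mo</child>
  </children>
</family_tree>
"""

root = ET.fromstring(FAMILY)

# Child elements of the root element, in document order.
child_tags = [child.tag for child in root]

# The <child> elements are siblings: they share the parent <children>.
children = root.find("children")
sibling_names = [child.text for child in children]

# ElementTree stores no parent links, so build a child-to-parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

print(child_tags)
print(sibling_names)
print(parent_of[children].tag)
```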
Components: XML Attributes
• Attributes help to describe XML elements
• Attributes are always contained in the start tag
of the element they are describing
• Attributes are written as “name-value pairs”, with the value in quotes
• Example:
address="123 Main Street"
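A short sketch of reading the attribute back, assuming Python's standard `xml.etree.ElementTree` module. The `customer` element and the `phone` attribute are hypothetical; only the `address` pair comes from the slide.

```python
import xml.etree.ElementTree as ET

# The attribute sits inside the start tag of the element it describes.
customer = ET.fromstring('<customer address="123 Main Street">Joe</customer>')

print(customer.attrib)          # all name-value pairs as a dictionary
print(customer.get("address"))  # look up one value by name
print(customer.get("phone"))    # a missing attribute yields None
```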
Components: XML Entities
• Two types of entities:
– General – placeholders for information contained in
the XML document
– Parameter – used within a DTD to reference a
grouping of elements
• Three types of general entities:
– Character – used in place of special characters
– Content – used for blocks of frequently used text
– Unparsed – used for binary or non-text data, like
image files
Examples of Entities
• Character entity:
– Character: >
– Entity reference: &gt; or &#62;
– Usage: <formula> x &gt; y </formula>
• Content entity:
– Declaration:
<!ENTITY address "123 Main St">
– Usage:
<ship_address> &address; </ship_address>
• Unparsed entity:
– Declaration:
<!ENTITY image SYSTEM "sunset.gif" NDATA GIF>
– Usage (named in an attribute of type ENTITY, not written as &image;):
<picture src="image"/>
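The character and content entities above can be checked by parsing a small document. This is a sketch, assuming Python's standard `xml.etree.ElementTree` module. The `order` root element and the `numeric` element are made up for the example, and the entity is declared inside the document itself.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE order [
  <!ENTITY address "123 Main St">
]>
<order>
  <formula> x &gt; y </formula>
  <numeric>&#62;</numeric>
  <ship_address>&address;</ship_address>
</order>"""

order = ET.fromstring(DOCUMENT)

# The parser replaces each reference with the text it stands for.
formula = order.find("formula").text
numeric = order.find("numeric").text
ship_address = order.find("ship_address").text

print(formula)
print(numeric)
print(ship_address)
```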
Components: Comments
• An XML comment is ignored by applications
that process XML
• Comments are commonly used for
documentation, or to add information for
others viewing the document
• The content of the comment is surrounded by
special comment delimiters: <!-- and -->
• Example:
<!-- This is a comment -->
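As an illustration of a comment being ignored, the sketch below assumes Python's standard `xml.etree.ElementTree` module, which drops comments by default when it builds the tree. The `note` element and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring("<note><!-- This is a comment -->Remember the milk</note>")

# The comment does not appear in the parsed content.
print(note.text)
print(len(note))  # no child nodes were created for the comment

# Writing the element back out shows that the comment is gone.
serialized = ET.tostring(note, encoding="unicode")
print(serialized)
```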
Well-Formed XML Documents
• A “well-formed” document is one which adheres to
the syntax rules for XML:
– An XML document contains exactly one root element
– All elements must have start and end tags, except for empty
elements, which may use a single empty-element tag
– Elements must be properly nested
– All attributes must have a quoted value
– Attributes can only appear in the start tag, and each attribute
name must be unique within that element
– Element names are case-sensitive
– Special characters such as < and & must be written as entities
– Element names can start only with a letter or an
underscore, and can contain letters, numbers, hyphens,
periods and underscores
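Several of these rules can be tried out with a small check. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "one root, properly nested": "<a><b>text</b></a>",
    "two root elements": "<a/><b/>",
    "missing end tag": "<a><b></a>",
    "improper nesting": "<a><b></a></b>",
    "attribute without a value": "<a x></a>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "case mismatch in names": "<Name></name>",
    "unescaped special character": "<a>1 < 2</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

Only the first sample passes; each of the others breaks one of the rules listed above.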
XML Parsers
• A “parser” is a program that checks the
syntax of an XML document to ensure that the
document is well-formed
• Two types of parsers:
– Non-validating – only checks for syntax
– Validating – checks syntax and verifies the document
against a DTD or Schema
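The difference between the two kinds of parser can be sketched as two separate steps. Python's standard library does not include a validating parser, so the example below uses a hypothetical dictionary, `ALLOWED_CHILDREN`, as a stand-in for a DTD. It is an illustration of the idea, not a real validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: each element name maps to the
# child element names it is allowed to contain.
ALLOWED_CHILDREN = {
    "family_tree": {"mother", "father", "children"},
    "mother": set(),
    "father": set(),
    "children": {"child"},
    "child": set(),
}


def check(text):
    """Check syntax first, then check the structure against the rules."""
    try:
        root = ET.fromstring(text)  # step 1: syntax only (non-validating)
    except ET.ParseError:
        return "not well-formed"
    for parent in root.iter():      # step 2: structure (validating)
        if parent.tag not in ALLOWED_CHILDREN:
            return "well-formed but not valid"
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                return "well-formed but not valid"
    return "well-formed and valid"


valid = check("<family_tree><mother>Sally</mother></family_tree>")
invalid = check("<family_tree><pet>Rex</pet></family_tree>")
broken = check("<family_tree><mother>Sally</family_tree>")
print(valid, invalid, broken, sep="\n")
```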