XML Overview

XML Extensible
Markup Language
Chapter 17 - 1
Randy Connolly and Ricardo Hoar
Fundamentals of Web Development
Textbook to be published by Pearson Ed in early 2014
© 2015 Pearson
http://www.funwebdev.com
Section 1 of 7
XML OVERVIEW
XML Overview
Introduction
XML is a text-based markup language, but unlike HTML, XML can
be used to mark up any type of data.
XML is derived from the Standard Generalized Markup Language (SGML).
One of the key benefits of XML data is that, as plain text, it can be
read and transferred between applications and different
operating systems while remaining human-readable.
XML is not only used on the web server and to communicate
asynchronously with the browser, but is also used as a data
interchange format for moving information between systems.
XML Overview
XML in the web context - Used in many systems
Well Formed XML
Syntax Rules
For a document to be well-formed XML, it must follow the syntax rules for XML:
• Element names are composed of any of the valid XML name characters (most
punctuation symbols and spaces are not allowed).
• Element names can't start with a digit.
• There must be a single root element. A root element is one that contains all
the other elements; for instance, in an HTML document, the root element is
<html>.
• All elements must have a closing tag (or be self-closing).
• Elements must be properly nested.
• Elements can contain attributes.
• Attribute values must always be within quotes.
• Element and attribute names are case sensitive.
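The rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample snippets are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, each exercising one of the syntax rules.
samples = {
    "single root, properly nested": "<art><painting id='1'><title>Sunrise</title></painting></art>",
    "two root elements": "<painting/><painting/>",
    "improper nesting": "<a><b></a></b>",
    "missing closing tag": "<a><b></a>",
    "unquoted attribute value": "<a b=x></a>",
    "mismatched case in tags": "<title>Sunrise</Title>",
    "name starts with a digit": "<1title/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'NOT well formed'}")
```

Only the first sample satisfies every rule; each of the others breaks exactly one rule, and the parser rejects it.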
Well Formed XML
Sample Document
The XML declaration is analogous to the HTML DOCTYPE.
Valid XML
Requires a DTD
• A valid XML document is one that is well formed and whose
elements and content conform to the rules of either its
document type definition (DTD) or its schema.
• A DTD tells the XML parser which elements and attributes to
expect in the document as well as the order and nesting of
those elements.
• A DTD can be defined within an XML document or within an
external file.
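As an illustration of a DTD defined within the XML document itself, here is a small sketch. The art/painting document is a made-up example, and the code only lists the declarations that the parser reads from the internal DTD. Python's built-in expat parser is non-validating, so it does not check the document against these rules.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD (the part inside [ ... ]).
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE art [
  <!ELEMENT art (painting*)>
  <!ELEMENT painting (title, artist)>
  <!ATTLIST painting id CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
]>
<art>
  <painting id="290">
    <title>Balcony</title>
    <artist>Manet</artist>
  </painting>
</art>"""

declared_elements = []
declared_attributes = []


def on_element_decl(name, model):
    # Called once for every <!ELEMENT ...> declaration.
    declared_elements.append(name)


def on_attlist_decl(elname, attname, type_, default, required):
    # Called once for every attribute declared in an <!ATTLIST ...>.
    declared_attributes.append((elname, attname, type_, bool(required)))


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

print(declared_elements)
print(declared_attributes)
```

Actual validation against a DTD needs a validating parser, which the Python standard library does not provide.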
XML parser
Meaning
• Verifies that an XML document is well formed.
• Checks the XML document for syntax errors.
• Converts the XML document into some type of internal
memory structure.
• All contemporary browsers have built-in parsers, as do
most web development environments such as PHP and
ASP.NET.
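To make the idea of an internal memory structure concrete, the sketch below parses a small document with Python's xml.etree.ElementTree and walks the resulting tree. The catalog document and the outline helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
xml_text = """<catalog>
  <book id="b1"><title>XML Basics</title><price>29.95</price></book>
  <book id="b2"><title>Web Development</title><price>49.50</price></book>
</catalog>"""

# Parsing checks the syntax (ET.ParseError is raised on errors) and
# builds a tree of Element objects in memory.
root = ET.fromstring(xml_text)


def outline(element, depth=0):
    """Return the element names of the tree, indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_outline = outline(root)
titles = [book.findtext("title") for book in root]

print("\n".join(tree_outline))
print(titles)
```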
Document Type Definition
Example
The main drawback with DTDs is that they can validate little more
than the existence and ordering of elements. Apart from simple
enumerated lists, they provide no way to validate the values of
attributes or the textual content of elements.
For this type of validation, one must instead use XML
schemas, which have the added advantage of using
XML syntax. Unfortunately, schemas have the
corresponding disadvantage of being long-winded and
harder for humans to read and comprehend; for this
reason, they are typically created with tools.
XML Schema
Just one example
XSLT
Extensible Stylesheet Language Transformations
XSLT is an XML-based programming language that is used for
transforming XML into other document formats.
XSLT
Another usage
XSLT is also used on the server side and within JavaScript
XSLT
Example XSLT document that converts the XML from Listing 17.1 into an HTML list
XSLT
An XML parser is still needed to perform the actual transformation
XPath
Another XML Technology
XPath is a standardized syntax for searching an XML
document and for navigating to elements within the
XML document.
XPath is typically used as part of the programmatic
manipulation of an XML document in PHP and other
languages.
XPath uses a syntax that is similar to the path syntax used in
most operating systems to access directories.
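The directory-like flavor of XPath can be seen in a short sketch. The slides mention PHP; this example instead uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The catalog document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document to search.
root = ET.fromstring("""<catalog>
  <book id="b1" year="2013"><title>XML Basics</title><author>A. Author</author></book>
  <book id="b2" year="2014"><title>Web Development</title><author>B. Writer</author></book>
</catalog>""")

# "./book/title": step from the current element into book, then title,
# much like walking down a directory path.
all_titles = [t.text for t in root.findall("./book/title")]

# ".//author": author elements at any depth below the current element.
anywhere_authors = [a.text for a in root.findall(".//author")]

# "[@id='b2']": a predicate that filters on an attribute value.
by_attribute = root.find(".//book[@id='b2']/title").text

# "[2]": a positional predicate (XPath positions start at 1).
second_book = root.find("./book[2]").get("id")

print(all_titles, anywhere_authors, by_attribute, second_book)
```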
XPath
Learn through example
XML
Tutorial
http://www.tutorialspoint.com/xml/index.htm
XML Basics
Before proceeding with this tutorial you should have basic knowledge of HTML and JavaScript.
XML tags identify the data and are used to store and organize the data, rather
than specifying how to display it as HTML tags do.
XML Characteristics
• XML is extensible: XML allows you to create your own self-descriptive tags, or
language, that suits your application.
• XML carries the data, does not present it: XML allows you to store the data
irrespective of how it will be presented.
• XML is a public standard: XML was developed by an organization called the
World Wide Web Consortium (W3C) and is available as an open standard.
XML Usage
Common uses of XML
• XML can work behind the scenes to simplify the creation of HTML for
large web sites.
• XML can be used to exchange information between organizations
and systems.
• XML can be used for offloading and reloading of databases.
• XML can be used to store and arrange data in a way that suits
your data handling needs.
• XML can easily be merged with style sheets to create almost any
desired output.
• Virtually any type of data can be expressed as an XML document.
• XML is not a programming language: it does not perform any computation or
algorithms.
• It is usually stored in a simple text file and is processed by special
software that is capable of interpreting XML.
XML Syntax
Syntax Rules
<?xml version="1.0" encoding="UTF-8"?>
<message>
  <text>Hello, world!</text>
</message>
<?xml version="1.0"?>
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>
There are two kinds of information in the second example above:
The markup, like <contact-info>, and
The text, or character data, such as TutorialsPoint and (011) 123-4567.
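The split between markup and character data becomes visible once the document is parsed. A minimal sketch with Python's xml.etree.ElementTree follows; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<contact-info>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
  <phone>(011) 123-4567</phone>
</contact-info>"""

root = ET.fromstring(document)

# The markup: the element names that structure the document.
markup = [root.tag] + [child.tag for child in root]

# The character data: the text carried inside the child elements.
character_data = [child.text for child in root]

print(markup)
print(character_data)
```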
XML Declaration
Syntax Rules for Tags and Elements
• The XML declaration is case sensitive and must begin with
"<?xml", where "xml" is written in lower case.
• If the document contains an XML declaration, it must be
the first statement of the XML document.
• The HTTP protocol can override the value of encoding that
you put in the XML declaration.
Syntax Rules for Tags and Elements
Elements
An element is also referred to as an XML node or an XML tag.
XML element names are enclosed in angle brackets < >:
<element> .. </element>
or, for an empty element, <element />
Nesting of elements: an element can contain multiple XML elements as
its children, but the child elements must not overlap.
Which is correct?
Syntax Rules for Tags and Elements
Root, Attributes, XML References: Entity and Character
Which is correct?
Only one root element
Case sensitive
<contact-info> ≠ <Contact-Info>
An element can have one or more attributes, each specifying a single property. For example:
<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>
<a b="x" c="y" b="z">....</a> correct?
Attribute names are written without quotation marks, whereas attribute values
must always appear in quotation marks.
<a b=x>....</a> correct?
XML references begin with the symbol "&" and end with the symbol ";".
Entity references contain a name between these delimiters, for example &amp;
Character references contain a hash mark ("#") followed by a number. The
number always refers to the Unicode code point of a character. For example, &#65; refers to
the letter "A".
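A parser replaces references with the characters they stand for. The sketch below, which assumes Python's xml.etree.ElementTree and a made-up <text> element, shows decimal and hexadecimal character references alongside predefined entity references.

```python
import xml.etree.ElementTree as ET

# &#65; is decimal 65 ("A"); &#x42; is hexadecimal 42 ("B");
# &amp;, &lt; and &gt; are predefined entity references.
element = ET.fromstring("<text>&#65; &#x42; &amp; &lt;tag&gt;</text>")

decoded = element.text
print(decoded)  # A B & <tag>
```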
XML Text
Encoding
• XML element and attribute names are case-sensitive, so the start
and end tag names need to be written in the same case.
• To avoid character encoding problems, all XML files should
be saved as Unicode UTF-8 or UTF-16 files.
• Whitespace characters like blanks, tabs and line-breaks
between XML elements and between XML attributes
are generally treated as insignificant.
• Some characters, such as <, > and &, are reserved
by the XML syntax.
XML Documents
Document Prolog and Elements
The document prolog comes at the top of the document, before the root element. It
contains:
• XML declaration
• Document type declaration
Document elements are the building blocks of XML, a hierarchy of sections, each
serving a specific purpose. The elements can be containers, with a combination
of text and other elements.
XML Declaration
Syntax
The XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when it is used, it must appear on the first line of the
XML document.
XML Declaration
Rules
• If present in the XML, it must be placed as the first line in the XML document.
• If included, it must contain the version number attribute.
• The parameter names and values are case-sensitive.
• The names are always in lower case.
• The order of placing the parameters is important. The correct order is:
version, encoding and standalone.
• Either single or double quotes may be used.
• The XML declaration has no closing tag, i.e. </?xml>
<?xml ?> on its own is not a well-formed declaration, because version is required.
XML declaration with version definition: <?xml version="1.0"?>
XML declaration with all parameters defined:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
XML declaration with all parameters defined in single quotes:
<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
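The declaration rules can be checked against a real parser. This is a sketch under stated assumptions: it uses Python's xml.parsers.expat to read the declaration and xml.etree.ElementTree to test well-formedness, and the helpers read_declaration and parses are hypothetical names. Expat reports standalone as 1 (yes), 0 (no) or -1 (not given).

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


def read_declaration(data: bytes) -> dict:
    """Return the version, encoding and standalone values of the declaration."""
    found = {}

    def on_decl(version, encoding, standalone):
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return found


def parses(data: bytes) -> bool:
    """Return True if data is accepted as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


full = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="no" ?><message/>')
single_quoted = read_declaration(
    b"<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?><message/>")
version_only = read_declaration(b'<?xml version="1.0"?><message/>')

# Each of these breaks one of the rules and is rejected by the parser.
not_first = parses(b'\n<?xml version="1.0"?><message/>')
no_version = parses(b'<?xml encoding="UTF-8"?><message/>')
wrong_order = parses(b'<?xml encoding="UTF-8" version="1.0"?><message/>')

print(full, single_quoted, version_only)
print(not_first, no_version, wrong_order)
```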
XML Tags
Definition and Rules
Start tag: <address>
End tag: </address>
Empty tag: <hr></hr> or <hr /> may be used for any element which has no
content
Rules
1. XML tags are case-sensitive
<address>This is wrong syntax</Address> correct?
2. XML tags must be closed in an appropriate order
<outer_element>
  <internal_element>
    This tag is closed before the outer_element
  </internal_element>
</outer_element>
XML Elements
• XML elements can be defined as the building blocks of an XML document. Elements can
behave as containers to hold text, elements, attributes, media objects or all of
these.
• Each XML document contains one or more elements, the scope of which is
delimited by start and end tags or, for empty elements, by an empty-element
tag.
• An attribute associates a name with a value, which is a string of characters. An attribute
is written as name="value", with the value in double (" ") or single (' ') quotes;
attributes are separated by white space.
• Empty element (no content): <name attribute1 attribute2.../>
XML Elements Rules
• An element name can contain any alphanumeric characters. The only
punctuation marks allowed in names are the hyphen (-), under-score (_) and
period (.).
• Names are case sensitive. For example, Address, address, and ADDRESS are
different names.
• The names in the start and end tags of an element must be identical.
• An element that is a container can contain text or other elements.
XML Attributes
Attributes are part of XML elements: syntax
• An element can have multiple unique attributes. An attribute, or property, is always a
name-value pair.
• An XML attribute has the following syntax:
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the
following form: name="value"
• Attributes are used to add a unique label to an element, place the label in a
category, add a Boolean flag, or otherwise associate it with some string of
data.
Example: two categories of plants, one flowers
and the other color. Hence we have two
plant elements with different attribute values.
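The plant example can be sketched as follows. The garden document is a hypothetical reconstruction (the original figure is not part of this text), and the code uses Python's xml.etree.ElementTree to read the attributes as name-value pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two plants elements that differ only in the
# value of their category attribute.
garden = ET.fromstring("""<garden>
  <plants category="flowers" />
  <plants category="color"></plants>
</garden>""")

# Each element exposes its attributes as a dictionary of name-value pairs.
categories = [plant.get("category") for plant in garden.findall("plants")]
first_attributes = dict(garden[0].attrib)

# get() accepts a fallback for attributes that are not present.
missing = garden[0].get("height", "unknown")

print(categories, first_attributes, missing)
```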
Attribute Types
Attribute Type: Description
StringType: Takes any literal string as a value. CDATA is a StringType. CDATA is character
data, which means that any string of non-markup characters is a legal part of the
attribute.
TokenizedType: A more constrained type. The validity constraints noted in the grammar are
applied after the attribute value is normalized. The TokenizedType attributes are:
• ID: used to specify the element as unique.
• IDREF: used to reference an ID that has been named for another element.
• IDREFS: used to reference several IDs, given as a whitespace-separated list.
• ENTITY: indicates that the attribute will represent an external entity in the
document.
• ENTITIES: indicates that the attribute will represent several external entities in the
document.
• NMTOKEN: similar to CDATA, with restrictions on what data can be part of the
attribute.
• NMTOKENS: a whitespace-separated list of NMTOKEN values.
EnumeratedType: Has a list of predefined values in its declaration, out of which the attribute
must be assigned one value. There are two types of enumerated attribute:
• NotationType: declares that an element will be referenced to a NOTATION
declared somewhere else in the XML document.
• Enumeration: defines a specific list of values that the
attribute value must match.
Element Attribute Rules
• An attribute name must not appear more than once in the
same start-tag or empty-element tag.
• For the document to be valid, an attribute must be declared in the Document Type
Definition (DTD) using an Attribute-List Declaration.
• Attribute values must not contain direct or indirect entity
references to external entities.
• The replacement text of any entity referred to directly or
indirectly in an attribute value must not contain a less-than
sign (<).
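Two of these rules are well-formedness rules, so any parser enforces them. The following sketch assumes Python's xml.etree.ElementTree; parse_error is a hypothetical helper that returns the parser's message, or None when the text parses.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text: str):
    """Return the parser's error message, or None if the text is well formed."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# The attribute name b appears twice in the same start-tag.
duplicate = parse_error('<a b="x" c="y" b="z">....</a>')

# A raw less-than sign is not allowed inside an attribute value...
raw_less_than = parse_error('<a b="1 < 2">....</a>')

# ...but the predefined entity &lt; is, and the parser decodes it.
escaped = ET.fromstring('<a b="1 &lt; 2">....</a>').get("b")

print(duplicate)
print(raw_less_than)
print(escaped)
```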
XML Comments
<!-- Your comment --> A comment is any text between <!-- and -->; the text itself must not contain a double hyphen (--).
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Students grades are uploaded by months -->
<class_list>
  <student>
    <name>Tanmay</name>
    <grade>A</grade>
  </student>
</class_list>
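Comments are part of the document but not of its data. As a sketch, Python's xml.dom.minidom keeps comments as nodes, so they can be read back; the accepts helper is a hypothetical name used to show that a double hyphen inside a comment is rejected.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

document = b"""<?xml version="1.0" encoding="UTF-8" ?>
<!-- Students grades are uploaded by months -->
<class_list>
  <student>
    <name>Tanmay</name>
    <grade>A</grade>
  </student>
</class_list>"""

dom = minidom.parseString(document)

# A comment placed before the root element is a child of the document node.
comments = [node.data for node in dom.childNodes
            if node.nodeType == node.COMMENT_NODE]


def accepts(data: bytes) -> bool:
    """Return True if the parser accepts the document."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


plain_comment = accepts(b"<a><!-- fine --></a>")
double_hyphen = accepts(b"<a><!-- not -- fine --></a>")

print(comments, plain_comment, double_hyphen)
```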
XML Character Entities
W3C: "The document entity serves as the root of the entity tree and a starting-point for an XML processor."
Entities are declared in the document prolog or in a DTD.
Types of Character Entities
There are three types of character entities:
1. Predefined character entities: Ampersand: &amp; Single quote: &apos;
Greater than: &gt; Less than: &lt; Double quote: &quot;
2. Numbered character entities: &#decimal number; or &#xhexadecimal number;
3. Named character entities: for example Aacute (written &Aacute;) and ugrave (written &ugrave;)
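The three kinds of character entities behave differently in practice. In this sketch, escape from Python's xml.sax.saxutils produces the predefined entities, and xml.etree.ElementTree decodes them; the sample strings and the Aacute declaration are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. Predefined entities: escape() handles &, < and >; quotes are added
#    through the optional mapping.
raw = "Tom & Jerry <\"cat\" & 'mouse'>"
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
round_trip = ET.fromstring(f"<t>{escaped}</t>").text

# 2. Numbered entities: decimal 233 and hexadecimal E9 are both "é".
numbered = ET.fromstring("<t>&#233;&#xE9;</t>").text

# 3. Named entities other than the five predefined ones must be declared,
#    here in an internal DTD.
declared = ET.fromstring(
    '<!DOCTYPE t [<!ENTITY Aacute "&#193;">]><t>&Aacute;</t>').text

try:
    ET.fromstring("<t>&Aacute;</t>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(escaped)
print(round_trip == raw, numbered, declared, undeclared_ok)
```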
XML CDATA Sections
Character Data CDATA
CDATA is defined as blocks of text that are not parsed by the parser, even though they contain
characters that would otherwise be recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require typing and are
generally difficult to read in the markup. In such cases, a CDATA section can be
used.
The CDATA syntax, <![CDATA[ characters with markup ]]>, is composed of three sections:
1. CDATA start section - CDATA begins with the nine-character delimiter
<![CDATA[
2. CDATA end section - the CDATA section ends with the ]]> delimiter.
3. CDATA section - characters between these two enclosures are interpreted as
characters, and not as markup. This section may contain markup characters
(<, >, and &), but the parser treats them as character data and not as markup.
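A short sketch shows that a CDATA section and the equivalent entity-escaped text yield the same character data. It assumes Python's xml.etree.ElementTree; the script element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Markup characters inside a CDATA section are taken literally.
with_cdata = ET.fromstring(
    "<script><![CDATA[if (a < b && b > c) { run(); }]]></script>").text

# The same content written with predefined entities instead.
with_entities = ET.fromstring(
    "<script>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</script>").text

# The first "]]>" always ends the section, so the section cannot contain
# that string: the leftover "]]>" here makes the document ill formed.
try:
    ET.fromstring("<t><![CDATA[a ]]> b ]]></t>")
    stray_end_ok = True
except ET.ParseError:
    stray_end_ok = False

print(with_cdata)
print(with_cdata == with_entities, stray_end_ok)
```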
CDATA Rules
• A CDATA section cannot contain the string "]]>".
• Nesting is not allowed in a CDATA section.
XML Whitespaces
Whitespace consists of spaces, tabs, and newlines.
<name>TanmayPatil</name> different? <name>Tanmay Patil</name>
<address category="residence"> different? <address category=" residence">
A special attribute named xml:space may be attached to an element. This
indicates that whitespace should not be removed for that element by the
application. You can set this attribute to default or preserve, as in this
declaration:
<!ATTLIST address xml:space (default|preserve) 'preserve'>
Where:
• The value default signals that the default whitespace processing modes of an
application are acceptable for this element;
• The value preserve tells the application to preserve all whitespace.
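Which whitespace matters can be observed directly. This is a sketch assuming Python's xml.etree.ElementTree and no DTD; the address text is a made-up value. Whitespace around attributes is dropped by the parser, while whitespace inside text content or inside an attribute value is kept.

```python
import xml.etree.ElementTree as ET

# Whitespace inside element content is significant.
joined = ET.fromstring("<name>TanmayPatil</name>").text
spaced = ET.fromstring("<name>Tanmay Patil</name>").text

# Whitespace between attributes and around "=" is insignificant.
compact = dict(ET.fromstring('<address category="residence"/>').attrib)
padded = dict(ET.fromstring('<address    category = "residence"   />').attrib)

# Whitespace inside the quoted value is part of the value.
inside_value = ET.fromstring('<address category=" residence"/>').get("category")

# xml:space is passed to the application as an ordinary attribute in the
# XML namespace; it is a hint, and this parser keeps the text either way.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
address = ET.fromstring('<address xml:space="preserve">  12 Main St  </address>')
space_hint = address.get(XML_NS + "space")
kept_text = address.text

print(joined == spaced, compact == padded, repr(inside_value))
print(space_hint, repr(kept_text))
```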
XML
XML Quick Guide (PDF)
XML Processing is not included; stop at page 34.