XML: Extensible Markup Language
Chapter 17 - 1
Randy Connolly and Ricardo Hoar, Fundamentals of Web Development
Textbook to be published by Pearson Ed in early 2014. © 2015 Pearson. http://www.funwebdev.com

Section 1 of 7: XML OVERVIEW

XML Overview: Introduction
XML is a text-based markup language but, unlike HTML, it can be used to mark up any type of data. It is derived from the Standard Generalized Markup Language (SGML).
One of the key benefits of XML is that, as plain text, it is human-readable and can be transferred between applications and different operating systems.
XML is used not only on the web server and to communicate asynchronously with the browser, but also as a data interchange format for moving information between systems.

XML Overview: XML in the web context
Used in many systems.

Well Formed XML: Syntax Rules
For a document to be well-formed XML, it must follow the syntax rules for XML:
• Element names may use any character that is valid in an XML name; spaces and most punctuation symbols are not allowed.
• Element names can't start with a digit.
• There must be a single root element. A root element is one that contains all the other elements; for instance, in an HTML document, the root element is <html>.
• All elements must have a closing tag (or be self-closing).
• Elements must be properly nested.
• Elements can contain attributes.
• Attribute values must always be within quotes.
• Element and attribute names are case sensitive.
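
The well-formedness rules above can be tried out with a short sketch. It assumes Python's standard-library xml.etree.ElementTree parser, which rejects any document that is not well formed; the helper name is_well_formed and the sample snippets are made up for illustration and are not from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, each breaking (or obeying) one rule from the list.
samples = {
    "single root, properly nested": "<message><text>Hello</text></message>",
    "two root elements": "<message/><message/>",
    "overlapping elements": "<message><text>Hello</message></text>",
    "missing closing tag": "<message><text>Hello</message>",
    "name starts with a digit": "<1message/>",
    "unquoted attribute value": "<message id=1/>",
    "case mismatch in tags": "<message></Message>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; each of the others violates one rule and makes the parser raise an error.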

Well Formed XML: Sample Document
The XML declaration is analogous to the HTML DOCTYPE.

Valid XML: Requires a DTD
• A valid XML document is one that is well formed and whose elements and content conform to the rules of either its document type definition (DTD) or its schema.
• A DTD tells the XML parser which elements and attributes to expect in the document, as well as the order and nesting of those elements.
• A DTD can be defined within an XML document or within an external file.

XML parser: Meaning
• Verifies that an XML document is well formed.
• Checks the XML document for syntax errors.
• Converts the XML document into some type of internal memory structure.
• All contemporary browsers have built-in parsers, as do most web development environments such as PHP and ASP.NET.

Document Type Definition: Example
The main drawback of DTDs is that they can essentially only validate the existence and ordering of elements. Apart from restricting an attribute to a fixed list of values, they provide no way to validate the values of attributes or the textual content of elements. For this type of validation, one must instead use XML schemas, which have the added advantage of using XML syntax. Unfortunately, schemas have the corresponding disadvantage of being long-winded and harder for humans to read and comprehend; for this reason, they are typically created with tools.
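
The difference between well formed and valid can be sketched as follows. This is a minimal illustration under stated assumptions: Python's standard-library parser is non-validating, so it reads a DTD written inside the document but does not enforce it; the hand-written order check below only stands in for what a validating parser would do. The DTD itself and the function name follows_dtd_order are hypothetical, built around the contact-info element names used in this deck.

```python
import xml.etree.ElementTree as ET

# A DTD defined within the document (an internal subset).
# It says: contact-info contains name, company, phone, in that order.
DTD = """<!DOCTYPE contact-info [
  <!ELEMENT contact-info (name, company, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
"""

IN_ORDER = DTD + (
    "<contact-info><name>Tanmay Patil</name>"
    "<company>TutorialsPoint</company>"
    "<phone>(011) 123-4567</phone></contact-info>"
)
OUT_OF_ORDER = DTD + (
    "<contact-info><phone>(011) 123-4567</phone>"
    "<name>Tanmay Patil</name>"
    "<company>TutorialsPoint</company></contact-info>"
)

EXPECTED_ORDER = ["name", "company", "phone"]


def follows_dtd_order(text: str) -> bool:
    """Hand-written stand-in for the DTD's existence-and-ordering rule."""
    root = ET.fromstring(text)  # succeeds for any well-formed document
    return [child.tag for child in root] == EXPECTED_ORDER


print(follows_dtd_order(IN_ORDER))      # well formed and in the declared order
print(follows_dtd_order(OUT_OF_ORDER))  # well formed, but not in the declared order
```

Both documents parse, because both are well formed; only the first would also be valid against the DTD. Note that nothing here, and nothing in the DTD, checks whether the phone text actually looks like a phone number.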

XML Schema: Just one example

XSLT: Extensible Stylesheet Language Transformations
XSLT is an XML-based programming language that is used for transforming XML into other document formats.

XSLT: Another usage
XSLT is also used on the server side and within JavaScript.

XSLT: Example
XSLT document that converts the XML from Listing 17.1 into an HTML list.

XSLT
An XML parser is still needed to perform the actual transformation.

XPath: Another XML Technology
XPath is a standardized syntax for searching an XML document and for navigating to elements within it.
XPath is typically used as part of the programmatic manipulation of an XML document in PHP and other languages.
XPath uses a syntax similar to the one used in most operating systems to access directories.

XPath: Learn through example

XML Tutorial
http://www.tutorialspoint.com/xml/index.htm

XML Basics
Before proceeding with this tutorial you should have a basic knowledge of HTML and JavaScript.
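
Before moving on, the directory-like XPath syntax described in the XPath slides can be sketched in code. This assumes Python's xml.etree.ElementTree, which supports only a limited subset of XPath; the second student and the id attributes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; only the element names come from this deck.
DOC = """<class_list>
  <student id="s1"><name>Tanmay</name><grade>A</grade></student>
  <student id="s2"><name>Asha</name><grade>B</grade></student>
</class_list>"""

root = ET.fromstring(DOC)

# Steps are separated by "/", much like a directory path.
all_names = [n.text for n in root.findall("./student/name")]

# ".//" searches at any depth below the current element.
all_grades = [g.text for g in root.findall(".//grade")]

# A predicate in [...] filters elements, here by the text of a child.
a_student_ids = [s.get("id") for s in root.findall("./student[grade='A']")]

# A predicate can also test an attribute value.
s2_name = root.find("./student[@id='s2']/name").text

print(all_names, all_grades, a_student_ids, s2_name)
```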
XML tags identify data and are used to store and organize it, rather than specifying how to display it as HTML tags do.

XML Characteristics
• XML is extensible: XML allows you to create your own self-descriptive tags, or language, to suit your application.
• XML carries the data but does not present it: XML allows you to store data irrespective of how it will be presented.
• XML is a public standard: XML was developed by the World Wide Web Consortium (W3C) and is available as an open standard.

XML Usage: list of XML usage
• XML can work behind the scenes to simplify the creation of HTML for large web sites.
• XML can be used to exchange information between organizations and systems.
• XML can be used for offloading and reloading of databases.
• XML can be used to store and arrange data in ways that suit your data-handling needs.
• XML can easily be merged with style sheets to create almost any desired output.
• Virtually any type of data can be expressed as an XML document.
• XML is not a programming language: it does not perform any computation or algorithms.
• It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.

XML Syntax: Syntax Rules
    <?xml version="1.0" encoding="UTF-8"?>
    <message>
       <text>Hello, world!</text>
    </message>

    <?xml version="1.0"?>
    <contact-info>
       <name>Tanmay Patil</name>
       <company>TutorialsPoint</company>
       <phone>(011) 123-4567</phone>
    </contact-info>

There are two kinds of information in the second example: the markup, like <contact-info>, and the text, or character data, like TutorialsPoint and (011) 123-4567.
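
The split between markup and character data can be made concrete with a small sketch. It assumes Python's xml.etree.ElementTree, where each element's tag holds the markup name and its text holds the character data; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""

root = ET.fromstring(DOC)

# Markup: the element names, starting with the root element.
markup = [element.tag for element in root.iter()]

# Character data: the text carried inside each child element.
character_data = [child.text for child in root]

print(markup)
print(character_data)
```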

XML Declaration: Syntax Rules for Tags and Elements
• The XML declaration is case sensitive and must begin with "<?xml", where "xml" is written in lower case.
• If the document contains an XML declaration, it must be the first statement of the XML document.
• A higher-level protocol such as HTTP can override the encoding value that you put in the XML declaration.

Syntax Rules for Tags and Elements: Elements
XML elements are also referred to as XML nodes or XML tags. Element names are enclosed in angle brackets < >:
    <element> .. </element>
or, in simple cases:
    <element />
Nesting of elements: an element can contain multiple XML elements as its children, but the child elements must not overlap.
Which is correct?

Syntax Rules for Tags and Elements: Root, Attributes, XML References (Entity and Character)
Which is correct?
• Only one root element.
• Case sensitive: <contact-info> ≠ <Contact-Info>
• Element attribute (a single property): an element can have one or more attributes. For example:
    <a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>
    <a b="x" c="y" b="z">....</a>    correct?
• Attribute names are written without quotation marks, whereas attribute values must always appear in quotation marks:
    <a b=x>....</a>    correct?
• XML references begin with the symbol "&" and end with the symbol ";".
• Entity references: for example, &amp;
• Character references contain a hash mark ("#") followed by a number. The number always refers to the Unicode code point of a character. For example, in &#65; the number 65 refers to the letter "A".

XML Text Encoding
• XML element and attribute names are case-sensitive; the start and end tags of an element must be written in the same case.
• To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
• Whitespace characters like blanks, tabs and line breaks between XML elements and between XML attributes will be ignored.
• Some characters, such as < and &, are reserved by the XML syntax.

XML Documents: Document Prolog and Elements
The document prolog comes at the top of the document, before the root element. It contains:
• the XML declaration
• the document type declaration
Document elements are the building blocks of XML. They form a hierarchy of sections, each serving a specific purpose. The elements can be containers, with a combination of text and other elements.

XML Declaration: Syntax
The XML declaration contains details that prepare an XML processor to parse the XML document. It is optional but, when it is used, it must appear on the first line of the XML document.

XML Declaration: Rules
• If present, the XML declaration must be placed on the first line of the XML document.
• If included, it must contain the version number attribute.
• The parameter names and values are case-sensitive.
• The names are always in lower case.
• The order of the parameters is important. The correct order is: version, encoding and standalone.
• Either single or double quotes may be used.
• The XML declaration has no closing tag, i.e. there is no </?xml>.
Examples:
• XML declaration with version definition: <?xml version="1.0"?>
• XML declaration with all parameters defined: <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
• XML declaration with all parameters defined in single quotes: <?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
• A declaration with no parameters, such as <?xml ?>, is not allowed, because the version is required.

XML Tags: Definition and Rules
• Start tag: <address>
• End tag: </address>
• Empty tag: <hr></hr> or <hr />, which may be used for any element that has no content
Rules
1. XML tags are case-sensitive:
    <address>This is wrong syntax</Address>    correct?
2. XML tags must be closed in an appropriate order:
    <outer_element>
       <internal_element>
          This tag is closed before the outer_element
       </internal_element>
    </outer_element>

XML Elements
• XML elements can be defined as the building blocks of an XML document. Elements can behave as containers that hold text, elements, attributes, media objects or all of these.
• Each XML document contains one or more elements, the scope of which is delimited by start and end tags or, for empty elements, by an empty-element tag.
• An attribute associates a name with a value, which is a string of characters. Attributes are separated by white space, and each is written as name = "value", using double (" ") or single (' ') quotes.
• Empty element (no content): <name attribute1 attribute2.../>

XML Elements: Rules
• An element name can contain any alphanumeric characters. The only punctuation marks allowed in names are the hyphen (-), underscore (_) and period (.).
• Names are case sensitive. For example, Address, address, and ADDRESS are different names.
• The names in the start and end tags of an element must be identical.
• An element, which is a container, can contain text or elements, as seen in the example above.

XML Attributes: Syntax
Attributes are part of XML elements.
• An element can have multiple unique attributes. An attribute, or property, is always a name-value pair.
• An XML attribute has the following syntax:
    <element-name attribute1 attribute2>
       ....content..
    </element-name>
  where attribute1 and attribute2 have the following form: name = "value"
• Attributes are used to add a unique label to an element, place the label in a category, add a Boolean flag, or otherwise associate it with some string of data.
The slide's example shows two categories of plants, one flowers and the other color, so there are two plant elements with different attributes.
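
The attribute rules above can be sketched in a few lines. The garden document below is a hypothetical reconstruction built only from the description (two plant elements, one per category); it is not the slide's original listing. The sketch assumes Python's xml.etree.ElementTree, and the helper name parses is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two plant elements with different attribute values.
# Single and double quotes are both accepted around attribute values.
GARDEN = """<garden>
  <plants category="flowers"/>
  <plants category='color'></plants>
</garden>"""

root = ET.fromstring(GARDEN)
categories = [plant.get("category") for plant in root.findall("plants")]
print(categories)


def parses(text: str) -> bool:
    """Return True if `text` is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An attribute name may appear only once in a start tag.
duplicate_ok = parses('<a b="x" c="y" b="z">....</a>')

# Attribute values must be quoted.
unquoted_ok = parses("<a b=x>....</a>")

print(duplicate_ok, unquoted_ok)
```

Both of the last two snippets are rejected by the parser, which is one way to check a tag against the rules.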

Attribute Types

StringType
It takes any literal string as a value. CDATA is a StringType. CDATA is character data, which means that any string of non-markup characters is a legal part of the attribute.

TokenizedType
This is a more constrained type. The validity constraints noted in the grammar are applied after the attribute value is normalized. The TokenizedType attributes are:
• ID: It is used to specify the element as unique.
• IDREF: It is used to reference an ID that has been named for another element.
• IDREFS: It is used to reference a list of IDs of other elements.
• ENTITY: It indicates that the attribute will represent an external entity in the document.
• ENTITIES: It indicates that the attribute will represent external entities in the document.
• NMTOKEN: It is similar to CDATA, with restrictions on what data can be part of the attribute.
• NMTOKENS: It is a list of NMTOKEN values, with the same restrictions on what data can be part of the attribute.

EnumeratedType
This has a list of predefined values in its declaration, out of which the attribute must be assigned one value. There are two types of enumerated attribute:
• NotationType: It declares that an element will be referenced to a NOTATION declared somewhere else in the XML document.
• Enumeration: It defines a specific list of values that the attribute value must match.

Element Attribute Rules
• An attribute name must not appear more than once in the same start-tag or empty-element tag.
• For a document to be valid, an attribute must be declared in the Document Type Definition (DTD) using an Attribute-List Declaration.
• Attribute values must not contain direct or indirect entity references to external entities.
• The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a less-than sign (<).

XML Comments
    <!-- Your comment -->
Any text between <!-- and --> is treated as a comment. The string "--" must not appear inside a comment.
    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- Students grades are uploaded by months -->
    <class_list>
       <student>
          <name>Tanmay</name>
          <grade>A</grade>
       </student>
    </class_list>

XML Character Entities
W3C: "The document entity serves as the root of the entity tree and a starting-point for an XML processor."
Entities are declared in the document prolog or in a DTD.

Types of Character Entities
There are three types of character entities:
1. Predefined character entities:
   • Ampersand: &amp;
   • Single quote: &apos;
   • Greater than: &gt;
   • Less than: &lt;
   • Double quote: &quot;
2. Numbered character entities: &#decimal number; or &#xhexadecimal number;
3. Named character entities: for example, 'Aacute' and 'ugrave'

XML CDATA Sections: Character Data
CDATA sections are blocks of text that the parser does not parse as markup, even though they may contain characters that would otherwise be recognized as markup.
The predefined entities such as &lt;, &gt;, and &amp; require extra typing and are generally difficult to read in the markup. In such cases, a CDATA section can be used.
The syntax <![CDATA[ ... ]]> is composed of three sections:
1. CDATA start section: CDATA begins with the nine-character delimiter <![CDATA[
2. CDATA end section: the CDATA section ends with the ]]> delimiter.
3. CDATA section: characters between these two enclosures are interpreted as characters, and not as markup. This section may contain markup characters (<, >, and &), but the XML processor treats them as character data and not as markup.
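
The relationship between entity references, character references and CDATA sections can be checked with a short sketch. It assumes Python's xml.etree.ElementTree for parsing and xml.sax.saxutils.escape for producing the predefined entities; the note element and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reserved characters written with predefined entities, plus the letter A
# written as a decimal (&#65;) and a hexadecimal (&#x41;) character reference.
with_entities = ET.fromstring("<note>&lt;b&gt; &amp; &#65;&#x41;</note>")

# The same characters written literally inside a CDATA section.
with_cdata = ET.fromstring("<note><![CDATA[<b> & AA]]></note>")

# After parsing, both forms carry the same character data.
print(with_entities.text)
print(with_entities.text == with_cdata.text)

# Going the other way: escape() replaces &, < and > with their entities.
escaped = escape("<b> & AA")
print(escaped)
```

A CDATA section is only a more readable way of writing the text; the application that reads the parsed document cannot tell which form was used.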

CDATA Rules
• A CDATA section cannot contain the string "]]>" anywhere in its content.
• Nesting is not allowed in a CDATA section.

XML Whitespaces: spaces, tabs, and newlines
    <name>TanmayPatil</name>
    <name>Tanmay Patil</name>    different?

    <address category="residence">
    <address category=" residence">    different?
A special attribute named xml:space may be attached to an element. It indicates that whitespace should not be removed for that element by the application. You can set this attribute to default or preserve, as shown in the example below:
    <!ATTLIST address xml:space (default|preserve) 'preserve'>
Where:
• The value default signals that the default whitespace processing modes of an application are acceptable for this element.
• The value preserve tells the application to preserve all the whitespace.

Reference: XML Quick Guide (PDF). XML Processing is not included; stop at page 34.
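
The whitespace questions above can be explored with a short sketch. It assumes Python's xml.etree.ElementTree and documents without a DTD, so every attribute is treated as plain character data; the address text used with xml:space is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Whitespace inside element content is part of the character data.
joined = ET.fromstring("<name>TanmayPatil</name>").text
spaced = ET.fromstring("<name>Tanmay Patil</name>").text
content_differs = joined != spaced

# Whitespace between the element name and its attributes is not significant.
tight = ET.fromstring('<address category="residence"/>')
loose = ET.fromstring('<address    category = "residence"   />')
same_attributes = tight.attrib == loose.attrib

# Whitespace inside the quotes belongs to the attribute value
# (no DTD here, so the value is not trimmed).
padded = ET.fromstring('<address category=" residence"/>')
value_differs = padded.get("category") != tight.get("category")

# ElementTree reports xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
kept = ET.fromstring('<address xml:space="preserve">  12 Main St  </address>')

print(content_differs, same_attributes, value_differs)
print(kept.get(XML_SPACE), repr(kept.text))
```

The parser itself always hands the text through unchanged; xml:space is a signal to the application that consumes the document.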