Chapter 2 - Markup and Core Concepts
Learning XML by Erik T. Ray
Slides were developed by Jack Davis, College of Information Science and Technology, Radford University
August 2006

XML Syntax
• “Syntax” refers to the rules of a language
• Any language needs a syntax so that documents written in it are consistent
• Programs that process documents expect the syntax rules to be followed; otherwise the document may not be interpreted correctly

Components of an XML Document
• XML Declaration
• Elements
• Attributes
• Entities
• Comments

Components: The XML Declaration
• The XML Declaration:
  – Tells the processing program that the document is an XML document, along with other optional information
  – If present, the declaration must be the very first thing in the document
  – Attributes that can be used in the declaration:
    • version
    • encoding
    • standalone
  – Example: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Document Type Declaration
• A document type declaration serves two purposes. First, it defines entities and default attribute values. Second, it supports validation, a special mode of parsing that checks the grammar and vocabulary of the markup. A validating parser must read the declarations for the element rules before it can begin to parse. In both cases, the declarations belong in the document type declaration section.
• A document type declaration consists of:
  – delimiter: <!DOCTYPE
  – element name: identifies the root element of the document
  – DTD identifier: local path or URL of the DTD
  – entity declarations: optional list of entity declarations
• The DTD identifier supports two methods of identification, system-specific and public:
    <!DOCTYPE doc SYSTEM "/usr/simple.dtd">
    <!DOCTYPE html PUBLIC "-//w3c//DTD HTML 3.2//EN" "http://www.w3.org/TR … >

Components: XML Elements
• Elements:
  – Used to describe the data. An element consists of:
    • A start tag
    • Content
    • An end tag
  – Example: <element>Content</element>
  – The “root” element of a document is the outermost element and contains all of the other elements in the document. A document can have only one root element.
  – An element that does not contain any content is known as an “empty element”

Element Nesting
• The term “nesting” refers to placing elements inside other elements
• Terminology:
  – Child elements: elements that are contained within other elements
  – Parent elements: elements that contain other elements
  – Sibling elements: elements that share the same parent element

Nesting Example
    <family_tree>
      <mother>Sally</mother>
      <father>Joe</father>
      <children>
        <child>Larry</child>
        <child>Curly</child>
        <child>Mo</child>
      </children>
    </family_tree>

Components: XML Attributes
• Attributes help to describe XML elements
• Attributes always appear in the start tag of the element they describe
• Attributes are written as “name-value pairs”, with the value enclosed in quotes
• Example: address="123 Main Street"

Components: XML Entities
• Two types of entities:
  – General: placeholders for information contained in the XML document
  – Parameter: used within a DTD to reference a grouping of elements
• Three types of general entities:
  – Character: used in place of special characters
  – Content: used for blocks of frequently used text
  – Unparsed: used for binary or non-text data, like image files

Examples of Entities
• Character entity:
  – Character: >
  – Entity reference: &gt; or &#62;
  – Usage: <formula> x &gt; y </formula>
• Content entity:
  – Declaration: <!ENTITY address "123 Main St">
  – Usage: <ship_address> &address; </ship_address>
• Unparsed entity:
  – Declaration: <!ENTITY image SYSTEM "sunset.gif" NDATA GIF>
  – Usage: <picture source="image"/>, where the attribute (here named source for illustration) is declared with type ENTITY in the DTD. An unparsed entity cannot be referenced as &image; inside element content.

Components: Comments
• An XML comment is ignored by applications that process XML
• Comments are commonly used for documentation, or to add information for others viewing the document
• The content of the comment is surrounded by the comment delimiters <!-- and -->
• Example: <!-- This is a comment -->

Well-Formed XML Documents
• A “well-formed” document is one that adheres to the syntax rules for XML:
  – An XML document contains exactly one root element
  – All elements must have start and end tags, except for empty elements
  – Elements must be properly nested
  – All attributes must have a value, enclosed in quotes
  – Attributes can appear only in the start tag, and an attribute name can appear only once per element
  – Element names are case-sensitive
  – Special characters such as < and & must be written as entity references
  – Element names can start only with a letter or an underscore, and can contain letters, numbers, hyphens, periods and underscores

XML Parsers
• A “parser” is a program that checks the syntax of an XML document to ensure that the document is well-formed
• Two types of parsers:
  – Non-validating: checks only that the document is well-formed
  – Validating: checks syntax and also verifies the document against a DTD or schema
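
The well-formedness rules and the non-validating parser described above can be tried out with a short sketch. This is only an illustration, not part of the original slides: it assumes Python's standard xml.etree.ElementTree module, which checks well-formedness but does not validate against a DTD. The helper names is_well_formed and child_names are hypothetical, and the sample document is the family tree from the nesting example.

```python
import xml.etree.ElementTree as ET

# The family tree from the nesting example: one root, properly nested children.
FAMILY_TREE = """\
<family_tree>
  <mother>Sally</mother>
  <father>Joe</father>
  <children>
    <child>Larry</child>
    <child>Curly</child>
    <child>Mo</child>
  </children>
</family_tree>
"""


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejects any violation of the syntax rules.
        return False
    return True


def child_names(text):
    """Return the text of every <child> nested inside <children>."""
    root = ET.fromstring(text)
    return [child.text or "" for child in root.findall("./children/child")]


if __name__ == "__main__":
    samples = {
        "family tree": FAMILY_TREE,
        "improper nesting": "<a><b></a></b>",
        "two root elements": "<a/><b/>",
        "attribute without value": "<a b></a>",
        "case mismatch in tags": "<Name></name>",
        "unescaped special character": "<formula> x < y </formula>",
        "escaped special character": "<formula> x &gt; y </formula>",
    }
    for label, text in samples.items():
        print(label, "->", is_well_formed(text))
    print(child_names(FAMILY_TREE))
```

Each rejected sample breaks exactly one rule from the well-formedness list, so changing a sample and rerunning the script shows which rule the parser enforces. Verifying a document against a DTD or schema would need a validating parser, which this sketch does not cover.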