Chapter 1 Introduction to HTML and XHTML

XHTML & CSS I
Presented by Thomas Powell
Slides adapted from
HTML & XHTML: The Complete Reference, 4th Edition
©2003 Thomas A. Powell
Markup and (X)HTML
• What is a markup language?
• Essay markup
• Special symbols that indicate what to do or how to present
• Older word processing and typesetting (WordStar, .troff, etc.)
• HTML and XHTML are the not-so-behind-the-scenes markup
languages that are used to tell Web browsers (“user agents”)
how to structure and, some may say, display Web pages.
– HTML – Hypertext Markup Language
– XHTML – Extensible Hypertext Markup Language
• The simple difference: XHTML is a stricter version of HTML based
upon the rules of XML. We’ll see more later.
Markup Quickstart
• HTML document is a structured text document composed of
elements, entities and text fragments
<b>This is important text! &copy; 2002</b>
• Markup elements are made up of a start tag (e.g. <strong>) and
might include an end tag that contains a closing slash character
(e.g. </strong>).
• The browser applies the meaning of the element to the enclosed
content.
• Under traditional HTML some elements are empty—they enclose no
content and thus they have no close tag (e.g. <hr>). In XHTML all
tags close so we use <hr></hr> or more appropriately <hr />.
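The three building blocks named above (elements, entities and text fragments) can be made visible by tokenizing the sample line. The following is only a minimal sketch using Python's standard html.parser module; the names TokenLister and tokenize are illustrative and not part of the original slides, and the sample assumes the copyright sign is written as the &copy; entity.

```python
from html.parser import HTMLParser


class TokenLister(HTMLParser):
    """Collect start tags, end tags, text fragments and entity references."""

    def __init__(self):
        # convert_charrefs=False keeps entities as separate tokens
        # instead of silently converting them to characters.
        super().__init__(convert_charrefs=False)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        self.tokens.append(("text", data))

    def handle_entityref(self, name):
        self.tokens.append(("entity", name))


def tokenize(markup):
    """Return the list of tokens found in a markup string."""
    parser = TokenLister()
    parser.feed(markup)
    parser.close()
    return parser.tokens


for token in tokenize("<b>This is important text! &copy; 2002</b><hr>"):
    print(token)
```

Note that the empty <hr> element produces a start token but no end token, which is exactly the traditional HTML behavior described above.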
Markup Quickstart Contd.
• The start tag of an HTML element may contain attributes that modify
the meaning of the tag.
• In traditional HTML some attributes affected a tag simply by their
existence, e.g. <hr noshade>
• Under XHTML attribute values are always required, so
<hr noshade="noshade" /> would be the correct XHTML syntax.
• However, even under traditional HTML most attributes have a value,
e.g. <p align="center">
• Attribute values should always be quoted with either single or
double quotes.
– Style-wise, double quotes tend to be more common
– Traditional HTML allowed quotes to be omitted on ordinal
attribute values, so <p align=center> was allowed.
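The attribute forms above can be compared by asking a lenient HTML parser what it sees in each start tag. This is a rough sketch using Python's standard html.parser module, which is not a browser and only approximates browser behavior; FirstTagAttributes and attributes_of are hypothetical helper names introduced here for illustration.

```python
from html.parser import HTMLParser


class FirstTagAttributes(HTMLParser):
    """Remember the attribute list of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def attributes_of(markup):
    """Return the (name, value) pairs of the first start tag in markup."""
    parser = FirstTagAttributes()
    parser.feed(markup)
    parser.close()
    return parser.attrs


# A minimized attribute has no value at all (reported as None).
print(attributes_of("<hr noshade>"))
# The XHTML form always carries an explicit, quoted value.
print(attributes_of('<hr noshade="noshade" />'))
# Quoted and unquoted values yield the same value in this parser.
print(attributes_of('<p align="center">'))
print(attributes_of("<p align=center>"))
```

The lenient parser accepts every form, which is why quoting consistently is a matter of discipline under traditional HTML but a hard requirement under XHTML.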
HTML 4 Transitional Full Example
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>First HTML Example</title>
</head>
<body>
<h1>Welcome to the World of HTML</h1>
<hr>
<p>HTML <b>really</b> isn't so hard!</p>
<p>You can put in lots of text if you want to. In
fact, you could keep on typing and make up more
sentences and continue on and on.</p>
</body>
</html>
XHTML 1.0 Transitional Full Example
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en">
<head>
<title>First XHTML Example</title>
</head>
<body>
<h1>Welcome to the World of XHTML</h1>
<hr />
<p>XHTML <b>really</b> isn't so hard!</p>
<p>You can put in lots of text if you want to. In
fact, you could keep on typing and make up more
sentences and continue on and on.</p>
</body>
</html>
Example Overview
• The preceding examples use some of the most common elements found in (X)HTML
documents:
– The <!DOCTYPE> statement indicates the particular version of HTML or XHTML
being used in the document. In the first example, the transitional 4.01
specification was used, while in the second the transitional XHTML 1.0
specification was employed.
– The <html>, <head>, and <body> tag pairs are used to specify the general
structure of the document. Notice that under XHTML you need to have a little
more information about the language you are using.
– The <title> and </title> tag pair specifies the title of the document that generally
appears in the title bar of the Web browser.
– The <h1> and </h1> header tag pair creates a headline indicating some
important information.
– The <hr /> tag, which has no end tag making its syntax different in XHTML,
inserts a horizontal rule, or bar, across the screen.
– The <p> and </p> paragraph tag pair indicates a paragraph of text.
Your First Example
• Notice that (X)HTML files are just text files, so you can
type them in using Notepad, SimpleText, etc.
• Type in the previous
example and save the file as
first.html or first.htm
• Open the file with a browser using
“Open file” and the document
will display in the browser
window.
Your First Example Contd.
• If you make a mistake, correct the file, save it,
and reload the document either using
“Open file” or pressing the reload
button in the browser.
• Make sure you do not save the file
as first.txt or use a format like .doc,
otherwise the browser may display the raw
markup instead of rendering the page.
• Also be aware that the browser will
cache pages!
Do the example yourself!
Example Wrap-up
• From the previous example you might surmise that learning
(X)HTML is merely a matter of learning the multitude of markup
tags, such as <b>, <i>, <p>, and so on, that specify the format
and/or structure of documents to browsers.
– This is partially true, but just as knowing how Microsoft Word
commands work does not make one a writer, knowing the tags alone
does not make one a Web author.
• It should be obvious from the preceding example that creating
(X)HTML in such a manual fashion is not appropriate.
– We’ll study tools to produce markup in a bit, but regardless of the
tool being used to create a page we should know how markup
works.
(X)HTML: A Structured Language
• HTML has a very well-defined syntax and all HTML documents should
follow a formal structure.
• The World Wide Web Consortium (www.w3.org) defines the HTML and
XHTML standards.
• HTML was defined as an application of the Standard Generalized Markup
Language (SGML)
– SGML is a language to define other languages – a meta or grammar
language if you like
– In SGML you define a Document Type Definition or DTD which
represents the grammar or rules of the language being defined.
• In 1999 the definition of HTML was rewritten using XML (Extensible Markup
Language) and renamed XHTML.
– In XML you also may use a DTD but an emerging grammar form called
a schema can also be used.
(X)HTML: A Structured Language Contd.
• Looking at the XHTML specification (http://www.w3.org/TR/xhtml1/) you
see the DTD defines the grammar of the language. Here is a small
excerpt:
<!ELEMENT html (head, body)>
<!ATTLIST html
%i18n;
xmlns %URI; #FIXED
'http://www.w3.org/1999/xhtml' >
• In this fragment we see the definition of the root element html, which
encloses a head element followed by a body element. The html
element has an xmlns attribute as well as something called %i18n, which
is just a macro that expands to some more attributes such as lang and
dir which specify aspects of the language in use.
• Reading the DTD we can define the structure of an HTML or XHTML
document as shown on the next few slides.
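The rule <!ELEMENT html (head, body)> can be read as "the root element html contains exactly a head followed by a body". As a rough illustration only, the sketch below checks that one rule with Python's standard xml.etree.ElementTree module. It is not a DTD validator; the function name matches_html_content_model and the two sample documents are assumptions made for this example, and the DOCTYPE line is left out for brevity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def qualified(name):
    """Build the namespace-qualified tag name ElementTree reports."""
    return "{%s}%s" % (XHTML_NS, name)


def matches_html_content_model(document):
    """Check one DTD rule: html must contain head, then body, and nothing else."""
    root = ET.fromstring(document)
    child_tags = [child.tag for child in root]
    return (root.tag == qualified("html")
            and child_tags == [qualified("head"), qualified("body")])


good = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
        '<head><title>First XHTML Example</title></head>'
        '<body><p>Hello</p></body></html>')

# Same elements, but body comes before head, which the rule forbids.
bad = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
       '<body><p>Hello</p></body>'
       '<head><title>Wrong order</title></head></html>')

print(matches_html_content_model(good))
print(matches_html_content_model(bad))
```

A real validator applies every rule in the DTD in this way, which is what makes the structure on the following slides checkable by software.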
HTML Structure
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Document Title Goes Here</title>
...Head information describing the document and providing
supplementary information goes here....
</head>
<body>
...Document content and markup go here....
</body>
</html>
XHTML Structure
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>Document Title Goes Here</title>
...Head information describing the document and providing
supplementary information goes here....
</head>
<body>
...Document content and markup go here....
</body>
</html>
Document Types
• All documents begin with a <!DOCTYPE> declaration.
– In the basic sense it identifies the HTML “dialect” used in a document by
referencing an external DTD.
– A DTD defines the actual elements, attributes, and element
relationships that are valid in documents.
• Modern browsers are aware of the <!DOCTYPE> and will examine it
to determine what rendering mode to enter (standards vs. quirks).
– This process is often dubbed the “doctype switch”
• Using the <!DOCTYPE> declaration allows validation software to
identify the DTD being followed in a document, and verify that the
document is syntactically correct—in other words, that all tags used
are part of a particular specification and are being used correctly.
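Both browsers and validators start by reading the identifiers inside the <!DOCTYPE> declaration. The sketch below shows one way to pull out the public identifier and the optional DTD URL using Python's standard html.parser module. It is an illustration under stated assumptions: DoctypeFinder and doctype_identifiers are hypothetical names, and no claim is made about how any particular browser decides its rendering mode.

```python
import re
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Remember the first DOCTYPE declaration in a document."""

    def __init__(self):
        super().__init__()
        self.declaration = None

    def handle_decl(self, decl):
        if self.declaration is None and decl.lower().startswith("doctype"):
            self.declaration = decl


def doctype_identifiers(markup):
    """Return (public_id, system_id) from the DOCTYPE, or None if absent.

    system_id is None when the declaration gives no DTD URL.
    """
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    if finder.declaration is None:
        return None
    # The identifiers are the double-quoted strings in the declaration.
    quoted = re.findall(r'"([^"]*)"', finder.declaration)
    public_id = quoted[0] if quoted else None
    system_id = quoted[1] if len(quoted) > 1 else None
    return public_id, system_id


page = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
        '"http://www.w3.org/TR/html4/loose.dtd">\n'
        '<html><head><title>First HTML Example</title></head>'
        '<body></body></html>')
print(doctype_identifiers(page))
```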
Document Types Contd
• A <!DOCTYPE> statement often looks like
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
or
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
or
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
• Notice that the latter two examples are more appropriate because they
provide the actual URL to the DTD in question.
Common HTML Doctypes
HTML Version | <!DOCTYPE> Declaration
2.0 | <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
3.2 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
4.0 Transitional | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
4.0 Frameset | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN" "http://www.w3.org/TR/REC-html40/frameset.dtd">
4.0 Strict | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
4.01 Transitional | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
4.01 Frameset | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
4.01 Strict | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Common XHTML Doctypes
XHTML Version | <!DOCTYPE> Declaration
XHTML 1.0 Transitional | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XHTML 1.0 Strict | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 Frameset | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
XHTML 1.1 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
XHTML 2.0 (still in progress) | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN" "http://www.w3.org/TR/xhtml2/DTD/xhtml2.dtd">
HTML Version Summary
HTML Version | Description
2.0 | Classic HTML dialect supported by browsers such as Mosaic. This form of HTML supports core HTML elements and features such as tables and forms but does not consider any of the browser innovations or advanced features such as style sheets, scripting, or frames.
3.0 | The proposed replacement for HTML 2.0 that was never widely adopted, most likely due to the heavy use of browser-specific markup.
3.2 | A version of HTML finalized by the W3C in early 1997 that standardized most of the HTML features introduced in browsers such as Netscape 3. This version of HTML supports many presentation elements, such as fonts, as well as early support for some scripting features.
4.0 Transitional | The 4.0 transitional form finalized by the W3C in December of 1997 preserves most of the presentation elements of HTML 3.2. It provides a basis for transition to CSS as well as a base set of elements and attributes for multiple language support, accessibility, and scripting.
4.0 Strict | The strict version of HTML 4.0 removes most of the presentation elements from the HTML specification, such as fonts, in favor of using Cascading Style Sheets (CSS) for page formatting.
4.0 Frameset | The frameset specification provides a rigorous syntax for framed documents that was lacking in previous versions of HTML.
4.01 Transitional/Strict/Frameset | A minor update to the 4.0 standard that corrects some of the errors in the original specification.
XHTML Version Summary
XHTML Version | Description
1.0 Transitional | A reformulation of HTML as an XML application. The transitional form preserves many of the basic presentation features of HTML 4.0 transitional but applies the strict syntax rules of XML to HTML.
1.0 Strict | A reformulation of HTML 4.0 strict using XML. This language is rule enforcing and leaves all presentation duties to technologies such as Cascading Style Sheets (CSS).
1.1 | A minor change to XHTML 1.0 that restructures the definition of XHTML 1.0 to modularize it for easy extension. It is not commonly used at the time of this writing and offers minor gains over XHTML 1.0.
2.0 | A new implementation of XHTML circa 2003 that may not provide backward compatibility with XHTML 1.0 and traditional HTML. XHTML 2 will likely remove most or all presentational tags left in HTML and will introduce even more logical ideas to the language.
Given there are numerous versions of HTML and XHTML it is important
to know which browsers support what technologies. A brief overview is given on
the next few slides.
<html> tag
• Looking deeper at the document we see the <html> tag
delimits the beginning and the end of an HTML document.
• Given that <html> is the common ancestor of an HTML
document it is often called the root element, as it is the root of
an inverted tree structure containing the tags and content of a
document.
• The <html> tag, however, directly contains only the <head>
tag, the <body> tag, and potentially the <frameset> tag
instead of the <body> tag.
• Interestingly <html> is not required under standard HTML
<head> and <body>
• The head of a document, delimited by <head>, includes supplementary
information about the document including document title, scripts, styles, meta
information, etc.
• The most important head element is <title>
– <title> is mandatory under even older HTML specifications
– <title> should be the first tag in the <head> under both traditional HTML
and XHTML
– <title> is used for bookmarking, navigation, searching, etc.
– <title> will not render markup, e.g. <title><b>Yow!</b></title>
– The title may however contain entities, e.g. <title>PINT &copy; 2003</title>
• The <body> of a document contains the actual content and appropriate markup
to render the page
• There should be only one head section (<head>) and one body section
(<body>) in a document.
• Under old HTML, both <head> and <body> are actually optional
Within the <body>
• HTML follows a content enclosure model of large structures containing
smaller structures.
– Within the body you have block-level elements which define structural content
blocks like paragraphs (<p>) or headings (<h1>).
– Within block structures we see inline elements like bold (<b>), emphasis (<em>)
and so on as well as straight text content and entities such as &lt; or &#60;
which insert the < symbol.
– Typically block elements create “formatting boxes” and cause returns and inline
elements do not cause returns.
• Further structures like lists (<ul>), images (<img>), scripts (<script>) and
multimedia objects (<object>) are also found in the <body> but may fall
outside the hierarchy you might expect.
• The concept of tags enclosing only certain types of other tags is dubbed the
content model.
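Entities exist because characters such as < would otherwise be read as markup. The short sketch below uses Python's standard html module to show a named entity and a numeric entity decoding to the same character, and plain text being encoded for safe inclusion in a page. The helper names decode_entities and encode_for_markup are illustrative only.

```python
import html


def decode_entities(text):
    """Replace named and numeric entities with the characters they stand for."""
    return html.unescape(text)


def encode_for_markup(text):
    """Escape <, > and & so plain text can be placed inside an element."""
    return html.escape(text, quote=False)


# The named and the numeric form both insert the < symbol.
print(decode_entities("&lt;"), decode_entities("&#60;"))
# Going the other way: make text safe to embed in markup.
print(encode_for_markup("a < b"))
```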
The Rules of (X)HTML
• HTML is not case sensitive, XHTML is
– <b>, <B> are the same under HTML
– <p ALIGN="center"> and <p align="center"> are also the same
– XHTML forces lowercase so always use lowercase even in HTML
• HTML/XHTML attribute values may be case sensitive
– Mostly related to URL values
– <img src="test.gif"> is the same as <img SRC="test.gif"> under HTML
– <img src="test.gif"> may not be the same as <img src="TEST.GIF">
• (X)HTML collapses any run of white space to a single space character
– <b>This is a test</b> renders the same as
<b>This is
a
test</b>
– Under some elements like <pre> or <textarea> whitespace rules may be different
– Lack of whitespace understanding can create visual problems and result in
wasted bandwidth.
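The case and white space rules above can be tried out with a few lines of code. This is a simplified sketch: Python's standard html.parser is a lenient HTML parser rather than a browser, and collapse_whitespace only approximates what a browser does with ordinary text (it ignores special cases such as <pre>). StartTagRecorder, start_tags and collapse_whitespace are hypothetical names used for illustration.

```python
import re
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Record every start tag with its attributes, as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def start_tags(markup):
    """Return a list of (tag, attributes) pairs for the start tags in markup."""
    recorder = StartTagRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.start_tags


def collapse_whitespace(text):
    """Reduce every run of spaces, tabs and newlines to a single space."""
    return re.sub(r"\s+", " ", text)


# Tag and attribute names are folded to lowercase by the HTML parser...
print(start_tags('<P ALIGN="center">'))
# ...but attribute values keep their case, which matters for URLs.
print(start_tags('<img src="TEST.GIF">'))
# Line breaks and repeated spaces collapse to one space.
print(collapse_whitespace("This is\n   a\n test"))
```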
The Rules of (X)HTML Contd.
• (X)HTML elements should be nested not crossed
– Nested = Good: <b><i>This is bold and italic </i></b>
– Crossed = Bad: <b><i>Don’t do this </b></i>
• (X)HTML follows a content model
– Some tags are only allowed in others
– Example, unordered lists (<ul>) should only contain list items (<li>) so
the common markup <ul><p>test</p></ul> is actually illegal.
• Elements should have close tags unless empty
– <p> should have </p> even though under HTML it is optional
– Empty elements should self-close (e.g. <hr />)
• Empty, unused elements may be minimized (collapsed) by the browser
– <p></p><p></p><p></p> is not going to do what you expect
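Because XHTML follows XML syntax, the nesting and closing rules above can be checked mechanically with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper, and wrapping the fragment in a <div> is just a convenience so that fragments with several top-level elements can be parsed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment obeys XML nesting and closing rules."""
    try:
        # Wrap in a single root element so any fragment can be parsed.
        ET.fromstring("<div>%s</div>" % fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<b><i>This is bold and italic </i></b>"))  # nested
print(is_well_formed("<b><i>Don't do this </b></i>"))            # crossed
print(is_well_formed("<hr>"))                                    # unclosed
print(is_well_formed("<hr />"))                                  # self-closed
```

Well-formedness is only the syntax half of the story: <ul><p>test</p></ul> passes this check even though it breaks the content model, which is something only validation against the DTD would catch.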
The Rules of (X)HTML Contd.
• Attributes should be quoted
– Under standard HTML ordinal values did not have to be quoted: <p align=center>
– Under XHTML you always quote: <p align="center">
• Browsers ignore unknown attributes and elements
– <bogus>Look at me!</bogus>
– <p today="Tuesday">What!?</p>
• Despite all these rules you find browsers allow just about anything to
render.
– Beware: “tag soup” HTML, common or not, does not lend itself to
maintenance and is not future-proof!
– With the rise of XHTML we do actually need to know what is going on!
Major Themes
• Logical and Physical markup
– Logical markup says what something means, physical markup
describes how something looks.
– <b> is physical markup and <strong> is logical markup.
– What is <p>, <h1>, <head>, <body>?
• Standards vs. Practice
– Question: How do most people think about HTML?
– Answer: Physically
– Consider a WYSIWYG editor: does it encourage logical markup?
– Consider the value of logical markup but be pragmatic!
– “HTML is the English of the Web” -- poorly spoken but well
understood
Myths about (X)HTML
• (X)HTML is a WYSIWYG design language
• (X)HTML is a programming language
• Traditional HTML is going away soon
• XHTML will take the public by storm
• XHTML is not useful
• Hand coding of (X)HTML is always the way to go
• (X)HTML is all you need to know