Lecture 24 Web Site Structure

When you decide to build a website for your business or personal interest, there are a number of things to consider before you actually start building it. Website planning has several steps:

1. Purpose of Website
The first step of website planning is deciding on the purpose of the website. Determine what you wish to accomplish with it. Taking the time to define the purpose clearly will affect how successfully you reach the goals you set for the project.

2. Determine Target Audience
Ask yourself, "Who is going to be looking at my site?" and "What technologies will my visitors have?" When planning a website you need to assess who the target audience will be, what technologies their systems will have, and what their level of computer experience is before you can decide on your website technologies. Determining your target audience during the planning stage gives you information that can be used as the website is further developed: which website technologies to incorporate, which features you need, and what the target audience is looking for.

3. Website Technical Considerations
Ask yourself, "What technologies do I need?" The technologies you require will depend on the type of website you are building and the audience you have decided to target and accommodate. Have your list of required website technologies ready before you move to the next step, securing hosting.

4. Website Hosting Costs
Website hosting costs are influenced by website planning. When planning a website, be sure that the web host has room to grow with your site. Are there features included with a slightly more expensive hosting package that you will need in the future?

5. Website Budget
Ask yourself, "What is my budget?" When planning a website, budget can be a determining factor as to what features the website will have.
Seriously assessing what you can do yourself and what you need help with will affect the website budget.

What is a Web Site?

A website, also written as Web site, web site, or simply site, is a set of related web pages served from a single web domain. A website is hosted on at least one web server, accessible via a network such as the Internet or a private local area network through an Internet address known as a Uniform Resource Locator. All publicly accessible websites collectively constitute the World Wide Web.

A webpage is a document, typically written in plain text interspersed with formatting instructions of Hypertext Markup Language (HTML, XHTML). A webpage may incorporate elements from other websites with suitable markup anchors.

Webpages are accessed and transported with the Hypertext Transfer Protocol (HTTP), which may optionally employ encryption (HTTP Secure, HTTPS) to provide security and privacy for the user of the webpage content. The user's application, often a web browser, renders the page content according to its HTML markup instructions onto a display terminal.

The pages of a website can usually be accessed from a simple Uniform Resource Locator (URL) called the web address. The URLs of the pages organize them into a hierarchy, but it is the hyperlinking between pages that conveys the site structure as the reader perceives it and guides the reader's navigation of the site. A site generally includes a home page with most of the links to the site's web content, and supplementary about, contact, and link pages.

Some websites require a subscription to access some or all of their content. Examples of subscription websites include many business sites, parts of news websites, academic journal websites, gaming websites, file-sharing websites, message boards, web-based email, social networking websites, websites providing real-time stock market data, and websites providing various other services (e.g., websites offering storing and/or sharing of images, files and so forth).
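To make the idea of a URL hierarchy concrete, the following minimal Python sketch splits a web address into its scheme, host, and path segments using the standard library. The function name describe_url and the address www.example.com are assumptions chosen for illustration; they do not come from the lecture.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a web address into the parts discussed above (illustrative helper)."""
    parts = urlsplit(url)
    # Each non-empty path segment is one step down the site's URL hierarchy.
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "scheme": parts.scheme,                 # "http" or "https"
        "encrypted": parts.scheme == "https",   # HTTPS adds encryption to HTTP
        "host": parts.hostname,                 # the domain serving the site
        "hierarchy": segments,                  # position of the page in the site
    }


# Hypothetical address used only as an example.
info = describe_url("https://www.example.com/products/widgets/index.html")
print(info)
```

The path segments show only where a page is stored. The links between pages, not the path, decide how a reader actually moves through the site.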
Types of Web Sites & Documents

Static Web Pages

Static web pages don’t change content or layout with every request to the web server. They change only when a web author manually updates them with a text editor or web editing tool like Adobe Dreamweaver. The vast majority of web sites use static pages, and the technique is highly cost-effective for publishing web information that doesn’t change substantially over months or even years. Many web content management systems (CMS) also use static publishing to deliver web content. In the CMS the pages are created and modified in a dynamic database-driven web-editing interface but are then written out to the web server (“published”) as ordinary static pages. Static pages are simple, secure, less prone to technology errors and breakdown, and easily visible to search engines.

Dynamic Web Pages

Dynamic web pages can adapt their content or appearance depending on the user’s interactions, changes in data supplied by an application, or as an evolution over time, as on a news web site. Using client-side scripting techniques (XML, Ajax techniques, Flash ActionScript), content can be changed quickly on the user’s computer without new page requests to the web server. Most dynamic web content, however, is assembled on the web server using server-side scripting languages (ASP, JSP, Perl, PHP, Python). Both client- and server-side approaches are used in multifaceted web sites with constantly changing content and complex interactive features.

Dynamic web pages offer enormous flexibility, but the process of delivering a uniquely assembled mix of content with every page request requires a rapid, high-end web server, and even the most capable server can bog down under many requests for dynamic web pages in a short time. Unless they are carefully optimized, dynamic web content delivery systems are often much less visible to search engines than static pages.
Always ask about search visibility when considering the merits of a dynamic web content system.

Web Content Management

Enterprise Web Content Management Systems

Web content management systems enable large numbers of nontechnical content contributors to update and create new web pages with ease within the context of large, enterprise-wide web sites that may contain thousands or even millions of pages of content. These systems offer some variation on these three core features:

  Editorial workflow, an approval process, and access management for individual web authors
  Site management of pages, directories, content contributor accounts, and general system operations
  An interactive user interface, usually browser-based, that doesn’t require technical knowledge of the web, HTML, or CSS to create web content

In a typical CMS-driven web site, the web editing workflow is as follows:

1. A domain expert, local department staffer, or writer adds, updates, or otherwise modifies the content of a page, using a web browser to access the CMS features and perform editing and site management functions.
2. The finished content is routed by a series of notifications to the designated approver for content in that area of the larger web site.
3. The approver reviews the new content and either releases it for publication or sends it back for revision.
4. The CMS assembles the approved content for publication. On larger web sites, the content is typically published to a “live” server on the Internet at specified intervals during the day. Most CMS products can also handle instant site updates if needed.

The text, graphic, and site management tools in a CMS are designed to allow users with little or no knowledge of HTML or CSS to create and manage sophisticated web content.
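The four-step editing workflow can be sketched as a small state machine. This is only an illustration under assumed names (the Page class, its methods, and the state labels are hypothetical and do not reflect the API of any particular CMS product): a page moves from draft to review, is either approved or sent back, and is published only after approval.

```python
from dataclasses import dataclass, field

# Hypothetical state labels for the editorial workflow.
DRAFT, IN_REVIEW, APPROVED, PUBLISHED = "draft", "in_review", "approved", "published"


@dataclass
class Page:
    title: str
    state: str = DRAFT
    history: list = field(default_factory=list)

    def _move(self, allowed_from, new_state):
        # Refuse any transition the workflow does not allow.
        if self.state not in allowed_from:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state

    def submit(self):      # step 2: author routes content to the approver
        self._move({DRAFT}, IN_REVIEW)

    def approve(self):     # step 3: approver releases the content
        self._move({IN_REVIEW}, APPROVED)

    def send_back(self):   # step 3: approver returns it for revision
        self._move({IN_REVIEW}, DRAFT)

    def publish(self):     # step 4: approved content goes to the live server
        self._move({APPROVED}, PUBLISHED)


page = Page("Admissions deadlines")
page.submit()
page.send_back()   # first review asks for changes
page.submit()
page.approve()
page.publish()
print(page.state, len(page.history))
```

Keeping the allowed transitions in one place makes it impossible for unreviewed content to reach the published state, which is the point of the approval step.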
Most large corporate, enterprise, and university sites are now managed with a CMS in a decentralized editorial environment where hundreds of individual authors, content approvers, editors, and media contributors create most of the content for the enterprise’s sites.

Most enterprise CMS products use a database to store web content. Text and media files (graphics, photos, podcasts, videos) are often stored as XML to facilitate reuse and enable flexible presentation options, permitting content to be updated simultaneously on a variety of web pages. CMS products use templates to provide a consistent user interface, enterprise identity branding, and typographic presentation throughout the site. CMS templates increasingly are complex XSLT (Extensible Stylesheet Language Transformations) files that modify and transform XML content into web pages for viewing in conventional web browsers, in special formats for visually impaired readers, on mobile devices like cell phones, and in convenient print formats.

Blogs

Owing to their ease of use and the ready availability of supporting software, web logs, or blogs, are the most popular, inexpensive, and widespread form of web content management. Blog software such as Blogger, Roller, or WordPress allows nontechnical users to combine text, graphics, and digital media files easily into interactive web pages. A blog is actually a simple CMS, typically designed to support three core features:

  Easy publication of text, graphics, and multimedia content on the web
  Built-in tools that enable blog readers to post comments (an optional feature)
  Built-in RSS features that allow subscribers to see when a blog site has been updated

The typical blog content genre is an online diary of life events (personal blogs) or short commentary on a particular subject (politics, technology, specialized topics), but blog software can easily be adapted to support collaborative work within social groups or internal and external enterprise communications.
For example, many universities have adopted blog software as a simple CMS that allows nontechnical faculty and administrators to quickly post notices, emergency announcements, and other timely material.

For a small (ten-to-twenty-page), special-purpose, small business, or department web site, a blog-based site may be all you need to get up and running quickly with a set of friendly, nontechnical editing tools and (usually) such built-in features as calendars, automated category and navigation controls, and automatic RSS feeds. If the blog metaphor of posted-content-plus-reader-comments doesn’t suit your purpose, turn off the comments features and you have a friendly web site development and editing tool plus a lightweight CMS in one inexpensive package.

Wikis

A wiki is a specialized form of content-managed web site designed to support the easy collaborative creation of web pages by groups of users. Wikis differ from blogs and other CMS options in that wikis allow all users to change the content of the wiki pages, not just to post comments about the content. Wikis such as the well-known Wikipedia online encyclopedia can be publicly accessible and edited by any user, but wiki software can also be used to support more private collaboration projects, where only members of the group can see and edit the wiki content. Popular wiki tools like PBwiki, MediaWiki (used by Wikipedia), and JotSpot offer search, browsing, and editing features, as well as account management and security features to limit access to selected users.

In wikis the changes to content are typically visible instantly after changes are made, and the workflow model is “open,” without a formal approval process for new content changes and additions.
This open model allows fast progress and updates by many contributors, but may not be suitable for projects that handle sensitive or controversial material that is visible to the reading public on your enterprise intranet or the larger World Wide Web audience.

RSS

Really Simple Syndication (RSS) is a way to generate a set of “headlines” and web links that can appear in many places at once on the Internet or your local enterprise intranet. RSS is a family of XML-based feed formats that can automatically provide an updated set of headlines, web links, or short content snippets to many forms of Internet media. RSS can be read by a variety of display software, including many email programs, major web browsers (Firefox, Internet Explorer, Opera, Safari), specialized RSS aggregator software like Surfpack or FeedDemon, and web portal sites such as iGoogle, MyYahoo!, and other customizable corporate and Internet portals.

Most blog software can generate RSS feeds to notify users of updated content, and there are many special-purpose RSS feed authoring programs on the market. Once the RSS feed file is created by a blog or generated by desktop RSS software and placed on a web server, the feed can be addressed with a conventional URL (uniform resource locator) just like a web page (http://whatever-site.com/my-rss-feed.xml). Every time you update the RSS feed file your users see the new headlines in their email, web browser, or portal page.

What is a Domain Name?

A domain name is an identification string that defines a realm of administrative autonomy, authority, or control on the Internet. Domain names are formed by the rules and procedures of the Domain Name System (DNS). Technically, any name registered in the DNS is a domain name. Domain names are used in various networking contexts and for application-specific naming and addressing purposes.
In general, a domain name represents an Internet Protocol (IP) resource, such as a personal computer used to access the Internet, a server computer hosting a web site, the web site itself, or any other service communicated via the Internet.

Domain names are organized in subordinate levels (subdomains) of the DNS root domain, which is nameless. The first-level set of domain names are the top-level domains (TLDs), including the generic top-level domains (gTLDs), such as the prominent domains com, info, net and org, and the country code top-level domains (ccTLDs). Below these top-level domains in the DNS hierarchy are the second-level and third-level domain names that are typically open for reservation by end-users who wish to connect local area networks to the Internet, create other publicly accessible Internet resources or run web sites. The registration of these domain names is usually administered by domain name registrars who sell their services to the public.

A fully qualified domain name (FQDN) is a domain name that is completely specified in the hierarchy of the DNS, having no omitted parts. Domain names are usually written in lowercase, although labels in the Domain Name System are case-insensitive.

Top Level Domain

The top-level domains such as .com, .net, and .org are the highest level of domain names of the Internet. A top-level domain is also called a TLD. Top-level domains form the DNS root zone of the hierarchical Domain Name System. Every domain name ends in a top-level or first-level domain label.

Second Level Domain

Below the top-level domains in the domain name hierarchy are the second-level domain (SLD) names. These are the names directly to the left of .com, .net, and the other top-level domains. As an example, in the domain example.co.uk, co is the second-level domain. Next are third-level domains, which are written immediately to the left of a second-level domain.
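The level numbering can be illustrated with a short Python sketch that splits a name into its labels and lists them from the top level down. The function name domain_levels and the sample host www.Example.co.uk are assumptions made for this example. Labels are lowercased because DNS labels are case-insensitive, and a trailing dot, which some tools use to mark a fully qualified name, is dropped.

```python
def domain_levels(name):
    """Return the labels of a domain name, top-level label first (illustrative helper)."""
    # Drop an optional trailing dot and normalise case before splitting.
    labels = name.rstrip(".").lower().split(".")
    # The rightmost label is the top-level domain, so reverse the order.
    return list(reversed(labels))


# Hypothetical host name used only as an example.
levels = domain_levels("www.Example.co.uk")
for depth, label in enumerate(levels, start=1):
    print(f"level {depth}: {label}")
```

Reading the name from right to left in this way gives uk as the top-level domain, co as the second-level domain, and example as the third-level domain.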
There can be fourth- and fifth-level domains, and so on, with virtually no limitation.

What is HTML?

HyperText Markup Language (HTML) is the main markup language for creating web pages and other information that can be displayed in a web browser.

An HTML file is, first of all, a file saved with a recognizable extension. Two extensions are valid: htm, the traditional form on Microsoft Windows systems that limited extensions to three characters, and html. Either extension makes it a valid HTML file. If the file you are working on has already been saved, it should have a valid extension already. Otherwise, after creating a text file, make sure you provide a valid extension when you save it as HTML.

An HTML document is an ASCII text file that contains embedded HTML tags. On a UNIX server, it typically has a filename extension of .html. In general, the HTML tags are used to identify the structure of the document and to identify hyperlinks (to be highlighted) and their associated URLs.

HTML identifies the structure of the document and suggests the layout of the document. The display capabilities of the Web browser determine the appearance of the HTML document on the screen.

Using HTML you can identify:

  The title of the document
  The hierarchical structure of the document with header levels and section names
  Bulleted, numbered, and nested lists
  Insertion points for graphics
  Special emphasis for key words or phrases
  Preformatted areas of the document
  Hyperlinks and associated URLs

HTML cannot control the:

  Typeface used for any document component
  Point size of any specific font
  Width or height of the screen
  Centering, spacing, or line breaks of information, except in preformatted text
  Background, foreground, or highlight colors

These things all depend on the browser, which may allow the user to control them.

Autoflowing and Autowrapping

The most basic element in the HTML document is the paragraph.
The Web browser flows all the contents of the paragraph together from left to right and from top to bottom given the current window or display size. This is called autoflowing. How you break lines in that paragraph in the HTML is irrelevant when that page is displayed by a Web browser.

The Web browser wraps anything that doesn't fit on the current line, putting it on the next line. For example, a paragraph that displays six lines long in an 8-inch wide window rewraps to be about 12 lines long if the user resizes the Web browser window to be half as wide. This is called autowrapping.

Your document will be read by both graphical and character-based Web browsers. Furthermore, there will be display differences among graphical Web browsers given different screen resolutions. So just because one browser breaks a line at one place, that doesn't mean others will do so at the same place. Just remember that on the Web, you live in a world that is left-justified and flows from top to bottom.

HTML Tag Syntax

When writing HTML, you add "tags" to the text in order to create the structure. These tags tell the browser how to display the text or graphics in the document. HTML tags are enclosed within less-than (<) and greater-than (>) brackets.

Some of the tags are single-element tags that can stand by themselves. These are referred to as standalone tags. The syntax is simple:

    <tag>

A common standalone tag is <BR>, which forces a line break. <P>, which starts a new paragraph, is also frequently written on its own because its ending tag </P> is optional.

Other tags are used in pairs. The beginning tag tells the Web browser to start the tag function and the ending tag tells the Web browser to stop. The ending tag is created by adding a forward slash (/) to the beginning tag. The syntax is:

    <tag>object</tag>

The tag identifies the function that is being applied to the object.
For example, if you wanted to add special emphasis to a phrase, you would enclose the phrase in the <EM> tagging pair as illustrated:

    <EM>text to emphasize</EM>

Many of the standalone tags and the beginning tag of tagging pairs can have options included. So to be complete the syntax is:

    <tag option1 option2 option3>

Document Construction Guidelines

Now let's look at the three tagging pairs used to create the highest level of structure in an HTML document:

    <HTML> entire HTML document </HTML>
    <HEAD> document header information </HEAD>
    <BODY> body of the HTML document </BODY>

The following is a skeletal HTML document that shows the required nesting of these three tagging pairs:

    <HTML>
    <HEAD>
    Head elements
    </HEAD>
    <BODY>
    Body elements and content
    </BODY>
    </HTML>

The Header

The HTML header contains several notable items, which include:

1. doctype - Describes the type of HTML document this is. Strictly speaking, the doctype declaration is written before the <HTML> tag rather than inside the head.
2. meta name="description" - Gives a description of the page for search engines.
3. meta name="keywords" - Sets keywords which search engines may use to find your page.
4. title - Defines the name of your document for your browser.

Elements in the Header

Elements allowed in the HTML 4.0 strict HEAD element are:

  BASE - Defines the base location for resources in the current HTML document. Supports the TARGET attribute in frame and transitional document type definitions.
  LINK - Used to set relationships of other documents with this document.
  META - Used to set specific characteristics of the web page and provide information to readers and search engines.
  SCRIPT - Used to embed script in the header of an HTML document.
  STYLE - Used to embed a style sheet in the HTML document.
  TITLE - Sets the document title.

HTML BODY

The HTML body element defines the rest of the HTML page, which is the bulk of your document. It will include headers, paragraphs, lists, tables, and more.
An example body section:

    <body text="#000000" bgcolor="#FFFFFF" link="#0000FF" vlink="#000080" alink="#FF0000">
    <h1 style="text-align: center">HTML Document Structure</h1>
    <p> This is a sample HTML file. </p>
    </body>
    </html>

This example controls the text color, background color, and link colors directly rather than using style sheets.

The BODY Element Tags and Attributes

The <body> tag is used to start the BODY element and the </body> tag ends it. The BODY element holds the displayed content of the page, which may be divided into one or more sections. Its tags and attributes are:

  <body> - Designates the start of the body.
  ONLOAD - Used to specify the name of a script to run when the document is loaded.
  ONUNLOAD - Used to specify the name of a script to run when the document is unloaded.
  BACKGROUND="clouds.gif" - (Deprecated) Defines the name of a file to use as the background image for the page. The background can instead be given as a color, as in the following line.
  BGCOLOR="white" - (Deprecated) Designates the page background color.
  TEXT="black" - (Deprecated) Designates the color of the page's text.
  LINK="blue" - (Deprecated) Designates the color of links that have not been visited.
  ALINK="red" - (Deprecated) Designates the color of a link while it is being activated (clicked).
  VLINK="green" - (Deprecated) Designates the color of visited links.
  </body> - Designates the end of the body.
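Because the color attributes on <body> are deprecated in favor of style sheets, it can help to see how each one maps to a CSS rule. The sketch below uses Python's standard html.parser module to read the attributes from a body tag like the sample above and print roughly equivalent CSS. The class name BodyAttributeReader and the mapping table are assumptions made for illustration; the BACKGROUND image attribute is left out to keep the example short.

```python
from html.parser import HTMLParser

# Deprecated BODY attribute -> (CSS selector, CSS property). Illustrative mapping.
CSS_EQUIVALENTS = {
    "bgcolor": ("body", "background-color"),
    "text": ("body", "color"),
    "link": ("a:link", "color"),
    "vlink": ("a:visited", "color"),
    "alink": ("a:active", "color"),
}


class BodyAttributeReader(HTMLParser):
    """Collect CSS rules matching the deprecated attributes of a <body> tag."""

    def __init__(self):
        super().__init__()
        self.rules = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "body":
            return
        for name, value in attrs:
            if name in CSS_EQUIVALENTS:
                selector, prop = CSS_EQUIVALENTS[name]
                self.rules.append(f"{selector} {{ {prop}: {value}; }}")


sample = (
    '<body text="#000000" bgcolor="#FFFFFF" link="#0000FF" '
    'vlink="#000080" alink="#FF0000">'
    "<p>This is a sample HTML file.</p></body>"
)
reader = BodyAttributeReader()
reader.feed(sample)
print("\n".join(reader.rules))
```

Rules of this kind would normally be placed in a STYLE element in the document head or in a separate style sheet file, leaving the <body> tag free of presentation attributes.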