Web Publishing Systems

Web Publishing Architecture

Look at the various components of Web
publishing, many of which are common to
most Web applications.

HTML Document Publishing

CGI Scripting Applications

Content Management Systems
The Web Browser is…
A program available everywhere.
 A generalized information interface.
 A client that connects to distributed
servers.
 A single point of control over the
Web fought over by Microsoft and
Netscape.

The Web, Circa 1993
Key Challenges Were on the
Client

How to present information in a Web
browser.
Developed by Pei Wei in 1992, Viola was an application
toolkit built on top of the X Window System. Its WWW
browser was a sample application, integrating styled
text and graphics.

In this example, the Viola browser embedded another
application and its controls.
World Wide Web Wizards
Workshop (July 1993)
Early attempt to forge a common development agenda.
Tension between slow-moving standards development
and seat-of-the-pants innovation.

HTML

Hypertext Markup Language
A simple SGML vocabulary or tagset
 Control content and layout of
presentation.
 Human readable data format.

The Web, Circa 1995

Publication Models
Key Challenges Were on the Server
• Publishing Becomes a Server-side Application
• Apache, mod_perl and Perl.
• Didn’t Much Depend On Client-Side Capabilities
• Development of Custom Content Management Systems
• Manage the publishing process
The Web Server…
HyperText Transfer Protocol
(HTTP)



HTTP is a Request/Response Protocol
"HTTP is a protocol with the lightness and
speed necessary for a distributed collaborative
hypermedia information system." Tim Berners-Lee, 1992, Basic HTTP
Achieves a loose coupling of client and servers.

References: HTTP 1.1 Spec
Anatomy of a Request

Browser locates server (oreilly.com) and
makes a connection to port number 80 (in
a typical configuration) on that machine.
Full Request
GET /index.html HTTP/1.1
Host: localhost
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/xbm, */*
Accept-Language: en
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)
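
To make the structure of the request concrete, here is a minimal Python sketch that splits a raw request into its request line and headers. The function name parse_request is hypothetical, and the sketch ignores details a real server must handle (message bodies, malformed input, repeated headers).

```python
def parse_request(raw: str):
    """Split a raw HTTP request into (method, path, version, headers)."""
    lines = raw.strip().splitlines()
    # The first line is the request line: METHOD PATH VERSION
    method, path, version = lines[0].split()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Accept-Language: en\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)

method, path, version, headers = parse_request(REQUEST)
print(method, path, version, headers["host"])
```

Header names are lowercased here because HTTP header names are case-insensitive.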
Server

Returns status of request.
HTTP/1.1 200 OK
Sends header info followed by a blank line.
Content-type: text/html
Content-length: 3896


Sends document or data from a CGI program.
Objects embedded in document such as
images generate new requests to the server.
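
The server side of the exchange can be sketched the same way: a status line, headers, a blank line, then the document. This is an illustrative Python sketch rather than how Apache does it; build_response is a hypothetical helper.

```python
def build_response(body: str, status: str = "200 OK",
                   content_type: str = "text/html") -> bytes:
    """Assemble a minimal HTTP/1.1 response as bytes."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"              # status of the request
        f"Content-Type: {content_type}\r\n"   # header info
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"                                # blank line ends the headers
    )
    return head.encode("ascii") + payload


response = build_response("<html><body>Hello</body></html>")
print(response.decode("utf-8"))
```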
The Apache Web Server

The Apache Group, an Open Source
software project, has developed the
leading Web server, which runs over 50%
of all servers.

Web servers are fairly stable technology.

Reference: Apache.org, Netcraft survey

Apache: The Definitive Guide
Apache Directories

Have you set up a Web server?
/usr/local/apache is the Unix/Linux
installation directory.
/htdocs is the directory for HTML files.

/cgi-bin is for scripts.

/conf is the configuration directory where the
file httpd.conf lives.
Configuring a Web Server

Site administrator usually takes care of
the following server configuration issues
by editing httpd.conf:

Document and content type mapping

Authentication and Access Control

Logging

Virtual Servers
URL Management

Decision about URLs:

Relative vs. Absolute links on the site.

Permanent addressing vs. current addressing
/98/09/21/document.html
today.html

What are you going to do when things
change?
• URLs can be brittle.
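
One way to reconcile the two addressing styles above is to publish each document at a permanent, date-based path and treat the "current" address as an alias that is updated over time. The following Python sketch assumes that approach; permanent_path, ALIASES and resolve are hypothetical names, not part of any server.

```python
from datetime import date


def permanent_path(published: date, filename: str) -> str:
    """Build a date-based permanent address such as /98/09/21/document.html."""
    return "/" + published.strftime("%y/%m/%d") + "/" + filename


# The "current" address points at whichever document is today's.
ALIASES = {
    "/today.html": permanent_path(date(1998, 9, 21), "document.html"),
}


def resolve(path: str) -> str:
    """Map a current address to its permanent one; other paths pass through."""
    return ALIASES.get(path, path)


print(resolve("/today.html"))
```

Links that must keep working should use the permanent form; only the alias table changes when the site does.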
Authentication



Authentication is asking a user to provide
identification, usually a user name and password.
Basic Authentication uses the .htaccess file.
More sophisticated applications will manage
this information in a user database.
Apache section
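
For reference, with HTTP Basic Authentication the browser sends the user name and password joined by a colon and base64-encoded in an Authorization header. The Python sketch below shows that encoding and a naive check against a user table; the function names and the USERS dictionary are hypothetical, and a real system would store password hashes rather than plain text.

```python
import base64


def basic_auth_value(user: str, password: str) -> str:
    """Return the value a client sends in the Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def parse_basic_auth(value: str):
    """Decode an Authorization header value into (user, password)."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        return None
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


# Hypothetical user table, for illustration only (plain text is not safe).
USERS = {"Aladdin": "open sesame"}


def is_authorized(value: str) -> bool:
    creds = parse_basic_auth(value)
    return creds is not None and USERS.get(creds[0]) == creds[1]


print(basic_auth_value("Aladdin", "open sesame"))
```

Because base64 is an encoding and not encryption, Basic Authentication only protects credentials when the connection itself is encrypted.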
Logs
Found in logs directory: access.log
Log entry tells you:
IP number – Date/Time – Request

152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087
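
As an illustration of log parsing, the Python sketch below pulls the fields out of the entry shown above with a regular expression. It assumes the common log format used in that entry; parse_log_line and the field names are hypothetical choices.

```python
import re

# host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


LINE = ('152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] '
        '"GET / HTTP/1.0" 200 8087')
print(parse_log_line(LINE))
```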
Logs Processing

Some of the tasks surrounding logs:
Log rotation (Day, week, month)
 Log compression (files grow large)
 Log file parsing and reporting
 Reverse DNS lookup


References: Lincoln Stein, Yahoo's list of
tools, Marketwave's Hitlist Examples
Server Hardware and OS



Server farms or hosting services are set up to
manage the hardware, the OS and the network
for 24/7 operation.
Properly configured PCs can be powerful
enough to handle sizable load, obviating the
need for more expensive servers from Sun.
Small dedicated Web server devices, such as the
Cobalt server with embedded Linux and Web
administration, are another option.
Web Publishing
HTML Authoring Systems
 Server Side Includes
 CGI Applications
 Templates

Authoring Systems


Debate over whether to show HTML to
authors or hide it from them.
Page Creation Tools

HTML Editors
• Homesite; BBEdit.

Web Site Authoring Systems
• FrontPage; GoLive; NetObjects; Dreamweaver

Market share estimate of authoring tools.
(Security Space)
Server Side Includes


Insert dynamic information such as date
or time.
Include file shared by a set of documents.


One way to create a consistent page layout
across the site.
Example: Use server-side include to put
common information for a page header or
footer in a separate file and source it
from all documents.
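
The header/footer example can be sketched in Python as a simple text substitution. The directive form below, <!--#include virtual="..." -->, is the one Apache's server-side includes use; the expand_includes function and the FRAGMENTS dictionary (standing in for files on disk) are assumptions made for illustration.

```python
import re

INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

# Stand-in for shared files on disk (hypothetical content).
FRAGMENTS = {
    "/header.html": "<h1>My Site</h1>",
    "/footer.html": "<p>Copyright notice</p>",
}


def expand_includes(html: str, fragments: dict) -> str:
    """Replace each include directive with the named fragment."""
    return INCLUDE.sub(lambda m: fragments.get(m.group(1), ""), html)


PAGE = (
    '<!--#include virtual="/header.html" -->\n'
    "<p>Body of this document.</p>\n"
    '<!--#include virtual="/footer.html" -->'
)
print(expand_includes(PAGE, FRAGMENTS))
```

Changing the header fragment once changes it on every page that includes it.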
CGI Applications
Common Gateway Interface
A web server passes control to an
application, which generates a dynamic
HTML document and returns it to the
server.



Forms-based Input and Interaction
Session management
Transactions
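
A rough sketch of the CGI hand-off, written in Python rather than Perl: the server places request details in environment variables such as QUERY_STRING, runs the script, and relays whatever the script prints (headers, a blank line, then HTML). The handle_request function and the LastName field are illustrative assumptions.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle_request(environ) -> str:
    """Build the CGI output for a form submission sent with GET."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("LastName", ["(none)"])[0]
    body = (
        "<html><body><h2>Results</h2>"
        f"<p>You searched for {escape(name)}.</p>"
        "</body></html>"
    )
    # Header, blank line, then the dynamically generated document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by the web server, the environment carries the request.
    sys.stdout.write(handle_request(os.environ))
```

User input is escaped before being placed in the page, since form data cannot be trusted.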
Scripting

Perl became the favored scripting
language for Web applications.

CGI modules in Perl and Python provide a
higher-level interface for the programmer
and hide the low level details.

Script installed in server's cgi-bin directory.

HTML document containing form references
the CGI script.

Sample Perl CGI script
Stateless Transactions

HTTP is a stateless protocol. Each
interaction is independent of the others.

Maintaining state or session tracking is
necessary for a number of applications
such as shopping carts.
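
One common way to layer sessions over a stateless protocol is a cookie carrying a session identifier, with the actual state kept on the server. The Python sketch below assumes that approach; SESSIONS, get_session and the cookie name session_id are hypothetical, and an in-memory dictionary would not survive a server restart.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session store: session id -> state (in memory, for illustration).
SESSIONS = {}


def get_session(cookie_header):
    """Return (session_id, state, set_cookie_header_or_None)."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is not None and morsel.value in SESSIONS:
        # Returning visitor: look up existing state, no new cookie needed.
        return morsel.value, SESSIONS[morsel.value], None
    # New visitor: create state and ask the browser to remember the id.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"cart": []}
    reply = SimpleCookie()
    reply["session_id"] = session_id
    return session_id, SESSIONS[session_id], reply.output()


# First request: no cookie yet.
sid, state, set_cookie = get_session(None)
state["cart"].append("book")
# Second, independent request: the browser sends the cookie back.
_, state_again, _ = get_session(f"session_id={sid}")
print(state_again["cart"])
```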
Application Servers
Web Application Stack
              OS        Web Server   Application Server   DB
Open Source   Linux     Apache       PHP                  MySQL
Sun           Solaris   Apache       JSP                  Oracle
Microsoft     Windows   IIS          ASP                  SQL Server
IBM           Linux     Apache       WebSphere            DB2
Macromedia    Windows   Apache       Cold Fusion          SQL Server
Characteristics
Embed programming code inside of
HTML documents.
 Languages like PHP, Cold Fusion and
ASP can be viewed as extensions to
HTML.
 One consideration is whether there’s
clean separation between code and
documents.
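
To illustrate the separation question, the Python sketch below keeps the page layout in a template and the logic in a function, instead of embedding code in the document. It uses the standard library's string.Template; PAGE and render are hypothetical names.

```python
from html import escape
from string import Template

# The document: markup with named placeholders, no program logic.
PAGE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h2>$title</h2>\n"
    "<p>$body</p></body></html>"
)


def render(title: str, body: str) -> str:
    """The code: fills the placeholders with escaped values."""
    return PAGE.substitute(title=escape(title), body=escape(body))


print(render("New Form", "Search results go here."))
```

A designer can change the template without touching the function, and vice versa.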

Cold Fusion

Cold Fusion from Allaire/Macromedia is a
Windows NT/2000 application.

Server is configured so that files ending in
.cfm are passed to the Cold Fusion
application server.
Cold Fusion and HTML file
<H2>New Form</H2>
<FORM ACTION="searchquery.cfm"
METHOD="Post">
Last Name: <Input Type="text"
Name="LastName">
<Input Type="Submit"
Value="Search">
</FORM>
Application file (.cfm)
<CFQUERY Name="EmployeeList" Datasource="Examples">
Select * From Employees
WHERE LastName = '#LastName#'
</CFQUERY>
<body>
<H2>Results</H2>
<CFOUTPUT>
<P>The search for #Form.LastName# returned the following:
</CFOUTPUT>
<CFOUTPUT QUERY="EmployeeList">
<HR>
#FirstName# #LastName# (Phone: #PhoneNumber#)
<BR>
</CFOUTPUT>
Database Servers

Flat-file database, dbm files

Free
• MySQL and Postgres

Mid-range
• MS Access and SQL Server

Commercial High-end
• Oracle 8i, Sybase, IBM’s DB2
Database Woes

Generating pages dynamically can impact
a site’s performance and administration.

Many applications find ways of generating
static pages and caching them

Should documents be stored in the database?
Databases

The standard application interfaces to the
database are through SQL and/or ODBC.

SQL can be used to create or modify data
records in the database as well as to select
sets of data from it.
SQL Example:

SELECT NAME, ADDR FROM EMPLOYEES
WHERE NAME = 'DALE DOUGHERTY'

Languages such as Perl, Python and Java all provide
fairly standard interfaces for accessing databases.

The earlier Cold Fusion example simply embeds an SQL
statement in an HTML document. The CF
application passes the query to the database server,
which processes the request and returns the data to
the application, which passes it back to the web
server.
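
As an example of such a language interface, the Python sketch below runs the SELECT from the SQL example through the standard sqlite3 module. The in-memory database, the sample address and the find_employee helper are assumptions for illustration. Note that the name is passed as a bound parameter rather than pasted into the SQL text, which avoids the injection risk of building queries directly from form input.

```python
import sqlite3

# In-memory database with made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (NAME TEXT, ADDR TEXT)")
conn.execute(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    ("DALE DOUGHERTY", "101 Example St"),
)


def find_employee(connection, name: str):
    """Select matching rows; the ? placeholder binds the value safely."""
    cursor = connection.execute(
        "SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = ?", (name,)
    )
    return cursor.fetchall()


print(find_employee(conn, "DALE DOUGHERTY"))
```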
Application Server Issues
What degree of technical expertise
is required to build applications?
 How portable is the application? How
much does it tie you to one OS or
Web server or language?
 Is the server API proprietary or
standardized?

Application Service Provider
(ASP)
A Web site is increasingly put
together as a set of components
that could be software or services
sourced from different sites.
ASPs are providers of services rather
than software. They take away the
burden of owning and maintaining
software.

Content Management
A specialized application server
 A system for managing the
production, development and
delivery of content by a team of
producers.

CMS Features






Manages "metadata" to build collections of
documents and create different views.
Generates content from database
Provides for staging of content; replication.
Administrative interface to manage scheduling
and workflow
Manage interactions with customers and keep
track of vital information.
Allow for distribution of information in
multiple formats.
Implementing Layouts in CMS

Which Layout Strategy Will You Use?

Server Side Includes (SSI)

Style sheets (CSS)
• Table layout vs block positioning

Templates

XSLT (transformation of XML into HTML)
CS (Community Server)

Content Management System written
using Apache, Perl, MySQL

Used for O’Reilly Network, XML.com and
Perl.com.

Demo
Other CMS

Vignette
• Expensive, commercial CMS system

ArsDigita
• Java-based platform.

Zope
• Python-based
Advantages of CMS
A cost-effective way to manage
information and users.
 A consistent administrative interface for
building and managing complex Web
sites.
 A robust development platform that
provides common publishing
functionality and allows customization.

Other Major Components

Advertising Server

Search Engine

Conferencing System
Ad Server

Software or Service?
The ad server provides for the dynamic
rotation of advertising banners on a site, and
the collection of data to track impressions
and click-throughs.
Ad traffic administrator sets up campaigns to
run on the server.
Advertisers use the server to get real-time
reporting on how an ad is doing.
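
The rotation and tracking duties described above can be sketched in a few lines of Python. This is a toy model assuming simple round-robin rotation; the AdServer class and its method names are hypothetical, and real ad servers add targeting, scheduling and persistent storage.

```python
from collections import Counter
from itertools import cycle


class AdServer:
    """Rotates banners and counts impressions and click-throughs."""

    def __init__(self, banners):
        self._rotation = cycle(banners)   # round-robin rotation
        self.impressions = Counter()
        self.clicks = Counter()

    def next_banner(self) -> str:
        banner = next(self._rotation)
        self.impressions[banner] += 1     # one impression per serve
        return banner

    def record_click(self, banner: str) -> None:
        self.clicks[banner] += 1

    def report(self, banner: str) -> dict:
        shown = self.impressions[banner]
        rate = self.clicks[banner] / shown if shown else 0.0
        return {
            "impressions": shown,
            "clicks": self.clicks[banner],
            "click_through_rate": rate,
        }


server = AdServer(["banner_a", "banner_b"])
for _ in range(4):
    server.next_banner()
server.record_click("banner_a")
print(server.report("banner_a"))
```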

Search Engine

Search engine provides a full-text index of
a site or a collection of sites.

Webmaster needs to configure indexer to
run at certain intervals, either to
regenerate complete index or simply to
update it.
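
A full-text index is typically an inverted index: a map from each word to the pages that contain it. The Python sketch below builds one from a dictionary of page text and answers queries that require every word to match. The function names and sample pages are hypothetical, and regenerating the index here simply means calling build_index again.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(pages: dict) -> dict:
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return URLs containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


PAGES = {
    "/apache.html": "Configuring the Apache web server",
    "/perl.html": "Perl scripting for the web",
}
index = build_index(PAGES)
print(search(index, "web server"))
```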

References: Atomz
Conferencing and Chat Systems

Sites use conferencing and chat systems
to create community and increase user
involvement.
Conferencing or Bulletin Board Systems
 Chat
 Instant Messaging
 Polls and Surveys

Mailing List Software

Email remains the dominant form of
communication on the Web. The ability to
capture email addresses and send regular
email to users is very valuable.

Majordomo, LISTSERV, Lyris
Flow

Weblogs



Commentary; Directing Attention to Interesting Items
on the Web
Personal Writing Space
Tools
• Manila from Userland
• Others such as Blogger

RSS



Rich Site Summary
Headlines
Enhance to send more metadata
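
To show what a headline feed looks like to a program, the Python sketch below reads a small RSS document with the standard library's ElementTree and extracts each item's title and link. The feed content and the headlines function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS channel with two headline items.
FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def headlines(xml_text: str):
    """Return (title, link) pairs for every item in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in headlines(FEED):
    print(title, link)
```

An aggregator does essentially this across many channels and then merges and sorts the results.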
Example: Meerkat

An Open Wire Service
An RSS aggregator
 A guide to technical information
produced by RSS channels.
 Information is sorted by channel and
technology.
 Can be customized and personalized.

Summary

Publishing is a server-side application.



Most functionality is controlled by the
application server.
Content management systems provide a
standard set of capabilities but most CMS
applications require a high degree of
customization.
Software choices are often dictated by
hardware and OS selection, although they
don’t need to be.