Web Publishing Architecture

Look at the various components of Web publishing, many of which are common to most Web applications.
HTML Document Publishing
CGI Scripting Applications
Content Management Systems

The Web Browser is…
A program available everywhere.
A generalized information interface.
A client that connects to distributed servers.
A single point of control over the Web fought over by Microsoft and Netscape.

The Web, Circa 1993
Key Challenges Were on the Client
How to present information in a Web browser.

Developed by Pei Wei in 1992, Viola was an application toolkit built on top of the X Window System. Its WWW browser was a sample application, integrating styled text and graphics. In this example, the Viola browser embedded another application and its controls.

World Wide Web Wizards Workshop (July 1993)
Early attempt to forge a common development agenda.
Tension between slow-moving standards development and seat-of-the-pants innovation.

HTML
Hypertext Markup Language
A simple SGML vocabulary or tagset.
Controls content and layout of presentation.
Human-readable data format.

The Web, Circa 1995
Publication Models
Key Challenges Were on the Server
• Publishing Becomes a Server-side Application
• Apache, mod_perl and Perl.
Didn’t Much Depend On Client-Side Capabilities
Development of Custom Content Management Systems
• Manage the publishing process

The Web Server…

HyperText Transfer Protocol (HTTP)
HTTP is a Request/Response Protocol
"HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system." Tim Berners-Lee, 1992, Basic HTTP
Achieves a loose coupling of client and servers.
References: HTTP 1.1 Spec

Anatomy of a Request
Browser locates server (oreilly.com) and makes a connection to port number 80 (in a typical configuration) on that machine.
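
The connection step can be sketched in a few lines of Python. This is only an illustration of the request/response exchange, not how any particular browser is implemented; the function names `build_get_request` and `fetch` are hypothetical, and the `Connection: close` header is an assumption made so the sketch can simply read until the server hangs up.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request.

    Header lines are separated by CRLF, and a blank line ends the headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Connect to the server, send the request, and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Print the request text only; call fetch() to actually go over the network.
    print(build_get_request("oreilly.com", "/index.html").decode("ascii"))
```

Calling `fetch("oreilly.com", "/index.html")` would return the status line, headers, and body as one byte string, assuming the host is reachable over plain HTTP.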
Full Request
    GET /index.html HTTP/1.1
    Host: localhost
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/xbm, */*
    Accept-Language: en
    Connection: Keep-Alive
    User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)

Server
Returns status of request.
    HTTP/1.1 200 OK
Sends header info followed by a blank line.
    Content-type: text/html
    Content-length: 3896
Sends document or data from a CGI program.
Objects embedded in the document, such as images, generate new requests to the server.

The Apache Web Server
The Apache Group, an Open Source software project, has developed the leading Web server, with over 50% of all servers.
Web servers are fairly stable technology.
Reference: Apache.org, Netcraft survey
Apache: The Definitive Guide

Apache Directories
Have you set up a Web server?
/usr/local/apache is the Unix/Linux installation directory.
/htdocs is the directory for HTML files.
/cgi-bin is for scripts.
/conf is the configuration directory where the file httpd.conf lives.

Configuring a Web Server
Site administrator usually takes care of the following server configuration issues by editing httpd.conf:
Document and content type mapping
Authentication and Access Control
Logging
Virtual Servers

URL Management
Decisions about URLs:
Relative vs. absolute links on the site.
Permanent addressing vs. current addressing
    /98/09/21/document.html
    today.html
What are you going to do when things change?
• URLs can be brittle.

Authentication
Authentication is asking a user to provide identification, usually a user name and password.
Basic Authentication uses the .htaccess file.
More sophisticated applications will manage this information in a user database.
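
As a rough illustration of what travels over the wire with Basic Authentication: the client joins the user name and password with a colon, Base64-encodes the result, and sends it in an `Authorization` request header. The helper names below are hypothetical, and the sketch covers only the encoding, not the server-side password check.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header line a client sends for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


def parse_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from a value such as 'Basic QWxh...'."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split on the first colon only; the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    print(basic_auth_header("Aladdin", "open sesame"))
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. That is one reason sites with more demanding requirements move to a user database and an encrypted connection.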
Apache section

Logs
Found in logs directory: access.log
Log entry tells you: IP address, date/time, request
    152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087

Logs Processing
Some of the tasks surrounding logs:
Log rotation (day, week, month)
Log compression (files grow large)
Log file parsing and reporting
Reverse DNS lookup
References: Lincoln Stein, Yahoo's list of tools, Marketwave's Hitlist
Examples

Server Hardware and OS
Server farms or hosting services are set up to manage the hardware, the OS and the network for 24/7 operation.
Properly configured PCs can be powerful enough to handle a sizable load, obviating the need for more expensive servers from Sun.
Small dedicated Web server devices, such as the Cobalt server, come with embedded Linux and Web administration.

Web Publishing
HTML Authoring Systems
Server Side Includes
CGI Applications
Templates

Authoring Systems
Debate over whether to show HTML to authors or hide it from them.
Page Creation Tools
HTML Editors
• Homesite; BBEdit.
Web Site Authoring Systems
• FrontPage; GoLive; NetObjects; Dreamweaver
Market share estimate of authoring tools. (Security Space)

Server Side Includes
Insert dynamic information such as date or time.
Include a file shared by a set of documents.
One way to create a consistent page layout across the site.
Example: Use a server-side include to put common information for a page header or footer in a separate file and source it from all documents.

CGI Applications
Common Gateway Interface
A web server passes control to an application, which generates a dynamic HTML document and returns it to the server.
Forms-based Input and Interaction
Session management
Transactions

Scripting
Perl became the favored scripting language for Web applications.
CGI modules in Perl and Python provide a higher-level interface for the programmer and hide the low-level details.
Script installed in server's cgi-bin directory.
HTML document containing a form references the CGI script.
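
To make the CGI hand-off concrete, here is a minimal sketch in Python. It works directly with the environment variables and standard input that a CGI-capable server provides, rather than through a higher-level CGI module. The function name `handle_request` and the form field `LastName` are assumptions for illustration only.

```python
#!/usr/bin/env python3
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ, stdin):
    """Build a CGI response (headers, blank line, HTML body) as one string."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # For POST, the form data arrives on standard input.
        # Reading by character count is adequate for ASCII form data.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET, the form data arrives in the query string.
        raw = environ.get("QUERY_STRING", "")
    fields = parse_qs(raw)
    name = fields.get("LastName", ["(none given)"])[0]
    body = (
        "<html><body><h2>Results</h2>"
        f"<p>You searched for {html.escape(name)}.</p>"
        "</body></html>"
    )
    # A CGI program writes its headers, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle_request(os.environ, sys.stdin))
```

Placed in the server's cgi-bin directory and made executable, a script like this would be named in the `ACTION` attribute of the form. Escaping the user's input before echoing it back keeps submitted markup from being rendered as HTML.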
Sample Perl CGI script

Stateless Transactions
HTTP is a stateless protocol.
Each interaction is independent of the others.
Maintaining state or session tracking is necessary for a number of applications such as shopping carts.

Application Servers

Web Application Stack
              OS        Web Server   Application Server   DB
Open Source   Linux     Apache       PHP                  MySQL
Sun           Solaris   Apache       JSP                  Oracle
Microsoft     Windows   IIS          ASP                  SQL Server
IBM           Linux     Apache       WebSphere            DB2
Macromedia    Windows   Apache       Cold Fusion          SQL Server

Characteristics
Embed programming code inside of HTML documents.
Languages like PHP, Cold Fusion and ASP can be viewed as extensions to HTML.
One consideration is whether there’s clean separation between code and documents.

Cold Fusion
Cold Fusion from Allaire/Macromedia is a Windows NT/2000 application.
Server is configured so that files ending in .cfm are passed to the Cold Fusion application server.

Cold Fusion and HTML file
    <H2>New Form</H2>
    <FORM ACTION="searchquery.cfm" METHOD="Post">
    Last Name: <Input Type="text" Name="LastName">
    <Input Type="Submit" Value="Search">
    </FORM>

Application file (.cfm)
    <CFQUERY Name="EmployeeList" Datasource="Examples">
    Select * From Employees WHERE LastName = '#LastName#'
    </CFQUERY>
    <body>
    <H2>Results</H2>
    <CFOUTPUT>
    <P>The search for #Form.LastName# returned the following:
    </CFOUTPUT>
    <CFOUTPUT QUERY="EmployeeList">
    <HR> #FirstName# #LastName# (Phone: #PhoneNumber#) <BR>
    </CFOUTPUT>

Database Servers
Flat-file database, dbm files
Free mid-range: MySQL and Postgres
MS Access and SQL Server
Commercial high-end: Oracle 8i, Sybase, IBM’s DB2

Database Woes
Generating pages dynamically can impact a site’s performance and administration.
Many applications find ways of generating static pages and caching them.
Should documents be stored in the database?

Databases
The standard application interfaces to the database are through SQL and/or ODBC.
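
As a rough sketch of an application talking to a database through SQL, the Python below runs the same kind of lookup as the Cold Fusion example. The standard-library `sqlite3` module and an in-memory database stand in for a real database server, which is an assumption made to keep the example self-contained. The table and column names come from the Cold Fusion example; the rows and the function names are invented for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory Employees table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employees (FirstName TEXT, LastName TEXT, PhoneNumber TEXT)"
    )
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?)",
        [
            ("Ann", "Smith", "555-0100"),   # hypothetical data
            ("Bob", "Jones", "555-0101"),   # hypothetical data
        ],
    )
    conn.commit()
    return conn


def find_employees(conn, last_name):
    """Select employees by last name using a parameterized query.

    The ? placeholder lets the database driver handle quoting, so form
    input is never pasted directly into the SQL text.
    """
    cursor = conn.execute(
        "SELECT FirstName, LastName, PhoneNumber FROM Employees WHERE LastName = ?",
        (last_name,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    for first, last, phone in find_employees(db, "Smith"):
        print(f"{first} {last} (Phone: {phone})")
```

Passing the form value as a parameter, rather than splicing it into the statement as the `'#LastName#'` substitution does, means input containing quote characters is treated as data and not as SQL.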
SQL can be used to create or modify data records in the database as well as to select sets of data from it.
SQL Example:
    SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = 'DALE DOUGHERTY'
Languages such as Perl, Python and Java all provide fairly standard interfaces for accessing databases.
The earlier Cold Fusion example simply embeds an SQL statement in an HTML document. The CF application passes the query to the database server, which processes the request and returns the data to the application, which passes it back to the web server.

Application Server Issues
What degree of technical expertise is required to build applications?
How portable is the application?
How much does it tie you to one OS or Web server or language?
Is the server API proprietary or standardized?

Application Service Provider (ASP)
A Web site is increasingly put together as a set of components that could be software or services sourced from different sites.
ASPs are providers of services rather than software.
They take away the burden of owning and maintaining software.

Content Management
A specialized application server
A system for managing the production, development and delivery of content by a team of producers.

CMS Features
Manages "metadata" to build collections of documents and create different views.
Generates content from a database.
Provides for staging of content; replication.
Administrative interface to manage scheduling and workflow.
Manages interactions with customers and keeps track of vital information.
Allows for distribution of information in multiple formats.

Implementing Layouts in CMS
Which Layout Strategy Will You Use?
Server Side Includes (SSI)
Style sheets (CSS)
• Table layout vs. block positioning
Templates
XSLT (transformation of XML into HTML)

CS (Community Server)
Content Management System written using Apache, Perl, MySQL
Used for O’Reilly Network, XML.com and Perl.com.
Demo

Other CMS
Vignette
Ars Digita
Expensive, commercial CMS system
Java-based platform.
Zope
Python-based

Advantages of CMS
A cost-effective way to manage information and users.
A consistent administrative interface for building and managing complex Web sites.
A robust development platform that provides common publishing functionality and allows customization.

Other Major Components
Advertising Server
Search Engine
Conferencing System

Ad Server
Software or Service?
The ad server provides for the dynamic rotation of advertising banners on a site, and the collection of data to track impressions and click-throughs.
Ad traffic administrator sets up campaigns to run on the server.
Advertisers use the server to get real-time reporting on how an ad is doing.

Search Engine
Search engine provides a full-text index of a site or a collection of sites.
Webmaster needs to configure the indexer to run at certain intervals, either to regenerate the complete index or simply to update it.
References: Atomz

Conferencing and Chat Systems
Sites use conferencing and chat systems to create community and increase user involvement.
Conferencing or Bulletin Board Systems
Chat
Instant Messaging
Polls and Surveys

Mailing List Software
Email remains the dominant form of communication on the Web.
The ability to capture email addresses and send regular email to users is very valuable.
Majordomo, ListServ, Lyris

Flow

Weblogs
Commentary; Directing Attention to Interesting Items on the Web
Personal Writing Space
Tools
• Manila from Userland
• Others such as Blogger

RSS
Rich Site Summary
Headlines
Enhance to send more metadata
Example: Meerkat

An Open Wire Service
An RSS aggregator
A guide to technical information produced by RSS channels.
Information is sorted by channel and technology.
Can be customized and personalized.

Summary
Publishing is a server-side application.
Most functionality is controlled by the application server.
Content management systems provide a standard set of capabilities but most CMS applications require a high degree of customization.
Software choices are often dictated by hardware and OS selection, although they don’t need to be.