DATABASES AND WEB TECHNOLOGY
Dr. Awad Khalil
Computer Science Department, AUC

Content
The Web Building Blocks
The Web as a Database Application Platform
Database Gateways for Database Connectivity
Client-side Approach
Server-side Approach
CGI (Common Gateway Interface)
API (Application Programming Interface)
Available Tools and Technologies
Scripting languages such as JavaScript, Perl, and PHP
HTTP cookies
Netscape API (NSAPI) and Microsoft Internet Server API (ISAPI)
ODBC (Open DataBase Connectivity)
Java and JDBC, SQLJ, Servlets, and JavaServer Pages (JSP)
Microsoft’s Web Solution Platform with Active Server Pages (ASPs) and ActiveX Data Objects (ADO)
Oracle’s Internet Platform

The Web Building Blocks
Internet
World Wide Web (WWW or Web)
Web Page
HyperText Markup Language
Hyperlink
Web browser
Static Web page
Dynamic Web page
Web server
Web site
HyperText Transfer Protocol

Internet
Worldwide network of computers connected together through a standard network protocol known as Transmission Control Protocol/Internet Protocol, or TCP/IP. You can think of the Internet as the “highway” on which data travel, as in the phrase “the information super-highway.” The terms Internet and World Wide Web are often used interchangeably, but they are not synonyms.

World Wide Web (WWW or Web)
Worldwide network of servers that store collections of specially formatted files known as Web pages. The Web is just one of many services provided by the Internet. However, it is the World Wide Web that truly opened the Internet to the world.

Web Page
A text document containing labels and special commands (or tags) written in a markup language (HTML or XML).

HyperText Markup Language
Standard document-formatting language for Web pages.
Hyperlink
Web pages are linked to each other – that is, each Web page can call other Web pages – creating the effect of a “web.” Because this link can connect to different types of documents such as text, graphics, animated graphics, video, and audio, it is known as a “hyperlink.”

Web browser
The end-user application used to browse or navigate (jump from page to page) through the Web. The browser is a graphical interface that runs on the client computer, and its main function is to display Web pages. The Web browser (e.g., Netscape Navigator, Internet Explorer) requests Web pages from the Web server.

Static Web page
A Web page whose contents remain the same (when viewed in a browser) unless the page is manually edited. An example of a static Web page is a standard price list posted by a manufacturer.

Dynamic Web page
A Web page whose contents are automatically created and tailored to an end user’s needs each time the end user requests the page. For example, an end user can access a Web page that displays the latest stock selections entered by that end user.

Web server
A specialized application whose only function is to “listen” for clients’ requests and to send the requested Web page(s).

Web site
Term used to refer to the Web server and the collection of Web pages stored on the local hard disk of the server computer or in an accessible shared directory. The Web server and the Web client communicate using a special protocol known as HyperText Transfer Protocol or HTTP.

HyperText Transfer Protocol
The standard protocol used by the Web browser and Web server to communicate.

Intranet & Extranet
An intranet is a locally owned and operated internet whose access is carefully controlled. Intranet architecture is based on the same basic components as the Internet: Web servers, clients using Web browsers, the TCP/IP and HTTP protocols, and HTML/XML formatting. An extranet extends the intranet to the corporation’s value chain.
Intranet Services

HTTP
HTTP Request
An HTTP request consists of a header indicating the type of request, the name of a resource, and the HTTP version, followed by an optional body. The main HTTP request types are:
GET – Retrieves (gets) the resource the user has requested.
POST – Transfers (posts) data to the specified resource.
HEAD – Returns only an HTTP header instead of response data.
PUT – Uploads the resource to the server.
DELETE – Deletes the resource from the server.
OPTIONS – Requests the server’s configuration options.

HTTP is currently a stateless protocol – the server retains no information between requests. This means that the information a user enters on one page is not automatically available on the next page requested. The stateless property of HTTP makes it difficult to support the concept of a session that is essential to basic DBMS transactions.

HTML
HTML is the document-formatting language used to design most Web pages. It is a simple, yet powerful, platform-independent document language. It was originally developed by Tim Berners-Lee while at CERN and was standardized in November 1995. In early 2000, W3C produced XHTML 1.0 as a reformulation of HTML 4 in XML.

HTML Tags
HTML tags cover formatting, structural, and semantic markup of an HTML document.
<HTML> <HEAD> <TITLE> <BODY> <A> <IMG> <B> <I> <HR>
Examples:
Good morning <B> Egypt </B>
<HR WIDTH="200">
In 1989, <B><I>Tim Berners-Lee</I></B> developed the first version of HTML

<HTML>
<HEAD>
<TITLE>Page1</TITLE>
</HEAD>
<BODY>
Good Morning, <B><I>Egypt</I></B>
</BODY>
</HTML>

HTML Document Structure
<HTML>
<HEAD>
<TITLE>Title of your document goes here</TITLE>     (Title)
</HEAD>
<BODY>
Your document goes here     (Body section)
</BODY>
</HTML>

A Complete HTML Page
<HTML>
<HEAD>
<TITLE>The Universal Construction Company</TITLE>
</HEAD>
<BODY>
<IMG SRC="welcome.gif">
<H1>The Universal Construction Company !</H1>
<H2>Specialists in Construction Since 1920</H2>
<CENTER>
<IMG SRC="logo.gif" HEIGHT=60 WIDTH=100>
</CENTER>
<P>
Our Company offers a highly specialized group of
<B><I>highly experienced engineers </I></B> with the
<I><B>state-of-the-art technology </B></I>for constructing modern buildings.
Write us at:<BR>
P.O. Box 555<BR>
Cairo 11511, Egypt<BR>
Or visit our <A HREF="http://www.ucc.com/technical/">UCC Technical Site</A>
<HR>
</BODY>
</HTML>

XML versus HTML
HTML
HTML encompasses formatting, structural, and semantic markup.
<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo and Victor Willis
<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>
HTML is designed for a specific application: to convey information to humans (usually visually, through a Web browser).

XML
XML markup describes a document’s structure and meaning. It does not describe the formatting of the elements on the page. Formatting can be added to a document with a style sheet.
<SONG>
<TITLE>Hot Cop</TITLE>
<COMPOSER>Jacques Morali</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
XML has no specific application; it is designed for whatever use you need it for.

The Web as a Database Application Platform
1- Requirements
The ability to access valuable corporate data in a secure manner.
Data- and vendor-independent connectivity to allow freedom of choice in the selection of the DBMS now and in the future.
The ability to interface to the database independent of any proprietary Web browser or Web server.
A connectivity solution that takes advantage of all the features of an organization’s DBMS.
An open-architecture approach to allow interoperability with a variety of systems and technologies; for example, support for:
Different Web servers;
Microsoft’s (Distributed) Component Object Model (DCOM/COM);
CORBA/IIOP (Internet Inter-ORB Protocol);
Java/RMI (Remote Method Invocation).
A cost-effective solution that allows for scalability, growth, and changes in strategic directions.
Support for transactions that span multiple HTTP requests.
Support for session- and application-based authentication.
Minimal administration overhead.

2- Web-DBMS Architecture
The Web-DBMS architecture is a three-tier architecture in which:
A Web browser acts as the “thin” client.
A Web server acts as the application server.
A database server acts as the third tier.

Web-DBMS Approach
Advantages
Simplicity
Platform independence
Graphical User Interface
Standardization
Cross-platform support
Transparent network access
Scalable deployment
Innovation
Disadvantages
Reliability
Security
Cost
Scalability
Limited functionality of HTML
Statelessness
Bandwidth
Performance
Immaturity of development tools

Components of a Web Database Application
Web database applications may be created using various approaches. However, a number of components form the essential building blocks of such applications. In other words, a Web database application should comprise the following four layers (i.e., components):
Browser layer.
Application logic layer.
Database connection layer.
Database layer.

Browser Layer
The browser is the client of a Web database application, and it has two major functions:
First, it handles the layout and display of HTML documents.
Second, it executes the client-side extension functionality such as Java, JavaScript, and ActiveX (a method to extend a browser’s capabilities).
The two most popular browsers at present are Netscape Navigator (Netscape for short) and Microsoft Internet Explorer (IE). Each has its own advantages and disadvantages with respect to Web database applications.

IE versus Netscape
Both browsers support Java and JavaScript, and IE also supports ActiveX.
Netscape is supported on numerous platforms, whereas IE runs only on Microsoft systems (e.g., Windows 2000, 98 and NT).
IE offers compatibility with other Microsoft products and can easily be integrated with existing tools such as Word, Excel and PowerPoint. The drawback is that IE is heavily dependent on the Windows platforms and other Microsoft proprietary systems.
ActiveX is designed to extend the functionality of IE, and works only on Windows platforms and Macintosh System 7+.

The Application Logic Layer
The application logic layer is the part of a Web database application on which a developer will spend the most time. It is responsible for:
collecting data for a query (e.g., an SQL statement);
preparing and sending the query to the database via the database connection layer;
retrieving the results from the connection layer;
formatting the data for display.

The Database Connection Layer
This is the component that actually links a database to the Web server. Many current Web database building tools offer database connectivity solutions that simplify the connection process.
The database connection layer provides a link between the application logic layer and the DBMS. Connection solutions come in many forms, such as DBMS net protocols, APIs (Application Programming Interfaces) or class libraries, and programs that are themselves database clients.
Some of these solutions resulted in tools being specifically designed for developing Web database applications.
In Oracle, for example, there are native API libraries for connection and a number of tools, such as Web Publishing Assistant, for developing Oracle applications on the Web.
The connection layer within a Web database application must accomplish a number of goals. It has to provide access to the underlying database, and it also needs to be easy to use, efficient, flexible, robust, reliable, and secure. Different tools and methods fulfill these goals to different extents.

The Database Layer
This is the place where the underlying database resides within the Web database application. The database is responsible for storing, retrieving, and updating data based on user requirements, and the DBMS can provide efficiency and security measures.
In many cases, when developing a Web database application, the underlying database already exists. A major task, therefore, is to link the database to the Web (the connection layer) and to develop the application logic layer.

Database Gateways for Database Connectivity
A Web database gateway (middleware) is a bridge between the Web and a DBMS, and its objective is to give a Web-based application the ability to manipulate data stored in the database.
Web database gateways link stateful systems (i.e., databases) with a stateless, connectionless protocol (i.e., HTTP).
HTTP is a stateless protocol in the sense that each connection is closed once the server provides a response. Thus, a Web server will not normally keep any record of previous requests. This results in an important difference between a Web-based client-server application and a traditional client-server application.
In a Web-based application, only one transaction can occur on a connection. In other words, the connection is created for a specific request from the client. Once the request has been satisfied, the connection is closed. Thus, every request involving access to the database will have to incur the overhead of making the connection.
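The per-request connection pattern just described can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for the DBMS, and the table `price_list`, its rows, and the function names `make_demo_db` and `handle_web_request` are hypothetical, chosen for the example rather than taken from any particular gateway product.

```python
import os
import sqlite3
import tempfile


def make_demo_db() -> str:
    """Create a throwaway SQLite file with one small table (illustrative data)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE price_list (item TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO price_list VALUES (?, ?)",
        [("cement", 12.5), ("steel", 48.0)],
    )
    conn.commit()
    conn.close()
    return path


def handle_web_request(db_path: str, item: str):
    """Web style: open a connection, answer one request, close the connection."""
    conn = sqlite3.connect(db_path)  # connection overhead is paid on every request
    try:
        row = conn.execute(
            "SELECT price FROM price_list WHERE item = ?", (item,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()  # nothing about this request survives on the server


if __name__ == "__main__":
    path = make_demo_db()
    try:
        # Three requests mean three separate connections to the database.
        for wanted in ("cement", "steel", "cement"):
            print(wanted, handle_web_request(path, wanted))
    finally:
        os.remove(path)
```

Each call to `handle_web_request` connects and disconnects, so the cost of connecting grows with the number of requests rather than with the number of user sessions.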
In a traditional application, multiple transactions can occur on the same connection. The overhead of making the connection occurs only once, at the beginning of each database session.

Types of Solutions
There are a number of different ways of creating Web database gateways. Generally, they can be grouped into two categories:
Client-Side Solutions.
Server-Side Solutions.

Client-side Solutions
The client-side solutions include two types of approaches for connections:
browser extensions
external applications.
Browser extensions are add-ons to the core Web browser that enhance and augment the browser’s original functionality. Specific methods include plug-ins for Netscape and IE, and ActiveX controls for IE. Also, both types of browsers (Netscape and IE) support the Java and JavaScript languages (i.e., Java applets and JavaScript can be used to extend browsers’ capabilities).
External applications are helper applications or viewers. They are typically existing database clients that reside on the client machine and are launched by the Web browser in a particular Web application. Using external applications is a quick and easy way to bring legacy database applications online, but the resulting system is neither open nor portable. Legacy database clients do not take advantage of the platform independence and language independence available through many Web solutions. Legacy clients are resistant to change, meaning that any modification to the client program must be propagated via costly manual installations throughout the user base.

Browser Extensions
These types of gateways take advantage of the resources of the client machine to aid server-side database access. Remember, however, that it is advantageous to have a thin client. Thus, the scope of such programming on the client-side should be limited. A very large part of the database application should be on the server-side.
Browser extensions can be created by incorporating support for:
Scripting Languages (JavaScript, JScript, VBScript, Perl, and PHP)
Java (Applets)
ActiveX
Plug-Ins

Scripting Languages
JavaScript & JScript
JavaScript and JScript are virtually identical interpreted scripting languages from Netscape and Microsoft, respectively.
JavaScript is a scripting language that allows programmers to create and customize applications on the Internet and intranets. On the client-side, it can be used to perform simple data manipulation such as mathematical calculations and form validation. JavaScript code is normally sent as a part of an HTML document and is executed by the browser upon receipt (the browser must have the script language interpreter).
On the server-side, LiveWire (an online development environment for server-side JavaScript) works with Netscape, providing gateway functionality such as access to databases.
It should be noted that JavaScript has little to do with the Java language. JavaScript was originally called LiveScript, but the name was changed to benefit from the excitement surrounding Java. The only relationship between JavaScript and Java is a gateway between the former and Java applets (Web applications written in Java).
JavaScript provides developers a simple way to access certain properties and methods of Java applets on the same page without having to understand or modify the Java source code of the applet.
As a database gateway, JavaScript on the client-side does not offer much without the aid of a complementary approach such as Java, plug-ins, or CGI (Common Gateway Interface). For example:
If a Java applet on an HTML page has access to a database, a programmer can write JavaScript code to manipulate the applet.
If there is a form in the HTML document and the action parameter of that form refers to a CGI program that has access to a database, a programmer can write JavaScript code to manipulate the data elements within the form and then submit it (i.e., submit a kind of request to a DBMS).
JavaScript can improve the performance of a Web database application if it is used for client-side state management. It can eliminate the need to transfer state data repeatedly between the browser and the Web server. Instead of sending an HTTP request each time it updates an application state, it sends the state only once as the final action.
However, this gain in performance has some side effects. For example, the application may become less robust if state management is completely on the client-side. If the client accidentally or deliberately exits, the session state is lost.

VBScript
VBScript is a Microsoft proprietary interpreted scripting language whose goals and operations are virtually identical to those of JavaScript/JScript. VBScript, however, has syntax more like Visual Basic than Java. VBScript is a procedural language and so uses subroutines as the basic unit.

Perl & PHP
Perl (Practical Extraction and Report Language) is an interpreted programming language with extensive, easy-to-use text processing capabilities. It is now one of the most widely used languages for server-side programming. Although Perl was originally developed on the Unix platform, it was always intended as a cross-platform language, and there is now a version of Perl for the Windows platform (called ActivePerl).
PHP (PHP: Hypertext Preprocessor) is another popular open source HTML-embedded scripting language that is supported by many Web servers, including Apache HTTP Server and Microsoft’s Internet Information Server (IIS), and is the preferred Linux Web scripting language. The goal of the language is to allow Web developers to write dynamically generated pages quickly.
A popular choice nowadays is to use the open source combination of Apache HTTP Server, PHP, and one of the database systems MySQL or PostgreSQL.

Java (Applets)
As mentioned earlier, Java applets can be manipulated by JavaScript functions to access databases. In general, Java applets can be downloaded into a browser and executed on the client-side (the browser should have the bytecode interpreter). The connection to the database is made through appropriate APIs (Application Programming Interfaces), such as JDBC and ODBC.

ActiveX
ActiveX is a way to extend the capabilities of Microsoft IE (Internet Explorer). An ActiveX control is a component on the browser that adds functionality which cannot be obtained in HTML, such as access to files on the client, other applications, complex user interfaces, and additional hardware devices.
ActiveX is similar to Microsoft OLE (Object Linking and Embedding), and ActiveX controls can be developed by any organization or individual. At present, more than one thousand ActiveX controls, including controls for database access, are available for developers to incorporate into Web applications.
A number of commercial ActiveX controls offer database connectivity. Because ActiveX has abilities similar to OLE, it supports most or all of the functionality available to any Windows program.
Like JavaScript, ActiveX can aid in minimizing network traffic. In many cases, this technique results in improved performance. ActiveX can also offer rich GUIs. The more flexible interface, executed entirely on the client-side, makes operations more efficient for users.

Plug-Ins
Plug-ins are Dynamic Link Libraries (DLLs) that allow data of new MIME (Multipurpose Internet Mail Extensions) types to be viewed or heard. Plug-ins can be installed to run seamlessly inside the browser window, transparent to the user. They have full access to the client’s resources.
To create a plug-in, the developer writes an application using the plug-in API and native calls.
The code is then compiled as a DLL. Installing a plug-in is just a matter of copying the DLL into the directory where the browser looks for plug-ins.
The next time the browser is run, the MIME type(s) that the new plug-in supports will be opened with the plug-in. One plug-in may support multiple MIME types.
There are a number of important issues concerning plug-ins:
Plug-ins incur installation requirements. Because they are native code not packaged with the browser itself, plug-ins must be installed on the client machine.
Plug-ins are platform-dependent. Whenever a change is made, it must be made on all supported platforms.
The latest version of IE supports the same plug-in architecture as Netscape. Thus, a plug-in should work on both browsers provided they are running on the same platform.

Connection to databases
Plug-ins can operate like any stand-alone application on the client-side. They can be used to create direct socket connections to databases via the DBMS net protocols (such as SQL*Net for Oracle). Plug-ins can also use JDBC, ODBC, OLE, and any other methods to connect to databases.

Performance
Plug-ins are loaded on demand. When a user starts up a browser, the installed plug-ins are registered with the browser along with their supported MIME types, but the plug-ins themselves are not loaded. When a plug-in for a particular MIME type is requested, the code is then loaded into memory. Because plug-ins use native code, they execute quickly.

External Applications
External helper applications can be new or legacy database clients, or a terminal emulator. If there are existing traditional client-server database applications which reside on the same machine as the browser, then they can be launched by the browser and execute as usual.
This approach may be an appropriate interim solution for migrating from an existing client-server application to a purely Web-based one. It is straightforward to configure the browser to launch existing applications.
It just involves the registration of a new MIME type and the associated application name.
Using the external applications approach, the existing database applications need not be changed. However, it means that all the maintenance burdens associated with traditional client-server applications will remain. Any change to the external application will require a very costly reinstallation on all client machines. Because this is not a pure Web-based solution, many advantages offered by Web-based applications cannot be realized.
Traditional client-server database applications usually offer good performance. They do not incur the overhead of requiring repeated connections to the database. External database clients can make one connection to the remote database and use that connection for as many transactions as necessary for the session, closing it only when finished.

Server-side Solutions
Server-side solutions are more widely adopted than the client-side solutions. A main reason for this is that the Web database architecture requires the client to be as thin as possible. The Web server should not only host all the documents, but should also be responsible for dealing with all the requests from the client.

Server-side versus Client-side Solutions
Server Side
listening for HTTP requests.
checking the validity of the request.
finding the requested resource.
requesting authentication if necessary.
delivering requested resource.
spawning programs if required.
passing variables to programs.
delivering output of programs to the requester.
displaying error message if necessary.
Client Side
rendering HTML documents.
allowing users to navigate HTML links.
displaying images.
sending HTML form data to a URL.
interpreting Java applets.
executing plug-ins.
executing external helper applications.
interpreting JavaScript and other scripting language programs.
executing ActiveX controls in case of IE.
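A few of the server-side duties listed above (checking the validity of the request, finding the requested resource, delivering it, and reporting an error when necessary) can be sketched as a single function. This is a simplified illustration, not a real Web server: it does no network listening, authentication, or program spawning, and the names `serve`, `SUPPORTED_METHODS`, and the in-memory `site` dictionary are assumptions made for the example.

```python
from typing import Dict, Tuple

# Request types this toy server is willing to handle (an assumption of the sketch).
SUPPORTED_METHODS = {"GET", "HEAD"}


def serve(request_line: str, site: Dict[str, str]) -> Tuple[int, str]:
    """Return (status code, body) for a request line like 'GET /index.html HTTP/1.0'."""
    parts = request_line.split()

    # Check the validity of the request: method, resource name, HTTP version.
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return 400, "Bad Request"
    method, resource, _version = parts
    if method not in SUPPORTED_METHODS:
        return 501, "Not Implemented"

    # Find the requested resource among the pages of the Web site.
    page = site.get(resource)
    if page is None:
        return 404, "Not Found"  # error message for the requester

    # Deliver the resource; HEAD asks for the header only, so the body is empty.
    if method == "HEAD":
        return 200, ""
    return 200, page


if __name__ == "__main__":
    demo_site = {"/index.html": "<HTML><BODY>Price list</BODY></HTML>"}
    for line in ("GET /index.html HTTP/1.0", "GET /missing.html HTTP/1.0", "hello"):
        print(line, "->", serve(line, demo_site))
```

Keeping the logic in a plain function makes the decision steps easy to follow; a production server would wrap the same steps around a listening socket.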
Web-to-Database Middleware

Server-side Solutions
CGI (Common Gateway Interface)
HTTP Server APIs and Server modules

CGI (Common Gateway Interface)
CGI is a protocol that allows Web browsers to communicate with Web servers, for example by sending data to the servers. Upon receiving the data, the Web server can then pass them to a specified external program (residing on the server host machine) via environment variables or the standard input stream (STDIN). The external program is called a CGI program or CGI script.
Because CGI is a protocol, not a library of functions written specifically for any particular Web server, CGI programs/scripts are language-independent. As long as the program/script conforms to the specification of the CGI protocol, it can be written in any language such as C, C++ or Java.
In short, CGI is the protocol governing communications among browsers, servers, and CGI programs.
In general, a Web server is only able to send documents and to tell a browser what kinds of documents it is sending. By using CGI, the server can also launch external programs (i.e., CGI programs).
When the server recognizes that a URL points to a file, it returns the contents of that file. When the URL points to a CGI program, the server executes it and then sends the output of the program’s execution back to the browser as if it were a file.
Before the server launches a CGI program, it prepares a number of environment variables representing the current state of the server, who is requesting the action, and so on. The program collects this information and reads STDIN. It then carries out the necessary processing and writes its output to STDOUT (the standard output stream).
In particular, the program must send the MIME header information prior to the main body of the output. This header information specifies the type of the output.
The CGI approach enables access to databases from the browser.
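The CGI mechanics described above (environment variables and STDIN in, a MIME header followed by the body on STDOUT) can be sketched as follows. The environment variable names `REQUEST_METHOD`, `QUERY_STRING`, and `CONTENT_LENGTH` are standard CGI variables; everything else is an assumption for illustration: SQLite stands in for the DBMS, and `run_cgi`, `demo_database`, the `price_list` table, and the `item` form field are hypothetical names.

```python
import html
import io
import sqlite3
from urllib.parse import parse_qs


def demo_database() -> sqlite3.Connection:
    """Build a small in-memory database (illustrative data only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE price_list (item TEXT PRIMARY KEY, price REAL)")
    db.executemany(
        "INSERT INTO price_list VALUES (?, ?)",
        [("cement", 12.5), ("steel", 48.0)],
    )
    return db


def run_cgi(environ, stdin, stdout, db) -> None:
    """Act as a CGI program: read the request, query the database, write the reply."""
    # The server hands over request data via environment variables or STDIN.
    if environ.get("REQUEST_METHOD") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    item = parse_qs(raw).get("item", [""])[0]

    row = db.execute(
        "SELECT price FROM price_list WHERE item = ?", (item,)
    ).fetchone()

    # The MIME header must come first, then a blank line, then the body.
    stdout.write("Content-Type: text/html\r\n\r\n")
    answer = "no such item" if row is None else f"{row[0]}"
    stdout.write(f"<HTML><BODY>{html.escape(item)}: {answer}</BODY></HTML>\n")


if __name__ == "__main__":
    out = io.StringIO()
    request = {"REQUEST_METHOD": "GET", "QUERY_STRING": "item=cement"}
    run_cgi(request, io.StringIO(""), out, demo_database())
    print(out.getvalue())
```

When launched by a real Web server, such a program would typically be called with `os.environ`, `sys.stdin`, and `sys.stdout`; passing the streams as parameters here only makes the sketch easy to run on its own.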
The Web client can invoke a CGI program/script via a browser, and then the program performs the required action and accesses the database via the gateway. The outcome of accessing the database is then returned to the client via the Web server.

The CGI Environment
Invoking and executing CGI programs from a Web browser is mostly transparent to the user. The following steps need to be taken in order for a CGI program to execute successfully:
The user (Web client) calls the CGI program by clicking on a link or by pressing a button. The program can also be invoked when the browser loads an HTML document (hence being able to create a dynamic Web page).
The browser contacts the Web server asking for permission to run the CGI program.
The server checks the configuration and access files to ensure that the program exists and the client has access authorization to the program.
The server prepares the environment variables and launches the program.
The program executes and reads the environment variables and STDIN.
The program sends the appropriate MIME headers to STDOUT followed by the remainder of the output and terminates.
The server sends the data in STDOUT (i.e., the output from the program’s execution) to the browser and closes the connection.

CGI is the de facto standard for interfacing Web clients and servers with external applications, and is arguably the most commonly adopted approach for interfacing Web applications to data sources (such as databases).
Advantages
Simplicity,
Language independence,
Web server independence,
Wide acceptance.
Disadvantages
The first notable drawback of CGI is that the communication between a client (browser) and the database server must always go through the Web server in the middle, which may cause a bottleneck if there is a large number of users accessing the Web server simultaneously.
For every request submitted by a Web client or every response delivered by the database server, the Web server has to convert data from or to an HTML document. This incurs a significant overhead to query processing.
The second disadvantage of CGI is the lack of efficiency and transaction support in a CGI program. For every query submitted through CGI, the database server has to perform the same logon and logout procedure, even for subsequent queries submitted by the same user. The CGI program could handle queries in batch mode, but then support for online database transactions that contain multiple interactive queries would be difficult.
The third major shortcoming of CGI is due to the fact that the server has to generate a new process or thread for each CGI program. For a popular site (like Yahoo), there can easily be hundreds or even thousands of processes competing for memory, disk, and processor time. This situation can incur significant overhead.
Last but not least, extra measures have to be taken to ensure server security. CGI itself does not provide any security measures, and therefore developers of CGI programs must be security conscious. Any request for unauthorized action must be spotted and stopped.

HTTP Server APIs and Server Modules
HTTP server (Web server) APIs and modules are the server equivalent of browser extensions. The central theme of Web database sites created with HTTP server APIs or modules is that the database access programs coexist with the server. They share the address space and run-time process of the server. This approach is in direct contrast with the architecture of CGI, in which CGI programs run as separate processes and in separate memory spaces from the HTTP server.
At present, there are two main APIs: the Netscape Server API (NSAPI) and the Microsoft Internet Server API (ISAPI).
Instead of creating a separate process for each CGI program, the API offers a way to create an interface between the server and the external programs using dynamic linking or shared objects. Programs are loaded as part of the server, giving them full access to all the I/O functions of the server. In addition, only one copy of the program is loaded and shared among multiple requests to the server.
Server modules are just prefabricated applications written against a server API. Developers can often purchase commercial modules to aid or replace the development of an application feature. Sometimes, the functionality required in a Web database application can be found as an existing server module.
Vendors of Web servers usually provide proprietary server modules to support their products. There are a very large number of server modules that are commercially available, and the number is still rising. For example, Oracle provides the “Oracle PL/SQL” module, which contains procedures to drive database-backed Web sites. The Oracle module supports both NSAPI and ISAPI.

Advantages of server APIs and modules
Having database access programs coexist with the HTTP server improves Web database access by improving speed, resource sharing, and the range of functionality.
Speed
Server API programs run as dynamically loaded libraries or modules. A server API program is usually loaded the first time the resource is requested, and therefore only the first user who requests that program will incur the overhead of loading the dynamic libraries. Alternatively, the server can force this first instantiation so that no user will incur the loading overhead. This technique is called preloading. Either way, the API approach is more efficient than CGI.
Resource sharing
Unlike a CGI program, a server API program shares address space with other instances of itself and with the HTTP server.
This means that any common data required by the different threads and instances need exist in only one place. This common storage area can be accessed by concurrent and separate instances of the server API program. The same principle applies to common functions and code. The same set of functions and code is loaded just once and can be shared by multiple server API programs. These techniques save space and improve performance.

Range of functionality

CGI programs have access to a Web transaction only at certain limited points. They have no control over the HTTP authentication scheme and no contact with the inner workings of the HTTP server, because a CGI program is considered external to the server. In contrast, server API programs are closely linked to the server; they exist in conjunction with or as part of the server. They can customize the authentication method as well as transmission encryption methods. Server API programs can also customize the way access logging is performed, providing more detailed transaction logs than are available by default. For example, a server API program can:

Provide Web page or site security by inserting an authentication “layer” requiring an identifier and a password outside that of the Web browser’s own security methods;
Log incoming and outgoing activity by tracking more information than the Web server does, and store it in a format not limited to those available with the Web server;
Serve data out to browsing clients in a different way than the Web server would (or even could) by itself.

Overall, server APIs provide a very flexible and powerful solution for extending the capabilities of Web servers. However, the approach is much more complex than CGI, requiring specialized programmers with a deep understanding of the Web server and with sophisticated programming skills.

Important issues

Server architecture dependence

Server APIs are closely tied to the server they work with.
The only way to provide efficient cross-server support is for vendors to adhere to the same API standard. If a common API standard is used, programs written for one server will work just as well with another server. However, setting standards involves compromises among competitors, and in many cases agreement is hard to come by.

Platform dependence

Server APIs and modules are also dependent on computing platforms. Netscape servers are supported on multiple platforms, but each version is specific to its platform. Similarly, Microsoft’s server is available only for Windows NT.

Programming language

Both Netscape servers and Microsoft Web servers can be extended using a variety of programming languages and facilities. In addition, Microsoft provides an application environment called Active Server Pages. Active Server Pages is an open, compile-free application environment in which developers can combine HTML, scripts, and reusable ActiveX server components to create dynamic and powerful Web-based business solutions.

Comparison of CGI and server APIs

Features compared: language independence; running in a different process from the core Web server; open standard; Web server architecture independence; support for distributed computing; multiple, extensible roles; memory sharing with other processes; not creating a new process for each instance; ease of use.

Connecting to the Database

We have seen various approaches that enable browsers to communicate with Web servers, and in turn allow Web clients to have access to databases. For example, CGI and API programs can be invoked by the Web client to access the underlying database. In general, database connectivity solutions include the use of:

Native database APIs;
Database-independent APIs (such as ODBC);
Template-driven database access packages;
Third-party class libraries.

Tools and Related Technologies

Scripting languages such as JavaScript, Perl, and PHP
HTTP cookies
Netscape API (NSAPI) and Microsoft’s Internet Server API (ISAPI)
ODBC (Open DataBase Connectivity)
Java and JDBC, SQLJ, Servlets, and JavaServer Pages (JSP)
Microsoft’s Web Solution Platform with Active Server Pages (ASPs) and ActiveX Data Objects (ADO)
Oracle’s Internet Platform

HTTP Cookies

One way to make CGI scripts more interactive is to use cookies. A cookie is a piece of information that the client stores on behalf of the server. The information that is stored in the cookie comes from the server as part of the server’s response to an HTTP request. A client may have many cookies stored at any given time, each one associated with a particular Web site or Web page. Each time the client visits that site or page, the browser packages the cookie with the HTTP request. The Web server can then use the information in the cookie to identify the user and, depending on the nature of the information collected, possibly personalize the appearance of the Web page. The Web server can also add or change the information within the cookie before returning it. Cookies can be used to store registration information or preferences.

Example:

    #!/bin/sh
    # CGI script: send two cookies in the response headers
    echo "Content-type: text/html"
    echo "Set-Cookie: UserID=conn-ci0; expires=Tuesday, 30-Apr-02 12:00:00 GMT"
    echo "Set-Cookie: Password=guest; expires=Tuesday, 30-Apr-02 12:00:00 GMT"
    # A blank line ends the HTTP headers
    echo ""

Netscape API

Netscape LiveWire Pro provides server-side JavaScript constructs for database connectivity using the Netscape API (NSAPI). JavaScript is compiled into bytecode and then interpreted by the LiveWire Pro server extension running in conjunction with the Netscape server. JavaScript can accomplish many of the tasks usually associated with retrieving and working with information from a database, including:

Connecting to and disconnecting from the database;
Beginning, committing, and rolling back an SQL transaction;
Displaying the results of an SQL query;
Creating updatable cursors for viewing, inserting, deleting, and modifying data;
Accessing binary large objects (BLOBs) for multimedia content, such as images and sounds.
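The begin, commit, and roll back cycle in the list above is not specific to LiveWire. As a rough sketch only, the following Python code shows the same cycle using the standard-library sqlite3 module as a stand-in DBMS. The Employee table, its CHECK constraint, and the helper names open_demo_db, adjust_salaries, and salaries are all hypothetical and chosen for illustration.

```python
import sqlite3


def open_demo_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "name TEXT PRIMARY KEY, "
        "salary REAL CHECK (salary >= 0))"
    )
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?)",
        [("Ann", 1000.0), ("Bob", 200.0)],
    )
    conn.commit()
    return conn


def adjust_salaries(conn, changes):
    """Apply every (name, delta) pair as one transaction.

    sqlite3 opens a transaction implicitly before the first UPDATE.
    If any statement fails, all earlier updates are rolled back.
    """
    try:
        for name, delta in changes:
            conn.execute(
                "UPDATE Employee SET salary = salary + ? WHERE name = ?",
                (delta, name),
            )
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()
        return False


def salaries(conn):
    """Return the current {name: salary} mapping."""
    return dict(conn.execute("SELECT name, salary FROM Employee ORDER BY name"))
```

Calling adjust_salaries with a change that would drive a salary below zero violates the CHECK constraint, so the function rolls back and the table is left exactly as it was before the call.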
Netscape API – An Example

    // Connect to the database and check that the connection succeeded
    database.connect("ORACLE", "my_server", "auser_name", "auser_password", "Company_db")
    if (!database.connected())
        write("Error connecting to database")
    else {
        // Set up a cursor for the query; the second parameter makes the cursor updatable
        myCursor = database.cursor("SELECT * FROM Employee", true)
        // Loop over all the records and raise each salary by 5%
        while (myCursor.next()) {
            myCursor.salary = myCursor.salary * 1.05
            myCursor.updateRow("Employee")
        }
        // Release the cursor, then disconnect from the database
        myCursor.close()
        database.disconnect()
    }

ODBC (Open DataBase Connectivity)

ODBC is Microsoft’s SQL-based interface, designed to provide a common way of accessing heterogeneous SQL databases. This interface (built on the C language) provides a high degree of interoperability: a single application can access different SQL DBMSs through a common set of code. This enables a developer to build and distribute a client-server application without targeting a specific DBMS. ODBC has emerged as a de facto industry standard for the following reasons:

Applications are not tied to a proprietary vendor API;
SQL statements can be explicitly included in source code or constructed dynamically at runtime;
An application can ignore the underlying data communications protocols;
Data can be sent and received in a format that is convenient to the application;
ODBC is designed in conjunction with the X/Open and ISO Call-Level Interface (CLI) standards;
ODBC drivers are available today for many of the popular DBMSs.

ODBC Architecture

The ODBC interface defines the following:

A library of function calls that allow an application to connect to a DBMS, execute SQL statements, and retrieve results;
A standard way to connect and log on to DBMSs;
A standard representation of data types;
A standard set of error codes;
SQL syntax based on the X/Open and ISO Call-Level Interface (CLI) specifications.
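The connect, execute, retrieve, and report-errors pattern that a call-level interface standardizes can be sketched in Python. This sketch does not use ODBC itself: it uses the standard-library sqlite3 module as a stand-in data source, because Python’s DB-API plays a comparable role and real ODBC access would need a third-party bridge module. The fetch_all and demo helpers and the Employee table are hypothetical.

```python
import sqlite3


def fetch_all(conn, sql, params=()):
    """Run one SQL statement through the generic cursor interface.

    Only the connect() call is specific to the DBMS. The cursor(),
    execute(), description, and fetchall() calls are common to DB-API
    drivers, although the parameter marker style ("?" here) can differ.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        cur.close()


def demo():
    """Connect, run one good and one bad query, and report both outcomes."""
    conn = sqlite3.connect(":memory:")  # stand-in data source
    try:
        conn.execute("CREATE TABLE Employee (name TEXT, salary REAL)")
        conn.execute("INSERT INTO Employee VALUES ('Ann', 1000.0)")
        ok = fetch_all(conn, "SELECT name, salary FROM Employee")
        try:
            fetch_all(conn, "SELECT * FROM NoSuchTable")
            error = None
        except sqlite3.Error as exc:  # common base class for driver errors
            error = type(exc).__name__
        return ok, error
    finally:
        conn.close()
```

The application code in fetch_all never names the DBMS, which is the point of the common interface: in this sketch, swapping the data source would mean changing only the connect() line.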
The ODBC architecture has four components:

Application – which performs processing and calls ODBC functions to submit SQL statements to the DBMS and to retrieve results from the DBMS.
Driver Manager – which loads and unloads drivers on behalf of an application.
Driver and Database Agent – which process ODBC function calls, submit SQL requests to a specific data source, and return results to the application.
Data source – which consists of the data the user wants to access and its associated DBMS, its host operating system, and network platform, if any.

Windows Applications Use ODBC to Access Databases

Java

Java is a proprietary language developed by Sun Microsystems. Java is rapidly becoming the de facto standard programming language for Web computing. Java is a type-safe, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi-threaded, and dynamic programming language that is interesting because of its potential for building Web applications (applets) and server applications (servlets).

J2EE – Java 2 Enterprise Edition

J2EE is aimed at robust, scalable, multi-user, and secure enterprise applications. The cornerstone of J2EE is Enterprise JavaBeans (EJB), a standard for building server-side components in Java. EJB is a server-side component architecture for the business tier, encapsulating business and data logic. We are particularly interested in two J2EE components:

JDBC
JavaServer Pages

JDBC

Connectivity using ODBC drivers
The pure JDBC platform

SQLJ

SQLJ is a specification for Java with static embedded SQL. SQLJ comprises a set of clauses that extend Java to include SQL constructs. An SQLJ translator transforms the SQLJ clauses into standard Java code that accesses the database through a call-level interface. SQLJ is based on static embedded SQL, while JDBC is based on dynamic SQL (which allows a calling program to compose SQL at runtime).
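The difference between a statement fixed in the source and one composed at runtime can be sketched as follows. The sketch is in Python with the standard-library sqlite3 module, not SQLJ or JDBC, and Python has no SQLJ-style translator, so the checking that SQLJ performs at translation time is not reproduced here. The Employee schema and the names STATIC_QUERY, ALLOWED_COLUMNS, build_query, and make_db are hypothetical.

```python
import sqlite3

# "Static" style: the statement text is fixed in the source; only the
# parameter value changes between calls.
STATIC_QUERY = "SELECT name FROM Employee WHERE salary > ? ORDER BY name"

# Column names cannot be bound as parameters, so a dynamically composed
# statement should only accept names from a known list.
ALLOWED_COLUMNS = {"name", "salary", "dept"}


def build_query(filters):
    """'Dynamic' style: compose the SELECT at runtime from {column: value}."""
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT name FROM Employee"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params


def make_db():
    """Create a small in-memory table to run the queries against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (name TEXT, salary REAL, dept TEXT)")
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?)",
        [("Ann", 1000.0, "IT"), ("Bob", 900.0, "IT"), ("Cy", 900.0, "HR")],
    )
    return conn
```

In both styles the data values travel as bound parameters. Only the shape of the statement (which columns are filtered) is decided at runtime in build_query, and that is the flexibility dynamic SQL provides.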
Java Servlets

Servlets are programs that run on a Java-enabled Web server and build Web pages, analogous to CGI programming. Servlets have a number of advantages over CGI, such as:

Improved performance – with servlets, a lightweight thread inside the JVM handles each request.
Portability – Java servlets adhere to the “write once, run anywhere” philosophy of Java.
Extensibility – Java servlets can utilize Java code from any source and can access the large set of APIs available for the Java platform.
Simpler session management – A typical CGI program uses cookies on either the client or server (or both) to maintain some sense of state or session. Servlets, on the other hand, can maintain state and session identity because they are persistent and all client requests are processed until the servlet is shut down by the Web server.
Improved security and reliability – servlets benefit from the built-in Java security model and inherit Java type safety, making them more reliable.

JSP (JavaServer Pages)

JSP is a Java-based server-side scripting language that allows static HTML to be mixed with dynamically generated HTML. HTML developers can use their normal Web page building tools (for example, Microsoft’s FrontPage or Macromedia’s Dreamweaver) and then modify the HTML file and embed the dynamic content within special tags. JSP works with most Web servers, including Apache HTTP Server and Microsoft Internet Information Server (with plug-ins from IBM’s WebSphere, Live Software’s JRun, or New Atlanta’s ServletExec). The JSP engine transforms JSP tags, Java code, and static HTML content into Java code, which the engine automatically organizes into an underlying Java servlet; the servlet is then automatically compiled into Java bytecode. Thus, when a client requests a JSP page, a generated precompiled servlet does all the work.
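The translate-once, run-many-times idea described above can be sketched in a few lines. This is a toy model in Python, not a JSP engine: it recognizes only expression tags of the form <%= ... %>, treats the expression as Python rather than Java, and performs no HTML escaping. The names translate, get_page, and render are hypothetical.

```python
import re

# Matches an expression tag such as <%= params["user"] %>
_TAG = re.compile(r"<%=\s*(.*?)\s*%>", re.DOTALL)

# Compiled pages, keyed by page name, so translation happens only once
_cache = {}


def translate(template):
    """Turn template text into Python source for a render(params) function."""
    lines = ["def render(params):", "    out = []"]
    pos = 0
    for match in _TAG.finditer(template):
        static = template[pos:match.start()]
        if static:
            # Static HTML becomes a string literal in the generated code
            lines.append(f"    out.append({static!r})")
        # The tag body becomes an expression evaluated on each request
        lines.append(f"    out.append(str({match.group(1)}))")
        pos = match.end()
    if template[pos:]:
        lines.append(f"    out.append({template[pos:]!r})")
    lines.append("    return ''.join(out)")
    return "\n".join(lines)


def get_page(name, template):
    """Translate and compile a page on first request; reuse it afterwards.

    The template is executed as code, so it must come from a trusted
    author, just as a JSP page does.
    """
    if name not in _cache:
        namespace = {}
        exec(compile(translate(template), name, "exec"), namespace)
        _cache[name] = namespace["render"]
    return _cache[name]
```

For instance, get_page("hello.jsp", '<p>Hello, <%= params["user"] %>!</p>') returns a function that produces "<p>Hello, Ann!</p>" when called with {"user": "Ann"}. Later requests for the same page name reuse the cached function instead of translating the template again.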
JSP gives both efficient performance and the flexibility of rapid development with no need to manually compile code.

Microsoft Web Solution Platform

Microsoft Web Solution Platform has been created for building and deploying interoperable Web solutions. It is the precursor to Microsoft’s “.net” – a vision for the third generation of the Internet where “software is delivered as a service, accessible by any device, any time, any place, and is fully programmable and personalizable”. There are various tools, services, and technologies in the platform, such as Windows 2000, Exchange Server, BizTalk Server, Visual Studio, HTML/XML, scripting (JScript, VBScript, …), and components (Java or ActiveX).

Object Linking and Embedding (OLE)

OLE is an object-oriented technology that enables development of reusable software components. Instead of traditional procedural programming, in which each component implements the functionality it requires, the OLE architecture allows applications to use shared objects that provide specific functionality. Text documents, charts, spreadsheets, e-mail messages, graphics, and sound clips all appear as objects to the OLE application. When objects are embedded or linked, they appear within the client application. When the linked data needs to be edited, the user double-clicks the object, and the application that created it is started.

Component Object Model

Component objects are objects that provide services to other client applications. The Component Object Model (COM) is an object-based model consisting of both a specification that defines the interface between objects within a system and a concrete implementation, packaged as a Dynamic Link Library (DLL). COM is a service to establish a connection between a client application and an object and its associated services. COM provides a standard method of finding and instantiating objects, and for the communication between the client and the component.
One of the major strengths of COM lies in the fact that it provides a binary interoperability standard; that is, the method for bringing the client and object together is independent of any programming language that created the client and object.

Distributed Component Object Model (DCOM)

DCOM extends the COM architecture to provide a distributed component-based computing environment, allowing components to look the same to clients on a remote machine as on a local machine. DCOM does this by replacing the interprocess communication between client and component with an appropriate network protocol. DCOM is well suited to the three-tier architecture.

COM+

COM+ provides the basis for Microsoft’s new framework for unifying and integrating the PC and the Internet. The Web Solution Platform is “an architectural framework for building modern, scalable, multi-tier distributed computing solutions, that can be delivered over any network”, which defines a common set of services including components. There are several core components to this architecture:

Active Server Pages (ASP)
ActiveX Data Objects (ADO)

Universal Data Access

The Microsoft ODBC technology provides a common interface for accessing heterogeneous SQL databases. ODBC has many limitations when used as a programming interface. Microsoft packaged Access and Visual C++ with Data Access Objects (DAO). The object model of DAO consists of objects such as Databases, TableDefs, QueryDefs, Recordsets, fields, and properties. Next, Microsoft defined a set of data objects, collectively known as OLE DB (Object Linking and Embedding for Databases), that allows OLE-oriented applications to share and manipulate sets of data as objects. OLE DB provides low-level access to any data source, including relational and non-relational databases, e-mail and file systems, text and graphics, custom business objects, and more. OLE DB is an object-oriented specification based on a C++ API.
Active Server Pages and ActiveX Data Objects

Active Server Pages (ASP) is a programming model that allows dynamic, interactive Web pages to be created on the Web server, analogous to JavaServer Pages (JSP). The pages can be based on what browser type the user has, on what language the user’s machine supports, and on what personal preferences the user has chosen. ASP was introduced with the Microsoft Internet Information Server (IIS) and supports ActiveX scripting, allowing a large number of different scripting engines to be used, within a single ASP script if necessary. Native support is provided for VBScript (the default scripting language for ASP) and JScript.

ASP Architecture

ASP provides the flexibility of CGI without the performance overhead. Unlike CGI, ASP runs in-process with the server, and is multi-threaded and optimized to handle a large volume of users. ASP is built around files with the extension “.asp”, which can contain any combination of the following:

Text;
HTML tags;
Script commands and output expressions.

An ASP script starts to run when a browser requests an “.asp” file from the Web server. The Web server then calls ASP, which reads through the requested file from top to bottom, executes any commands, and sends the generated HTML page back to the browser. It is possible to generate client-side scripts within a server-side generated HTML file by simply including the script as text within the ASP script.

ActiveX Data Objects

ActiveX Data Objects (ADO) is a programming extension of ASP supported by the Microsoft Internet Information Server (IIS) for database connectivity. ADO is designed as an easy-to-use application-level interface to OLE DB.
ADO supports the following key features:

Independently created objects;
Support for stored procedures, with input and output parameters and return parameters;
Different cursor types, including the potential for the support of different back-end-specific cursors;
Batch updating;
Support for limits on the number of returned rows and other query goals;
Support for multiple recordsets returned from stored procedures or batch statements.

Comparison of ASP and JSP

ASP and JSP are designed to enable developers to separate page design from programming logic through the use of callable components, and both provide an alternative to CGI programming that simplifies Web page development and deployment.

Platform and Server Independence – JSP conforms to the “Write Once, Run Anywhere” philosophy of the Java environment. Thus, JSP can run on any Java-enabled Web server and is supported by a wide variety of vendor tools. In contrast, ASP is primarily restricted to Microsoft Windows-based platforms.
Extensibility – Although both technologies use a combination of scripting and tagging to create dynamic Web pages, JSP allows developers to extend the JSP tags available.
Reusability – JSP components (JavaBeans, EJB, and custom tags) are reusable across platforms. For example, an EJB component can access distributed databases across a variety of platforms (e.g., UNIX, Windows).
Security and reliability – JSP benefits from the built-in Java security model and the inherent Java type safety, making JSP potentially more reliable.

Microsoft Access and Web Page Generation

Microsoft Access 2000 provides three wizards for automatically generating HTML pages based on tables, queries, forms, or reports in the database:

Static pages – the user can export data to HTML format.
Dynamic pages, using Active Server Pages – with this approach, the user can export data to an .asp file on the Web server by specifying the name of the current database, a username and password to connect to the database, and the URL of the Web server that will store the ASP file.
Dynamic pages, using data access pages – Data access pages are Web pages bound directly to the data in the database. Data access pages can be used like Access forms, except that these pages are stored as external files, rather than within the database or database project. Data access pages are written in dynamic HTML (DHTML), an extension of HTML that allows dynamic objects as part of the Web page. Unlike ASP files, a data access page is created within Access using a wizard or in design view, employing many of the same tools that are used to create Access forms.

Microsoft .net

Microsoft is launching a platform called .net – a vision for the next generation of the Internet. This vision is motivated by a shift from individual Web sites to clusters of computers, devices, and mechanisms that collaborate to provide improved user services. The intention is to allow people to control how, when, and what information is delivered to them. Among the components in this new platform are ASP.net and ADO.net:

ASP.net – the next version of ASP, reengineered to improve performance and scalability.
ADO.net – the next version of ADO, with new classes that expose data access services to the programmer.

Oracle Internet Platform

The Oracle Internet Platform, comprising Oracle Internet Application Server (iAS) and the Oracle DBMS, is aimed particularly at providing extensibility for distributed environments. It is an n-tier architecture based on industry standards such as:

HTTP and HTML/XML for Web enablement.
The Object Management Group’s CORBA technology for manipulating objects.
Internet Inter-ORB Protocol (IIOP) for object interoperability and Remote Method Invocation (RMI).
Java, Enterprise JavaBeans (EJB), JDBC and SQLJ for database connectivity, Java servlets, and JSP. It also supports Java Message Service (JMS) and Java Naming and Directory Interface (JNDI), and it allows stored procedures to be written in Java.

Oracle Internet Application Server (iAS)