CS4273: Distributed System Technologies and Programming I

Lecture 1: Introduction to WWW

WWW, HTTP and Java

• The World Wide Web (WWW) is an architecture for accessing linked documents all over the Internet.
• The earliest web system was developed in 1989 by a group of physics researchers at CERN (the European Organization for Nuclear Research) for the easy distribution of information such as reports, blueprints, photos, etc. The first public demo was given at the Hypertext '91 conference in 1991.
• A graphical web browser, Mosaic, was released in Feb 1993 (by NCSA). One of its authors left the group to form Netscape Communications Corp., which went public in 1995 (and was sold to America Online in 1998).
• A WWW consortium (W3C) was set up in 1994, aiming at further developing the Web, standardizing protocols, etc. Its home page: http://www.w3.org

Architecture of WWW

• A typical client/server model.
• Client: a web browser that speaks HTTP and understands hypertext.
• Server: a server process listening on TCP port 80 (the port for HTTP servers) for incoming requests. It keeps hypertext (or hypermedia) files for clients to retrieve.

[Figure: a client following hyperlinks across the Internet to servers, each with its own disk]

How does a web page get displayed in your web browser?

When you click the link http://www.cs.cityu.edu.hk/~jia/cs4273/cs4273.html:
1. The browser gets the URL from your input.
2. The browser asks DNS for the IP address of www.cs.cityu.edu.hk.
3. DNS replies with 144.214.120.3.
4. The browser makes a TCP connection to port 80 on 144.214.120.3.
5. It sends a GET /~jia/cs4273/cs4273.html request (HTTP protocol).
6. The HTTP server on 144.214.120.3 sends the file cs4273.html back to the client.
7. The browser processes the format of cs4273.html and displays it.
8. The browser fetches and displays the images embedded in cs4273.html one by one, using GET.
9. The TCP connection closes after a timeout.
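Steps 2 to 6 above can be sketched in a few lines of Java. This is only an illustrative sketch, not how a real browser is written: the class name FetchPageSketch and its helper methods are made up for this example, the "Connection: close" header is an added assumption (it asks the server to close the connection right away instead of waiting for the timeout in step 9), and the server named in the slides may no longer answer plain HTTP on port 80.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class FetchPageSketch {

    // The local file name part of the URL; an empty path means the root page "/".
    static String pathOf(URI uri) {
        String path = uri.getRawPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }

    // Step 5: the ASCII text of the GET request sent over the TCP connection.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: do not keep the connection open
             + "\r\n";                   // an empty line ends the request headers
    }

    // Steps 2 to 6: look up the host, connect, send GET, read the first reply line.
    static String fetchStatusLine(String url) throws IOException {
        URI uri = URI.create(url);
        String host = uri.getHost();
        int port = (uri.getPort() == -1) ? 80 : uri.getPort();   // 80 is the HTTP default

        InetAddress address = InetAddress.getByName(host);       // steps 2-3: DNS lookup
        try (Socket socket = new Socket(address, port)) {        // step 4: TCP connection
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, pathOf(uri))
                    .getBytes(StandardCharsets.US_ASCII));        // step 5: send GET
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();                                 // step 6: status line of the reply
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No URL given: just show the request text, without touching the network.
            System.out.print(buildRequest("www.cs.cityu.edu.hk", "/~jia/cs4273/cs4273.html"));
        } else {
            System.out.println(fetchStatusLine(args[0]));
        }
    }
}
```

Run without arguments, the program only prints the request text that step 5 would send; given a URL as its argument, it performs the DNS lookup and the TCP exchange and prints the status line the server returns.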
[Figure: the client, the DNS server, and the web server]

HTTP (Hyper Text Transfer Protocol)

HTTP is an ASCII protocol, so you can use telnet to try HTTP commands:

    slx1% telnet www.cs.cityu.edu.hk 80
    Trying 144.214.120.3...
    Connected to www.cs.cityu.edu.hk.
    Escape character is '^]'.
    GET /~jia/cs4273/cs4273.html HTTP/1.1 <cr>
    Host: www.cs.cityu.edu.hk <cr>
    <cr>

HTTP (Cont.)

    HTTP/1.1 200 Document follows
    Date: Mon, 24 Mar 1997 02:08:05 GMT
    Server: NCSA/1.4.2
    Content-type: text/html
    Last-modified: Mon, 24 Mar 1997 02:07:11 GMT
    Content-length: 380

    <html>
    <head> <title> Xiaohua Jia's Home Page</title> </head>
    <body>
    <li> DSc in Information Science, The Univ of Tokyo, Japan, 1991
    <li> Msc in Computer Science, The Univ of Sci & Tech of China, 1986
    </body>
    </html>
    Connection closed by foreign host.
    slx1%

HTTP Built-in Methods

    Method    Description
    GET       request to read a web page
    HEAD      request to read a web page’s header information
    PUT       request to store the content of a web page
    POST      append the content to a named web page
    DELETE    remove the web page
    LINK      connect two existing resources
    UNLINK    break the existing link

URL (Uniform Resource Locators)

• The browser needs to access different types of data all over the Internet. To retrieve a piece of information from the Internet, the browser needs to specify:
  – what the data is (naming)
  – where it is (locating)
  – what protocol the server speaks
• A URL effectively serves as a page’s worldwide name.
  It has 3 parts:
  – protocol name
  – DNS name (server’s location)
  – local file name
• http://www.cs.cityu.edu.hk/~jia is the same as: http://www.cs.cityu.edu.hk/home/lec/jia/www/index.html

Protocols in URL

    Name      Used for             Example
    http      hypertext            http://www.cs.cityu.edu.hk/~jia
    ftp       FTP                  ftp://ftp.cs.vu.nl/pub/minix/README
    file      local file access    file:///C/My Documents/java/fox6.jpg
    news      newsgroup            news:comp.os.minix
    gopher    Gopher               gopher://gopher.tc.umn.edu/lib
    mailto    sending mail         mailto:kim@acm.org

HTML (Hyper Text Markup Language)

• By embedding standard markup commands within an HTML file, browsers can display web pages in nice formats.
• It is different from word processing software, such as MS Word, which is “What You See Is What You Get”.
• HTML is an application of ISO standard 8879, SGML (Standard Generalized Markup Language).
• You can view the source file of a displayed web page, and you can also save it and use it as a template to write your own web page!

For more about HTML, see http://www.ncsa.uiuc.edu/General/Internet/WWW

Example of a home page in HTML

    <html>
    <head>
    <TITLE>A Simple HTML Example</TITLE>
    </head>
    <body>
    <H1>HTML is Easy To Learn</H1>
    <P>Welcome to the world of HTML. This is the first paragraph. While short it is still a paragraph!</P>
    <P>And this is the second paragraph.</P>
    </body>
    </html>

Main Features of HTML

• HTML is static, only for displaying information.
• HTML can load image files.
• HTML has fill-out forms to collect information from clients and pass it back to the server (a CGI program) for processing.
• There are many tools to help you design web pages, such as FrontPage, Dreamweaver, etc.

Limitations of HTML

• Lack of extensibility. It does not allow you to define your own tags or attributes.
• Lack of structure or data description. It has no schema description and no support for checking structural validity.
• Mixture of data & display formats.
  This makes data search difficult and data transfer inefficient (formatting statements are transmitted redundantly).

XML (Extensible Markup Language)

• XML was developed to enhance data description and data exchange over the Internet.
• XML is an intermediate language for data exchange. It is NOT a language for end-presentation. An XML document needs to be converted into other languages for end-presentation, such as HTML, WML (Wireless ML), CML (Commerce ML), MathML (Mathematical ML), SpeechML, etc.
• Both XML and HTML come from SGML. HTML is an “application” of SGML, while XML is a subset of SGML. XML contains about 20% of SGML’s syntax but has over 80% of its power.
• XML is a meta language, which allows you to define your own markup language.

A Silly Example of Using XML

silly.html, embedding XML tags in an HTML file:

    <html>
    <body>
    <xml id="cdcat">
      <CATALOG>
        <CD>
          <TITLE>Empire Burlesque</TITLE>
          <ARTIST>Bob Dylan</ARTIST>
        </CD>
        <CD>
          <TITLE>Hide your heart</TITLE>
          <ARTIST>Bonnie Tyler</ARTIST>
        </CD>
      </CATALOG>
    </xml>
    <table border="1" datasrc="#cdcat" align="center">
      <tr>
        <td><span datafld="ARTIST"></span></td>
        <td><span datafld="TITLE"></span></td>
      </tr>
    </table>
    </body>
    </html>

Demo at http://www.cs.cityu.edu.hk/~jia/xml/silly.html

A More Sensible Example of Embedding XML in HTML

XML separates the data from the HTML (formatting statements). In the following example, the XML data are in a separate file “CDdata.xml”.

    <html>
    <body>
    <xml id="cdcat" src="CDdata.xml"></xml>
    <table border="1" datasrc="#cdcat" align="center">
      <tr>
        <td><span datafld="ARTIST"></span></td>
        <td><span datafld="TITLE"></span></td>
        <td><span datafld="YEAR"></span></td>
      </tr>
    </table>
    </body>
    </html>

Demo at http://www.cs.cityu.edu.hk/~jia/xml/CDdata.html

XML document

An XML document usually consists of three parts (files):
• Document Type Definition (DTD): defines the logical structure and storage layout.
  – See an example in www/xml/book.dtd
• Entities (XML): entities are the data, whose types are defined in the .dtd file. The XML file has a pointer to the .xsl file used to display it.
  – See an example in www/xml/book.xml
• Display format (XSL): a program to process and display XML data.
  – See an example in www/xml/book.xsl

XSL (eXtensible Style-sheet Language)

XSL was proposed to specify the display format of an XML document:
• XSL is almost a fully fledged programming language, which allows you to program the processing / formatting of XML data. It has instructions such as apply-templates, for-each, value-of, etc.
• XSL files take the extension “.xsl”. The same set of XML data can have multiple “.xsl” files for different display formats.
• XML data is separated from the display format, which makes transmission of XML data efficient and clean (easy to search & read).

See demo at
  – http://www.cs.cityu.edu.hk/~jia/xml/book.xml
  – a different display style in book1.xsl

Web Programming Technologies at Client – Server Sides

Client side technologies:
  – JavaScript/JScript
  – VBScript
  – Java Applet

Server side technologies:
  – ASP (Active Server Page)
  – Java Servlet / JSP (JavaServer Page)
  – CGI (in C, C++, Java, Perl, etc.)
  – Perl (Practical extraction & report lang) Script

Client Side Technology – JavaScript

• JavaScript was originally created by Netscape. JScript is Microsoft’s version of JavaScript.
• It is a script language, used in HTML documents and executed in a web browser. IE and Netscape contain a JavaScript interpreter.
• It is a fully powered programming language, which has complex data types, control structures, function definitions, and event handling.
• It is the de facto standard client side script language.

Demo at http://www.cs.cityu.edu.hk/~jia/JavaScript/Janimation.html or JScriptTest.html

Client Side Technology – VBScript

• VBScript (Visual Basic Script) is a subset of Microsoft Visual Basic.
• VBScript is particularly valuable when it is used with Microsoft ASP (Active Server Pages), a server side technology that creates dynamic content sent to the client’s browser.
• VBScript is the de facto language for ASP.

Demo at http://www.cs.cityu.edu.hk/~jia/ASP/VBScriptTest.html

An Example of Server-side Technology: ASP

    <%@ Language = "VBScript" %>
    <html>
    <head> <title> An ASP Example </title> </head>
    <body>
    <form name="form1" method="POST">
      <input name="TextBox1" type="text" id="TextBox1" />
      <% ss = Request.Form("TextBox1") %>
      <input name="TextBox2" type="text" id="TextBox2" value="<%= ss %>" />
      <input type="submit" name="Button1" value="Button" id="Button1" />
    </form>
    </body>
    </html>

N.B. ASP statements are enclosed by <% %>.

Demo at http://msec2.cs.cityu.edu.hk/jia/ASP/ASPtest.asp

Client Scripting vs. Server Scripting

• Client-side scripting interacts directly with end-users, which reduces the number of trips to the server and makes use of the browser’s functions for display control.
• Client scripting is browser dependent.
• Client scripts are viewable.
• Server-side scripts, e.g. ASP, are executed on the server, which generates responses sent to the browser for display (the web server must have the script parser).
• When a client requests an ASP file, it is parsed by an ActiveX component, and the scripting code (i.e., ASP statements) is executed as it is encountered.
• Each time there is a client event, e.g., a button click or data input, the browser asks the server to process the event and send back the new display file.

Java Applet (a Client Side Technology)

Java Architecture

• Java is a programming language which is object-oriented, platform-independent, interpreted, multithreaded, fast, secure, robust, …
• A Java source program is compiled to Java byte-code.
• A Java virtual machine sits on each platform, interpreting Java byte-code into local machine instructions and executing them.
[Figure: Java source code → Java compiler → Java byte-code (applets) → Java virtual machine → native machine instructions]

An Easy Start of Java

Edit a Java program in the file “helloJava.java”:

    class helloJava {
        // program entry point
        public static void main(String args[]) {
            System.out.println("Hello Java!");
        }
    }

Compile the program:

    slx1% javac helloJava.java

It generates a file “helloJava.class”, which is Java byte-code (note: the .class file takes the same name as the .java program file).

Run the Java interpreter to execute it:

    slx1% java helloJava      // no need for the extension name .class

An Easy Start of Java Applet

Two kinds of Java programs:
• Ordinary application programs: run independently as an application.
• Applets: require a supporting environment like IE or AppletViewer.

Our First Java Applet

Edit the applet program in a file “helloApplet.java” (note: the file name must be the same as the applet class name in the program!):

    import java.awt.*;
    import javax.swing.*;

    public class helloApplet extends JApplet {
        // called by the supporting environment to draw the applet
        public void paint(Graphics g) {
            // draw the text with its baseline at x = 5, y = 50
            g.drawString("Hello Java Applet!", 5, 50);
        }
    }

Compile the program:

    slx1% javac helloApplet.java

It produces a file “helloApplet.class”.

Embed the applet into an HTML file (the height must exceed the y-coordinate 50 used in drawString, or the text falls outside the applet area):

    ….
    <applet code="helloApplet.class" width=150 height=60></applet>

Demo at http://www.cs.cityu.edu.hk/~jia/java/gui/test.html