What is the Web?
The World Wide Web is a system of Internet servers through which several Internet protocols can be accessed using a single interface. Most of the common Internet protocols are available through the Web, which provides a user-friendly environment for accessing many services. The Web can also work with multimedia and advanced programming languages.

The main protocol used by the Web to transfer information is the HyperText Transfer Protocol (HTTP). Hypertext documents contain links that connect to other documents or files. The user can activate these links or 'hot spots' (through a mouse button click, for example), and the target document is then transferred to the client machine. These 'hot spots' are created using the HyperText Markup Language (HTML), which can turn a picture, text, etc. into a hyperlink. It is because of these links that it is called the Web: an interconnecting web of hypertext documents.

Internet and Web – Client/Server Architecture
The Internet depends on the client-server architecture. Your computer runs software called the client, which interacts with another piece of software known as the server, located on a remote computer. The client is usually a browser. Browsers interact with the server using sets of rules called protocols. These protocols ensure that requests from the browser and responses from the server are transferred accurately.

An example of client-server interaction is as follows: the client (browser) requests an HTML file stored on the remote machine through the server software. The server locates this file and passes it to the client. The client then displays this file on your machine. In this case, the HTML page is static. Static pages do not change until the developer modifies them.

There are many protocols available on the Internet. The World Wide Web, which is a part of the Internet, brings all these protocols under one roof.
You can thus use HTTP, FTP, Telnet, email, etc. through your browser.

Example: Suppose you have requested an HTML document from a remote computer using a web browser. The browser searches for the remote computer and, when it locates it, passes the request to a program called the server running on this distant computer. The server then examines your request and attempts to locate the HTML file on its hard disk. When the file is located, the server sends it to your computer. If this HTML document has embedded image, video, and/or sound files, the content of those files is also passed to the browser.

When the data is received from the server, the client (a browser) displays the HTML page. The client is completely responsible for the display of the web page, with no involvement from the server's side. Once the server has sent the data to the requesting computer, it is finished with the interaction. When all requested data has been received, the client-server connection is closed. Thus, the next time this client asks for some information from the server, the server will treat it as a new request without any recollection of previous requests. This means that client-server interaction is "stateless", with every new request generating a new response.

Transferring Data on the Web
In order to view a site on the web, the web page you wish to view must first be transferred to your computer. This transfer usually starts with some sort of event. Events can come from different sources. For example, when you launch your web browser and enter a web address, or click on a hyperlink, you generate an event that transfers data from a server to your computer. Other events can be generated by the instructions in a program. For example, the various programs that help with uploading and downloading data generate such events.

What is a Protocol?
Definition: In simple terms, protocols are “Agreed-upon methods of communications used by computers.”
OR
“When data is being transmitted between two or more devices something needs to govern the controls that keep this data intact. A formal description of message formats and the rules two computers must follow to exchange those messages. Protocols can describe low-level details of machine-to-machine interfaces (e.g., the order in which bits and bytes are sent across wire) or high-level exchanges between application programs (e.g., the way in which two programs transfer a file across the Internet).” (Source: http://www.ichnet.org/glossary.htm)

Internet Protocols
Internet protocols (sets of rules) are used to transfer files or data from one machine to another. All computers on the Internet communicate with each other using the Transmission Control Protocol / Internet Protocol (TCP/IP). Thus, data is sent from the server to the client (and vice versa) using TCP/IP. In most cases, the client is your browser and the server is a program running on a different computer. You use the browser on your computer (called the client machine in Internet lingo) to access the information on another computer (called the server machine). This server machine can be located anywhere in the world.

Other protocols used on the Internet include:
File Transfer Protocol (FTP): used in FTP applications, primarily to upload and download files.
HyperText Transfer Protocol (HTTP): employed on the World Wide Web.
Telnet: allows you to connect to another machine. Once connected, your computer behaves like a terminal of the distant machine, and you can use all the resources on it if you have the required permissions.
Simple Mail Transfer Protocol (SMTP): used for email.

Internet Programming Languages
Although HTML is the most widespread language used on the web, many programming languages are used in combination with HTML.
As the web has increased in size, so has the demand for more complex programs and wider ranges of applications. Because of this, a number of tools and languages are becoming standards on the web. These include:

1. CGI (Common Gateway Interface)
This allows the web server software to communicate with other programs running on the server. These external programs are called CGI scripts or CGI programs and are usually written in Perl or C. CGI programs are generally used to process information submitted by a visitor via a form on a web page.

2. JavaScript/JScript/VBScript
JavaScript is a programming language that runs in the browser.
NOTE: JavaScript is not a subset of Java. In fact, the two languages share little in common (yes, they share the basic concepts, but the syntax is different).
JavaScript runs in the browser (client) and does not require any server software. Thus, it is a client-side scripting language. Since all execution takes place in the browser, JavaScript is responsible for most of the interactivity on a web page. Form validation, image or text color changes on mouseover, and mouse trails are all possible through JavaScript.

3. Java
Developed by Sun Microsystems, Java is a powerful, object-oriented language. Many platform dependency issues have been ironed out with the advent of Java. Java is most commonly seen on the Internet in the form of applets embedded in an HTML page. Applets are small Java programs that run in a Java-compatible browser.

4. ASP
Active Server Pages is a technology promoted by Microsoft. ASP uses special tags, which can be embedded in the HTML code, to generate dynamic web pages. ASP scripts run on the server, typically IIS on Windows NT. ASP pages carry the .asp extension, which differentiates them from plain HTML pages and instructs the web server to pass the pages through the ASP interpreter.

5. PHP
PHP is a server-side scripting language similar to ASP.
PHP code is embedded inside the HTML page and can link to databases to generate dynamic HTML content.

6. XML
The eXtensible Markup Language is a markup language that enables programmers to create customized tags. These customized tags can provide much-needed functionality not available with HTML. XML documents can be accessed using JSP, PHP, etc.

URLs - What is a URL?
URL stands for Uniform Resource Locator, which means it is a uniform (same everywhere) way to locate a file (resource) on the Internet. The URL specifies where a file is located and how to get it. Every file on the Internet has a unique address. Web software, such as your browser, uses the URL to retrieve a file from the computer on which it resides.

The actual address of that computer is a set of four numbers separated by periods (139.179.40.4), known as an IP address. These octets are difficult for web users to remember, so many (not all) numeric addresses can be represented by an alphanumeric (text and numbers) name, for example www.ctp.bilkent.edu.tr. The Internet Domain Name System (DNS) translates the alphanumeric address to a numeric value.

URL Format: Protocol://site address/path/filename
Example: http://www.ctp.bilkent.edu.tr/~russell/outline106.htm

Taking a second address, http://www.simplygraphix.com/portfolio/4.html, the structure of the URL is:
Protocol: http
Host computer name: www
Domain name: simplygraphix
Domain type: com
Path: /portfolio
File name: 4.html

As mentioned before, the protocol does not have to be http; it can also be ftp, file, mailto, https, telnet, etc.

Site address: The site address consists of the host computer name, the domain name and the domain type. The domain name should be descriptive for easy comprehension and is usually the name of the organization or company. There are various domain types.
Some of them are listed below:
com: commercial entities
net: networks or network providers
org: organizations (usually non-profit)
edu: colleges and universities (education providers)
gov: government agencies
mil: military entities of the United States of America

For countries other than the U.S.A., the URL can be longer. The general format of such URLs is: machine name.domain name.domain type.country code
This represents a more localized domain name. The country code is a two-letter extension standardized by the International Organization for Standardization as ISO 3166. Some country codes are given below:
tr: Turkey
de: Germany
ca: Canada
jp: Japan
uk: United Kingdom

Domain types can also be different for different countries. For example, an educational site can have the domain name www.school.ac.uk in the United Kingdom. Thus ac (academic) is used instead of edu. Similarly, com is represented as co for Indian domain names.

Path name: The path name specifies the hierarchic location of the file on the computer. For instance, in http://www.ctp.bilkent.edu.tr/~russell/outline106.htm the file outline106.htm is located in the russell subdirectory under the server's root directory.

Port: Browsers communicate with the server using entry points called ports. A port is the name given to an endpoint of a logical connection. Port numbers identify types of ports. Each protocol has an associated default port number; HTTP, for example, defaults to port 80. The server administrator can configure the server to handle HTTP requests at a different port. In such cases, the port number has to be supplied as a part of the URL. The port number is placed after the site address, following a colon:
www.some-address.com:50
In this example, if the port number is omitted, any HTTP requests are directed to port 80.
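The URL parts described above (protocol, site address, port, path and file name) can be pulled apart with Python's standard urllib.parse module. The following is only a minimal sketch: the describe_url helper and the DEFAULT_PORTS table are hypothetical names introduced for illustration, and the http:// prefix on the second address is an assumption added so that the parser can recognize the host and port.

```python
from urllib.parse import urlsplit

# Illustrative subset of default ports; this table is an assumption
# made for the sketch, not something a browser exposes.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23}

def describe_url(url):
    """Split a URL into the parts named in the text (hypothetical helper)."""
    parts = urlsplit(url)
    # parts.port is None when the URL carries no explicit port number,
    # so fall back to the protocol's default port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    # Everything before the last "/" is the path; the rest is the file name.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "site_address": parts.hostname,
        "port": port,
        "path": directory or "/",
        "filename": filename,
    }

print(describe_url("http://www.ctp.bilkent.edu.tr/~russell/outline106.htm"))
print(describe_url("http://www.some-address.com:50"))
```

The first call has no port in the URL, so the sketch falls back to 80; the second call reports the explicit port 50.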
Common port numbers for other protocols include:
21 FTP
23 Telnet
25 Simple Mail Transfer Protocol (SMTP)
53 Domain Name System (DNS)
80 HyperText Transfer Protocol (HTTP)
107 Remote Telnet Service (rtelnet)
110 Post Office Protocol – Version 3 (POP3)

HTTP protocol - What is HTTP?
Computers on the World Wide Web use the HyperText Transfer Protocol to talk with each other. HTTP provides a set of rules for accurate information exchange. The communication between the client (your browser) and the server (software running on a remote computer) involves requests sent by the client and responses from the server.

Each client-server message, whether a request or a response, consists of three main parts:
1. A request or response line
2. Header information
3. The body

A client connects to the server at port 80 (unless it has been changed by the system administrator) and sends its request. The request line from the client consists of a request method, the address of the file requested and the HTTP version number.
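To make the three-part structure concrete, here is a minimal sketch in Python that assembles a request as plain text and then reads the request line back. Nothing is sent over the network. The function names build_request and parse_request_line are hypothetical, and the HTTP/1.1 version string and the Host header are assumptions chosen for illustration rather than details taken from the text above.

```python
def build_request(method, path, host, version="HTTP/1.1", body=""):
    """Assemble the three parts of a request: request line, headers, body."""
    # 1. Request line: method, address of the requested file, HTTP version.
    request_line = f"{method} {path} {version}"
    # 2. Header information (only a minimal, assumed set of headers).
    headers = [f"Host: {host}"]
    if body:
        headers.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # 3. Body: a blank line separates the headers from the body.
    return "\r\n".join([request_line, *headers, "", body])

def parse_request_line(message):
    """Return (method, path, version) from the first line of a request."""
    request_line = message.split("\r\n", 1)[0]
    method, path, version = request_line.split(" ", 2)
    return method, path, version

request = build_request("GET", "/~russell/outline106.htm", "www.ctp.bilkent.edu.tr")
print(request)
print(parse_request_line(request))
```

A GET request like this one has an empty body, so the message ends right after the blank line that closes the headers.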