GEC Group of Colleges, Dept. of Computer Sc. & Engg. and IT
Teaching Notes, CS-802 Web Engineering
Prepared by Ms. Priyashree Sharma

WEB ENGINEERING (CS-802)

UNIT-1
Web Engineering: Introduction, History, Evolution and Need, Timeline, Motivation, Categories & Characteristics of Web Applications, Web Engineering Models, Software Engineering v/s Web Engineering.
World Wide Web: Introduction to TCP/IP and WAP, DNS, Email, Telnet, HTTP and FTP.
Browsers and search engines: Introduction, Search fundamentals, Search strategies, Directory search engines and Meta search engines, Working of search engines.
Web Servers: Introduction, Features, Caching, Case study: IIS, Apache.

UNIT-2
Information Architecture: Role, Collaboration and Communication, Organizing Information, Organizational Challenges, Organizing Web site parameters and Intranets.
Website Design: Development, Development phases, Design issues, Conceptual Design, High-Level Design, Indexing the Right Stuff, Grouping Content, Architectural Page Mockups, Design Sketches, Navigation Systems, Searching Systems, Good & bad web design, Process of Web Publishing, Web site enhancement, Submission of website to search engines.
Web security: Issues, Security audit.
Web effort estimation, Productivity Measurement, Quality, Usability and Reliability.
Requirements Engineering for Web Applications: Introduction, Fundamentals, Requirement Sources, Types, Notations, Tools, Principles, Requirements Engineering Activities, Adapting RE Methods to Web Applications.

UNIT-3
Technologies for Web Applications I: HTML and DHTML: Introduction, Structure of documents, Elements, Linking, Anchor Attributes, Image Maps, Meta Information, Image Preliminaries, Layouts, Backgrounds, Colors and Text, Fonts, Tables, Frames and Layers, Audio and Video with HTML, Database integration, CSS, Positioning with Style Sheets, Form Controls, Form Elements. Introduction to CGI, Perl, JavaScript, JSP, PHP, ASP & AJAX.
Cookies: Creating and Reading.

UNIT-4
Technologies for Web Applications: XML: Introduction, HTML vs XML, Validation of documents, DTD, Ways to use XML for data files, Embedding XML into HTML documents, Converting XML to HTML for display, Displaying XML using CSS and XSL, Rewriting HTML as XML, Relationship between HTML, SGML and XML, Web personalization, Semantic web, Semantic Web Services, Ontology.

UNIT-5
E-Commerce: Business Models, Infrastructure, Creating an E-commerce Web Site, Environment and Opportunities, Modes & Approaches, Marketing & Advertising Concepts. Electronic Publishing: Issues, Approaches, Legalities and Technologies. Secure Web documents, Digital Signatures and Firewalls, Cyber crime and laws, IT Act. Electronic Cash, Electronic Payment Systems: RTGS, NEFT, Internet Banking, Credit/Debit Cards. Security: Digital Certificates & Signatures, SSL, SET, 3D Secure Protocol.

INDEX
1. INTRODUCTION OF WEB ENGINEERING
2. EVOLUTION OF WEB ENGINEERING
3. NEED OF WEB ENGINEERING
4. CATEGORIES OF WEB APPLICATIONS
5. CHARACTERISTICS AND COMPLEXITY OF WEB APPLICATIONS
6. WEB ENGINEERING MODELS
7. SOFTWARE ENGINEERING (SWE) V/s WEB ENGINEERING (WEBE)
8. WORLD WIDE WEB (WWW)
9. TCP/IP PROTOCOL
10. WAP (WIRELESS APPLICATION PROTOCOL)
11. DOMAIN NAME SPACE
12. E-MAIL (ELECTRONIC MAIL)
13. TELNET
14. HYPERTEXT TRANSFER PROTOCOL
15. FILE TRANSFER PROTOCOL
16. BROWSERS AND SEARCH ENGINES
17. SEARCH FUNDAMENTALS
18. DIRECTORY SEARCH ENGINES AND META SEARCH ENGINES
19. WORKING OF SEARCH ENGINES
20. WEB SERVERS
21. CACHING
22. IIS
23. APACHE

INTRODUCTION
Web engineering actively promotes systematic, disciplined and quantifiable approaches towards the successful development of high-quality, ubiquitously usable Web-based systems and applications.
Web engineering focuses on the methodologies, techniques and tools that are the foundation of Web application development and that support its design, development, evolution, and evaluation. Web application development has certain characteristics that make it different from traditional software, information system, or computer application development. Web engineering is multidisciplinary and encompasses contributions from diverse areas: systems analysis and design, software engineering, hypermedia/hypertext engineering, requirements engineering, human-computer interaction, user interface design, information engineering, information indexing and retrieval, testing, modelling and simulation, project management, and graphic design and presentation. Web engineering is neither a clone nor a subset of software engineering, although both involve programming and software development. While Web engineering uses software engineering principles, it encompasses new approaches, methodologies, tools, techniques, and guidelines to meet the unique requirements of Web applications.
Proponents of Web engineering supported its establishment as a discipline at an early stage of the Web. The first Workshop on Web Engineering was held in conjunction with the World Wide Web Conference in Brisbane, Australia, in 1998. San Murugesan, Yogesh Deshpande, Steve Hansen and Athula Ginige, from the University of Western Sydney, Australia, formally promoted Web engineering as a new discipline in the first ICSE workshop on Web Engineering in 1999. Since then they have published a series of papers in a number of journals, conferences and magazines to promote their view, and have gained wide support.
The major arguments for Web engineering as a new discipline are:
• The development process for Web-based Information Systems (WIS) is different and unique.
• Web engineering is multi-disciplinary; no single discipline (such as software engineering) can provide a complete theory basis, body of knowledge and practices to guide WIS development.
• There are issues of evolution and lifecycle management when compared to more 'traditional' applications.

EVOLUTION OF WEB ENGINEERING
Web development within an organization depends on several factors. The motivation depends on the initial purpose of using the Web (a Web 'presence' or becoming a Web-based organization), the customers' expectations and the competitive environment. The drive to systematize development is subject to the overall perception of the Web and conscious policy decisions within the organization. For example, a low-level perception of the Web is likely to lead to ad hoc, sporadic efforts. As a starting point in understanding the problem domains that the Web can currently address, Table 3 (the Categories of Web Applications table below) presents a taxonomy of Web applications, updated after Ginige and Murugesan. The order of these categories roughly illustrates the evolution of Web applications, and organizations that started their Web development early may have followed a similar order in the past. Although it is possible to start Web development with applications in any category, this table has been useful for explaining to organizations with a modest presence on the Web how they might improve or benefit from incremental exposure, thus keeping risks to a minimum.

NEED FOR WEB ENGINEERING
The need for Web Engineering is felt (or dismissed) according to the perceptions of developers and managers, their experiences in creating applications made feasible by the new technologies, and the complexity of Web applications. In the early stages of Web development, White and Powell identified and emphasized the need for engineering, as in Web Document Engineering and Web Site Engineering. Web Engineering, more generally, explicitly recognizes the fact that good development requires multidisciplinary effort and does not fit neatly into any of the existing disciplines.
CATEGORIES OF WEB APPLICATIONS

CATEGORY: EXAMPLES
Informational: Online newspapers, product catalogues, newsletters, service manuals, classifieds, e-books
Interactive (user-provided information, customized access): Registration forms, customized information presentation, games
Transaction: E-shopping, ordering goods and services, banking
Workflow: Planning and scheduling systems, inventory management, status monitoring
Collaborative work environments: Distributed authoring systems, collaborative design tools
Online communities, marketplaces: Chat groups, recommender systems, marketplaces, auctions
Web portals: Electronic shopping malls, intermediaries
Web services: Enterprise applications, information and business intermediaries

CHARACTERISTICS AND COMPLEXITY OF WEB APPLICATIONS
Web applications vary widely: from small-scale, short-lived services to large-scale enterprise applications distributed across the Internet and corporate intranets. Over the years, Web applications have evolved and become more complex: they range from simple, read-only applications to full-fledged information systems. This complexity may be in terms of performance (number of hits per second, as with the Slashdot site and Olympics sites receiving hundreds of thousands of hits per minute), in terms of the dynamic nature of information, in the use of multimedia, or in other ways. They may provide vast, dynamic information in multiple media formats (graphics, images and video) or may be relatively simple. Nevertheless, they all demand a balance between information content, aesthetics and performance.
Characteristics of Web Applications

Simple Web-based systems:
• Primarily textual information in non-core applications
• Fairly static information content
• Simple navigation
• Infrequent access or limited usefulness
• Limited interactivity and functionality
• Standalone system
• Developed by a single individual or by a very small team
• Minimal security requirements (because of a mainly one-way flow of information)
• Feedback from users either unnecessary or not sought
• Web site mainly an identity for the current clientele, not a medium of communication
• Easy to create

Advanced Web-based systems:
• Dynamic Web pages, because information changes with time and users' needs
• Large volume of information
• Difficult to navigate and find information
• Integrated with databases and other systems
• Deployed in mission-critical applications
• Prepared for seamless evolution
• May require a larger development team with expertise in diverse areas
• Calls for risk and security assessment and management
• Needs configuration control and management
• Necessitates project planning and management
• Requires a sound development process and methodology

WEB ENGINEERING MODELS
There are various types of web engineering models, as follows:
1) Content: The content model represents the domain concepts and the relationships between them.
2) Navigation: The navigation model represents navigable nodes and the links between nodes.
3) Presentation: The presentation model provides an abstract view of the user interface (UI) of a web application. It is a platform-independent specification that does not consider concrete aspects such as colors, fonts, and the position of UI elements.
4) Process: The process model visualizes the workflows of the processes which are invoked from certain navigation nodes.
Our MDE approach comprises the generation of draft models for each concern, i.e. initial versions that require further refinement.
In the following paragraphs we sketch the main modeling elements for the main concerns and give an informal overview of the model transformations. Content models are represented in UWE as plain UML class diagrams. A first draft of a content model is obtained by a set of model-to-model transformations, using as source models the use cases and the corresponding workflows (graphically represented as activity diagrams). In these transformations, object nodes that model the data used in the workflows are translated into content classes, using the name of the object node as the class name. If an action pin is connected to an object node, directly or through an action, then it can be assumed that this pin represents a property of the class modeled by the object node. The name of the pin is then used to determine whether an attribute or an association is created, by comparing the name with existing content classes.
Navigation models describe the navigation structure of a web application using a set of stereotyped classes defined for the web domain, such as navigation classes, links, menus, etc. A navigation model can be generated based on the requirements models. The following is a very brief overview of some modeling elements of the UWE profile. A navigationClass represents a navigable node of the hypertext structure; a navigationLink shows a direct link between navigation classes. Alternative navigation paths are handled by a menu, and so-called access primitives are used to reach multiple instances of a navigation class (index) or to select items (query). Web applications frequently support business logic as well. The entry and/or exit points of the business processes are modeled by a processClass in the navigation model; the linkage between processes and to the navigation classes is modeled by a processLink.
The model transformations from the requirements (use cases and workflows) to the navigation structure model encompass the following steps:
• Navigation classes are created for browsing use cases; processing use cases are transformed into process classes.
• Tagged values of the use cases are transformed into equally named tags of the generated classes.
• Relationships between use cases are translated into associations between the created navigation and process classes. An association is stereotyped processLink if at least one related class is a process class, and navigationLink otherwise.
• In the generated navigation model, a menu is introduced whenever a navigation class has several outgoing links. The source of the links is changed to the menu, which is connected to the navigation class by a composition.
• A navigation class can be created to serve as the home of the application, if it has not been modeled explicitly.
In addition, each process class included in the navigation specification can be modeled as a detailed workflow in the form of a UML activity diagram (not included in this work). It is the result of a refinement process that starts from the workflow of the requirements model.
Presentation models are designed based on the information provided by the navigation models and the information available in the workflows of the requirements models, e.g. rich UI features. A UML nested class diagram is selected as the visualization technique. The presentation model describes the basic structure of the user interface, i.e., which UI elements (e.g. text, images, anchors, forms) are used to represent the navigation nodes. The basic presentation modeling elements are the presentationGroups, which are directly based on nodes from the navigation model, i.e. navigation classes, menus, access primitives, and process classes.
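The stereotype rule in the transformation steps above (processLink if at least one end is a process class, navigationLink otherwise) can be sketched in plain Python. This is a hypothetical illustration, not UWE tooling: the use-case names, the "browsing"/"processing" kinds, and the dictionaries are all invented for the example.

```python
# Hypothetical sketch of the use-case-to-navigation transformation rule.
# Browsing use cases become navigation classes; processing use cases
# become process classes; associations take the processLink stereotype
# if at least one related class is a process class.

def link_stereotype(kind_a, kind_b):
    """processLink if at least one end is a process class, else navigationLink."""
    return "processLink" if "process" in (kind_a, kind_b) else "navigationLink"

# Invented use cases for the demo, tagged with their kind.
use_cases = {"BrowseCatalog": "browsing", "Checkout": "processing"}
kinds = {name: ("process" if k == "processing" else "navigation")
         for name, k in use_cases.items()}

# A relationship between use cases becomes a stereotyped association.
relations = [("BrowseCatalog", "Checkout")]
associations = [(a, b, link_stereotype(kinds[a], kinds[b]))
                for a, b in relations]
```

Here the BrowseCatalog-to-Checkout relationship yields a processLink association, because Checkout is a process class.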
A presentationGroup or a form is used to include a set of other UI elements, such as text(), textInput(), button(), selection(), etc. The top-level elements of the presentation model are classes with the stereotype presentationGroup. The second level of presentation elements consists of input and output elements. The presentation model, similarly to the navigation model, requires a main class, which is not modeled explicitly during the requirements specification. This presentation group is named Home and contains all presentation groups created from use cases, inside a class presentationAlternatives, and an anchor for every presentation group.

SOFTWARE ENGINEERING (SWE) V/s WEB ENGINEERING (WEBE)
Both follow a disciplined approach to develop, deploy and maintain applications. The basic difference is that the requirements (scope) of web projects differ from those of software projects. Software projects have various models like Waterfall, Spiral, Incremental, etc. But there is no single defined model for Web application projects, as the requirements are dynamic (not fixed). For simplicity, we can define the model for web projects as the PDCA model: (P)lan, (D)o, (C)heck and (A)ct. In the planning stage, you 'Plan' the requirements, concept, costing and timeline, and get approval from the customer before starting the project. Next comes the 'Do', which defines how the concept is to be designed and developed. Here the prototype (a model based on the blueprint) is built and then reviewed by the customer. The PDCA cycle is the base of all the models. WEBE is more complex than SWE, as the former depends on various types of browsers, operating systems, and servers such as web servers and application servers. Web applications are more complex than conventional software applications in the sense that to build them you have to know at least HTML, databases, a server-side scripting language, JavaScript, and an image editor such as Photoshop.
WORLD WIDE WEB (WWW)
The World Wide Web is a system of Internet servers that support specially formatted documents. The documents are formatted in a markup language called HTML (Hypertext Markup Language) that supports links to other documents, as well as graphics, audio, and video files. This means you can jump from one document to another simply by clicking on hot spots. Not all Internet servers are part of the World Wide Web. Individual document pages on the World Wide Web are called web pages and are accessed with a software application running on the user's computer, commonly called a web browser. Web pages may contain text, images, videos, and other multimedia components, as well as navigation features consisting of hyperlinks.

TCP/IP PROTOCOL
The TCP/IP reference model (the DARPA model, named after the US Government agency that developed it) is usually described with four layers; these notes split the lowest layer into Data Link and Physical. The ISO-OSI reference model, by comparison, is composed of seven layers.

Transmission Control Protocol (TCP):
• One-to-one, connection-oriented, reliable protocol
• Used for the accurate transmission of large amounts of data
• Slower than UDP because of the additional error checking performed

LAYERS OF TCP/IP
• Application
• Transport
• Internet
• Data Link
• Physical

APPLICATION LAYER
Provides applications with the ability to access the services of the other layers. New protocols and services are always being developed in this category.

TRANSPORT LAYER
• Sequencing and transmission of packets
• Acknowledgment of receipts
• Recovery of packets
• Flow control
In essence, it engages in host-to-host transportation of data packets and their delivery to the application layer.

INTERNET LAYER
The main purpose of this layer is to handle the movement of data on the network, i.e. the routing of data over the network. The main protocol used at this layer is IP, while ICMP (used by the popular 'ping' command) and IGMP are also used at this layer.
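The transport-layer properties listed above (one-to-one, connection-oriented, reliable delivery) can be seen in a minimal TCP echo exchange using only Python's standard socket module. This is a local sketch: the payload is invented, the address is loopback, and port 0 lets the operating system pick a free port.

```python
# Minimal TCP sketch: a loopback echo server and client demonstrating a
# one-to-one, connection-oriented exchange at the transport layer.
import socket
import threading

def echo_server(srv):
    conn, _ = srv.accept()          # wait for the single client connection
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)          # echo the payload back unchanged

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"hello tcp")           # TCP delivers these bytes intact, in order
reply = cli.recv(1024)
cli.close()
```

Because TCP is connection-oriented, the client must connect before sending, and the bytes arrive exactly as sent; a UDP version of the same exchange would need no connection but would also offer no delivery guarantee.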
DATA LINK LAYER
The data link layer is the protocol layer that transfers data between adjacent network nodes in a wide area network, or between nodes on the same local area network segment. The data link layer provides the functional and procedural means to transfer data between network entities, and may provide the means to detect and possibly correct errors that occur in the physical layer.

PHYSICAL LAYER
The physical layer consists of the basic networking hardware transmission technologies of a network. It is a fundamental layer underlying the logical data structures of the higher-level functions in a network. The physical layer defines the means of transmitting raw bits, rather than logical data packets, over a physical link connecting network nodes.

WAP (WIRELESS APPLICATION PROTOCOL)
Wireless Application Protocol (WAP) is a technical standard for accessing information over a mobile wireless network. A WAP browser is a web browser for mobile devices, such as mobile phones, that uses the protocol. Before the introduction of WAP, mobile service providers had limited opportunities to offer interactive data services, but needed interactivity to support Internet and Web applications such as:
• Email by mobile phone
• Tracking of stock-market prices
• Sports results
• News headlines
• Music downloads
The WAP standard described a protocol suite allowing the interoperability of WAP equipment and software with different network technologies, such as GSM and IS-95. The original WAP model provided a simple platform for access to web-like WML services and e-mail using mobile phones in Europe and the South-East Asian regions; as of 2009 it continued to have a considerable user base. The later versions of WAP, primarily targeting the United States market, were designed for a different requirement: to enable full web XHTML access using mobile devices with a higher specification and cost, and with a higher degree of software complexity.
Considerable discussion has addressed the question of whether the WAP protocol design was appropriate. Some have suggested that the bandwidth-sparing, simple interface of Gopher would have been a better match for mobile phones and personal digital assistants (PDAs). The initial design of WAP specifically aimed at protocol independence across a range of different bearers (SMS, IP over PPP over a circuit-switched bearer, IP over GPRS, etc.). This led to a protocol considerably more complex than an approach directly over IP might have produced. Most controversial, especially for many from the IP side, was the design of WAP over IP: WAP's transmission-layer protocol, WTP, uses its own retransmission mechanisms over UDP to attempt to solve the problem of the inadequacy of TCP over high-packet-loss networks.

DOMAIN NAME SPACE
The domain name space consists of a tree of domain names. Each node or leaf in the tree has zero or more resource records, which hold information associated with the domain name. The tree sub-divides into zones, beginning at the root zone. A DNS zone may consist of only one domain, or of many domains and sub-domains, depending on the administrative authority delegated to its manager.
(Figure: The hierarchical Domain Name System, organized into zones, each served by a name server.)
Administrative responsibility over any zone may be divided by creating additional zones. Authority is said to be delegated for a portion of the old space, usually in the form of sub-domains, to another name server and administrative entity. The old zone ceases to be authoritative for the new zone.

DNS (DOMAIN NAME SYSTEM)
The Domain Name System (DNS) is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with the domain names assigned to each of the participating entities.
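The name-to-address lookup that DNS provides is exposed in Python through the system resolver. The sketch below deliberately resolves "localhost" rather than a public domain so that it works without network access; a real query for a public name would go out to DNS servers through the same call.

```python
# Sketch of a name-to-address lookup via the system resolver.
# getaddrinfo consults the hosts file and/or DNS; "localhost" is used
# so the example needs no network access.
import socket

infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
# Each entry is (family, type, proto, canonname, sockaddr); the address
# is the first element of sockaddr.
addresses = sorted({info[4][0] for info in infos})
```

Depending on the system configuration, "localhost" maps to 127.0.0.1 (IPv4), ::1 (IPv6), or both; applications never need to know which, which is exactly the indirection DNS provides.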
Most prominently, it translates domain names, which can be easily memorized by humans, into the numerical IP addresses needed to locate computer services and devices worldwide. The Domain Name System is an essential component of the functionality of most Internet services because it is the Internet's primary directory service. An often-used analogy to explain the Domain Name System is that it serves as the phone book for the Internet, translating human-friendly computer hostnames into IP addresses. For example, the domain name www.example.com translates to the addresses 93.184.216.119 (IPv4) and 2606:2800:220:6d:26bf:1447:1097:aa7 (IPv6). Unlike a phone book, the DNS can be quickly updated, allowing a service's location on the network to change without affecting the end users, who continue to use the same host name. Users take advantage of this when they use meaningful Uniform Resource Locators (URLs) and e-mail addresses without having to know how the computer actually locates the services.

E-MAIL (ELECTRONIC MAIL)
Electronic mail, most commonly referred to as email or e-mail since c. 1993, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the same time, as with instant messaging. Today's email systems are based on a store-and-forward model: email servers accept, forward, deliver, and store messages. Neither the users nor their computers are required to be online simultaneously; they need connect only briefly, typically to a mail server, for as long as it takes to send or receive messages. Historically, the term electronic mail was used generically for any electronic document transmission.
For example, several writers in the early 1970s used the term to describe fax document transmission. As a result, it is difficult to find the first citation for the use of the term with the more specific meaning it has today. An Internet email message consists of three components: the message envelope, the message header, and the message body. The message header contains control information, including, minimally, an originator's email address and one or more recipient addresses. Usually descriptive information is also added, such as a subject header field and a message submission date/time stamp.
For email you need:
• An account on a mail server and supporting software on your PC
• The username and password that allow you to access your account
All e-mail programs allow you to compose, send, reply to, and forward mail electronically via the Internet.

TELNET
Telnet is a terminal emulation program for TCP/IP networks such as the Internet. The Telnet program runs on your computer and connects your PC to a server on the network. You can then enter commands through the Telnet program and they will be executed as if you were entering them directly on the server console. This enables you to control the server and communicate with other servers on the network. To start a Telnet session, you must log in to a server by entering a valid username and password. Telnet is a common way to remotely control Web servers.

HYPERTEXT TRANSFER PROTOCOL
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web. Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text.
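An HTTP request-response exchange can be illustrated end-to-end with Python's standard library, with no Internet access required: http.server stands in for the web server on an OS-chosen free port, and http.client plays the browser. The handler, payload and loopback address are all invented for this sketch.

```python
# Minimal local sketch of an HTTP/1.1 exchange: a GET request and a
# 200 OK response, entirely on the loopback interface.
import http.client
import http.server
import threading

class Hello(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello http"
        self.send_response(200)                       # status line: 200 OK
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                        # response body
    def log_message(self, *args):                     # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")                              # send the GET request
resp = conn.getresponse()
status, payload = resp.status, resp.read()
conn.close()
server.shutdown()
```

The same request/response shape applies whether the server is this ten-line handler or a production server like Apache or IIS; only the port (80 by default) and the content change.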
HTTP is the protocol used to exchange or transfer hypertext. The standards development of HTTP was coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), culminating in the publication of a series of Requests for Comments (RFCs), most notably RFC 2616 (June 1999), which defined HTTP/1.1, the version of HTTP most commonly used today. The default port number is 80.

FILE TRANSFER PROTOCOL
The File Transfer Protocol (FTP) is a standard network protocol used to transfer computer files from one host to another over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server.
• Used for downloading from most MP3 sites
• Designed for faster file transfer over the Internet compared to the HTTP protocol
• FTP sites can be configured alongside a web site to support FTP file transfer
• The FTP default ports are 20 and 21

BROWSERS AND SEARCH ENGINES
A web browser is a software application that allows people to access, retrieve and view information on the Internet. The information that may be "browsed" can be text content on a web page, an image, video, audio, etc. The most popular web browsers currently in use are Firefox, Google Chrome, Internet Explorer, Opera and Safari. The main purpose of a search engine is to search for information on the Internet. Search engines are software programs that search for websites based on the keywords that the user types in. The search engine then goes through its database of information in order to locate the information you are looking for. The main search engines currently in use are Google, Bing, and Yahoo.

SEARCH FUNDAMENTALS
A search page typically includes an information bar, a search form area, a directory area, and links.
• Search terminology: search tool, query, query syntax, query semantics, hit, match, relevancy score
• Pattern-matching queries: enter keyword(s); the search engine returns URLs
• Boolean queries: named after George Boole; use the operators AND, OR and NOT
• Search domain: the current Web, newsgroups, specialized databases, the Internet
• Search subjects: a way to view the search queries of anonymous users in real time, e.g. to see how busy a search engine is, "spy" on other users' queries, "see" modifications, and observe various personal interests

SEARCH STRATEGIES
• Wildcard: a special character that can be added to a phrase while searching; the search engine or subject directory looks for all possible endings, and the results will include all documents in its database that contain those letters.
• Plus and minus signs: a plus sign used before a keyword or phrase retrieves results that include that specific keyword or phrase; a minus sign used before a keyword or phrase retrieves results that exclude it.
• Quotation marks and brackets: assist with narrowing the search results. When quotation marks or brackets are used, the search engine will only retrieve documents in which those key terms appear together.
• Pipe symbol (|): the pipe symbol, located on most keyboards on the right-hand side between the delete and return keys, assists with narrowing down results within a broad category.
• Boolean operators: used the same way the plus and minus signs are used. The AND operator is similar to the plus sign, and the NOT operator is similar to the minus sign. The OR operator tells the search engine to retrieve one term or the other.
• NEAR: indicates to the search tool that the terms must be located within a certain number of words of each other. The results may vary depending on the search tool; for example, some search tools may try to locate the terms within 2, 10 or 25 words of each other. The command to use is NEAR/#.
• Nesting: allows the user to perform multiple tasks and build a complex search. Parentheses are used to group the keywords and Boolean operators together. This is an excellent technique for complex searching.
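The Boolean operators described above map directly onto set operations over the documents matching each term. The sketch below uses a tiny, made-up document collection; real engines apply the same logic to their inverted indexes.

```python
# Toy sketch of Boolean query semantics: AND is set intersection,
# OR is set union, NOT is set difference over matching document IDs.
docs = {
    1: "web engineering and design",
    2: "search engines crawl the web",
    3: "database design notes",
}

def matching(term):
    """IDs of documents whose text contains the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

web_and_design = matching("web") & matching("design")   # web AND design
web_or_design  = matching("web") | matching("design")   # web OR design
web_not_search = matching("web") - matching("search")   # web NOT search
```

Nesting with parentheses, as described above, composes the same operations, e.g. `(matching("web") | matching("database")) - matching("search")`.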
DIRECTORY SEARCH ENGINES AND META SEARCH ENGINES
Subject directories usually have smaller databases than search engines. Directories classify web documents or sites into a subject classification scheme; they are usually compiled by hand, in some type of logical order. Subject directories begin with general topics and allow the user to narrow down to a specific category. They usually provide limited search results of the pages available on the Web. The information collected from a subject directory will generally contain more related information dealing with a particular subject matter, but it will not be as comprehensive as the information located by a search engine. Many of these directories include both browsing and searching capabilities.
A meta search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list, or displays them according to their source. Meta search engines enable users to enter search criteria once and access several search engines simultaneously. They operate on the premise that the Web is too large for any one search engine to index it all, and that more comprehensive search results can be obtained by combining the results from several search engines. This also may save the user from having to use multiple search engines separately.

WORKING OF SEARCH ENGINES
Search engines have two major functions: crawling and building an index, and providing search users with a ranked list of the websites they have determined are the most relevant.
• Crawling and indexing: crawling and indexing the billions of documents, pages, files, news items, videos, and media on the World Wide Web.
• Providing answers: answering user queries, most frequently through lists of relevant pages that the engine has retrieved and ranked for relevancy.
The link structure of the web serves to bind all of the pages together.
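The meta-search aggregation described in the previous section (send one query to several engines, merge the result lists) can be sketched in a few lines. The "engines" here are plain dictionaries invented for the demo, standing in for real search back-ends.

```python
# Hypothetical sketch of meta-search aggregation: the same query goes to
# several engines (stubbed as dictionaries) and the result lists are
# merged into one de-duplicated mapping, tagged with the source engines.
engine_a = {"web servers": ["site1.example", "site2.example"]}
engine_b = {"web servers": ["site2.example", "site3.example"]}

def meta_search(query, engines):
    merged = {}
    for name, engine in engines.items():
        for url in engine.get(query, []):
            merged.setdefault(url, []).append(name)   # record source engines
    return merged

results = meta_search("web servers", {"A": engine_a, "B": engine_b})
```

A result returned by more than one engine (here site2.example) is listed once with both sources, which is one simple way a meta search engine can rank by agreement between engines.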
Links allow the search engines' automated robots, called "crawlers" or "spiders," to reach the many billions of interconnected documents on the web. Once the engines find these pages, they decipher the code in them and store selected pieces in massive databases, to be recalled later when needed for a search query. To accomplish the monumental task of holding billions of pages that can be accessed in a fraction of a second, the search engine companies have constructed datacenters all over the world. These monstrous storage facilities hold thousands of machines processing large quantities of information very quickly. When a person performs a search at any of the major engines, they demand results instantaneously; even a one- or two-second delay can cause dissatisfaction, so the engines work hard to provide answers as fast as possible.

Search engines are answer machines. When a person performs an online search, the search engine scours its corpus of billions of documents and does two things: first, it returns only those results that are relevant or useful to the searcher's query; second, it ranks those results according to the popularity of the websites serving the information. It is both relevance and popularity that the process of SEO is meant to influence.

WEB SERVERS

A web server is an information technology that processes requests via HTTP, the basic network protocol used to distribute information on the World Wide Web. The term can refer to the entire computer, to an appliance, or specifically to the software that accepts and supervises the HTTP requests. The most common use of web servers is to host websites, but there are other uses such as gaming, data storage, running enterprise applications, and handling email, FTP or other services.
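As a concrete, if minimal, illustration of the software side of a web server, Python's standard library can accept an HTTP request and return a response in a few lines. This is only a sketch: production servers such as IIS and Apache add virtual hosting, caching, logging, security and much more.

```python
# Minimal web server sketch using only the Python standard library.
# It serves one fixed page to every GET request; the address and
# response text are arbitrary choices for this example.

from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from a minimal web server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch

# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as a client: request the page back over real HTTP.
url = f"http://127.0.0.1:{server.server_port}/"
with urllib.request.urlopen(url) as resp:
    reply = resp.read().decode()
print(reply)  # Hello from a minimal web server

server.shutdown()
```

Everything a request touches here (parsing the request line, writing status, headers and body) is exactly what the features listed next build upon.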
Features

• Virtual hosting, to serve many web sites using one IP address.
• Large file support, to be able to serve files larger than 2 GB on a 32-bit OS.
• Bandwidth throttling, to limit the speed of responses so as not to saturate the network and to be able to serve more clients.
• Server-side scripting, to generate dynamic web pages while keeping web server and website implementations separate from each other.

Caching

A Web cache sits between one or more Web servers (also known as origin servers) and a client or many clients, and watches requests come by, saving copies of the responses it sees, such as HTML pages, images and files (collectively known as representations). Then, if there is another request for the same URL, it can use the response it already has instead of asking the origin server for it again. There are two main reasons that Web caches are used:
To reduce latency: because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time to get the representation and display it. This makes the Web seem more responsive.
To reduce network traffic: because representations are reused, caching reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps bandwidth requirements lower and more manageable.

IIS

IIS (Internet Information Services) is Microsoft's web server offering, playing second fiddle to market leader Apache. As is expected of a core Microsoft product, it runs only on, and is bundled with, Windows operating systems, but is otherwise free for use. It is a closed software product, supported solely by Microsoft. Although its development is not as open and quick as the open-source, user-supported nature of Apache, a behemoth like Microsoft can throw formidable support and development resources at its products, and IIS has benefitted from this.
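Returning briefly to caching: the behaviour described above reduces to a few lines of logic, namely serve a repeated request for the same URL from the stored copy instead of contacting the origin server again. In this sketch `fetch_from_origin` is a stand-in for a real HTTP request.

```python
# Sketch of the web-cache idea: a dictionary of saved representations,
# keyed by URL. fetch_from_origin is a placeholder for a real request.

cache = {}

def fetch_from_origin(url):
    # Stand-in for an HTTP round-trip to the origin server.
    return f"<html>representation of {url}</html>"

def get(url):
    if url in cache:                  # cache hit: no origin round-trip
        return cache[url], "HIT"
    body = fetch_from_origin(url)     # cache miss: ask the origin
    cache[url] = body                 # save the representation
    return body, "MISS"

print(get("http://example.com/")[1])  # MISS
print(get("http://example.com/")[1])  # HIT
```

A real cache also honours expiry and validation headers (such as Cache-Control and Last-Modified) before reusing a stored representation; this sketch keeps entries forever.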
Actually, it is one of the few Microsoft products that even its detractors (grudgingly) agree can stand toe-to-toe with its open source rival, and even trounce it soundly in some areas. There is a lite version called IIS Express that has been installable as a standalone freeware server from Windows XP SP3 onwards, but this version only supports HTTP and HTTPS.

Solid feature, performance and security improvements over the years have meant that IIS has steadily improved and gained ground and market share on Apache, from about 21% in 2010 to about 32% as of February 2014. Security has been one area of significant gain, with huge leaps from the days of IIS 6.0's vulnerability to the infamous Code Red worm. All is not yet perfect, however; for instance, IIS has been called out as still being poor at supporting PFS (Perfect Forward Secrecy), a property of key cryptography which ensures that a long-term key will not be compromised if a single session key is compromised or broken. Still, the IIS-Apache security comparison may not be fair to IIS. IIS vulnerability may also be largely blamed on its parent operating system, since most malware targets Windows, while Linux (Apache's main choice of OS) is itself an offshoot of the inherently iron-clad Unix OS. IIS utilizes external web extensions to implement some features; for example, FTP publishing, application request routing, media services and URL rewriting were all introduced in IIS 7.5 via extensions. IIS also offers strong support for the Microsoft products .NET (framework) and ASPX (scripting), so if your website relies heavily on these, IIS is a clear frontrunner as a choice of web server. IIS offers in-depth diagnostic tools such as failed request tracing, request monitoring and runtime data, in addition to virtual hosting support. But a major concern is that choosing IIS necessitates also picking Windows, with its attendant high cost and security implications compared to Linux.
APACHE

Apache, or to use its full title, the Apache HTTP Server, is an open source Web server application managed by the Apache Software Foundation. The server software is freely distributed, and the open source license means users can edit the underlying code to tweak performance and contribute to the future development of the program, a major source of its beloved status among its proponents. Support, fixes and development are handled by the loyal user community and coordinated by the Apache Software Foundation.

Although Apache will run on all major operating systems, it is most frequently used in combination with Linux. These two, combined with the MySQL database and the PHP scripting language, comprise the popular LAMP Web server solution.

Apache boasts an impressive repertoire. Many features are implemented as compiled modules that extend the core functionality. These range from server-side programming language support to authentication schemes; common language interfaces support Perl, Python, Tcl and PHP. Apache also supports virtual hosting, which enables one machine to host and simultaneously serve several different websites, and a number of good, well-developed GUI interfaces. Another notable feature is webpage compression to reduce page size over HTTP, again achieved by an external module, mod_gzip. And security is one of Apache's noted strengths.

When it comes to performance, conventional wisdom has it that Apache is just OK: a bit better than IIS but quite a bit slower than its main open-source rival Nginx, and this has been borne out by objective tests. Though by no means slow for most general tasks, Apache is still held back by two of its main characteristics:
Feature bloat: Apache is frequently compared to MS Word, an extremely feature-rich application in which 90% of users only use about 10% of the features on a regular basis.
Process-based architecture: Apache is a process-based server, unlike many of its rivals that are event-based or asynchronous in nature.
In a process-based server, each simultaneous connection requires a separate thread or process, and this incurs significant overhead. An asynchronous server, on the other hand, is event-driven and handles all requests in a single thread or a very few threads.
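The contrast between the two models can be sketched in a few lines of Python: the thread-per-connection version dedicates one worker thread to each client, while the event-driven version serves every client from a single thread that yields whenever a request would be waiting on I/O. The handler bodies here are placeholders for real request processing.

```python
# Sketch: thread-per-connection (Apache's classic model) versus
# event-driven (the model used by asynchronous servers such as Nginx).

import asyncio
import threading

def handle_blocking(client_id, results):
    # One dedicated thread per connection; each thread costs memory
    # and scheduling overhead.
    results.append(f"served {client_id}")

def serve_threaded(clients):
    results = []
    threads = [threading.Thread(target=handle_blocking, args=(c, results))
               for c in clients]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

async def handle_async(client_id):
    # Event-driven: await yields control, so one thread serves everyone.
    await asyncio.sleep(0)  # stands in for waiting on network I/O
    return f"served {client_id}"

def serve_async(clients):
    async def main():
        return await asyncio.gather(*(handle_async(c) for c in clients))
    return asyncio.run(main())

print(sorted(serve_threaded(["a", "b", "c"])))
print(serve_async(["a", "b", "c"]))
```

Both versions serve every client; the difference is the cost. With thousands of simultaneous connections the threaded model needs thousands of threads, while the event-driven model multiplexes them all onto one, which is exactly the overhead argument made above.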