CS-802 - Gwalior Engineering College

GEC Group of Colleges
Dept. of Computer Sc. & Engg. and IT
Teaching Notes
CS-802
Web Engineering
Prepared by
Ms. Priyashree Sharma
WEB ENGINEERING (CS-802)
UNIT-1 Web Engineering: Introduction, History, Evolution and Need, Timeline, Motivation, Categories & Characteristics of Web Applications, Web Engineering Models, Software Engineering v/s Web Engineering. World Wide Web: Introduction to TCP/IP and WAP, DNS, Email, Telnet, HTTP and FTP. Browsers and search engines: Introduction, Search fundamentals, Search strategies, Directories, search engines and meta search engines, Working of the search engines. Web Servers: Introduction, Features, Caching, case study: IIS, Apache.
UNIT-2 Information Architecture: Role, Collaboration and Communication, Organizing Information, Organizational Challenges, Organizing Web site parameters and Intranets. Website Design: Development, Development phases, Design issues, Conceptual Design, High-Level Design, Indexing the Right Stuff, Grouping Content, Architectural Page Mockups, Design Sketches, Navigation Systems, Searching Systems, Good & bad web design, Process of Web Publishing, Web site enhancement, Submission of website to search engines. Web security: issues, security audit. Web effort estimation, Productivity Measurement, Quality, usability and reliability. Requirements Engineering for Web Applications: Introduction, Fundamentals, Requirement Source, Type, Notations, Tools; Principles; Requirements Engineering Activities; Adapting RE Methods to Web Applications.
UNIT-3 Technologies for Web Applications I: HTML and DHTML: Introduction, Structure of documents, Elements, Linking, Anchor Attributes, Image Maps, Meta Information, Image Preliminaries, Layouts, Backgrounds, Colors and Text, Fonts, Tables, Frames and Layers, Audio and Video with HTML, Database integration, CSS, Positioning with Style Sheets, Form Controls, Form Elements. Introduction to CGI, PERL, JavaScript, JSP, PHP, ASP & AJAX. Cookies: Creating and Reading.
UNIT-4 Technologies for Web Applications II: XML: Introduction, HTML vs XML, Validation of documents, DTD, Ways to use XML for data files, Embedding XML into HTML documents, Converting XML to HTML for display, Displaying XML using CSS and XSL, Rewriting HTML as XML; Relationship between HTML, SGML and XML; Web personalization, Semantic Web, Semantic Web Services, Ontology.
UNIT-5 E-Commerce: Business Models, Infrastructure, Creating an E-commerce Web Site, Environment and Opportunities, Modes & Approaches, Marketing & Advertising Concepts. Electronic Publishing: issues, approaches, legalities and technologies. Secure Web documents, Digital Signatures and Firewalls, Cybercrime and laws, IT Act. Electronic Cash, Electronic Payment Systems: RTGS, NEFT, Internet Banking, Credit/Debit Card. Security: Digital Certificates & Signatures, SSL, SET, 3D Secure Protocol.
INDEX
1. INTRODUCTION TO WEB ENGINEERING
2. EVOLUTION OF WEB ENGINEERING
3. NEED FOR WEB ENGINEERING
4. CATEGORIES OF WEB APPLICATIONS
5. CHARACTERISTICS AND COMPLEXITY OF WEB APPLICATIONS
6. WEB ENGINEERING MODELS
7. SOFTWARE ENGINEERING (SWE) V/s WEB ENGINEERING (WEBE)
8. WORLD WIDE WEB (WWW)
9. TCP/IP PROTOCOL
10. WAP (WIRELESS APPLICATION PROTOCOL)
11. DOMAIN NAME SPACE
12. E-MAIL (ELECTRONIC MAIL)
13. TELNET
14. HYPERTEXT TRANSFER PROTOCOL
15. FILE TRANSFER PROTOCOL
16. BROWSERS AND SEARCH ENGINES
17. SEARCH FUNDAMENTALS
18. DIRECTORIES, SEARCH ENGINES AND META SEARCH ENGINES
19. WORKING OF SEARCH ENGINES
20. WEB SERVERS
21. CACHING
22. IIS
23. APACHE
INTRODUCTION
• Web engineering actively promotes systematic, disciplined and quantifiable approaches towards the successful development of high-quality, ubiquitously usable Web-based systems and applications.
• Web engineering focuses on the methodologies, techniques and tools that are the foundation of Web application development and which support their design, development, evolution, and evaluation.
• Web application development has certain characteristics that make it different from traditional software, information system, or computer application development.
• Web engineering is multidisciplinary and encompasses contributions from diverse areas: systems analysis and design, software engineering, hypermedia/hypertext engineering, requirements engineering, human-computer interaction, user interface, information engineering, information indexing and retrieval, testing, modelling and simulation, project management, and graphic design and presentation.
• Web engineering is neither a clone nor a subset of software engineering, although both involve programming and software development. While Web engineering uses software engineering principles, it encompasses new approaches, methodologies, tools, techniques, and guidelines to meet its unique requirements.
Proponents of Web engineering supported the establishment of Web engineering as a discipline at an early stage of the Web. The First Workshop on Web Engineering was held in conjunction with the World Wide Web Conference in Brisbane, Australia, in 1998. San Murugesan, Yogesh Deshpande, Steve Hansen and Athula Ginige, from the University of Western Sydney, Australia, formally promoted Web engineering as a new discipline in the first ICSE workshop on Web Engineering in 1999. Since then they have published a series of papers in a number of journals, conferences and magazines to promote their view, and have gained wide support.
Major arguments for Web engineering as a new discipline are:
• The Web-based Information Systems (WIS) development process is different and unique.
• Web engineering is multi-disciplinary; no single discipline (such as software engineering) can provide a complete theory basis, body of knowledge and practices to guide WIS development.
• Web applications raise distinct issues of evolution and lifecycle management when compared to more 'traditional' applications.
EVOLUTION OF WEB ENGINEERING
Web development within an organization depends upon several factors. The motivation depends upon the initial purpose of using the Web (a Web 'presence' or becoming a Web-based organization), the customer's expectations, and the competitive environment. The drive to systematize development is subject to the overall perception of the Web and to conscious policy decisions within the organization. For example, a low-level perception of the Web is likely to lead to ad hoc, sporadic efforts. As a starting point in understanding the problem domains that the Web can currently address, the table of Web application categories below presents a taxonomy of Web applications, updated after Ginige and Murugesan. The order of these categories roughly illustrates the evolution of Web applications. Organizations that started their Web development early may also have followed a similar order in the past. Although it is possible to start Web development with applications in any category, this table has been useful for explaining to organizations with a modest presence on the Web how they might improve or benefit from incremental exposure, thus keeping the risks to a minimum.
NEED FOR WEB ENGINEERING
The need for Web Engineering is felt (or dismissed) according to the perceptions of developers and managers, their experiences in creating applications made feasible by the new technologies, and the complexity of Web applications. In the early stages of Web development, White and Powell identified and emphasized the need for engineering, as in Web Document Engineering and Web Site Engineering. Web Engineering, more generally, explicitly recognizes the fact that good Web development requires multidisciplinary efforts and does not fit neatly into any of the existing disciplines.
CATEGORIES OF WEB APPLICATIONS
Informational: Online newspapers, product catalogues, newsletters, service manuals, classifieds, e-books
Interactive (user-provided information, customized access): Registration forms, customized information presentation, games
Transaction: E-shopping, ordering goods and services, banking
Workflow: Planning and scheduling systems, inventory management, status monitoring
Collaborative work environment: Distributed authoring systems, collaborative design tools
Online communities, marketplaces: Chat groups, recommender systems, marketplaces, auctions
Web portals: Electronic shopping malls, intermediaries
Web services: Enterprise applications, information and business intermediaries
CHARACTERISTICS AND COMPLEXITY OF WEB APPLICATIONS
Web applications vary widely: from small-scale, short-lived services to large-scale enterprise applications distributed across the Internet and corporate intranets. Over the years, Web applications have evolved and become more complex: they range from simple, read-only applications to full-fledged information systems. This complexity may be in terms of performance (number of hits per second; for example, heavily loaded sites such as Slashdot or Olympics sites receive hundreds of thousands of hits per minute), in terms of the dynamic nature of information, the use of multimedia, or in other ways. They may provide vast, dynamic information in multiple media formats (graphics, images and video) or may be relatively simple. Nevertheless, they all demand a balance between information content, aesthetics and performance.
Characteristics of Web Applications
Simple Web-based systems:
• Primarily textual information in non-core applications
• Information content fairly static
• Simple navigation
• Infrequent access or limited usefulness
• Limited interactivity and functionality
• Standalone system
• Developed by a single individual or by a very small team
• Security requirements minimal (because of the mainly one-way flow of information)
• Easy to create
• Feedback from users either unnecessary or not sought
• Web site mainly an identity for the current clientele, not a medium for communication

Advanced Web-based systems:
• Dynamic Web pages, because information changes with time and users' needs
• Large volume of information
• Difficult to navigate and find information
• Integrated with databases and other systems
• Deployed in mission-critical applications
• Prepared for seamless evolution
• May require a larger development team with expertise in diverse areas
• Calls for risk or security assessment and management
• Needs configuration control and management
• Necessitates a project plan and management
• Requires a sound development process and methodology
WEB ENGINEERING MODELS
There are various types of web engineering models; they are as follows:
1) Content: The content model represents the domain concepts and the relationships between them.
2) Navigation: The navigation model is used to represent navigable nodes and the links between nodes.
3) Presentation: The presentation model provides an abstract view of the user interface (UI) of a web application. It is a platform-independent specification that does not consider concrete aspects like colors, fonts, and the position of UI elements.
4) Process: The process model visualizes the workflows of the processes which are invoked from certain navigation nodes.
The UWE model-driven engineering (MDE) approach comprises the generation of draft models for each concern, i.e. initial versions that require further refinement. The following paragraphs sketch the main modeling elements for the main concerns and give an informal overview of the model transformations.

Content models are represented in UWE as plain UML class diagrams. A first draft of a content model is obtained by a set of model-to-model transformations that use the use cases and the corresponding workflows (graphically represented as activity diagrams) as source models. Object nodes that model the data used in the workflows are translated into content classes, using the name of the object node as the class name. If an action pin is connected to an object node, directly or through an action, it can be assumed that this pin represents a property of the class modeled by the object node. The name of the pin is then compared with the existing content classes to determine whether an attribute or an association is created.

Navigation models describe the navigation structure of a web application using a set of stereotyped classes defined for the web domain, such as navigation classes, links, menus, etc.; a navigation model can be generated from the requirements models. Very briefly, the main modeling elements of the UWE profile are these: a navigationClass represents a navigable node of the hypertext structure; a navigationLink shows a direct link between navigation classes; alternative navigation paths are handled by a menu; and the so-called access primitives are used to reach multiple instances of a navigation class (index) or to select items (query). Web applications frequently support business logic as well: the entry and/or exit points of a business process are modeled by a processClass in the navigation model, and the linkage between processes and to the navigation classes is modeled by processLinks.

The model transformations from the requirements (use cases and workflows) to the navigation structure model encompass the following steps. Navigation classes are created for browsing use cases; processing use cases are transformed into process classes. Tagged values of the use cases are transformed into equally named tags of the generated classes. Relationships between use cases are translated into associations between the created navigation and process classes; an association is stereotyped processLink if at least one related class is a process class, and navigationLink otherwise. In the generated navigation model, a menu is introduced whenever a navigation class has several outgoing links; the source of those links is changed to the menu, which is connected to the navigation class by a composition. A navigation class can be created to serve as the home of the application if it has not been modeled explicitly. In addition, each process class included in the navigation specification can be modeled as a detailed workflow in the form of a UML activity diagram (not included in this work); such a workflow is the result of a refinement process that starts from the workflow of the requirements model.

Presentation models are designed based on the information provided by the navigation models and the information available in the workflows of the requirements models, e.g. rich UI features. A UML nested class diagram is selected as the visualization technique. The presentation model describes the basic structure of the user interface, i.e. which UI elements (e.g. text, images, anchors, forms) are used to represent the navigation nodes. The basic presentation modeling elements are presentation groups, which are directly based on nodes from the navigation model, i.e. navigation classes, menus, access primitives, and process classes. A presentation group or a form is used to contain a set of other UI elements, like text, textInput, button, selection, etc. The top-level elements of the presentation model are classes with the stereotype presentationGroup; the second level consists of input and output elements. Like the navigation model, the presentation model requires a main class, which is not modeled explicitly during the requirements specification. This presentation group is named Home and contains all presentation groups created from use cases inside a class presentationAlternatives, as well as an anchor for every presentation group.
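To make the content-model transformation concrete, here is a rough, hypothetical Python sketch (not UWE tooling; the node and pin names are invented for illustration) of the naming rule described above: an object node becomes a class named after the node, and its action pins become attributes.

    # Hypothetical sketch of the rule "object node -> content class":
    # the node name becomes the class name, pin names become attributes.
    def make_content_class(node_name: str, pin_names: list[str]) -> type:
        attrs = {pin: None for pin in pin_names}   # one attribute per action pin
        return type(node_name, (object,), attrs)   # class named after the node

    # An object node "Book" with pins "title" and "author" yields a Book class.
    Book = make_content_class("Book", ["title", "author"])
    print(Book.__name__, hasattr(Book, "title"), hasattr(Book, "author"))
    # prints: Book True True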
SOFTWARE ENGINEERING (SWE) V/s WEB ENGINEERING (WEBE)
Although both follow a disciplined approach to develop, deploy and maintain applications, the basic difference is that the requirements (scope) of web projects differ from those of software projects. Software projects have various models like Waterfall, Spiral, Incremental, etc., but there is no defined model for Web application projects, as the requirements are dynamic (not fixed). For simplicity we can define the model for web projects as the PDCA model: (P)lan, (D)o, (C)heck and (A)ct. In the planning stage, you 'Plan' what the requirements, concept, planning, costing and timeline are, and get approval from the customer before starting the project. Next comes the 'Do', which is defined as how the concept has to be designed and developed: here the prototype (models based on the blueprint) has to be built and then reviewed by the customer. The PDCA cycle is the base of all the models. WEBE is more complex than SWE, as the former depends on various types of browsers, operating systems, and servers such as Web servers and application servers. Web apps are more complex than software apps in the sense that to build such applications you have to know at least HTML, databases, a server-side scripting language, JavaScript, and an image editor such as Photoshop.
WORLD WIDE WEB (WWW)
The World Wide Web is a system of Internet servers that support specially formatted documents. The documents are formatted in a markup language called HTML (Hypertext Markup Language) that supports links to other documents, as well as graphics, audio, and video files. This means you can jump from one document to another simply by clicking on hot spots. Not all Internet servers are part of the World Wide Web. Individual document pages on the World Wide Web are called web pages and are accessed with a software application running on the user's computer, commonly called a web browser. Web pages may contain text, images, videos, and other multimedia components, as well as navigation features consisting of hyperlinks.
TCP/IP PROTOCOL
The TCP/IP reference model (the DARPA model, named after the US government agency) is classically described with four layers, while the ISO-OSI reference model is composed of seven layers; the layering shown below splits the lowest TCP/IP layer into separate Data Link and Physical layers.
Transmission Control Protocol (TCP)
• One-to-one, connection-oriented, reliable protocol
• Used in the accurate transmission of large amounts of data
• Slower compared to UDP because of the additional error checking performed
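As a small illustration of TCP's one-to-one, connection-oriented, reliable byte stream, the following self-contained Python sketch (standard library only; the loopback address is used and the OS picks the port) runs a tiny echo server and client on one machine:

    import socket, threading

    def echo_once(srv: socket.socket) -> None:
        conn, _ = srv.accept()             # wait for one TCP connection
        with conn:
            conn.sendall(conn.recv(1024))  # reliable, in-order echo

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))             # port 0: the OS picks a free port
    srv.listen(1)
    threading.Thread(target=echo_once, args=(srv,), daemon=True).start()

    with socket.create_connection(srv.getsockname()) as cli:
        cli.sendall(b"hello over TCP")     # the byte stream arrives intact, in order
        print(cli.recv(1024))              # b'hello over TCP'
    srv.close()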
LAYERS OF TCP/IP
APPLICATION
TRANSPORT
INTERNET
DATA LINK
PHYSICAL
APPLICATION LAYER
Provides applications with the ability to access the services of the other layers.
New protocols and services are always being developed in this category.
TRANSPORT LAYER
• Sequencing and transmission of packets
• Acknowledgment of receipts
• Recovery of packets
• Flow control
In essence, it engages in host-to-host transportation of data packets and their delivery to the application layer.
INTERNET LAYER
This layer is also known as the network layer. Its main purpose is to organize and handle the movement of data on the network; by movement of data we generally mean the routing of data over the network. The main protocol used at this layer is IP, while ICMP (used by the popular 'ping' command) and IGMP are also used at this layer.
DATA LINK LAYER
The data link layer is the protocol layer that transfers data between adjacent network nodes in a wide area network or between nodes on the same local area network segment. The data link layer provides the functional and procedural means to transfer data between network entities and might provide the means to detect and possibly correct errors that may occur in the physical layer.
PHYSICAL LAYER
The physical layer consists of the basic networking hardware transmission technologies of a network. It is a fundamental layer underlying the logical data structures of the higher-level functions in a network. The physical layer defines the means of transmitting raw bits, rather than logical data packets, over a physical link connecting network nodes.
WAP (WIRELESS APPLICATION PROTOCOL)
Wireless Application Protocol (WAP) is a technical standard for accessing
information over a mobile wireless network.
A WAP browser is a web browser for mobile devices such as mobile phones that
uses the protocol. Before the introduction of WAP, mobile service providers had
limited opportunities to offer interactive data services, but needed interactivity to
support Internet and Web applications such as:
• Email by mobile phone
• Tracking of stock-market prices
• Sports results
• News headlines
• Music downloads
The WAP standard described a protocol suite allowing the interoperability of WAP equipment and software with different network technologies, such as GSM and IS-95. The original WAP model provided a simple platform for access to web-like WML services and e-mail using mobile phones in Europe and the South-East Asian regions; as of 2009 it continued to have a considerable user base. The later versions of WAP, primarily targeting the United States market, were designed for a different requirement: to enable full web XHTML access using mobile devices with a higher specification and cost, and with a higher degree of software complexity.
Considerable discussion has addressed the question of whether the WAP protocol design was appropriate. Some have suggested that the bandwidth-sparing, simple interface of Gopher would be a better match for mobile phones and personal digital assistants (PDAs). The initial design of WAP specifically aimed at protocol independence across a range of different protocols (SMS, IP over PPP over a circuit-switched bearer, IP over GPRS, etc.). This has led to a protocol considerably more complex than an approach directly over IP might have produced.
Most controversial, especially for many from the IP side, was the design of WAP
over IP. WAP's transmission layer protocol, WTP, uses its own retransmission
mechanisms over UDP to attempt to solve the problem of the inadequacy of TCP
over high-packet-loss networks.
DOMAIN NAME SPACE
The domain name space consists of a tree of domain names. Each node or leaf in
the tree has zero or more resource records, which hold information associated
with the domain name. The tree sub-divides into zones beginning at the root zone.
A DNS zone may consist of only one domain, or may consist of many domains and
sub-domains, depending on the administrative authority delegated to the
manager.
The hierarchical Domain Name System is organized into zones, each served by a name server. Administrative responsibility over any zone may be divided by creating additional zones. Authority is said to be delegated for a portion of the old space, usually in the form of sub-domains, to another name server and administrative entity. The old zone then ceases to be authoritative for the new zone.
DNS (DOMAIN NAME SYSTEM)
The Domain Name System (DNS) is a hierarchical, distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with the domain names assigned to each of the participating entities. Most prominently, it translates domain names, which can be easily memorized by humans, into the numerical IP addresses needed for locating computer services and devices worldwide. The Domain Name System is an essential component of the functionality of most Internet services because it is the Internet's primary directory service. An often-used analogy is that the DNS serves as the phone book for the Internet, translating human-friendly computer hostnames into IP addresses. For example, the domain name www.example.com translates to the addresses 93.184.216.119 (IPv4) and 2606:2800:220:6d:26bf:1447:1097:aa7 (IPv6). Unlike a phone book, the DNS can be quickly updated, allowing a service's location on the network to change without affecting the end users, who continue to use the same host name. Users take advantage of this when they use meaningful Uniform Resource Locators (URLs) and e-mail addresses without having to know how the computer actually locates the services.
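As a small illustration of name resolution, the following Python sketch asks the system resolver (which in turn queries the DNS hierarchy) for the addresses of a host, using the same example host as above:

    import socket

    # getaddrinfo consults the system resolver, which queries DNS.
    seen = set()
    for family, _, _, _, sockaddr in socket.getaddrinfo("www.example.com", None):
        addr = sockaddr[0]
        if addr not in seen:               # the same address can repeat per socket type
            seen.add(addr)
            print("IPv6:" if family == socket.AF_INET6 else "IPv4:", addr)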
E-MAIL (ELECTRONIC MAIL)
Electronic mail, most commonly referred to as email or e-mail since c. 1993, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the same time, in common with instant messaging. Today's email systems are based on a store-and-forward model: email servers accept, forward, deliver, and store messages. Neither the users nor their computers are required to be online simultaneously; they need connect only briefly, typically to a mail server, for as long as it takes to send or receive messages. Historically, the term electronic mail was used generically for any electronic document transmission; for example, several writers in the early 1970s used the term to describe fax document transmission. As a result, it is difficult to find the first citation for the use of the term with the more specific meaning it has today. An Internet email message consists of three components: the message envelope, the message header, and the message body. The message header contains control information, including, minimally, an originator's email address and one or more recipient addresses. Usually descriptive information is also added, such as a subject header field and a message submission date/time stamp.
For email you need:
• An account on a mail server and supporting software on your PC
• The username and password that allow you to access your account
All e-mail programs allow you to send, compose, reply to, and forward mail electronically via the Internet.
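As a hedged sketch of how a mail client hands a message to a mail server, Python's standard smtplib can be used; the host, port, account and addresses below are placeholders, not real services:

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "student@example.com"       # originator address (header)
    msg["To"] = "teacher@example.com"         # recipient address (header)
    msg["Subject"] = "Hello"                  # descriptive header field
    msg.set_content("Message body goes here.")  # the message body

    with smtplib.SMTP("mail.example.com", 587) as smtp:  # connect briefly...
        smtp.starttls()                        # ...upgrade to an encrypted channel
        smtp.login("student", "password")      # account username and password
        smtp.send_message(msg)                 # the server stores and forwards it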
TELNET
Telnet is a terminal emulation program for TCP/IP networks such as the Internet. The Telnet program runs on your computer and connects your PC to a server on the network. You can then enter commands through the Telnet program, and they will be executed as if you were entering them directly on the server console. This enables you to control the server and communicate with other servers on the network. To start a Telnet session, you must log in to a server by entering a valid username and password. Telnet is a common way to remotely control Web servers.
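A hedged sketch of a scripted Telnet session using Python's standard telnetlib (note that the module was deprecated in recent Python versions and removed in 3.13; the host and credentials are placeholders):

    from telnetlib import Telnet

    with Telnet("server.example.com", 23, timeout=10) as tn:
        tn.read_until(b"login: ")          # wait for the server's login prompt
        tn.write(b"username\n")            # a valid username...
        tn.read_until(b"Password: ")
        tn.write(b"password\n")            # ...and password
        tn.write(b"ls\nexit\n")            # commands run as if typed on the console
        print(tn.read_all().decode("ascii", errors="replace"))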
HYPERTEXT TRANSFER PROTOCOL
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web. Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text; HTTP is the protocol used to exchange or transfer hypertext. The standards development of HTTP was coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), culminating in the publication of a series of Requests for Comments (RFCs), most notably RFC 2616 (June 1999), which defined HTTP/1.1, the version of HTTP most commonly used today. The default port number is 80.
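A minimal HTTP request from Python's standard http.client, using the default port 80 mentioned above:

    import http.client

    conn = http.client.HTTPConnection("example.com", 80, timeout=10)
    conn.request("GET", "/")               # method and resource path
    resp = conn.getresponse()
    print(resp.status, resp.reason)        # e.g. 200 OK
    print(resp.getheader("Content-Type"))  # one of the response headers
    body = resp.read()                     # the hypertext itself
    conn.close()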
FILE TRANSFER PROTOCOL
The File Transfer Protocol (FTP) is a standard network protocol used to transfer computer files from one host to another host over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server. It is used for downloading from most MP3 sites and is designed for faster file transfer over the Internet compared to using the HTTP protocol. FTP sites can be configured alongside a web site to support FTP file transfer. The FTP default ports are 20 (data) and 21 (control).
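A hedged sketch of an FTP download with Python's standard ftplib (the server, directory and file name are placeholders); the login happens over the control connection on port 21, while a separate data connection carries the file:

    from ftplib import FTP

    with FTP("ftp.example.com") as ftp:     # control connection on port 21
        ftp.login("anonymous", "guest@")    # many public servers allow anonymous login
        ftp.cwd("/pub")                     # change to a remote directory
        with open("song.mp3", "wb") as f:   # the data connection carries the file
            ftp.retrbinary("RETR song.mp3", f.write)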
BROWSERS AND SEARCH ENGINES
A web browser is a software application that allows people to access, retrieve and view information on the Internet. The information that may be "browsed" can be in the form of text content on a web page, an image, video, audio, etc. The most popular web browsers currently in use are Firefox, Google Chrome, Internet Explorer, Opera and Safari. The main purpose of a search engine is to search for information on the Internet. Search engines are software programs that search for websites based on keywords that the user types in; the search engine then goes through its databases of information in order to locate the information you are looking for. The main search engines currently in use are Google, Bing, and Yahoo.
SEARCH FUNDAMENTALS
It includes the information bar, search form area, directory area, and links.
• Search Terminology: search tool, query, query syntax, query semantics, hit, match, relevancy score
• Pattern Matching Queries: enter keyword(s); the search engine returns URLs
• Boolean Queries (after George Boole): AND, OR, NOT
• Search Domain: current Web, newsgroups, specialized databases, the Internet
• Search Subjects: a way to view the search queries of anonymous users in real time, e.g. to see how busy the engine is, "spy" on other users, "see" query modifications, and observe various general and personal interests
SEARCH STRATEGIES
• Wildcard: a special character that can be added to a phrase while searching; the search engine or subject directory looks for all possible endings, and the results will include all documents in its database that have those letters.
• Plus and Minus Signs: the plus sign used before a keyword or phrase should retrieve results that include that specific keyword or phrase; the minus sign used before a keyword or phrase should retrieve results that exclude it.
• Quotation Marks and Brackets: assist with narrowing the search results. When quotation marks or brackets are used, the search engine will only retrieve documents in which those key terms appear together.
• Pipe Symbol (|): the pipe symbol, located on most keyboards on the right-hand side between the Delete and Return keys, assists with narrowing down results within a broad category.
• Boolean Operators: used the same way the plus and minus signs are used. The AND Boolean operator is similar to the plus sign, and the NOT Boolean operator is similar to the minus sign; the OR Boolean operator tells the search engine to retrieve one term or the other (a toy sketch of this logic follows this list).
• NEAR: indicates to the search tool that the terms must be located within a certain number of words of each other. The results may vary depending on the search tool; for example, some search tools may try to locate the terms within 2, 10 or 25 words of each other. The command to use is NEAR/#.
• Nesting: allows the user to perform multiple tasks and build a complex search. Parentheses are used to group the keywords and Boolean operators together. This is an excellent technique for complex searching.
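As promised above, here is a toy Python illustration of Boolean query semantics (AND, OR, NOT) over a tiny in-memory collection; real search tools evaluate the same logic against large indexes rather than raw text:

    docs = {
        1: "web engineering lecture notes",
        2: "software engineering models",
        3: "web servers and caching",
    }

    def has(text: str, term: str) -> bool:
        return term in text.split()      # exact word match, a stand-in for an index lookup

    # Query: web AND engineering NOT software
    hits = [doc_id for doc_id, text in docs.items()
            if has(text, "web") and has(text, "engineering") and not has(text, "software")]
    print(hits)                          # [1]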
DIRECTORIES SEARCH ENGINES AND META SEARCH ENGINES
Subject directories usually have smaller databases than search engines. Directories classify web documents or sites into a subject classification scheme; they are usually compiled by hand or in some type of logical order. Subject directories also begin with general topics and allow the user to narrow down to a specific category. They usually provide limited search results of the available pages on the Web, but the information collected from a subject directory will generally contain more related information dealing with a particular subject matter. Information retrieved from subject directories will not be as comprehensive as the information located by a search engine. Many of these directories include both browsing and searching capabilities.

A meta search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list, or displays them according to their source. Meta search engines enable users to enter search criteria once and access several search engines simultaneously. They operate on the premise that the Web is too large for any one search engine to index it all, and that more comprehensive search results can be obtained by combining the results from several search engines. This also may save the user from having to use multiple search engines separately.
WORKING OF SEARCH ENGINES
Search engines have two major functions: crawling and building an index, and providing search users with a ranked list of the websites they've determined are the most relevant.

Crawling and Indexing: crawling the billions of documents, pages, files, news items, videos, and media on the World Wide Web and building an index from them.

Providing Answers: answering user queries, most frequently through lists of relevant pages that the engine has retrieved and ranked for relevancy.

The link structure of the web serves to bind all of the pages together. Links allow the search engines' automated robots, called "crawlers" or "spiders," to reach the many billions of interconnected documents on the web. Once the engines find these pages, they decipher the code from them and store selected pieces in massive databases, to be recalled later when needed for a search query. To accomplish the monumental task of holding billions of pages that can be accessed in a fraction of a second, the search engine companies have constructed datacenters all over the world. These monstrous storage facilities hold thousands of machines processing large quantities of information very quickly. When a person performs a search at any of the major engines, they demand results instantaneously; even a one- or two-second delay can cause dissatisfaction, so the engines work hard to provide answers as fast as possible.

Search engines are answer machines. When a person performs an online search, the search engine scours its corpus of billions of documents and does two things: first, it returns only those results that are relevant or useful to the searcher's query; second, it ranks those results according to the popularity of the websites serving the information. It is both relevance and popularity that the process of SEO is meant to influence.
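The crawl-then-index cycle can be illustrated with a toy Python sketch; the "web" here is an in-memory dictionary rather than real pages fetched over HTTP:

    from collections import defaultdict

    web = {  # page -> (text, outgoing links); a stand-in for real fetched pages
        "a.html": ("search engines crawl the web", ["b.html"]),
        "b.html": ("engines rank pages by relevance", []),
    }

    index = defaultdict(set)             # word -> set of pages containing it
    frontier, seen = ["a.html"], set()
    while frontier:                      # the "spider" follows links...
        page = frontier.pop()
        if page in seen:
            continue
        seen.add(page)
        text, links = web[page]
        for word in text.split():        # ...and stores words in the index
            index[word].add(page)
        frontier.extend(links)

    print(sorted(index["engines"]))      # ['a.html', 'b.html']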
WEB SERVERS
A web server is an information technology that processes requests via HTTP, the basic network protocol used to distribute information on the World Wide Web. The term can refer to the entire computer, to an appliance, or specifically to the software that accepts and supervises the HTTP requests. The most common use of web servers is to host websites, but there are other uses such as gaming, data storage, running enterprise applications, and handling email, FTP, or other web traffic. A minimal example appears after the feature list below.
Features
• Virtual hosting, to serve many web sites using one IP address
• Large file support, to be able to serve files whose size is greater than 2 GB on a 32-bit OS
• Bandwidth throttling, to limit the speed of responses so as not to saturate the network and to be able to serve more clients
• Server-side scripting, to generate dynamic web pages while keeping the web server and website implementations separate from each other
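For contrast with full-featured servers like Apache and IIS, a minimal static web server can be run with Python's standard library (serving the current directory; the port number is arbitrary):

    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # Serves files from the current directory at http://localhost:8000/ until
    # interrupted; real servers layer the features above on top of this basic
    # request/response loop.
    HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler).serve_forever()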
Caching
A Web cache sits between one or more Web servers (also known as origin servers) and a client or many clients, watches requests come by, and saves copies of the responses (HTML pages, images and files, collectively known as representations) for itself. Then, if there is another request for the same URL, it can use the response that it has, instead of asking the origin server for it again. There are two main reasons that Web caches are used:

To reduce latency: because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time to get the representation and display it. This makes the Web seem more responsive.

To reduce network traffic: because representations are reused, caching reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps bandwidth requirements lower and more manageable.
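The cache-hit/cache-miss behaviour described above can be sketched in a few lines of Python (a toy in-process cache keyed by URL; real Web caches also honour headers such as Cache-Control and Expires):

    import urllib.request

    cache = {}  # URL -> stored representation

    def fetch(url: str) -> bytes:
        if url in cache:                  # cache hit: lower latency, no traffic
            return cache[url]
        with urllib.request.urlopen(url) as resp:   # cache miss: ask the origin
            body = resp.read()
        cache[url] = body
        return body

    page = fetch("http://example.com/")   # first call contacts the origin server
    page = fetch("http://example.com/")   # second call is served from the cache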
IIS
IIS (Internet Information Services) is Microsoft's web server offering, playing second fiddle to market leader Apache. As is expected of a core Microsoft product, it runs only on, and is bundled with, Windows operating systems, but is otherwise free to use. It is a closed-source product, supported solely by Microsoft. Although development is not as open and quick as with the open-source, user-supported Apache, a behemoth like Microsoft can throw formidable support and development resources at its products, and IIS has benefitted from this. It is one of the few Microsoft products that even its detractors (grudgingly) agree can stand toe-to-toe with its open-source rival, and even trounce it soundly in some areas. There is a lite version called IIS Express, installable as a standalone freeware server from Windows XP SP3 onwards, but this version only supports HTTP and HTTPS.
Solid feature, performance and security improvements over the years have meant that IIS has steadily improved and gained ground and market share on Apache, from about 21% in 2010 to about 32% as of February 2014. Security has been one area of significant gain, making huge leaps from the days of IIS 6.0's vulnerability to the infamous Code Red worm. All is not yet perfect, however; for instance, IIS has been called out as still being poor at supporting PFS (Perfect Forward Secrecy), a property of key cryptography which ensures that a long-term key will not be compromised if a single session key is compromised or broken.

Still, the IIS-Apache security comparison may not be fair to IIS: IIS vulnerability may also be largely blamed on its operating system parent, since most malware targets Windows, and Linux (Apache's main choice of OS) is itself an offshoot of the inherently iron-clad Unix OS. IIS utilizes external web extensions to implement some features; for example, FTP publishing, application request routing, media services and URL rewriting are all new features introduced in IIS 7.5 via extensions. IIS also offers strong support for the Microsoft products .NET (framework) and ASPX (scripting), so if your website relies heavily on these, IIS is a clear frontrunner as a choice of web server. IIS offers in-depth diagnostic tools such as failed request tracing, request monitoring and runtime data, in addition to virtual hosting support. But a major concern is that choosing IIS necessitates also picking Windows, with its attendant high cost and security implications compared to Linux.
APACHE
Apache, or to use its full title, the Apache HTTP Server, is an open-source web server application managed by the Apache Software Foundation. The server software is freely distributed, and the open-source license means users can edit the underlying code to tweak performance and contribute to the future development of the program, a major source of its beloved status among its proponents. Support, fixes and development are handled by the loyal user community and coordinated by the Apache Software Foundation.

Although Apache will run on all major operating systems, it is most frequently used in combination with Linux. These two, combined with the MySQL database and the PHP scripting language, comprise the popular LAMP Web server solution.

Apache boasts an impressive repertoire. Many features are implemented as compiled modules that extend the core functionality; these can range from server-side programming language support to authentication schemes. Some common language interfaces support Perl, Python, Tcl, and PHP. Apache also supports virtual hosting, which enables one machine to host and simultaneously serve several different websites, and it has a number of good, well-developed GUI interfaces. Another notable feature is webpage compression, to reduce page size over HTTP; this is also achieved by an external module, mod_gzip. And security is one of Apache's noted strengths.

When it comes to performance, conventional wisdom has it that Apache is just OK: a bit better than IIS but quite a bit slower than its main open-source rival, Nginx. This has been borne out by objective tests. Though by no means slow for most general tasks, Apache is still held back by two of its main characteristics:

Feature bloat: Apache is frequently compared to MS Word, an extremely feature-rich application in which 90% of users only use about 10% of the features on a regular basis.

Process-based architecture: Apache is a process-based server, unlike many of its rivals that are event-based or asynchronous in nature. In a process-based server, each simultaneous connection requires a separate thread, and this incurs significant overhead. An asynchronous server, on the other hand, is event-driven and handles requests in a single thread or very few threads.