
Distributed Computing Course Syllabus (MSIS52102)

1. Distributed Computing (MSIS52102)
Course Title : Distributed Computing
Course Code : MSIS52102
Course Credits : 45 Credit Hours, 3 Credit Units
Level : Year II, Semester I
Course Description
This course unit builds on the course unit Operating Systems, starting with an introduction to different
communication technologies and structures of computer networks, including LANs, WANs and the
Internet. The layers of software added to an operating system to support networking, including the
TCP/IP protocol suite, are described. Techniques for client-server programming on different platforms
are also examined, and popular distributed technologies are investigated. As an extension, the emphasis is
very much on the heterogeneity and interconnection of software components, both in the way they
communicate with each other and in the way they are themselves distributed. The concentration is not
so much on the technical detail of standards such as CORBA, Java RMI, and Distributed Network
Architecture, but on the ways these technologies can be used to construct dynamic infrastructures for
linking diverse local environments into one community of cooperating parts. The emphasis is very much
upon allowing heterogeneity, and on solving business problems related to distributed concentrations of
data.
Course Objectives
Distributed computing is an emerging and dynamic field of study. The rapid advances in the miniaturization
of computing devices and "unfettered" communication technology, together with the increasing demand
for ubiquitous access to information, have introduced new challenges as well as new opportunities in
many traditional areas of computer science. This course aims at providing the skills needed to distribute
and manage information over an organizational network.
Learning Outcomes
On successful completion of this course unit a student will be able to:
1. Understand a broad range of fundamental concepts of Distributed Computing, and be able to describe and
compare current types of network hardware.
2. Specify the interconnection and interaction of different components, both in the way they
communicate with each other and in the way they are themselves distributed.
3. Understand the different standards of Distributed Computing (such as Java RMI and CORBA), and
achieve a moderate level of skill in using them.
4. Understand distributed computing and distributed applications; describe in detail the purpose and
operation of the protocols making up the OSI and Internet protocol suites; write software taking
advantage of selected protocol stacks; and describe and implement simple examples of the techniques
used in client-server computing.
Course Content
1. Introduction to Distributed Computing (4 Hrs)
Meaning of distributed computing; Distributed systems; Distributed computing vs. parallel computing;
Basic concepts in operating systems: processes and threads; Basic concepts in data communication:
network architectures, the OSI model and the Internet model; Connection-oriented communication vs.
connectionless communication; Naming schemes for network resources; The Domain Name System
(DNS); Protocol port numbers; Uniform Resource Identifier (URI); Email addresses
2. Distributed Computing Paradigms (4 Hrs)
Introduction to paradigms for distributed applications; The paradigms of distributed applications:
message passing; client-server; message systems (point-to-point, publish/subscribe); distributed objects;
remote method invocation; object request broker; object space; mobile agents; network services;
collaborative applications; Tradeoffs considered when choosing a paradigm or a tool for an application:
overheads, scalability, cross-platform support, performance, and software engineering issues.
3. The Socket API (4 Hrs)
Conceptual model of the Socket API; The Socket API; Connection-oriented and connectionless datagram
sockets; The Java datagram socket API; The Java Socket API; The data structure and program flow in the
Sender and Receiver programs; Event synchronization with the connectionless datagram socket API;
Setting timeouts; Key methods and constructors; Coding connectionless sockets; Study of sample code;
Connection-oriented datagram socket API: methods called for a connection-oriented datagram socket;
The stream-mode socket API (connection-oriented socket API): program flow, the server (the connection
listener), key methods in the ServerSocket class; Secure sockets: the Java secure socket extension API
4. Client-Server Model (4 Hrs)
The difference between the client-server system architecture and the client-server distributed computing
paradigm; Definition of the paradigm and why it is widely adopted in network services and network
applications; The issues of service sessions, protocols, service location, interprocess communications, data
representation, and event synchronization in the context of the client-server paradigm; The three-tier
software architecture of network applications; Connectionless server versus connection-oriented server;
Iterative server versus concurrent server and the effect on a client session; Stateful server versus stateless
server; Stateful server: global state information versus session state information.
5. Group Communication (4 Hrs)
Unicast vs. multicast; Archetypal multicast API operations; Basic multicast is connectionless and
unreliable; Categories of reliable multicast systems; IP multicast addressing: a static Class D address, a
transient address obtained at run time, or an arbitrary unassigned address; The MulticastSocket class: the
joinGroup and leaveGroup methods, the send and receive methods
6. Distributed Objects (4 Hrs)
The distributed object paradigm vs. the message-passing paradigm; Using the paradigm; An object server
and object client in a distributed object system; Proxy; Object registry; Distributed object system
protocols: Java Remote Method Invocation (RMI), the Distributed Component Object Model (DCOM),
the Common Object Request Broker Architecture (CORBA), and the Simple Object Access Protocol
(SOAP).
7. Advanced Remote Method Invocation (4 Hrs)
Client callback: the client-side software and remote interface; A remote method call to the server; The
object server and how it invokes the callback method (defined in the client remote interface); The two
sets of stubs and skeletons; Stub downloading; The Java security policy
8. Message Service System (4 Hrs)
The message service system paradigm; Message service system models: the point-to-point model and the
publish/subscribe model; Toolkits based on the message-system paradigm.
9. Mobile Agent (4 Hrs)
Introduction to mobile (transportable) agents; Advantages of mobile agents; The mobile agent
paradigm vs. the client-server paradigm; Basic architecture; What is in the agent?; Mobile agent
applications; Security in mobile agent systems; Mobile agent framework systems; Existing mobile
agent framework systems; Mobile Agent System Interoperability Facility (MASIF).
10. Logical Time (2 Hrs)
Application of logical time; Background; Notations
Total: 60 Hours
Teaching methods
Instruction will be given through lectures, class demonstrations and tutorial discussions.
Course Unit Assessment
Assessment will be as follows:
i. One assignment, taking a total score of 15 marks;
ii. A group presentation on a distributed network design (25 marks).
References
1. Ekedahl, M. (2007). Programming with Microsoft Visual Basic 2005: An Object-Oriented Approach,
2nd Edition. Thomson.
2. Stevens, W. R. et al. (2004). UNIX Network Programming, Volume 1: The Sockets Networking API.
Addison-Wesley.
3. Software Engineering Concerns for Mobile Agent Systems,
http://www.elet.polimi.it/Users/DEI/Sections/Compeng/GianPietro.Picco/ICSE01mobility/papers/cook.pdf
Introduction to Distributed Computing
Computing has passed through many transformations since the birth of the first computing machines.
Developments in technology have resulted in the availability of fast and inexpensive processors, and
progress in communication technology has resulted in the availability of affordable and highly
efficient computer networks. Among these, centralized networks have one component that is
shared by all users all the time. All resources are accessible, but there is a single point of control as well as
a single point of failure. The integration of computer and networking technologies gave birth to a new
paradigm of computing called distributed computing in the late 1970s. Distributed computing has
changed the face of computing and offered quick and precise solutions for a variety of complex problems
in different fields. Nowadays, we are fully immersed in the information age, spending more and more time
communicating and gathering information through the Internet. The Internet keeps growing by orders of
magnitude, allowing end systems to communicate in more and more diverse ways. Over the years, several
methods have evolved to enable these developments, ranging from simplistic data sharing to advanced
systems supporting a multitude of services.
What is a Distributed System?
The rise of networked workstations and the decline of the centralized mainframe have been the most
dramatic changes in the last two decades of information technology. This shift has placed added processing
power in the hands of the end-user and distributed hardware resources. When computers work together over
a network, the network may use the power of all the networked computers to perform complex tasks.
Computation in networks of processing nodes can be classified as centralized or distributed.
A centralized solution relies on one node being designated as the computer node that
processes the entire application locally, and the central system is shared by all the users all the time.
Hence there is a single point of control and a single point of failure.
The motivation behind the growth of decentralised computing is the availability of low-priced,
high-performance computers and network tools. When a handful of powerful computers are linked
together and communicate with each other, the overall computing power available can be amazingly vast.
Such a system can have a better price/performance ratio than a single supercomputer. Distributed computing,
a decentralised approach to computing, is a potentially very powerful way of accessing large
amounts of computational power. The objective of such systems is to minimize communication and
computation cost. In distributed systems, the processing steps of the application are divided among the
participating nodes. The basic building block in all distributed computing architectures is the notion of
communication between computers.
A distributed system is an application that executes a collection of protocols to coordinate the actions of
multiple processes on a communication network, such that all components cooperate to perform
a single task or a small set of related tasks. The collaborating computers can access remote resources as well
as local resources in the distributed system via the communication network. The existence of multiple
autonomous computers is transparent to the user in a distributed system. The user is not aware that the
jobs are executed by multiple computers residing in remote locations. This means that, unlike in centralised
systems, no single computer in the system carries the entire load on system resources that running a
computer program usually requires.
Distributed System Architecture
Distributed systems are built on top of existing networking and operating system software. A
distributed system comprises a collection of autonomous computers, linked through a computer network
and distribution middleware. For the computers to be autonomous, there must be no master/slave
relationship between any two computers in the network. The middleware enables computers to coordinate
their activities and to share the resources of the system, so that users perceive the system as a single,
integrated computing facility. Thus, middleware is the bridge that connects distributed applications across
dissimilar physical locations, with dissimilar hardware platforms, network technologies, operating systems,
and programming languages. Middleware software is developed following agreed standards and protocols.
It provides standard services such as naming; persistence; concurrency control, which ensures that
concurrent processes produce accurate results and obtain them as fast as possible; event
distribution; authorization, which specifies access rights to resources; security; and so on. The middleware
service extends over multiple machines. Figure 1 shows a simple architecture of a distributed system.
Figure 1. A Distributed System
A distributed system can be viewed as defined by its physical components (the physical view) or as defined
from the user's or computation's point of view (the logical view). Physically, a distributed system consists
of a set of nodes (computers) linked together by a communication network. The nodes in the network are
loosely coupled and do not share their memory. The nodes in the system communicate by passing messages
over the communication network. Communication protocols are used for sending messages from one node
to another. The logical model is the view that an application has of the system: it contains a set of
concurrent processes and communication channels between them. The core network is treated as fully
connected. Processes communicate by sending messages to each other. A system is synchronous if,
during proper execution, it always performs the intended operation within a known fixed time;
otherwise it is asynchronous. In a synchronous system a failure can be noticed by a lack of response
from the system. Therefore, timeout-based techniques are used for failure detection.
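To make the timeout idea concrete, here is a minimal Java sketch (not from the course text; the host, port, and timeout values are illustrative) of timeout-based failure detection: a node is suspected to have failed if it does not respond within the assumed known time bound.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class FailureDetectorSketch {
    // Returns true if the node answers within timeoutMs, the assumed response bound.
    public static boolean isAlive(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            // connect() throws SocketTimeoutException if no answer arrives in time.
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (SocketTimeoutException e) {
            return false;   // no response within the bound: suspect a failure
        } catch (IOException e) {
            return false;   // connection refused or other network error
        }
    }

    public static void main(String[] args) {
        System.out.println(isAlive("example.com", 80, 2000) ? "alive" : "suspected failed");
    }
}

Note that in an asynchronous system such a verdict is only a suspicion: a very slow node is indistinguishable from a failed one.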
A distributed system can be constructed by means of fully connected networks or partially connected
networks. A fully connected network (figure 2) is a network in which each node is
connected to every other node. The problem with such a system is that adding a new node requires
connecting it to every existing node, so the number of connections each node must maintain, the number
of file descriptors it needs, and the complexity of implementing the connections all increase heavily. Thus,
the scalability (capability of a system to continue to function well when the system is changed in size or
volume) of such systems is limited by each node's capacity to open file descriptors and its ability to handle
new connections. The communication cost (the message delay of sending a message from the
source to the destination) is low, because a message sent from one computer to another only goes
through one link. Fully connected systems are reliable, because when a few computers or links fail, the
rest of the computers can still communicate with the others.
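A quick back-of-the-envelope illustration (a sketch added here, not part of the course text) shows why this limits scalability: with n nodes, each node maintains n-1 connections and the network needs n(n-1)/2 links in total.

public class LinkCount {
    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            long links = (long) n * (n - 1) / 2;   // total links in a fully connected network
            System.out.printf("n = %4d nodes -> %7d links, %d sockets per node%n",
                    n, links, n - 1);
        }
    }
}

For 1000 nodes this is already 499,500 links and 999 open sockets per node, which quickly exhausts file descriptors.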
In a partially connected network, direct links exist between some, but not all, pairs of computers. A few
of the partially connected network models are star-structured networks, multi-access bus networks,
ring-structured networks, and tree-structured networks (figure 2). Some traditional distributed systems,
such as the client/server paradigm, use a star as the network topology. The problem with such a system is
that when the central node fails, the entire system collapses. In a multi-access bus network, a set
of clients are connected via a shared communications line, called a bus. The bus link becomes the
bottleneck, and if it fails, none of the nodes in the system can connect to each other. Another
disadvantage is that performance degrades as additional computers are added or under heavy traffic. In a
ring network each node connects to exactly two other nodes, forming a single continuous pathway
for signals through each node. As new nodes are added, the diameter of the system grows with the number
of nodes in the system, resulting in a longer message transmission delay. A node failure or cable break
might isolate every node attached to the ring. In a tree-structured network (hierarchical network), the
nodes are connected as a tree, each node having a fixed number of nodes
attached to it at the next lower level in the hierarchy. The scalability of the tree-structured network is
better than that of the fully connected network, since new nodes can be added as children of the leaf
nodes or the interior nodes. On the other hand, in such systems, only messages transmitted between a
parent node and its child node go through one link; other messages transmitted between two nodes have
to go through one or more intermediate nodes.
Figure 2. Network Models: fully connected, tree-structured, multi-access bus, star-structured, and ring-structured networks
Characteristics of a Distributed System
A distributed system must possess the following characteristics to deliver utmost performance for the
users:
Fault-Tolerant: Distributed systems consist of a large number of hardware and software modules that
are bound to fail in the long run. Such component failures can lead to service unavailability. Hence, the
systems should be able to recover from component failures without performing erroneous actions. The
goal of fault tolerance is to avoid failures in the system even in the presence of faults to provide
uninterrupted service. A system is said to be fault tolerant if it can mask the presence of faults. The aim of
any fault-tolerant system is to increase its reliability or availability. The reliability of a system is defined
as the probability that the system survives up to a given time. A reliable system prevents loss of
information even in the event of component failures. Availability is the fraction of time for which a
system is available for use. Usually fault tolerance is achieved by providing redundancy. Redundancy is
defined as those parts of the system that are not needed for its correct functioning. It is of three types:
hardware, software and time. Hardware redundancy is achieved by adding extra hardware components to
the system, which take over the role of failed components when faults occur in them. Software
redundancy includes the extra instructions and code for managing the extra hardware components
and using them correctly to provide uninterrupted service in case of component failure. In time
redundancy the same instruction is executed many times; this is used to handle temporary faults in the
system.
Scalable: A distributed system can operate correctly even as some aspect of the system is scaled to
a larger size. Scale has three components: the number of users and other entities that are part of the
system, the distance between the farthest nodes in the system, and the number of organizations that exert
administrative control over pieces of the system. The three elements of scale affect distributed systems in
many ways. Among the affected components are naming, authentication for verifying someone’s identity,
authorization, communication, the use of remote resources, and the mechanisms by which users observe
the system. Three techniques are employed to manage scale: replication, distribution, and caching.
Replication creates multiple copies of resources. Its use in naming, authentication, and file services
reduces the load on individual servers and improves the reliability and availability of the services as a
whole. The two important issues of replication are the placement of the replicas and the mechanisms by
which they are kept consistent. The placement of replicas in a distributed system depends on the purpose
for replicating the resource. If a service is being replicated to reduce the network delays when the service
is accessed, the replicas are sprinkled across the system. If the majority of users are local, and if the
service is being replicated to improve its availability or to spread the load across multiple servers, then
replicas may be placed near one another. If a change is made to the object, the change should be
noticeable to everyone in the system. For example, the system sends the updates to any replica, and that
replica forwards the update to the others as they become available. If inconsistent updates are received by
different replicas in different orders, timestamps (the date/time at which the update was generated) are
used to differentiate the copies.
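A hedged sketch of this timestamp-based reconciliation is shown below (a "last writer wins" rule; the class and method names are hypothetical, not from the course material): a replica applies an incoming update only if its timestamp is newer than that of the copy it already holds.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Replica {
    private static final class Versioned {
        final String value;
        final long timestamp;   // time at which the update was generated
        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> store = new ConcurrentHashMap<>();

    // Apply an update only if it is newer than the copy already held,
    // so replicas receiving updates in different orders still converge.
    public void applyUpdate(String key, String value, long timestamp) {
        store.merge(key, new Versioned(value, timestamp),
                (old, incoming) -> incoming.timestamp > old.timestamp ? incoming : old);
    }

    public String read(String key) {
        Versioned v = store.get(key);
        return v == null ? null : v.value;
    }
}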
Distribution, another mechanism for managing scale in distributed systems, allows the information
maintained by a distributed service to be spread across several servers. Distributing data across
multiple servers reduces the size of the database that must be maintained by each server, reducing the
time needed to search the database. Distribution also spreads the load across the servers, reducing the
number of requests that are handled by each. If requests can be distributed to servers in proportion to their
power, the load on servers can be effectively managed. Network traffic can be reduced if data are
assigned to servers close to the location from which they are most frequently used. In a tree-structured
system, if cached copies are available from subordinate servers, the upper levels can be avoided.
Caching is another important technique for building scalable systems. Caching decreases the load on
servers and the network, and cached data can be accessed faster than if a new request were made. The
difference between replication and caching is that cached data is short-term data. Instead of propagating
updates to cached data, consistency is maintained by invalidating cached data when consistency cannot be
guaranteed. Caching is usually performed by the client, reducing frequent requests to network services,
but it can also occur on the servers executing those services: reading a file from the memory-cached copy
on the file server is faster than reading it from the client's local disk.
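As a small illustration of invalidation-based caching (a sketch under assumed names, not the course's own code), the cache below simply discards entries older than a time-to-live instead of propagating updates to them:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(K key, V value) {
        entries.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    // Returns the cached value, or null when the entry is absent or no longer
    // guaranteed fresh (in which case it is invalidated rather than updated).
    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() > e.expiresAt) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }
}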
Predictable Performance: Various performance metrics such as response time (elapsed time between
the end of an inquiry or demand on a computer system and the beginning of a response), throughput (the
rate at which a network sends or receives data), system utilization, network capacity etc. are employed to
assess the performance. Predictable performance is the ability to provide desired responsiveness in a
timely manner.
Openness: The attribute 'openness' ensures that a subsystem is continually open to interaction with
other systems. Web services, for example, are software systems designed to support interoperable
machine-to-machine interaction over a network; their protocols allow distributed systems to be extended
and scaled. An open system that scales has an advantage over a completely closed and self-reliant system.
A distributed system that is independent of the heterogeneity of the underlying environment, such as
hardware and software platforms, achieves the property of openness. Therefore, every service is equally
accessible to every client (local or remote) in the system. The implementation, installation and debugging
of new services should not be very complex in a system possessing the openness characteristic.
Security: Distributed systems should allow communication between programs/users/ resources on
different computers by enforcing necessary security arrangements. The security features are mainly
intended to provide confidentiality, integrity and availability. Confidentiality (privacy) is protection
against disclosure to unauthorised persons; violations of confidentiality range from the embarrassing to the
catastrophic. Integrity provides protection against alteration and corruption. Availability keeps the
resource accessible. Many incidents of hacking compromise the integrity of databases and other resources.
"Denial of service" attacks are attacks against availability. Other important security concerns are access
control and nonrepudiation. Access control ensures that users access only those
resources and services to which they are entitled. It also ensures that users are not denied resources that
they legitimately can expect to access. Nonrepudiation provides protection against denial by one of the
entities involved in a communication. The security mechanisms put into practice should guarantee
appropriate use of resources by different users in the system.
Transparency: Distributed systems should be perceived by users and application developers as a
whole rather than as a collection of cooperating components. The locations of the computer systems
involved in the operations, concurrent operations, data replication, resource discovery from multiple sites,
failures, system recovery, etc. are hidden from users. Transparency hides the distributed nature of the
system from its users, making the system appear and perform as a normal centralized system.
Transparency can be employed in different ways in a distributed system (Figure 3).
Figure 3. Transparency in Distributed Systems
Access transparency enables the users of a distributed system to access local and remote resources
using identical operations (e.g. navigation in the web).
Location transparency means that the names used to identify network resources (e.g.
IP addresses) are independent of both the user's location and the resource's location. In other words, location
transparency allows a user to access resources from anywhere on the network without knowing where
the resource is located. A file could be on the user's own PC, or thousands of miles away on other servers.
Concurrency transparency enables several processes to operate concurrently using shared
information objects without interference between them (e.g.: Automatic Teller Machine network). The
users will not notice the existence of other users in the system (even if they access the same resources).
Replication transparency enables the system to make additional copies of files and other resources for
the purpose of performance and/or reliability, without the users noticing. If a resource is replicated
among several locations, it should appear to the user as a single resource (e.g. Mirroring - Mirror sites are
usually used to offer multiple sources of the same information as a way of providing reliable access to
large downloads).
Failure transparency enables the applications to complete their task despite failures occurring in
certain components of the system. For example, if a server fails, but users are automatically redirected to
another server and the user never notices the failure, the system is said to show high failure transparency.
Failure transparency is one of the most difficult types of transparency to accomplish since it is hard to
determine whether a server has actually failed, or whether it is simply responding very slowly. Moreover,
it is generally unfeasible to achieve full failure transparency in a distributed system since networks are
unreliable.
Migration transparency facilitates the resources to move from one location to another without having
their names changed. (e.g.: Web Pages). Users should not be aware of whether a resource or computing
entity possesses the ability to move to a different physical or logical location.
Performance transparency ensures that load variation does not lead to performance degradation. This
could be achieved by automatic reconfiguration in response to changes in load (e.g. load
distribution).
Scalability transparency allows the system to remain efficient even with a significant increase in
the number of users and resources connected (e.g. World-Wide-Web, distributed database).
Fallacies of distributed computing
The fallacies of distributed computing are a set of common but false assumptions made by programmers
when first developing distributed applications. Many distributed systems developed on
these assumptions turned out to be needlessly complex, with mistakes that required patching later on.
The Fallacies of Distributed Computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
Reliability: Software developed on the assumption that the network is reliable will run into trouble
when the network starts dropping packets. Reliability can often be improved by
increasing the autonomy of the nodes in a system. Replication can also improve the reliability of a system.
Latency: Latency is the time between initiating a request for data and the beginning of the actual data
transfer. Latency can be comparatively good on a LAN; however, it deteriorates quickly as the user moves
to WAN or internet scenarios. Assuming latency is zero will definitely lead to scalability problems
as the application grows geographically, or is moved to a different kind of network.
Bandwidth: A measure of the capacity of a communications channel, i.e. how much data can be transferred
in a given time. The higher a channel's bandwidth, the more information it can carry. However, there are
two forces at work to keep this assumption a fallacy. One is that while the bandwidth grows, so does the
amount of information we try to squeeze through it. VoIP, videos, and IPTV are some of the newer
applications that take up bandwidth. The other force at work to lower bandwidth is packet loss (along with
frame size). Bandwidth limitations direct us to strive to limit the size of the information we send over the
wire.
Security: The network is never secure since the systems are facing various types of threats. Hence,
the developer should perform threat modelling to evaluate the security risks. Following this, analyze
which risk should be mitigated by what measures (a tradeoff between costs, risks and their probability)
and take appropriate measures. Security is typically a multi-layered solution that is handled on the
network, infrastructure, and application levels. The software architect should be conscious that security is
essential, and should be aware of the consequences its absence may have.
Topology: Topology deals with the different configurations that can be adopted in building networks, such
as a ring, bus, star or fully connected. For example, any given node in the LAN will have one or more links
to one or more other nodes in the network and the mapping of these links and nodes onto a graph results in
a geometrical shape that determines the physical topology of the network. Likewise, the mapping of the
flow of data between the nodes in the network determines the logical topology of the network. The
physical and logical topologies might be identical in any particular network but they also may be
different. When a new application is deployed in an organization, the network structure may also be
altered. The operations team is likely to add and remove servers every once in a while and/or make other
changes to the network. Finally, there are server and network faults which can cause routing changes. At
the client's side the situation is even worse: there are laptops coming and going, wireless ad hoc
networks, new mobile devices, etc. In a nutshell, the topology of a distributed system changes constantly.
Administrator: The sixth distributed computing fallacy is "there is one administrator". Even within a single
company, different administrators are typically assigned according to expertise: databases, web servers,
networks, different operating systems, and the like. The problem occurs when the company
collaborates with external entities such as a business partner, or when the application is deployed for
Internet consumption, hosted by some hosting service, and consumes external services. In these
situations, the other administrators are not under the company's control, and they may have
their own rules for administration. Hence the assumption of 'one administrator' is proven to be a myth.
Most of the time, the administrators are not part of the software development team. Therefore, the
developers should provide them with tools to diagnose and find problems. A practical approach is to
include tools for monitoring ongoing operations as well, for instance to allow administrators to recognize
problems while they are small, before they become a system failure. As a distributed system grows, its
various components such as users, resources, nodes, networks, etc. will start to cross administrative
domains. This means that the number of organizations or individuals that exert administrative control over
the system will grow. In a system that scales poorly with regards to administrative growth, this can lead to
problems of resource usage, reimbursement, security, etc.
Transport cost: Transport cost never becomes zero. Setting up and running the network is
not free. There are costs for buying the routers, costs for securing the network, costs for leasing
the bandwidth for internet connections, and costs for operating and maintaining the network.
Homogeneous network: The eighth distributed computing fallacy is "the network is homogeneous."
A homogeneous network is composed of computers using similar configurations and protocols.
Except for a few very trivial ones, no network is homogeneous. Proprietary protocols are hard to
integrate. Hence, make use of standard technologies that are widely accepted, such as XML (Extensible
Markup Language) or Web Services, as these technologies help alleviate the effects of the heterogeneity
of the enterprise environment.
Client/Server Computing
As networks of computing resources have become widespread, the notion of distributing interrelated
processing amongst several resources has become popular. Over the years, numerous methods have
evolved to facilitate this distribution. One of the popular distributed models is client/server computing
[SILV98]. The client/server model is an extension of the modular programming model. Modular
programming breaks down the design of a program into individual modules that can be programmed and
tested independently. A modular program consists of a main module and one or more auxiliary modules.
Like the modular programming model, a client/server model consists of clients and servers. The clients and
servers normally run on different computers interconnected by a computer network. The calling
component becomes the client and the called component the server.
Figure 4. Client/server communication
A client application sends messages to a server via the network, requesting the server to perform a
specific task. The client handles local resources such as input-output devices, local disks, and other
peripherals. The server program listens for client requests that are transmitted via the network. Servers
receive those requests and perform actions. Most of the data is processed on the server and only the
results are returned to the client. This reduces the amount of network traffic between the server and the
client machine. Thus network performance is improved further. The server controls the allocation of the
information and also optimizes the resource consumption.
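A minimal Java sketch of this request/response exchange is given below (the port number and the "task", upper-casing one line of text, are illustrative only): the server listens, performs the work, and returns only the result to the client.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class UpperCaseServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(9090)) {
            while (true) {                              // listen for client requests
                try (Socket client = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();     // one-line request
                    if (request != null) {
                        out.println(request.toUpperCase());   // only the result is returned
                    }
                }
            }
        }
    }
}

A client would connect with new Socket("localhost", 9090), write one line, and read one line back.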
An important design consideration for large client/server systems is whether a client talks directly to the
server, or whether an intermediary process is introduced between the client and the server. The former
is a two-tier architecture (figure 4); the latter is a three-tier architecture. N-tier architecture is usually
used for web applications to forward the requests further to other enterprise services. The two-tier
architecture is easier to implement and is typically used in small environments. However, two-tier
architecture is less scalable than a three-tier architecture.
In the three-tier architecture (figure 5), an intermediate process connects the clients and servers. The
intermediary can accumulate frequently used server data to guarantee enhanced performance and
scalability. In a database-based 3-tier client/server architecture, there are normally three essential
components: a client computer, an application server and a database server. The application server
is the middle-tier server which runs the business application. The data is retrieved from the database server
and forwarded to the client by the middle-tier server. Middleware is a key to developing three-tier
client/server applications. Database-oriented middleware offers an Application Program Interface (API)
for access to a database. Java Database Connectivity (JDBC) is a well-known API; its classes can be
used to help an applet or a servlet access any number of databases without regard to the native
features of the database.
Figure 5. 3-tier Client/server structure
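To make the role of database-oriented middleware concrete, here is a hedged sketch of a middle-tier component using the JDBC API (the connection URL, credentials, table and column names are all hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcMiddleTier {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://dbhost:5432/sales";   // assumed database server
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT name, total FROM orders WHERE total > ?")) {
            stmt.setInt(1, 100);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {   // only the matching rows cross the network
                    System.out.println(rs.getString("name") + ": " + rs.getInt("total"));
                }
            }
        }
    }
}

The same client code works against any database for which a JDBC driver is available, which is exactly the independence from native database features described above.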
For security purposes servers must also address the problem of authentication. In a networked
environment, an unauthorized client may attempt to access sensitive data stored on a server.
Authentication of clients is provided by cryptographic techniques such as public key encryption or
special authentication servers. Sometimes critical servers are replicated in order to achieve high
availability and fault tolerance. If one replica fails then the other replicas hosted in different servers still
remain available for the clients.
In the 3-tier architecture, it is easier to modify or replace any tier without affecting the other tiers. The
separation of application and database functionality achieves better load balancing. Moreover, necessary
security guidelines can be put into effect within the server tiers without hampering the clients.
Figure 6. Proxy Server Model: a client exchanges requests/responses with a proxy server, which forwards them to one of several web servers.
A 3-tier client/server model known as the 'proxy server model' (figure 6) is commonly used to improve
the retrieval performance of the Internet. The intermediate process, the proxy server, distributes client
requests to several servers so that requests execute in parallel. A client connects to the proxy server,
requesting some service, such as a web page available on a web server. The proxy server assesses the
request based on its filtering policy. For example, it may filter traffic by IP address or protocol. If the
request is accepted by the filter, the proxy provides the resource by connecting to the appropriate server
and requesting the required service on behalf of the client. A proxy server may sometimes serve the
request without contacting the specified web server. This is made possible by keeping the pages commonly
visited by users in the cache of the proxy. By keeping local copies of frequently accessed files, the proxy
can serve those files back to a requesting browser without going to the external site each time; this
dramatically improves the performance seen by the end user. A proxy server with the ability to cache
information is generally called a "proxy-cache server". A proxy is sometimes used to authenticate users by
asking them to identify themselves, for example with a username and password. It also makes it easy to
grant access to external resources only to authorised users, and to record each use of external resources in
log files. A proxy can also be used in the reverse direction to balance the load amongst a set of identical
servers.
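The proxy's decision logic can be outlined as follows (an illustrative sketch; fetchFromOrigin() is a stand-in for the real network call, and the filtering policy here is a simple host blocklist):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ProxySketch {
    private final Set<String> blockedHosts;                      // filtering policy
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public ProxySketch(Set<String> blockedHosts) {
        this.blockedHosts = blockedHosts;
    }

    public String handle(String host, String path) {
        if (blockedHosts.contains(host)) {
            return "403 Forbidden";                              // request rejected by the filter
        }
        // Serve from the local cache when possible; otherwise contact the origin server.
        return cache.computeIfAbsent(host + path, k -> fetchFromOrigin(host, path));
    }

    private String fetchFromOrigin(String host, String path) {
        // Placeholder: a real proxy would open a connection to the web server here.
        return "<contents of http://" + host + path + ">";
    }
}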
Advantages of the client/server model
i. All resources are centralised; hence, the server can manage resources that are common to all users. For
example, a central database avoids problems caused by redundant and conflicting data.
ii. Improved security, as the number of entry points giving access to data is reduced.
iii. Server-level administration suffices: as clients do not play a major role in the client/server model,
they require less administration.
iv. A scalable network: it is possible to remove or add clients without affecting the operation of the
network and without the need for major modification.
Disadvantages of the client/server model
i. Increased cost due to the technical complexity of the server.
ii. If a server fails, none of its clients will get service, unless the system is designed to be fault-tolerant.
iii. If the network fails, all servers become unreachable.
iv. If one client produces high network traffic, then all clients may suffer from long response times.
World Wide Web – A Massive Distributed System
The Internet, a massive network of networks, connects millions of computers worldwide,
forming a network in which any computer can communicate with any other computer, provided that
both are connected to the Internet. The World Wide Web (WWW), or simply the Web, is a way of
accessing information over the medium of the Internet. The WWW consists of billions of web pages,
spread across thousands and thousands of servers all over the world. It is an information-sharing model
that is built on top of the Internet, and the collection of web servers is the most well-known example of a
distributed system. Hypertext is a document containing words that link to other documents in the Web.
These words are known as links and are selectable by the user. A single hypertext document can hold links
to many documents.
The backbone of the WWW is its files, called pages or Web pages, containing information and links to
resources, both text and multimedia, throughout the Internet. Internet protocols are sets of rules that allow
for inter-machine communication on the Internet. HTTP (HyperText Transfer Protocol), the protocol of
the Web, transmits hypertext over networks. Simple Mail Transport Protocol (SMTP) distributes e-mail
messages and attached files to one or more electronic mailboxes. VoIP (Voice over Internet Protocol)
allows delivery of voice communications, for example phone calls, over IP networks. A web server accepts
HTTP requests from clients and serves them HTTP responses along with optional data content such as
web pages.
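As an illustration, the JDK ships a small embedded HTTP server (com.sun.net.httpserver) with which the accept-request/serve-response cycle can be sketched in a few lines; the port and page content below are arbitrary:

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class TinyWebServer {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        server.createContext("/", exchange -> {
            byte[] page = "<html><body>Hello, Web</body></html>"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, page.length);      // HTTP status and body length
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(page);                                // the optional data content
            }
        });
        server.start();   // now serving HTTP requests at http://localhost:8000/
    }
}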
The operation of the web relies primarily on hypertext as its means of information retrieval. Web pages can be
created by user activity. Creating hypertext for the Web is accomplished by creating documents with a
language called hypertext markup language, or HTML. With HTML, tags are placed within the text to
achieve document formatting, visual features such as font size, italics and bold, and the creation of hypertext
links. Servers implementing the HTTP protocol jointly provide the distributed database of hypertext and
multimedia documents. The clients access the web through the browser software installed on their system.
The URL (uniform resource locator) indicates the internet address of a file stored on a host computer, or
server, connected to the internet. URLs are translated into numeric addresses using the domain name system
(DNS). The DNS is a worldwide system of servers that stores location pointers to web sites. The numeric
address, called the IP (Internet Protocol) address, is actually the "real" URL. Once the translation is made
by the DNS, the browser can contact the web server and ask for a specific file located on its site. Web
browsers use the URL to retrieve the file from the server. Then the file is downloaded to the user's computer,
or client, and displayed on the monitor connected to the machine. Due to this correlation between clients and
servers, the web is a client-server network. The web is used by millions of people daily, for different purposes
including email, reading news, downloading music, online shopping or simply accessing information about
anything. In fact, the web constitutes a massive distributed system that appears as a single resource to the
user, accessible at the click of a button. In order for the web to be accessible to anyone, some agreed-upon
standards must be followed in the creation and delivery of its content. An organization leading the efforts
to standardize the web is the World Wide Web Consortium (W3C).
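The DNS translation step described above is directly visible from Java; the host name below is an example:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsLookup {
    public static void main(String[] args) throws UnknownHostException {
        // Translate a human-readable name into the numeric IP address that
        // the browser actually uses to contact the web server.
        InetAddress addr = InetAddress.getByName("www.example.com");
        System.out.println(addr.getHostName() + " -> " + addr.getHostAddress());
    }
}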
Web Information Retrieval
Web information retrieval is the process of searching the world's largest linked document collection,
the World Wide Web, for information most relevant to a user's query. The various challenges of
information retrieval on the web are: (i) data is distributed - data spans many computers and a variety of
platforms; (ii) data is volatile - computers and files are added and removed frequently and unpredictably;
(iii) the volume of data is huge - growth continues exponentially; (iv) data quality is inconsistent - data
may be false, error-ridden, invalid, outdated or ambiguous, and the multiplicity of sources introduces
inconsistency; and (v) data is heterogeneous - multiple media types and formats, and multiple languages
and alphabets. As a result, it would be physically unfeasible for an individual to sift through and examine
all these pages to find the required information. Usually, in order to search for information on the internet,
a software tool called a search engine is used. When a user enters a query into a search engine from their
browser software, the input is processed and used to search the database for occurrences of particular
keywords. A variety of search engines, such as Google and Yahoo! Search, are available to make the web
retrieval process much faster. The two main architectures used for web searching are centralised and
distributed search.
Figure 7. Search Engine: Centralised Architecture
Centralised Architecture: The aim of the centralised approach is to index a sizeable portion of the Web,
independently of topic and domain. A search engine based on the centralised architecture has three main
parts: a crawler, an indexer, and a query engine. The crawler (spider or robot) retrieves web pages,
compresses them and stores them in a page repository. This process is called crawling (sometimes known
as spidering, gathering or harvesting). Some of the most well-known crawlers include Googlebot (from
Google), MSNBot (from MSN) and Slurp (from Yahoo!). Crawlers are directed by a crawler control
module that supplies the URLs to visit next. The indexer processes the web pages collected by the
crawler and builds an index, which is the main data structure used by the search engine and represents the
crawled web pages. The inverted index contains, for each word, a sorted list of couples such as docID and
position in the document. The query engine processes the user queries and returns matching results using
the index. The results are returned to the user in an order determined by a ranking algorithm. Each search
engine may have a different ranking algorithm, which parses the pages in the engine's database to
determine relevant responses to search queries. Some search engines keep a local cached copy of many
popular pages indexed in their database, to allow for faster access and in case the destination server is
temporarily inaccessible.
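A hedged sketch of the indexer's central data structure follows (tokenisation is deliberately naive whitespace splitting; the names are illustrative): for each word, the inverted index stores a postings list of (docID, position) couples.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InvertedIndex {
    // word -> list of (docID, position) pairs
    private final Map<String, List<int[]>> postings = new HashMap<>();

    // Add one crawled document to the index.
    public void addDocument(int docId, String text) {
        String[] words = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            postings.computeIfAbsent(words[pos], w -> new ArrayList<>())
                    .add(new int[] {docId, pos});
        }
    }

    // The postings list consulted by the query engine for a query word.
    public List<int[]> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(), List.of());
    }
}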
Figure 8. Search Engine: Distributed Architecture
Distributed architecture: Searching is a coordinated effort of many information gatherers and brokers. A
gatherer extracts information (called summaries) from the documents stored on one or more web servers;
it can handle documents in many formats: HTML, PDF, Postscript, etc. A broker obtains summaries from
gatherers, stores them locally, and indexes the data. It can search the data, fetch data from other brokers,
and make data available for user queries and to other brokers. The advantages of the distributed
architecture are that a gatherer running on a server reduces the external traffic on that server, and that a
gatherer can send its information to multiple brokers, avoiding repetition of work.
Advanced Distributed System Models
Distributed systems are cost-effective compared to central systems. The introduction of redundancy
improves availability even when some parts of the system stop working. Quite a few applications can be run
concurrently in a distributed system. Adding new components does not influence the performance of
distributed systems as the systems are scalable. A large number of computers take part in performing a
distributed computing task. Thus, distributed systems provide shorter response time and superior throughput
than centralised systems. Another merit is that distributed systems are very reliable. The distributed systems
have the benefit of being highly available. Because of these multiple benefits, a range of distributed systems
and applications have been developed recently and are being used extensively in the real world. Well-known
distributed computing systems are clusters, grids, peer-to-peer networks (P2P), distributed storage systems
and so on. Moreover, mobile computing based distributed systems are also emerging.
The concept behind clustering, in its simplest form, is that many smaller computers can be grouped in a way
to make a computing structure that could provide all of the processing power needed, for much less capital.
A grid is a type of distributed system that allows coordinated sharing and aggregation of distributed,
heterogeneous resources based on users’ requirements. Grids are normally used to support applications
emerging in the areas of e-Science and e-Business. Usually grid computing involves geographically
distributed communities of people who engage in collaborative activities to solve large scale problems. This
requires sharing of various resources such as computers, data, applications and scientific apparatuses. P2P
networks are decentralized distributed systems, which enable applications such as file-sharing, instant
messaging, internet telephony, content distribution, high performance computing etc. The most renowned
function of peer-to-peer networking is found in file-exchanging communities. P2P technology is also used for
applications such as the harnessing of the dormant processing power in desktop PCs and remote storage
sharing. Distributed storage systems provide users with a unified view of data stored on different file
systems and computers which may be on the same or different networks.
Client/Server Model
4.1 Introduction
Client/Server is a term used to describe a computing model for the development of computerized systems.
This model is based on the distribution of functions between two types of independent and autonomous
processors: servers and clients. A client is any process that requests specific services from server processes. A
server is a process that provides requested services for clients. Client and server processes can reside in the
same computer or in different computers connected by a network.
The term Client/Server is in reality a logical concept. The client and server components may not exist on
distinct physical hardware. A single machine can be both a client and a server depending on the software
configuration. Client/server technology is a model for the interaction between simultaneously executing
software processes. The term architecture refers to the logical structure and functional characteristics of a
system, including the way they interact with each other in terms of computer hardware, software and the
links between them.
In the case of client/server systems, the architecture describes how clients and servers,
along with the requisite software, are configured with each other. Client/server architecture is based on the
hardware and the software components that interact to form a system. The limitations of file-sharing
architectures led to the emergence of the client/server architecture.
Forces that drive the Client/Server
The general forces that drive the move to client/server computing are:
• The changing business environment.
• The growing need for enterprise data access.
• The demand for end user productivity gains based on the efficient use of data resources.
• Technological advances that have made client/server computing practical.
• Growing cost/performance advantages of PC-based platforms.
Client/Server architecture
The client/Server architecture is based on hardware and software components that interact to form a system.
This system includes three main components:
• Clients
• Servers
• Communication middleware
• Client:
The client is any computer process that requests services from the server. The client is also known as the
front-end application, reflecting the fact that the end user usually interacts with the client process.
• Server:
The server is any computer process providing services to the clients. The server is also known as the
back-end application, reflecting the fact that the server process provides the background services for the client
process.
• Communication middleware:
It is any computer process(es) through which clients and servers communicate. The communication
middleware, also known as middleware or the communications layers, is made up of several layers of
software that aid the transmission of data and control information between clients and servers.
How Components Interact
(Diagram.) The client process sends an SQL request through the communication middleware; the
middleware routes the request across the network to the database server process; the database server
receives the request, validates it, executes it, and returns only the resulting data to the client through the
middleware.
Client/Server Principles
• Hardware Independence
• Software Independence
• Operating System
• Network System
• Applications
• Open access to services
• Process distribution
• Process autonomy
• Maximization of local resources
• Scalability and flexibility
• Interoperability and Integration
• Standards
Hardware Independence:
The principle of hardware independence requires that the client, server, and communications middleware
processes run on multiple hardware platforms (IBM, DEC, Apple, and so on) without any functional
difference.
Software Independence:
The principle of software independence requires that the client, server, and communications middleware
processes support multiple operating systems (Windows family, Linux, and Unix), multiple network protocols
(such as IPX and TCP/IP), and multiple applications (spreadsheets, databases, electronic mail, and so on).
Open access to services:
All clients in the system must have open access to all the services provided within the network, and these
services must not be dependent on the location of the client or the server. A key issue is that the services be
provided on demand to the client. In fact, the provision of on-demand service is one of the main objectives of
the client/server computing model.
Process distribution:
A prime identifying characteristic of client/server systems is that the processing of information is distributed
among clients and servers. The division of the application processing load must conform to the following
rules:
Process distribution…
• Client and server processes must be autonomous entities with clearly defined boundaries and functions.
• Local utilization of resources (at both the client and server sides) must be maximized. The client and server
processes must fully utilize the processing power of the host computers.
• Scalability and Flexibility require that the client and server processes be easily upgradeable to run on more
powerful hardware and software platforms.
• Interoperability and integration require that client and server processes be seamlessly integrated to form a
system. Swapping a server process must be transparent to the client process.
Standards:
Finally, all the principles must be based on standards applied within the client/server architecture. For
example, standards must govern the user interface, data access, network protocols, interprocess
communications, and so on. Standards ensure that all components interact in an orderly manner to achieve the
desired results.
Client Components
• Powerful hardware
• An operating system capable of multitasking
• A graphical user interface (GUI)
• Communications capabilities
Server Components
• File services
• Print services
• Fax services
• Communication services
• Database services
• Transaction services
• Miscellaneous services
Client/Server Databases
A database management system (DBMS) lies at the center of most client/server systems in use today. To
function properly, the client/server DBMS must be able to:
• Provide transparent data access to multiple and heterogeneous clients, regardless of the hardware, software,
and network platform used by the client application.
• Allow client requests to the database server (using SQL requests) over the network.
• Process client data requests at the local server.
• Send only the SQL results to the clients over the network.
A client/server DBMS reduces network traffic because only the rows that match the query are returned.
Therefore, the client computer resources are available to perform other system chores such as the
management of the graphical user interface.
Client/server systems change the way in which we approach data processing. Data may be stored in one site
or in multiple sites.
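To make the round trip concrete, here is a minimal Java sketch of a client for a hypothetical line-oriented query service: the SQL text is the only thing sent, and only the matching rows come back. The host name, port, and wire protocol are illustrative assumptions, not a real DBMS interface.

```java
import java.io.*;
import java.net.Socket;

// Hypothetical client: ship one SQL statement, read result rows line by line.
public class QueryClient {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket("db-server.example.com", 5432);   // assumed host/port
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println("SELECT name FROM customers WHERE region = 'WEST'");
            String row;
            while ((row = in.readLine()) != null) {   // only matching rows cross the network
                System.out.println(row);
            }
        }
    }
}
```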
Client/Server Development Tools
• GUI-based development
• A GUI builder that supports multiple interfaces
• Object-oriented development with support for code reusability
• Data dictionary with a central repository for data and applications
• Support for multiple databases
• Data access regardless of data model
• Seamless access to multiple databases
• Complete SDLC support from planning to implementation and maintenance
• Team development support
• Support for third-party development tools
• Prototyping and rapid application development (RAD) capabilities
• Support for multiple platforms
The Client/Server approach introduced a database server to replace the file server. Using a Relational
Database Management System (RDBMS), user queries could be answered directly. The Client/Server
architecture reduced network traffic by providing a query response rather than a total file transfer, and it
improves multi-user updating through a GUI front-end to a shared database. In Client/Server architectures,
Remote Procedure Calls (RPCs) or Structured Query Language (SQL) statements are typically used to
communicate between the client and server.
File sharing architecture (not a Client/Server architecture):
File-based (flat-file) databases are very efficient at extracting information from large data files.
Each workstation on the network has access to a central file server where the data is stored.
The data files can also reside on the workstation with the client application. Multiple workstations can
access the same file server where the data is stored. The file server is centrally located so that it can be
reached easily and efficiently by all workstations.
The original PC networks were based on file sharing architectures, where the server downloads files from
the shared location to the desktop environment. The requested user job is then run (including logic and
data) in the desktop environment.
File sharing architectures work if shared usage is low, update contention is low, and the volume of data to
be transferred is low. In the 1990s, PC LAN (Local Area Network) computing changed because the capacity
of file sharing was strained as the number of online users grew (it can only satisfy about 12 users
simultaneously) and Graphical User Interfaces (GUIs) became popular (making mainframe and terminal
displays appear out of date). PCs are now being used in Client/Server architectures.
Mainframe architecture (not a Client/Server architecture)
With mainframe software architectures all intelligence is within the central host computer. Users interact
with the host through a terminal that captures keystrokes and sends that information to the host. Mainframe
software architectures are not tied to a hardware platform. User interaction can be done using PCs and
UNIX workstations. A limitation of mainframe software architectures is that they do not easily support
graphical user interfaces or access to multiple databases from geographically dispersed sites. In the last
few years, mainframes have found a new use as a server in distributed Client/Server architectures.
The Client/Server software architecture is a versatile, message-based and modular infrastructure that is
intended to improve usability, flexibility, interoperability, and scalability as compared to centralized,
mainframe, time sharing computing.
4.2 Components
Client/Server architecture is based on hardware and software components that interact to form a system. The
system includes mainly three components.
(i) Hardware (the client and the server).
(ii) Software (which makes the hardware operational).
(iii) Communication middleware (associated with a network, used to link the hardware and software).
The client is any computer process that requests services from a server. The client uses the services provided
by one or more server processes. The client is also known as the front-end application, reflecting the fact that
the end user usually interacts with the client process.
The server is any computer process that provides services to clients and supports multiple simultaneous
client requests. The server is also known as the back-end application, reflecting the fact that the server
process provides the background services for the client process.
The communication middleware is any computer process through which the client and server communicate.
Middleware is used to integrate application programs and other software components in a distributed
environment. It is also known as the communication layer, which is made up of several layers of software
that aid the transmission of data and control information between client and server. Communication
middleware is usually associated with a network. Fig. 4.1 below gives the general structure of a
Client/Server system.
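To ground these roles, the following is a minimal Java sketch of a server process, with plain TCP sockets standing in for the communication middleware. The port number and the trivial upper-casing service are assumptions made for the example, not part of any particular product.

```java
import java.io.*;
import java.net.*;

// Minimal back-end sketch: waits for a request, returns a response.
public class SimpleServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(9000)) {   // assumed port
            while (true) {                                       // serve one client at a time
                try (Socket client = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();              // the client's request
                    if (request != null) {
                        out.println(request.toUpperCase());      // the server's response
                    }
                }
            }
        }
    }
}
```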
As the definitions reveal, the client is treated as the front-end application and the server as the back-end
application; Fig. 4.2 below shows the front-end and back-end functionality.
Fig.4.2: Front-end and Back-end Functionality
4.2.1 Interaction between the Components
The interaction mechanism between the components of Client/Server architecture is clear from the Fig. 4.3.
The client process is providing the interface to the end users. Communication middleware is providing all the
possible support for the communication taking place between the client and server processes.
Communication middleware ensures that the messages between clients and servers are properly routed and
delivered. Requests are handled by the database server, which checks the validity of each request, executes
it, and sends the results back to the clients.
4.2.2 Complex Client/Server Interactions
The functionality of Client/Server systems is best understood by observing how clients and servers interact
with each other. Some noticeable facts are:
- A client application is not restricted to accessing a single service. The client may contact a different server
(perhaps on a different computer) for each service.
- A client application is not restricted to accessing a single server for a given service.
- A server is not restricted from performing further Client/Server interactions: a server for one service can
become a client of another.
Fig.4.3: Components Interaction
Fig.4.4: A Complex Client/Server Environment
Generally, the client and server processes reside on different computers. Fig. 4.4 illustrates a
Client/Server system with more than one server and several clients. The system comprises back-end
processes, front-end processes, and middleware:
• Back-end: the server processes (for example, an IBM database server and a Compaq Xeon server).
• Front-end: the application client processes (Windows, Unix, and Mac systems).
• Middleware: the communication middleware (the network and its supporting software).
The client processes run under different operating systems (Windows, Unix, and Mac systems), and the
server processes (on the IBM and Compaq computers) run under different operating systems (OS/2 and
Unix). The communication middleware acts as the integrating platform for all the different components.
Communication can also take place client-to-client as well as server-to-server.
4.3 Principles Behind Client/Server Systems
The components of the Client/Server architecture must conform to some basic principles if they are to
interact properly. These principles must be uniformly applicable to the client, server, and communication
middleware components. Generally, these principles constitute the foundation on which most
current-generation Client/Server systems are built. Some of the main principles are as follows:
(i) Hardware independence. (ii) Software independence.
(iii) Open access to services. (iv) Process distribution.
(v) Standards.
(i) Hardware independence: The principle of hardware independence requires that the client, server, and
communication middleware processes run on multiple hardware platforms (IBM, DEC, Compaq, Apple,
and so on) without any functional differences.
(ii) Software independence: The principle of software independence requires that the client, server, and
communication middleware processes support multiple operating systems (such as Windows 98, Windows
NT, Apple Mac system, OS/2, Linux, and Unix), multiple network protocols (such as IPX and TCP/IP), and
multiple applications (spreadsheets, databases, electronic mail, and so on).
(iii) Open access to services: All clients in the system must have open (unrestricted) access to all the services
provided within the network, and these services must not be dependent on the location of the client or the
server. A key issue is that the services should be provided on demand to the client. In fact, the provision of
on-demand service is one of the main objectives of the Client/Server computing model.
(iv) Process distribution: A primary identifying characteristic of Client/Server systems is that the processing
of information is distributed among clients and servers. The division of the application-processing load
must conform to the following rules:
• Client and server processes must be autonomous entities with clearly defined boundaries and functions.
This property enables us to clearly define the functionality of each side, and it enhances the modularity and
flexibility of the system.
• Local utilization of resources (at both the client and server sides) must be maximized.
The client and server processes must fully utilize the processing power of the host computers. This property
enables the system to assign functionality to the computer best suited to the task. In other words, to best
utilize all resources, the server process must be shared among all client processes; that is, a server process
should service multiple requests from multiple clients.
• Scalability and flexibility require that the client and server processes be easily upgradeable to run on more
powerful hardware and software platforms. This property extends the functionality of Client/Server
processes when they are called upon to provide additional capabilities or better performance.
• Interoperability and integration require that client and server processes be seamlessly integrated to form a
system. Swapping a server process must be transparent to the client process.
(v) Standards: Finally, all the principles must be based on standards applied within the Client/Server
architecture. For example, standards must govern the user interface, data access, network protocols,
interprocess communications, and so on. Standards ensure that all components interact in an orderly manner
to achieve the desired results. There is no universal standard for all the components; in fact, there are many
different standards from which to choose. For example, an application can be based on Open Database
Connectivity (ODBC) instead of Integrated Database Application Programming Interface (IDAPI) for data
access (ODBC and IDAPI are database middleware components that enable the system to provide a data
access standard for multiple processes), or the application might use Internetwork Packet Exchange (IPX)
instead of Transmission Control Protocol/Internet Protocol (TCP/IP) as the network protocol. The particular
choice of standard does not determine whether the application is a Client/Server application; the point is to
ensure that all components (servers, clients, and communication middleware) use the same standards so that
they are able to interact. What really defines Client/Server computing is the splitting of the application
processing, which is independent of the network protocols used.
4.4 Client Components
As we know, the client is any process that requests services from the server process. The client is proactive
and will, therefore, always initiate the conversation with the server. The client includes both software and
hardware components. The desirable client software and hardware features are:
(i) Powerful hardware.
(ii) An operating system capable of multitasking.
(iii) Communication capabilities.
(iv) A graphical user interface (GUI).
(i) Powerful hardware: Because client processes typically require considerable hardware resources, they
should be stationed on a computer with sufficient computing power, such as fast Pentium II, III, or RISC
workstations. Such processing power facilitates the creation of systems with multimedia capabilities. A
multimedia system handles multiple data types, such as voice, image, video, and so on. Client processes also
require large amounts of hard disk space and physical memory; the more such resources are available, the
better.
(ii) An operating system capable of multitasking: The client should have access to an operating system
with at least some multitasking capabilities. Microsoft Windows 98 and XP are currently the most common
client platforms. Windows 98 and XP provide access to memory, pre-emptive multitasking capabilities, and
a graphical user interface, which makes Windows the platform of choice in a majority of Client/Server
implementations. However, Windows NT, Windows 2000 Server, OS/2 from IBM Corporation, and the
many “flavours” of UNIX, including Linux, are well suited to handle the Client/Server processing that is
largely done at the server side of the Client/Server equation.
(iii) Communication capabilities: To interact efficiently in a Client/Server environment, the client computer
must be able to connect and communicate with the other components in a network environment. Therefore,
the combination of hardware and operating system must also provide adequate connectivity to multiple
network operating systems. The reason for requiring a client computer to be capable of connecting to and
accessing multiple network operating systems is simple: services may be located on different networks.
(iv) A graphical user interface (GUI): The client application, or front-end, runs on top of the operating
system and connects with the communication middleware to access services available in the network. Several
third generation programming languages (3GLs) and fourth generation languages (4GLs) can be used to
create the front-end application. Most front-end applications are GUI-based to hide the complexity of the
Client/Server components from the end user. The Fig. 4.5 given below illustrates the basic client
components.
Fig.4.5: Client Components
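As a companion to the upper-case server sketched earlier, a minimal Java client shows the proactive side of the interaction: the client opens the connection and sends the first message. The host and port are assumptions matching that earlier sketch.

```java
import java.io.*;
import java.net.*;

// Minimal front-end sketch: initiate the conversation, then read the reply.
public class SimpleClient {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket("localhost", 9000);   // assumed host/port
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println("hello server");          // the client speaks first
            System.out.println(in.readLine());    // prints: HELLO SERVER
        }
    }
}
```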
4.5 Server Components
As we have already discussed, the server is any process that provides services to the client process. The
server is reactive: it always waits for the client’s request. The services provided by a server are:
(i) File services: For a LAN environment in which a computer with a big, fast hard disk is shared among
different users, a client connected to the network can store files on the file server as if it were another local
hard disk.
(ii) Print services: For a LAN environment in which a PC with one or more printers attached is shared
among several clients, a client can access any one of the printers as if it were directly attached to its own
computer. The data to be printed travels from the client’s PC to the print server PC, where it is
temporarily stored on the hard disk. When the client finishes submitting the print job, the data is moved from
the hard disk on the print server to the appropriate printer.
(iii) Fax services: These require at least one server equipped (internally or externally) with a fax device. The
client PC need not have a fax device or even a phone line connection. Instead, the client submits the data to
be faxed to the fax server with the required information, such as the fax number or the name of the receiver.
The fax server will schedule the fax, dial the fax number, and transmit the fax. The fax server should also be
able to handle any problems arising from the process.
(iv) Communication services: These let client PCs connected to the communications server access other
host computers or services to which the client is not directly connected. For example, a communications
server allows a client PC to dial out to access a bulletin board, a remote LAN location, and so on.
(v) Database services: These constitute the most common and most successful Client/Server
implementation. Given the existence of a database server, the client sends SQL requests to the server. The
server receives the SQL code, validates it, executes it, and sends only the results to the client. The data and
the database engine are located in the database server computer.
(vi) Transaction services: These are provided by transaction servers that are connected to the database
server. A transaction server contains the database transaction code or procedures that manipulate the data in
the database. A front-end application in a client computer sends a request to the transaction server to execute
a specific procedure stored on the database server. No SQL code travels through the network. Transaction
servers reduce network traffic and provide better performance than database servers.
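A hedged JDBC sketch of such a call follows. The connection URL, credentials, and the transfer_funds procedure are assumptions for illustration; the point is that only the procedure invocation and its result cross the network.

```java
import java.sql.*;

// Client of a transaction service: the business logic runs on the server.
public class TransferClient {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://db-host/bank", "app", "secret"); // assumed URL/credentials
             CallableStatement call = con.prepareCall("{call transfer_funds(?, ?, ?)}")) {
            call.setInt(1, 1001);                                  // source account
            call.setInt(2, 2002);                                  // destination account
            call.setBigDecimal(3, new java.math.BigDecimal("250.00"));
            call.execute();    // one round trip; no per-step SQL travels on the network
        }
    }
}
```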
(vii) Groupware services: These store semi-structured information such as text, images, mail, bulletin
boards, and workflow data. A groupware server provides services that put people in contact with other
people; because “groupware” is an ill-defined classification, protocols differ from product to product.
Examples include Lotus Notes/Domino and Microsoft Exchange.
(viii) Object application services: Communicating distributed objects reside on servers. An object server
provides access to those objects from client objects and is responsible for sharing distributed objects across
the network. Object application servers usually use some kind of Object Request Broker (ORB) as their
protocol. Each distributed object can have one or more remote methods. The ORB locates an instance of the
object server class, invokes the requested method, and returns the results to the client object. An object
application server provides an ORB and application servers to implement this.
(ix) Web application services: Documents, data, and so on reside on web servers. A web application
provides access to those documents and other data. “Thin” clients typically use a web browser to request the
documents. Such services provide the sharing of documents across intranets, or across the Internet (or
extranets). The most commonly used protocol is HTTP (HyperText Transfer Protocol). Web application
servers are now augmenting simple web servers.
(x) Miscellaneous services: These include CD-ROM, video card, backup, and so on.
Like the client, the server also has hardware and software components. The hardware components include
the computer, CPU, memory, hard disk, video card, network card, and so on. The computer that houses the
server process should be more powerful than the “average” client computer, because the server process
must be able to handle concurrent requests from multiple clients. Fig. 4.6 illustrates the components of the
server.
The server application, or back-end, runs on top of the operating system and interacts with the
communication middleware components to “listen” for client requests for services. Unlike the front-end
client processes, the server process need not be GUI based. Keep in mind that the back-end application
interacts with the operating system (network or stand-alone) to access local resources (hard disk, memory,
CPU cycles, and so on). The back-end server constantly “listens” for client requests. Once a request is
received, the server processes it locally. The server knows how to process the request; the client tells the
server only what it needs, not how to do it. When the request is met, the answer is sent back to the client
through the communication middleware.
The server hardware characteristics depend upon the extent of the required services. For example, a
database server to be used in a network of fifty clients may require a computer with the following minimum
characteristics:
♦ Fast CPU (RISC, Pentium, Power PC, or multiprocessor)
♦ Fault tolerant capabilities:
• Dual power supplies to prevent power supply problems.
• Standby power supply to protect against power line failure.
• Error checking and correcting (ECC) memory to protect against memory module failures.
• Redundant Array of Inexpensive Disks (RAID) to provide protection against physical hard disk failures.
• Expandability of CPU, memory, disk, and peripherals.
• Bus support for multiple add-on boards.
• Multiple communication options.
Fig.4.6: Server Components
In theory, any computer process that can be clearly divided into client and server components can be
implemented through the Client/Server model. If properly implemented, the Client/Server architectural
principles for process distribution are translated into the following server process benefits:
• Location independence. The server process can be located anywhere in the network.
• Resource optimization. The server process may be shared.
• Scalability. The server process can be upgraded to run on more powerful platforms.
• Interoperability and integration. The server process should be able to work in a “Plug and Play”
environment.
These benefits, added to the hardware and software independence principles of the Client/Server computing
model, facilitate the integration of PCs, minicomputers, and mainframes in a nearly seamless environment.
4.5.1 The Complexity of Servers
A server that processes one request at a time is fairly simple because it is sequential. After accepting a
request, such a server forms a reply and sends it before checking to see whether another request has arrived.
Here, the operating system plays a big role in maintaining the queue of requests that arrive for a server.
Servers are usually much more difficult to build than clients because they need to accommodate multiple
concurrent requests. Typically, servers have two parts:
♦ A single master program that is responsible for accepting new requests.
♦ A set of slaves that are responsible for handling individual requests.
The master performs the following five steps (the server functions):
(i) Open port: The master opens a port at which client requests arrive.
(ii) Wait for client: The master waits for a new client to send a request.
(iii) Choose port: If necessary, the master allocates a new local port for this request and informs the client.
(iv) Start slave: The master starts an independent, concurrent slave to handle this request (for example, in
UNIX it forks a copy of the server process). Note that the slave handles one request and then terminates;
the slave does not wait for requests from other clients.
(v) Continue: The master returns to the wait step and continues accepting new requests while the newly
created slave handles the previous request concurrently.
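The following Java sketch mirrors this master/slave structure with one thread per request. The port and the echo-style service are illustrative assumptions; a production server would add the authorization and protection checks discussed below, and typically a thread pool.

```java
import java.io.*;
import java.net.*;

// Master/slave sketch: the master only accepts; each slave serves one request.
public class MasterServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket master = new ServerSocket(7000)) {       // (i) open port
            while (true) {
                Socket request = master.accept();                  // (ii) wait for client
                new Thread(() -> handle(request)).start();         // (iv) start slave
            }                                                      // (v) master continues
        }
    }

    private static void handle(Socket s) {                         // slave: one request, then exit
        try (Socket sock = s;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(sock.getInputStream()));
             PrintWriter out = new PrintWriter(sock.getOutputStream(), true)) {
            String line = in.readLine();
            if (line != null) {
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            // a malformed request must not abort the whole server
        }
    }
}
```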
Because the master starts a slave for each new request, processing proceeds concurrently. In addition to the
complexity that results because the server handles concurrent requests, complexity also arises because the
server must enforce authorization and protection rules. Server programs usually need to execute with the
highest privilege, because they must read system files, keep logs, and access protected data. The operating
system will not restrict a server program if it attempts to access a user’s files. Thus, servers cannot blindly
honour requests from other sites. Instead, each server takes responsibility for enforcing the system access and
protection policies.
Finally, servers must protect themselves against malformed requests or against requests that will cause the
server program itself to abort. Often it is difficult to foresee potential problems. Once an abort occurs, no
client would be able to access files until a system programmer restarts the server.
“Servers are usually more difficult to build than clients because, although they can be implemented with
application programs, servers must enforce all the access and protection policies of the computer system
on which they run, and must protect themselves against all possible errors.”
4.6 Communications Middleware Components
The communication middleware software provides the means through which clients and servers
communicate to perform specific actions. It also provides specialized services to the client process that
insulates the front-end applications programmer from the internal working of the database server and
network protocols. In the past, applications programmers had to write code that would directly interface with
a specific database language (generally a version of SQL) and the specific network protocol used by the
database server. For example, when writing a front-end application to access an IBM OS/2 Database Manager
database, the programmer had to write SQL and NetBIOS (network protocol) commands in the application.
The NetBIOS commands would allow the client process to establish a session with the database server, send
specific control information, send the request, and so on. If the same application is to be used with a different
database and network, the application routines must be rewritten for the new database and network protocols.
Clearly, such a condition is undesirable, and this is where middleware comes in handy. The definition of
middleware is based on the intended goals and main functions of this new software category.
Although middleware can be used in different types of scenarios, such as e-mail, fax, or network protocol
translation, most first-generation middleware used in Client/Server applications is oriented toward providing
transparent data access to several database servers. The use of database middleware yields:
♦ Network independence: by allowing the front-end application to access data without regard to the
network protocols.
♦ Database server independence: by allowing the front-end application to access data from multiple
database servers without having to write code that is specific to each database server.
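In the Java world, JDBC is a familiar example of such database middleware: the application code below stays the same while the connection URL selects the database server. Both URLs (and the table they query) are illustrative assumptions.

```java
import java.sql.*;

// The same generic SQL runs against different servers; only the URL changes.
public class PortableQuery {
    static void listNames(String url) throws SQLException {
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT name FROM employees")) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        listNames("jdbc:postgresql://host-a/hr");   // one vendor's server...
        listNames("jdbc:mysql://host-b/hr");        // ...or another, same application code
    }
}
```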
The use of database middleware makes it possible for the programmer to use generic SQL statements to
access different and multiple database servers. The middleware layer isolates the program from the
differences among SQL dialects by transforming the generic SQL statements into the database server’s
expected syntax. Without it, a problem in developing a front-end system for multiple database servers is
that application programmers must have in-depth knowledge of the network communications and the
database access language characteristics of each database to access remote data. The problem is aggravated
by the fact that each DBMS vendor implements its own version of SQL (with differences in syntax,
additional functions, and enhancements with respect to the SQL standard). Furthermore, the data may reside
in a non-relational DBMS (hierarchical, network, or flat files) that does not support SQL, making it even
harder for programmers to access the data. Given such cumbersome requirements, programming in
Client/Server systems becomes more difficult than programming in traditional mainframe systems. Database
middleware eases the problem of accessing data resources in multiple networks and releases the program
from the details of managing the network communications. To accomplish its functions, the communication
middleware
software operates at two levels:
• The physical level deals with the communications between the client and server computers (computer to
computer); in other words, it addresses how the computers are physically linked. The physical links include
the network hardware and software. The network software includes the network protocol. Recall that
network protocols are rules that govern how computers must interact with other computers in a network, and
they ensure that computers are able to send and receive signals to and from each other. Physically, the
communication middleware is, in most cases, the network. Because the Client/Server model allows the
client and server to reside on the same computer, the model may exist without the benefit of a computer
network.
• The logical level deals with the communications between the client and server processes (process to
process), that is, with how the client and server processes communicate. The logical characteristics are
governed by process-to-process (or interprocess) communications protocols that give the signals meaning
and purpose. It is at this level that most of the Client/Server conversation takes place.
Although the preceding discussion helps us understand the basic Client/Server interactions, a closer look at
computer communications is needed to follow the flow of data and control information in a Client/Server
environment. To understand the details, we refer to the Open Systems Interconnection (OSI) network
reference model, an effort to standardize diverse network systems. Figure 4.7 depicts the flow of information
through each layer of the OSI model.
From the figure, we can trace the data flow:
• The client application generates a SQL request.
• The SQL request is sent down to the presentation layer, where it is changed to a format that the SQL server
engine can understand.
• Now, the SQL request is handed down to the session layer. This layer establishes the connection of the
client processes with the server processes. If the database server requires user verification, the session layer
generates the necessary messages to log on and verify the end user. This layer also identifies which
messages are control messages and which are data messages.
• After the session is established and validated, the SQL request is sent to the transport layer. The transport
layer generates some error validation checksums and adds some transport-layer-specific ID information.
• Once the transport layer has performed its functions, the SQL request is handed down to the network layer.
This layer takes the SQL request, identifies the address of the next node in the path, divides the SQL request
into several smaller packets, and adds a sequence number to each packet to ensure that they are assembled in
the correct order.
• Next, the packet is handed to the data-link layer. This layer adds more control information, which depends
on the network and on which physical media are used. The data-link layer sends the frame to the next node.
• When the data-link layer determines that it is safe to send a frame, it hands the frame down to the physical
layer, which converts it into a collection of ones and zeros (bits) and then transmits the bits through the
network cable.
• The signals transmitted by the physical layer are received at the server end at the physical layer, which
passes the data to the data-link layer. The data-link layer reconstructs the bits into frames and validates them.
At this point, the data-link layer of the client and server computer may exchange additional messages to
verify that the data were received correctly and that no retransmission is necessary. The packet is sent up to
the network layer.
• The network layer checks the packet’s destination address. If the final destination is some other node in the
network, the network layer identifies it and sends the packet down to the data-link layer for transmission to
that node. If the destination is the current node, the network layer assembles the packets, using their
sequence numbers to restore the correct order. Next, the network layer reconstructs the SQL request and
sends it to the transport layer.
• Most of the Client/Server “conversation” takes place in the session layer. If the communication between
client and server process is broken, the session layer tries to reestablish the session. The session layer
identifies and validates the request, and sends it to the presentation layer.
• The presentation layer provides additional validation and formatting.
• Finally, the SQL request is sent to the database server or application layer, where it is executed.
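The essence of this walk down and up the stack is encapsulation: each layer wraps what it receives from above with its own control information, and the peer layer on the other side strips it off again. The toy Java sketch below (not a real protocol stack) only makes that nesting visible.

```java
// Toy illustration of layered encapsulation; layer names follow Fig. 4.7.
public class LayeredMessage {
    static String wrap(String layer, String payload) {
        return layer + "[" + payload + "]";   // add this layer's control information
    }

    public static void main(String[] args) {
        String message = "SELECT * FROM orders";
        // client side: the request descends from presentation to physical
        for (String layer : new String[]{"PRES", "SESS", "TRAN", "NET", "LINK", "PHY"}) {
            message = wrap(layer, message);
        }
        // prints: PHY[LINK[NET[TRAN[SESS[PRES[SELECT * FROM orders]]]]]]
        System.out.println(message);
        // the server side peels the layers off in the reverse order
    }
}
```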
Although the OSI framework helps us understand network communications, it functions within a system that
requires considerable infrastructure. The network protocols constitute the core of the network infrastructure,
because all data travelling through the network must adhere to some network protocol. In the Client/Server
environment, it is not unusual to work with several different network protocols. In the previous section, we
noted that different server processes might support different network protocols to communicate over the
network.
For example, when several processes run on the client, each process may be executing a different SQL
request, or each process may access a different database server. The transport layer ID helps the transport
layer identify which data corresponds to which session.
Layer | Client side | Server side
Application | Generates the SQL request | Receives and executes the SQL
Presentation | Formats the SQL request into the server's native SQL format | Formats the SQL
Session | Establishes the session (the conversation between the two processes/programs) | Validates session information
Transport | Adds a checksum to the data, adds the transport layer ID | Validates the data, verifies the transport ID
Network | Formats the data into packets for transmission to the next node | Assembles the message
Data-link | Determines when to transmit data frames to the next node | Validates data frames
Physical | Transmits the data through the network physical media | Receives data frames
(Bits of data travel through the network between the two physical layers.)
Fig.4.7: Flow of Information through the OSI Model
4.7 Architecture For Business Information System
4.7.1 Introduction
In this section, we will discuss several patterns for distributing business information systems that are
structured according to a layered architecture. Each distribution pattern cuts the architecture into different
client and server components. All the patterns discussed give an answer to the same question: How do I
distribute a business information system? However, the consequences of applying the patterns are very
different with regard to the forces influencing distributed systems design. Distribution brings a new design
dimension into the architecture of information systems. It offers great opportunities for good systems design,
but it also complicates the development of a suitable architecture by introducing many new design aspects
and pitfalls compared to a centralized system. While constructing the architecture for a business
and trap doors compared to a centralized system. While constructing the architecture for a business
information system, which will be deployed across a set of distributed processing units (e.g., machines in a
network, processes on one machine, threads within one process), you are faced with the question:
How do I partition the business information system into a number of client and server components, so
that my users’ functional and non-functional requirements are met?
There are several answers to this question. The decision for a particular distribution style is driven by users’
requirements. It significantly influences the software design and requires a very careful analysis of the
functional and non-functional requirements.
4.7.2 Three-Layer Architecture
Consider a business information system in which many (spatially distributed) users work in parallel on a
large amount of data. The system supports distributed business processes, which may span a single
department, a whole enterprise, or even several enterprises. Generally, the system must support more than
one type of data processing, such as On-Line Transaction Processing (OLTP), off-line processing, or batch
processing. Typically, the application architecture of the system is a Three-Layer Architecture, illustrated in
Fig. 4.8. The user interface handles presentational tasks and controls the dialogue; the application kernel
performs the domain-specific business tasks; and the database access layer connects the application kernel
functions to a database. Our distribution view focuses on this coarse-grained component level. In developing
a distributed system architecture we mainly use the Client/Server style. Within this model, two roles, client
and server, classify the components of a distributed system. Clients and servers communicate via a simple
request/response protocol.
Fig.4.8: Three-Layer Architecture for Business Information System
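As a minimal sketch, the three layers can be pictured as Java interfaces; the names are illustrative, not part of any standard. Each distribution pattern in Section 4.7.4 then cuts the system between, or inside, these layers.

```java
// Illustrative three-layer decomposition for an order-processing system.

// user interface: presentational tasks and dialogue control
interface OrderView {
    void showConfirmation(String summary);
}

// application kernel: domain-specific business tasks
interface OrderService {
    String placeOrder(String customerId, String productId, int quantity);
}

// database access layer: connects kernel functions to a database
interface OrderRepository {
    void save(String customerId, String productId, int quantity);
}
```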
4.7.3 General Forces
• Business needs vs. construction complexity: On one hand, allocating functionality and data to the places
where it is actually needed supports distributed business processes very well, but on the other hand,
distribution raises a system’s complexity. Client/Server systems tend to be far more complex than
conventional host software architectures. To name just a few sources of complexity: GUIs, middleware, and
heterogeneous operating system environments. It is clear that it often requires a lot of compromises to reduce
the complexity to a level where it can be handled properly.
• Processing style: Different processing styles require different distribution decisions.
Batch applications need processing power close to the data. Interactive processing should be close to
input/output devices. Therefore, off-line and batch processing may conflict with transaction and on-line
processing.
• Distribution vs. performance: We gain performance by distributed processing units executing tasks in
parallel, placing data close to processing, and balancing the workload between several servers. But raising
the level of distribution increases the communication overhead, increases the danger of bottlenecks in the
communication network, and complicates performance analysis and capacity planning. In centralized
systems the effects are much more controllable, and the knowledge of and experience with the involved
hardware and software allow reliable statements about the achievable performance of a configuration.
• Distribution vs. security: The requirement for secure communications and transactions is essential to
many business domains. In a distributed environment the number of possible security holes increases
because of the greater number of attack points. Therefore, a distributed environment might require new
security architectures, policies and mechanisms.
• Distribution vs. consistency: Abandoning a global state can introduce consistency problems between states
of distributed components. Relying on a single, centralized database system reduces consistency problems,
but legacy systems or organizational structures (off-line processing) can force us to manage distributed data
sources.
• Software distribution cost: The partitioning of system layers into client and server processes enables
distribution of the processes within the network, but the more software we distribute, the higher the
distribution, configuration management, and installation costs. The lowest software distribution and
installation cost occurs in a centralized system. This force can even impair functionality if the software
distribution problem is so big that the capacities needed exceed the capacities of your network. The most
important argument for so-called diskless, Internet-based network computers is exactly the software
distribution and configuration management cost.
• Reusability vs. performance vs. complexity: Placing functionality on a server encourages code reuse and
reduces client code size, but data must be shipped to the server, and the server must be able to handle
requests from multiple clients.
4.7.4 Distribution Pattern
To distribute an information system by assigning client and server roles to the components of the layered
architecture, we have the choice of several distribution styles. Figure 4.9 shows the styles, which make up
the pattern language. As an overview, we give an abstract of each pattern:
• Distributed presentation: This pattern partitions the system within the presentation component. One part of
the presentation component is packaged as a distribution unit and is processed separately from the other part
of the presentation, which can be packaged together with the other application layers. This pattern allows an
easy implementation and very thin clients. Host systems with 3270 terminals are a classic example of this
approach. Network computers and Internet and intranet technology are modern environments where this
pattern can be applied as well.
• Remote user interface: Instead of distributing presentation functionality, the whole user interface becomes
a unit of distribution and acts as a client of the application kernel on the server side (see the RMI sketch after
this list).
• Distributed application kernel: The pattern splits the application kernel into two parts which are processed
separately. This pattern becomes very challenging if transactions span process boundaries (distributed
transaction processing).
Fig.4.9: Pattern Resulting from Different Client/Server Cuts
• Remote database: The database is a major component of a business information system with special
requirements on the execution environment. Sometimes, several applications work on the same database.
This pattern locates the database component on a separate node within the system’s network.
• Distributed database: The database is decomposed into separate database components, which interact by
means of interprocess communication facilities. With a distributed database an application can integrate data
from different database systems or data can be stored more closely to the location where it is processed.
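As an example, the remote user interface cut maps naturally onto Java RMI: the whole user interface runs on the client and reaches the application kernel only through a remote interface. The interface and method names below are assumptions for illustration.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Remote boundary between the user interface (client) and the
// application kernel (server) in the remote user interface pattern.
public interface ApplicationKernel extends Remote {
    String placeOrder(String customerId, String productId, int quantity)
            throws RemoteException;
}

// On the client, the UI would look the kernel up and call it, e.g.:
//   ApplicationKernel kernel =
//       (ApplicationKernel) java.rmi.Naming.lookup("rmi://app-host/kernel");
//   String confirmation = kernel.placeOrder("C42", "P7", 3);
```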
4.8 Existing Client/Server Architecture
4.8.1 Mainframe-based Environment
In mainframe systems all the processing takes place on the mainframe, and dumb terminals, known as
end-user platforms, are used to display the data on screens. Mainframe systems are highly centralized and
known as integrated systems, in which the dumb terminals do not have any autonomy. Mainframe systems
have very limited data manipulation capabilities. From the application development point of view, mainframe
systems are over-structured and time-consuming, and they create application backlogs. Various computer
applications were implemented on mainframe computers (from IBM and others), with lots of attached
(dumb or semi-intelligent) terminals; see Fig. 4.10.
Fig.4.10: Mainframe-based Environment
There are some major problems with this approach:
• Mainframe systems are very inflexible.
• Vendor lock-in was very expensive.
• The centralized DP department was unable to keep up with the demand for new applications.
4.8.2 LAN-based Environment
A LAN can be configured as a Client/Server LAN, in which one or more stations, called servers, give
services to other stations, called clients. The server version of the network operating system is installed on
the server or servers; the client version of the network operating system is installed on the clients. A LAN
may have a general-purpose server or several dedicated servers, each dedicated to a particular task: for
example, database servers, print servers, file servers, and mail servers. Each server in the Client/Server-based
LAN environment provides a set of shared user services to the clients. These servers enable many clients to
share access to the same resources and enable the use of high-performance computer systems to manage the
resources.
A file server allows the client to access shared data stored on the disk connected to the file server. When a
user needs data, the client accesses the server, which then sends a copy. A print server allows different
clients to share a printer: each client can send data to be printed to the print server, which then spools and
prints it. In this environment, the file server station runs a server file-access program, a mail server station
runs a server mail-handling program, and a print server station runs a server print-handling program, while
the clients run the corresponding client programs.
Users, applications, and resources are distributed in response to business requirements and linked by a single
Local Area Network, as Fig. 4.11 illustrates below:
Fig.4.11: LAN Environment
4.8.3 Internet-based Environment
What the Internet brings to the table is a new platform, interface, and architecture. The Internet can employ
existing Client/Server applications as true Internet applications, and it can integrate applications in the Web
browser that would not normally work and play well together. The Internet also means that a vast amount of
information becomes available from the same application environment and interface. That is the value. See
Fig. 4.12 below:
Fig.4.12: Internet-based Environment
The Internet also puts fat-client developers on a diet. Since most Internet applications are driven from the
Web server, the application processing is moving off the client and back onto the server. This means that
maintenance and application deployment become much easier, and developers do not have to deal with the
integration hassles of traditional Client/Server development (such as loading assorted middleware and
protocol stacks).
Web browsers are universal clients. A web browser is a minimalist client that interprets information it
receives from a server and displays it graphically to the user. The client is simply there to interpret the
server’s commands and render the contents of an HTML page to the user. Web browsers, like those from
Netscape and Spyglass, are primarily interpreters of HTML commands. The browser executes the HTML
commands to properly display text and images on a specific GUI platform; it also navigates from one page
to another using the embedded hypertext links. HTTP servers produce platform-independent content that
clients can then request. A server does not know a PC client from a Mac client; all web clients are created
equal in the eyes of their web server. Browsers are there to take care of all the platform-specific details.
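At the protocol level, what a browser does can be sketched in a few lines of Java: request a document over HTTP and read back the platform-independent content. The URL is an illustrative assumption, and a real browser would of course render the HTML rather than print it.

```java
import java.io.*;
import java.net.URL;

// Minimal web client: fetch one page over HTTP and print its HTML.
public class TinyWebClient {
    public static void main(String[] args) throws IOException {
        URL page = new URL("http://www.example.com/index.html");  // assumed URL
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(page.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // a browser would render this instead
            }
        }
    }
}
```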
At first, the Web was viewed as a method of publishing information in an attractive format that could be
accessed from any computer on the internet. But the newest generation of the Web includes programmable
clients, using such programming environments as Sun Microsystem’s Java and Microsoft’s ActiveX. With
these programming environments, the Web has become a viable and compelling platform for developing
Client/Server applications on the Internet, and a platform of choice for Client/Server computing. The
World Wide Web (WWW) information system is an excellent example of Client/Server “done right”: a
server system supplies multimedia documents (pages) and runs some application programs (HTML forms
and CGI programs, for example) on behalf of the client, while the client takes complete responsibility for
displaying the hypertext document and for handling the user’s response to it. Meanwhile, the majority of
“real world” (i.e., commercial) applications of Client/Server are database applications.