Distributed Computing

Why Networks?
– Network connectivity is increasing.
– Powerful yet cheap microprocessors are widely available (PCs, workstations, PDAs, embedded systems, etc.).
– Communication technology continues to advance.
– Memory and storage keep getting denser and cheaper.

Distributed Computing Systems
– A collection of independent computers that appears to its users as a single coherent system.
– A collection of (perhaps) heterogeneous nodes connected by one or more interconnection networks, which provides access to system-wide shared resources and services.
– Basically, a collection of interconnected processors covering a wide geographical area, in which each processor has its own local memory and other peripherals. Communication between any two processors takes place by message passing over the communication network.
– A type of computing in which the different components and objects comprising an application can be located on different computers connected to a network. For example, a word processing application might consist of an editor component on one computer, a spell-checker object on a second computer, and a thesaurus on a third computer. In some distributed computing systems, each of the three computers could even be running a different operating system. The data used in a distributed processing environment is also distributed across platforms.

Centralized Multi-user System
[Figure: mainframe or minicomputer accessed over a network]
Problems:
– Single point of failure
– Difficult to expand

Distributed Systems
[Figure: heterogeneous types of computers, servers and databases connected by a network]

Distributed vs. Centralized Systems
Why distribute?
– Information sharing among distributed users (groupware)
– Resource sharing
– Shorter response time and higher throughput
– Flexibility to spread load
– Incremental growth
– Extensibility
– Better cost/performance ratio
– Higher reliability and availability (higher fault tolerance)
– Inherently distributed applications
– Flexibility in meeting users' requirements

Disadvantages
– More software components: the more software components a system comprises, the greater the chance of errors.
– Security: providing easy distributed access increases the risk of a security breach.
– Networking: the underlying network can saturate or cause other problems.

Hardware Considerations
Architectures of multiple interconnected processors are of two types:
Tightly Coupled System
– Single system-wide primary memory.
– Communication takes place through shared memory.
– Systems are limited by the bandwidth of the shared memory.
– Also known as a Parallel Processing System.
Loosely Coupled System
– Each processor has its own local memory.
– The number of processors is not limited by shared-memory bandwidth, so in principle it can grow without bound.
– Communication is done by passing messages across the network.
– Also known as a Distributed Computing System.

[Figure: Parallel vs. Distributed Architecture]

Parallel vs. Distributed Systems
Memory
– Parallel systems: tightly coupled shared memory.
– Distributed systems: distributed memory; message passing, RPC, and/or use of distributed shared memory.
Control
– Parallel systems: global clock control.
– Distributed systems: no global clock control; synchronization algorithms needed.
Processor interconnection
– Parallel systems: order of Tbps; bus, mesh, tree, mesh of tree, and hypercube networks.
– Distributed systems: order of Gbps; Ethernet (bus), token ring and SCI (ring).
Main focus
– Parallel systems: performance; scientific computing.
– Distributed systems: performance (cost and scalability), reliability, information/resource sharing.

Distributed Operating System
Operating systems used for distributed computing systems can be of two types:
– Network Operating System (NOS)
– Distributed Operating System (DOS)

NOS vs. DOS
Single system image
– Network OS: No. Users are aware that multiple computers are being used. Selection of the machine for executing a job is manual.
– Distributed OS: Yes. Provides a virtual uniprocessor image to the user. The DOS dynamically and automatically allocates jobs to the various machines.
Autonomy
– Network OS: High. Each computer runs its own local OS, and the computers communicate via a common communication protocol. Shared file system. Each computer functions independently and manages its own processes and resources. System calls for different computers may be different.
– Distributed OS: Low. A single system-wide OS. Processes and resources are managed globally. Single set of globally valid system calls.
Fault tolerance
– Network OS: Unavailability grows as the number of faulty machines increases.
– Distributed OS: Unavailability remains low even if the number of faulty machines increases.

[Figure: Network Operating System]
[Figure: Distributed Operating System]

Evolution of DCS
Batch Processing System
– Jobs with similar needs are batched together.
– Jobs are sequenced automatically with control cards.
– Off-line processing (using buffering and spooling).
– The user does not interact directly with the computer system.
Disadvantages
– Less user interaction.
– No sharing of resources.
– Job set-up time is still significant for each new batch of jobs.
– CPU remains idle during the transition between jobs.
– Speed mismatch between the CPU and I/O devices.
Time-Sharing Systems
– Several dumb terminals are attached to the main computer.
– Multiple users can simultaneously execute interactive jobs and share the resources.
– The CPU is multiplexed among several jobs that are kept in memory.
Advantages
– Reduces CPU idle time.
– Avoids duplication of software.
Disadvantages
– Increased overhead (time to swap pages in and out).
– Terminals could not be placed far from the main computer.
Advances in networking technology brought LANs and WANs into existence, which led to the evolution of distributed computing.
Distributed Computing System Models

Minicomputer Model
[Figure: minicomputers, each with its own users, interconnected by ARPAnet]
An extension of the time-sharing system.
– The user must log on to his home minicomputer.
– Thereafter, he can log on to a remote machine by telnet.
– Does not present a uniprocessor image.
Used basically for resource sharing (databases, high-performance devices).

Workstation Model
[Figure: workstations connected by a communication network]
– A workstation is a powerful, single-user computer, like a personal computer but with a more powerful microprocessor. Each has its own local disk and a local file system (diskful workstation).
Process migration
– The user first logs on to his/her personal workstation.
– If there are idle remote workstations, one or more processes may be migrated to one of them.
– The result of execution is migrated back to the user's workstation.
Issues to be resolved:
– How to find an idle workstation.
– How to migrate a job.
– What to do if a user logs on to a remote machine that is executing a process of another machine: run the two processes simultaneously, kill the remote process, or migrate the process back to its home workstation?
Examples: Sprite System, Xerox PARC

Workstation-Server Model
[Figure: workstations connected by a 100Gbps LAN to minicomputers acting as file server, db server and print server]
Client workstations
– Largely diskless.
– The local disk of a diskful workstation is used for storage of temporary files etc.
Server minicomputers
– Each minicomputer is dedicated to one or more types of service, managing and providing access to shared resources.
– Multiple servers may be used for one service for better scalability and higher reliability.
The user logs on to his machine. Normal computation is carried out at the home workstation, while services are provided by special servers. No process migration is involved.
Advantages
– Cheaper: a few minicomputers vs. a large number of diskful workstations.
– Backup and hardware maintenance are easier.
– Flexibility to access files from any file server.
– No process migration.
– Guaranteed response time.
Disadvantage
– Does not exploit idle workstations.
Client-server model of communication
– RPC (Remote Procedure Call)
– RMI (Remote Method Invocation)
Example: V system

Processor-Pool Model
[Figure: terminals connected by a 100Gbps LAN to a run server and a pool of processors (Ser1 ... Server N)]
Processors (microcomputers and minicomputers) are pooled together to be shared by the users as needed. Each processor has its own memory to load and run a system program or an application program of the DCS.
Clients:
– They log on to one of the terminals (diskless workstations or graphic terminals).
– All services are dispatched to servers.
Servers:
– The run server allocates the necessary number of processors from the pool to each user.
Properties:
– No concept of a home machine. The user logs on to the system as a whole.
– Better utilization of processing power, but less interactivity.
– Greater flexibility: processors can act as extra servers.
– Unsuitable for high-performance interactive applications, as communication between processor and terminal is slow.
Examples: Amoeba, Cambridge Distributed System

Hybrid Model
– Combines the advantages of the workstation-server and processor-pool models.
– Based on the workstation-server model, with an additional pool of processors.
– The processors in the pool perform large computations.
– The workstation-server part handles interactive jobs.
– The hybrid model is more expensive to implement.

Issues in Distributed Computing System

Transparency
How to achieve the single-system image, i.e., how to make a collection of computers appear as a single computer.

Transparency in a Distributed System
Access Transparency
– Hides differences in data representation and in how a resource (local or global) is accessed. Uses a global set of system calls and a global resource naming facility (e.g. URLs).
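One part of access transparency, hiding differences in data representation, can be sketched with a fixed wire format. The sketch below is only an illustration: the record layout (an account id and a balance) and the function names `encode_record` and `decode_record` are assumptions made for this example, not part of any system named in these notes. It relies on Python's standard `struct` module, where the `!` prefix selects network (big-endian) byte order with standard sizes, so every machine produces and reads the same bytes regardless of its native word layout.

```python
import struct
from typing import Tuple

# Hypothetical wire format for this sketch:
# '!' = network byte order, 'I' = 4-byte unsigned int, 'q' = 8-byte signed int.
RECORD_FORMAT = "!Iq"


def encode_record(account_id: int, balance_cents: int) -> bytes:
    """Convert local values to the machine-independent wire format."""
    return struct.pack(RECORD_FORMAT, account_id, balance_cents)


def decode_record(payload: bytes) -> Tuple[int, int]:
    """Convert the wire format back to local values on the receiver."""
    account_id, balance_cents = struct.unpack(RECORD_FORMAT, payload)
    return account_id, balance_cents


if __name__ == "__main__":
    wire = encode_record(7, -250)
    print(len(wire), decode_record(wire))
```

Sender and receiver only need to agree on `RECORD_FORMAT`; neither has to know the other's native byte order or alignment rules.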
Location Transparency
– Hides where a resource is located.
– Name transparency: the name of a resource should not reveal its physical location. Resource names must be unique system-wide.
– User mobility: a user should be able to log on to any machine in the system and access a resource with the same name.
Replication Transparency
– Naming of replicas: map the user-supplied name of a resource to an appropriate replica.
– Replication control: decides how, where and when to replicate.
Failure Transparency
– Partial failure transparency
– Complete failure transparency
Migration Transparency
– Movement of an object is handled automatically by the system, which takes care of the following issues:
• The migration decision is made automatically by the system.
• The name of a resource remains the same when it migrates from one node to another.
• IPC ensures that a message is properly received by a process, even if the process migrates again.
Concurrency Transparency
– Hides the fact that a resource may be shared by several competing users. It is achieved by:
• Event ordering property
• Mutual exclusion property
• No starvation property
• No deadlock property
Performance Transparency
– The system is automatically reconfigured as the load in the system varies.
Scaling Transparency
– The system can expand in scale without disrupting the activities of users.
Persistence Transparency
– Hides whether a (software) resource is in memory or on disk.

Reliability
Faults
– Fail-stop: the system stops functioning.
– Byzantine failure: the system produces wrong results.
Fault avoidance
– The occurrence of faults is minimized by making components more reliable.
Fault tolerance
– Redundancy techniques
• Tolerating K fail-stop faults needs K + 1 replicas.
• Tolerating K Byzantine failures needs 2K + 1 replicas.
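The replica counts above can be illustrated with a small voting sketch. It is a minimal illustration under stated assumptions, not a full replication protocol: the function names `replicas_needed` and `majority_vote` are hypothetical, replies are assumed to be already collected from all replicas, and at most K of them may be wrong. With 2K + 1 replicas, at least K + 1 replies are correct and agree, while wrong values can gather at most K votes, so any value seen K + 1 times can be trusted.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def replicas_needed(k: int, byzantine: bool) -> int:
    """Replica count from the rule above: K + 1 (fail-stop) or 2K + 1 (Byzantine)."""
    return 2 * k + 1 if byzantine else k + 1


def majority_vote(replies: Sequence[Hashable], k: int) -> Optional[Hashable]:
    """Return the value reported by at least K + 1 replicas, or None if no value qualifies."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= k + 1 else None


if __name__ == "__main__":
    # K = 1: three replicas, one of which returns a wrong result.
    print(replicas_needed(1, byzantine=True))
    print(majority_vote([42, 42, 7], k=1))
```

In the fail-stop case no vote is needed: a replica either answers correctly or not at all, so one surviving replica out of K + 1 is enough.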
– Distributed control
• Avoiding a single point of failure
Fault detection and recovery
– Atomic transactions
– Stateless servers
– Acknowledgement- and timeout-based retransmission of messages

Flexibility
– Ease of modification
– Ease of enhancement
Choosing an appropriate kernel
– Monolithic kernel: the entire operating system runs in kernel space, in supervisor mode.
– Micro kernel: the kernel is reduced to the minimal facilities necessary, and the other system services reside in user space as normal processes (so-called servers). Because the servers no longer run in kernel space, so-called "context switches" are needed to allow user processes to enter privileged mode (and to exit again).
[Figure: Monolithic kernel vs. Micro kernel]

Performance
Various performance metrics:
– Response time
– Throughput
– System utilization
– Network capacity utilization
Design issues to increase performance
– Batch if possible.
– Cache whenever possible.
– Minimize copying of data.
– Minimize network traffic.
– Choose between fine-grained parallelism (a large number of small computations with more interaction) and coarse-grained parallelism (large computations, low interaction rates and little data).

Scalability
The capability of a system to adapt to increased service load.
– Avoid centralized entities.
– Avoid centralized algorithms.
– Perform most operations on client workstations.
Geographical scalability is also difficult:
– LANs are usually based on synchronous communication.
– WANs are inherently unreliable.
– It is difficult to scale a system across multiple, independent administrative domains.
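One common way to follow the "avoid centralized entities" guideline is to partition data across several servers instead of keeping it on one. The sketch below is an assumption-laden illustration, not a method prescribed by these notes: the class `PartitionedDirectory`, the function `pick_server` and the server names are hypothetical, and each "server" is simulated by an in-memory dictionary. A cryptographic hash from the standard `hashlib` module is used so that every client computes the same placement for a given key.

```python
import hashlib
from typing import Dict, List, Optional


def pick_server(key: str, servers: List[str]) -> str:
    """Map a key to one server deterministically by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


class PartitionedDirectory:
    """A directory split over several (simulated) servers, with no central copy."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        # One in-memory store per server stands in for a remote machine.
        self.stores: Dict[str, Dict[str, str]] = {s: {} for s in self.servers}

    def put(self, name: str, number: str) -> None:
        self.stores[pick_server(name, self.servers)][name] = number

    def get(self, name: str) -> Optional[str]:
        return self.stores[pick_server(name, self.servers)].get(name)


if __name__ == "__main__":
    directory = PartitionedDirectory(["s0", "s1", "s2"])
    directory.put("alice", "555-0100")
    print(directory.get("alice"))
```

Plain modulo placement moves most keys whenever the server list changes; schemes such as consistent hashing reduce that movement, at the cost of extra bookkeeping.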
Centralized entities to avoid (concept: example)
– Centralized services: a single server for all users
– Centralized data: a single on-line telephone book
– Centralized algorithms: doing routing based on complete information

Heterogeneity
– Caused by interconnected sets of dissimilar hardware or software systems (e.g. different topologies, protocols, word lengths etc.).
– Data and instruction formats depend on each machine's architecture.
– If a system consists of K different machine types, each machine needs K–1 pieces of translation software (at the sender or the receiver) to exchange data directly with every other type.
– Using an intermediate standard data format avoids this: each machine only converts to and from the standard format.

Security
Caused by the lack of a single point of control and the use of insecure networks for data communication.
Security concerns:
– Messages may be stolen, plagiarized (copied and passed off as one's own) or changed by an intruder.
– The system must ensure that a message is received by the intended receiver and was sent by the genuine sender.
Cryptography is used for security.

Emulation of Existing OS

Middleware
Middleware is an additional layer of software used on top of a NOS to more or less hide the heterogeneity of the underlying platforms and to improve distribution transparency. It offers a higher level of abstraction and sits in the middle, between the applications and the NOS.
[Figure: Distributed System as Middleware]

Distributed Computing Environment (DCE)
• It is an integrated set of services and tools that can be installed as a coherent environment on top of an existing OS and serve as a platform for building and running distributed applications.
• It runs on many different kinds of computers, operating systems and networks produced by different vendors.
• It hides differences between machines by automatically performing data-type conversions, thus making the heterogeneous nature of the system transparent to application programmers.
[Figure: layers, top to bottom: DCE applications, DCE software, operating system and networking]

DCE Components (component: functionality)
– Thread package: used in concurrent applications.
– RPC facility: necessary to build client-server applications. Forms the basis for communication.
– Distributed Time Service (DTS): synchronizes the clocks of all computers in the system.
– Name Services: allow resources to be uniquely named and accessed in a location-transparent manner.
– Security Services: provide tools for authentication and authorization.
– Distributed File Service (DFS): provides a system-wide file system.

DCE Cells
• A cell is a group of users, machines or resources that have a common purpose and share common DCE services.
• It helps to break down a large system into smaller, manageable units.
The minimum cell configuration requires:
• A cell directory server
• A security server
• A distributed time server
• One or more client machines
Factors for deciding cell boundaries:
– Purpose
• Users working on a common goal should be put in the same cell.
– Administration
• Machines known to and manageable by one administrator are put in one cell.
– Security
• Users of machines that trust each other are put in the same cell.
– Overhead
• Avoid communication overhead by putting users that interact more in the same cell.

1. Suppose a component of a distributed system suddenly crashes. How will this inconvenience the users in each of the following cases?
• The system uses the processor-pool model and the crashed component is a processor in the pool.
• The system uses the processor-pool model and a user terminal crashes.
• The system uses the workstation-server model and a server crashes.
• The system uses the workstation-server model and one of the clients crashes.