ICS 2403 DISTRIBUTED SYSTEMS (45 CONTACT HOURS)
BY MASESE

Pre-requisite
BIT 2108 Computer Networks

Course Purpose
Students will examine the principles, techniques, and practices relevant to the design and implementation of distributed systems through hands-on experience.

Learning Outcomes
By the end of this course, the student should be able to:
i. Discuss concurrency, independent failure of components, and the lack of a global clock
ii. Situate distributed systems in a realistic context through examples: Internet, intranet, mobile computing
iii. Motivate the benefits of resource sharing and discuss Web challenges including heterogeneity, openness, security, scalability, failure handling, concurrency, transparency
iv. Use the acquired knowledge to develop a simple client-server application.

Course Description
Overview of distributed computing; computational models, communication complexity, design and analysis of distributed algorithms and protocols, fault-tolerant protocols, synchronous computations. Applications such as communication in data networks, control in distributed systems such as election and distributed mutual exclusion, and manipulation of distributed data such as ranking. Java Remote Method Invocation (RMI), Remote Procedure Call (RPC), Common Object Request Broker Architecture (CORBA).

Teaching Methodology
Lectures, laboratory exercises, assignments and a class project

Instructional Materials
LCD projectors, computers, white boards, appropriate software

Course Assessment
30% Continuous Assessment (Tests 10%, Assignment 10%, Practical 10%)
70% End of Semester Examination.

Course Text Books
1. Andrew Tanenbaum (2002). Distributed Systems, Prentice-Hall, ISBN 456-7755
2. M.L. Liu (2004). Distributed Computing: Principles and Applications, Pearson Addison-Wesley, ISBN 456-67738438
3. Alan C. Shaw, Lubomir F. Bic (2002). Operating Systems Principles. Prentice Hall. ISBN: 0130266116.
Reference Text Books
1. Andrew S. Tanenbaum (1994). Distributed Operating Systems, Prentice-Hall, ISBN: 456-88594
2. Sape J. Mullender (1993). Distributed Systems, 2nd Edition, ACM Press, ISBN: 043077585
3. Charles Crowley (1996). Operating Systems: A Design-Oriented Approach. Irwin Professional Publishing. ISBN: 0256151512

Course Journals
1. Acta Informatica ISSN 0001-5903
2. Advances in Computational Mathematics ISSN 1019-7168
3. Advances in Data Analysis and Classification ISSN 1862-5347
4. Annals of Software Engineering ISSN 1022-7091

Reference Journals
1. Journal of Computer Science and Technology ISSN 1000-9000
2. Journal of Science and Technology ISSN 1860-4749
3. Central European Journal of Computer Science ISSN 1896-1533
4. Cluster Computing ISSN 1386-7857

COURSE UNIT: DISTRIBUTED SYSTEM
CODE: ICS 2403
Lecturer: MASESE CHUMA - CONTACT: 0782526000

Objective
By the end of this session, students will be able to discuss:
i. the concepts of a distributed system
ii. what a distributed system is
iii. what a centralized system is
iv. characteristics of centralized systems
v. centralized vs. distributed systems
vi. advantages of distributed systems over centralized systems
vii. characteristics of a distributed system
viii. advantages of distributed systems
ix. disadvantages of distributed systems
x. challenges in the design of distributed systems

INTRODUCTION
Computation started on a single processor. This uniprocessor computing can be termed centralized computing. As the demand for processing capability grew, multiprocessor systems came into existence. The advent of multiprocessor systems led to the development of distributed systems with a high degree of scalability and resource sharing. Modern-day parallel computing is a subset of distributed computing.

WHAT IS A DISTRIBUTED SYSTEM?
A distributed system is a collection of computer programs that utilize computational resources across multiple, separate computation nodes to achieve a common, shared goal. Also known as distributed computing or distributed databases, it relies on separate nodes to communicate and synchronize over a common network. These nodes typically represent separate physical hardware devices but can also represent separate software processes, or other recursive encapsulated systems. Distributed systems aim to remove bottlenecks or central points of failure from a system.

Other definitions of a distributed system:
➢ A collection of independent computers that appears to its users as a single coherent system [Tanenbaum and van Steen, 2007].
➢ A collection of autonomous computational entities conceived as a single coherent system by its designer.
➢ A system in which hardware or software components located at networked computers communicate and coordinate their actions only by message passing [Coulouris].

The definition of distributed systems deals with two aspects:
➢ Hardware: The machines linked in a distributed system are autonomous.
➢ Software: A distributed system gives its users the impression that they are dealing with a single system.

Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal.
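
The idea that components "communicate and coordinate their actions only by message passing" can be illustrated with a small sketch. This is only an illustration under stated assumptions: the two "nodes" are threads inside one Python process, queue.Queue objects stand in for the network, and the names worker_node and run_demo and the message fields are hypothetical, not taken from the course texts.

```python
import queue
import threading


def worker_node(inbox, outbox):
    """Worker node: reacts only to messages; it shares no variables with its peer."""
    while True:
        msg = inbox.get()                      # block until a message arrives
        if msg["type"] == "stop":
            break
        if msg["type"] == "square":
            # Reply with a result message instead of writing to shared state.
            outbox.put({"type": "result", "id": msg["id"], "value": msg["value"] ** 2})


def run_demo(values):
    """Coordinator node: send one task message per value, then collect the replies."""
    to_worker, to_coordinator = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=worker_node, args=(to_worker, to_coordinator))
    worker.start()

    for task_id, value in enumerate(values):
        to_worker.put({"type": "square", "id": task_id, "value": value})

    results = {}
    for _ in values:
        reply = to_coordinator.get(timeout=5)  # wait for each result message
        results[reply["id"]] = reply["value"]

    to_worker.put({"type": "stop"})            # shutdown is also just a message
    worker.join()
    return [results[i] for i in range(len(values))]
```

Calling run_demo([1, 2, 3]) sends three task messages and gathers three replies. In a real distributed system the queues would be replaced by network connections, and messages could be delayed or lost.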
A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs. Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers.

A distributed system is a collection of independent computers, interconnected via a network, capable of collaborating on a task. Distributed computing is computing performed in a distributed system. Distributed computing has become increasingly common due to advances that have made both machines and networks cheaper and faster.

Some examples of distributed systems:
♦ Local Area Network and Intranet
♦ Database Management System
♦ Automatic Teller Machine Network
♦ Internet/World-Wide Web
♦ Mobile and Ubiquitous Computing

Motivation
The following key points are the driving forces behind distributed systems (DS):
• Inherently distributed computations: DS can process computations at geographically remote locations.
• Resource sharing: Hardware, databases, and special libraries can be shared between systems without each one owning a dedicated copy or replica. This is cost effective and reliable.
• Access to geographically remote data and resources: As mentioned previously, computations may happen at remote locations. Resources such as centralized servers can also be accessed from distant locations.
• Enhanced reliability: DS provide enhanced reliability, since they run on multiple copies of resources. Distributing resources across distant locations makes them less susceptible to faults. The term reliability comprises:
  1. Availability: the resource, or the service it provides, should be accessible at all times.
  2. Integrity: the value/state of the resource should be correct and consistent.
  3. Fault-Tolerance: the ability to recover from system failures.
• Increased performance/cost ratio: The resource sharing and remote access features of DS naturally increase the performance/cost ratio.
• Scalable: The number of systems operating in a distributed environment can be increased as the demand increases.

2. Centralized vs. Distributed Computing

WHAT IS A CENTRALIZED SYSTEM?
A centralized system is a type of system where all the important tasks, such as processing data, storing information, and making decisions, are done by a single main computer or server. There is one central place that controls and manages all the resources, data, and functionality of the whole system.

CHARACTERISTICS OF CENTRALIZED SYSTEMS
• Single Point of Control: In a centralized system, there is a single point of control and authority. This central entity typically makes all decisions and manages all resources.
• Centralized Data Management: All data and resources are stored and managed centrally. This means that all data processing, storage, and retrieval activities occur within the central system.
• Hierarchical Structure: Centralized systems often have a hierarchical structure, with lower-level nodes or entities reporting to and receiving instructions from the central authority.
• Communication Flow: Communication within a centralized system typically flows from peripheral nodes or entities to the central node.
• Simplicity in Management: Centralized systems are relatively simple to manage and administer since all control and decision-making are centralized. This can lead to efficient coordination and streamlined operations.
For example, many businesses operate with centralized IT infrastructures where data centers or servers centrally manage resources such as file storage, application hosting, and network services.

Use Cases of Centralized Systems
• Small Office Network: Many offices use one main computer to store files for all workers, give the other computers access to the network, and authenticate users. Using one main computer makes everything simpler to manage and gives all workers a uniform way of using the shared resources.
• Traditional Client-Server Architecture: Many older applications, such as email, websites, and databases, work in one way: clients talk to one main server to get what they need. The setup has a center, and computers connect to it to get services or information.
• Standalone Applications: Applications running on one machine do everything locally. They process and store data without needing other machines. This is a centralized system, since all the work happens on the single machine you are using.

Centralized vs. Distributed Systems
Below are the differences between centralized and distributed systems:

Aspect | Centralized System | Distributed System
Control | Centralized control and authority | Decentralized control and authority
Resource Management | All resources managed centrally | Resources distributed across multiple nodes
Communication | Communication flows to central node | Direct communication between nodes
Fault Tolerance | Single point of failure | Redundancy, less vulnerable to single points of failure
Scalability | Limited scalability due to centralization | Highly scalable, new nodes can be added easily
Complexity | Relatively simpler to manage | More complex to manage

ADVANTAGES OF DISTRIBUTED SYSTEMS OVER CENTRALIZED SYSTEMS
• Reliability: If one machine crashes, the system as a whole can still survive. This gives higher availability and improved reliability.
• Speed: A distributed system may have more total computing power than a mainframe.
• Open system: An open system is always ready to communicate with other systems. An open system that scales has an advantage over a perfectly closed and self-contained system.
• Economics: A collection of microprocessors offers a better price/performance ratio than mainframes, making it a cost-effective way to increase computing power.
• Incremental growth: Computing power can be added in small increments (modular expandability).
• Inherent distribution: Some applications are inherently distributed, e.g. a supermarket chain.
• Another driving force: the existence of a large number of personal computers and the need for people to collaborate and share information.

Why do we use distributed systems?
The alternative to using a distributed system is to have a huge centralized system, such as a mainframe. For many applications there are a number of economic and technical reasons that make distributed systems much more attractive than their centralized counterparts.
Cost. Better price/performance as long as commodity hardware is used for the component computers.
Performance. By using the combined processing and storage capacity of many nodes, performance levels can be reached that are beyond the range of centralized machines.
Scalability. Resources such as processing and storage capacity can be increased incrementally.
Reliability. By having redundant components, the impact of hardware and software faults on users can be reduced.
Inherent distribution.
Some applications, such as email and the Web (where users are spread out over the whole world), are naturally distributed. This includes cases where users are geographically dispersed as well as cases where single resources (e.g., printers, data) need to be shared.

CHARACTERISTICS OF A DISTRIBUTED SYSTEM
1. Concurrency: In a distributed system, multiple nodes can carry out operations simultaneously, enabling parallel processing and better performance.
2. High Scalability: Distributed systems can scale horizontally by adding more computers to the network and can support a high number of nodes. This enables them to handle growing user demand and increasing workloads.
3. Fault-tolerance: Distributed systems are built to be fault-tolerant. The system can continue to function even if one or more nodes go down by shifting the workload to the nodes that are still up and running.
4. Transparency: By hiding the underlying architectural complexity of the system, distributed systems seek to provide users and applications with transparency. This covers location transparency, failure transparency, and access transparency.
5. Heterogeneity: Nodes in distributed systems frequently have different hardware and software configurations. They might use different programming languages, run on different operating systems, or have differing processing speeds. Managing this heterogeneity to ensure interoperability and coordination across the nodes is a challenge.
6. Consistency and synchronization: A major challenge is ensuring data consistency and state synchronization among distant nodes. In order to handle concurrent updates and ensure data consistency, distributed systems use a variety of mechanisms, including distributed algorithms, consensus protocols, and distributed transactions.
7. Security and Privacy: Authentication, access control, data integrity, and confidentiality are security issues that must be addressed in distributed systems. In designing a distributed system, it is crucial to ensure secure communication and to protect sensitive data.

Benefits of distributed systems
Distributed systems offer a number of advantages over monolithic, or single, systems:
• Scalability & flexibility. It is easier to add computing power as the need for services grows. In most cases today, you can add servers to a distributed system on the fly, increasing performance and further reducing time to completion.
• Fault tolerance. Distributed systems reduce the risks involved with having a single point of failure, bolstering reliability and fault tolerance.
• Reliability. A well-designed distributed system can withstand failures in one or more of its nodes without severely impacting performance. In a monolithic system, the entire application goes down if the server goes down.
• Speed. Heavy traffic can bog down a single server, impacting performance for everyone. The scalability of distributed databases and other distributed systems makes it easier for them to sustain high performance levels.
• Geo-distribution. Distributed content delivery is both intuitive for any internet user and vital for global organizations.

Advantages of Distributed System
1. Improved Performance: Distributed systems make parallel processing possible, in which tasks are split up and carried out simultaneously by several nodes. Compared to a single, centralized system, this results in quicker execution times and better performance. Because the work is distributed over several nodes, resources are used well and the system can accommodate heavier workloads or higher user demands.
2.
Load Balancing: Load balancing strategies can be used in distributed systems to divide the workload among nodes in a fair manner. This prevents any one node from becoming overburdened with work, so resources are used to their full potential and performance is enhanced. Load balancing also makes it possible to scale resources as needed and helps prevent bottlenecks.
3. Data Replication and Data Locality: Replication of data across several nodes is a common technique used in distributed systems. This raises data availability and lowers the possibility of data loss or unavailability as a result of node failures. Data can also be kept near the nodes or users who access it frequently, lowering network latency and enhancing system performance.
4. Redundancy and Disaster Recovery: Distributed systems can offer redundancy and disaster recovery capabilities. The system is better able to recover from errors or disasters when data and tasks are replicated. By ensuring that backup resources or nodes are available in the event of failures, redundancy helps to reduce downtime and data loss.
5. Flexibility and Modularity: Distributed systems allow for freedom in design and modularity. The system can be built from microservices or loosely coupled components, which makes it simpler to create, deploy, and manage. This modular design encourages flexibility in system architecture and evolution and allows components to be scaled independently. This flexibility helps provide a better user experience and faster processing of user requests.
6. Geographic Distribution and Reduced Latency: Because distributed systems can span several geographic regions, data and services can be placed closer to end users. The system can lower latency and speed up response times by putting nodes in various areas. Services like content delivery networks (CDNs) or real-time applications that demand low-latency interactions particularly benefit from this.
7. Resource sharing: Distributed systems allow multiple users and programs to share resources. Computing resources, such as processing power, memory, and storage, can be efficiently utilized and shared across the system, improving resource allocation.
8. Flexibility and extensibility: Distributed systems allow for the addition or removal of nodes without affecting the overall system. This enables easy scaling and adaptation to changing requirements and workloads.
9. Increased data availability: Distributed systems can replicate and spread data across numerous nodes, boosting data availability and accessibility. Even if specific nodes are inaccessible, data can still be accessed from other nodes.
10. Collaboration and coordination: Multiple people or entities can collaborate and coordinate using distributed systems. They serve as a platform for sharing resources, communicating, and synchronizing tasks, facilitating effective teamwork.
11. Improved fault isolation: Failures or faults in one component or node can be isolated and contained in a distributed system, preventing them from affecting the entire system. This enhances system stability and decreases the impact of failures.
12. Enhanced security: Distributed systems can provide increased security by using distributed security techniques. Because data and processing are spread out, it becomes more difficult for unauthorized entities to compromise the entire system.
13. Easier software development: Distributed systems encourage modular and decentralized software development. Developers can work on independent components or services that can be easily merged into the larger system. This increases development productivity and makes system maintenance and updates easier.
14.
Increased reliability: Distributed systems are less prone to complete failure or data loss when data is duplicated across numerous nodes. Even if one node fails, the system can still function with the remaining nodes.

Disadvantages of Distributed System
1. Increased communication overhead: Distributed systems often demand frequent communication and coordination among nodes. This communication cost might degrade system performance and consume network bandwidth.
2. Higher latency: The distributed design of the system adds extra communication costs, which might result in higher latency compared to centralized solutions. Network delays and message forwarding can both affect the system's total response time.
3. Increased development and maintenance complexity: In comparison to centralized systems, developing and maintaining distributed systems can be more difficult and time-consuming. Coordinating and synchronizing activities across numerous nodes, as well as handling failure scenarios, requires extra work and knowledge.
4. Network dependency: Data interchange and coordination in distributed systems rely primarily on network connectivity. Network failures or latency issues can have a substantial influence on the system's performance and availability.
5. Cost and complexity of infrastructure: The networking hardware, servers, and storage required for distributed systems can be expensive and difficult to set up and manage.
6. Debugging and troubleshooting: Compared to a centralized system, locating and fixing problems in a distributed system can be more difficult. Advanced monitoring and diagnostic tools are necessary to resolve issues or performance bottlenecks affecting several nodes.
7. Scalability limitations: Although distributed systems are very scalable, there may be some restrictions depending on the system's design and architecture. Some applications or components can have scalability bottlenecks that are difficult to work around.
8. Software compatibility: Multiple software components frequently operate on different nodes in distributed systems. It might be difficult to ensure compatibility and easy integration between these components, especially if they were created by different teams or organizations.
9. Security risks: Compared to centralized systems, distributed systems face more security threats. Managing access control, authentication, and data confidentiality across several nodes is more difficult and more prone to flaws.
10. Consistency and data integrity: It can be difficult to guarantee consistency and data integrity among distributed nodes. It takes careful planning and implementation of techniques, such as distributed transactions or consensus protocols, to achieve global consistency in a distributed system.
11. Dependency on network stability: A reliable network infrastructure is crucial for distributed systems. Network failures or disturbances may reduce system availability or make the system completely unavailable.
12. Complexity of failure handling: In a distributed system, handling errors can be challenging. Robust fault-tolerance methods and careful design are needed to detect faults, start recovery mechanisms, and preserve consistency among nodes.
13. Lack of global view: It is difficult to monitor and manage distributed systems because they lack a centralized global view of the entire system. Administrators need decentralized monitoring and management solutions.

CHALLENGES IN THE DESIGN OF DISTRIBUTED SYSTEM
The following are the challenges faced while designing a distributed system.
1. Heterogeneity: of the underlying network infrastructure, computer hardware and software (e.g. operating systems), and programming languages (in particular data representation).
2.
Openness
❖ Ensuring extensibility and maintainability of the systems
❖ Adherence to standard interfaces
3. Security
❖ Privacy
❖ Authentication
❖ Availability
4. Scalability
❖ Handling an increasing number of files and users
❖ Growth of storage space
5. Handling of failures
❖ Detection (may be impossible)
❖ Exception handling (e.g. time-outs when waiting for a web resource)
❖ Redundancy of data storage
❖ Redundant routes in the network
❖ Replication of name tables in multiple domain name servers
6. Concurrency
❖ Consistent scheduling of concurrent threads (so that dependencies are preserved, e.g. in concurrent transactions)
❖ Avoidance of deadlock and livelock problems
7. Transparency: concealing the heterogeneous and distributed nature of the system so that it appears to the user like one system.

Resource sharing and the web challenges
Shared resources may be hardware such as printers, scanners, and machines, among others.

Terms used in the web
A. Service: a distinct part of a computer system that manages a collection of related resources and presents their functionality to users. For instance, we access shared files through a file service and send documents to printers through a printing service.
B. Server: a running program on a networked computer that accepts requests from programs running on other computers to perform a service, and responds appropriately.

WWW (World Wide Web)
It is an evolving system for publishing and accessing resources and services across the internet. Web browsers such as Mozilla Firefox and Internet Explorer are used to retrieve and view documents of many types, view video streams, and so on.

Properties of WWW (World Wide Web)
1. It is an open system: it can be extended and implemented in new ways without disturbing its existing functionality.
2. The web is open with respect to the types of resources that can be published and shared on it.
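
The definition of a server given under "Terms used in the web" can be made concrete with a small client-server sketch. It is a minimal illustration, assuming both programs run on the same machine over the loopback address 127.0.0.1; the "service" (returning the request text in upper case) and the function names serve_once, request_service and run_demo are hypothetical choices for the example. It only handles short requests that fit in a single recv call.

```python
import socket
import threading


def serve_once(listener):
    """Server: accept one client, read its request, and respond appropriately."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)        # request sent by the client program
        conn.sendall(request.upper())    # the "service": reply in upper case


def request_service(port, text):
    """Client: connect to the server, send a request, and return the reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(text.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def run_demo(text):
    """Start the server in a background thread and make one request to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        reply = request_service(port, text)
        server.join()
        return reply
```

For example, run_demo("hello") performs one request and one response. A realistic server would loop over many clients and define a message format so that requests longer than one read are handled correctly.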
Web characteristics

Heterogeneity
The internet enables users to access services and run applications over a heterogeneous collection of computers and networks. Heterogeneity applies to the following:
a) Computer networks
b) Computer hardware
c) Operating systems
d) Programming languages
The internet consists of many different sorts of networks; their differences are masked by the fact that all of the computers attached to them use the Internet protocols to communicate with one another.

Openness
This characteristic determines whether the system can be extended and re-implemented in various ways. The openness of a distributed system is determined primarily by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs.

Security
1. Information security: it depends on three components:
a. Confidentiality: protection against disclosure to unauthorized individuals.
b. Integrity: protection against alteration and corruption.
c. Availability: protection against interference with the means to access the resources.
Example: In banking, users send their credit card numbers across the internet.
2. Denial of service attacks: a security problem in which a user may wish to disrupt a service for some reason.
3. Security of mobile code: mobile code needs to be handled with care.

Failure handling
Hardware or software may produce incorrect results or stop before completing an intended computation because of a failure. The following techniques can be employed in dealing with failures.
a. Detecting failures: some failures can be detected.
b. Masking failures: some failures that have been detected can be hidden or made less severe.
c. Redundancy: services can be made to tolerate failures by the use of redundant components. For example, there should always be at least two different routes between any two routers on the internet.
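
The three failure-handling techniques above (detecting, masking, redundancy) can be sketched together in a few lines. This is a simplified illustration under stated assumptions: replicas are plain Python functions, a failure is signalled by a hypothetical ReplicaTimeout exception rather than a real network time-out, and the names fetch_with_failover, dead_replica and healthy_replica are invented for the example. In a real system a time-out only suggests that a node has failed, since a slow node and a crashed node look the same to the caller.

```python
class ReplicaTimeout(Exception):
    """Raised when a replica does not answer in time (failure detected)."""


def fetch_with_failover(replicas, key):
    """Try each redundant replica in turn and return the first answer.

    `replicas` is a list of callables taking `key`. Returns a pair
    (value, failures), where `failures` records the replicas that were skipped.
    """
    failures = []
    for index, replica in enumerate(replicas):
        try:
            # Success: earlier failures are masked from the caller's result.
            return replica(key), failures
        except ReplicaTimeout as exc:
            # Detection: note the failure and move on to the next replica.
            failures.append((index, str(exc)))
    raise RuntimeError(f"all {len(replicas)} replicas failed: {failures}")


def dead_replica(key):
    """Simulated node that never answers."""
    raise ReplicaTimeout("no reply within deadline")


def healthy_replica(key):
    """Simulated node holding a copy of the data."""
    return {"balance": 100}.get(key)
```

With replicas [dead_replica, healthy_replica], the caller still gets its value from the second copy; only when every replica fails does the failure become visible as an error.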
Scalability
A system is scalable if it remains effective when there is a significant increase in the number of resources and the number of users. For a system with n users to be scalable, the quantity of physical resources required to support them should be at most O(n), that is, proportional to n.

Transparency
This is the concealment from the user and the application programmer of the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components.

Risks of distributed systems
The challenges of distributed systems create a number of corresponding risks.
• Security. Distributed systems are as vulnerable to attack as any other system, but their distributed nature creates a much larger attack surface that exposes organizations to threats.
• Risk of network failure. Distributed systems often depend on public networks to transmit and receive data. If one segment of the internet becomes unavailable or overloaded, distributed system performance may decline.
• Governance and control issues. Distributed systems lack the governability of monolithic, single-server-based systems, creating auditing and compliance issues around data privacy laws. In globally distributed environments it is challenging to provide certain levels of assurance and to know exactly where data resides.
• Cost control. Unlike centralized systems, the scalability of distributed systems allows administrators to easily add capacity as needed, which can also increase costs. Pricing for cloud-based distributed computing systems is based on usage (such as the amount of memory and CPU power consumed over time). If demand suddenly spikes, you might face a massive bill.
Examples of distributed systems
Here are some very common examples of distributed systems:
• Telecommunications networks that support mobile and internet networks
• Graphical and video-rendering systems
• Scientific computing, such as protein folding and genetic research
• Airline and hotel reservation systems
• Multiuser video conferencing systems
• Cryptocurrency processing systems (e.g. Bitcoin)
• Peer-to-peer file-sharing systems
• Distributed community computing systems
• Multiplayer video games
• Global, distributed retailers and supply chain management

Distributed Systems History and OS Models
Minicomputer model: In this model, each user has a local machine. The machines are interconnected, but the connection may be transient (e.g., dialing over a telephone network). All the processing is done locally, but you can fetch remote data like files or databases.
Workstation model: In this model, you have local area networks (LANs) that provide a connection nearly all of the time. An example of this model is the Sprite operating system. You can submit a job to your local workstation. If your workstation is busy, Sprite will automatically transmit the job to another idle workstation to execute the job and return the results. This is an early example of resource sharing where processing power on idle machines is shared.
Client-server model: This model evolved from the workstation model. In this model, there are powerful workstations that serve as dedicated servers, while the clients are less powerful and rely on the servers to do their jobs.
Processor pool model: In this model, the clients become even less powerful (thin clients). The server is a pool of interconnected processors. The thin clients rely on the server by sending almost all their tasks to the server.
Cluster computing systems / Data centers: In this model, the server is a cluster of servers connected over a high-speed LAN.
Grid computing systems: This model is similar to cluster computing systems, except that the servers are now distributed in location and connected over a wide area network (WAN) instead of a LAN.
WAN-based clusters / distributed data centers: Similar to grid computing systems, but now it is clusters or data centers, rather than individual servers, that are interconnected over a WAN.
Virtualization and data center cloud computing: Infrastructure is managed by cloud providers. Users only lease resources on demand and are billed on a pay-as-you-go model.
Emerging models (distributed pervasive systems): The nodes in this model are no longer traditional computers but smaller nodes with microcontrollers and networking capabilities. They are very resource constrained and present their own design challenges. For example, today’s car can be viewed as a distributed system, as it consists of many sensors that communicate over a local network. Other examples include home networks, mobile computing, and personal area networks.

Applications and Real-World Examples of Distributed Computing
Distributed computing is not just a theoretical concept; it has practical applications across various industries and sectors. Here are some notable examples and applications:
• Big Data Analytics: Distributed computing is fundamental in big data. It allows for the processing and analysis of vast datasets that are beyond the capacity of a single machine. Frameworks like Apache Hadoop and Spark are used for this purpose, distributing data processing tasks across multiple nodes.
• Cloud Computing: Services like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform rely on distributed computing to offer scalable and reliable cloud services. These platforms host applications and data across numerous servers to provide high availability and redundancy.
• Scientific Research: Many scientific projects require immense computational power.
Distributed computing enables researchers to solve complex scientific problems by utilizing the combined power of multiple computers. An example is the SETI@home project (Search for Extraterrestrial Intelligence), which used the idle processing power of many thousands of volunteered computers worldwide.
• Financial Services: The financial sector employs distributed computing for high-frequency trading, risk management, and real-time fraud detection, where rapid processing of massive amounts of data is crucial.
• Internet of Things (IoT): In IoT, distributed computing helps manage and process data from countless devices and sensors, enabling real-time data analysis and decision-making.

Advantages of Distributed Computing
Distributed computing offers several significant advantages over traditional single-system computing. These include:
• Scalability: Distributed systems can easily grow with workload and requirements, allowing for the addition of new nodes as needed.
• Availability: These systems can exhibit high fault tolerance. If one computer in the network fails, the rest of the system continues to operate, so the service remains available.
• Consistency: Despite having multiple computers, well-designed distributed systems keep data consistent across nodes, which supports the reliability and accuracy of information.
• Transparency: Users interact with a distributed system as if it were a single entity, without needing to manage the complexities of the underlying distributed architecture.
• Efficiency: Distributed systems can offer faster performance and better resource utilization by spreading workloads, which helps avoid failures caused by volume spikes and avoids underuse of hardware.

Types of Distributed Computing Architecture
Distributed computing consists of various architectures, each with unique characteristics and use cases. The main types include:
• Client-Server Architecture: This common structure divides functions into clients and servers.
Clients handle limited processing and issue requests, while servers manage data and resources. It offers security and ease of management but can face bottlenecks in high-traffic situations.
• Three-Tier Architecture: It adds a middle layer (application servers) between clients and database servers, reducing communication bottlenecks and improving performance.
• N-Tier Architecture: Involves multiple client-server systems working together; often used in modern enterprise applications.
• Peer-to-Peer Architecture: Assigns equal responsibilities to all networked computers; popular in content sharing, file streaming, and blockchain networks.

Parallel Computing vs. Distributed Computing
Although the terms are often used interchangeably, parallel and distributed computing have distinct characteristics:
Parallel Computing
Involves multiple processors carrying out calculations simultaneously, typically within a single machine or tightly coupled system. The processors usually have access to shared memory, which allows quick information exchange.
Distributed Computing
Consists of multiple computers (or nodes), each with its own private memory, working on a common task. These nodes communicate via message passing, making it a more loosely coupled system than parallel computing. This structure is well suited to tasks distributed across different geographic locations or separate systems.

Review note: Transparency in distributed systems
Transparency is the concealment from the user and the application programmer of the separation of the components of a distributed system (i.e., a single-image view). Transparency is a strong property that is often difficult to achieve. There are a number of different forms of transparency, including the following:
1. Access Transparency: Local and remote resources are accessed in the same way
2. Location Transparency: Users are unaware of the location of resources
3. Migration Transparency: Resources can migrate without a change of name
4. Replication Transparency: Users are unaware of the existence of multiple copies of resources
5. Failure Transparency: Users are unaware of the failure of individual components
6. Concurrency Transparency: Users are unaware of sharing resources with others
Note that complete transparency is not always desirable, because of the trade-offs with performance and scalability and the problems that can arise when local and remote operations are confused. Furthermore, complete transparency may not always be possible, since nature imposes certain limits on how fast communication can take place in wide-area networks.

Are decentralized systems a subset of distributed systems?
No, decentralized systems are a superset of distributed systems. All distributed systems are decentralized, but not every decentralized system is a distributed system. Examples of decentralized systems include parallel machines and networked machines.

Distributed systems have the following advantages:
1. Resource sharing. Distributed systems enable communication over the network and resource sharing across machines (e.g., a process on one machine can access files stored on a different machine).
2. Economic. Distributed systems lead to better economics in terms of price and performance. It is usually more cost-effective to buy multiple inexpensive small machines and share the resources across those machines than to buy a single large machine.
3. Reliability. Distributed systems have better reliability than centralized systems. When one machine in a distributed system fails, other machines can take over its tasks, and the whole system can still function. Reliability can also be improved by replicating data on multiple machines.
4. Scalability. As the number of machines in a distributed system increases, all of the resources on those machines can be utilized, which leads to performance scaling up.
However, it is usually hard to achieve linear scalability due to various bottlenecks.
5. Incremental growth. If an application becomes more popular and gains more users, more machines can be added to its cluster to grow its capacity on demand. This is an important reason why the cloud computing paradigm is so popular today.

Types of Distributed Systems
Distributed Computing Systems
• Many distributed systems are configured for high-performance computing.
• Cluster computing: essentially a group of high-end systems connected through a LAN.
Distributed Information Systems
• The vast majority of distributed systems in use today are forms of traditional information systems that now integrate legacy systems. Example: transaction processing systems.
Distributed Pervasive Systems
• A next generation of distributed systems is emerging in which the nodes are small, mobile, and often embedded as part of a larger system.

What are the criteria (metrics) for a distributed computer system?
i. Latency: the network delay before any data is transferred
ii. Bandwidth: maximum channel capacity (measured in Hz for analogue communication and in bps for digital communication)
iii. Granularity: the relative size of the units of processing required. Distributed systems generally operate best with coarse-grained units because communication is slow compared to processing speed
iv. Processor speed
v. Reliability: the ability to continue operating correctly for a given time
vi. Fault tolerance: resilience to partial system failure
vii. Security: the policy for dealing with threats to the communication or processing of data in a system
viii. Administrative/management domains: issues concerning the ownership of and access to distributed system components

Applications of distributed computing and newer challenges
1. Mobile systems
2. Sensor networks
3. Ubiquitous or pervasive computing
4. Peer-to-peer computing
5. Publish-subscribe, content distribution, and multimedia
6. Distributed agents
7. Distributed data mining
8. Grid computing
9. Security in distributed systems

1. Mobile systems
Mobile systems typically use wireless communication, which is based on electromagnetic waves and utilizes a shared broadcast medium. Because the characteristics of communication are different from those of wired networks, mobile systems raise a set of problems such as:
a. routing,
b. location management,
c. channel allocation,
d. localization and position estimation,
e. the overall management of mobility.
There are two popular architectures for a mobile network:
1. The base-station approach, also known as the cellular approach, in which a cell (the geographical region within range of a static but powerful base transmission station) is associated with that base station.
2. The ad-hoc network approach, in which there is no base station. All responsibility for communication is distributed among the mobile nodes, which have to participate in routing by forwarding packets of other pairs of communicating nodes.

2. Sensor networks
A sensor is a processor with an electro-mechanical interface that is capable of sensing physical parameters, such as temperature, velocity, pressure, humidity, and chemicals. Sensors may be mobile or static. They may communicate wirelessly, although they may also communicate across a wire when they are statically installed.

3. Ubiquitous or pervasive computing
The intelligent home and the smart workplace are some examples of ubiquitous environments. Ubiquitous systems are essentially distributed systems; recent advances in technology allow them to leverage wireless communication and sensor and actuator mechanisms.

4. Peer-to-peer computing
• Peer-to-peer (P2P) computing represents computing over an application-layer network in which all interactions among the processors are at a “peer” level, without any hierarchy among the processors.
• P2P computing arose as a paradigm shift from client-server computing, where the roles among the processors are essentially asymmetrical.
• P2P networks are typically self-organizing, and may or may not have a regular structure to the network.

5. Publish-subscribe, content distribution, and multimedia
In a dynamic environment where the information constantly fluctuates, there needs to be:
i. an efficient mechanism for distributing this information (publish),
ii. an efficient mechanism to allow end users to indicate interest in receiving specific kinds of information (subscribe),
iii. an efficient mechanism for aggregating large volumes of published information and filtering it according to the user’s subscription filter.

6. Distributed agents
Agents collect and process information, and can exchange such information with other agents. Challenges in distributed agent systems include coordination mechanisms among the agents, controlling the mobility of the agents, and their software design and interfaces.

7. Distributed data mining
In many situations the data is necessarily distributed and cannot be collected in a single repository, or it is too massive to collect and process at a single repository in real time.

8. Grid computing
Grid computing is a subset of distributed computing in which a virtual supercomputer is composed of machines connected by a network, mostly Ethernet or sometimes the Internet. The idle CPU cycles of machines connected to the network are made available to others.

9. Security in distributed systems
The traditional challenges of security in a distributed setting include: confidentiality (ensuring that only authorized processes can access certain information), authentication (ensuring the source of received information and the identity of the sending process), and availability (maintaining allowed access to services despite malicious actions).
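The three publish-subscribe mechanisms listed under item 5 (publish, subscribe, filter) can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions: the `Broker` class and the topic names are hypothetical, filtering is done by exact topic match only, and a real distributed broker would deliver messages over a network rather than by calling local functions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Hypothetical in-memory broker used only to illustrate the idea."""

    def __init__(self) -> None:
        # Maps each topic to the callbacks of the users subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register interest in one kind of information (subscribe)."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Distribute a message (publish) and return how many users got it.

        Filtering: only callbacks registered for this exact topic are invoked.
        """
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
alice_inbox: List[str] = []
bob_inbox: List[str] = []

broker.subscribe("weather", alice_inbox.append)
broker.subscribe("weather", bob_inbox.append)
broker.subscribe("sports", bob_inbox.append)

delivered = broker.publish("weather", "rain expected")
broker.publish("sports", "match postponed")
```

The publisher never names its receivers; it only names a topic. This decoupling is what lets publishers and subscribers be added or removed independently in a dynamic environment.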
A model of distributed computations
A distributed system consists of a set of processors that are connected by a communication network. The communication network provides the facility of information exchange among processors. The processors do not share a common global memory and communicate solely by passing messages over the communication network.

Discuss the transparency requirements of a distributed system.
Transparency deals with hiding the implementation policies from the user, and can be classified as follows:
• Access transparency hides differences in data representation on different systems and provides uniform operations to access system resources.
• Location transparency makes the locations of resources transparent to the users.
• Migration transparency allows relocating resources without changing names.
• Relocation transparency is the ability to relocate resources while they are being accessed.
• Replication transparency does not let the user become aware of any replication.
• Concurrency transparency deals with masking the concurrent use of shared resources for the user.
• Failure transparency refers to the system being reliable and fault-tolerant.

6. List the algorithmic challenges in designing a distributed system.
• Designing useful execution models and frameworks
• Dynamic distributed graph algorithms and distributed routing algorithms
• Time and global state in a distributed system
• Synchronization/coordination mechanisms
• Group communication, multicast, and ordered message delivery
• Monitoring distributed events and predicates
• Distributed program design and verification tools
• Debugging distributed programs
• Data replication, consistency models, and caching

What do you understand by load balancing in a distributed environment?
The goal of load balancing is to gain higher throughput and reduce the user-perceived latency.
Load balancing may be necessary because of a variety of factors, such as high network traffic or a high request rate causing the network connection to become a bottleneck, or a high computational load. The objective is to service incoming client requests with the least turnaround time. The following are some forms of load balancing:
• Data migration: the ability to move data (which may be replicated) around in the system, based on the access pattern of the users.
• Computation migration: the ability to relocate processes in order to redistribute the workload.
• Distributed scheduling: this achieves a better turnaround time for the users by using idle processing power in the system more efficiently.

Explain in detail the design issues of a distributed system.
The following functions must be addressed when designing and building a distributed system:
1. Communication
2. Processes
3. Naming
4. Synchronization
5. Data storage and access
6. Consistency and replication
7. Fault tolerance
8. Security
9. Applications Programming Interface (API) and transparency
10. Scalability and modularity

1. Communication
This task involves designing appropriate mechanisms for communication among the processes in the network. Some example mechanisms are: remote procedure call (RPC), remote object invocation (ROI), and message-oriented versus stream-oriented communication.

2. Processes
Some of the issues involved are: management of processes and threads at clients and servers; code migration; and the design of software and mobile agents.

3. Naming
Devising easy-to-use and robust schemes for names, identifiers, and addresses is essential for locating resources and processes in a transparent and scalable manner.

4. Synchronization
Mechanisms for synchronization or coordination among the processes are essential.
Mutual exclusion is the classical example of synchronization. In addition, synchronizing physical clocks and devising logical clocks that capture the essence of the passage of time are important.

5. Data storage and access
Schemes for data storage, and implicitly for accessing the data in a fast and scalable manner across the network, are important for efficiency. Traditional issues such as file system design have to be reconsidered in the setting of a distributed system.

6. Consistency and replication
To avoid bottlenecks, to provide fast access to data, and to provide scalability, replication of data objects is highly desirable.

7. Fault tolerance
Fault tolerance requires maintaining correct and efficient operation in spite of any failures of links, nodes, and processes. Process resilience, reliable communication, distributed commit, checkpointing and recovery, agreement and consensus, failure detection, and self-stabilization are some of the mechanisms that provide fault tolerance.

8. Security
Distributed systems security involves various aspects of cryptography, secure channels, access control, key management (generation and distribution), authorization, and secure group management.

9. Applications Programming Interface (API) and transparency
Transparency deals with hiding the implementation policies from the user, and can be classified as follows:
• Access transparency hides differences in data representation on different systems and provides uniform operations to access system resources.
• Location transparency makes the locations of resources transparent to the users.
• Migration transparency allows relocating resources without changing names.
• Relocation transparency is the ability to relocate resources while they are being accessed.
• Replication transparency does not let the user become aware of any replication.
• Concurrency transparency deals with masking the concurrent use of shared resources for the user.
• Failure transparency refers to the system being reliable and fault-tolerant.

10. Scalability and modularity
• The algorithms, data (objects), and services must be as distributed as possible.
• Various techniques such as replication, caching and cache management, and asynchronous processing help to achieve scalability.

4. Explain the algorithmic challenges of designing a distributed system.
• Designing useful execution models and frameworks
• Dynamic distributed graph algorithms and distributed routing algorithms
• Time and global state in a distributed system
• Synchronization/coordination mechanisms
• Group communication, multicast, and ordered message delivery
• Monitoring distributed events and predicates
• Distributed program design and verification tools
• Debugging distributed programs
• Data replication, consistency models, and caching
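The challenge "time and global state in a distributed system" arises because the processes have no shared global clock. One well-known way to order events without one is a Lamport logical clock, which these notes do not describe in detail; the sketch below is a minimal illustration under that assumption. The `Process` class and its method names are hypothetical, and message passing is simulated by handing the timestamp directly from one object to another.

```python
class Process:
    """Hypothetical process that keeps a Lamport logical clock."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock = 0  # logical time, not wall-clock time

    def local_event(self) -> int:
        """Rule 1: increment the clock before every local event."""
        self.clock += 1
        return self.clock

    def send(self) -> int:
        """Sending is an event; the new clock value travels with the message."""
        self.clock += 1
        return self.clock

    def receive(self, timestamp: int) -> int:
        """Rule 2: on receipt, jump past both the local clock and the timestamp."""
        self.clock = max(self.clock, timestamp) + 1
        return self.clock


p = Process("P")
q = Process("Q")

p.local_event()          # P's clock becomes 1
sent_at = p.send()       # P's clock becomes 2; the message carries 2
q.local_event()          # Q's clock becomes 1
received_at = q.receive(sent_at)  # Q's clock becomes max(1, 2) + 1 = 3

print(sent_at, received_at)
```

The receive rule guarantees that a message is always received at a later logical time than it was sent, so causally related events are ordered consistently even though the two processes never consult a common physical clock.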