Distributed Systems
By Issa Al smadi

Distributed Systems: Definitions
• “A system in which hardware or software components located at networked computers communicate and coordinate their actions only by message passing.” [Coulouris]
• “A system that consists of a collection of two or more independent computers which coordinate their processing through the exchange of synchronous or asynchronous message passing.”
• “A distributed system is a collection of independent computers that appear to the users of the system as a single computer.” [Tanenbaum]
• “A distributed system is a collection of independent computers linked by a network with software designed to produce an integrated computing facility.”

What is a Distributed System?
Some comments:
• System architecture: the machines are autonomous; that is, they are computers which, in principle, could work independently.
• The user's perception: the distributed system is perceived as a single system solving a certain problem (even though, in reality, we have several computers placed in different locations).
By running distributed system software, the computers are enabled to:
- coordinate their activities
- share resources: hardware, software, data

Reasons for Distributed Systems
1. Functional distribution: computers have different functional capabilities
   - Client / server
   - Host / terminal
   - Data gathering / data processing
2. Load distribution / balancing: assign tasks to processors such that the overall system performance is optimized.
3. Replication of processing power: independent processors working on the same task. Distributed systems consisting of collections of microcomputers may have processing powers that no supercomputer will ever achieve.
   • 10000 CPUs, each running at 50 MIPS, yield 500000 MIPS. A single processor with that power would have to execute one instruction every 0.002 nsec, the time in which light travels 0.6 mm; any processor chip of that size would melt immediately.
4. Physical separation: systems that rely on the fact that computers are physically separated (e.g., to satisfy reliability requirements).
5. Economics: collections of microprocessors offer a better price / performance ratio than large mainframes (mainframes: 10 times faster, 1000 times as expensive).

Disadvantages of Distributed Systems
• Difficulties of developing distributed software: what should operating systems, programming languages and applications look like?
• Networking problems: the network infrastructure creates several problems that have to be dealt with, such as loss of messages and overloading.
• Security problems: sharing generates the problem of data security.

Examples of Distributed Systems
1- The Internet
- Heterogeneous network of computers and applications
- Implemented through the Internet protocol stack
- Typical configuration: (figure)
• Characteristics of the Internet
  - Very large and heterogeneous
  - Enables email, file transfer, multimedia communications, WWW, …
  - Open-ended
  - Connects intranets (via backbones) with home users (via modems, ISPs)

2- Intranets
- Locally administered network
- Usually proprietary (e.g., the University campus network)
- Interface with the Internet
  • firewalls
- Provides services internally and externally
• Characteristics of intranets
  - Several LANs linked by backbones
  - Enables information flow within the organization: electronic data, documents, …
  - Provides services: email, file, print services, …
  - Often connected to the Internet via a router
  - In / out communications protected by a firewall

3- Mobile Computing Systems
1- Cellular phone systems (e.g., GSM, UMTS)
   • Resources being shared: radio frequencies
   • The mobile is on the move
2- Laptop computers
   • Wireless LANs (uni campus WLAN soon to come here?)
3- Handheld devices, PDAs, etc.
• Wireless LANs (WLANs)
  - Connectivity for portable devices (laptops, PDAs, mobile phones, video / digital cameras, …)
  - WAP (Wireless Application Protocol)
• Home intranet
  - Devices embedded in home appliances (hi-fi, washing machines)
  - Universal ‘remote control’ + communication

4- Embedded Systems
1- Avionics (aircraft engineering) control systems
   • Flight management systems in aircraft
2- Automotive control systems
   • Mercedes S-Class automobiles these days are equipped with 50+ autonomous embedded processors
   • Connected through proprietary bus-like LANs
3- Consumer electronics
   • Audio hi-fi equipment

5- Network File Systems
1- Architecture to access file systems across a network
2- Famous examples
   • Network File System (NFS), originally developed by Sun Microsystems for remote access support in a UNIX context
   • FTP

6- The World Wide Web
1- Open client-server architecture implemented on top of the Internet
2- Shared resources
   • Information, uniquely identified through a Uniform Resource Locator (URL)

Computer Network vs. Distributed Systems
• Computer network: the autonomous computers are explicitly visible (they have to be explicitly addressed).
• Distributed system: the existence of multiple autonomous computers is transparent.
• However:
  - The two have many problems in common.
  - In some sense networks (or parts of them, e.g., name services) are also distributed systems.
  - Normally, every distributed system relies on services provided by a computer network.

Challenges in the Design of Distributed Systems
1. Heterogeneity
• Distributed applications are typically heterogeneous:
  - Different hardware: mainframes, workstations, PCs, servers, etc.
  - Different software: UNIX, MS Windows, IBM OS/2, real-time OSs, etc.
  - Unconventional devices: teller machines, telephone switches, robots, etc.
  - Diverse networks and protocols: Ethernet, FDDI, ATM, TCP/IP
  - Different programming languages (in particular, data representations)
• The solution:
  - Middleware (e.g., CORBA): transparency of network, hardware, software and programming language heterogeneity.
  - Mobile code (e.g., Java): transparency of hardware, software and programming language heterogeneity through the virtual machine concept.

2. Openness
One of the important features of distributed systems is openness and flexibility:
• Ensure extensibility and maintainability of systems.
• Every service is equally accessible to every client (local or remote).
• It is easy to implement, install and debug new services.
• Users can write and install their own services.
Key aspects of openness:
- Standard interfaces and protocols (like Internet communication protocols)
- Support of heterogeneity (by adequate middleware, like CORBA)

Openness (cont'd)
• Software architecture (top to bottom): applications and services; middleware; operating system; computer and network hardware. The operating system and the hardware together form the platform.
• The same view, looking at two distributed nodes (figure): each node has its own platform (Platform 1 on Node 1, Platform 2 on Node 2), made of hardware (computer and network) and an operating system; the middleware spans both nodes, with applications and services on top.

3. Security
- Privacy
- Authentication
- Confidentiality: protection against disclosure to unauthorized persons.
- Integrity: protection against alteration and corruption.
- Availability: keep the resource accessible.

4. Scalability
The system should remain efficient even with a significant increase in the number of users and resources connected:
- The cost of adding resources should be reasonable.
- Performance loss with an increased number of users and resources should be controlled.
- Software resources should not run out (number of bits allocated to addresses, number of entries in tables, etc.).
Solution (example):
• IP addresses: from 32 to 128 bits

5. Handling of failures
- Detection (may be impossible)
- Masking
  • Retransmission
  • Redundancy of data storage
- Tolerance
  • Exception handling (e.g., timeouts when waiting for a web resource)
- Redundancy
  • Redundant routes in the network

6. Concurrency
- Try to avoid deadlock problems.

7. Performance
Several factors influence the performance of a distributed system:
1. The performance of individual workstations.
2. The speed of the communication infrastructure.
3. The extent to which reliability (fault tolerance) is provided (replication and preservation of coherence imply large overheads).
4. Flexibility in workload allocation: for example, idle processors (workstations) could be allocated automatically to a user's task.

8. Transparency
Concealing the heterogeneous and distributed nature of the system so that it appears to the user like one system.
Transparency categories (according to ISO's Reference Model for ODP, quoted after [Coulouris]):
1. Access: access local and remote resources using identical operations
   * e.g., a network-mapped drive using a Samba server, NFS mounts
2. Location: access without knowledge of the location of a resource
   * e.g., URLs, email addresses
3. Concurrency: allow several processes to operate concurrently using shared resources in a consistent fashion
4. Replication: the system is free to make additional copies of files and other resources (for purposes of performance and / or reliability) without the users noticing.
   * e.g., several copies of a file exist; for a given request, the copy closest to the client is accessed
5. Failure: allow programs to complete their task despite failures
   * e.g., retransmission of email messages
6. Mobility: resources should be free to move from one location to another without having their names changed
7. Performance: adaptation of the system to varying load situations without the user noticing it. This could be achieved by automatic reconfiguration in response to changes of the load; it is difficult to achieve.
8. Scaling: allow the system and applications to expand without the need to change the structure or the application algorithms

Forms of Transparency in a Distributed System
Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk

Distributed Computing Systems: Communication
• Components of a distributed system have to communicate in order to interact. This implies support at two levels:
1. Networking infrastructure (interconnections & network software).
2. Appropriate communication primitives and models, and their implementation:
• Communication primitives:
  - send / receive (message passing)
  - remote procedure call (RPC)
• Communication models:
  - Client-server communication: implies a message exchange between two processes, the process which requests a service and the one which provides it.
  - Group multicast: the target of a message is a set of processes which are members of a given group.

Models

Overview
• System architecture
• Software layers
• Architecture models
  - Client-server, peer processes, …
  - Mobile code, agents
• Design requirements
  - User expectations of the system

Distributed design
(Figure: customer service, scanner, database server, mainframe, printer service.)

Architecture
• Distributed systems are, first and foremost, highly complex software systems
- Nortel Networks DMS-100 switch: 25-30 million lines of code, 3000 software developers, 20-year life cycle to date.
- Motorola: 20% of engineers produce hardware, 80% produce software
- Subject to all kinds of software engineering problems
• Investigation of software architecture to deal with design challenges
- “… include the organization of a system as a composition of components; global control structures; the protocols for communication, synchronization, and data access; the assignment of functionality to design elements; the composition of design elements; physical distribution; scaling and performance; dimensions of evolution; and selection among design alternatives. This is the software architecture level of design.” [Garlan and Shaw]
• Architecture paradigms pertinent to distributed systems
- layers
- client-server

Layers
• Basic idea
- Breaking up the complexity of systems by designing them through layers and services
- layer: group of closely related and highly coherent functionalities
- service: functionalities provided to the superior layer (layer n provides services to layer n+1 and uses those of layer n-1)
• Examples of layered architectures
- operating system (kernel, other services); historical: the operating system
- computer network protocol architectures
• Typical layering in distributed systems (top to bottom)
- Applications, services
- Middleware
- Operating system
- Computer and network hardware
• Platform: hardware and operating system
- e.g., Windows NT / Pentium processor platform

Software Layers
• Applications, open (distributed) services: conventional and distributed applications
• Middleware: extends the services available to those of the distributed system; language and runtime support for program interaction
• Operating system: responsible for basic local resource management (memory allocation, protection)
• Computer and network hardware
• The operating system and the hardware together form the platform

Service Layers
• Higher layers access services at lower layers
• Services can be located on different computers
• Process types:
- server process
- client process
- peer process

Terminology
• Server: a process that accepts requests from other processes and interacts with other servers and client processes to
provide a consistent view of its services
• Platform: the low-level layers that provide services to the higher layers and bring to them a system's programming interface for communication and coordination between processes
• Middleware: a layer of software whose purpose is to mask heterogeneity and provide a unified distributed programming interface to application programs by providing useful building blocks and communication mechanisms
  - Examples: Sun RPC, Java RMI, CORBA
  - Limitations: requires application-level involvement in some tasks such as error correction and security

Models
Models can be used to provide an abstract and simplified description of certain relevant aspects of distributed systems.
Model types:
1. Architectural models define the way responsibilities are distributed among components and how the components are placed in the system. Three architectural models:
   1. Client-server model
   2. Proxy server
   3. Peer processes
2. Interaction models deal with how time is handled throughout the system. Two interaction models have been introduced:
   1. Synchronous distributed systems
   2. Asynchronous distributed systems
3. Failure models specify what kinds of failures can occur and what their effects are:
   1. Omission failures
   2. Arbitrary failures
   3. Timing failures

Architectural models
• Architectural models define
- software components (processes, objects)
- ways in which components interact
- mapping of components onto the underlying network
• Why needed?
- to handle varying environments and usage
- to guarantee performance

Client/Server Performance
• Performance, scalability and mobility of the client/server model can be improved by
  - Partitioning or replicating data on servers
  - Caching data at proxy servers or clients
  - Using mobile code and mobile agents
Leon Jololian / George Blank, 7/1/2016

Architectural models
1- Client-Server Systems
1.1 One-Tier Architecture
- Network computers or PCs with terminal emulation
- A single tier provides presentation (to clients) + processing (transactions, applications) + data (management & access)
1.2 Two-Tier Architecture
- Client-server, “fat client” or “fat server”
- Workstation: presentation + processing, or presentation only
- Server: data (reached by remote data access), or data + processing (reached by remote procedure call)
1.3 Three-Tier Architecture
- Two-tier is satisfactory for simple client-server applications, but not for more demanding transaction processing applications
- Clients: presentation
- Shared application services: processing, reached from the clients by remote procedure call
- Shared data services: data, reached by remote data access or transaction processing

Client-server
• Basic model (figure: clients send invocations to servers and receive results; a server may itself act as a client of another server)
- client: a process wishing to access data, use resources or perform operations on a different computer
- server: a process that manages data and other shared resources, allows clients access to those resources and performs computation
- interaction: invocation / result message pairs
• Variants
- service provided by multiple servers (figure: several clients accessing one service implemented by several servers)
  Examples: many commercial web services are implemented through different physical servers
  • performance (e.g., CNN.com, download servers, etc.)
  • reliability
  Servers maintain either replicated or distributed databases
- proxy servers: make replication / distribution transparent
- Caching
  • the proxy server maintains a cache store of recently requested resources
  • frequently used by search engines: with Google, for example, a first search for a page may take 0.2 sec, while repeating the search takes 0.04 sec

Proxy server
(Figure: clients reach web servers through a proxy server.)
• A proxy server provides copies (replication) of resources which are managed by other servers.
• Proxy servers are typically used as caches for web resources: they maintain a cache of recently visited web pages or other resources.
• A proxy server can be located at each client or can be shared by several clients.
• The purpose is to increase performance and availability by avoiding frequent accesses to remote servers.

Client-server
• Further variants of the client-server model
- mobile code: code that is sent to a client process to carry out a specific task
- Examples
  • applets
  • active messages (containing communications protocol code)
  • mobile agents: an executing program (code + data) migrating amongst processes and carrying out an autonomous task, usually on behalf of some other process
- Advantages: flexibility, savings in communication cost

Client-server Model Variations (Thin Clients)
• A software layer that supports a window-based user interface on a local computer while executing application programs on a remote computer.
• Same as the network computer scheme, but instead of downloading the application code into the user's computer, it runs the applications on a server machine, the compute server.
• The compute server is a powerful computer that has the capacity to run large numbers of applications simultaneously.
• Disadvantage: increased delays in highly interactive graphical applications
• Thin clients: a window-based user interface executes on the local computer while the application executes on the compute server.
- example: X11 server (runs on the application client side)
• Mobile devices for mobile computing
- personal digital assistants (PDAs)
- how to connect to the Internet:
  • wireless LANs/MANs
  • wireless Personal Area Networks

Client-server
• Further variants of the client-server model: spontaneous networking
(Figure: a hotel wireless network connected to the Internet through a gateway, offering a music service, an alarm service and a discovery service to guest devices such as a TV/PC, a laptop, a PDA and a camera.)
- Characteristics
  * The W-LAN is confronted with a constantly changing set of heterogeneous mobile devices
  * Devices roam in heterogeneous W-LAN environments
- Benefits
  * No need for a wireline connection
  * Easy access to locally available services
- Discovery services provide information about
  * the services available in the network
  * their properties, and how to access them (including device-specific driver information)
- Interfaces of discovery services
  * registration service: accepts registration requests from servers and stores their properties in a database of currently available services
  * lookup service: matches requested services with available services

Architecture Design Requirements
• Performance issues, considered under the following factors:
  • Responsiveness:
    - Fast and consistent response time is important for the users of interactive applications.
    - Response speed is determined by the load and performance of the server and the network, and by the delay in all the involved software components.
    - The system must be composed of relatively few software layers and transfer small quantities of data to achieve good response times.
  • Throughput:
    - The rate at which work is done for all users in a distributed system.
  • Load balancing:
    - Enable applications and service processes to proceed concurrently without competing for the same resources.
    - Exploit available processing resources.
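The registration and lookup interfaces of a discovery service, listed under spontaneous networking above, could be sketched roughly as follows. This is a minimal in-memory illustration, not a real discovery protocol: the names `ServiceRecord`, `DiscoveryService`, `register`, `unregister` and `lookup`, as well as the addresses and properties, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """One registered service: its name, how to reach it, and its properties."""
    name: str
    address: str
    properties: dict = field(default_factory=dict)


class DiscoveryService:
    """Hypothetical in-memory discovery service (illustration only)."""

    def __init__(self):
        # Database of currently available services, keyed by service name.
        self._services = {}

    def register(self, record):
        """Registration interface: a server announces itself and its properties."""
        self._services[record.name] = record

    def unregister(self, name):
        """Remove a service, e.g. when its device leaves the network."""
        self._services.pop(name, None)

    def lookup(self, **required):
        """Lookup interface: return services whose properties match every requirement."""
        return [
            rec for rec in self._services.values()
            if all(rec.properties.get(k) == v for k, v in required.items())
        ]


discovery = DiscoveryService()
discovery.register(ServiceRecord("music", "10.0.0.5:7000", {"type": "audio", "room": "lobby"}))
discovery.register(ServiceRecord("alarm", "10.0.0.6:7001", {"type": "alarm"}))
discovery.register(ServiceRecord("tv", "10.0.0.7:7002", {"type": "video", "room": "lobby"}))

# A guest device asks for everything available in the lobby.
lobby_names = sorted(rec.name for rec in discovery.lookup(room="lobby"))
print(lobby_names)

# The set of devices changes constantly, so services also disappear again.
discovery.unregister("tv")
print([rec.name for rec in discovery.lookup(room="lobby")])
```

Keeping the registry keyed by service name keeps the sketch short; a real discovery service would also have to deal with leases or timeouts for devices that leave without unregistering.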

Architecture Design Requirements (cont'd)
• Quality of Service: the main system properties that affect service quality are:
  • Reliability: related to the failure fundamental model (discussed later).
  • Performance: ability to meet timeliness guarantees.
  • Security: related to the security fundamental model (discussed later).
  • Adaptability: ability to meet changing resource availability and system configurations.
• Dependability issues, achieved by:
  • Fault tolerance: continuing to function in the presence of failures.
  • Security: locate sensitive data only in secure computers.
  • Correctness of distributed concurrent programs: a research topic.

Fundamental Models (Interaction Model)
• Interacting processes in a distributed system are affected by two significant factors:
1. Performance of communication channels, characterized by:
  • Latency: delay between the sending and the receipt of a message, including
    - network access time
    - time for the first bit transmitted through the network to reach its destination
    - processing time within the sending and receiving processes
  • Throughput: number of units (e.g., packets) delivered per time unit.
  • Bandwidth: total amount of information transmitted per time unit.
  • Jitter: variation in the time taken to deliver multiple messages of the same type (relevant to multimedia data).
2. Computer clocks
  • Clock drift rate: the relative amount by which a computer clock differs from a perfect reference clock.
  • Clock corrections can be made by sending messages, which will still be affected by network delays.

Fundamental Interaction Model
• Synchronous distributed system:
  * the time to execute each step of computation within a process has known lower and upper bounds
  * message delivery times are bounded by a known value
  * each process has a clock whose drift rate from real time is bounded by a known value
• Asynchronous distributed system: no bounds on
  * process execution times
  * message delivery times
  * clock drift rates
• Note
  * synchronous distributed systems are easier to handle, but determining realistic bounds can be hard or impossible
  * asynchronous systems are more abstract and general: a distributed algorithm executing on one system is likely to also work on another one

Synchronous Distributed Systems
Main features:
• lower and upper bounds on the execution time of processes can be set
• transmitted messages are received within a known bounded time
• drift rates between local clocks have a known bound
Important consequences:
1. Only synchronous distributed systems have predictable behavior in terms of timing; only such systems can be used for hard real-time applications.
2. In a synchronous distributed system it is possible and safe to use timeouts in order to detect failures of a process or communication link.
** It is difficult and costly to implement synchronous distributed systems.

Asynchronous Distributed Systems
* Many distributed systems (including those on the Internet) are asynchronous.
• No bound on process execution time (nothing can be assumed about the speed, load or reliability of computers).
• No bound on message transmission delays (nothing can be assumed about the speed, load or reliability of interconnections).
• No bounds on drift rates between local clocks.
Important consequences:
1. In an asynchronous distributed system there is no global physical time; reasoning can only be in terms of logical time.
2. Asynchronous distributed systems are unpredictable in terms of timing.
3. Timeouts cannot be used reliably to detect failures.

Fundamental Models (Interaction Model)
• Event ordering: we often need to know whether an event at one process (sending or receiving a message) occurred before, after, or concurrently with another event at another process.
• It is impossible for any process in a distributed system to have a view of the current global state of the system.
• The execution of a system can be described in terms of events and their ordering, despite the lack of accurate clocks.
• Logical clocks define an event order based on causality.
• Logical time can be used to provide an ordering among events at different computers in a distributed system (since real clocks cannot be perfectly synchronized).
(Figure: real-time ordering of events. Processes X, Y, Z and A send and receive messages m1, m2 and m3 along a physical time axis, with observations at times t1, t2 and t3.)

The “Happened Before” Relation
• Lamport defined the happened-before relation (denoted as “→”), which describes a causal ordering of events:
(1) if a and b are events in the same process, and a occurred before b, then a → b
(2) if a is the event of sending a message m in one process, and b is the event of receiving that message m in another process, then a → b
(3) if a → b and b → c, then a → c (i.e., the relation “→” is transitive)
Causality:
* past events influence future events
* this influence among causally related events (those that can be ordered by “→”) is referred to as causal affects
* if a → b, event a causally affects event b

Failure Models
What kinds of failures can occur and what are their effects?
• Omission failures
• Arbitrary failures
• Timing failures
** Failures can occur both in processes and in communication channels; the cause can be either software or hardware.
** Failure models are needed in order to build systems with predictable behavior in case of failures (systems which are fault tolerant).

Omission failure
A processor or communication channel fails to perform actions it is supposed to do. This means that the particular action is not performed!
We do not have an omission failure if:
• An action is delayed (regardless of how long) but finally executed.
• An action is executed with an erroneous result.
If we are sure that messages arrive, a timeout will indicate that the sending process has crashed. Such a system has fail-stop behavior.

Omission Failures
(Figure: process p sends message m through its outgoing message buffer, over the communication channel, into the incoming message buffer of process q, which receives it.)
• Process omission failures: the process crashes
  - Detection with timeouts
  - A crash is fail-stop if other processes can detect with certainty that the process has crashed
• Communication omission failures: a message is not delivered (dropping of messages). Possible causes:
  - Network transmission error
  - Overflow of the receiver's incoming message buffer

Arbitrary failures
Any type of error can occur in processes or channels (the worst case).
• Process: omits intended processing steps or carries out unwanted ones
• Communication channel: e.g., non-delivery, corruption or duplication of messages

Failures
Class of failure        Affects              Description
Fail-stop               Process              Process halts and remains halted. Other processes may detect this state.
Crash                   Process              Process halts and remains halted. Other processes may not be able to detect this state.
Omission                Channel              A message inserted in an outgoing message buffer never arrives at the other end's incoming message buffer.
Send-omission           Process              A process completes a send, but the message is not put in its outgoing message buffer.
Receive-omission        Process              A message is put in a process's incoming message buffer, but that process does not receive it.
Arbitrary (Byzantine)   Process or channel   Process/channel exhibits arbitrary behavior: it may send/transmit arbitrary messages at arbitrary times or commit omissions; a process may stop or take an incorrect step.

Timing failures
Class of failure   Affects   Description
Clock              Process   Process's local clock exceeds the bounds on its rate of drift from real time.
Performance        Process   Process exceeds the bounds on the interval between two steps.
Performance        Channel   A message's transmission takes longer than the stated bound.

Security Model
• The security of a DS can be achieved by securing the processes and the channels used in their interactions, and by protecting the objects that they encapsulate against unauthorized access.
• To model security threats, we postulate an enemy that is capable of sending any message to any process and of reading/copying any message between a pair of processes.
• Threats from a potential enemy: threats to processes, threats to communication channels, and denial of service.

Object Interaction: RMI and RPC

Overview
• Distributed applications programming
  - distributed objects model
  - RMI, invocation semantics
  - RPC
• Products
  - Java RMI, CORBA, DCOM
  - Sun RPC
  - JINI

Why Middleware?
• Location transparency
  - client/server need not know their location
• Sits on top of the OS, independent of:
  - communication protocols: use abstract request-reply protocols over UDP, TCP
  - computer hardware: use an external data representation, e.g. CORBA CDR
  - operating system: use e.g. the socket abstraction available in most systems
  - programming language: e.g. CORBA supports Java, C++

Middleware Layer
(Layers, top to bottom: applications; RMI, RPC and events; request-reply protocol and external data representation; operating system. The two layers in the middle form the middleware layer.)

Objects
(Figure: two objects, each consisting of data and the implementation of its methods, exposing methods m1, m2, m3 and m4, m5 through their interfaces.)
• Objects = data + methods
• Interact via interfaces:
  - define types of arguments and exceptions of methods

The object model
• Programs are logically partitioned into objects
  - distributing objects is natural and easy
• Interfaces
  - the only means to access data; make them remote?
• Actions
  - via method invocation
  - interaction, chains of invocations
  - may lead to exceptions, which are part of the interface
• Garbage collection
  - reduced effort, error-free (Java, not C++)

The distributed object model
(Figure: objects A to F spread over several processes, connected by local invocations within a process and remote invocations across processes.)
• Objects are distributed (client-server models)
• Extend with
  - Remote object reference
  - Remote interfaces
  - Remote method invocation (RMI)

Advantages of distributed objects
• Data encapsulation gives better protection
  - concurrent processes, interference
• Method invocations
  - can be remote or local
• Objects
  - can act as clients, servers, etc.
  - can be replicated for fault tolerance and performance
  - can migrate, be cached for faster access

Remote Object Reference
• Object references
  - used to access objects which live in processes
  - can be passed as arguments, stored in variables, …
• Remote object references
  - object identifiers in a distributed system
  - must be unique in space and time
  - error returned if accessing a deleted object
  - can allow relocation
• Constructing a unique remote object reference
  - IP address, port, interface name
  - time of creation, local object number (new for each object)
• Use the same as for local object references
• If used as addresses
  - cannot support relocation
• Layout: 32-bit Internet address | 32-bit port number | 32-bit time | 32-bit object number | interface of remote object

Remote interfaces
• Specify externally accessed
  - variables and procedures
  - no direct references to variables (no global memory)
  - local interface separate
• Parameters
  - input, output or both,
  - instead of call by value, call by reference
• No pointers
• No constructors

Remote Object and its interfaces
(Figure: a remote object with its data and method implementations; the remote interface exposes m1, m2, m3, the local interface m4, m5, m6.)
• CORBA: Interface Definition Language (IDL)
• Java RMI: declared like other interfaces, extending the Remote interface

Handling remote objects
• Exceptions
  - raised in remote invocation
  - clients need to handle exceptions
  - timeouts in case the server crashed or is too busy
• Garbage collection
  - distributed garbage collection may be necessary
  - combined local and distributed collector

RMI issues
• Local invocations
  - executed exactly once
• Remote invocations
  - via Request-Reply
  - may suffer from communication failures!
    • retransmission of request/reply
    • message duplication, duplicate filtering
  - no unique semantics...

Invocation semantics summary
Fault tolerance measures                                                              Invocation semantics
Retransmit request   Duplicate filtering   Re-execute procedure or retransmit reply
No                   Not applicable        Not applicable                             Maybe
Yes                  No                    Re-execute procedure                       At-least-once
Yes                  Yes                   Retransmit reply                           At-most-once
Re-executing a method is sometimes dangerous...

Maybe invocation
• Remote method
  - may execute or not at all, invoker cannot tell
  - useful only if occasional failures are acceptable
• Invocation message lost…
  - method not executed
• Result not received…
  - was the method executed or not?
• Server crash…
  - before or after the method executed?
  - if timeout, the result could be received after the timeout...
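The last two rows of the invocation semantics table can be illustrated with a toy server that either re-executes a retransmitted request or filters the duplicate and retransmits the stored reply. This is a sketch under simplifying assumptions: everything runs in one process, the lost reply is simulated by ignoring a return value, and the names `Server`, `handle` and `invoke_with_lost_reply` are invented for the example.

```python
class Server:
    """Hypothetical request-reply server with an optional duplicate filter."""

    def __init__(self, filter_duplicates):
        self.filter_duplicates = filter_duplicates
        self.balance = 0      # state changed by a non-idempotent method (deposit)
        self.history = {}     # request id -> reply that was already sent

    def handle(self, request_id, amount):
        if self.filter_duplicates and request_id in self.history:
            # Duplicate filtering: retransmit the stored reply, do not re-execute.
            return self.history[request_id]
        # First arrival (or no filtering): execute the procedure.
        self.balance += amount
        reply = self.balance
        self.history[request_id] = reply
        return reply


def invoke_with_lost_reply(server):
    """The first reply is lost, so the client retransmits the same request."""
    server.handle(request_id=1, amount=100)          # reply lost on the way back
    return server.handle(request_id=1, amount=100)   # retransmitted request


no_filtering = Server(filter_duplicates=False)    # re-executes the procedure
with_filtering = Server(filter_duplicates=True)   # retransmits the stored reply

print(invoke_with_lost_reply(no_filtering))    # the deposit was applied twice
print(invoke_with_lost_reply(with_filtering))  # the deposit was applied once
```

Because a deposit is not idempotent, re-execution leaves the account with the wrong balance, which is the danger the table's closing remark points at.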
At-least-once invocation
• Remote method
  - invoker receives result (method executed at least once) or exception (no result received; method may or may not have executed)
  - retransmission of request message
• Invocation message retransmitted...
  - method may be executed more than once
  - arbitrary failure (wrong result possible)
  - method must be idempotent (repeated execution has the same effect as a single execution)
• Server crash...
  - dealt with by timeouts and exceptions

At-most-once invocation
• Remote method
  - invoker receives result (executed exactly once) or exception (no result; executed once or not at all)
  - retransmission of reply & request messages
  - duplicate filtering
• Best fault-tolerance...
  - arbitrary failures prevented if method called at most once
• Used by CORBA and Java RMI

Transparency of RMI
• Should remote method invocation be the same as local?
  - same syntax
  - need to hide
    • data marshalling
    • IPC calls
    • locating/contacting remote objects
• Problems
  - different RMI semantics? susceptibility to failures?
  - protection against interference in concurrent scenario?
• Approaches (Java RMI)
  - transparent, but express differences in interfaces
  - provide recovery features

Implementation of RMI
[Figure: Client side: object A, proxy for B, remote reference module, communication module. Server side: communication module, remote reference module, skeleton & dispatcher for B's class, remote object B. A Request travels from client to server and a Reply travels back.]
Object A invokes a method in a remote object B via the communication module, the remote reference module and the RMI software.
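The difference between at-least-once and at-most-once semantics described above comes down to what the server does with a retransmitted request. The following is a minimal sketch, not how CORBA or Java RMI are actually implemented: the class names, the `handle` method, the request ids and the deposit operation are all assumptions made for illustration, and retransmission is simulated by calling the server repeatedly with the same request id.

```python
class AtLeastOnceServer:
    """No duplicate filtering: every retransmitted request is re-executed."""

    def __init__(self):
        self.balance = 0

    def handle(self, request_id, amount):
        self.balance += amount  # not idempotent, so duplicates are harmful
        return self.balance


class AtMostOnceServer:
    """Duplicate filtering plus a history of replies for retransmission."""

    def __init__(self):
        self.balance = 0
        self.history = {}  # request id -> reply already sent

    def handle(self, request_id, amount):
        if request_id in self.history:
            return self.history[request_id]  # duplicate: retransmit saved reply
        self.balance += amount
        self.history[request_id] = self.balance
        return self.balance


def invoke_with_retries(server, request_id, amount, lost_replies):
    """Retransmit the same request until one reply gets through.

    lost_replies is the number of replies assumed lost before a reply
    finally reaches the client.
    """
    reply = None
    for _ in range(lost_replies + 1):
        reply = server.handle(request_id, amount)
    return reply
```

With two lost replies and a deposit of 10, the at-least-once server applies the deposit three times, while the at-most-once server applies it once and resends the saved reply for the duplicates.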
Communication modules
• Reside in client and server
• Carry out Request-Reply jointly
  - use unique message ids (new integer for each message)
  - implement given RMI semantics
• Server's communication module
  - selects dispatcher within RMI software
  - converts remote object reference to local

Remote reference module
• Creates remote object references and proxies
• Translates remote to local references (object table):
  - correspondence between remote and local object references (proxies)
• Directs requests to proxy (if exists)
• Called by RMI software
  - when marshalling / unmarshalling

RMI Software architecture
• Proxy
  - behaves like local object to client
  - forwards requests to remote object
• Dispatcher
  - receives request
  - selects method and passes on request to skeleton
• Skeleton
  - implements methods in remote interface
  - unmarshals data, invokes remote object
  - waits for result, marshals it and returns reply

Remote Procedure Call (RPC)
• RPC
  - historically first, now little used
  - over Request-Reply protocol
  - usually at-least-once or at-most-once semantics
  - can be seen as a restricted form of RMI
  - Sun RPC
• RPC software architecture
  - similar to RMI (communication module, dispatcher, and stubs in place of proxy / skeleton)

RPC client and server
[Figure: Client process: client program, client stub procedure, communication module. Server process: communication module, dispatcher, server stub procedure, service procedure. A Request travels from client to server and a Reply travels back.]
Implemented over Request-Reply protocol

Summary
• Distributed object model
  - capabilities for handling remote objects (remote references, etc.)
  - RMI: maybe, at-least-once, at-most-once semantics
  - RMI implementation, software architecture
• Other distributed programming paradigms
  - RPC, restricted form of RMI, less often used
  - event notification (for heterogeneous, asynchronous systems)
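As an illustration of the proxy, dispatcher and skeleton roles from the RMI software architecture summarized above, the sketch below wires the three together inside a single process. It makes several assumptions: JSON stands in for marshalling, a direct function call stands in for the communication modules, and `Account`, `deposit` and the message fields `method` and `args` are hypothetical names chosen for the example.

```python
import json


class Account:
    """Hypothetical remote object B living in the server."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


class Skeleton:
    """Unmarshals arguments, invokes the remote object, marshals the result."""

    def __init__(self, target):
        self.target = target

    def invoke(self, method_name, marshalled_args):
        args = json.loads(marshalled_args)
        result = getattr(self.target, method_name)(*args)
        return json.dumps({"result": result})


class Dispatcher:
    """Receives a request, selects the method and passes it to the skeleton."""

    def __init__(self, skeleton, remote_interface):
        self.skeleton = skeleton
        self.remote_interface = set(remote_interface)

    def dispatch(self, request):
        message = json.loads(request)
        method_name = message["method"]
        if method_name not in self.remote_interface:
            return json.dumps({"error": "no such remote method: " + method_name})
        return self.skeleton.invoke(method_name, json.dumps(message["args"]))


class AccountProxy:
    """Behaves like a local Account to the client, forwards requests."""

    def __init__(self, send):
        self.send = send  # callable standing in for the communication module

    def deposit(self, amount):
        request = json.dumps({"method": "deposit", "args": [amount]})
        reply = json.loads(self.send(request))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


account = Account()
dispatcher = Dispatcher(Skeleton(account), ["deposit"])
proxy = AccountProxy(dispatcher.dispatch)
first = proxy.deposit(50)   # looks like a local call to the client
second = proxy.deposit(25)
```

The client only ever touches `AccountProxy`, so the call syntax matches a local invocation, while methods outside the remote interface are rejected by the dispatcher before reaching the object.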