Distributed systems: Overview (2010)

Layers
• Communication is logically on the application layer
• Only that layer has to be considered
• except for speed, reliability, security and cost
• Error correction (and security) might (will) be on the application layer, but is usually also on lower layers

Client-Server and Peer-to-Peer
• The server is always on; clients and peers are not
• P2P: Skype, BitTorrent
• IM: partly P2P (messages yes, setup and addressing not)

Problems
• No throughput guarantees:
  – problems with bandwidth-sensitive applications, like many multimedia applications
  – some may use adaptive coding techniques (reducing quality) to match the available throughput
• No timing (delay or jitter) guarantees:
  – problems for real-time streaming multimedia
  – like telephony, multi-player games, teleconferencing
  – no solution for this except special networks
  – non-real-time streaming multimedia (like a movie replay) can buffer at the receiver
• No security:
  – this can be cured by SSL (Secure Sockets Layer)
  – also by network layer security methods

Distributed systems
A collection of independent computers that appears to its users (people or programs) as a single coherent system.
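Returning to the throughput problem listed under Problems: the adaptive coding idea (reduce quality to match the available throughput) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular codec or player works; the function name `pick_bitrate`, the bitrate ladder `LADDER` and the `headroom` safety factor are all hypothetical.

```python
def pick_bitrate(available_kbps, ladder_kbps, headroom=0.8):
    """Pick the highest encoding bitrate that fits the measured throughput.

    headroom is an assumed safety factor: only that fraction of the
    measured throughput is used. If nothing fits, fall back to the
    lowest quality instead of stopping the stream.
    """
    budget = available_kbps * headroom
    fitting = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    return fitting[-1] if fitting else min(ladder_kbps)


# Hypothetical quality levels in kbit/s.
LADDER = [300, 750, 1500, 3000]

for measured in (4000, 1200, 200):
    print(measured, "->", pick_bitrate(measured, LADDER))
# 4000 -> 3000
# 1200 -> 750
# 200 -> 300
```

Note that this only addresses throughput. It does nothing for delay or jitter, which is why real-time applications remain a problem.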
Goals: sharing, transparent
• making resources accessible
  – economics: printers, storage systems, supercomputers
  – information exchange: mail, audio, video
  – collaboration: groupware, videoconferencing, virtual organizations
• distribution transparency

Goals: openness
• Openness
  – offer services according to standard rules
    • describing syntax and semantics of services
    • computer networks: protocols
    • distributed systems: interfaces described in an IDL (Interface Definition Language)
  – to achieve interoperability and portability
  – extensible: add new components or replace existing ones
    • collection of relatively small components
    • separate policies and mechanisms

Goals: scalable
• size: easily add more users and resources
• geographically: increasing distances
• administration: still easy to manage as the system grows
Limitations
• decentralized algorithm:
  – no machine has complete information about the system state
  – each machine makes decisions based only on local information
  – failure of one machine does not ruin the algorithm
  – no implicit assumption that a global clock exists

Distributed Computing Systems
Cluster Computing Systems: high performance computing

Grid Computing Systems
High degree of heterogeneity: resources from different organizations are brought together in a virtual organization.

Distributed Information Systems
Transaction Processing System
• Atomic: To the outside world, the transaction happens indivisibly.
• Consistent: The transaction does not violate system invariants.
• Isolated: Concurrent transactions do not interfere with each other.
• Durable: Once a transaction commits, the changes are permanent.

Nested Transaction

TP monitor

Distributed Pervasive Systems
• consisting of mobile and embedded computing devices
  – small, battery-powered, mobile, wireless connections

Sensor Networks

Important topics
1. architecture: software and system
2.
processes: threads, virtual machines, client-server organization, code migration
3. communication: layered protocols, Remote Procedure Calls, Message Passing Interface
4. naming: names, identifiers, addresses
5. synchronization: (logical) clocks, mutual exclusion, election algorithms
6. consistency and replication
7. fault tolerance
8. security

Architecture (1)
• layered
• object-based
• data-centered
• event-based

Architecture (2)
• Processes communicate through a common (passive or active) repository
• Events may carry data
• Publish/subscribe systems
• Loosely coupled processes

Application layering
• user-interface
• processing
• data
Example: using an Internet search engine

Alternative client-server organization
• Thin vs. fat clients: easier vs. more difficult to manage
• application and database on different servers
• Vertical distribution: placing logically different components on different machines

Peer-to-peer systems
• Horizontal distribution: a client or server is physically split up into equivalent parts, each operating on its own share of the data set
• Distributed Hash Tables: a data item with key k is mapped to the node with the smallest id >= k

Collaborative Distributed Systems
• for a node to join, a client-server scheme is often used
• an example is BitTorrent
• a tracker keeps an account of the active nodes (currently downloading some file) that have (chunks of) the requested file
• the client node then becomes active, also providing (chunks of) files

Processes and threads
• a way to do more things at the same time
• illusion that each one has its own virtual CPU
• used in clients (e.g.
browser to start downloading parts of a website at the same time) and servers

Virtual Machines
• not only virtualization of the CPU but also of other resources
• many different OSs working concurrently on one machine
• old technique from the 1960s

Process virtual machine
Same OS, different runtime systems (with applications)

Virtual machine manager
Multiple different OSs run concurrently on the same hardware
[Figure: virtual layer with three virtual machines, each running applications on a guest OS (Windows NT, Windows 2000, Windows 2003); physical layer with the virtual machine manager, the host operating system and the system hardware]

Virtual private servers
• bridge the gap between shared web hosting services and dedicated hosting services
• also for workstations
• examples: VMware, VirtualPC

Communication
• send and receive over TCP streams using the socket interface for networks
• message passing, a higher level of abstraction
  – representation of integers, floats, structures, etc.
  – usable for shared memory communication and high-speed interconnect buses on parallel machines
• RPC, Remote Procedure Call

Naming
• Names are used to refer to entities (anything that can be operated on)
• The naming system may itself be implemented in a distributed fashion
• We need to resolve a name to the entity it refers to
• How to organize a human-friendly name system? E.g. file systems, World Wide Web
• How to locate the entity a name refers to, in a way that is independent of its current location?
• How to resolve names by means of entity attributes?
• The Internet's Domain Name System as an example

Synchronization
• Synchronization of distributed processes is more difficult than that of processes in uni/multi-processor systems
• using physical clocks on systems is not accurate enough; need for logical clocks
• distributed global states
• distributed mutual exclusion
• the bully election algorithm
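
To make the "need for logical clocks" point above concrete, here is a minimal sketch of a Lamport-style logical clock in Python. The slides do not say which logical clock scheme is meant, so the choice of Lamport clocks is an assumption; the class and method names are illustrative only.

```python
class LamportClock:
    """Minimal Lamport logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending counts as an event; the result is the message timestamp."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, move past both the local time and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()               # local event on p: p.time == 1
stamp = p.send()       # p sends a message stamped 2
q.receive(stamp)       # q jumps to max(0, 2) + 1 == 3
print(p.time, q.time)  # 2 3
```

The receive timestamp is always larger than the send timestamp, so causally related events are ordered without any process reading a physical clock.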