CSCI652W1_SP14

advertisement
Distributed Systems
Introduction and background
Mohan Kumar
CSCI652.002 Spring 2014 B&K
1
Course information
http://www.cs.rit.edu/~hpb/Lectures/20135/652/index.html
CSCI652.002 Spring 2014 B&K
2
Requirements
• CSCI-352 Operating Systems or
equivalent and CSCI-603 Advanced C++
and Program Design or equivalent
CSCI652.002 Spring 2014 B&K
3
Course Content
• Issues and challenges in distributed
systems, including: communication,
distributed processes, naming and name
services, synchronization, consistency
and replication, transactions, fault
tolerance and recovery, security,
distributed objects, and distributed file
systems.
CSCI652.002 Spring 2014 B&K
4
Outcomes
• Build a solid foundation in distributed systems.
• Outcomes:
– Understand fundamental concepts of distributed
computing systems.
– Understand modern distributed systems – P2P,
mobile, pervasive, sensor etc.
– Recognize importance of addressing challenges in
modern systems to facilitate distributed computing.
– Develop distributed programs on real systems.
– More (you tell us at the end of semester)
CSCI652.002 Spring 2014 B&K
5
Attendance
• Class participation: ACTIVE
Participation will prepare students for
midterms. Students are expected to
interact actively during lectures. All
students are expected to solve
homework problems and engage in class
discussions.
CSCI652.002 Spring 2014 B&K
6
Course material
• Reference Books
• Slides by Coulouris et al.
– www.cdk4.net
• Power point slides and whiteboard notes
prepared by the professors
– Students are expected to read corresponding
chapters from textbook prior to each class (please
see tentative schedule).
– PPT slides prepared by the professors may or may
not be available before class. But they will be made
available after class.
• Reference books and articles
CSCI652.002 Spring 2014 B&K
7
Course organization
• The course will mainly have two main
themes.
• Distributed Algorithms–
• distributed processes/objects, interprocess
communication, remote procedure call, coordination,
file systems, clocks and global states, security,
concurrency, shared memory, transactions and
replication.
• Systems – Operating systems, Distributed file systems,
Name services, case studies, implementations,
P2P, Security,
– Plan 9 System
CSCI652.002 Spring 2014 B&K
8
Textbook and References
Textbook
Distributed Systems: Concepts and Design
George Coulouris, Jean Dollimore and Tim Kindberg
Addison Wesley, 4th Edition, 5th Edition
- e-version of 5th edition is available on Kindle
References
Distributed Systems: Principles and Paradigms
A.S. Tanenbaum and M. V. Steen, Pearson Publishers,2nd Edition.
Distributed Operating Systems & Algorithms, R Chow and
T. Johnson, Addison-Wesley, 1997.
– Related Articles – details will be provided during the
course
CSCI652.002 Spring 2014 B&K
9
Grading
• The structure of quizzes will be discussed in class, at least one
week prior to the quiz.
– Midterm 1: 15%
– Midterm 2: 15%
– Final Exam: 30%
• Group Work (project, presentation, report and class participation):
40%.
• Group Presentations: Will be scheduled during the last week of
semester.
• Group Work Reports: Due at 9 am May 10, 2014.
• Each Group will have 3 members; Groups to be formed before
February 15.
•
• Group Work: Problems will be assigned by February 25 and the
expected date of completion is May 10.
CSCI652.002 Spring 2014 B&K
10
What is a distributed system?
• Concurrent components
– Independent
– Use message passing to communicate and
coordinate
• Lack of global clock
– Asynchronous
• Independent failures of components
– Good for fault-tolerance
CSCI652.002 Spring 2014 B&K
11
“ A distributed system is a collection of independent
computers that appears to its users as a single coherent
system”
Tannenbaum and Van Steen, Distributed Systems, 2007.
• Application developers can focus on
developing applications rather than
system issues
• The distributed system should be
– Easy to expand or scale
– Available all the time
– Accessible uniformly
– Fault-tolerant
CSCI652.002 Spring 2014 B&K
12
Layered representation
Applications and services
Middleware
PLATFORM
Mask Heterogeneity
Provide abstraction, transparency
Uniformity
Operating System
Communications Network
Hardware
CSCI652.002 Spring 2014 B&K
13
Motivation
• Resource sharing
– CPU
– Disk
– Software services
– Databases
• Fault-tolerance
– Redundancy
– Replication
CSCI652.002 Spring 2014 B&K
14
Challenges
•
•
•
•
•
•
Heterogeneity
Transparency, openness
Security and privacy
Scalability
Failure handling
Concurrency of components
CSCI652.002 Spring 2014 B&K
15
Modern Distributed Systems
• Mobility
– Wireless communications
• WiFI, Bluetooth, Zigbee, LTE, WiMax, Cellular
• Ubiquity
– Small, but multifunctional devices
• Cell phones, sensors, RFIDs
• Large scale
– Components
– Data
– Users
CSCI652.002 Spring 2014 B&K
16
Enablers
• Computer Technology
– Advanced microprocessors
– Multi-core architectures
– Lower costs (CPU, memory, peripheral devices)
• High-speed networks
– Wired and wireless
• Applications
– Business
– Scientific
– Everything else ….
CSCI652.002 Spring 2014 B&K
17
Examples of Distributed Systems
• The Internet
• Intranets
• Mobile and
Ubiquitous systems
Grid Computers
• Pervasive Systems
• Sensor Systems
• P2P Networks





Airlines
Aircraft
Car
Building
University
CSCI652.002 Spring 2014 B&K
18
Recent Developments
Wireless ad hoc networking
• Novel algorithms and schemes developed
• Cooperation in the absence of infrastructure
Pervasive computing
• Context-aware services to users/applications
• Smart environments
Distributed resources
• Mobile devices possess myriad of resources
Opportunistic communications
• Exchange of packets/bundles
Social networks and computing
• Exploit gregarious nature of humans
CSCI652.002 Spring 2014 B&K
19
Fading Distinctions
Servers and clients
• Distributed systems, P2P systems
• Cost and time
Producers and consumers of information
• Users are producers of information as well
• User with a cell phone camera
Service providers and consumers
The
Challenge is
to provide a
uniform view
• Resources on user devices can be exploited
Resourceful and resource-poor entities
• Servers, desktops, laptops, mobile phones
• Grid computing
• Cyber foraging
CSCI652.002 Spring 2014 B&K
20
What is a distributed system?
• Concurrent components
– Independent
– Use message passing to communicate and
coordinate
• Lack of global clock
– Asynchronous
• Independent failures of components
– Good for fault-tolerance
CSCI652.002 Spring 2014 B&K
21
Concurrency
• Program execution
• Access to resources
• Message passing
• Coordination
• Resource sharing
Coordination of concurrently executing
programs
CSCI652.002 Spring 2014 B&K
22
No Global Clock
• Clocks of different components are not
synchronized
• Asynchronous
• Concurrent programs coordinate their
actions by passing messages
CSCI652.002 Spring 2014 B&K
23
Event ordering
• Lamport’s logical ordering
– X sends m1 before Y receives m1
– Y sends m2 before X receives m2
– Because we know replies are sent after
receiving messages
– That is m2 is a reply to m1
– Y receives m1 before sending m2
CSCI652.002 Spring 2014 B&K
24
Time services
• Global time consensus is needed to
– Coordinate distributed activities
• File backup
• Expiration time of a received message/data
– Event related activities
• When an event occurs or has already occurred
• How long did it take
• Which event occurred first
CSCI652.002 Spring 2014 B&K
25
Clocks
• Physical clock
– Approximation of real-time
• Logical clock
– Preserves ordering of events
CSCI652.002 Spring 2014 B&K
26
Independent Failures
• Distributed systems can fail in multiple
ways
– CPU/memory of one or more components
– Network link/s
– Programs might stop executing
• E.g., input/output, synchronization
– System components may get isolated
CSCI652.002 Spring 2014 B&K
27
Resource sharing
• Hierarchy
• Processors, Disks
– Shared data
– Shared webpages
• Search engine
• Weather channel
• Currency converter
CSCI652.002 Spring 2014 B&K
28
Services
• Manage resources
• Present functionalities of resources to users and
applications
– Coherent to applications/users
• Examples
– File service
– Mail service
– FTP service
• Client-server architectures
– Service may access resources remotely
– Clients connect to servers
• Utilize services
CSCI652.002 Spring 2014 B&K
29
Basic applications
• Remote login
– Keyboard and display interface
– Virtual terminal support
• telnet, rlogin
• File transfer
– File, file structures, file attributes
• E.g., FTP
• Messaging
– Send and receive
– Email, SMTP
• Browsing
– Information retrieval
• Remote execution
– Execute a program on a remote server
• E.g, MIME – multipurpose Internet mail extension
CSCI652.002 Spring 2014 B&K
30
System models
• Architectural models
– Client-server model
– Peer-to-peer model
• Functional models
– Interaction model
– Failure model
– Security model
CSCI652.002 Spring 2014 B&K
31
Architecture
• Structural organization of various
components
– Simple abstraction of components
– Two main objectives
• Placements
– Network topology
– Data distribution
• Interrelationships
– Patterns of communications
– Relationships between data objects
– Data access patterns, dependencies
CSCI652.002 Spring 2014 B&K
32
Peer-to Peer and
Client/server variations
•
Peer-to-peer
– No distinction among peers
– Excellent scalability compared to C-S
– Resources are utilized in a distributed network, and more efficiently.
•
•
Minimize bottleneck points
Variations
– Multiple servers
•
Each server specializes in a providing a particular service
–
E.g., web servers, DNS server, authentication etc.
– Proxy servers
•
•
Enhance availability
Reduce latency
– Caches
•
Objects cached to reduce latency
– Mobile code and mobile agents
•
Mobile code (e.g., applet) downloaded to client’s site
•
Mobile agents include code and data
–
–
Local interactions, fast response as there are no communication delays
Go around execute on different processors
CSCI652.002 Spring 2014 B&K
33
Goals
• Efficiency
– Propagation delays, communications
– Overlapped computation/communication
– Efficient distributed processing and load sharing
• Flexibility
– User friendly
– Ability to evolve and migrate
• Modularity, scalability, portability, and interoperability
• Consistency
– Predictability and uniformity in system behavior
– Integrity in concurrency control, failure handling and failure handling
• Robustness
– Ability to handle exceptional situations and errors
• Change in topology, lost message, crashed system etc.
– Reliability, protection and access control
• Secure and privacy preserving
CSCI652.002 Spring 2014 B&K
34
Design requirements
• Performance
– Responsiveness
• Access to shared resources
–
–
–
–
–
–
–
Communication delays
Server loads, scheduling, wait periods
Control switching
Load balancing
Combined computation/communication scheduling
Scalability
Fault-tolerance
CSCI652.002 Spring 2014 B&K
35
Transparency
• Ability to hide/mask
all system details from
users/application
developers
– System details are
irrelevant to
users/developers
– System details are very
relevant to system
managers
• Creation of an illusion
of a model that it is
supposed to be
Applications and services
Mask Heterogeneity
Provide abstraction,
Middleware transparency
Uniformity
PLATF
ORM
Operating System
Communications Network
Hardware
This is in contrast to the meaning of transparency in English – open,
visible, see through etc.
CSCI652.002 Spring 2014 B&K
36
Basic Processes
• Server
– Accepts inputs from other processes
– Performs a service
– Returns outcomes
• Client
– User/application level
– Makes requests, receives results
The roles of server and client may change with
time
• Peer
– All are equal
CSCI652.002 Spring 2014 B&K
37
Processes
• A process is a program in
execution
– Sequential
• A single control block regulates
the execution
– A control block contains state
information – program
counters, register contents,
stack pointers,
communication ports, file
descriptors etc.
– Process control block (PCB)
PCB
PCB
PCB
Process
Process
Process
– Concurrent
• Simultaneously interacting
sequential processes are said to
be concurrent
• Asynchronous
• Separate address space and
PCBs
• Components may interact
through
communication/synchronization
CSCI652.002 Spring 2014 B&K
38
Thread
PCB
TCB| TCB
Thread
Thread
PCB
TCB| TCB| TCB
Thread
PCB
Thread
A lightweight process
– Threads of a process share the
same address space, but have their
own registers
– A thread control block (or TCB) is
local to a thread
– Typically,
• Threads have their own PC, SP
and register set.
• Threads share address space,
communication ports and file
descriptors
– Multiple threads are spawned by a
process
– A PCB is shared among interacting
threads
– Context switching among threads is
lightweight compared to context
switching among processes
Thread
•
Threads
Thread run-time library
support
Operating System Support
CSCI652.002 Spring 2014 B&K
39
Interaction model
• Process interactions
– C-S, P2P, message passing, shared space, synchronous, asynchronous
– Single process/thread, multiple threads
• Distributed algorithms
– Behavior of multiple processes
– Includes message transmissions
– Each process
• Has own its PCB and is inaccessible by other processes
• Likely to be executing on different systems in the network
• Difficult to coordinate
• Two significant factors
– Communication performance
– Maintenance of global state
• Computer clocks drift
• Clock drifts differ from one another
Functional models
Interaction model
Failure model
Security model
CSCI652.002 Spring 2014 B&K
40
Performance of communication
channels
• Latency
– Time taken for message to arrive at the destination
– Delay in accessing the network
– Delay (processing times at) due to OS
communication services at both ends
• Bandwidth
– Frequency
– Interference
– Channel sharing
• Jitter
– Variation in times taken to deliver different
components of a message
CSCI652.002 Spring 2014 B&K
41
Two variants
• Synchronous
–
–
–
–
Process execution time is bounded
Message latency over a channel is bounded
Process’ local clock drift is bounded
Though difficult to build, very useful as a model
• Time outs
• Detect failures
• Asynchronous
– Blue bullets (Assumptions) above are NOT true
– Most systems are asynchronous
CSCI652.002 Spring 2014 B&K
42
Failure model
• Omission failures
– Processor/process crash
– Communication failure/message drops
• Arbitrary failures
– Process setting wrong values in data
– Data corruption during transmission
• Timing failures
– Synchronous systems
– Real-time systems
– Clock, process, channel
• Masking failures
– Replication
– Service to mask failures
Functional models
Interaction model
Failure model
Security model
CSCI652.002 Spring 2014 B&K
43
Security model
• Protecting objects
– Who is allowed to access what data
• Check access rights, verify identity
• Securing process and interactions
– Processes
• Server, client, peer
– Communication channel
• Copy/alter messages; inject harmful messages
• Encryption, authentication, time stamping
• Denial of service
• Mobile code, mobile agents
CSCI652.002 Spring 2014 B&K
Functional models
Interaction model
Failure model
Security model
44
Event ordering
• Lamport’s logical ordering
– X sends m1 before Y receives m1
– Y sends m2 before X receives m2
– Because we know replies are sent after
receiving messages
– That is m2 is a reply to m1
– Y receives m1 before sending m2
CSCI652.002 Spring 2014 B&K
45
Time services
• Global time consensus is needed to
– Coordinate distributed activities
• File backup
• Expiration time of a received message/data
– Event related activities
• When an event occurs or has already occurred
• How long did it take
• Which event occurred first
CSCI652.002 Spring 2014 B&K
46
Clocks
• Physical clock
– Approximation of real-time
• Logical clock
– Preserves ordering of events
CSCI652.002 Spring 2014 B&K
47
Network Background
Slides from Kurose and Ross’s
book will be used
Please read the book
CSCI652.002 Spring 2014 B&K
48
Networking review
• Please read up chapter 4 or a networking
book
• I will cover only mobile and wireless
networking
CSCI652.002 Spring 2014 B&K
49
Mobile IP
CSCI652.002 Spring 2014 B&K
50
Mobile IP
• Triangle routing,
indirect routing
• Direct Routing
– Home agents
– Foreign agents
– Registrations
• HA
• FA
• Anchor FA
–
–
–
–
Care of address
Encapsulation
Agent discovery
Registration
• TCP – Transmission
Control Protocol
• IP – Internet protocol
• BS – Base Station
• MH – Mobile host
• CH – Correspondent
host
• HA – Home agent
• FA- Foreign agent
CSCI652.002 Spring 2014 B&K
51
TCP
• Transport layer protocol
• Reliable, uses ACKs
• Congestion control
– Adjusts to network conditions
• Error control
– Packets buffered until ACKs received
– Buffered packets resent
CSCI652.002 Spring 2014 B&K
52
Desired (in Mobile systems)
• No disruption of services as the user
moves
– Changes point of attachment
• How to ensure?
– Autonomous transfer
– Minimal delays and losses
CSCI652.002 Spring 2014 B&K
53
Effects of Mobility
• IP and Mobile IP
– IP
• Packets are routed to their destinations according to IP
addresses.
• IP addresses are associated with a fixed network
location.
– Mobile IP
• Packets may be destined to mobile nodes
• Seamless roaming to applications and users.
• Shield mobility effects from
– applications
– higher level protocols
TCP/IP was designed for wired networks; But it has survived
in the wireless world; well, till now at least!!
CSCI652.002 Spring 2014 B&K
54
Effects of Mobility
• TCP congestion control mechanism
– Acks not received
• Slow start or other control mechanisms
• Window size is reduced
– Slow start
TCP congestion control mechanism in mobile environments
When a MH hands-off from one network , it does not receive
packets until it registers at another network. In the meanwhile
TCP mechanism at the sender assumes the packets have been
lost and goes into congestion recovery mode. Congestion
window size is reduced and/or packets are retransmitted.
Overall effect –performance deterioration.
CSCI652.002 Spring 2014 B&K
55
Encapsulation/Tunneling
• Messages originating at the CH have
Original Address
The home or original address of the MH
• The HA encapsulates the
New
Original Address
message with the address of the
Address
FA in the foreign network and forwards the packet to
the foreign network
• The FA peels off the ‘new address’ and forwards the
original packet to the MH in the foreign network.
• This process of appending and peeling off care off
addresses is called tunneling or encapsulation.
CSCI652.002 Spring 2014 B&K
56
Split Connections
• Split at BS
• Selective ACK of out of sequence packets
Core
network
• Mobile TCP
CH
BS
BS
MH
The TCP/IP connection is split at the BS. The BS ACKs packets,
buffers them and forwards to the MH.
CSCI652.002 Spring 2014 B&K
57
Supervising host
• One host as a controller in the core
network
– Keep track of CHs and MHs
Supervising
Host
MH
MH
MH
MH
The supervising host (SH)resides in the wired network and keeps
track of all the MHs. The supervising host is contacted for all
correspondence related to MHs. The SH maintains a directory of MH
locations. One can envision a set of distributed SHs catering to
groups of MHs.
CSCI652.002 Spring 2014 B&K
58
Snoop protocol
• Lower layer solution
– Processing between TCP and IP at the
BS
– Packets are snooped (processed)
– Snoop module reads packet
addresses to determine which
packets have not been ACKed.
– Facilitates retransmission at the BS
• Requires packets to be buffered at the
BS
– Multicast solution
• MH uses a multicast address as the careof-address
• All BSs the MH has been (And will be in)
contact with are invited to be members
of the multicast group
• The BS where he MH is residing
currently will forward the packets.
– Remaining BSs discard the packet
CSCI652.002 Spring 2014 B&K
TCP
Snoop
IP
BS
CH
BS
BS
MH
59
Download