Distributed Systems
CS 15-440
Distributed System Architecture and Introduction to Networking
Lecture 3, Sep 12, 2011
Majd F. Sakr, Vinay Kolar, Mohammad Hammoud
Today…
Last session:
Trends and challenges in Distributed Systems
Today’s session:
Part I: Distributed System Architectures
Part II: Introduction to Networking
Announcements:
Project 1 design report is due on Wednesday at midnight
Project handout has been updated with a “design deliverable” section
Part I
Distributed Systems Architecture
A Distributed System
A distributed system is simply a collection of hardware or
software components that communicate to solve a
complex problem
Each component performs a “task”
[Figure: components performing tasks, linked by a communication mechanism]
Bird’s eye view of some Distributed Systems
[Figure: four example systems. Google Search and Expedia airline booking: Clients 1 to 3 send search and reservation requests to a server. Bit-torrent and Skype: Peers 1 to 4.]
How would one classify these distributed systems?
Classification of Distributed Systems
What are the entities that are communicating in a DS?
a) Communicating entities
How do the entities communicate?
b) Communication paradigms
What roles and responsibilities do they have?
c) Roles and responsibilities
How are they mapped to the physical distributed infrastructure?
d) Placement of entities
Communicating Entities
What entities are communicating in a DS?
System-oriented entities
Processes
Threads
Nodes
Problem-oriented entities
Objects (in object-oriented programming based approaches)
Communication Paradigms
Three types of communication paradigms
Inter-Process Communication (IPC)
Remote Invocation
Indirect Communication
[Figure: layer stack, top to bottom: Applications, Services / Remote Invocation, Indirect Communication / IPC Primitives / Internet Protocols, with a bracket labeled “Middleware layers”]
Inter-Process Communication (IPC)
Relatively low-level support for communication
e.g., Direct access to internet protocols (Socket API)
Advantages
Enables seamless communication between processes on
heterogeneous operating systems
Well-known and tested API adopted across multiple operating systems
Disadvantages
Increased programming effort for application developers
Socket programming: Programmer has to explicitly write code for
communication (in addition to program logic)
Space Coupling (Identity is known in advance): Sender should know
receiver’s ID (e.g., IP Address, port)
Time Coupling: Receiver should be explicitly listening to the
communication from the sender
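To make the two coupling costs concrete, here is a minimal sketch of socket-based IPC in Python (the slides do not prescribe a language). The function names `run_echo_server` and `request_reply`, the loopback address, and the upper-casing "service" are illustrative assumptions, not part of the course material.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one connection and send back the received bytes, upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def request_reply(message: bytes) -> bytes:
    """Start a one-shot server on the loopback interface and query it once."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)               # time coupling: receiver must be listening
    host, port = server_sock.getsockname()

    server = threading.Thread(target=run_echo_server, args=(server_sock,))
    server.start()

    # Space coupling: the sender has to know the receiver's (host, port).
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    server.join()
    server_sock.close()
    return reply


if __name__ == "__main__":
    print(request_reply(b"hello"))
```

Note how much of the code is communication plumbing rather than program logic: the only "application" step is `data.upper()`.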
Remote Invocation
An entity invokes a procedure that typically executes on
another computer, without the programmer explicitly
coding the details of this remote interaction
A middleware layer will take care of the raw-communication
Examples
Remote Procedure Call (RPC) – Sun’s RPC (ONC RPC)
Remote Method Invocation (RMI) – Java RMI
Remote Invocation
Advantages:
Programmer does not have to write code for socket communication
Disadvantages:
Space Coupling: Where the procedure resides should be
known in advance
Time Coupling: On the receiver, a process should be explicitly
waiting to accept requests for procedure calls
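The slides name Sun RPC and Java RMI as examples. As a stand-in, the sketch below uses Python's standard-library XML-RPC modules, which follow the same idea: the caller writes `proxy.add(a, b)` and the middleware handles the raw communication. The procedure `add`, the helper `call_remote_add`, and the loopback address are assumptions made for illustration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    """The procedure that will execute on the 'remote' side."""
    return a + b


def call_remote_add(a: int, b: int) -> int:
    """Serve `add` over XML-RPC on localhost and invoke it through a proxy."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    host, port = server.server_address

    # Time coupling: a server must be waiting to accept procedure calls.
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        # Space coupling: the caller must know where the procedure resides.
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.add(a, b)  # looks like a local call, no socket code
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


if __name__ == "__main__":
    print(call_remote_add(2, 3))
```

The caller no longer writes socket code, but both couplings remain: the URL is fixed in advance and the server has to be running when the call is made.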
Space and Time Coupling in RPC and RMI
The sender knows the identity of the receiver (space coupling)
[Figure: time coupling in a request-reply exchange. Sender: doOperation, (wait), (continuation). Receiver: getRequest, select operation, execute operation, send reply. A Request Message goes from sender to receiver and a Reply Message comes back.]
RMI strongly resembles RPC but in a world of distributed objects
Indirect Communication Paradigm
Indirect communication uses middleware to:
Provide one-to-many communication
Eliminate space and time coupling (in some mechanisms)
Sender and receiver do not need to know each other’s identities
Sender and receiver need not be explicitly listening to communicate
Approach used: Indirection
Sender → a middle-man → Receiver
Types of indirect communication
1. Group communication
2. Publish-subscribe
3. Message queues
1. Group Communication
One-to-many communication
Multicast communication
Abstraction of a group
Group is represented in the system by a groupId
Recipients join the group
A sender sends a message to the group, which is received by all the recipients
[Figure: a sender multicasting to Recv 1, Recv 2 and Recv 3]
1. Group Communication (cont’d)
Services provided by middleware
Group membership
Handling the failure of one or more group members
Advantages
Enables one-to-many communication
Efficient use of bandwidth
Identities of the group members need not be known at all nodes
Disadvantages
Time coupling
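A minimal in-memory sketch of the group abstraction, assuming a hypothetical `GroupService` class that plays the role of the middleware. Real group communication runs over a network and also handles member failures; none of that is modeled here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str], None]


class GroupService:
    """Toy middleware: maps a groupId to the handlers of its current members."""

    def __init__(self) -> None:
        self._members: DefaultDict[str, List[Handler]] = defaultdict(list)

    def join(self, group_id: str, handler: Handler) -> None:
        """Register a recipient as a member of the group."""
        self._members[group_id].append(handler)

    def multicast(self, group_id: str, message: str) -> int:
        """Deliver the message to every current member; return the count."""
        members = list(self._members[group_id])
        for deliver in members:
            deliver(message)
        return len(members)


def demo() -> Dict[str, List[str]]:
    service = GroupService()
    inboxes: Dict[str, List[str]] = {"recv1": [], "recv2": [], "recv3": []}
    for inbox in inboxes.values():
        service.join("g1", inbox.append)
    # The sender names only the group, never the individual recipients.
    service.multicast("g1", "hello group")
    return inboxes


if __name__ == "__main__":
    print(demo())
```

The sender is space-decoupled from the recipients (it knows only `"g1"`), but a recipient that joins after `multicast` returns never sees the message, which is the time coupling listed above.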
2. Publish-Subscribe
An event-based communication mechanism
Publishers publish events to an event service
Subscribers express interest in particular events
[Figure: Publishers send Publish (Event2) to the publish-subscribe Event Service; Subscribers send Subscribe (Event3) to it]
A large number of producers distribute information to a large
number of consumers
2. Publish-Subscribe (cont’d)
Example: Financial trading
[Figure: financial trading example, showing a dealer process]
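A minimal sketch of an event service, assuming topic-based subscriptions (one of several possible subscription models). The class `EventService`, the stock symbols, and the dealer inboxes are hypothetical names loosely modeled on the financial trading example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Event = Dict[str, float]
Callback = Callable[[Event], None]


class EventService:
    """Toy event service: routes published events to interested subscribers."""

    def __init__(self) -> None:
        self._subscriptions: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Express interest in all events published under `topic`."""
        self._subscriptions[topic].append(callback)

    def publish(self, topic: str, event: Event) -> int:
        """Notify every subscriber of `topic`; return how many were notified."""
        callbacks = list(self._subscriptions.get(topic, []))
        for notify in callbacks:
            notify(event)
        return len(callbacks)


def demo() -> Tuple[List[Event], List[Event]]:
    service = EventService()
    dealer_a: List[Event] = []
    dealer_b: List[Event] = []
    service.subscribe("IBM", dealer_a.append)
    service.subscribe("IBM", dealer_b.append)
    service.subscribe("AAPL", dealer_b.append)
    # Publishers know only the event service, not who is subscribed.
    service.publish("IBM", {"price": 101.5})
    service.publish("AAPL", {"price": 42.0})
    service.publish("MSFT", {"price": 7.0})  # no subscribers: nobody is notified
    return dealer_a, dealer_b


if __name__ == "__main__":
    print(demo())
```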
3. Message Queues
A refinement of Publish-Subscribe where
Producers deposit the messages in a queue
Messages are delivered to consumers through different methods
Queue takes care of ensuring message delivery
Advantages
Enables space decoupling
Enables time decoupling
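A minimal sketch of time decoupling, using Python's in-process `queue.Queue` as a stand-in for a message queue service (a real one would be a separate, usually persistent, process). The helper names `produce` and `consume_all` are assumptions.

```python
import queue
from typing import List


def produce(q: "queue.Queue[str]", messages: List[str]) -> None:
    """Deposit messages in the queue; no consumer has to exist yet."""
    for message in messages:
        q.put(message)


def consume_all(q: "queue.Queue[str]") -> List[str]:
    """Drain whatever is currently in the queue, in arrival order."""
    received: List[str] = []
    while True:
        try:
            received.append(q.get_nowait())
        except queue.Empty:
            return received


def demo() -> List[str]:
    q: "queue.Queue[str]" = queue.Queue()
    produce(q, ["order-1", "order-2"])  # the producer finishes first
    # The consumer starts only now and still receives both messages.
    return consume_all(q)


if __name__ == "__main__":
    print(demo())
```

Producer and consumer share only the queue, never each other's identity (space decoupling), and they do not need to run at the same time (time decoupling).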
Recap: Communication Entities and Paradigms
Communicating entities (what is communicating)
System-oriented: Nodes, Processes, Threads
Problem-oriented: Objects
Communication paradigms (how they communicate)
IPC: Sockets
Remote Invocation: RPC, RMI
Indirect Communication: Group communication, Publish-subscribe, Message queues
Roles and Responsibilities
In a DS, communicating entities take on roles to perform tasks
Roles are fundamental in establishing the overall architecture
Question: Does your smart-phone perform the same role as a Google
Search server?
We classify DS architectures into two types based on the
roles and responsibilities of the entities
Client-Server
Peer-to-Peer
Client-Server Architecture
Approach:
Server provides a service that is needed by a client
The client sends a request to the server (invocation); the server returns a response (result)
Widely used in many systems
e.g., DNS, Web-servers
[Figure: two clients connected to one server]
Client-Server Architecture: Pros and Cons
Advantages:
Simplicity and centralized control
Computation-heavy processing can be offloaded to a
powerful server; clients can be “thin”
Disadvantages
Single point of failure at the server
Limited scalability: the server can become a bottleneck as the number of clients grows
Peer to Peer (P2P) Architecture
In P2P, roles of all entities are identical
All nodes are peers
Peers are equally privileged participants in the
application
e.g.: Napster, Bit-torrent, Skype
Peer to Peer Architecture
Example: Downloading files from Bit-torrent
Peer 1 wants a file. Parts of the file are present at peers 3, 4 and 5
Peer 3 wants a file stored at peers 1, 2 and 6
[Figure: Peers 1 to 6 exchanging file parts with one another]
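The download example can be sketched as assembling a file from parts held by different peers. This is a toy in-memory model, not the BitTorrent protocol: the `peers` dictionary, the chunk numbering, and the `download` function are all assumptions made for illustration.

```python
from typing import Dict

# peer name -> {chunk index -> chunk bytes}
PeerStore = Dict[str, Dict[int, bytes]]


def download(num_chunks: int, peers: PeerStore) -> bytes:
    """Fetch each chunk from the first peer that holds it and join them."""
    chunks = []
    for index in range(num_chunks):
        holder = next(
            (name for name, store in peers.items() if index in store), None
        )
        if holder is None:
            raise LookupError(f"no peer holds chunk {index}")
        chunks.append(peers[holder][index])
    return b"".join(chunks)


def demo() -> bytes:
    # Peer 1 wants a file whose parts are spread over peers 3, 4 and 5.
    peers: PeerStore = {
        "peer3": {0: b"dis"},
        "peer4": {1: b"tri"},
        "peer5": {2: b"buted"},
    }
    return download(3, peers)


if __name__ == "__main__":
    print(demo())
```

No peer acts as a dedicated server here: every entry in `peers` could equally be the downloader in another transfer.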
Architectural Patterns
Primitive architectural elements can be
combined to form various patterns
Tiered Architecture
Layering
Tiered architecture and layering are complementary
Layering = vertical organization of services
Tiered Architecture = horizontal splitting of services
Tiered Architecture
A technique to:
1. Organize the functionality of a service, and
2. Place the functionality into appropriate servers
[Figure: Airline Search Application. Client: display UI screen, get user input. Servers: get data from database, rank the offers.]
A Two-Tiered Architecture
[Figure: Personal computer or mobile devices: user view, controls and data manipulation. Server: application and data management.]
A Three-Tiered Architecture
How do you design an airline search application?
Organize functionality of a given layer
[Figure: EXPEDIA Airline Search Application. Tier 1: display user input screen, get user input, display result to user. Tier 2: rank the offers. Tier 3: airline database.]
A Three-Tiered Architecture
[Figure: Personal computer or mobile devices: user view and controls. Application server: application logic. Database server: database manager.]
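A minimal sketch of the three-tier split for the airline search example, with each tier reduced to one function in a single process. In a real deployment each tier would run on its own machine; the flight data, route codes, and function names are invented for illustration.

```python
from typing import Dict, List, Union

Offer = Dict[str, Union[str, int]]

# Tier 3 (database server): data management.
FLIGHTS: List[Offer] = [
    {"route": "PIT-DOH", "airline": "A", "price": 900},
    {"route": "PIT-DOH", "airline": "B", "price": 750},
    {"route": "PIT-JFK", "airline": "C", "price": 200},
]


def get_offers(route: str) -> List[Offer]:
    """Return all stored offers for the given route."""
    return [flight for flight in FLIGHTS if flight["route"] == route]


# Tier 2 (application server): application logic.
def rank_offers(route: str) -> List[Offer]:
    """Rank the offers for a route, cheapest first."""
    return sorted(get_offers(route), key=lambda offer: offer["price"])


# Tier 1 (personal computer or mobile device): user view and controls.
def render_results(route: str) -> str:
    """Format the ranked offers for display to the user."""
    return "\n".join(
        f"{offer['airline']}: ${offer['price']}" for offer in rank_offers(route)
    )


if __name__ == "__main__":
    print(render_results("PIT-DOH"))
```

Each tier calls only the tier next to it, so the ranking logic can change without touching the display code or the data store.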
Layering
A complex system is partitioned into layers
Upper layer utilizes the services of the lower layer
A vertical organization of services
Layering simplifies the design of complex distributed
systems by hiding the complexity of the layers below
Control flows from layer to layer
[Figure: stack from Layer 3 down to Layer 1; the request flow goes down the layers and the response flow comes back up]
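A minimal sketch of control flowing from layer to layer, assuming a hypothetical `Layer` class in which each layer uses only the layer directly below it. The upper-casing done by the lowest layer is a placeholder for real work.

```python
from typing import List, Optional


class Layer:
    """One layer in a stack; it uses only the services of the layer below."""

    def __init__(self, name: str, lower: Optional["Layer"] = None) -> None:
        self.name = name
        self.lower = lower

    def handle(self, request: str, trace: List[str]) -> str:
        trace.append(f"request enters {self.name}")
        if self.lower is None:
            response = request.upper()  # lowest layer does the actual work
        else:
            response = self.lower.handle(request, trace)  # request flows down
        trace.append(f"response leaves {self.name}")       # response flows up
        return response


def build_stack() -> Layer:
    """Build Layer 3 on top of Layer 2 on top of Layer 1."""
    layer1 = Layer("Layer 1")
    layer2 = Layer("Layer 2", lower=layer1)
    return Layer("Layer 3", lower=layer2)


if __name__ == "__main__":
    trace: List[str] = []
    print(build_stack().handle("ping", trace))
    print("\n".join(trace))
```

Layer 3 never references Layer 1 directly, which is what lets a lower layer be replaced without changing the layers above it.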
Layering – Platform and middleware
Distributed Systems can be organized into three layers
1. Platform
Low-level hardware and software layers
Provides common services for higher layers
2. Middleware
Mask heterogeneity and provide convenient programming models to
application programmers
Typically, it simplifies application programming by abstracting
communication mechanisms
3. Applications
[Figure: layer stack, including Applications and Operating system]
Placement
Observation:
Distributed systems run on a wide variety of heterogeneous hardware (machines, networks).
Smart mapping of entities (processes, objects) to hardware helps
performance, security and fault-tolerance.
“Placement” maps entities to underlying physical
distributed infrastructure.
Placement should be decided after a careful study of application
characteristics
Example strategies:
Mapping services to multiple servers
Moving the mobile code to the client
Placement
[Figure: entities (web search indexing, user interface, mobile code) mapped onto physical infrastructure (high-performance server, desktop, smart-phone)]
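Mapping a service to multiple servers needs a rule that decides which server hosts which entity. A minimal sketch, assuming hash partitioning (a technique the slides do not name; real placement should follow a study of the application, as noted above). The server names and entity identifiers are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence


def place(entity_id: str, servers: Sequence[str]) -> str:
    """Map an entity to one server, deterministically, via a hash of its id."""
    if not servers:
        raise ValueError("need at least one server")
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def placement_table(
    entity_ids: Iterable[str], servers: Sequence[str]
) -> Dict[str, List[str]]:
    """Group entities by the server they are placed on."""
    table: Dict[str, List[str]] = {server: [] for server in servers}
    for entity_id in entity_ids:
        table[place(entity_id, servers)].append(entity_id)
    return table


if __name__ == "__main__":
    servers = ["server-a", "server-b", "server-c"]
    print(placement_table([f"index-shard-{i}" for i in range(9)], servers))
```

Because the mapping depends only on the entity id and the server list, every node computes the same placement without coordination. Changing the server list remaps many entities, which is one reason real systems often use more elaborate schemes.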
Recap
So far, we have covered primitive architectural elements
Communicating entities
Communication paradigms of entities
IPC, RMI, RPC, Indirect Communication
Roles and responsibilities that entities assume, and
resulting architectures
Client-Server, Peer-to-Peer, Hybrid
Placement of entities
Part II
Introduction to Networking
Introduction to Networking –
Learning objectives
You will identify how computers communicate over the Internet.
After today’s class, you will be able to:
Identify different types of networks
After the next class, you will be able to:
Describe networking principles such as layering, encapsulation
and packet-switching
Examine how packets get routed and how congestion is avoided
Analyze the scalability, reliability and fault-tolerance of the Internet
Networks in Distributed Systems
A distributed system is simply a collection of components
that communicate to solve a problem
Why should distributed systems programmers know
about networks?
Networking issues severely affect performance, fault-tolerance
and security in Distributed Systems.
e.g., Gmail outage on Sep 1, 2009 – Google spokesman said
“we had slightly underestimated the load which some recent
changes placed on the request routers. … a few of the request
routers became overloaded… causing a few more of them to
also become overloaded, and within minutes nearly all of the
request routers were overloaded.”
Networks in Distributed Systems
Networking Issue: Comments on Distributed System design
Performance: Affects latency and data-transfer-rate of messages.
Scalability: Size of the Internet is increasing. Expect greater traffic in the future.
Reliability: Detect communication errors and perform error checks at the application layer.
Security: Install firewalls. Deploy end-to-end authentication, privacy and security modules.
Mobility: Expect intermittent connection for mobile devices.
Quality-of-service: Internet is best-effort. It is hard to ensure strict QoS guarantees for, say, multimedia messages.
Network Classification
Important ways to classify networks
1. Based on size
Body Area Networks (BAN)
Personal Area Networks (PAN)
Local Area Networks (LAN)
Wide Area Networks (WAN)
2. Based on technology
Ethernet Networks
Wireless Networks
Cellular Networks
Network classification – BANs and PANs
Body Area Networks (BAN):
Devices form wearable computing units
Several Body Sensor Units (BSUs)
communicate with Body Central Unit (BCU)
Typically, low-cost and low-energy networking
Personal Area Networks (PAN):
PAN connects various digital devices carried by a user (mobile
phones, tablets, cameras)
Low-cost and low-energy networking
e.g., Bluetooth
Network Classification – LAN
Computers connected by a single communication medium
e.g., twisted copper wire, optical fiber
High data-transfer-rate and low latency
LAN consists of
1. Segment
Usually within a department/floor of a building
Shared bandwidth, no routing necessary
2. Local Networks
Serves campus/office building
Many segments connected by a switch/hub
Typically, represents a network within an organization
Network classification – WAN
Generally covers a wider area (cities, countries,...)
Consists of networks of different organizations
Traffic is routed from one organization to another by routers
Bandwidth and delay vary, and are worse than in a LAN
Largest WAN = Internet
Brief Summary of Important Networks (Based on Size)
[Figure: a segment and a network]
Types of Networks – Based on Technology
Ethernet Networks
Predominantly used in the wired Internet
Wireless LANs
Primarily designed to provide wireless access to the Internet
Short range (hundreds of meters), high bandwidth
Cellular networks (2G/3G)
Initially designed to carry voice
Large range (a few km)
Low bandwidth
Typical Performance for Different Types of Networks
Network          Example    Range           Bandwidth (Mbps)        Latency (ms)
Wired LAN        Ethernet   1 – 2 km        10 – 10,000             1 – 10
Wired WAN        Internet   Worldwide       0.5 – 600               100 – 500
Wireless PAN     Bluetooth  10 – 30 m       0.5 – 2                 5 – 20
Wireless LAN     WiFi       0.15 – 1.5 km   11 – 108                5 – 20
Cellular         2G – GSM   100 m – 20 km   0.270 – 1.5             5
Modern Cellular  3G         1 – 5 km        348 kbps – 14.4 Mbps    100 – 500
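To see how bandwidth and latency combine, a common first-order estimate is: transfer time ≈ latency + message size / bandwidth. The sketch below applies it to values from the table. It ignores protocol overhead, congestion and retransmission, and the function name and the 1 MB message size are assumptions.

```python
def transfer_time_ms(size_bytes: int, bandwidth_mbps: float, latency_ms: float) -> float:
    """Estimate one-way transfer time as latency plus size divided by bandwidth."""
    bits = size_bytes * 8
    seconds_on_wire = bits / (bandwidth_mbps * 1_000_000)
    return latency_ms + seconds_on_wire * 1000


if __name__ == "__main__":
    one_megabyte = 1_000_000  # bytes
    # (bandwidth in Mbps, latency in ms), taken from the table's ranges
    scenarios = {
        "Wired LAN, best case": (10_000, 1),
        "Wireless LAN, low end": (11, 5),
        "Wired WAN, low end": (0.5, 100),
    }
    for name, (bandwidth, latency) in scenarios.items():
        estimate = transfer_time_ms(one_megabyte, bandwidth, latency)
        print(f"{name}: {estimate:.1f} ms")
```

For the best-case wired LAN, the 1 MB transfer takes about 1.8 ms, and more than half of that is latency. On the slow links the size / bandwidth term dominates instead.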
Next Class
Describe networking principles such as layering,
encapsulation and packet-switching
Examine how packets get routed and how congestion is
avoided
Analyze scalability, reliability and fault-tolerance of
Internet
References
http://en.wikipedia.org/wiki/Remote_procedure_call
http://www.generalsoftwares.co.uk/remote-services.html
http://gmailblog.blogspot.com/2009/09/more-on-todays-gmail-issue.html
http://innovation4u.wordpress.com/2010/08/17/why-we-dont-share-stuff/
http://essentiawhipsfloggers.wordpress.com/2010/05/08/waiting-times-queuejumping/
http://www.cdk5.net/