COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)
Models of Distributed Computing
Noah Mendelsohn
Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah
Architecting a universal Web
– Identification: URIs
– Interaction: HTTP
– Data formats: HTML, JPEG, GIF, etc.
© 2010 Noah Mendelsohn
Goals
Introduce basics of distributed system design
Explore some traditional models of distributed computing
Prepare for discussion of REST: the Web’s model
Communicating systems
[Figure: two machines, each with its own CPU, memory, and storage]
We have multiple programs, running asynchronously, sending messages
Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)
Communicating Sequential Processes
We’ve got pretty clean higher-level abstractions for use on a single machine
Communicating systems
How can we get a clean model of two communicating machines?
Large scale systems
How can we get a clean model of a worldwide network of communicating machines?
[Figure: many machines interconnected through the Internet]
What are the clean abstractions on this scale?
WARNING!!
This is a very big topic…
…many important approaches have been studied and used…
…there is lots of operational experience, and also formalisms…
This presentation does not attempt to be either comprehensive or balanced…the goal is to introduce some key concepts
Traditional Models of Distributed Computing
Message Passing
Message passing
[Figure: two machines, each with CPU, memory, and storage, exchanging messages]
Programs send messages to and from each other’s memories
Half duplex: one way at a time
Full duplex: both ways at the same time
Message passing
Data abstraction:
– Low level: bytes (octets)
– Sometimes: agreed metaformat (XML, C struct, etc.)
Synchronization:
– Wait for message
– Timeout
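The data-abstraction and synchronization points above can be made concrete with a small Python sketch. It assumes a hypothetical framing convention (a 4-byte big-endian length prefix followed by UTF-8 JSON) as the agreed metaformat layered on raw bytes, uses `socket.socketpair()` as a stand-in for two machines, and relies on `settimeout` for the timeout case. Because each end can send independently, the pair also behaves as a full-duplex channel.

```python
import json
import socket
import struct

# Hypothetical framing: 4-byte big-endian length, then that many bytes of UTF-8 JSON.
HEADER = struct.Struct("!I")


def send_msg(sock: socket.socket, obj) -> None:
    """Encode obj as JSON (the agreed metaformat) and send it as one framed message."""
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(body)) + body)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream may deliver them in several pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket, timeout: float):
    """Wait for one message; raise socket.timeout if a read waits longer than timeout seconds."""
    sock.settimeout(timeout)
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


if __name__ == "__main__":
    a, b = socket.socketpair()                  # two connected endpoints
    send_msg(a, {"op": "hello", "from": "a"})
    send_msg(b, {"op": "hello", "from": "b"})   # both directions in flight at once
    print(recv_msg(b, timeout=1.0))             # {'op': 'hello', 'from': 'a'}
    print(recv_msg(a, timeout=1.0))             # {'op': 'hello', 'from': 'b'}
    try:
        recv_msg(a, timeout=0.1)                # nothing more was sent
    except socket.timeout:
        print("timed out")
    a.close()
    b.close()
```

A real protocol would also have to decide what to do when a timeout fires partway through a message; this sketch simply raises and leaves the stream mid-frame.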
Interaction Patterns
Between pairs of machines
[Figure: two machines; one sends a request, the other sends back a response]
Message passing: no constraints
Common pattern: request/response
Traditional Models of Distributed Computing
Client Server
Client / server
[Figure: a client machine sends a request for service to a server machine, which sends back a response]
Request / response is a traffic pattern
Client / server describes the roles of the nodes
Server provides service for client
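A minimal sketch of the two roles, using Python's standard `socketserver` module on the local machine. The uppercase "service", the one-line request format, and the names `UppercaseHandler`, `start_server`, and `call_service` are all hypothetical; the point is only that the server waits for requests and provides a service, while the client initiates each request and waits for the response.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Server role: provide a (toy) service to whichever client connects."""

    def handle(self) -> None:
        request = self.rfile.readline()      # one request line per connection
        self.wfile.write(request.upper())    # the response


def start_server() -> socketserver.TCPServer:
    # Port 0 asks the OS for any free port; serve requests on a background thread.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UppercaseHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(address, text: str) -> str:
    """Client role: connect, send one request, wait for the response."""
    with socket.create_connection(address, timeout=2.0) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        with conn.makefile("rb") as f:
            return f.readline().decode("utf-8").rstrip("\n")


if __name__ == "__main__":
    server = start_server()
    print(call_service(server.server_address, "hello, server"))  # HELLO, SERVER
    server.shutdown()
    server.server_close()
```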
Client / server
Probably the most common distributed-system architecture
Simple – well understood
Doesn’t explain:
– How to exploit more than 2 machines
– How to make programming easier
– How to prove correctness: though the simple model helps
Most client/server systems are request/response
Traditional Models of Distributed Computing
N-Tier
N-tier – also called Multilevel Client/Server
[Figure: three machines in a chain; the first sends a request to the second, which sends its own request to the third; responses flow back the same way]
Layered
Each tier provides services for the next higher level
Reasons:
– Information hiding
– Management
– Scalability
Typical N-tier system: airline reservation
[Figure: Browser or phone app (e.g. iPhone or Android) → Reservation application → Flight reservation logic → Reservation records; the two middle tiers hold the application logic]
Many commercial applications work this way
The Web itself is a 2 or 3 Tier system
[Figure: Browser (e.g. Firefox) → Proxy cache, optional! (e.g. Squid) → Web server (e.g. Apache)]
Web Reservation System
[Figure: Browser or phone app →(HTTP)→ Proxy cache, optional! (e.g. Squid) →(HTTP)→ Web-based reservation application →(RPC? ODBC? Proprietary?)→ Flight reservation logic → Reservation records; the reservation application and flight reservation logic tiers hold the application logic]
Web Publishing System
[Figure: Browser or phone app → Content distribution network (e.g. Akamai) → Content Web site (e.g. cnn.com) → Content management system (database or CMS)]
Advantages of n-tier systems
Separation of concerns – each layer has its own role
Parallelism and performance
– If done right: multiple mid-tier servers work in parallel
– Back-end systems mainly centralize data that requires sharing & synchronization
– Mid tier can provide shared, scalable caching
Information hiding
– Mid-tier apps shielded from data layout
Security
– Credit card numbers etc. not stored at mid-tier
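The caching, information-hiding, and centralization points can be illustrated with three in-process Python tiers. Everything here is hypothetical (the flight data, the class names, and the choice to cache routes but not seat counts); in a real deployment the tiers would sit on separate machines and talk over a network.

```python
class BackEndStore:
    """Back-end tier (hypothetical): owns the shared data and its layout."""

    def __init__(self) -> None:
        # Internal layout, hidden from other tiers: flight -> [origin, dest, seats_free]
        self._rows = {"UA100": ["BOS", "SFO", 12], "DL200": ["JFK", "LAX", 0]}
        self.requests = 0  # how many calls reached the back end

    def route(self, flight: str) -> str:
        self.requests += 1
        origin, dest, _ = self._rows[flight]
        return f"{origin}-{dest}"

    def seats_free(self, flight: str) -> int:
        self.requests += 1
        return self._rows[flight][2]


class MidTier:
    """Application-logic tier: caches slow-changing data, passes the rest through."""

    def __init__(self, backend: BackEndStore) -> None:
        self._backend = backend
        self._route_cache = {}

    def flight_info(self, flight: str) -> dict:
        if flight not in self._route_cache:      # schedule data: safe to cache
            self._route_cache[flight] = self._backend.route(flight)
        return {
            "flight": flight,
            "route": self._route_cache[flight],
            # Seat counts change under concurrent bookings: always ask the back end.
            "available": self._backend.seats_free(flight) > 0,
        }


def presentation(mid: MidTier, flight: str) -> str:
    """Front tier (stand-in for a browser or phone app): formats the answer."""
    info = mid.flight_info(flight)
    status = "seats available" if info["available"] else "sold out"
    return f"{info['flight']} {info['route']}: {status}"


if __name__ == "__main__":
    backend = BackEndStore()
    mid = MidTier(backend)
    print(presentation(mid, "UA100"))   # UA100 BOS-SFO: seats available
    print(presentation(mid, "UA100"))
    print(backend.requests)             # 3: route fetched once, seats twice
```

Caching only schedule data reflects the point about the back end: seat counts are shared, constantly changing data, so serving them from a mid-tier cache could sell the same seat twice.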
Other patterns
Spanning tree
Broadcast (send to many nodes at once)
Flood
Various P2P
Etc.
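Flooding can be sketched as a sequential Python simulation. This is a toy under stated assumptions: the network is an adjacency list, every node forwards to all of its neighbors (including the one it heard from), and a FIFO queue stands in for message delivery order. The parent pointers it records show how a flood can also build a spanning tree.

```python
from collections import deque


def flood(graph: dict, source: str):
    """Sequentially simulate flooding a message through a network.

    graph maps each node to its list of neighbors (links listed in both
    directions). Each node forwards the message to all of its neighbors the
    first time it receives it and ignores later copies. Recording who
    delivered each node's first copy gives parent pointers that form a
    spanning tree of every node reachable from source.

    Returns (parent, messages_sent).
    """
    parent = {source: None}
    to_forward = deque([source])   # nodes holding a first copy, not yet forwarded
    messages_sent = 0
    while to_forward:
        node = to_forward.popleft()
        for neighbor in graph[node]:
            messages_sent += 1            # every forward is one message on a link
            if neighbor not in parent:    # first copy: remember the sender
                parent[neighbor] = node
                to_forward.append(neighbor)
    return parent, messages_sent


if __name__ == "__main__":
    net = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
    tree, sent = flood(net, "A")
    print(tree)   # {'A': None, 'B': 'A', 'C': 'A', 'D': 'C'}
    print(sent)   # 8: each of the 4 links carries the message both ways
```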
Traditional Models of Distributed Computing
Remote Procedure Call
Remote Procedure Call
The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis
Key idea: an ordinary function call executes remotely
The trick: the language runtime or helper code must automatically generate code to send parameters and results
For languages like C: proxies and stubs are generated
– Not needed in dynamic languages like Ruby, JavaScript, etc.
RPC is often (erroneously IMO) used to describe any request / response system
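The remark about dynamic languages can be seen with Python's standard XML-RPC modules (`xmlrpc.server`, `xmlrpc.client`), which implement one RPC system among many. `ServerProxy` turns `remote.sqrt` into a remote call at run time, so no proxy code is generated ahead of time. The helper name `start_sqrt_server` is hypothetical, and the server runs on the local machine only to keep the sketch self-contained.

```python
import math
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_sqrt_server() -> SimpleXMLRPCServer:
    """Server side: register an ordinary function; the library acts as the stub."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(math.sqrt, "sqrt")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_sqrt_server()
    host, port = server.server_address
    # Client side: ServerProxy builds the proxy at run time; no generated code.
    with ServerProxy(f"http://{host}:{port}/") as remote:
        x = remote.sqrt(4)   # looks like a local call; travels as an HTTP POST
        print(x)             # 2.0
    server.shutdown()
    server.server_close()
```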
RPC: Call remote functions automatically
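[Figure: client and server machines, each with CPU, memory, and storage. On the client, x = sqrt(4) calls a generated proxy, which sends the request “invoke sqrt(4)”; on the server, a generated stub calls the real sqrt and sends back the response “result=2 (no exception thrown)”.]
Client code:
    x = sqrt(4);
Proxy (client side, generated):
    float sqrt(float n) {
        float s;
        send n;     // marshal the argument into a request
        read s;     // wait for the response
        return s;
    }
Stub (server side, generated):
    void doMsg(Msg m) {
        float s = sqrt(m.n);    // unmarshal n, call the real sqrt
        send s;
    }
Server implementation:
    float sqrt(float n) {
        …compute sqrt…
        return result;
    }
Interface definition: float sqrt(float n);
Proxies and stubs generated automatically
RPC provides transparent remote invocation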
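The proxy and stub in the figure can be written out by hand to show what generated code has to do. In this Python sketch, the wire format (one big-endian IEEE 754 double each way), the names `sqrt_proxy` and `sqrt_stub`, and the use of `socket.socketpair()` in place of a real network connection are all assumptions.

```python
import math
import socket
import struct
import threading

DOUBLE = struct.Struct("!d")  # assumed wire format: one big-endian double


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf


def sqrt_proxy(sock: socket.socket, n: float) -> float:
    """Client-side proxy: same job as sqrt, but ships the work to the server."""
    sock.sendall(DOUBLE.pack(n))                          # send n
    (s,) = DOUBLE.unpack(recv_exact(sock, DOUBLE.size))   # read s
    return s                                              # return s


def sqrt_stub(sock: socket.socket) -> None:
    """Server-side stub: unmarshal the argument, call the real sqrt, send the result."""
    (n,) = DOUBLE.unpack(recv_exact(sock, DOUBLE.size))
    sock.sendall(DOUBLE.pack(math.sqrt(n)))


if __name__ == "__main__":
    client_end, server_end = socket.socketpair()
    threading.Thread(target=sqrt_stub, args=(server_end,), daemon=True).start()
    x = sqrt_proxy(client_end, 4.0)
    print(x)  # 2.0
```

Note what the sketch leaves out: if `math.sqrt` raised (for example on a negative argument), the exception would stay in the server thread and the client would wait forever in `recv`, which is one reason exceptions are hard to get right in RPC.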
RPC: Pros and Cons
Pros:
– Transparency is very appealing
– Simple programming model
– Useful as organizing principle even when not fully automated
Cons:
– Getting language details right is tricky (e.g. exceptions)
– No client/server overlap: doesn’t work well for long-running operations
– May not optimize large transfers well
– Not all APIs make sense to remote: e.g. answer = search(tree)
– Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)
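The exceptions point can be observed with Python's XML-RPC modules: a local `math.sqrt(-1)` raises `ValueError`, but through `xmlrpc.server` the client receives an `xmlrpc.client.Fault` whose text merely mentions the original error. The helper `call_remote_sqrt` is hypothetical and starts a throwaway local server on each call, purely to keep the example self-contained.

```python
import math
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def call_remote_sqrt(n):
    """Start a throwaway server, call sqrt(n) remotely, return (result, error)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(math.sqrt, "sqrt")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        with xmlrpc.client.ServerProxy(f"http://{host}:{port}/") as remote:
            return remote.sqrt(n), None
    except xmlrpc.client.Fault as fault:
        # Locally this would be a ValueError; remotely the exception type is lost
        # and the client sees a generic Fault carrying a text description.
        return None, fault
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote_sqrt(16))                      # (4.0, None)
    _, error = call_remote_sqrt(-1)
    print(type(error).__name__, error.faultCode)     # Fault 1
```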
Traditional Models of Distributed Computing
Distributed Object Systems
How do you build an RPC for this?
    class Point {
        int x, y;
        int getx() { return x; }
        int gety() { return y; }
    }

    class Rectangle {
        …members and constructors not shown…
        Point getUpperLeft() {…}
        Point getLowerRight() {…}
    }

    // Call methods on the remoted object
    int area(Rectangle r) {
        int width = r.getLowerRight().getx() - r.getUpperLeft().getx();
        int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
        return width * height;
    }

    Rectangle myRect = new Rectangle();
    …assume position set here…
    // Pass object to remote method
    int a = area(myRect); // REMOTE THIS CALL!
Distributed Object systems make this work!
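To make the distributed-object idea concrete, here is an in-process Python simulation with no real network. `ObjectServer` and `RemoteRef` are hypothetical names: the server exports objects by id, and the client-side proxy turns every method call into one simulated round trip, returning primitives by value and objects as new remote references. `Point`, `Rectangle`, and `area` mirror the slide's classes.

```python
class ObjectServer:
    """Hypothetical server side: holds real objects and exports them by id."""

    def __init__(self) -> None:
        self._objects = {}      # exported id -> real object (never released here)
        self.round_trips = 0    # simulated request/response messages

    def export(self, obj) -> int:
        self._objects[id(obj)] = obj
        return id(obj)

    def invoke(self, obj_id: int, method: str, *args):
        """Simulate one round trip: run a method on an exported object."""
        self.round_trips += 1
        result = getattr(self._objects[obj_id], method)(*args)
        if isinstance(result, (int, float, str)):
            return ("value", result)              # primitives are copied
        return ("ref", self.export(result))       # objects stay on the server


class RemoteRef:
    """Hypothetical client-side proxy: every method call becomes invoke()."""

    def __init__(self, server: ObjectServer, obj_id: int) -> None:
        self._server = server
        self._id = obj_id

    def __getattr__(self, method: str):
        def call(*args):
            kind, payload = self._server.invoke(self._id, method, *args)
            return payload if kind == "value" else RemoteRef(self._server, payload)
        return call


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def getx(self) -> int:
        return self.x

    def gety(self) -> int:
        return self.y


class Rectangle:
    def __init__(self, upper_left: Point, lower_right: Point) -> None:
        self._ul, self._lr = upper_left, lower_right

    def getUpperLeft(self) -> Point:
        return self._ul

    def getLowerRight(self) -> Point:
        return self._lr


def area(r) -> int:
    # Works unchanged on a local Rectangle or on a RemoteRef to one.
    width = r.getLowerRight().getx() - r.getUpperLeft().getx()
    height = r.getLowerRight().gety() - r.getUpperLeft().gety()
    return width * height


if __name__ == "__main__":
    server = ObjectServer()
    rect = Rectangle(Point(0, 0), Point(4, 3))   # y grows downward, as on screens
    remote_rect = RemoteRef(server, server.export(rect))
    print(area(remote_rect), server.round_trips)  # 12 8
```

Computing one area takes eight round trips, because every getter is a separate remote call; that chattiness, plus the bookkeeping for exported references (which this sketch never releases), is part of why these systems are complicated.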
Distributed object systems
In the 1990s, seemed like a great idea
Advantages of OO encapsulation & inheritance + RPC
Examples
– CORBA (Industry standard)
– DCOM (Microsoft)
Still quite widely used within enterprises
Complicated:
– Marshalling object references
– Distributed object lifetime management
– Brokering: which object provides the service today
– Remote “new”: creating objects on remote systems
– All the pros & cons of RPC, plus the above
Generally not appropriate at Internet scale
Traditional Models of Distributed Computing
Some Other Options
Special Purpose Models
Remote File System
– Network provides transparent access to remote files
– Examples: NFS, CIFS
Remote Database
– Examples: ODBC, JDBC
Remote Device
– Remote printing, disk drive etc.
Virtual terminal
– One computer simulates an interactive terminal to another
Some other interesting models
Broadcast / multicast
– Send messages to everyone (broadcast) / named group (multicast)
Publish / subscribe (pub/sub)
– Subscribe to named events or based on query filter
– Call me whenever Pepsi’s stock price changes
– Implements a distributed associative memory
Reliable queuing
– Examples: IBM MQSeries, Java Message Service (JMS)
– Model: queued messages, preserved across hardware crashes
– Widely used for bank machine transactions and for long-running (multi-day) eCommerce transactions
– Depends on disk-based transaction systems at each node to keep queues
Tuple spaces
– Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun) and TSpaces (IBM)
– Network-scale shared variable space, with synchronization
– Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
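The tuple-space model can be sketched in a single Python process with a `threading.Condition`. This is a simplified, in-memory illustration, not the Linda, Jini, or TSpaces API: `out`, `rd`, and `in_` borrow Linda's operation names, and `None` acts as a wildcard in patterns (an assumption that replaces Linda's typed formals). The demo uses the space as a queue of work for several workers.

```python
import threading


class TupleSpace:
    """In-process sketch of a Linda-style tuple space (not a networked one).

    out(...) adds a tuple; rd(...) waits for a matching tuple and returns it;
    in_(...) waits for a matching tuple and removes it ("in" is a Python
    keyword). In a pattern, None is a wildcard.
    """

    def __init__(self) -> None:
        self._tuples = []
        self._changed = threading.Condition()

    def out(self, *tup) -> None:
        with self._changed:
            self._tuples.append(tup)
            self._changed.notify_all()   # wake every waiter to re-check its pattern

    def _find(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def _wait(self, pattern, remove: bool, timeout):
        with self._changed:
            ready = self._changed.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not ready:
                return None              # timed out
            tup = self._find(pattern)
            if remove:
                self._tuples.remove(tup)
            return tup

    def rd(self, *pattern, timeout=None):
        return self._wait(pattern, remove=False, timeout=timeout)

    def in_(self, *pattern, timeout=None):
        return self._wait(pattern, remove=True, timeout=timeout)


if __name__ == "__main__":
    space = TupleSpace()
    for n in range(1, 6):
        space.out("task", n)             # the queue of work to do

    def worker() -> None:
        # Take tasks until none arrive for 0.2 s, posting a result for each.
        while (task := space.in_("task", None, timeout=0.2)) is not None:
            space.out("result", task[1], task[1] ** 2)

    workers = [threading.Thread(target=worker) for _ in range(3)]
    for w in workers:
        w.start()
    results = sorted(space.in_("result", n, None)[2] for n in range(1, 6))
    for w in workers:
        w.join()
    print(results)  # [1, 4, 9, 16, 25]
```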
Stateful and Stateless Protocols
Stateful and Stateless Protocols
Stateful: server knows which step (state) has been reached
Stateless:
– Client remembers the state, sends to server each time
– Server processes each request independently
Can vary with level:
– Many systems, like the Web, run stateless protocols (e.g. HTTP) over streams that are stateful at a lower level: TCP connections track state at the packet level
– HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
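A toy Python comparison of the two styles, using a hypothetical paging service (the data set, page size, and names are all illustrative). The stateful server stores each session's position; the stateless one expects the client to send its position every time and hands back the position to use next.

```python
# Hypothetical data set and page size, for illustration only.
ITEMS = [f"flight-{i}" for i in range(1, 8)]
PAGE = 3


class StatefulServer:
    """The server remembers how far each client has read."""

    def __init__(self) -> None:
        self._next_offset = {}   # session id -> next offset (lost on restart)

    def next_page(self, session: str) -> list:
        start = self._next_offset.get(session, 0)
        self._next_offset[session] = start + PAGE
        return ITEMS[start:start + PAGE]


def stateless_page(offset: int) -> dict:
    """The server keeps nothing: each request carries the client's position,
    and the response tells the client what to send next time."""
    return {"items": ITEMS[offset:offset + PAGE], "next": offset + PAGE}


if __name__ == "__main__":
    server = StatefulServer()
    print(server.next_page("alice"))   # ['flight-1', 'flight-2', 'flight-3']
    print(server.next_page("alice"))   # ['flight-4', 'flight-5', 'flight-6']

    first = stateless_page(0)
    # Any replica could answer this follow-up; nothing about the client is stored.
    print(stateless_page(first["next"])["items"])   # ['flight-4', 'flight-5', 'flight-6']
```

Restarting `StatefulServer` loses every session's place, while any copy of `stateless_page` can answer any request, which is where the load-balancing and restart advantages of stateless protocols come from.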
Advantages of stateless protocols
Protocol usually simpler
Server processes each request independently
Load balancing and restart easier
Typically easier to scale and make fault-tolerant
Visibility: individual requests more self-describing
Advantages of stateful protocols
Individual messages carry less data
Server does not have to re-establish context each time
There’s usually some changing state at the server at some level, except for completely static publishing systems
Text vs. Binary Protocols
Protocols can be text or binary on the wire
Text: messages are encoded characters
Binary: any bit patterns
Pros and cons quite similar to those for text vs. binary file formats
When sending between compatible machines, binary can be much faster because no conversion is needed
Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video
HTTP/2 is moving to binary framing (for message size and parsing speed)
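A small Python comparison of the same record encoded both ways. The field names, values, and both formats (space-separated `name=value` text ending in CRLF versus a packed big-endian layout of a 4-byte and a 2-byte unsigned integer) are hypothetical.

```python
import struct

# The same hypothetical record, encoded two ways.
flight, seats = 1234, 150

# Text: human-readable characters, ending in CRLF as HTTP and SMTP lines do.
text = f"flight={flight} seats={seats}\r\n".encode("ascii")

# Binary: fixed layout in network byte order, 4-byte + 2-byte unsigned integers.
BINARY = struct.Struct("!IH")
binary = BINARY.pack(flight, seats)


def parse_text(msg: bytes) -> tuple:
    """Split on whitespace, then on '=', then convert the digits."""
    fields = dict(part.split("=") for part in msg.decode("ascii").split())
    return int(fields["flight"]), int(fields["seats"])


def parse_binary(msg: bytes) -> tuple:
    """A single unpack, but both sides must agree on the exact layout."""
    return BINARY.unpack(msg)


if __name__ == "__main__":
    print(len(text), len(binary))                     # 23 6
    print(parse_text(text) == parse_binary(binary))   # True
```

The text form is 23 bytes and readable in a packet trace; the binary form is 6 bytes and decodes with one `unpack` call, at the cost of being opaque without the layout definition.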
Summary
The machine-level model is complex: multiple CPUs, memories
A number of abstractions are widely used for limited-scale distribution
RPC is among the most interesting and successful
Statefulness / statelessness is a key design tradeoff
We’ll see next time why a new model was needed for the Web