Remote Procedure Call
An Effective Primitive for Distributed Computing
Seth James Nielson
What is RPC?
• Procedure calls transfer control within local memory
• RPCs transfer control to remote machines
[Figure: local memory/stack layout with frames for Main, Proc A, and Proc B, plus unused space above]
Why RPC?
RPC is an effective primitive for distributed systems because of:
• Clean/simple semantics
• Communication efficiency
• Generality
How it Works (Idealized Example)
[Figure: the client program reaches c = encrypt(msg) as if it were a local call; a Request is sent to the server, which holds the specialized hardware and the encryption key; the client waits while the server runs the encrypt(msg) implementation; the Response carries the result back into c, and the client continues with its next localCall().]
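To make the exchange above concrete, here is a minimal sketch in Python. The wire format, procedure number, and all names (pack_request, server_dispatch, encrypt_stub) are assumptions of mine, not part of Cedar RPC; the point is only that the caller writes an ordinary-looking c = encrypt(msg) while the stub does the packing, sending, waiting, and unpacking.

    import struct

    ENCRYPT_PROC = 1  # hypothetical procedure number for encrypt()

    def pack_request(proc, payload):
        # header: procedure number + argument length, then the marshaled argument
        return struct.pack("!II", proc, len(payload)) + payload

    def server_dispatch(request):
        # server-side stub: unpack the request, run the real implementation, pack the result
        proc, length = struct.unpack_from("!II", request)
        msg = request[8:8 + length]
        if proc != ENCRYPT_PROC:
            raise ValueError("unknown procedure")
        result = bytes(b ^ 0x5A for b in msg)            # stand-in for the real encrypt(msg)
        return struct.pack("!I", len(result)) + result

    def encrypt_stub(msg):
        # client-side stub: looks like a local call, but ships the work to the server
        request = pack_request(ENCRYPT_PROC, msg)
        response = server_dispatch(request)              # really: transmit Request, wait for Response
        (length,) = struct.unpack_from("!I", response)
        return response[4:4 + length]

    c = encrypt_stub(b"secret message")                  # the caller just sees c = encrypt(msg)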
Early History of RPC
• 1976: early reference in literature
• 1976-1984: few full implementations
• Feb 1984: Cedar RPC
  – A. Birrell, B. Nelson at Xerox
  – “Implementing Remote Procedure Calls”
Imagine our Surprise…
“In practice, … several areas [of RPC] were inadequately understood”
RPC Design Issues
1. Machine/communication failures
2. Address-containing arguments
3. Integration into existing systems
4. Binding
5. Suitable protocols
6. Data integrity/security
Birrell and Nelson Aims
• Primary Aim
  – Easy distributed computation
• Secondary Aims
  – Efficient (with powerful semantics)
  – Secure
Fundamental Decisions
1. No shared address space among computers
2. Semantics of remote procedure calls should be as close as possible to local procedure calls
Note that the first decision partially violates the second…
Binding
• Binds an importer to an exporter
• Interface name: type/instance
• Uses the Grapevine DB to locate an appropriate exporter
• Bindings (based on a unique ID) break if the exporter crashes and restarts
Unique ID
• At binding, the importer learns the exported interface’s Unique ID (UID)
• The UID is initialized by a real-time clock on system start-up
• If the system crashes and restarts, the UID will be a new unique number
• The change in UID breaks existing connections (sketched below)
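A minimal sketch in Python of why a restart breaks bindings. The class and field names are mine (hypothetical), not Cedar's: the exporter stamps itself with a UID taken from the clock at start-up, the importer records that UID when it binds, and any later call carrying a stale UID is rejected, forcing a rebind.

    import time

    class Exporter:
        def __init__(self):
            self.uid = time.time_ns()          # re-initialized from the clock on every (re)start
            self.exports = {}                  # interface name -> table of procedures

        def export(self, name, procs):
            self.exports[name] = procs
            return self.uid                    # the importer records this UID at bind time

        def call(self, uid, name, proc, *args):
            if uid != self.uid:                # exporter crashed and restarted since binding
                raise ConnectionError("stale binding: UID mismatch, rebind required")
            return self.exports[name][proc](*args)

    server = Exporter()
    uid = server.export("Arithmetic.instance1", {"add": lambda a, b: a + b})
    print(server.call(uid, "Arithmetic.instance1", "add", 2, 3))    # 5
    server = Exporter()                        # simulate a crash and restart: new UID
    # server.call(uid, ...) would now raise, and the importer must bind again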
How Cedar RPC works
[Figure: message flow across the caller machine (User, User Stub, RPCRuntime), Grapevine, and the callee machine (RPCRuntime, Server Stub, Server). The server exports its interface, and the export is recorded in Grapevine (setConnect/addmember updates); the caller imports it, looking up the exporter in Grapevine (getConnect) and binding to it; a call x = F(y) is then packed by the user stub, transmitted by the RPCRuntime, checked and unpacked on the callee side, executed as F(y) by the server, and the result (3) is returned along the reverse path.]
Packet-Level Transport Protocol
• Primary goal: minimize the time between initiating the call and getting results
• NOT general – designed for RPC
• Why? A possible 10X performance gain
• No upper bound on waiting for results
• Error semantics: the user does not know if the machine crashed or the network failed
Creating RPC-enabled Software
[Figure: the developer writes user code, server code, and the interface modules; Lupine generates the user stub and server stub from the interface modules. The client program (on the client machine) links user code + user stub + RPCRuntime; the server program (on the server machine) links RPCRuntime + server stub + server code.]
Making it Faster
• Simple calls (the common case): all of the arguments fit in a single packet
• A server reply and a 2nd RPC operate as an implicit ACK
• Explicit ACKs are required if a call lasts longer or there is a longer interval between calls
Simple Calls
[Figure: the client sends Call; the server’s Response also ACKs that Call; the client’s next Call also ACKs the previous Response; its Response again doubles as an ACK.]
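A minimal sketch of the implicit-ACK bookkeeping, in Python with hypothetical names: the result packet acknowledges the call packet, a newer call acknowledges the previous result, and an explicit ACK is only needed when the caller goes idle.

    class ServerSide:
        def __init__(self, handler):
            self.handler = handler
            self.last_seq = 0
            self.pending_result = None         # held until it is (implicitly) ACKed

        def on_packet(self, kind, seq, payload=None):
            if kind == "CALL":
                if seq > self.last_seq:        # a newer call implicitly ACKs the prior result
                    self.last_seq = seq
                    self.pending_result = ("RESULT", seq, self.handler(payload))
                return self.pending_result     # a duplicate CALL just gets the result again
            if kind == "ACK" and seq == self.last_seq:
                self.pending_result = None     # explicit ACK, used only when the caller goes idle

    server = ServerSide(lambda x: x * 2)
    print(server.on_packet("CALL", 1, 21))     # ('RESULT', 1, 42): the result is the ACK
    print(server.on_packet("CALL", 2, 5))      # ('RESULT', 2, 10): call 2 ACKed result 1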
Complex Calls
[Figure: the client sends Call (pkt 0), which the server ACKs; the client sends Data (pkt 1), ACKed by the server; the client sends Data (pkt 2), and the server’s Response doubles as its ACK; an explicit ACK or the client’s next Call acknowledges the Response.]
Keeping it Light
• A connection is just shared state
• Reduce process creation/swapping
  – Maintain idle server processes
  – Each packet has a process identifier to reduce swaps (sketched below)
  – The full scheme results in no processes created and four process swaps per call
• RPC directly on top of Ethernet
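A minimal sketch of the process-hint idea in Python; the pool, names, and bookkeeping are hypothetical. Server processes are created once and kept idle between calls, and each packet carries the identifier of the process already handling its call, so later packets can be handed straight to it without extra scheduling work.

    import collections

    idle = collections.deque(["proc1", "proc2", "proc3"])   # pre-created idle server processes
    busy = {}                                               # process id -> call it is handling

    def handle_packet(process_hint, call_id, payload):
        if process_hint in busy and busy[process_hint] == call_id:
            proc = process_hint                # later packet of an in-progress call: no lookup, no swap
        else:
            proc = idle.popleft()              # first packet of a new call: grab any idle process
            busy[proc] = call_id
        return proc                            # reply packets carry `proc` back as the next hint

    def call_finished(proc):
        del busy[proc]
        idle.append(proc)                      # the process stays alive, waiting for the next call

    p = handle_packet(None, call_id=7, payload=b"...")      # lands on proc1
    p = handle_packet(p, call_id=7, payload=b"...")         # same call: handled by proc1 again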
Performance (Elapsed Time)
Number of Args/Results | Time
0                      | 1097 µs
100                    | 1278 µs
100-word array         | 2926 µs
THE NEED FOR SPEED
• RPC performance cost is a barrier (Cedar RPC requires about 1.1 ms – 1097 µs – for a 0-arg call!)
• Peregrine RPC (about nine years later) manages a 0-arg call in 573 µs!
A Few Definitions
• Hardware latency – sum of the call and result packets’ network penalties
• Network penalty – time to transmit a packet (greater than the network transmission time)
• Network transmission time – time to send the packet at the raw network speed
• Network RPC – RPC between two machines
• Local RPC – RPC between separate threads on the same machine
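To put rough numbers on these definitions (my own illustration, assuming a 10 Mbit/s Ethernet; these are not figures from the paper): the network transmission time is just packet size divided by raw bandwidth, the network penalty observed on the wire is somewhat larger because of controller and protocol overheads, and the hardware latency of a call is the sum of the penalties for the call and result packets.

    packet_bits = 1500 * 8                               # a 1500-byte packet (assumed size)
    raw_bandwidth = 10_000_000                           # 10 Mbit/s Ethernet (assumed)
    transmission_time = packet_bits / raw_bandwidth      # 0.0012 s = 1.2 ms at raw network speed
    network_penalty = transmission_time * 1.2            # larger than transmission time (illustrative factor)
    hardware_latency = 2 * network_penalty               # call packet penalty + result packet penalty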
Peregrine RPC
• Supports the full functionality of RPC
• Network RPC performance close to the HW latency
• Also supports efficient local RPC
Messing with the Guts
• Three general optimizations
• Three RPC-specific optimizations
General Optimizations
1. Transmitted arguments avoid copies
2. No conversion for client/server with the same data representation
3. Use of packet header templates that avoid recomputation per call (sketched below)
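A minimal sketch of optimization 3 in Python. The field layout and names are assumptions of mine, not Peregrine's actual header: the constant parts of the header are packed once when the binding is set up, and each call copies the template and patches only the fields that change.

    import struct

    HEADER = struct.Struct("!HHII")            # dst port, src port, call id, payload length (assumed layout)

    def make_template(dst_port, src_port):
        # computed once, when the client binds to the server
        return bytearray(HEADER.pack(dst_port, src_port, 0, 0))

    def build_packet(template, call_id, payload):
        pkt = bytearray(template)              # cheap copy of the precomputed header bytes
        struct.pack_into("!II", pkt, 4, call_id, len(payload))   # patch only the per-call fields
        return bytes(pkt) + payload

    template = make_template(dst_port=2049, src_port=4001)
    packet = build_packet(template, call_id=7, payload=b"marshaled args")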
RPC Specific Optimizations
1. No thread-specific state is saved between calls in the server
2. Server arguments are mapped, not copied (loose analogy below)
3. No copying in the critical path of multi-packet arguments
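A loose Python analogy for optimization 2 (names and layout are mine; Peregrine actually remaps the packet's pages into the server's address space): the server procedure is handed a zero-copy view into the received packet buffer instead of a fresh copy of the argument bytes.

    packet = bytearray(b"HDRHDRHDRHDR" + b"argument bytes for the server")
    HEADER_LEN = 12

    args_view = memoryview(packet)[HEADER_LEN:]    # zero-copy: a view into the packet buffer itself
    args_copy = bytes(packet[HEADER_LEN:])         # what a copying implementation would produce

    def server_procedure(data):
        # works directly on the mapped/viewed bytes; nothing was copied on the way in
        return len(data)

    print(server_procedure(args_view))             # 29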
I think this is COOL
• To avoid copying the arguments of a single-packet RPC, Peregrine instead uses the packet buffer itself as the server thread’s stack
• Any pointers are replaced with server-appropriate pointers (Cedar RPC didn’t support this…)
This is cool too
• Multi-packet RPCs use a blast protocol (selective retransmission), sketched below
• Data is transmitted in parallel with the data copy
• The last packet is mapped into place
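A minimal sketch of the selective-retransmission idea in Python (class and method names are mine): all data packets are blasted without per-packet ACKs, the receiver records what arrived, and only the missing packets are requested and resent.

    class BlastReceiver:
        def __init__(self, total_packets):
            self.slots = [None] * total_packets

        def on_data(self, index, data):
            self.slots[index] = data             # store each packet as it arrives

        def missing(self):
            return [i for i, d in enumerate(self.slots) if d is None]

        def assemble(self):
            return b"".join(self.slots)

    packets = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    rx = BlastReceiver(len(packets))
    for i, p in enumerate(packets):
        if i != 2:                               # simulate losing packet 2 during the blast
            rx.on_data(i, p)
    for i in rx.missing():                       # selective retransmission: resend only packet 2
        rx.on_data(i, packets[i])
    assert rx.assemble() == b"aaaabbbbccccdddd"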
Fast Multi-Packet Receive
[Figure: the data from packets 1-3 is copied into the receive buffer at the server as it arrives, while packet 0’s buffer (Header 0 + Data 0), which is sent last, is not copied: it is remapped at the server so that its data falls into place at the page boundary.]
Peregrine 0-Arg Performance
System           | Latency   | Throughput
Cedar            | 1097 µsec | 2.0 mbps
Amoeba**         | 1100 µsec | 6.4 mbps
x-kernel         | 1730 µsec | 7.1 mbps
V-System         | 2540 µsec | 4.4 mbps
Firefly (5 CPU)  | 2660 µsec | 4.6 mbps
Sprite           | 2800 µsec | 5.7 mbps
Firefly (1 CPU)  | 4800 µsec | 2.5 mbps
SunRPC**         | 6700 µsec | 2.7 mbps
Peregrine        | 573 µsec  | 8.9 mbps
Peregrine Multi-Packet Performance
Procedure (bytes)     | Network Penalty (ms) | Latency (ms) | Throughput (mbps)
3000-byte in RPC      | 2.71                 | 3.20         | 7.50
3000-byte in-out RPC  | 5.16                 | 6.04         | 7.95
48000-byte in RPC     | 40.96                | 43.33        | 8.86
48000-byte in-out RPC | 81.66                | 86.29        | 8.90
Cedar RPC Summary
• Cedar RPC introduced practical RPC
• Demonstrated easy semantics
• Identified major design issues
• Established RPC as an effective primitive
Peregrine RPC Summary
• Same RPC semantics (with the addition of pointers)
• Significantly faster than Cedar RPC and others
• General optimizations (e.g., precomputed headers)
• RPC-specific optimizations (e.g., no copying in the multi-packet critical path)
Observations
• RPC is a very “transparent” mechanism – it acts like a local call
• However, RPC requires a deep understanding of hardware to tune
• In short, RPC requires sophistication in its presentation as well as its operation to be viable