L20 - Data, RPC and Wireless

advertisement
RPC and Data Representation
+
Wireless Primer
Theophilus Benson
Assignment #3
• 3A: Reliability
– No flow-control or congestion control
– Just sliding window.
– You can use the same window value for receiver and sender
windows
• 3B: TCP-like
– Must implement @ least slow-start and congestion avoidance
– You can add more to make your implementation more efficient or fair
• Implementation details
– You can ignore the demux function
• Only gets tested in 3B and we wouldn’t be testing it.
– When testing 3B, the propagation between any two host will be
100ms
Today
• RPC
– Session management+Language
– Semantic Challenges
– Example RPC (Not on exam)
• Wireless
– Collision detection
Why Should You Care?
• RPC is used to build distributed systems
• Used by … everyone
–
–
–
–
–
Google
Facebook
Twitter
Google
Yahoo
RPC – Remote Procedure Call
• Procedure calls are a well understood mechanism
– Transfer control and data on a single computer
• Idea: make distributed programming look the same
– Have servers export interfaces that are accessible through local
APIs
– Perform the illusion behind the scenes
• 2 Major Components
– Protocol to manage messages sent between client and server
– Language and compiler support
• Packing, unpacking, calling function, returning value
Problem
• If you want a call between two servers to
look like a call between two methods
• What are some Key Problems?
– Semantics of the communication
• APIs, how to cope with failure
– Data Representation
• (think of parsing information in Assignment #2 & #3)
– Scope: should the scheme work across
• Architectures
• Languages
• Compilers…?
RPC Session Management
Stub Functions
• Local stub functions at client and server give
appearance of a local function call
• client stub
– marshalls parameters -> sends to server -> waits
– unmarshalls results -> returns to client
• server stub
– creates socket/ports and accepts connections
– receives message from client stub -> unmarshalls
parameters -> calls server function
– marshalls results -> sends results to client stub
Stub Generation
• Many systems generate stub code from
independent specification: IDL
– IDL – Interface Description Language
• describes an interface in a language neutral way’
• Separates logical description of data
from
– Dispatching code
– Marshalling/unmarshalling code
– Data wire format
RPC Components
• Stub Compiler
– Creates stub methods
– Creates functions for marshalling and unmarshalling
• Dispatcher
– Demultiplexes programs running on a machine
– Calls the stub server function
• Protocol
– At-most-once semantics (or not)
– Reliability, replay caching, version matching
– Fragmentation, Framing (depending on underlying
protocols)
RPC Language Support
Presentation Formatting
• How to represent data?
• Several questions:
– Which data types do you want to support?
• Base types, Flat types, Complex types
– How to encode data into the wire
– How to decode the data?
• Self-describing (tags)
• Implicit description (the ends know)
• Several answers:
– Many frameworks do these things automatically
Which data types?
• Basic types
– Integers, floating point, characters
– Some issues: endianness (ntohs, htons), character
encoding, IEEE 754
• Flat types
– Strings, structures, arrays
– Some issues: packing of structures, order, variable
length
• Complex types
– Pointers! Must flatten, or serialize data structures
Data Schema
• How to parse the encoded data?
• Two Extremes:
– Self-describing data: tags
• Additional information added to message to help in
decoding
• Examples: field name, type, length
– Implicit: the code at both ends “knows” how to
decode the message
• E.g., your code for assignment 2 &3 know what format
packets should be in
• Interoperability depends on well defined protocol
specification!
• very difficult to change
Examples of RPC Systems
• SunRPC (now ONC RPC)
– The first popular system
– Used by NSF
– Not popular for the wide area (security,
convenience)
• Java RMI
– Popular with Java
– Only works among JVMs
• DCE
– Used in ActiveX and DCOM, CORBA
– Stronger semantics than SunRPC, much more
complex
…even more examples
• XML-RPC, SOAP
• Json-RPC
• Apache Thrift
RPC Semantic Challenges
Can we maintain the same
semantics?
• Mostly…
• Why not?
– New failure modes: nodes, network
• Possible outcomes of failure
–
–
–
–
Procedure did not execute
Procedure executed once
Procedure executed multiple times
Procedure partially executed
• Desired: at-most-once semantics
Implementing at-most-once
semantics
• Problem: request message lost
– Client must retransmit requests when it gets no reply
• Problem: reply message lost
– Client may retransmit previously executed request
– OK if operation is idempotent
– Server must keep “replay cache” to reply to already
executed requests
• Problem: server takes too long executing
– Client will retransmit request already in progress
– Server must recognize duplicate – could reply “in progress”
Server Crashes
• Problem: server crashes and reply lost
– Can make replay cache persistent – slow
– Can hope reboot takes long enough for all clients to fail
• Problem: server crashes during execution
– Can log enough to restart partial execution – slow and
hard
– Can hope reboot takes long enough for all clients to fail
• Can use “cookies” to inform clients of crashes
– Server gives client cookie, which is f(time of boot)
– Client includes cookie with RPC
– After server crash, server will reject invalid cookie
Examples of RPC
Example: Sun XDR (RFC 4506)
• External Data Representation for SunRPC
• Types: most of C types
• No tags (except for array lengths)
– Code needs to know structure of message
• Usage:
– Create a program description file (.x)
– Run rpcgen program
– Include generated .h files, use stub functions
• Very C/C++ oriented
– Although encoders/decoders exist for other
languages
Example: fetch and add server
• In fadd_prot.x:
RPC Program Definition
• Rpcgen generates
marshalling/unmarshalling code, stub
functions, you fill out the actual code
XML
• Other extreme
• Markup language
–
–
–
–
Text based, semi-human readable
Heavily tagged (field names)
Depends on external schema for parsing
Hard to parse efficiently
<person>
<name>John Doe</name>
<email>jdoe@example.com</email>
</person>
Google Protocol Buffers
• Defined by Google, released to the public
– Widely used internally and externally
– Supports common types, service definitions
– Natively generates C++/Java/Python code
• Over 20 other supported by third parties
– Efficient binary encoding, readable text encoding
• Performance
– 3 to 10 times smaller than XML
– 20 to 100 times faster to process
Protocol Buffers Example
message Student {
required String name = 1;
required int32 credits = 2;
}
(…compile with proto)
Student s;
s.set_name(“Jane”);
s.set_credits(20);
fstream output(“students.txt” , ios:out | ios:binary );
s.SerializeToOstream(&output);
(…somebody else reading the file)
Student s;
fstream input(“students.txt” , ios:in | ios:binary );
s.ParseFromIstream();
Binary Encoding
• Integers: varints
– 7 bits out of 8 to encode integers
– Msb: more bits to come
– Multi-byte integers: least significant group first
• Signed integers: zig-zag encoding, then varint
– 0:0, -1:1, 1:2, -2:3, 2:4, …
– Advantage: smaller when encoded with varint
• General:
– Field number, field type (tag), value
• Strings:
– Varint length, unicode representation
Apache Thrift
• Originally developed by Facebook
• Used heavily internally
• Full RPC system
– Support for C++, Java, Python, PHP, Ruby, Erlang, Perl,
Haskell, C#, Cocoa, Smalltalk, and Ocaml
• Many types
– Base types, list, set, map, exceptions
• Versioning support
• Many encodings (protocols) supported
– Efficient binary, json encodings
Apache Avro
• Yet another newcomer
• Likely to be used for Hadoop data
representation
• Encoding:
– Compact binary with schema included in file
– Amortized self-descriptive
• Why not just create a new encoding for Thrift?
– I don’t know…
Conclusions
• RPC is good way to structure many
distributed programs
– Have to pay attention to different semantics, though!
• Data: tradeoff between self-description,
portability, and efficiency
• Unless you really want to bit pack your
protocol, and it won’t change much, use one
of the IDLs
• Parsing code is easy to get (slightly) wrong,
hard to get fast
– Should only do this once, for all protocols
• Which one should you use?
Wireless
Wireless
• Today: wireless networking truly
ubiquitous
–
–
–
–
802.11, 3G, (4G), WiMAX, Bluetooth, RFID, …
Sensor networks, Internet of Things
Some new computers have no wired networking
4B cellphone subscribers vs. 1B computers
• What’s behind the scenes?
Wireless is different
• Signals sent by the sender don’t always
reach the receiver intact
– Varies with space: attenuation, multipath
– Varies with time: conditions change, interference,
mobility
• Distributed: sender doesn’t know what
happens at receiver
• Wireless medium is inherently shared
– No easy way out with switches
Implications
• Different mechanisms needed
• Physical layer
– Different knobs: antennas, transmission power, encodings
• Link Layer
– Distributed medium access protocols
– Topology awareness
• Network, Transport Layers
– Routing, forwarding
• Most advances do not abstract away the
physical and link layers
Physical Layer
• Specifies physical medium
– Ethernet: Category 5 cable, 8 wires, twisted pair, R45 jack
– WiFi wireless: 2.4GHz
• Specifies the signal
– 100BASE-TX: NRZI + MLT-3 encoding
– 802.11b: binary and quadrature phase shift keying
(BPSK/QPSK)
• Specifies the bits
– 100BASE-TX: 4B5B encoding
– 802.11b @ 1-2Mbps: Barker code (1bit -> 11chips)
What can happen to signals?
• Attenuation
– Signal power attenuates by ~r2 factor for omnidirectional antennas in free-space
– Exponent depends on type and placement of antennas
• < 2 for directional antennas
• > 2 if antennas are close to the ground
Interference
• External sources
–
–
–
–
–
E.g., 2.4GHz unlicensed ISM band
802.11
802.15.4 (ZigBee), 802.15.1 (Bluetooth)
2.4GHz phones
Microwave ovens
• Internal sources
– Nodes in the same network/protocol can (and do) interfere
• Multipath
– Self-interference (destructive)
Multipath
• May cause attenuation, destructive
interference
Picture from Cisco, Inc.
Implications of Attenuation and
Interference
• Reduces the ratio of Signal to Noise
– Makes it hard to decode bites
– Increases bit error rates
• Could make signal stronger (transmit
with higher power)
– Uses more energy
– Increases interference to other nodes
Link Layer
• Medium Access Control
– Should give 100% if one user
– Should be efficient and fair if more users
• Ethernet uses CSMA/CD
– Can we use CD here ?
• No! Collision happens at the receiver
• Protocols try to avoid collision in the first
place
Collision Detection in Wireless
Carrier Sensing
• The initial collision detection of wireless
– Listen on the channel (medium)
• If someone is transmitting then you sleep and try again
later
• Else you start sending
Hidden Terminals
B
•
•
•
•
•
A
C
A can hear B and C
B and C can’t hear each other
They both interfere at A. COLLISION!!!!!
B is a hidden terminal to C, and vice-versa
Carrier sense at sender is useless
Exposed Terminals
B
A
C
D
• A transmits to B
• C hears the transmission, backs off, even though D
would hear C
– C can still be sending to D. But It wouldn’t!!!!!!!
• C is an exposed terminal to A’s transmission
• Why is it still useful for C to do CS?
Key points
• No global view of collision
– Different receivers hear different senders
– Different senders reach different receivers
• Collisions happen at the receiver
• Goals of a MAC protocol
– Detect if receiver can hear sender
– Tell senders who might interfere with receiver to shut up
Simple MAC: CSMA/CA
• Maintain a waiting counter c
• For each time channel is free, c-• Transmit when c = 0
• When a collision is inferred, retransmit with exponential
backoff
– Use lack of ACK from receiver to infer collision
– Collisions are expensive: only full packet transmissions
• How would we get ACKs if we didn’t do carrier sense?
RTS/CTS
• Idea: transmitter can check availability of
channel at receiver
• Before every transmission
–
–
–
–
–
Sender sends an RTS (Request-to-Send)
Contains length of data (in time units)
Receiver sends a CTS (Clear-to-Send)
Sender sends data
Receiver sends ACK after transmission
• If you don’t hear a CTS, assume collision
• If you hear a CTS for someone else, shut
up
Benefits of RTS/CTS
• Solves hidden terminal problem
• Also solves exposed terminal
• Does it?
– Control frames can still collide
• If A & C send RTS as the same time
– In practice: reduces hidden terminal problem on
data packets
RTS/CTS
B
RTS
• B sends to A
A
C
RTS/CTS
B
CTS
A
• B sends to A
• A responds with CTS
• C knows not to send.!!!
C
RTS/CTS
B
•
•
•
•
Data
A
B sends to A
A responds with CTS
C knows not to send.!!!
Hidden Terminal solved!
C
Benefits of RTS/CTS
• Solves hidden terminal problem
• Also solves exposed terminal
• Does it?
– Control frames can still collide
• If A & C send RTS as the same time
– In practice: reduces hidden terminal problem on
data packets
Issues with RTS/CTS
B
RTS
A
RTS
C
• A sends to B
• C hears RTS but not CTS
D
Issues with RTS/CTS
CTS
B
CTS
A
C
• A sends to B
• C hears RTS but not CTS
D
Issues with RTS/CTS
B
A
RTS
C
• A sends to B
• C hears RTS but not CTS
– So, C can send!
• C sends to D
• Exposed terminal solved!!!
RTS
D
Benefits of RTS/CTS
• Solves hidden terminal problem
• Also solves exposed terminal
• Does it?
– Control frames can still collide
• If A & C send RTS as the same time
• Also, CTS can get lost!!
– In practice: reduces hidden terminal problem on
data packets
RTS Loss
B
RTS
A
RTS
C
RTS Loss
B
RTS
A
RTS
C
Benefits of RTS/CTS
• Solves hidden terminal problem
• Also solves exposed terminal
• Does it?
– Control frames can still collide
• If A & C send RTS as the same time
• Also, CTS can get lost!!
– In practice: reduces hidden terminal problem on
data packets
CTS Loss
RTS
B
RTS
A
• T0: B sends RTS to A
C
D
CTS Loss
B
RTS
A
RTS
C
RTS
D
RTS
• T0: B sends RTS to A
• T1: A responds with CTS WHILE D also
sends RTS
CTS Loss
B
CTS
A
CTS
C
RTS
D
RTS
• T0: B sends RTS to A
• T1: A responds with CTS WHILE D also
sends RTS to E
• T1: This CTS is loss
– C doesn’t know about B->A
Drawbacks of RTS/CTS
• Overhead is too large for small packets
– 3 packets per packet: RTS/CTS/Data (4-22% for
802.11b)
•
•
•
•
RTS still goes through CSMA: can be lost
CTS loss causes lengthy retries
33% of IP packets are TCP ACKs
In practice, WiFi doesn’t use RTS/CTS
Other MAC Strategies
• Time Division Multiplexing (TDMA)
– Central controller allocates a time slot for each
sender
– May be inefficient when not everyone sending
• Frequency Division
– Multiplexing two networks on same space
– Nodes with two radios (think graph coloring)
– Different frequency for upload and download
ISM Band Channels
Sometimes you can’t (or shouldn’t)
hide that you are on wireless!
• Three examples of relaxing the layering
abstraction
Examples of Breaking Abstractions
• TCP over wireless
– Packet losses have a strong impact on TCP
performance
– Snoop TCP: hide retransmissions from TCP endpoints
– Distinguish congestion from wireless losses
Summary
• Wireless presents many challenges
– Across all layers
– Encoding/Modulation (we’re doing pretty well here)
Download