RPC and Data Representation + Wireless Primer
Theophilus Benson

Assignment #3
• 3A: Reliability
  – No flow control or congestion control
  – Just a sliding window
  – You can use the same window value for the receiver and sender windows
• 3B: TCP-like
  – Must implement at least slow start and congestion avoidance
  – You can add more to make your implementation more efficient or fair
• Implementation details
  – You can ignore the demux function
    • It would only matter in 3B, and we won't be testing it
  – When testing 3B, the propagation delay between any two hosts will be 100 ms

Today
• RPC
  – Session management + language support
  – Semantic challenges
  – Example RPC (not on exam)
• Wireless
  – Collision detection

Why Should You Care?
• RPC is used to build distributed systems
• Used by … everyone
  – Google
  – Facebook
  – Twitter
  – Yahoo

RPC – Remote Procedure Call
• Procedure calls are a well-understood mechanism
  – Transfer control and data on a single computer
• Idea: make distributed programming look the same
  – Have servers export interfaces that are accessible through local APIs
  – Perform the illusion behind the scenes
• 2 major components
  – Protocol to manage messages sent between client and server
  – Language and compiler support
    • Packing, unpacking, calling the function, returning the value

Problem
• Goal: make a call between two servers look like a call between two methods in one program
• What are the key problems?
  – Semantics of the communication
    • APIs, how to cope with failure
  – Data representation
    • (think of parsing information in Assignments #2 & #3)
  – Scope: should the scheme work across
    • Architectures
    • Languages
    • Compilers…?
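To make the data-representation problem concrete: if a client stub simply copied an integer's in-memory bytes onto the wire, a little-endian client and a big-endian server would disagree about its value. Below is a minimal sketch of how a stub might marshal a call's arguments in network (big-endian) byte order, the same convention htons/htonl implement. The AddRequest struct, its wire layout, and all function names are hypothetical, chosen for illustration, and not taken from any real RPC system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit unsigned integer in big-endian ("network") byte order.
// Shifting by value (instead of copying raw memory) yields the same bytes
// on little- and big-endian hosts.
void put_u32_be(std::vector<std::uint8_t>& buf, std::uint32_t v) {
  buf.push_back(static_cast<std::uint8_t>(v >> 24));
  buf.push_back(static_cast<std::uint8_t>(v >> 16));
  buf.push_back(static_cast<std::uint8_t>(v >> 8));
  buf.push_back(static_cast<std::uint8_t>(v));
}

// Read a big-endian 32-bit unsigned integer starting at buf[pos].
std::uint32_t get_u32_be(const std::vector<std::uint8_t>& buf, std::size_t pos) {
  assert(pos + 4 <= buf.size());  // never read past the end of the message
  return (std::uint32_t{buf[pos]} << 24) | (std::uint32_t{buf[pos + 1]} << 16) |
         (std::uint32_t{buf[pos + 2]} << 8) | std::uint32_t{buf[pos + 3]};
}

// Hypothetical request for a remote add(a, b) call.
// Assumed wire layout: [proc_id][a][b], 4 bytes each, big-endian.
struct AddRequest {
  std::uint32_t proc_id;
  std::uint32_t a;
  std::uint32_t b;
};

// Client-stub side: flatten the arguments into bytes before sending.
std::vector<std::uint8_t> marshal(const AddRequest& req) {
  std::vector<std::uint8_t> buf;
  put_u32_be(buf, req.proc_id);
  put_u32_be(buf, req.a);
  put_u32_be(buf, req.b);
  return buf;
}

// Server-stub side: rebuild the arguments before calling the real add().
AddRequest unmarshal(const std::vector<std::uint8_t>& buf) {
  return AddRequest{get_u32_be(buf, 0), get_u32_be(buf, 4), get_u32_be(buf, 8)};
}
```

Because both ends must agree on this layout in advance, changing it (say, adding a field) breaks older peers. That is one reason stub compilers generate this kind of code from an interface description rather than having programmers write it by hand.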
RPC Session Management

Stub Functions
• Local stub functions at the client and server give the appearance of a local function call
• Client stub
  – Marshals parameters -> sends to server -> waits
  – Unmarshals results -> returns to client
• Server stub
  – Creates sockets/ports and accepts connections
  – Receives message from client stub -> unmarshals parameters -> calls server function
  – Marshals results -> sends results to client stub

Stub Generation
• Many systems generate stub code from an independent specification: an IDL
  – IDL: Interface Description Language
    • Describes an interface in a language-neutral way
• Separates the logical description of data from the
  – Dispatching code
  – Marshalling/unmarshalling code
  – Data wire format

RPC Components
• Stub compiler
  – Creates stub methods
  – Creates functions for marshalling and unmarshalling
• Dispatcher
  – Demultiplexes programs running on a machine
  – Calls the server stub function
• Protocol
  – At-most-once semantics (or not)
  – Reliability, replay caching, version matching
  – Fragmentation, framing (depending on underlying protocols)

RPC Language Support

Presentation Formatting
• How to represent data?
• Several questions:
  – Which data types do you want to support?
    • Base types, flat types, complex types
  – How to encode data onto the wire?
  – How to decode the data?
    • Self-describing (tags)
    • Implicit description (the ends know)
• Several answers:
  – Many frameworks do these things automatically

Which data types?
• Basic types
  – Integers, floating point, characters
  – Some issues: endianness (ntohs, htons), character encoding, IEEE 754
• Flat types
  – Strings, structures, arrays
  – Some issues: packing of structures, order, variable length
• Complex types
  – Pointers! Must flatten (serialize) data structures

Data Schema
• How to parse the encoded data?
• Two extremes:
  – Self-describing data: tags
    • Additional information added to the message to help in decoding
    • Examples: field name, type, length
  – Implicit: the code at both ends “knows” how to decode the message
    • E.g., your code for Assignments #2 & #3 knows what format packets should be in
    • Interoperability depends on a well-defined protocol specification!
    • Very difficult to change

Examples of RPC Systems
• SunRPC (now ONC RPC)
  – The first popular system
  – Used by NFS
  – Not popular for the wide area (security, convenience)
• Java RMI
  – Popular with Java
  – Only works among JVMs
• DCE
  – Used in ActiveX and DCOM, CORBA
  – Stronger semantics than SunRPC, much more complex

…even more examples
• XML-RPC, SOAP
• JSON-RPC
• Apache Thrift

RPC Semantic Challenges

Can we maintain the same semantics?
• Mostly…
• Why not?
  – New failure modes: nodes, network
• Possible outcomes of failure
  – Procedure did not execute
  – Procedure executed once
  – Procedure executed multiple times
  – Procedure partially executed
• Desired: at-most-once semantics

Implementing at-most-once semantics
• Problem: request message lost
  – Client must retransmit requests when it gets no reply
• Problem: reply message lost
  – Client may retransmit a previously executed request
  – OK if the operation is idempotent
  – Server must keep a “replay cache” to reply to already executed requests
• Problem: server takes too long executing
  – Client will retransmit a request already in progress
  – Server must recognize the duplicate; it could reply “in progress”

Server Crashes
• Problem: server crashes and reply lost
  – Can make the replay cache persistent (slow)
  – Can hope the reboot takes long enough for all clients to fail
• Problem: server crashes during execution
  – Can log enough to restart partial execution (slow and hard)
  – Can hope the reboot takes long enough for all clients to fail
• Can use “cookies” to inform clients of crashes
  – Server gives the client a cookie, which is f(time of boot)
  – Client includes the cookie with each RPC
  – After a server crash, the server
    will reject invalid cookies

Examples of RPC

Example: Sun XDR (RFC 4506)
• External Data Representation for SunRPC
• Types: most of the C types
• No tags (except for array lengths)
  – Code needs to know the structure of the message
• Usage:
  – Create a program description file (.x)
  – Run the rpcgen program
  – Include the generated .h files, use the stub functions
• Very C/C++ oriented
  – Although encoders/decoders exist for other languages

Example: fetch and add server
• In fadd_prot.x:

RPC Program Definition
• rpcgen generates the marshalling/unmarshalling code and stub functions; you fill in the actual server code

XML
• The other extreme
• Markup language
  – Text based, semi-human readable
  – Heavily tagged (field names)
  – Depends on an external schema for parsing
  – Hard to parse efficiently

    <person>
      <name>John Doe</name>
      <email>jdoe@example.com</email>
    </person>

Google Protocol Buffers
• Defined by Google, released to the public
  – Widely used internally and externally
  – Supports common types, service definitions
  – Natively generates C++/Java/Python code
    • Over 20 other languages supported by third parties
  – Efficient binary encoding, readable text encoding
• Performance
  – 3 to 10 times smaller than XML
  – 20 to 100 times faster to process

Protocol Buffers Example

    message Student {
      required string name = 1;
      required int32 credits = 2;
    }

(…compile with protoc)

    Student s;
    s.set_name("Jane");
    s.set_credits(20);
    fstream output("students.txt", ios::out | ios::binary);
    s.SerializeToOstream(&output);

(…somebody else reading the file)

    Student s;
    fstream input("students.txt", ios::in | ios::binary);
    s.ParseFromIstream(&input);

Binary Encoding
• Integers: varints
  – 7 bits out of every 8 encode the integer
  – MSB: set if more bytes follow
  – Multi-byte integers: least significant group first
• Signed integers: zig-zag encoding, then varint
  – 0:0, -1:1, 1:2, -2:3, 2:4, …
  – Advantage: small negative numbers also stay small when encoded as varints
• General:
  – Field number, field type (tag), value
• Strings:
  – Varint length, then the UTF-8 representation

Apache Thrift
• Originally developed by
  Facebook
• Used heavily internally
• Full RPC system
  – Support for C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml
• Many types
  – Base types, list, set, map, exceptions
• Versioning support
• Many encodings (protocols) supported
  – Efficient binary, JSON encodings

Apache Avro
• Yet another newcomer
• Likely to be used for Hadoop data representation
• Encoding:
  – Compact binary with the schema included in the file
  – Amortized self-description
• Why not just create a new encoding for Thrift?
  – I don't know…

Conclusions
• RPC is a good way to structure many distributed programs
  – Have to pay attention to the different semantics, though!
• Data: tradeoff between self-description, portability, and efficiency
• Unless you really want to bit-pack your protocol, and it won't change much, use one of the IDLs
• Parsing code is easy to get (slightly) wrong and hard to get fast
  – Should only do this once, for all protocols
• Which one should you use?

Wireless
• Today: wireless networking is truly ubiquitous
  – 802.11, 3G, (4G), WiMAX, Bluetooth, RFID, …
  – Sensor networks, Internet of Things
  – Some new computers have no wired networking
  – 4B cellphone subscribers vs. 1B computers
• What's behind the scenes?
Wireless is different
• Signals sent by the sender don't always reach the receiver intact
  – Varies with space: attenuation, multipath
  – Varies with time: conditions change, interference, mobility
• Distributed: the sender doesn't know what happens at the receiver
• The wireless medium is inherently shared
  – No easy way out with switches

Implications
• Different mechanisms needed
• Physical layer
  – Different knobs: antennas, transmission power, encodings
• Link layer
  – Distributed medium access protocols
  – Topology awareness
• Network, transport layers
  – Routing, forwarding
• Most advances do not abstract away the physical and link layers

Physical Layer
• Specifies the physical medium
  – Ethernet: Category 5 cable, 8 wires, twisted pair, RJ45 jack
  – WiFi wireless: 2.4GHz
• Specifies the signal
  – 100BASE-TX: NRZI + MLT-3 encoding
  – 802.11b: binary and quadrature phase shift keying (BPSK/QPSK)
• Specifies the bits
  – 100BASE-TX: 4B5B encoding
  – 802.11b @ 1-2Mbps: Barker code (1 bit -> 11 chips)

What can happen to signals?
• Attenuation
  – Signal power falls off as ~1/r² for omnidirectional antennas in free space
  – The exponent depends on the type and placement of antennas
    • < 2 for directional antennas
    • > 2 if antennas are close to the ground

Interference
• External sources
  – E.g., the 2.4GHz unlicensed ISM band:
    • 802.11
    • 802.15.4 (ZigBee), 802.15.1 (Bluetooth)
    • 2.4GHz phones
    • Microwave ovens
• Internal sources
  – Nodes in the same network/protocol can (and do) interfere
• Multipath
  – Self-interference (destructive)

Multipath
• May cause attenuation, destructive interference
[Figure: multipath propagation. Picture from Cisco, Inc.]
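The attenuation bullet above says received power falls off roughly as 1/r² in free space, with the exponent shifting depending on the antennas. A minimal sketch of what that means numerically, assuming a simple power-law path-loss model P(r)/P(r0) = (r0/r)^n that ignores multipath and interference; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Relative received power under a simple power-law path-loss model:
//   P(r) / P(r0) = (r0 / r)^n
// where n is the path-loss exponent (n = 2 in free space with
// omnidirectional antennas).
double relative_power(double r0, double r, double n) {
  assert(r0 > 0.0 && r > 0.0);  // distances must be positive
  return std::pow(r0 / r, n);
}

// The same ratio in decibels: 10 * log10(P(r) / P(r0)).
double relative_power_db(double r0, double r, double n) {
  return 10.0 * std::log10(relative_power(r0, r, n));
}
```

For example, doubling the distance leaves a quarter of the power (about -6 dB) with n = 2, but only a sixteenth (about -12 dB) with an exponent such as n = 4, so a larger exponent shrinks the usable range much faster.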
Implications of Attenuation and Interference
• Reduce the signal-to-noise ratio
  – Makes it hard to decode bits
  – Increases bit error rates
• Could make the signal stronger (transmit with higher power)
  – Uses more energy
  – Increases interference to other nodes

Link Layer
• Medium Access Control
  – Should give 100% of the channel to a single user
  – Should be efficient and fair with more users
• Ethernet uses CSMA/CD
  – Can we use CD here?
    • No! Collisions happen at the receiver
    • Protocols try to avoid collisions in the first place

Collision Detection in Wireless

Carrier Sensing
• The initial approach to collisions in wireless
  – Listen on the channel (medium)
    • If someone is transmitting, sleep and try again later
    • Else, start sending

Hidden Terminals
[Figure: nodes in a line, B - A - C]
• A can hear B and C
• B and C can't hear each other
• They both interfere at A: COLLISION!
• B is a hidden terminal to C, and vice versa
• Carrier sense at the sender is useless

Exposed Terminals
[Figure: nodes in a line, B - A - C - D]
• A transmits to B
• C hears the transmission and backs off, even though D could hear C without disrupting A->B
  – C could still be sending to D, but it won't!
• C is an exposed terminal to A's transmission
• Why is it still useful for C to do carrier sense?

Key points
• No global view of collisions
  – Different receivers hear different senders
  – Different senders reach different receivers
• Collisions happen at the receiver
• Goals of a MAC protocol
  – Detect if the receiver can hear the sender
  – Tell senders who might interfere with the receiver to shut up

Simple MAC: CSMA/CA
• Maintain a waiting counter c
• For each time slot the channel is free, decrement c (c--)
• Transmit when c = 0
• When a collision is inferred, retransmit with exponential backoff
  – Use lack of an ACK from the receiver to infer a collision
  – Collisions are expensive: they are only discovered after a full packet transmission
• How would we get ACKs if we didn't do carrier sense?
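The CSMA/CA steps above can be sketched as a small per-slot state machine. This is a simplified illustration, not the 802.11 specification: the class name, the contention-window bounds, drawing the counter uniformly from [1, CW], and doubling CW on every missing ACK are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical CSMA/CA sender (simplified sketch, not the 802.11 DCF).
class CsmaCaSender {
 public:
  CsmaCaSender(int cw_min, int cw_max, std::uint32_t seed)
      : cw_min_(cw_min), cw_max_(cw_max), cw_(cw_min), rng_(seed) {
    assert(cw_min >= 1 && cw_min <= cw_max);
    DrawCounter();
  }

  // Called once per time slot; returns true if the sender transmits now.
  // The caller must then report the outcome via OnAck() or OnAckTimeout().
  bool OnSlot(bool channel_idle) {
    if (!channel_idle) return false;  // carrier sensed: freeze the counter
    if (counter_ > 0) --counter_;     // c-- for each idle slot
    return counter_ == 0;             // transmit when c reaches 0
  }

  // No ACK arrived: infer a collision and back off exponentially.
  void OnAckTimeout() {
    cw_ = std::min(2 * cw_, cw_max_);
    DrawCounter();
  }

  // ACK received: reset the window for the next packet.
  void OnAck() {
    cw_ = cw_min_;
    DrawCounter();
  }

  int contention_window() const { return cw_; }
  int counter() const { return counter_; }

 private:
  // Pick a fresh waiting counter uniformly from [1, cw_].
  void DrawCounter() {
    std::uniform_int_distribution<int> dist(1, cw_);
    counter_ = dist(rng_);
  }

  int cw_min_;
  int cw_max_;
  int cw_;
  int counter_ = 0;
  std::mt19937 rng_;
};
```

A driver loop would call OnSlot once per slot. Freezing the counter while the channel is busy, rather than redrawing it, lets a node that has already waited keep its progress, while the doubling window spreads out retries after repeated missing ACKs.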
RTS/CTS
• Idea: the transmitter can check the availability of the channel at the receiver
• Before every transmission
  – Sender sends an RTS (Request-to-Send)
    • Contains the length of the data (in time units)
  – Receiver sends a CTS (Clear-to-Send)
  – Sender sends the data
  – Receiver sends an ACK after the transmission
• If you don't hear a CTS, assume a collision
• If you hear a CTS for someone else, shut up

Benefits of RTS/CTS
• Solves the hidden terminal problem
• Also solves the exposed terminal problem
• Does it?
  – Control frames can still collide
    • E.g., if A & C send an RTS at the same time
  – In practice: reduces the hidden terminal problem on data packets

RTS/CTS
[Figure: nodes in a line, B - A - C]
• B sends an RTS to A
• A responds with a CTS
• C hears the CTS and knows not to send!
• B sends the data to A
• Hidden terminal solved!

Issues with RTS/CTS
[Figure: nodes in a line, B - A - C - D]
• A sends an RTS to B; B responds with a CTS
• C hears the RTS but not the CTS
  – So, C can send!
• C sends an RTS to D
• Exposed terminal solved!

RTS Loss
[Figure: RTS loss among nodes B, A, and C]

Benefits of RTS/CTS
• Solves the hidden terminal problem
• Also solves the exposed terminal problem
• Does it?
  – Control frames can still collide
    • E.g., if A & C send an RTS at the same time
    • Also, the CTS can get lost!!
  – In practice: reduces the hidden terminal problem on data packets

CTS Loss
[Figure: nodes in a line, B - A - C - D]
• T0: B sends an RTS to A
• T1: A responds with a CTS WHILE D also sends an RTS to E
• T1: this CTS is lost at C
  – C doesn't know about B->A

Drawbacks of RTS/CTS
• Overhead is too large for small packets
  – 3 packets per packet: RTS/CTS/Data (4-22% for 802.11b)
• RTS still goes through CSMA: can be lost
• CTS loss causes lengthy retries
• 33% of IP packets are TCP ACKs
• In practice, WiFi doesn't use RTS/CTS

Other MAC Strategies
• Time Division Multiple Access (TDMA)
  – Central controller allocates a time slot for each sender
  – May be inefficient when not everyone is sending
• Frequency Division
  – Multiplexing two networks on the same space
  – Nodes with two radios (think graph coloring)
  – Different frequency for upload and download

ISM Band Channels

Sometimes you can't (or shouldn't) hide that you are on wireless!
• Three examples of relaxing the layering abstraction

Examples of Breaking Abstractions
• TCP over wireless
  – Packet losses have a strong impact on TCP performance
  – Snoop TCP: hide retransmissions from TCP endpoints
  – Distinguish congestion from wireless losses

Summary
• Wireless presents many challenges
  – Across all layers
  – Encoding/Modulation (we're doing pretty well here)