Protocols
SPL/2010

Application Level Protocol Design
● atomic units used by protocol: "messages"
● encoding
● reusable, protocol independent, TCP server
● LinePrinting protocol implementation

Protocol Definition
● set of rules governing the communication details between two parties (processes)
● different forms and levels:
  – protocols for exchanging bits across a wire
  – protocols governing administration of supercomputers
  – application level protocols: define interaction between computer applications

Protocol Communication Rules
● syntax: how do we phrase the information we exchange.
● semantics: what actions/responses follow from the information received.
● synchronization: whose turn it is to speak (given the above defined semantics).

Protocols Skeleton
● all protocols follow a simple skeleton.
● exchange information using messages, which define the syntax.
● difference between protocols: syntax used for messages, and semantics of protocol.

Protocol Initialization (hand-shake)
● communication begins when one party sends an initiation message to the other party.
● synchronization: each party sends one message in a round robin fashion.

TCP 3-Way Handshake
● used to establish TCP socket connections (tear-down uses a separate FIN/ACK exchange)
● computers attempting to communicate can negotiate the parameters of the TCP socket connection
● both ends can initiate and negotiate separate TCP socket connections at the same time

TCP 3-Way Handshake (SYN, SYN-ACK, ACK)
● A sends a SYNchronize packet to B
● B receives A's SYN
● B sends a SYNchronize-ACKnowledgement
● A receives B's SYN-ACK
● A sends ACKnowledge
● B receives ACK.
● TCP socket connection is ESTABLISHED.

HTTP (Hyper Text Transfer Protocol)
● exchanging special text files over the network.
● brief (not complete) protocol description:
  – synchronization: client initiates connection, sends a single request, receives reply from server.
  – syntax: text based, see rfc2616.
  – semantics: server either sends to the client the page asked for, or returns an error.

What next?
● syntax and semantics aspects of protocols.
● assume: synchronization works in round robin, i.e., each party sends one message at a time.

Message Format
● Protocol syntax: message is the atomic unit of data exchanged throughout the protocol.
● message = letter
● concentrate on the delivery mechanism.

Framing
● streaming protocols - TCP
● need to separate between different messages
  – all messages are sent on the same stream, one after the other.
  – receiver should distinguish between different messages.
● Solution: message framing - taking the content of the message and encapsulating it in a frame (letter - envelope).

Framing – what is it good for?
● sender and receiver agree on the framing method beforehand
● framing is part of message format/protocol
● enables receiver to discover where a message starts/ends in a stream of bytes

Framing – how?
● Simple framing protocol for strings:
  – special FRAMING character (e.g., a line break).
  – each message is framed by two FRAMING characters, at beginning and end.
  – message must not contain a FRAMING character.
● framing protocol by adding a special tag at start and end:
  – message can be framed using <begin> / <end> strings.
  – avoid having <begin> / <end> in message body.
● framing protocol by employing a variable length message format:
  – special tag to mark start of a frame
  – message contains information on message's length

Textual data
● Many protocols exchange data in textual form
  – strings of characters, in a character encoding (e.g., UTF-8)
  – very easy to document/debug: print messages
● Limitation: difficult to send non-textual data.
  – how do we send a picture? video? audio file?

Binary Data
● non-textual data is called binary data.
● all data is eventually encoded in "binary" format, as a sequence of bits
● "binary data" = data that cannot be encoded as a readable string of characters
● sending binary data in raw binary format in a stream protocol is dangerous.
  – may contain any byte sequence, so it may corrupt the framing protocol.
● one option: devise a variable length message format.

Base64 Encoding
● encode binary data using an encoding algorithm
● Base64 encoding: encodes binary data into a string
  – converts every 3-byte sequence from the binary data into 4 ASCII characters.
  – used by many "standard" protocols (e.g., email, to encode file attachments of any type of data).

Encoding using Poco
● In C++, the Poco library includes a module for encoding/decoding byte arrays into/from Base64 encoded ASCII data.
● functionality is modeled as a stream "filter"
  – performs encode/decode on all data flowing through the stream
  – classes Base64Encoder / Base64Decoder.

Encoding in Java
● iharder library.
● modeled as stream filters (wrappers around Input/Output Java streams).

Encoding binary data
● advantage: any stream of bytes can be "framed" as ASCII data regardless of the character encoding used by the protocol.
● disadvantage: size of the message is increased by about 33% (4 output characters for every 3 input bytes).
● (we will use the UTF-8 encoding scheme)

Protocol and Server Separation
● code reuse is one of our design goals!
● generic implementation of server, which handles all communication details
● generic protocol interface:
  – handles incoming messages
  – implements protocol's semantics
  – generates the reply messages.

Protocol-Server Separation: protocol object
● protocol object is in charge of implementing the expected behavior of our server:
  – what actions should be performed upon the arrival of a request.
  – requests may be correlated to one another, meaning the protocol should save an appropriate state per client.
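The idea of a protocol object that keeps state per client can be sketched roughly as follows. This is only an illustration: the interface name `MessagingProtocol` appears in these slides, but the method names `processMessage` and `isEnd`, the class `CountingProtocol`, and the "bye" message are assumptions made for the example, not the course's actual code.

```java
// Hypothetical protocol interface: method names are assumptions for illustration.
interface MessagingProtocol {
    // Handles one incoming message and returns the reply to send back.
    String processMessage(String message);

    // Tells the server whether this message ends the conversation.
    boolean isEnd(String message);
}

// A tiny stateful protocol: it remembers how many requests its client has sent.
class CountingProtocol implements MessagingProtocol {
    private int requestCount = 0; // per-client state: one protocol object per client

    @Override
    public String processMessage(String message) {
        requestCount++;
        return "request " + requestCount + ": " + message;
    }

    @Override
    public boolean isEnd(String message) {
        return message.equals("bye"); // "bye" is an assumed termination message
    }
}

class ProtocolDemo {
    public static void main(String[] args) {
        // Each client gets its own protocol object, so the counters are independent.
        MessagingProtocol clientA = new CountingProtocol();
        MessagingProtocol clientB = new CountingProtocol();
        System.out.println(clientA.processMessage("hello"));
        System.out.println(clientA.processMessage("again"));
        System.out.println(clientB.processMessage("hello"));
    }
}
```

Because the state lives inside the protocol object, a generic server never needs to know what the state means; it only creates one object per connected client.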
Example: authenticated session
● some protocols require user authentication (login).
● only authorized users can perform certain actions.
● protocol is stateful: serving the requests of a client can be in at least 2 distinct states:
  1. authenticated (user has already logged in)
  2. non-authenticated (user has not provided login).
● the behavior of the protocol object differs according to its state

Protocol and Server Separation
● separate the different tasks the server must perform:
  – Accept new connections from new clients.
  – Receive new bytes from connected clients.
  – Parse incoming bytes from clients into messages ("de-serialization" / "unframing").
  – Dispatch message to the right method on the server side to execute the requested operation.
  – Send back an answer to a connected client after an action has been executed.
● a software architecture that separates tasks into separate interfaces
● The key participants in this architecture are:
  – Tokenizer: syntax, tokenizing a stream of data into messages.
  – MessagingProtocol: semantics, handling received messages and generating responses.
● implementations of interfaces:
  – generic server
  – MessageTokenizer
  – LinePrintingProtocol

Interfaces
● implement separation between protocol and server. Define:
  1. message (can be encoded in various ways: Base64, XML, text).
     – our messages are encoded as plain UTF-8 text.
  2. framing of messages: delimiters between messages sent in stream.
  3. protocol interface which handles each individual message.

ConnectionHandler
● server accepts a new connection from a client.
● server creates a ConnectionHandler, which will handle all incoming messages from this client.
● ConnectionHandler maintains the state of the connection for the specific client
  – Ex: user performs "login", and the ConnectionHandler object remembers this in its state

ConnectionHandler - Socket
● ConnectionHandler has access to the Socket connecting the server to the client process.
● TCP server: Socket connection is viewed as a pair of InputStream and OutputStream.
● streams of bytes: client and server exchange a bunch of bytes.

Tokenizer - in charge of parsing a stream of bytes into a stream of messages
● Tokenizer interface: filter between Socket input stream and protocol
● Protocol accesses the input stream only through the tokenizer.
  – instead of "seeing" a stream of bytes, it sees a stream of messages.
● Many libraries model such "filters" on streams as wrappers around a lower-level stream.
  – OutputStreamWriter: wraps a byte stream and encodes the characters written to it into bytes using a given character encoding
  – BufferedReader: adds a layer of buffering around a non-buffered character input stream (Reader).

Tokenizer
● splits incoming bytes from the socket into messages.
● For simplicity, we model the Tokenizer as an iterator…
● protocol will see the input stream from the socket as an iterator over messages (instead of an iterator over bytes).

Messaging Protocol
● protocol interface
● wraps together: socket and Tokenizer
● pass incoming messages to MessagingProtocol, which executes the action requested by the client.
  – look at the message and decide on action
  – decision may depend on the state
● once the action is performed, the answer comes back from the MessagingProtocol.
● We use a String to pass data from Tokenizer to Protocol, and back from Protocol.
● Serialization/Deserialization (encode/decode parameters to/from Strings) is performed by the Protocol, not by the Tokenizer.
  – Tokenizer is only in charge of deframing (splitting bytes into messages).

Implementations

Connection Handler
● active object:
  – handles one connection to one client for the whole period during which the client is connected (from the moment the connection is accepted, until one of the sides decides to close it).
  – modeled as a Runnable class.
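A connection handler built from a tokenizer and a protocol might look roughly like the sketch below. It is not the course's implementation: the class names `DelimiterTokenizer` and `ConnectionHandlerSketch` are hypothetical, a plain `Function<String, String>` stands in for the MessagingProtocol, and in-memory streams replace the socket so the example runs on its own. In a real server the two streams would come from `socket.getInputStream()` and `socket.getOutputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical tokenizer: turns a UTF-8 byte stream into delimiter-separated messages.
class DelimiterTokenizer {
    private final Reader reader;
    private final char delimiter;

    DelimiterTokenizer(InputStream in, char delimiter) {
        this.reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        this.delimiter = delimiter;
    }

    // Returns the next complete message, or null when the stream has ended.
    String nextMessage() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == delimiter) {
                return sb.toString();
            }
            sb.append((char) c);
        }
        return null; // end of stream; an unterminated partial message is dropped
    }
}

// Active object: serves one client until its input stream ends.
class ConnectionHandlerSketch implements Runnable {
    private final DelimiterTokenizer tokenizer;
    private final OutputStream out;
    private final char delimiter;
    private final Function<String, String> protocol; // stand-in for MessagingProtocol

    ConnectionHandlerSketch(InputStream in, OutputStream out, char delimiter,
                            Function<String, String> protocol) {
        this.tokenizer = new DelimiterTokenizer(in, delimiter);
        this.out = out;
        this.delimiter = delimiter;
        this.protocol = protocol;
    }

    @Override
    public void run() {
        try {
            String msg;
            while ((msg = tokenizer.nextMessage()) != null) {
                String reply = protocol.apply(msg);          // semantics
                out.write((reply + delimiter).getBytes(StandardCharsets.UTF_8)); // framing
                out.flush();
            }
        } catch (IOException e) {
            // A real server would also close the socket here.
            System.err.println("connection error: " + e.getMessage());
        }
    }
}

class HandlerDemo {
    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ConnectionHandlerSketch(in, out, '\n', s -> s.toUpperCase()).run();
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The handler itself never inspects message contents: framing lives in the tokenizer and the meaning of a message lives in the protocol, which is what allows the same handler to be reused with other protocols.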
Connection Handler
● holds references to:
  – TCP socket connected to the client
  – Tokenizer
  – an instance of the MessagingProtocol.
● connection handler is generic: it works with any implementation of a messaging protocol.
● assumes data exchanged between client and server is in the form of encoded strings
● encoder is passed to the constructor as an Encoder interface.

What's left?
● only need to implement:
  – specific framing handler (tokenizer)
  – specific protocol we wish to use.
● continue our line printing example…

Message Tokenizer
● we use a framing method based on a single character delimiter.
● assume a stream of messages, delimited by FRAMING; we will use the character '\0'
● an important part is connection termination and exception handling at any moment
● most of the code in low-level input/output and socket manipulation relates to error handling and connection termination.

Line Printing Protocol
● implement a specific protocol on the server side.
● when it receives a message, prints it on the server side screen and adds a line number.
● line number is the state of the protocol.
  – each client has its own line number.
  – two clients connected at the same time will each see their own version of the line number.
● when the protocol processes a message, it sends back a message to the client: ": printed" + date-time value when the message was processed (on the server side).
  – timestamp acknowledgments.

A Client
● before ConnectionHandler, review the code of a compatible TCP client for the protocol we have just described.
● no new idea: it is similar to the TCP client we have reviewed in the previous section.

Concurrency Models of TCP Servers
Server quality criteria:
● Scalability: capability to serve a large number of concurrent clients.
● Low accept latency: acceptance wait time
● Low reply latency: reply wait time after message received.
● High efficiency: use few resources on the server (RAM, number of threads, CPU usage).

● model the concurrency model of the server
● define an interface which controls how each connection handler is executed

● Given:
  – Encoder
  – Tokenizer
  – Protocol
  – ServerConcurrencyModel
  we define the MessagingServer

● To obtain good quality, a TCP server will most often use multiple threads.
● 3 simple concurrency models for servers
● 3 implementations of the ServerConcurrencyModel interface

Server Model 1: Single Thread
● 1 thread for:
  – accepting a new client
  – dealing with requests, by applying the run method of the passive ConnectionHandler object.

Single Thread Model: Quality
● no scalability: at any given moment, it can serve at most one client.
● high accept latency: a second client must wait until the first client disconnects
● low reply latency: all resources are concentrated on serving one client.
● good efficiency: server uses exactly the resources needed to serve one client

When is the model appropriate?
● time to process a full connection from one client is guaranteed to remain small.
● Example: server provides date and time value on the server machine.
  – sends one string to the client, then disconnects.

Server Model 2: Thread per Client
● assigns a new thread for each connected client, by invoking the 'start' method of a Thread wrapping the runnable ConnectionHandler object.

Model Quality: Scalability
● server can serve several concurrent clients, up to the max number of threads running in the process.
  – RAM of the host is used: each thread allocates a stack and thus consumes RAM
  – approx. 500 - 1000 threads can be active in a single process.
  – process does not defend itself: it keeps creating new threads, which is dangerous for the host.
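The first two models differ only in how the connection handler's run method gets invoked, which a small sketch can make concrete. The interface name `ServerConcurrencyModel` comes from the slides; the method name `apply` and the classes `SingleThreadModel`, `ThreadPerClientModel`, and `ConcurrencyDemo` are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shape of the interface; the method name `apply` is an assumption.
interface ServerConcurrencyModel {
    void apply(Runnable connectionHandler);
}

// Model 1: the accepting thread also serves the client, so accept blocks meanwhile.
class SingleThreadModel implements ServerConcurrencyModel {
    @Override
    public void apply(Runnable connectionHandler) {
        connectionHandler.run(); // passive object: runs on the caller's thread
    }
}

// Model 2: every client gets a fresh thread; nothing limits how many are created.
class ThreadPerClientModel implements ServerConcurrencyModel {
    @Override
    public void apply(Runnable connectionHandler) {
        new Thread(connectionHandler).start(); // active object: runs on its own thread
    }
}

class ConcurrencyDemo {
    // Returns the name of the thread on which the model executed a dummy handler.
    static String threadUsedBy(ServerConcurrencyModel model) {
        AtomicReference<String> name = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        model.apply(() -> {
            name.set(Thread.currentThread().getName());
            done.countDown();
        });
        try {
            done.await(5, TimeUnit.SECONDS); // wait for the handler to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name.get();
    }

    public static void main(String[] args) {
        System.out.println("caller thread:     " + Thread.currentThread().getName());
        System.out.println("single thread:     " + threadUsedBy(new SingleThreadModel()));
        System.out.println("thread per client: " + threadUsedBy(new ThreadPerClientModel()));
    }
}
```

Since the server only talks to the interface, switching models means swapping one object; the accept loop, tokenizer, and protocol stay unchanged.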
Model Quality: Latency
● Low accept latency: time from one accept to the next ~ time to create a new thread
  – short compared to the delay between incoming client connections.
● Reply latency: resources of the server are spread among concurrent connections.
  – with a reasonable number of active connections (~hundreds), the load requested is relatively low in CPU and RAM.

Model Quality: Efficiency
● Low efficiency: server creates a full thread per connection
  – connection may be bound to Input/Output operations.
  – ConnectionHandler thread will be blocked waiting for IO, yet still uses the resources of the thread (RAM and Thread).
● Reactor architecture …

Server Model 3: Constant Number of Threads
● constant number of 10 threads (given by the Executor interface of Java)
● adding the runnable ConnectionHandler object to the task queue of a thread pool executor

Model Quality
● avoids the server causing a host crash when too many clients connect at the same time
● up to N concurrent client connections, server behaves as "thread-per-connection"
● above N, accept latency will grow
● scalability is limited to the amount of concurrent connections we believe we can support.
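The constant-number-of-threads model can be sketched with the standard `java.util.concurrent` executors. The class names `FixedPoolModel` and `FixedPoolDemo`, the method name `apply`, and the sleeping dummy handlers are assumptions for illustration; only `Executors.newFixedThreadPool`, `execute`, `shutdown`, and `awaitTermination` are the real JDK API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model class: hands each connection handler to a fixed-size thread pool.
class FixedPoolModel {
    private final ExecutorService pool;

    FixedPoolModel(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Queues the handler; it starts running once one of the pool threads is free.
    void apply(Runnable connectionHandler) {
        pool.execute(connectionHandler);
    }

    // Stops accepting new handlers and waits for the queued ones to finish.
    boolean shutdownAndWait(long seconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class FixedPoolDemo {
    // Runs `clients` dummy handlers on `threads` pool threads and
    // returns the largest number of handlers observed running at once.
    static int peakConcurrency(int threads, int clients) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        FixedPoolModel model = new FixedPoolModel(threads);
        for (int i = 0; i < clients; i++) {
            model.apply(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // pretend to serve a client
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        model.shutdownAndWait(10);
        return peak.get();
    }

    public static void main(String[] args) {
        // With 10 pool threads, the peak never exceeds 10 however many clients arrive.
        System.out.println("peak concurrent handlers: " + peakConcurrency(10, 50));
    }
}
```

Handlers beyond the pool size wait in the executor's queue instead of creating more threads, which is how this model protects the host at the cost of extra waiting time for the surplus clients.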