What are yRFCs?
yRFCs are discussion documents on one or more issues related to the design, development or implementation of the Y-Comm architecture. Y-Comm is a new architecture being developed to support heterogeneous networking. yRFCs therefore represent the views of the authors of the document. They are non-binding and do not oblige anyone to agree with or to implement any concepts or details expressed therein. They can also be modified without notice. Finally, yRFCs are public documents and should not in whole or in part be the basis of a patent or copyright claim. Please contact the authors directly to discuss relevant issues.

yRFC2: The Simple Protocol (SP) Specification
Authors: Andriy Padiy (ap1122@live.mdx.ac.uk), Leroy Riley (lr347@live.mdx.ac.uk) and Glenford Mapp (g.mapp@mdx.ac.uk)
This document was released to the Y-Comm Website Team on 12th January 2012.
Update: 14th May 2012
Updates:
a) Replaced the ECN field by the SCOPE field.
b) The mess_mtu size refers to the maximum message size, not the maximum packet size.

1.0 Introduction
This yRFC discusses the specification of a transport protocol for local area networking called the Simple Protocol. The protocol was designed to optimize transport in the local area. The motivation is that local area networking, which may be shaped by local conditions such as heterogeneous wireless networking or high-speed communications, needs to be separated from internetworking, which is based more on wide area networking. The strategy of tuning TCP to adapt to these conditions has met with mixed results, so the plan is to develop a simple protocol that can be used to optimize local interactions. The Simple Protocol, which we call SP, is being used to provide this functionality. SP is a simple message-based protocol, whereas TCP is stream-based.
1.1 Background
Communications in Local Area Network (LAN) and Wide Area Network (WAN) environments are beginning to take divergent paths. This has been motivated by several factors. The first is that local network speeds are still increasing: 1 Gbps is common in the local area, with 10 Gbps becoming available in a few years. In addition, the rise of wireless means that many peripheral networks will be wireless networks. This indicates that the transport characteristics of end networks will be dominated by the characteristics of wireless communications, which are completely different from those of wired systems. Hence, transport protocols such as TCP, which were developed to support wired communications, are not able to perform optimally in wireless environments. Adapting TCP has had mixed results, because it is difficult to tune the protocol for such diverse LAN conditions. The authors therefore believe that the argument that one transport protocol should be used for both global and local environments has been severely weakened. This paper looks at the development of a transport protocol specially designed for the local area. The authors believe that TCP should be used as a WAN protocol while a local protocol is used for local communications.

2.0 Requirements for a LAN Transport
A transport protocol for LAN communications needs certain properties to optimize its performance, and these differ from those of WAN transport protocols such as TCP.

Larger Window Sizes
In order to make use of high-speed LANs, LAN protocols should use a much larger window size than WAN protocols. Since the LAN is fast, a bigger window size can be used by default.

Support for Message-Based Communications
TCP is based on stream-like communication: there are no message boundaries. However, most communication in LAN environments tends to be message- or transaction-based, so a message-based approach is better suited to LAN protocols.
Ease of Packet Processing: Keeping it Simple
There is a strong case for using this design to simplify protocol processing. The idea is to have a small number of connection states as well as a defined set of packet types, so that the packet type is used as the key parameter to drive the main processing loop.

Keeping it Flexible
Since there are many diverse applications, the protocol needs to give different qualities of service to different applications. This means that mechanisms such as checksumming and error correction need to be set independently to yield different qualities of service.

3.0 Protocol Specification
Figure 1 shows the diagram of the Simple Protocol, while Figure 2 shows the length of the individual fields.
Figure 1: Diagram of the Simple Protocol
Figure 2: Showing the length of the Fields

The individual fields are detailed below.

DEST_ID: a connection identifier on the remote machine.
SRC_ID: identifies the connection on the local machine.
The connection is therefore independently identified by [DEST_ID, IPaddress(DEST_ID)] or [SRC_ID, IPaddress(SRC_ID)]. Note that a value of zero is not a valid connection identifier.

Packet type: the type of packet being sent or received. SP supports the following types:
START: the first packet transmitted, used to set up a connection.
REJ: the connection has been rejected.
CNTL: a control packet; it is always sent reliably.
DATA: a data packet.
ACK: an acknowledgement packet.
NACK: a negative acknowledgement packet.
ECHO: an echo packet.
END: an end packet, used to close a connection.

PRI: 2 bits are used, so SP supports 4 levels of priority. SP guarantees that a higher-priority packet will always be delivered before a lower-priority packet.

SC: 2 bits are used to support the idea of scope. The concept of scope is that each server has a scope of operation which defines the region in which it operates.
With this concept, only machines within the scope of operation of the server are allowed to access the server. The 4 scopes are represented by two bits:
00: the server can only be accessed by processes on the same machine.
01: the server is only accessible by machines on the same Local Area Network (LAN).
10: the server is only accessible by machines on the same site.
11: the server is globally accessible.

FLAGS: an 8-bit field:
BIT (0): Window-Size is valid
BIT (1): ST_CKS: checksum this packet
BIT (2): ST_RTR: recover the packet if there is a checksum error or it is missing
BIT (3): ST_RETRANS: the packet has been retransmitted
BIT (4): REMOTE_RESET: the connection has been reset by the other side
BIT (5): REPLY_REQUESTED: a reply has been requested for this packet
BIT (6): REPLY: this is a reply to a previous request
BIT (7): End-of-Message: the last message was completely received

CHKSUM: the 16-bit checksum. It is the same checksum as that used by TCP.
TOTAL_LEN: the total length of the packet, including the SP header.
PBLOCK: signifies which part (block) of the message is contained in this packet.
TBLOCK: the total number of blocks/packets in a message.
MESS_SEQ_NO: the last message sent. Only DATA, CNTL and END packets can increase the MESS_SEQ_NO; sending other packet types does not increase the MESS_SEQ_NO of a connection.
MESS_ACK_NO: the last message received.
WINDOW_SIZE: 22 bits long; specifies the number of bytes that can be sent by the sending side before waiting for an acknowledgement from the receiver. Hence just under 4 MB (2^22 - 1 = 4,194,303 bytes) can be sent before waiting for an acknowledgement.
SYNC_NO: a variable used to ensure that ACKs have been correctly received. Every time a unique acknowledgement is received, the SYNC_NO is incremented.
The SYNC_NO is 10 bits long and must be randomly assigned at connection start-up.

Connection States
The Simple Protocol supports the following connection states:
NOT_INUSE = 0: this connection is not valid.
CONN_REQUESTED = 1: a connection has been requested, so a START packet has been transmitted but there has not yet been a reply.
CONNECTED = 2: the connection is in the connected state.
END_REQUESTED_LOCAL = 3: the local end has sent an END packet to close down the connection and is waiting for an END packet from the remote end to close the connection completely.
END_REQUEST_REMOTE = 4: the remote end has sent an END packet and is waiting on the local end.
CLOSING = 5: the connection is closing. This means that END packets have been received and sent by both sides of the connection. However, unacknowledged, missing or retransmitted packets could still be received in this state.
CLOSED = 6: the connection is closed and its resources can now be reclaimed.

TIMERS
There are a number of timers associated with every SP connection:
CONN_TIMER: activated when a connection request is sent. When the timer expires, the START packet is resent. This process is repeated 3 times, after which the connection is dropped.
ACK_TIMER: activated when an acknowledgement is requested. When the timer expires, an ACK packet is sent with the REPLY_REQUESTED bit set. This process is repeated 3 times, after which the outstanding data packets are retransmitted and the RETRANS_TIMER is started. When the RETRANS_TIMER expires, the retransmission is repeated, up to 3 times, after which the connection is dropped.
ECHO_TIMER: used to time the end-to-end network latency. When the ECHO_TIMER expires, an ECHO packet is sent with the REPLY_REQUESTED bit set. When the receiving stack gets this packet, it simply replies to it. Echo packets are therefore NULL packets which allow the system to measure the network latency of the connection. The ECHO_TIMER ensures that these packets are sent periodically.
END_TIMER: set to ensure that the first END packet is acknowledged. An END packet is regarded as part of the data stream and has a distinct message sequence number. When one side wants to close the connection, it sends an END packet with a distinct message sequence number and starts the END_TIMER. If the END_TIMER expires, the END packet is resent. This process is repeated 3 times and then the connection is dropped.

Figure 3: Showing the Different Connections States (SP Server State Transition Diagram: Not in use, Connection Requested, Connected, End Requested Local, End Request Remote, Closing and Closed, with transitions on open_sp(), START sent/received, END sent/received, and RST received or sent or sp_close(); DATA, ACK, NACK and ACK_NACK are sent and received in the open states).

Packet Formats
SP is a little unusual in that, apart from DATA and CNTL packets, all SP packets are only the size of the SP header. This means that for certain packet types some header fields have been renamed to reflect the function of that packet type, so no part of the protocol header is wasted. However, this is complicated by the fact that SP is an asynchronous protocol, so normal SP packets do not reflect the complete state of both sides of a connection. This can be seen in a normal DATA packet. PBLOCK and TBLOCK reflect the block being transmitted in the message given by MESS_SEQ_NO, so these are variables associated with the transmission (sending) of data. The reception of messages is given by MESS_ACK_NO, which indicates the current message being received. Notice that this SP header does not tell you the last block (PBLOCK) of that message that was received.
You would only know whether the entire message has been completely received, because the END_OF_MESSAGE bit is set in the flags once the entire message has been received.

In Acknowledgement (ACK) packets, the PBLOCK and TBLOCK parameters are associated with MESS_ACK_NO rather than MESS_SEQ_NO. An ACK therefore reveals more than the piggybacked information, as it shows the last block of that message that was received.

The most radical format change is for NACK packets. NACK packets indicate that there is a gap of missing packets. In SP, NACK packets delineate that gap by sending back information on the packets on either side of it. The last packet received before the gap is given by the MESS_ACK_NO and PBLOCK fields in the NACK header, while the packet received at the other end of the gap is given by MESS_SEQ_NO and TBLOCK. It is very important to realise that SP has no reliable way of working out exactly how many packets are missing: it could know how many messages are missing, but it would not know the size of each message. It is therefore up to the sender simply to retransmit the packets. Note that in a stream with a number of gaps, SP deals with one gap at a time: the system keeps transmitting NACK packets for the oldest gap until it is filled, then moves on to the second oldest, and so on.

The Mechanisms
Connection
When Process A wants to start a connection to Process B, it chooses a SRC_ID which locally represents the connection structure. Note that the SRC_ID cannot be zero. It sends a START message with the REPLY_REQUESTED bit set. In this initial START message the DEST_ID must be set to zero, or the call is rejected, since a connection id has not yet been allocated at the other end. The flags set in the START packet are taken to represent the type of connection being requested. The application can also set its receive window size.
If it does not, the default starting window size of 128 KB is used. Process A must also randomly generate a 10-bit SYNC_NO value, which is placed in the START packet. After sending the START packet, a CONN_TIMER is started.

Process B gets the connection request and examines the source address as well as the type of connection being requested. If Process B does not want to connect, it issues a REJ packet. When the REJ packet is received by Process A, the connection is immediately shut down and all structures associated with the connection are released.

If Process B accepts the connection, it sends a START packet with the REPLY flag set, indicating that it has accepted the connection. It first takes the SRC_ID of the incoming packet and makes it the DEST_ID of the outgoing packet. It then chooses a local number, src_id, and sets the SRC_ID of the outgoing packet to src_id. Note that the value of src_id cannot be zero. It then sets the same flags as the incoming packet. Since SP is meant to be quick, there is no QoS negotiation built into the protocol, so if Process B does not want the same type of connection as Process A, it must reject the connection. Process B then generates its window size and also generates a random 10-bit SYNC_NO value, which it sends back to Process A.

When Process A receives a START packet with the REPLY bit set and the DEST_ID equal to the SRC_ID of its own START packet, it knows that the connection has been accepted. It stops the CONN_TIMER and fills out the rest of the connection structure. It is also worth pointing out that in SP the SYNC_NOs are crossed: Process B must use the SYNC_NO generated by Process A in the original START packet, and Process A must use the SYNC_NO generated by Process B in the reply packet. This helps to prevent replay attacks. Both Process A and Process B then move to the CONNECTED state.
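The set-up exchange above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the SP implementation: the numeric packet-type values, the flag bit positions (BIT (0) taken as the least significant bit), the field widths in struct sp_hdr and the helper names sp_make_start, sp_accept_start and sp_start_accepted are all hypothetical, since Figures 1 and 2 define the real layout. rand() is used only to keep the sketch self-contained; it is not a suitable source of randomness for hindering replay.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Names from the spec; numeric values and bit positions are assumptions. */
enum sp_type { SP_START = 1, SP_REJ, SP_CNTL, SP_DATA, SP_ACK, SP_NACK, SP_ECHO, SP_END };

#define SP_F_WINDOW_VALID    (1u << 0)
#define SP_F_ST_CKS          (1u << 1)
#define SP_F_ST_RTR          (1u << 2)
#define SP_F_REPLY_REQUESTED (1u << 5)
#define SP_F_REPLY           (1u << 6)
#define SP_CONN_TYPE_MASK    (SP_F_ST_CKS | SP_F_ST_RTR)

#define SP_SYNC_MASK      0x3FFu          /* SYNC_NO is 10 bits */
#define SP_WINDOW_MAX     0x3FFFFFu       /* WINDOW_SIZE is 22 bits */
#define SP_DEFAULT_WINDOW (128u * 1024u)  /* 128 KB default receive window */

/* Hypothetical unpacked view of the header fields used during set-up. */
struct sp_hdr {
    uint32_t dest_id, src_id, window_size;
    uint8_t  type, flags;
    uint16_t sync_no;
};

/* Hypothetical connection record. */
struct sp_conn {
    uint32_t local_id, remote_id;
    uint16_t sync_no_tx;   /* SYNC_NO this side sends: the peer's value (crossed) */
};

static uint16_t sp_random_sync(void)
{
    return (uint16_t)(rand() & SP_SYNC_MASK);   /* illustrative only */
}

/* Process A: build a START request. rx_window == 0 selects the default. */
struct sp_hdr sp_make_start(uint32_t local_id, uint8_t conn_type, uint32_t rx_window)
{
    assert(local_id != 0 && rx_window <= SP_WINDOW_MAX);  /* zero id is invalid */
    struct sp_hdr h = {0};
    h.dest_id = 0;                 /* no id allocated at the other end yet */
    h.src_id = local_id;
    h.type = SP_START;
    h.flags = (uint8_t)((conn_type & SP_CONN_TYPE_MASK) | SP_F_REPLY_REQUESTED | SP_F_WINDOW_VALID);
    h.window_size = rx_window ? rx_window : SP_DEFAULT_WINDOW;
    h.sync_no = sp_random_sync();
    return h;
}

/* Process B: returns 1 and fills *reply and *conn on acceptance, or 0 if a REJ
   should be sent. No QoS negotiation: a different connection type is rejected. */
int sp_accept_start(const struct sp_hdr *req, uint32_t local_id, uint8_t wanted_type,
                    uint32_t rx_window, struct sp_hdr *reply, struct sp_conn *conn)
{
    assert(local_id != 0 && rx_window <= SP_WINDOW_MAX);
    if (req->type != SP_START || req->dest_id != 0 || req->src_id == 0)
        return 0;
    if ((req->flags & SP_CONN_TYPE_MASK) != (wanted_type & SP_CONN_TYPE_MASK))
        return 0;
    struct sp_hdr r = {0};
    r.dest_id = req->src_id;       /* incoming SRC_ID becomes our DEST_ID */
    r.src_id = local_id;
    r.type = SP_START;
    r.flags = (uint8_t)((req->flags & SP_CONN_TYPE_MASK) | SP_F_REPLY | SP_F_WINDOW_VALID);
    r.window_size = rx_window ? rx_window : SP_DEFAULT_WINDOW;
    r.sync_no = sp_random_sync();
    *reply = r;
    conn->local_id = local_id;
    conn->remote_id = req->src_id;
    conn->sync_no_tx = req->sync_no;   /* crossed: B uses A's SYNC_NO */
    return 1;
}

/* Process A: recognise the acceptance and complete its connection record. */
int sp_start_accepted(const struct sp_hdr *reply, struct sp_conn *conn)
{
    if (reply->type != SP_START || !(reply->flags & SP_F_REPLY) || reply->dest_id != conn->local_id)
        return 0;
    conn->remote_id = reply->src_id;
    conn->sync_no_tx = reply->sync_no; /* crossed: A uses B's SYNC_NO */
    return 1;
}
```

Process B calls sp_accept_start on the incoming START, and a return value of 0 corresponds to sending a REJ. The crossing of the SYNC_NOs is visible in the two sync_no_tx assignments.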
Use of the SC Field
When a client wishes to talk to a server, it must first get the IP address and the scope of the server; this information will be stored in the DNS. The client asks to be connected to the server and includes the scope of the server. The SP protocol checks whether the server is reachable according to the specified scope. If not, the connection request is rejected. If the request is admissible, SP sends a START packet to the IP address of the server along with the scope of the server. On receiving a START packet with its REPLY_REQUESTED bit set, the server looks at the destination scope carried in the incoming packet. If the scope does not match, or the source IP address is not within the receiving server's scope, a REJ packet is sent back to the client.

Data Transmission
After the connection is made, i.e. both processes are in the CONNECTED state, they can begin to exchange data. In SP, data is sent using messages, and each message can be divided into a number of blocks. The total number of blocks in a message is given by the TBLOCK parameter in the SP header. Each block of the message is sent as one SP packet, and the particular block is given by the PBLOCK parameter in the SP header. In SP, it is recommended that if a message is composed of several blocks, each block except the last should be of the same size. This allows the receiver to allocate memory for the entire message at the start of the message transfer.

Every message is uniquely identified by the MESS_SEQ_NO parameter. When the receiver gets a DATA packet that starts a message, it increases the MESS_ACK_NO for that connection. The total number of blocks in the message, given by TBLOCK, is used to set the local variable tblock_rx in the connection structure.
The block number, PBLOCK, is used to set the local variable pblock_rx; the first block of a message has PBLOCK equal to zero, so pblock_rx is then set to zero. When the last block of a message is received, i.e. PBLOCK is equal to TBLOCK - 1, the receiver sets the End-of-Message flag (EOF), which is sent on outgoing packets. This indicates to the sender that the message given by MESS_ACK_NO has been completely received, so the sender can de-allocate the blocks of that message.

Now we look at the issue of acknowledgements. In SP, it is the sender's responsibility to manage the acknowledgement of the data sent to the receiver. This allows the sender to choose an appropriate rate of acknowledgements according to the data being sent. In order to get an acknowledgement from the receiver, the sender must request it, by setting the REPLY_REQUESTED flag in a DATA packet or in an ACK packet. When the receiver gets a DATA or ACK packet with the REPLY_REQUESTED bit set, it sends an acknowledgement with the REPLY bit set, indicating that it is a reply to a request.

When the REPLY_REQUESTED bit is set in a DATA or ACK packet, an ACK_TIMER is set. If this timer expires, an ACK packet is sent with the REPLY_REQUESTED bit set. This is repeated a number of times given by the ACK_RETRANS count. After that, the data is retransmitted and the RETRANS_TIMER is started; each time it expires the data is retransmitted again, up to a number of times given by RETRANS_COUNT, after which the connection is dropped.

We now look at retransmission. SP does not start a retransmission timer when a data packet is sent with the REPLY_REQUESTED bit set. Instead, as described above, it starts an ACK_TIMER, which is used to get an acknowledgement from the other side before it starts to retransmit the data.
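The escalation from ACK requests to retransmission to dropping the connection can be expressed as a small counter-based state machine. The sketch below assumes the count of 3 given in the TIMERS section for both ACK_RETRANS and RETRANS_COUNT, and reads the text as allowing RETRANS_COUNT retransmissions in total; the enum values and function names are hypothetical, and the caller is assumed to own the actual timers.

```c
#include <assert.h>

/* Retry limits named in the text; the TIMERS section gives 3 for both. */
#define ACK_RETRANS   3
#define RETRANS_COUNT 3

/* Hypothetical actions returned to the caller that owns the timers. */
enum sp_action { SP_SEND_ACK_REQUEST, SP_RETRANSMIT_DATA, SP_DROP_CONNECTION };

struct sp_ack_state {
    int ack_requests;   /* ACKs with REPLY_REQUESTED sent since the last reply */
    int retransmits;    /* retransmissions of outstanding data since the last reply */
};

/* Called each time the ACK_TIMER (or, later, the RETRANS_TIMER) expires. */
enum sp_action sp_on_timer_expiry(struct sp_ack_state *s)
{
    assert(s->ack_requests <= ACK_RETRANS && s->retransmits <= RETRANS_COUNT);
    if (s->ack_requests < ACK_RETRANS) {
        s->ack_requests++;
        return SP_SEND_ACK_REQUEST;   /* send ACK + REPLY_REQUESTED, restart ACK_TIMER */
    }
    if (s->retransmits < RETRANS_COUNT) {
        s->retransmits++;
        return SP_RETRANSMIT_DATA;    /* resend outstanding data, start RETRANS_TIMER */
    }
    return SP_DROP_CONNECTION;
}

/* An ACK with the REPLY bit set ends the escalation. */
void sp_on_reply(struct sp_ack_state *s)
{
    s->ack_requests = 0;
    s->retransmits = 0;
}
```

Each expiry moves the connection one step further along the sequence; receiving a reply calls sp_on_reply, so the next request starts again from plain ACK requests.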
SP also uses a RETRANS bit (ST_RETRANS) in the header to indicate that a packet has been retransmitted. Because setting this bit changes the header, a packet must be checksummed again when it is first retransmitted. This may seem problematic, but it actually makes handling data at the receiver much easier.

Closing the Connection
When one side, say Process A, wants to close a connection, it sends an END packet with the REPLY_REQUESTED bit set. A very important point is that an END packet is treated as a terminal DATA packet and is therefore given its own MESS_SEQ_NO. On the side requesting the close, the state of the connection goes to END_REQUESTED_LOCAL. Once this state is set, this side can no longer send new DATA packets. It starts an END_TIMER, which is used to check that the initial END packet has been received. If the END_TIMER expires, the END packet is retransmitted, up to a number of times (NOTE: the END packet does not need to set the retransmission bit when it is resent).

SP treats a connection, with respect to data transfer, as two unidirectional streams. This means that shutting down data transfer in one direction does not immediately shut down the data flow in the other direction: the other side can still send data packets until it is willing to shut down the connection.

When the other side, say Process B, receives an END packet with the REPLY_REQUESTED bit set, it first increments the MESS_ACK_NO and sets the connection to END_REQUEST_REMOTE. There are two possible further responses. The first is to close the connection immediately; this would be the choice of a server on the request of a client, for example. In that case the receiver sets the connection to CLOSING and sends an END packet with the REPLY bit set. Once this packet is sent, the connection goes to CLOSED, at which point the resources for the connection can be reclaimed.
When Process A receives the END packet with the REPLY bit set, it cancels the END_TIMER and goes to CLOSING and then CLOSED.

However, if the receiver of the original END packet still has data to send, it replies to the END packet with an ACK packet with the REPLY bit set. The originating sender treats this as an acknowledgement that its END packet was received, but also as an indication that the other side has not finished sending its data. When the other side has no more new data to generate, the protocol can queue an END packet at the end of the send queue, and the connection state is set to CLOSING to indicate that no new data can be sent, although data already on the send queue can still be transmitted. After the data on the send queue has been sent, the END packet is sent.

There are also abnormal ways of ending connections. The first is by setting the REMOTE_RESET bit in the flags field of any packet that is part of a valid connection. This indicates that an abnormal condition has occurred and the connection must immediately be torn down.

There are other cases of abnormal termination. For example, suppose BOTH sides send an END packet with REPLY_REQUESTED at around the same time. In this case both sides will be in the END_REQUESTED_LOCAL state. When each then receives an END packet with the REPLY_REQUESTED bit set, both go to CLOSING and then CLOSED without sending an END packet with the REPLY bit set. This is called an opportunistic close. However, if in this scenario one of the END packets gets lost, the side of the connection that receives an END packet will close the connection. The END_TIMER on the other side will expire, and so its END packet will be resent. By then the connection will either be in the CLOSED state or will no longer exist because its resources have been reclaimed. In both cases a REJ packet must be sent to the other end, forcing a complete shutdown.
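The handling of an incoming END packet described in this section can be summarised as a transition function over the connection states listed earlier. This is a sketch under stated assumptions: the enum sp_resp names are hypothetical, CLOSING is returned where the text passes through CLOSING on the way to CLOSED (the caller is assumed to move to CLOSED once any reply has been sent and resources can be reclaimed), and cases the text does not describe simply leave the state unchanged.

```c
#include <assert.h>

/* Connection states with the values listed in the Connection States section. */
enum sp_state {
    NOT_INUSE = 0, CONN_REQUESTED = 1, CONNECTED = 2, END_REQUESTED_LOCAL = 3,
    END_REQUEST_REMOTE = 4, CLOSING = 5, CLOSED = 6
};

/* What to transmit in response (hypothetical names). */
enum sp_resp { SEND_NOTHING, SEND_END_REPLY, SEND_ACK_REPLY, SEND_REJ };

/* Local close: the caller then sends END with REPLY_REQUESTED and starts END_TIMER. */
enum sp_state sp_local_close(enum sp_state st)
{
    assert(st == CONNECTED);
    return END_REQUESTED_LOCAL;
}

/* Incoming END packet. reply_bit is 1 if it carries REPLY (0 means REPLY_REQUESTED);
   more_data is 1 if this side still has data to send. */
enum sp_state sp_on_end(enum sp_state st, int reply_bit, int more_data, enum sp_resp *out)
{
    *out = SEND_NOTHING;
    switch (st) {
    case CONNECTED:                    /* the peer wants to close */
        if (reply_bit)
            return st;                 /* not described in the text; ignored here */
        if (more_data) {
            *out = SEND_ACK_REPLY;     /* acknowledge the END, keep sending data */
            return END_REQUEST_REMOTE;
        }
        *out = SEND_END_REPLY;         /* close immediately */
        return CLOSING;
    case END_REQUESTED_LOCAL:
        /* reply_bit: the peer answered our END, so cancel END_TIMER.
           !reply_bit: an END request while ours is outstanding, the
           opportunistic close; no END reply is sent. */
        return CLOSING;
    case CLOSED:
    case NOT_INUSE:                    /* resent END for a connection already gone */
        *out = SEND_REJ;
        return st;
    default:
        return st;
    }
}
```

In the delayed-close case the connection stays in END_REQUEST_REMOTE; queuing the local END once no new data remains, as described above, is left to the caller in this sketch.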
Operation of the SYNC_NO
SYNC_NOs were included to track whether ACKs have been correctly received, so the SYNC_NO is only increased when a unique ACK has been received. A unique ACK acknowledges packets that have not yet been acknowledged. When a process receives a unique ACK, it increases the SYNC_NO on its outgoing packets, indicating that the ACK was correctly received.

The real purpose of the SYNC_NO is to avoid unnecessary retransmission due to lost ACKs. Consider the following scenario. Process A sends a message to Process B with the REPLY_REQUESTED bit set in the last data packet, and starts the ACK_TIMER as described above. Process B gets the message, including the last packet. In the connection structure, two variables are kept to monitor the SYNC_NOs coming from the other end: sync_no_rx and sync_no_rx_expected. The first is the SYNC_NO received in the last packet, while the second is the value the process expects it to be. When Process B sends the requested ACK, with the REPLY bit set, to Process A, it increments sync_no_rx_expected, as it assumes that Process A will get the ACK and increment the SYNC_NO in its header. Now suppose that the ACK gets lost. At Process A, the ACK_TIMER expires and Process A sends an ACK with the REPLY_REQUESTED bit set. When Process B gets this ACK request, it checks the two variables. If sync_no_rx is equal to sync_no_rx_expected, then some or all of the data packets sent by Process A did not reach Process B, and so Process B has not sent the acknowledgement. At this point Process B can choose to ignore the ACK packet, since it knows that the data will eventually be retransmitted, or it can send an ACK packet with the REPLY bit set to speed up the interaction. If the two variables are different, the original ACK was lost.
So Process B is obliged to send an ACK packet with the REPLY bit set in order to stop the retransmission of data packets. SYNC_NOs also have another function: they introduce randomness into the system and can therefore be used to hinder the replay of SP connections.

Control Packet
SP also supports control packets. A control packet is an in-stream mechanism for sending control information or commands. Control packets are sent reliably, independent of the type of data connection being used, so they are always sent with the REPLY_REQUESTED bit set, and each control packet has its own MESS_SEQ_NO. Control packets are handled very differently from data packets. Firstly, control packets must be dealt with individually, so only one control packet can be outstanding at any point in time. Secondly, control packets are handled differently at the receiver. In order to handle a control packet, the receiving application must install a control function in the connection structure. When the receiver gets a control packet, this function is called. If there is no function, the packet is discarded.

Echo packet
SP supports measuring the latency of a network connection. Senders may use ECHO packets to adjust their sending rate or manage the congestion window. The idea is that a long-term stream can monitor the latency on the network and adapt sensibly to maintain the speed of the system. To do this, the connection transmits ECHO packets periodically. Every time an ECHO packet is sent, the time is recorded and the REPLY_REQUESTED bit is set. When an ECHO packet is received, the receiver immediately sends an ECHO packet with the REPLY bit set. When this reply is received, the time is taken again and the network latency of the connection is measured. The ECHO_TIMER is used to make sure that ECHO packets are sent periodically.
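The receiver's SYNC_NO decision and the ECHO latency measurement each reduce to a few lines of C. In this sketch the struct and function names are hypothetical (only sync_no_rx and sync_no_rx_expected come from the text), the 10-bit SYNC_NO is assumed to wrap modulo 1024, and timestamps are passed in by the caller in microseconds rather than read from any particular clock API. The value measured from an ECHO request to its reply is a round-trip time.

```c
#include <assert.h>
#include <stdint.h>

#define SP_SYNC_MASK 0x3FFu   /* SYNC_NO is 10 bits, so it wraps modulo 1024 */

/* Receive-side SYNC_NO tracking, using the variable names from the text. */
struct sp_rx_sync {
    uint16_t sync_no_rx;           /* SYNC_NO carried in the last packet from the peer */
    uint16_t sync_no_rx_expected;  /* SYNC_NO expected once the peer has our ACK */
};

/* After sending a unique ACK, expect the peer to advance its SYNC_NO. */
void sp_sent_unique_ack(struct sp_rx_sync *s)
{
    s->sync_no_rx_expected = (uint16_t)((s->sync_no_rx_expected + 1u) & SP_SYNC_MASK);
}

/* An ACK with REPLY_REQUESTED arrived carrying sync_no. Returns 1 if an ACK
   with the REPLY bit must be sent; 0 means replying is optional, because
   data is still missing and will be retransmitted anyway. */
int sp_ack_request_needs_reply(struct sp_rx_sync *s, uint16_t sync_no)
{
    assert(sync_no <= SP_SYNC_MASK);
    s->sync_no_rx = sync_no;
    return s->sync_no_rx != s->sync_no_rx_expected;  /* differ: our ACK was lost */
}

/* ECHO round trip, with caller-supplied timestamps in microseconds. */
struct sp_echo {
    uint64_t sent_at_us;
    int outstanding;
};

void sp_echo_sent(struct sp_echo *e, uint64_t now_us)
{
    e->sent_at_us = now_us;
    e->outstanding = 1;
}

/* Returns the measured round-trip latency, or -1 if no ECHO is outstanding. */
int64_t sp_echo_reply(struct sp_echo *e, uint64_t now_us)
{
    if (!e->outstanding)
        return -1;
    e->outstanding = 0;
    return (int64_t)(now_us - e->sent_at_us);
}
```

With sync_no_rx_expected at 6 after a unique ACK, a later ACK request still carrying 5 shows that the ACK was lost and must be answered, while one carrying 6 may be ignored.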
A Common Interface
It is important that SP has a common interface which is used by all applications. The prototype implementation of SP runs over UDP using port 6122, so the maximum size of an SP packet is about 64 KB less the 8 bytes of the UDP header. Presently the following application interface is used.

int open_sp(unsigned int dest_IP_address, int conn_type, int max_mtu, int pri, int scope)
This call opens an SP connection to a machine with a given IP address. The connection type is set using ST_CKS and ST_RTR as follows:
UNRELIABLE: ST_CKS = 0, ST_RTR = 0
FEC: ST_CKS = 1, ST_RTR = 0
RTR: ST_CKS = 0, ST_RTR = 1
RELIABLE: ST_CKS = 1, ST_RTR = 1
The max_mtu is the maximum message size that will be sent on this connection by the machine requesting the connection. The pri parameter is the priority of the connection, and scope is the scope of the destination process. This call returns the src_id for this connection, which is the handle for this connection on this machine.

int close_sp(int src_id)
This closes a connection.

int recv_buffer(int src_id, struct sp_pckt_q *pcktq, int *status, int recv_flags)
This receives data on a connection. The src_id is the connection handle, and the pcktq parameter is a pointer to a queue of received packets that form a message. The status variable is used by the protocol to signal various conditions to the receiver, including when the window is full. The recv_flags parameter is not currently used and is usually set to zero; it could be used to signal that the receiver should not block if there is no data.

int send_buffer(int src_id, struct sp_pckt_b *sbp, int pblock, int tblock, int *status, int send_flags)
This is the basic send call, which sends an SP packet on the connection given by src_id. It is up to the application to break a message down into a number of packets and then call send_buffer to send the individual packets on that connection.
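As a sketch of the application side of send_buffer, the code below splits a message into equal-sized blocks, with only the last allowed to be shorter as recommended under Data Transmission, and sends them with PBLOCK running from 0 to TBLOCK - 1. The layout of struct sp_pckt_b, the SP_HDR_SPACE and SP_BLOCK_SIZE values, and the body of send_buffer are hypothetical stand-ins so that the example compiles on its own; a real program would link against the SP library instead of defining send_buffer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SP_HDR_SPACE  32      /* hypothetical: bytes reserved for the SP header */
#define SP_BLOCK_SIZE 1024    /* hypothetical per-packet payload size */

/* Hypothetical packet buffer: header space followed by data. */
struct sp_pckt_b {
    unsigned char buf[SP_HDR_SPACE + SP_BLOCK_SIZE];
    size_t data_len;
};

/* Stand-in for the documented send_buffer() so the sketch runs on its own;
   it only counts calls and records the last PBLOCK/TBLOCK it was given. */
static int sent_blocks, last_pblock, last_tblock;
int send_buffer(int src_id, struct sp_pckt_b *sbp, int pblock, int tblock,
                int *status, int send_flags)
{
    (void)src_id; (void)sbp; (void)send_flags;
    sent_blocks++;
    last_pblock = pblock;
    last_tblock = tblock;
    *status = 0;
    return 0;
}

/* TBLOCK: number of blocks needed for a message of len bytes. */
int sp_tblock(size_t len, size_t block)
{
    assert(block > 0 && len > 0);
    return (int)((len + block - 1) / block);
}

/* Receiver side: the block with PBLOCK == TBLOCK - 1 completes the message. */
int sp_is_last_block(int pblock, int tblock)
{
    return pblock == tblock - 1;
}

/* Split msg into SP_BLOCK_SIZE blocks (only the last may be shorter) and send
   each one with PBLOCK = 0 .. TBLOCK - 1. Returns 0 on success, -1 on error. */
int sp_send_message(int src_id, const unsigned char *msg, size_t len)
{
    assert(msg != NULL && len > 0);
    int tblock = sp_tblock(len, SP_BLOCK_SIZE);
    for (int pblock = 0; pblock < tblock; pblock++) {
        struct sp_pckt_b pkt;
        size_t off = (size_t)pblock * SP_BLOCK_SIZE;
        size_t n = len - off < SP_BLOCK_SIZE ? len - off : SP_BLOCK_SIZE;
        memcpy(pkt.buf + SP_HDR_SPACE, msg + off, n);  /* leave room for header */
        pkt.data_len = n;
        int status;
        if (send_buffer(src_id, &pkt, pblock, tblock, &status, 0) != 0)
            return -1;
    }
    return 0;
}
```

A 2500-byte message with 1024-byte blocks is sent as three packets: two full blocks and a final 452-byte block with PBLOCK equal to 2 and TBLOCK equal to 3.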
The send_buffer call assumes that the SP packet passed in the argument has space for an SP header and data. It fills out the SP header and sends the packet. The send_flags parameter is normally set to zero; it can be used to indicate whether the sender is willing to block if the window for the connection is full.

int register_cntl_function(int src_id, int (*contl_funct)(struct sp_pckt_b *))
This call registers a control function, which is called when a control packet is received from the other side of the connection.

int deregister_cntl_function(int src_id)
This call deregisters the control function.

int send_cntl_buffer(int src_id, struct sp_pckt_b *sbp)
This call sends a CNTL packet to the other side of the connection.

Future of SP
A prototype version of SP has been developed. It is hoped that a reference release will be made by summer 2012.