Checksum ‘offloading’

A look at how the Pro1000 NICs can be programmed to compute and insert TCP/IP checksums

Network efficiency

• Last time (in our ‘nictcp.c’ demo) we saw how much work a CPU has to do when setting up an ethernet packet for transmission in the TCP/IP protocol format
• In a busy network this packet-computation becomes a ‘bottleneck’ that degrades overall system performance
• But a lot of that work can be ‘offloaded’!

The ‘loops’ are costly

• To prepare for a packet-transmission, the device-driver has to execute a few dozen assignment-statements, to set up fields in the packet’s ‘headers’ and in the Transmit Descriptor that will be used by the NIC
• Most of these assignments involve simple memory-to-memory copying of parameters
• But the ‘checksum’ fields require ‘loops’

Can’t ‘unroll’ checksum-loops

• One programming technique for speeding up loop-execution is known as ‘unrolling’; it avoids the ‘test-and-branch’ inefficiency:

    int sum = 0;
    sum += wp[0];
    sum += wp[1];
    sum += wp[2];
    …
    sum += wp[99];

• But it requires knowing in advance how many loop-iterations will be needed, and the length of our packet-data varies

The ‘offload’ solution

• Modern network controllers can be built to perform TCP/IP checksum calculations on packet-data as it is being fetched from ram
• This relieves the CPU from having to do the most intense portion of packet preparation
• But ‘checksum offloading’ is an optional capability that has to be ‘enabled’, and ‘programmed’ for a specific packet-layout

‘Context’ descriptors

• Intel’s Pro1000 network controllers employ special ‘Context’ Transmit-Descriptors for enabling and configuring the ‘checksum-offloading’ capability
• Two kinds of Context Descriptor are used:
    – An ‘Offload’ Context Descriptor (Type 0)
    – A ‘Data’ Context Descriptor (Type 1)

Context descriptor (type 0)

    First quadword:
      63-48 TUCSE | 47-40 TUCSO | 39-32 TUCSS | 31-16 IPCSE | 15-8 IPCSO | 7-0 IPCSS
    Second quadword:
      63-48 MSS | 47-40 HDRLEN | 39-32 RSV, STA | 31-24 TUCMD | 23-20 DTYP=0 | 19-0 PAYLEN

    DEXT=1 (Extended Descriptor)

Legend:
    IPCSS  (IP CheckSum Start)
    IPCSO  (IP CheckSum Offset)
    IPCSE  (IP CheckSum Ending)
    TUCSS  (TCP/UDP CheckSum Start)
    TUCSO  (TCP/UDP CheckSum Offset)
    TUCSE  (TCP/UDP CheckSum Ending)
    PAYLEN (Payload Length)
    DTYP   (Descriptor Type)
    TUCMD  (TCP/UDP Command)
    STA    (TCP/UDP Status)
    HDRLEN (Header Length)
    MSS    (Maximum Segment Size)

The TUCMD byte

    bit:    7     6      5          4              3    2     1    0
           IDE   SNAP   DEXT (=1)  reserved (=0)   RS   TSE   IP   TCP

Legend:
    IDE  (Interrupt Delay Enable)
    SNAP (Sub-Network Access Protocol)
    DEXT (Descriptor Extension)
    RS   (Report Status)
    TSE  (TCP-Segmentation Enable)
    IP   (Internet Protocol)
    TCP  (Transmission Control Protocol)

Key (per field): always valid, or valid only when TSE=1

Context descriptor (type 1)

    First quadword:
      63-0 ADDRESS
    Second quadword:
      63-48 VLAN | 47-40 POPTS | 39-32 RSV, STA | 31-24 DCMD | 23-20 DTYP=1 | 19-0 DTALEN

    DEXT=1 (Extended Descriptor)

Legend:
    DTALEN (Data Length)
    DTYP   (Descriptor Type)
    DCMD   (Descriptor Command)
    STA    (Status)
    RSV    (Reserved)
    POPTS  (Packet Options)
    VLAN   (VLAN tag)

The DCMD byte

    bit:    7     6     5          4              3    2     1      0
           IDE   VLE   DEXT (=1)  reserved (=0)   RS   TSE   IFCS   EOP

Legend:
    IDE  (Interrupt Delay Enable)
    VLE  (VLAN Enable)
    DEXT (Descriptor Extension)
    RS   (Report Status)
    TSE  (TCP-Segmentation Enable)
    IFCS (Insert Frame CheckSum)
    EOP  (End Of Packet)

Key (per field): always valid, or valid only when EOP=1

Our usage example

• We’ve created a module named ‘offload.c’ which demonstrates the NIC’s checksum-offloading capability for TCP/IP packets
• It’s a modification of our earlier ‘nictcp.c’ character-mode device-driver module
• We have excerpted the main changes in a class-handout; the full version is online

Data-type definitions

    // Our type-definition for the ‘Type 0’ Context-Descriptor
    typedef struct {
        unsigned char   ipcss;
        unsigned char   ipcso;
        unsigned short  ipcse;
        unsigned char   tucss;
        unsigned char   tucso;
        unsigned short  tucse;
        unsigned int    paylen:20;
        unsigned int    dtyp:4;
        unsigned int    tucmd:8;
        unsigned char   status;
        unsigned char   hdrlen;
        unsigned short  mss;
    } TX_CONTEXT_OFFLOAD;

Definitions (continued)

    // Our type-definition for the ‘Type 1’ Context-Descriptor
    typedef struct {
        unsigned long long  base_addr;
        unsigned int        dtalen:20;
        unsigned int        dtyp:4;
        unsigned int        dcmd:8;
        unsigned char       status;
        unsigned char       pkt_opts;
        unsigned short      vlan_tag;
    } TX_CONTEXT_DATA;

    typedef union {
        TX_CONTEXT_OFFLOAD  off;
        TX_CONTEXT_DATA     dat;
    } TX_DESCRIPTOR;

Our packets’ layout

    Ethernet Header (14 bytes)
    IP Header (20 bytes, no options)     HDR CKSUM is 10 bytes into this header (packet offset 24)
    TCP Header (20 bytes, no options)    TCP CKSUM is 16 bytes into this header (packet offset 50)
    Packet-Data (length varies)

How we use contexts

• Our ‘offload.c’ driver will send a ‘Type 0’ Context Descriptor within ‘module_init()’

    txring[ 0 ].off.ipcss = 14;     // IP-header CheckSum Start
    txring[ 0 ].off.ipcso = 24;     // IP-header CheckSum Offset
    txring[ 0 ].off.ipcse = 34;     // IP-header CheckSum Ending

    txring[ 0 ].off.tucss = 34;     // TCP/UDP-segment CheckSum Start
    txring[ 0 ].off.tucso = 50;     // TCP/UDP-segment CheckSum Offset
    txring[ 0 ].off.tucse = 0;      // TCP/UDP-segment CheckSum Ending

    txring[ 0 ].off.dtyp = 0;                   // Type 0 Context Descriptor
    txring[ 0 ].off.tucmd = (1<<5)|(1<<3);      // DEXT=1, RS=1

    iowrite32( 1, io + E1000_TDT );             // give ownership to NIC

Using contexts (continued)

• Our ‘offload.c’ driver will then use a Type 1 context descriptor every time its ‘write()’ function is called to transmit user-data
• The network controller ‘remembers’ the checksum-offloading parameters that we sent during module-initialization, and so it continues to apply them to every outgoing packet (we keep our same packet-layout)

Sequence of ‘write()’ steps

• Adjust the ‘len’ argument (if necessary)
• Copy ‘len’ bytes from the user’s ‘buf’ array
• Prepend the packet’s TCP Header
• Insert the pseudo-header’s checksum
• Prepend the packet’s IP Header
• Prepend the packet’s Ethernet Header
• Initialize the Data-Context Tx-Descriptor
• Give descriptor-ownership to the NIC

The TCP pseudo-header

• We do initialize the TCP Checksum field (but this only needs a short computation)

    +--------------------------------------------------+
    |                Source IP-address                 |
    +--------------------------------------------------+
    |              Destination IP-address              |
    +----------+--------------------+------------------+
    |   Zero   | Protocol-ID (= 6)  | TCP Segment-length |
    +----------+--------------------+------------------+

• The one’s complement sum of these six 16-bit words is placed into ‘TCP Checksum’

Setting up the Type-1 Context

    int txtail = ioread32( io + E1000_TDT );

    txring[ txtail ].dat.base_addr = tx_desc + (txtail * TX_BUFSIZ);
    txring[ txtail ].dat.dtalen = 54 + len;     // 14 + 20 + 20 header bytes, then the data
    txring[ txtail ].dat.dtyp = 1;
    txring[ txtail ].dat.dcmd = 0;
    txring[ txtail ].dat.status = 0;
    txring[ txtail ].dat.pkt_opts = 3;          // IXSM=1, TXSM=1
    txring[ txtail ].dat.vlan_tag = vlan_id;

    txring[ txtail ].dat.dcmd |= (1<<0);        // EOP (End-Of-Packet)
    txring[ txtail ].dat.dcmd |= (1<<3);        // RS (Report Status)
    txring[ txtail ].dat.dcmd |= (1<<5);        // DEXT (Descriptor Extension)
    txring[ txtail ].dat.dcmd |= (1<<6);        // VLE (VLAN Enable)

    txtail = (1 + txtail) % N_TX_DESC;
    iowrite32( txtail, io + E1000_TDT );

In-class demonstration

• We can demonstrate checksum-offloading by using our ‘dram.c’ device-driver to look at the packet that is being transmitted from one of our ‘anchor’ machines, and to look at the packet that gets received by another ‘anchor’ machine
• The checksum-fields (at offsets 24 and 50) do get modified by the network hardware!

In-class exercise

• The NIC can also deal with packets having the UDP protocol-format, but you need to employ different parameters in the Type 0 Context Descriptor and arrange a ‘header’ for the UDP segment that has a different length and arrangement of parameters
• Also the UDP protocol-ID is 17 (=0x11)

UDP Header

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-------------------------------+-------------------------------+
    |          Source Port          |       Destination Port        |
    +-------------------------------+-------------------------------+
    |            Length             |           Checksum            |
    +-------------------------------+-------------------------------+
    |                           Data :::                            |
    +---------------------------------------------------------------+

    Traditional ‘Big-Endian’ representation
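As a possible starting point for the exercise, here is a minimal sketch in C of the pseudo-header sum described under ‘The TCP pseudo-header’, written so that the protocol-ID is a parameter (6 for TCP, 17 for UDP). The function names ‘ones_sum_add’, ‘ones_sum_fold’ and ‘pseudo_header_sum’ are hypothetical and are not taken from ‘offload.c’; the sketch also assumes IPv4 addresses supplied as four bytes in network order, and packets short enough that a 32-bit accumulator cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 'len' bytes to a running sum, reading them as big-endian
   (network-order) 16-bit words.  An odd trailing byte is treated
   as the high byte of a final word whose low byte is zero.
   Hypothetical helper; the name is ours, not from 'offload.c'. */
uint32_t ones_sum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;
    return sum;
}

/* Fold the carries above bit 15 back into the low 16 bits,
   which turns an ordinary sum into a one's complement sum. */
uint16_t ones_sum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's complement sum of the six 16-bit pseudo-header words:
   source address (2 words), destination address (2 words),
   zero/protocol-ID (1 word), and segment length (1 word).
   The result is NOT complemented here. */
uint16_t pseudo_header_sum(const uint8_t src[4], const uint8_t dst[4],
                           uint8_t protocol, uint16_t seg_len)
{
    uint32_t sum = 0;
    sum = ones_sum_add(sum, src, 4);
    sum = ones_sum_add(sum, dst, 4);
    sum += protocol;    /* high byte of this word is the 'Zero' field */
    sum += seg_len;     /* header plus data, in bytes */
    return ones_sum_fold(sum);
}
```

For example, with source 192.168.0.1, destination 192.168.0.2, protocol-ID 17 and a segment length of 12, the six words are 0xC0A8, 0x0001, 0xC0A8, 0x0002, 0x0011 and 0x000C, and the folded sum is 0x8171. The value is returned as a host integer, so it would still have to be stored into the packet's checksum field in network byte order.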