Our ‘xmit1000.c’ driver
Implementing a ‘packet-transmit’ capability with the Intel 82573L network interface controller

Remember ‘echo’ and ‘cat’?
• Your device-driver module (named ‘uart.c’) was supposed to allow two programs running on a pair of adjacent PCs to communicate via a “null-modem” cable

Transmitting…
    $ echo Hello > /dev/uart
    $ _
Receiving…
    $ cat /dev/uart
    Hello
    _

‘Keep it simple’
• Let’s try to implement a ‘write()’ routine for our Intel Pro/1000 ethernet controllers that will provide the same basic functionality as we achieved with our serial UART driver
• It should allow us to transmit a message by using the familiar UNIX ‘cat’ command to redirect output to a character device-file
• Our device-file will be named ‘/dev/nic’

Driver’s components
    my_fops        This ‘struct’ holds one function-pointer (‘write’)
    my_write()     This function will program the actual data-transfer
    my_get_info()  This function will allow us to inspect the transmit-descriptors
    module_init()  This function will detect and configure the hardware, define page-mappings, allocate and initialize the descriptors, start the ‘transmit’ engine, create the pseudo-file, and register ‘my_fops’
    module_exit()  This function will do the needed ‘cleanup’ when it’s time to unload our driver: turn off the ‘transmit’ engine, free the memory, delete the page-table entries, delete the pseudo-file, and unregister ‘my_fops’

kzalloc()
• Linux kernels since 2.6.13 offer this convenient function for allocating pre-zeroed kernel memory
• It has the same syntax as the ‘kmalloc()’ function (described in our texts), but adds the after-effect of zeroing out the newly-allocated memory-area

    void *kmem = kmalloc( region_size, GFP_KERNEL );
    memset( kmem, 0x00, region_size );

    /* can be replaced with */

    void *kmem = kzalloc( region_size, GFP_KERNEL );

• Thus it performs two logically distinct actions (often coupled anyway) within a single function-call

Single page-frame option
One 4KB page-frame holds:
    Packet-Buffer (3-KB) (reused for successive transmissions)
    Descriptor-Buffer (1-KB) (room for up to 64 descriptors of 16 bytes each)

Our Tx-Descriptor ring
After writing the data into our packet-buffer, and writing its length to the current TAIL descriptor, our driver will advance the TAIL index; the NIC responds by reading the current HEAD descriptor, fetching its data, then advancing the HEAD index as it sends our data out over the wire.
[Figure: an array of 8 transmit-descriptors (descriptor 0 through descriptor 7), indexed by TAIL and HEAD, all sharing one ‘reusable’ packet-buffer (1536 bytes)]

‘/proc/xmit1000’
• This pseudo-file can be examined at any time to find out what values (if any) the NIC has ‘written back’ into the transmit-descriptors (i.e., the descriptor-status information) and the current values in registers TDH and TDT:

    $ cat /proc/xmit1000

Direct Memory Access
• The NIC is able to ‘fetch’ descriptors from the host-system’s memory (and can also read the data from our packet-buffer), as well as ‘store’ a status-report back into the host’s memory
• It does this by temporarily becoming the Bus Master: it takes control of the system-bus away from the CPU, so that it can perform the ‘fetch’ and ‘store’ operations directly, without CPU involvement or interference

Configuration registers
    CTRL      Device Control
    CTRL_EXT  Extended Device Control
    TIPG      Transmit Inter-Packet Gap
    TCTL      Transmit Control
    TDBAL     Transmit Descriptor-queue Base-Address (LOW)
    TDBAH     Transmit Descriptor-queue Base-Address (HIGH)
    TDLEN     Transmit Descriptor-queue Length
    TDH       Transmit Descriptor-queue HEAD
    TDT       Transmit Descriptor-queue TAIL
    TXDCTL    Transmit Descriptor-queue Control

The ‘initialization’ sequence
• Detect the network interface controller
• Obtain its i/o-memory address and size
• Remap the i/o-memory into kernel-space
• Allocate memory for buffer and descriptors
• Initialize the array of transmit-descriptors
• Reset the NIC and configure its operations
• Create the ‘/proc/xmit1000’ pseudo-file
• Register our ‘write()’ driver-method
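The page-frame arithmetic and the HEAD/TAIL behavior described above can be tried out in user space. What follows is only a sketch under stated assumptions: ‘struct ring_model’ and the three helper functions are hypothetical names invented for this illustration, the 16-byte descriptor size is the size of one legacy transmit-descriptor, and no real NIC registers are touched.

```c
#include <assert.h>

#define PAGE_BYTES    4096u  /* one page-frame */
#define PKTBUF_BYTES  3072u  /* the 3-KB packet-buffer */
#define DESC_BYTES    16u    /* one legacy transmit-descriptor */
#define N_TX_DESC     8u     /* ring-size used in the diagram */

/* How many descriptors fit in what is left of the page-frame? */
static unsigned descriptor_capacity(void)
{
    return (PAGE_BYTES - PKTBUF_BYTES) / DESC_BYTES;
}

/* Hypothetical software model of the two ring-indices. */
struct ring_model {
    unsigned head;  /* next descriptor the NIC will fetch (models TDH) */
    unsigned tail;  /* next descriptor the driver will fill (models TDT) */
};

/* Driver side: hand one descriptor to the NIC by advancing TAIL. */
static void driver_advance_tail(struct ring_model *r)
{
    r->tail = (r->tail + 1) % N_TX_DESC;
}

/* NIC side: 'send' every descriptor from HEAD up to TAIL, advancing
   HEAD after each one; returns the number of descriptors consumed. */
static unsigned nic_drain(struct ring_model *r)
{
    unsigned sent = 0;
    while (r->head != r->tail) {
        r->head = (r->head + 1) % N_TX_DESC;
        ++sent;
    }
    return sent;
}
```

In this model the ring is idle whenever HEAD equals TAIL, and both indices wrap around after descriptor 7.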
The ‘cleanup’ sequence
• Usually the steps here follow those in the initialization sequence, but in backwards order:
• Unregister the device-driver’s file-operations
• Delete the ‘/proc/xmit1000’ pseudo-file
• Disable the NIC’s ‘transmit’ engine
• Release the allocated kernel-memory
• Unmap the NIC’s i/o-memory region

Our ‘write()’ algorithm
• Get index of the current TAIL descriptor
• Confine the amount of user-data
• Copy user-data into the packet-buffer
• Set up the packet’s Ethernet Header
• Set up packet-length in the TAIL descriptor
• Now hand over this descriptor to the NIC (by advancing the value in register TDT)
• Tell the kernel how many bytes were sent

Recall Tx-Descriptor Layout
    offset   contents (bit 31 at left, bit 0 at right)
    0x0      Buffer-Address low (bits 31..0)
    0x4      Buffer-Address high (bits 63..32)
    0x8      CMD | CSO | Packet Length (in bytes)
    0xC      special | CSS | reserved =0 | status

    Buffer-Address = the packet-buffer’s 64-bit address in physical memory
    Packet-Length = number of bytes in the data-packet to be transmitted
    CMD = Command-field
    CSO/CSS = Checksum Offset/Start (in bytes)
    STA = Status-field

Suggested C syntax

    typedef struct {
        unsigned long long  base_addr;
        unsigned short      pkt_length;
        unsigned char       cksum_off;
        unsigned char       desc_cmd;
        unsigned char       desc_stat;
        unsigned char       cksum_org;
        unsigned short      special;
    } TX_DESCRIPTOR;

Transmit IPG (0x0410)
IPG = Inter-Packet Gap
    bits 31..30   R (reserved, =0)
    bits 29..20   IPG After Deferral (Recommended value = 7)
    bits 19..10   IPG Part 1 (Recommended value = 8)
    bits 9..0     IPG Back-To-Back (Recommended value = 8)

This register controls the Inter-Packet Gap timer for the Ethernet controller. Note that the recommended TIPG register-value to achieve IEEE 802.3 compliant minimum transfer IPG values in full- and half-duplex operations would be 0x00702008, equal to (7<<20) | (8<<10) | (8<<0).
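The ‘write()’ steps, the descriptor layout, and the TIPG value above can be checked with a user-space model. This is a sketch under assumptions, not the driver’s actual code: ‘struct tx_model’, ‘model_write()’, ‘tipg_recommended()’, the 1500-byte cap, and the EtherType 0x88B5 (an IEEE ‘local experimental’ value) are all choices made for this illustration. The descriptor’s command-bits and physical buffer-address are left unset because they only make sense inside the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Legacy transmit-descriptor, field-names as in the suggested C syntax. */
typedef struct {
    unsigned long long  base_addr;   /* packet-buffer's physical address */
    unsigned short      pkt_length;  /* number of bytes to transmit */
    unsigned char       cksum_off;   /* CSO */
    unsigned char       desc_cmd;    /* CMD */
    unsigned char       desc_stat;   /* STA (written back by the NIC) */
    unsigned char       cksum_org;   /* CSS */
    unsigned short      special;
} TX_DESCRIPTOR;

static_assert(sizeof(TX_DESCRIPTOR) == 16, "descriptor must occupy 16 bytes");

/* Recommended TIPG value, built from its three fields. */
static uint32_t tipg_recommended(void)
{
    return (7u << 20) | (8u << 10) | (8u << 0);
}

#define N_TX_DESC         8u
#define HDR_BYTES         14u     /* dst(6) + src(6) + type(2) */
#define MAX_PAYLOAD       1500u   /* assumed cap on user-data */
#define ASSUMED_ETHERTYPE 0x88B5u /* assumption: experimental EtherType */

/* Hypothetical stand-in for the driver's ring, buffer, and register TDT. */
struct tx_model {
    TX_DESCRIPTOR  ring[N_TX_DESC];
    unsigned char  pktbuf[HDR_BYTES + MAX_PAYLOAD];
    unsigned       tail;
};

static size_t model_write(struct tx_model *m, const void *data, size_t len,
                          const unsigned char dst[6],
                          const unsigned char src[6])
{
    unsigned txtail = m->tail;                 /* get current TAIL index */
    if (len > MAX_PAYLOAD) len = MAX_PAYLOAD;  /* confine the user-data */
    /* a kernel driver would use copy_from_user() for this copy */
    memcpy(m->pktbuf + HDR_BYTES, data, len);  /* copy the user-data */
    memcpy(m->pktbuf + 0, dst, 6);             /* set up Ethernet header */
    memcpy(m->pktbuf + 6, src, 6);
    m->pktbuf[12] = (unsigned char)(ASSUMED_ETHERTYPE >> 8);
    m->pktbuf[13] = (unsigned char)(ASSUMED_ETHERTYPE & 0xFF);
    m->ring[txtail].desc_stat = 0;             /* clear any old status */
    m->ring[txtail].pkt_length = (unsigned short)(HDR_BYTES + len);
    m->tail = (txtail + 1) % N_TX_DESC;        /* hand over to the 'NIC' */
    return len;                                /* bytes accepted */
}
```

Returning the confined length (rather than the requested length) lets a caller such as ‘echo’ or ‘cat’ see a short write and retry with the remaining bytes.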
Transmit Control (0x0400)
    bits 31..29   R (reserved)
    bit  28       MULR
    bits 27..26   TXCSCMT
    bit  25       UNORTX
    bit  24       RTLC
    bit  23       R (reserved, =0)
    bit  22       SWXOFF
    bits 21..12   COLD (Collision Distance)
    bits 11..4    CT (Collision Threshold)
    bit  3        PSP
    bit  2        R (reserved, =0)
    bit  1        EN
    bit  0        R (reserved, =0)

    EN = Transmit Enable
    PSP = Pad Short Packets
    CT = Collision Threshold (=0xF)
    COLD = Collision Distance (=0x3F)
    SWXOFF = Software XOFF Transmission
    RTLC = Retransmit on Late Collision
    UNORTX = Underrun No Re-Transmit
    TXCSCMT = TxDescriptor Minimum Threshold
    MULR = Multiple Request Support

Our driver’s elections
Here’s a C programming style that ‘documents’ the programmer’s choices.

    int tx_control = 0;
    tx_control |= (0<<1);    // EN-bit (Enable Transmit Engine)
    tx_control |= (1<<3);    // PSP-bit (Pad Short Packets)
    tx_control |= (15<<4);   // CT=15 (Collision Threshold)
    tx_control |= (63<<12);  // COLD=63 (Collision Distance)
    tx_control |= (0<<22);   // SWXOFF-bit (Software XOFF Tx)
    tx_control |= (1<<24);   // RTLC-bit (Re-Transmit on Late Collision)
    tx_control |= (0<<25);   // UNORTX-bit (Underrun No Re-Transmit)
    tx_control |= (0<<26);   // TXCSCMT=0 (Tx-descriptor Min Threshold)
    tx_control |= (0<<28);   // MULR-bit (Multiple Request Support)
    iowrite32( tx_control, io + E1000_TCTL );  // Transmit Control register

An ‘e1000.c’ anomaly?
• The official Linux kernel is delivered with a device-driver supporting several of Intel’s ‘Pro/1000’ gigabit ethernet controllers
• Often this driver will get loaded by default during the system’s startup procedures
• But it will interfere with your own driver if you try to write a substitute for ‘e1000.ko’
• So you will want to remove it with ‘rmmod’

Side-effect of ‘rmmod’
• We’ve observed an unexpected side-effect of ‘unloading’ the ‘e1000.ko’ device-driver
• The PCI Configuration Space’s command register gets modified in a way that keeps the NIC from working with your own driver
• Specifically, the Bus Mastering capability gets disabled (by clearing bit #2 in the PCI Configuration Space’s word at address 4)

What to do about it?
• This effect doesn’t arise on our ‘anchor’ cluster machines, but you may encounter it when you try using our demo elsewhere
• Here’s the simple “fix” to turn Bus Master capability back on (in your ‘module_init()’)

    u16 pci_cmd;                                  // declares a 16-bit variable
    pci_read_config_word( devp, 4, &pci_cmd );    // read current word
    pci_cmd |= (1<<2);                            // turn on the Bus Master enabled-bit
    pci_write_config_word( devp, 4, pci_cmd );    // write modification

In-class demo
• We demonstrate our ‘xmit1000.c’ driver on an ‘anchor’ machine, with some help from a companion-module (named ‘recv1000.c’) which is soon to be discussed in class

Transmitting… (anchor01)
    $ echo Hello > /dev/nic
    $ _
Receiving… (anchor05)
    $ cat /dev/nic
    Hello
    _
(The two stations are connected by the LAN.)

In-class exercise
• Open three or more terminal-windows on your PC’s graphical desktop, and log in to a different ‘anchor’ machine in each one
• Install the ‘xmit1000.ko’ module on one of the anchor machines, and then install our ‘recv1000.ko’ module on the other stations
• Execute the ‘cat /dev/nic’ command on the receiver-stations, and then run an ‘echo’ command on the transmitter-station
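The two bit-level settings discussed in these slides (the Transmit Control ‘elections’ and the Bus Master “fix”) can be verified in user space before any hardware is touched. This is a minimal sketch, assuming nothing about the real driver beyond the bit-positions shown earlier; the function names and macros below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Compose the Transmit Control value from the driver's elections.
   'enable' selects the EN-bit; the slides write EN=0 at first. */
static uint32_t tctl_elections(int enable)
{
    uint32_t tx_control = 0;
    tx_control |= (enable ? 1u : 0u) << 1;  /* EN-bit */
    tx_control |= 1u << 3;                  /* PSP-bit */
    tx_control |= 15u << 4;                 /* CT=15 */
    tx_control |= 63u << 12;                /* COLD=63 */
    tx_control |= 1u << 24;                 /* RTLC-bit */
    /* SWXOFF, UNORTX, TXCSCMT and MULR are all elected as zero */
    return tx_control;
}

#define CMD_WORD_OFFSET  4          /* command word in PCI config space */
#define BUS_MASTER_BIT   (1u << 2)  /* bit #2: Bus Master enable */

/* Is Bus Mastering currently enabled in this command-word value? */
static int bus_master_enabled(uint16_t pci_cmd)
{
    return (pci_cmd & BUS_MASTER_BIT) != 0;
}

/* Return the command-word value with Bus Mastering turned on. */
static uint16_t enable_bus_master(uint16_t pci_cmd)
{
    return (uint16_t)(pci_cmd | BUS_MASTER_BIT);
}
```

With EN left at zero the composed value is 0x0103F0F8, and setting EN gives 0x0103F0FA. Inside the kernel, ‘pci_set_master()’ is the usual helper for turning on the Bus Master bit, as an alternative to the explicit read-modify-write shown in the slides.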