82573L Initializing our Pro/1000

Chicken-and-Egg?
• We want to create a Linux Kernel Module that can serve application-programs as a character-mode device-driver for our NIC
• So, as with the UART device, we will need to implement ‘read()’ and ‘write()’ methods
• But which method should we do first?
• There is no way to “test” a ‘read()’ method without having a way to send packets to our NIC

How ‘transmit’ works
[Diagram: a list of Buffer-Descriptors in Random Access Memory (descriptor0 through descriptor3, followed by zero entries), each descriptor pointing to its packet buffer (Buffer0 through Buffer3)]
We set up each data-packet that we want to be transmitted in a ‘Buffer’ area in RAM
We also create a list of buffer-descriptors and inform the NIC of its location and size
Then, when ready, we tell the NIC to ‘Go!’ (i.e., start transmitting), and to let us know when these transmissions are ‘Done’

Registers’ Names
Memory-information registers
    TDBA(L/H) = Transmit-Descriptor Base-Address Low/High (64-bits)
    TDLEN = Transmit-Descriptor array Length
    TDH = Transmit-Descriptor Head
    TDT = Transmit-Descriptor Tail
Transmit-engine control registers
    TXDCTL = Transmit-Descriptor Control Register
    TCTL = Transmit Control Register
Notification timing registers
    TIDV = Transmit Interrupt Delay Value
    TADV = Transmit-interrupt Absolute Delay Value

Tx-Desc Ring-Buffer
[Diagram: a circular array of 16-byte descriptors at offsets 0x00 through 0x80 from the TDBA base-address; TDLEN gives the array's length in bytes; TDH (head) and TDT (tail) point into the array; shading distinguishes descriptors owned by hardware (nic) from descriptors owned by software (cpu)]
Circular buffer (128-bytes minimum)

Tx-Descriptor Control (0x3828)
    bits 31..25   0
    bit  24       GRAN
    bits 23..22   0
    bits 21..16   WTHRESH (Writeback Threshold)
    bits 15..14   0
    bits 13..8    HTHRESH (Host Threshold)
    bits 7..6     0
    bits 5..0     PTHRESH (Prefetch Threshold)
“This register controls the fetching and write back of transmit descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1, all descriptors are written back (even if not requested).” --Intel manual
Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)

Transmit Control (0x0400)
    bits 31..29   R (=0)
    bit  28       MULR
    bits 27..26   TXCSCMT
    bit  25       UNORTX
    bit  24       RTLC
    bit  23       R (=0)
    bit  22       SWXOFF
    bits 21..12   COLD (Collision Distance)
    bits 11..4    CT (Collision Threshold)
    bit  3        PSP
    bit  2        R (=0)
    bit  1        EN
    bit  0        R (=0)
EN = Transmit Enable
PSP = Pad Short Packets
CT = Collision Threshold (=0xF)
COLD = Collision Distance (=0x3F)
SWXOFF = Software XOFF Transmission
RTLC = Retransmit on Late Collision
UNORTX = Underrun No Re-Transmit
TXCSCMT = TxDescriptor Minimum Threshold
MULR = Multiple Request Support

Tx Configuration Word (0x0178)
    bit  31       ANE
    bit  30       TxConfig
    bits 29..16   R (=0)
    bits 15..0    TxConfigWord
ANE = Auto-Negotiation Enable
TxConfig = Transmit Configuration Control bit
TxConfigWord = Transmit Configuration Word
This register has two meanings, depending on the state of the ANE bit (i.e., setting ANE=1 enables the hardware auto-negotiation machine). It is applicable only in SerDes mode; program it as 0 for internal-PHY mode.
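The register values quoted in this section can be built up from named bit-fields instead of being typed in as magic numbers. The following is a minimal sketch, assuming the field positions given for TXDCTL and TCTL in this section; the macro and function names are hypothetical (they are not taken from any Intel or Linux header), and the TCTL value simply combines EN and PSP with the CT and COLD settings quoted here, so check it against the manual before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* TXDCTL fields (hypothetical names; positions as described in this section) */
#define TXDCTL_GRAN        (1u << 24)                    /* 1 = thresholds count descriptors */
#define TXDCTL_WTHRESH(n)  (((uint32_t)(n) & 0x3Fu) << 16)
#define TXDCTL_HTHRESH(n)  (((uint32_t)(n) & 0x3Fu) << 8)
#define TXDCTL_PTHRESH(n)  ((uint32_t)(n) & 0x3Fu)

/* TCTL fields (hypothetical names) */
#define TCTL_EN            (1u << 1)                     /* Transmit Enable */
#define TCTL_PSP           (1u << 3)                     /* Pad Short Packets */
#define TCTL_CT(n)         (((uint32_t)(n) & 0xFFu) << 4)
#define TCTL_COLD(n)       (((uint32_t)(n) & 0x3FFu) << 12)

/* The value recommended for the 82573: GRAN=1, WTHRESH=1 */
static inline uint32_t txdctl_recommended(void)
{
    return TXDCTL_GRAN | TXDCTL_WTHRESH(1);
}

/* One plausible TCTL setting: enabled, padding short packets,
 * with CT=0xF and COLD=0x3F as listed in this section. */
static inline uint32_t tctl_basic(void)
{
    return TCTL_EN | TCTL_PSP | TCTL_CT(0xF) | TCTL_COLD(0x3F);
}

/* TDLEN is expressed in bytes; each descriptor occupies 16 bytes,
 * and the ring needs at least 128 bytes (8 descriptors). */
static inline uint32_t tdlen_for(unsigned n_desc)
{
    assert(n_desc >= 8);
    return n_desc * 16u;
}
```

In a driver these helpers would feed ‘iowrite32()’, e.g. writing txdctl_recommended() to offset 0x3828 and tctl_basic() to offset 0x0400.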
Legacy Tx-Descriptor Layout
    offset   contents (bits 31..0)
    0x0      Buffer-Address low (bits 31..0)
    0x4      Buffer-Address high (bits 63..32)
    0x8      CMD (31..24) | CSO (23..16) | Packet Length in bytes (15..0)
    0xC      special (31..16) | CSS (15..8) | reserved =0 (7..4) | status (3..0)
Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet-Length = number of bytes in the data-packet to be transmitted
CMD = Command-field
CSO/CSS = Checksum Offset/Start (in bytes)
STA = Status-field

Suggested C syntax
    typedef struct {
        unsigned long long  base_addr;
        unsigned short      pkt_length;
        unsigned char       cksum_off;
        unsigned char       desc_cmd;
        unsigned char       desc_stat;
        unsigned char       cksum_org;
        unsigned short      special;
    } tx_descriptor;

TxDesc Command-field
    bit 7   IDE
    bit 6   VLE
    bit 5   DEXT
    bit 4   reserved =0
    bit 3   RS
    bit 2   IC
    bit 1   IFCS
    bit 0   EOP
EOP = End Of Packet (1=yes, 0=no)
IFCS = Insert Frame CheckSum (1=yes, 0=no), provided EOP is set
IC = Insert CheckSum (1=yes, 0=no) as indicated by CSO/CSS fields
RS = Report Status (1=yes, 0=no)
DEXT = Descriptor Extension (1=yes, 0=no); use ‘0’ for Legacy-Mode
VLE = VLAN-Packet Enable (1=yes, 0=no), provided EOP is set
IDE = Interrupt-Delay Enable (1=yes, 0=no)

TxDesc Status field
    bit 3   reserved =0
    bit 2   LC
    bit 1   EC
    bit 0   DD
DD = Descriptor Done
    This bit is written back after the NIC processes the descriptor, provided the descriptor’s RS-bit was set (i.e., Report Status)
EC = Excess Collisions
    Indicates that the packet has experienced more than the maximum number of collisions (as defined by the TCTL.CT field) and therefore was not transmitted. (This bit is meaningful only in HALF-DUPLEX mode.)
LC = Late Collision
    Indicates that a Late Collision has occurred while operating in HALF-DUPLEX mode. Note that the collision window size depends on the SPEED: 64 bytes for 10/100-Mbps, or 512 bytes for 1000-Mbps.
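To make the descriptor layout concrete, here is a minimal sketch of how one descriptor might be filled in and later checked for completion. It assumes the legacy layout and the command/status bit positions described in this section; the names ‘tx_desc_fill’, ‘tx_desc_done’, ‘ring_next’ and the TXCMD_/TXSTA_ constants are hypothetical, and the choice of EOP+IFCS+RS as the command byte is just one reasonable option for a single-buffer packet. Fixed-width types are used so the 16-byte size can be verified at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Legacy Tx descriptor, same field order as the suggested C syntax */
typedef struct {
    uint64_t base_addr;    /* physical address of the packet buffer */
    uint16_t pkt_length;   /* packet length in bytes */
    uint8_t  cksum_off;    /* CSO */
    uint8_t  desc_cmd;     /* CMD */
    uint8_t  desc_stat;    /* status (low 4 bits) */
    uint8_t  cksum_org;    /* CSS */
    uint16_t special;
} tx_descriptor;

_Static_assert(sizeof(tx_descriptor) == 16, "legacy Tx descriptor must be 16 bytes");

/* Hypothetical names for the command and status bits used below */
enum {
    TXCMD_EOP  = 1 << 0,   /* End Of Packet */
    TXCMD_IFCS = 1 << 1,   /* Insert Frame CheckSum */
    TXCMD_RS   = 1 << 3,   /* Report Status */
    TXSTA_DD   = 1 << 0    /* Descriptor Done */
};

/* Describe one complete packet and request a status write-back. */
static inline void tx_desc_fill(tx_descriptor *d, uint64_t phys_addr, uint16_t len)
{
    memset(d, 0, sizeof *d);
    d->base_addr  = phys_addr;
    d->pkt_length = len;
    d->desc_cmd   = TXCMD_EOP | TXCMD_IFCS | TXCMD_RS;
}

/* Nonzero once the NIC has written back the Descriptor Done bit. */
static inline int tx_desc_done(const volatile tx_descriptor *d)
{
    return (d->desc_stat & TXSTA_DD) != 0;
}

/* Next index in a ring of n_desc descriptors, wrapping back to 0. */
static inline unsigned ring_next(unsigned index, unsigned n_desc)
{
    return (index + 1u) % n_desc;
}
```

A driver would typically fill the descriptor at the current tail index, then write ring_next(tail, n_desc) to the TDT register to hand that descriptor to the NIC; treat that sequence as an assumption to confirm against the Intel manual.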
Bit-mask definitions
    enum {
        DD   = (1<<0),  // Descriptor Done
        EC   = (1<<1),  // Excess Collisions
        LC   = (1<<2),  // Late Collision

        EOP  = (1<<0),  // End Of Packet
        IFCS = (1<<1),  // Insert Frame CheckSum
        IC   = (1<<2),  // Insert CheckSum as per CSO/CSS
        RS   = (1<<3),  // Report Status
        DEXT = (1<<5),  // Descriptor Extension
        VLE  = (1<<6),  // VLAN packet
        IDE  = (1<<7)   // Interrupt-Delay Enable
    };

Allocating kernel-memory
• Our 82573L device-driver will need to use a segment of contiguous physical memory which is cache-aligned and non-pageable
• As explained in our LDD3 textbook, such a memory-block can be allocated using the Linux kernel’s ‘kmalloc()’ function (and it can later be deallocated using ‘kfree()’)
• The maximum-size allocation is 128-KB
• You should use the ‘GFP_KERNEL’ flag

Network MTU
• Unless the ‘Large-Send’ functionality has been enabled, there will be a maximum length for your network ‘datagrams’ equal to 1536 bytes (=0x0600)
• So if you reused the same Packet-Buffer for successive transmissions, you could fit your packet-buffer and a moderate-sized Descriptor-Buffer into one 4KB page-frame

Single page-frame option
[Diagram: one 4KB PageFrame divided into two regions]
    Descriptor-Buffer (1 KB) (room for up to 64 descriptors of 16 bytes each)
    Packet-Buffer (3 KB) (reused for successive transmissions)

Another design-option…
[Diagram: one 4KB PageFrame divided into two regions]
    Descriptor-Buffer (256 bytes) (room for 16 descriptors)
    16 Packet-Buffers (3840 bytes) (240 bytes per buffer)

Initialization
• Your device-driver needs to initialize your 82573L hardware to a known state, and configure its options for your desired mode of operation
• The Device Control register has bits which let you initiate a ‘device reset’ operation
• The Device Status register has bits which inform you when a ‘reset’ has completed

Device Status (0x0008)
    bit  31       ? (some undocumented functionality?)
    bits 30..20   0
    bit  19       GIO Master EN
    bits 18..11   0
    bit  10       PHY reset
    bits 9..8     ASDV
    bits 7..6     SPEED
    bit  5        0
    bit  4        TXOFF
    bits 3..2     Function ID
    bit  1        LU
    bit  0        FD
FD = Full-Duplex
LU = Link Up
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value

Device Control (0x0000)
    bit  31       PHYRST
    bit  30       VME
    bit  29       R (=0)
    bit  28       TFCE
    bit  27       RFCE
    bit  26       RST
    bits 25..21   R (=0)
    bit  20       ADVD3WUC
    bits 19..16   D/UD status and reserved bits
    bits 15..13   R (=0)
    bit  12       FRCDPLX
    bit  11       FRCSPD
    bit  10       R (=0)
    bits 9..8     SPEED
    bit  7        R (=0)
    bit  6        SLU
    bits 5..3     R
    bit  2        GIOMD
    bit  1        R (=0)
    bit  0        FD
FD = Full-Duplex
GIOMD = GIO Master Disable
SLU = Set Link Up
FRCSPD = Force Speed
FRCDPLX = Force Duplex
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ADVD3WUC = Advertise Cold Wake Up Capability
D/UD = Dock/Undock status
RFCE = Rx Flow-Control Enable
RST = Device Reset
TFCE = Tx Flow-Control Enable
PHYRST = Phy Reset
VME = VLAN Mode Enable

Extended Control (0x0018)
    bits 31..30   R (=0)
    bits 29..28   ITCE and one reserved bit
    bit  27       IAME
    bit  26       R (=0)
    bit  25       DFPAREN
    bit  24       PBPAREN
    bits 23..22   TxLSFlow, TxLS
    bit  21       R (=0)
    bit  20       PhyPwrDownEn
    bit  19       DMADynGE
    bit  18       R (=0)
    bit  17       RODIS
    bit  16       R (=0)
    bit  15       SPDBYPS
    bit  14       R (=0)
    bit  13       EERST
    bit  12       ASDCHK
    bits 11..0    R (=0)
ASDCHK = AutoSpeed Detection Check
EERST = EEPROM Reset
SPDBYPS = Speed-selection Bypass
RODIS = Relaxed-Ordering Disable
DMADynGE = DMA Dynamic-Gating Enable
PhyPwrDownEn = Phy PowerDown Enable
TxLSFlow = Tx Large-Send Flow
TxLS = Tx Large-Send functionality
PBPAREN = Packet-Buffer Parity-Error Detect
DFPAREN = Descriptor-FIFO Parity-Error Detect
IAME = Interrupt-Acknowledge Auto-Mask Enable
ITCE = Interrupt Timers Cleared Enable

Example
    // clear STATUS bit #31
    iowrite32( 0x00000000, io + E1000_STATUS );

    // initiate Device-Reset and Phy-Reset (CTRL bits 26 and 31)
    iowrite32( 0x84000000, io + E1000_CTRL );

    // wait until STATUS bit #31 is set
    while ( ( ioread32( io + E1000_STATUS )&(1<<31)) == 0 );

    // program Link Up with desired operating-mode settings
    iowrite32( 0x00040241, io + E1000_CTRL );

    // wait until LU-bit (bit #1) in STATUS is set
    while ( ( ioread32( io + E1000_STATUS )&(1<<1)) == 0 );

Interrupt Cause Read (0x00C0)
Mechanism for NIC-event notifications
    bit  31       INT-Assert
    bits 30..18   R (=0)
    bit  17       ACK
    bit  16       SRPD
    bit  15       TXDLOW
    bits 14..10   R (=0)
    bit  9        MDAC
    bit  8        R (=0)
    bit  7        RXT0
    bit  6        RXO
    bit  5        R (=0)
    bit  4        RXDMT0
    bit  3        R (=0)
    bit  2        LSC
    bit  1        TXQE
    bit  0        TXDW
TXDW = Transmit Descriptor Written back
LSC = Link Status Changed
TXQE = Transmit Queue Empty
MDAC = MDI/O Access Completed
SRPD = Small Receive Packet Detected
ACK = Receive ACK-frame detected
RXT0 = Receiver Timer Interrupt
RXO = Receiver Overrun
TXDLOW = Transmit Descriptor Low Threshold Reached
RXDMT0 = Receive Descriptor Minimum Threshold Reached
INT-Assert = Interrupt Assertion is still pending

In-Class Exercise #1
• Try compiling and installing our ‘tryreset.c’ demo-module, and examine the messages put in the kernel’s log-file (use ‘dmesg’)
• Then modify the module-code so that it also outputs the value in the ICR register (Interrupt Cause Read) during each pass through the two ‘busy-waiting’ loops
• #define E1000_ICR 0x00C0

In-Class Exercise #2
• Apply the same techniques we employed in our earlier ‘announce.c’ demo-module so that the ‘printk()’ statements in ‘tryreset.c’ get replaced by statements that will show the messages onscreen, or in the current desktop window, rather than writing them to the kernel’s (out-of-view) log-file
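When printing ICR values for Exercise #1, a raw hexadecimal number is hard to read at a glance, so it may help to translate the set bits into their names. The following is a minimal sketch of such a decoder, assuming the ICR bit positions listed in this section (with SRPD at bit 16 and ACK at bit 17, as in Intel's documentation); the names ‘icr_bits’ and ‘icr_describe’ are hypothetical. It uses only ‘strlen()’ and ‘memcpy()’, which are also available inside the kernel.

```c
#include <stdint.h>
#include <string.h>

struct icr_bit { uint32_t mask; const char *name; };

/* Named cause bits of the Interrupt Cause Read register (0x00C0) */
static const struct icr_bit icr_bits[] = {
    { 1u << 0,  "TXDW"   }, { 1u << 1,  "TXQE" }, { 1u << 2,  "LSC"  },
    { 1u << 4,  "RXDMT0" }, { 1u << 6,  "RXO"  }, { 1u << 7,  "RXT0" },
    { 1u << 9,  "MDAC"   }, { 1u << 15, "TXDLOW" },
    { 1u << 16, "SRPD"   }, { 1u << 17, "ACK"  },
    { 1u << 31, "INT-Assert" },
};

/* Write the space-separated names of the bits set in 'icr' into 'buf'.
 * Names that would not fit are left out. Returns the string's length. */
static inline size_t icr_describe(uint32_t icr, char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof icr_bits / sizeof icr_bits[0]; i++) {
        if (!(icr & icr_bits[i].mask))
            continue;
        size_t len  = strlen(icr_bits[i].name);
        size_t need = len + (used ? 1 : 0);      /* name plus separator */
        if (used + need >= size)
            break;                               /* no room left */
        if (used)
            buf[used++] = ' ';
        memcpy(buf + used, icr_bits[i].name, len);
        used += len;
        buf[used] = '\0';
    }
    return used;
}
```

Inside the busy-waiting loops of ‘tryreset.c’, the resulting string could be passed to ‘printk()’ together with the raw ICR value.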