More 82573L details
Getting ready to write and test a character-mode device-driver for our anchor-LAN’s ethernet controllers

A ‘nic.c’ character driver?
[Figure: the driver’s components: the ‘my_fops’ table, the interrupt handler my_isr(), and the module_init() and module_exit() routines]
  my_fops entry    driver function
  ioctl            my_ioctl()
  open             my_open()
  read             my_read()
  write            my_write()
  release          my_release()

Statistics registers
• The 82573L has several dozen statistical counters which automatically keep track of significant events affecting the ethernet controller’s performance
• Most are 32-bit ‘read-only’ registers, and they are automatically cleared when read
• Your module’s initialization routine could read them all (to start counting from zero)

Initializing the nic’s counters
• The statistical counters all have address-offsets in the range 0x04000 – 0x04FFF
• You can use a very simple program-loop to ‘clear’ each of these read-only registers

  // Here ‘io’ is the virtual base-address of the nic’s i/o-memory region
  {
      int r;
      // clear all of the Pro/1000 controller’s statistical counters
      // (each counter is cleared by the act of reading it)
      for (r = 0x4000; r < 0x4FFF; r += 4) ioread32( io + r );
  }

A few ‘counter’ examples
  Offset   Name      Description
  0x4000   CRCERRS   CRC Errors Count
  0x400C   RXERRC    Receive Error Count
  0x4014   SCC       Single Collision Count
  0x4018   ECOL      Excessive Collision Count
  0x4074   GPRC      Good Packets Received
  0x4078   BPRC      Broadcast Packets Received
  0x407C   MPRC      Multicast Packets Received
  0x40D0   TPR       Total Packets Received
  0x40D4   TPT       Total Packets Transmitted
  0x40F0   MPTC      Multicast Packets Transmitted
  0x40F4   BPTC      Broadcast Packets Transmitted

Ethernet packet layout
• Total size normally can vary from 64 bytes up to 1518 bytes (unless ‘jumbo’ packets and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’ and it appends a 4-byte CRC check-sum
  Offset   Field
  0        destination MAC address (6 bytes)
  6        source MAC address (6 bytes)
  12       Type/length (2 bytes)
  14       the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
  (end)    Cyclic Redundancy Checksum (4 bytes)

Filter registers
• All the modern ethernet controllers have a built-in ‘filtering’
capability which allows the NIC to automatically discard any packets having a destination-address different from the controller’s own unique MAC address
• But the 82573L offers a more elaborate filtering mechanism (and can also ‘reject’ packets based on the ‘source’ addresses)

How ‘receive’ works
[Figure: Random Access Memory holding a list of buffer-descriptors (descriptor0 to descriptor3) and the packet buffers Buffer0 to Buffer3]
• We set up memory-buffers where we want received packets to be placed by the NIC
• We also create a list of buffer-descriptors and inform the NIC of its location and size
• Then, when ready, we tell the NIC to ‘Go!’ (i.e., start receiving), but to let us know when these receptions have occurred

Receive Control (0x0100)
  Bit(s)   Field    Meaning
  31       R        reserved (=0)
  30-27    FLXBUF   Flexible Buffer size
  26       SECRC    Strip Ethernet CRC
  25       BSEX     Buffer Size Extension
  24       R        reserved (=0)
  23       PMCF     Pass MAC Control Frames
  22       DPF      Discard Pause Frames
  21       R        reserved (=0)
  20       CFI      Canonical Form Indicator bit-value
  19       CFIEN    Canonical Form Indicator Enable
  18       VFE      VLAN Filter Enable
  17-16    BSIZE    Receive Buffer Size
  15       BAM      Broadcast Accept Mode
  14       R        reserved (=0)
  13-12    MO       Multicast Offset
  11-10    DTYP     Descriptor Type
  9-8      RDMTS    Rx-Descriptor Minimum Threshold Size
  7-6      LBM      Loopback Mode
  5        LPE      Long Packet reception Enable
  4        MPE      Multicast Promiscuous Enable
  3        UPE      Unicast Promiscuous Enable
  2        SBP      Store Bad Packets
  1        EN       Receive Enable
  0        R        reserved (=0)

Registers’ Names
Memory-information registers
  RDBA(L/H) = Receive-Descriptor Base-Address Low/High (64-bits)
  RDLEN = Receive-Descriptor array Length
  RDH = Receive-Descriptor Head
  RDT = Receive-Descriptor Tail
Receive-engine control registers
  RXDCTL = Receive-Descriptor Control Register
  RCTL = Receive Control Register
Notification timing registers
  RDTR = Receive-interrupt packet Delay Timer
  RADV = Receive-interrupt Absolute Delay Value

Rx-Desc Ring-Buffer
[Figure: a circular array of 16-byte descriptors at offsets 0x00 through 0x80 from the RDBA base-address; RDH (head) is shown at 0x20 and RDT (tail) at 0x60; RDLEN gives the array’s length in bytes; shading distinguishes descriptors owned by hardware (nic) from those owned by software (cpu)]
Circular buffer (128-bytes minimum)

Rx-Descriptor Control (0x2828)
  Bit(s)   Field     Meaning
  31-25    R         reserved (=0)
  24       GRAN      Granularity
  23-22    R         reserved (=0)
  21-16    WTHRESH   Writeback Threshold
  15-14    R         reserved (=0)
  13-8     HTHRESH   Host Threshold
  7-6      R         reserved (=0)
  5-0      PTHRESH   Prefetch Threshold

Prefetch Threshold: A prefetch operation is considered when the number of valid, but unprocessed, receive descriptors that the ethernet controller has in its on-chip buffer drops below this threshold.
Host Threshold: A prefetch occurs if at least this many valid descriptors are available in host memory.
Writeback Threshold: This field controls the writing back to host memory of already-processed receive descriptors held in the ethernet controller’s on-chip buffer.
GRAN (Granularity): 1=descriptor-size, 0=cacheline-size

Legacy Rx-Descriptor Layout
   31                                           0
  | Buffer-Address low (bits 31..0)              |  0x0
  | Buffer-Address high (bits 63..32)            |  0x4
  | Packet Checksum   | Packet Length (in bytes) |  0x8
  | VLAN tag          | Errors     | Status      |  0xC

Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet Length = number of bytes in the data-packet that was received
Packet Checksum = the 16-bit one’s-complement of the entire logical packet
Status = shows if descriptor has been used and if it’s last in a logical packet
Errors = valid only when DD and EOP are set in the descriptor’s Status field

Suggested C syntax
  typedef struct {
      unsigned long long  base_addr;
      unsigned short      pkt_length;
      unsigned short      checksum;
      unsigned char       desc_stat;
      unsigned char       desc_errs;
      unsigned short      vlan_tag;
  } rx_descriptor;

RxDesc Status-field
  Bit:     7     6      5       4       3    2      1     0
  Field:   PIF   IPCS   TCPCS   UDPCS   VP   IXSM   EOP   DD

DD =
Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
IXSM = Ignore Checksum Indications (1=yes, 0=no)
VP = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated in packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated in packet (1=yes, 0=no)
IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF = Passed In-exact Filter (1=yes, 0=no) shows if software must check

RxDesc Error-field
  Bit:     7     6     5      4             3             2     1    0
  Field:   RXE   IPE   TCPE   reserved =0   reserved =0   SEQ   SE   CE

RXE = Received-data Error (1=yes, 0=no)
IPE = IPv4-checksum error (1=yes, 0=no)
TCPE = TCP/UDP checksum error (1=yes, 0=no)
SEQ = Sequence error (1=yes, 0=no)
SE = Symbol Error (1=yes, 0=no)
CE = CRC Error or alignment error (1=yes, 0=no)

Network Administration
• Some higher-level networking protocols require the Operating System to set up a translation between the ‘hostname’ for a workstation and the hardware-address of its Network Interface Controller
• One mechanism for doing this is the creation of a specially-named textfile (‘/etc/ethers’) that provides a database for these translations

In-class exercise #1
• We put a file named ‘ethers’ on our course website that offers a template for defining the translation database that software can consult on our ‘anchor’ cluster’s LAN
• One of the eight workstations’ entries has been filled in already:
    00:30:48:8A:30:03 anchor00.cs.usfca.edu
• Can you complete this database by adding the MAC addresses for the other 7 machines?
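
As a way of checking entries while completing the ‘ethers’ database, here is a minimal sketch of how software might parse one line of that file. It assumes each entry has the same shape as the filled-in example (six colon-separated hex bytes, whitespace, then a hostname); the function name parse_ethers_line() and the 64-byte hostname limit are hypothetical choices made for this illustration, not part of the course code.

```c
#include <stdio.h>

/* Hypothetical helper (an assumption for illustration only):
   parse one 'ethers'-style line, e.g.
       00:30:48:8A:30:03 anchor00.cs.usfca.edu
   into a 6-byte MAC address and a hostname string.
   Returns 0 on success, -1 if the line is malformed. */
int parse_ethers_line( const char *line, unsigned char mac[6], char host[64] )
{
    unsigned int b[6];
    int i;

    /* expect exactly six hex bytes followed by one hostname field */
    if ( sscanf( line, "%2x:%2x:%2x:%2x:%2x:%2x %63s",
                 &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], host ) != 7 )
        return -1;

    /* narrow each parsed value to a single byte */
    for (i = 0; i < 6; i++) mac[i] = (unsigned char)b[i];
    return 0;
}
```

A caller would pass each line read from the file (for instance via fgets()) and skip any line for which the function returns -1, such as a blank line or a template entry whose MAC address has not been filled in yet.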
Our ‘seereset.c’ demo
• We created this LKM to demonstrate the sequence of ‘state-changes’ that three of our network controller’s registers undergo in response to initiating a ‘reset’ operation
• The programming technique used here is one which we think could be useful in lots of other hardware programming situations where a vendor’s manual may not answer all our questions about how devices work

In-class exercise #2
• Try redirecting the output from this ‘cat’ command to a file, like this:
    $ cat /proc/seereset > seereset.out
• Then edit this textfile, adding a comment to each line which indicates the bit(s) that experienced a ‘change-of-state’ from the line that came before it (thereby providing yourself with a running commentary as to how the NIC proceeds through a ‘reset’)
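
When annotating ‘seereset.out’ by hand, it can help to double-check which bits differ between two successive register values. The sketch below shows one way to do that with an exclusive-OR. The exact format of the /proc/seereset output is not shown here, so this example only compares two 32-bit values that you supply; the function name changed_bits() is a hypothetical one introduced for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration only):
   compares two successive 32-bit register values, stores the numbers
   of the bits that differ into 'out' (highest bit first), and returns
   how many bits changed. */
int changed_bits( uint32_t before, uint32_t after, int out[32] )
{
    uint32_t diff = before ^ after;   /* a 1 marks each bit that changed */
    int bit, n = 0;

    for (bit = 31; bit >= 0; bit--)
        if ( diff & ((uint32_t)1 << bit) ) out[n++] = bit;
    return n;
}
```

For instance, with illustrative values (not taken from the real device) of 0x00000000 followed by 0x04000004, the function reports two changed bits, numbered 26 and 2, which is the kind of note the exercise asks you to add beside each line.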