What’s needed to receive?

A look at the minimum steps required for programming our anchor NICs to receive packets

A disappointment
• Our former ‘nicwatch.cpp’ application does not reliably show the packets being received by the 82573L controller
• It was based on the ‘raw sockets’ protocol implemented within the Linux kernel’s vast networking subsystem, thus offering us the prospect of a ‘hardware-independent’ tool, if only it would show us all the packets!

Two purposes…
• So let’s discard ‘nicwatch.cpp’ in favor of writing our own hardware-specific module that WILL be able to show us all the NIC’s received packets, independently of Linux’s various layers of networking protocol code
• And let’s keep it as simple as possible, so we can see which programming steps are the truly essential ones for the 82573L NIC

Accessing 82573L registers
• Device registers are hardware-mapped to a range of addresses in physical memory
• We can get the location and extent of this memory-range from a BAR register in the 82573L device’s PCI Configuration Space
• We then request the Linux kernel to set up an I/O ‘remapping’ of this memory-range to ‘virtual’ addresses within kernel-space

i/o-memory remapping
[Figure: the physical address-space (dynamic ram, vram, Local-APIC, IO-APIC, nic registers) shown beside the ‘virtual’ address-space (3-GB user space; 1-GB kernel space holding kernel code/data, vram, APIC registers, nic registers)]

Kernel memory allocation
• The NIC requires some host memory for packet-buffers and receive descriptors
• The kernel provides a ‘helper function’ for reserving a suitable region of memory in kernel-space which is both ‘non-pageable’ and ‘physically contiguous’ (i.e., kzalloc())
• It’s our job to decide how much memory our network controller hardware will need

Ethernet packet layout
• Total size normally can vary from 64 bytes up to 1522 bytes (unless ‘jumbo’ packets and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’ and it appends a 4-byte CRC check-sum
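The packet sizes quoted on this slide can be checked with a little arithmetic. The sketch below is only illustrative: the constant and function names are made up here, and it assumes the standard Ethernet figures of a 46-byte minimum payload, a 1500-byte maximum payload, and an optional 4-byte VLAN tag, none of which the slide spells out.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FRAME_HEADER_LEN  = 14,    /* destination MAC + source MAC + type/length */
    FRAME_CRC_LEN     = 4,     /* check-sum appended by the NIC */
    FRAME_VLAN_LEN    = 4,     /* assumption: optional VLAN tag */
    FRAME_MIN_PAYLOAD = 46,    /* assumption: standard minimum payload */
    FRAME_MAX_PAYLOAD = 1500   /* assumption: standard maximum payload */
};

/* Total size of a packet carrying 'payload' data bytes. */
static size_t frame_size( size_t payload, int has_vlan_tag )
{
    assert( payload >= FRAME_MIN_PAYLOAD && payload <= FRAME_MAX_PAYLOAD );
    return FRAME_HEADER_LEN + (has_vlan_tag ? FRAME_VLAN_LEN : 0)
                            + payload + FRAME_CRC_LEN;
}
```

Under these assumptions, frame_size( 46, 0 ) gives the 64-byte minimum and frame_size( 1500, 1 ) gives the 1522-byte maximum.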
Packet layout, by byte offset:

    Offset  Field
    0       destination MAC address (6 bytes)
    6       source MAC address (6 bytes)
    12      Type/length (2 bytes)
    14      the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
            Cyclic Redundancy Checksum (4 bytes)

Rx-Descriptor Ring-Buffer
[Figure: a circular buffer of descriptors at offsets 0x00 through 0x80 from the RDBA base-address, with RDLEN giving its length in bytes; the RDH (head) and RDT (tail) pointers separate the descriptors owned by hardware (nic) from those owned by software (cpu)]
Circular buffer (128 bytes minimum, and must be a multiple of 128 bytes)

Our ‘nicspy.c’ module
• It will be a ‘character-mode’ device-driver
• It will only implement ‘read()’ and ‘ioctl()’
• The ‘read()’ function will cause a task to sleep until a network packet has arrived
• An interrupt-handler will wake up the task
• A ‘get_info’ function will be provided as a debugging aid, so the NIC’s Rx descriptor-queue can be conveniently inspected

Sixteen packet-buffers
• Our ‘nicspy.c’ driver allocates 16 buffers of size 1536 bytes (i.e., for normal ethernet)
[Figure: the 32-KB allocation (16 packet-buffers, plus Rx-Descriptor Queue), showing the region for the sixteen packet-buffers, the region for the Rx Descriptor Queue (256 bytes), and the unused regions]

    #define KMEM_SIZE  0x8000   // 32KB = size of kernel memory allocation

    void  *kmem = kzalloc( KMEM_SIZE, GFP_KERNEL );
    if ( !kmem ) return -ENOMEM;

Format for an Rx Descriptor (16 bytes)

    Bytes   Field
    0-7     Base-address (64 bits)
    8-9     Packet-length
    10-11   Packet-checksum
    12      status
    13      errors
    14-15   VLAN tag

• The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer
• The network controller will ‘write-back’ the values for the other fields when it has transferred a received packet’s data into this packet-buffer

Suggested C syntax

    typedef struct {
        unsigned long long  base_address;
        unsigned short      packet_length;
        unsigned short      packet_cksum;
        unsigned char       desc_status;
        unsigned char       desc_errors;
        unsigned short      VLAN_tag;
        } RX_DESCRIPTOR;

‘Legacy Format’ for the Intel Pro1000 network controller’s Receive Descriptors

RxDesc Status-field

    Bit     7     6      5       4       3    2      1     0
    Field   PIF   IPCS   TCPCS   UDPCS   VP   IXSM   EOP   DD

    DD    = Descriptor Done (1=yes, 0=no): shows if the NIC is finished with the descriptor
    EOP   = End Of Packet (1=yes, 0=no): shows if this descriptor is logically the last one for the packet
    IXSM  = Ignore Checksum Indications (1=yes, 0=no)
    VP    = VLAN Packet match (1=yes, 0=no)
    UDPCS = UDP Checksum calculated on packet (1=yes, 0=no)
    TCPCS = TCP Checksum calculated on packet (1=yes, 0=no)
    IPCS  = IPv4 Checksum calculated on packet (1=yes, 0=no)
    PIF   = Passed In-exact Filter (1=yes, 0=no): shows if software must check

RxDesc Error-field

    Bit     7     6     5      4            3            2     1    0
    Field   RXE   IPE   TCPE   reserved=0   reserved=0   SEQ   SE   CE

    RXE  = Received-data Error (1=yes, 0=no)
    IPE  = IPv4-checksum error (1=yes, 0=no)
    TCPE = TCP/UDP checksum error (1=yes, 0=no)
    SEQ  = Sequence error (1=yes, 0=no)
    SE   = Symbol Error (1=yes, 0=no)
    CE   = CRC Error or alignment error (1=yes, 0=no)

Essential ‘receive’ registers

    enum {
        E1000_CTRL   = 0x0000,   // Device Control
        E1000_STATUS = 0x0008,   // Device Status
        E1000_ICR    = 0x00C0,   // Interrupt Cause Read
        E1000_IMS    = 0x00D0,   // Interrupt Mask Set
        E1000_IMC    = 0x00D8,   // Interrupt Mask Clear
        E1000_RCTL   = 0x0100,   // Receive Control
        E1000_RDBAL  = 0x2800,   // Rx Descriptor Base Address Low
        E1000_RDBAH  = 0x2804,   // Rx Descriptor Base Address High
        E1000_RDLEN  = 0x2808,   // Rx Descriptor Length
        E1000_RDH    = 0x2810,   // Rx Descriptor Head
        E1000_RDT    = 0x2818,   // Rx Descriptor Tail
        E1000_RXDCTL = 0x2828,   // Rx Descriptor Control
        E1000_RA     = 0x5400,   // Receive address-filter Array
        };

Receive Control (0x0100)

    Bit(s)   Field    Meaning
    31       R        reserved (=0)
    30-27    FLXBUF   Flexible Buffer size
    26       SECRC    Strip Ethernet CRC
    25       BSEX     Buffer Size Extension
    24       R        reserved (=0)
    23       PMCF     Pass MAC Control Frames
    22       DPF      Discard Pause Frames
    21       R        reserved (=0)
    20       CFI      Canonical Form Indicator bit-value
    19       CFIEN    Canonical Form Indicator Enable
    18       VFE      VLAN Filter Enable
    17-16    BSIZE    Receive Buffer Size
    15       BAM      Broadcast Accept Mode
    14       R        reserved (=0)
    13-12    MO       Multicast Offset
    11-10    DTYP     Descriptor Type
    9-8      RDMTS    Rx-Descriptor Minimum Threshold Size
    7-6      LBM      Loopback Mode
    5        LPE      Long Packet reception Enable
    4        MPE      Multicast Promiscuous Enable
    3        UPE      Unicast Promiscuous Enable
    2        SBP      Store Bad Packets
    1        EN       Receive Enable
    0        R        reserved (=0)

We used 0x0000801C in RCTL to prepare the ‘receive engine’ prior to enabling it

Device Control (0x0000)

    Bit(s)   Field      Meaning
    31       PHYRST     Phy Reset
    30       VME        VLAN Mode Enable
    29       R          reserved (=0)
    28       TFCE       Tx Flow-Control Enable
    27       RFCE       Rx Flow-Control Enable
    26       RST        Device Reset
    25-21    R          reserved (=0)
    20-16               ADVD3WUC = Advertise Cold Wake Up Capability; D/UD = Dock/Undock status; other bits reserved
    15-13    R          reserved (=0)
    12       FRCDPLX    Force Duplex
    11       FRCSPD     Force Speed
    10       R          reserved (=0)
    9-8      SPEED      00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved
    7        R          reserved (=0)
    6        SLU        Set Link Up
    5-4      R          reserved (=0)
    3        R          reserved
    2        GIOMD      GIO Master Disable
    1        R          reserved (=0)
    0        FD         Full-Duplex

We used 0x040C0241 to initiate a ‘device reset’ operation

82573L Device Status (0x0008)

    Bit(s)   Field           Meaning
    31       ?               some undocumented functionality?
    30-20    0
    19       GIO Master EN
    18-11    0
    10       PHYRA           PHY Reset Asserted
    9-8      ASDV            Auto-negotiation Speed Detection Value
    7-6      SPEED           00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved
    5        0
    4        TXOFF           Transmission Paused
    3-2      Function ID
    1        LU              Link Up
    0        FD              Full-Duplex

PCI Bus Master DMA
[Figure: the 82573L i/o-memory (on-chip RX descriptors, on-chip TX descriptors, and the RX and TX FIFOs, 32-KB total) exchanging data by DMA with the host’s Dynamic Random Access Memory, which holds the Rx Descriptor Queue and the packet-buffers]

Our ‘read()’ algorithm

    unsigned int  rx_curr;

    ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
    {
        // our global variable ‘rx_curr’ is the descriptor-array index
        // for the next receive-buffer descriptor to be processed

        if ( this descriptor’s status is zero )
            put calling task to sleep;
        // wakeup the task when a fresh packet has been received

        copy received data from the packet-buffer to user’s buffer
        clear this descriptor’s status
        advance our global variable ‘rx_curr’ to the next descriptor
        return the number of data-bytes transferred
    }

‘nicspy.cpp’
• This application calls our device-driver’s ‘read()’ function repeatedly, and displays the ‘raw’ ethernet packet-data each time
• It requires our ‘nicspy.c’ device-driver to be installed in the kernel, obviously
• There’s no ‘clash’ of filenames here, and their similarity helps keep them together:

    nicspy.c and nicspy.ko     (the kernel-side)
    nicspy.cpp and nicspy      (the user-side)

in-class demo
• We can install ‘nicspy.ko’ on one of our anchor machines (making sure ‘eth1’ is ‘down’ before we do our module-install) and then we run ‘nicspy’ on that machine
• Next we install our ‘nicping.ko’ module on some other anchor machine (be sure its ‘eth1’ interface is ‘down’ beforehand) and then use ‘cat /proc/nicping’ for a transmit
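The steps of the ‘read()’ algorithm can be tried out away from the hardware. What follows is a user-space sketch only, not the code of ‘nicspy.c’: the names ‘sim_nic_receive’, ‘sim_read’ and ‘hw_next’ are invented for illustration, a plain memcpy() stands in both for the NIC’s DMA transfer and for the kernel’s copy_to_user(), and returning -1 stands in for putting the task to sleep.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>      /* ssize_t */

#define N_RX_DESC   16      /* sixteen descriptors, one per packet-buffer */
#define RX_BUFSIZ   1536    /* size of each packet-buffer */
#define DD_BIT      0x01    /* Descriptor Done: bit 0 of the status field */
#define EOP_BIT     0x02    /* End Of Packet:   bit 1 of the status field */

typedef struct {
    unsigned long long  base_address;
    unsigned short      packet_length;
    unsigned short      packet_cksum;
    unsigned char       desc_status;
    unsigned char       desc_errors;
    unsigned short      VLAN_tag;
} RX_DESCRIPTOR;

static RX_DESCRIPTOR  rxring[ N_RX_DESC ];
static unsigned char  rxbuf[ N_RX_DESC ][ RX_BUFSIZ ];
static unsigned int   rx_curr;   /* next descriptor the reader will process */
static unsigned int   hw_next;   /* hypothetical: next descriptor the simulated NIC fills */

/* Stand-in for the NIC's write-back of one received packet.
   (No check is made here for a full ring.) */
static void sim_nic_receive( const void *data, unsigned short len )
{
    assert( len <= RX_BUFSIZ );
    memcpy( rxbuf[ hw_next ], data, len );
    rxring[ hw_next ].packet_length = len;
    rxring[ hw_next ].desc_status = DD_BIT | EOP_BIT;
    hw_next = (hw_next + 1) % N_RX_DESC;
}

/* Follows the steps of 'my_read()'; returns -1 where the driver would sleep. */
static ssize_t sim_read( char *buf, size_t len )
{
    RX_DESCRIPTOR  *desc = &rxring[ rx_curr ];

    if ( desc->desc_status == 0 ) return -1;    /* no fresh packet yet */

    if ( len > desc->packet_length ) len = desc->packet_length;
    memcpy( buf, rxbuf[ rx_curr ], len );       /* a driver would use copy_to_user() */
    desc->desc_status = 0;                      /* clear this descriptor's status */
    rx_curr = (rx_curr + 1) % N_RX_DESC;        /* advance, wrapping around the ring */
    return (ssize_t)len;
}
```

Calling sim_read() on an empty ring returns -1; after one call to sim_nic_receive() it returns that packet’s length and moves ‘rx_curr’ on by one.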
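Since ‘nicspy’ displays raw ethernet packet-data, it can help to pick apart the 14-byte header at the front of each packet. The helper below is a hypothetical sketch, not taken from ‘nicspy.cpp’: the type and function names are invented, and it relies on the usual convention that the Type/length field is sent most-significant byte first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned char   dst[ 6 ];    /* destination MAC address */
    unsigned char   src[ 6 ];    /* source MAC address */
    unsigned short  type_len;    /* Type/length, converted to host byte-order */
} ETH_HEADER;

/* Returns 0 on success, or -1 if fewer than 14 bytes were supplied. */
static int parse_eth_header( const unsigned char *pkt, size_t len, ETH_HEADER *hdr )
{
    if ( len < 14 ) return -1;
    memcpy( hdr->dst, pkt + 0, 6 );
    memcpy( hdr->src, pkt + 6, 6 );
    hdr->type_len = (unsigned short)( (pkt[ 12 ] << 8) | pkt[ 13 ] );
    return 0;
}

/* Formats a MAC address as "xx:xx:xx:xx:xx:xx"; 'out' needs room for 18 bytes. */
static void format_mac( const unsigned char mac[ 6 ], char out[ 18 ] )
{
    snprintf( out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
              mac[ 0 ], mac[ 1 ], mac[ 2 ], mac[ 3 ], mac[ 4 ], mac[ 5 ] );
}
```

A caller would pass the buffer filled by ‘read()’ together with the byte-count that ‘read()’ returned.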
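The value 0x0000801C that we wrote to RCTL can be read off from the Receive Control bit-positions: it combines BAM (bit 15), MPE (bit 4), UPE (bit 3) and SBP (bit 2). The sketch below just spells out that arithmetic; the ‘RCTL_’ names and the two helper functions are illustrative assumptions, and the enabled value 0x0000801E is derived here by adding the EN bit rather than quoted from the slides.

```c
#include <assert.h>

enum {
    RCTL_EN  = 1 << 1,     /* Receive Enable */
    RCTL_SBP = 1 << 2,     /* Store Bad Packets */
    RCTL_UPE = 1 << 3,     /* Unicast Promiscuous Enable */
    RCTL_MPE = 1 << 4,     /* Multicast Promiscuous Enable */
    RCTL_BAM = 1 << 15     /* Broadcast Accept Mode */
};

/* Settings applied while the receive engine is still disabled. */
static unsigned int rctl_prepare( void )
{
    return RCTL_BAM | RCTL_MPE | RCTL_UPE | RCTL_SBP;
}

/* The same settings with the enable bit added. */
static unsigned int rctl_enable( unsigned int rctl )
{
    assert( (rctl & RCTL_EN) == 0 );    /* expected to be called once, before enabling */
    return rctl | RCTL_EN;
}
```

With every filter bit set to ‘accept’, the controller stores all the packets it sees, which is what a packet-watching tool wants.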