Accessing the NIC

A look at the mechanisms that software can use to interact with our 82573L network interface

Typical NIC hardware

[Diagram: main memory (holding a packet buffer) and the CPU connect over the bus to the NIC, which contains a TX FIFO, an RX FIFO, and a transceiver attached to the LAN cable]

No way to communicate

SOFTWARE: Linux operating system kernel (no knowledge of the NIC)
These components lack a way to interact
HARDWARE: Network interface controller (no knowledge of the OS)

Role for a ‘device driver’

SOFTWARE: Linux operating system kernel (no knowledge of the NIC)
device driver module (knows about both the OS and the NIC)
HARDWARE: Network interface controller (no knowledge of the OS)

Three x86 address-spaces

• memory space (4GB): accessed using a large variety of processor instructions (mov, add, or, shr, push, etc.) and virtual-to-physical address-translation
• i/o space (64KB): accessed only by using the processor’s special ‘in’ and ‘out’ instructions (without any translation of port-addresses)
• PCI configuration space (16MB): i/o-ports 0x0CF8-0x0CFF are dedicated to accessing PCI Configuration Space

Interface to PCI Configuration Space

PCI Configuration Space Address Port (32 bits), CONFADD (0x0CF8):

    bit  31      EN: Enable Configuration Space Mapping (1=yes, 0=no)
    bits 30-24   reserved
    bits 23-16   bus (8 bits)
    bits 15-11   device (5 bits)
    bits 10-8    function (3 bits)
    bits 7-2     doubleword (6 bits)
    bits 1-0     00

PCI Configuration Space Data Port (32 bits), CONFDAT (0x0CFC): bits 31-0 hold the selected doubleword

82573L

• Two mechanisms for accessing the NIC:
– I/O space (allows booting over the network)
– Memory-mapped I/O (available after booting)
• Both mechanisms require probing PCI Configuration Space for information (the I/O port-number or the memory-mapped address)
• Probing requires knowing the device’s IDs (the VENDOR_ID and the DEVICE_ID)

Our NIC’s ID numbers

• The VENDOR_ID for Intel Corporation:
– 0x8086 (famous chip-number for IBM-PCs)
• The DEVICE_ID for Intel’s 82573L NIC:
– 0x109A (found in Intel’s documentation p.141)

NIC’s access mechanisms

• The two ways of accessing our network controller’s ‘control’ and ‘status’ registers are explained in Chapter 13 of the Intel Open Source Programmer’s Reference
• This Chapter also includes a detailed list of the names and functional descriptions for the various network interface registers
• All are accessed as 32-bit ‘doublewords’

PCI Configuration Space

A non-volatile parameter-storage area for each PCI device-function (64 doublewords):

• PCI Configuration Space Header (16 doublewords, fixed format)
• PCI Configuration Space Body (48 doublewords, variable format)

PCI Configuration Header

16 doublewords, each shown from bit 31 (left) down to bit 0 (right):

    dword  0    Device ID | Vendor ID
    dword  1    Status Register | Command Register
    dword  2    Class Code (Class/SubClass/ProgIF) | Revision ID
    dword  3    BIST | Header Type | Latency Timer | Cache Line Size
    dword  4    Base Address 0
    dword  5    Base Address 1
    dword  6    Base Address 2
    dword  7    Base Address 3
    dword  8    Base Address 4
    dword  9    Base Address 5
    dword 10    CardBus CIS Pointer
    dword 11    Subsystem Device ID | Subsystem Vendor ID
    dword 12    Expansion ROM Base Address
    dword 13    reserved | capabilities pointer
    dword 14    reserved
    dword 15    Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line

Reading PCI Configuration Data

• Step one: Output the desired longword’s address (bus, device, function, and dword), with bit 31 set to 1 (to enable access), to the Configuration-Space Address-Port
• Step two: Read the designated data from the Configuration-Space Data-Port:

    # read the PCI Header-Type field (byte 2 of dword 3) for bus=0, device=0, function=0
    movl  $0x8000000C, %eax     # setup address in EAX
    movw  $0x0CF8, %dx          # setup port-number in DX
    outl  %eax, %dx             # output address to port
    movw  $0x0CFC, %dx          # setup port-number in DX
    inl   %dx, %eax             # input configuration longword
    shrl  $16, %eax             # shift byte 2 into AL register
    movb  %al, header_type      # store Header Type in variable

Demo Program

• We created a short Linux utility that searches for and reports all of your system’s PCI devices
• It’s named “pciprobe.cpp” on our CS686 website
• It uses some C++ macros that expand to Intel input/output instructions, which normally are ‘privileged’ instructions that a Linux application program is not allowed to execute (segfault!)
• Our system administrator (Alex Fedosov) has created a utility (named “iopl3”) that will allow your command-shell to acquire I/O privileges

Example: network interface

• We identify the network interface controller in our classroom PCs by class-code 0x02
• The subclass-code 0x00 is for ‘ethernet’
• We can identify the NIC from its VENDOR and DEVICE identification-numbers:
• VENDOR_ID = 0x8086 (for Intel Corporation)
• DEVICE_ID = 0x109A (for 82573L controller)
• You can use the ‘grep’ command to search for these numbers in this header-file: </usr/src/linux/include/linux/pci_ids.h>

The NIC’s PCI ‘resources’

16 doublewords, each shown from bit 31 (left) down to bit 0 (right):

    dword  0    Device ID 0x109A | Vendor ID 0x8086
    dword  1    Status Register | Command Register
    dword  2    Class Code (Class/SubClass/ProgIF) | Revision ID
    dword  3    BIST | Header Type | Latency Timer | Cache Line Size
    dword  4    Base Address 0
    dword  5    Base Address 1
    dword  6    Base Address 2
    dword  7    Base Address 3
    dword  8    Base Address 4
    dword  9    Base Address 5
    dword 10    CardBus CIS Pointer
    dword 11    Subsystem Device ID | Subsystem Vendor ID
    dword 12    Expansion ROM Base Address
    dword 13    reserved | capabilities pointer
    dword 14    reserved
    dword 15    Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line

Linux PCI helper-functions

    #include <linux/pci.h>

    struct pci_dev  *devp;          // the kernel's record for our NIC
    unsigned int    mmio_base;      // physical address of the NIC's registers
    unsigned int    mmio_size;      // size of the register region in bytes
    void            *io;            // kernel virtual address after mapping

    // locate the NIC by its identification-numbers
    devp = pci_get_device( VENDOR_ID, DEVICE_ID, NULL );
    if ( devp == NULL ) return -ENODEV;

    // Base Address 0 describes the memory-mapped register region
    mmio_base = pci_resource_start( devp, 0 );
    mmio_size = pci_resource_len( devp, 0 );

    // map the region uncached (kernels 5.6 and later provide only ioremap())
    io = ioremap_nocache( mmio_base, mmio_size );
    if ( io == NULL ) return -ENOSPC;

Mechanisms compared

• i/o-memory: each NIC register has its own address in memory, within the kernel memory-space of the CPU’s ‘virtual’ address-space (allows one-step access)
• i/o-ports: access to all of the NIC’s registers is multiplexed through a pair of I/O-ports (‘addr’ and ‘data’) in the CPU’s ‘I/O’ address-space (requires multiple instructions)

C language hides complexity

• For the multiplexed i/o-space access…

    // Your module needs only a brief pair of C statements to ‘input’ a NIC register-value:
    outl( 0x0008, ioport );                 // select the register via the ‘addr’ port
    int device_status = inl( ioport + 4 );  // read its value via the ‘data’ port

    // but the C compiler translates them into SIX cpu-instructions:
    mov  ioport, %dx
    mov  $8, %eax
    out  %eax, %dx
    add  $4, %dx
    in   %dx, %eax
    mov  %eax, device_status

Seeing through the C

• For the i/o-memory ‘mapped’ access…

    // Your module uses just one C statement to ‘fetch’ a NIC register-value:
    int device_status = ioread32( io + 0x0008 );

    // but the C compiler translates this statement into three cpu-instructions:
    mov  io, %esi
    mov  8(%esi), %eax
    mov  %eax, device_status

Course’s continuing theme is…

“Using the computer to study the computer”

TimeStamp Counter

• The x86 processor has a 64-bit register named the TimeStamp Counter (TSC)
• It increments continuously, once each cpu cycle (so with a 2-GHz processor, it increments two-billion times every second)
• It can be read at any time using a special processor instruction (named RDTSC); its value appears in the EDX:EAX registers

Using CLI and STI

• For doing accurate timing measurements, your module can temporarily disable your CPU’s response to ‘interrupt’ requests

Basic steps for performing an uninterrupted ‘elapsed time’ measurement:

step 1: Turn off interrupts (using ‘cli’)
step 2: Read the TimeStamp Counter
step 3: Perform a NIC register access
step 4: Read the TimeStamp Counter
step 5: Turn on interrupts (using ‘sti’)
step 6: Subtract start-time from finish-time

In-class exercise

• Look at our ‘timing.c’ demo module for an ‘inline’ assembly language example using the ‘rdtsc’, ‘cli’ and ‘sti’ instructions
• Add some extra code of your own, and do an additional timing measurement, so you can compare the execution-times for the NIC’s two register-access mechanisms
• How much faster is memory-mapped I/O?
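
For the timing exercise, the arithmetic around the two TimeStamp Counter readings can be sketched as follows. This is only an illustration, not the code in ‘timing.c’: the helper names (tsc_combine, tsc_elapsed, cycles_to_ns, read_tsc) are hypothetical, the clock rate is something you must supply yourself, and the ‘cli’ and ‘sti’ steps are left out because they can only execute inside a kernel module.

```c
#include <stdint.h>

/* Hypothetical helper: join the two halves that RDTSC leaves in
 * EDX (upper 32 bits) and EAX (lower 32 bits) into one 64-bit count. */
static inline uint64_t tsc_combine(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Step 6 of the measurement: subtract start-time from finish-time.
 * Unsigned subtraction stays correct even if the counter wraps once. */
static inline uint64_t tsc_elapsed(uint64_t start, uint64_t finish)
{
    return finish - start;
}

/* Convert a cycle count to nanoseconds for an ASSUMED clock rate in Hz
 * (for example 2e9 for the 2-GHz processor mentioned in the slides). */
static inline double cycles_to_ns(uint64_t cycles, double hz)
{
    return (double)cycles * 1e9 / hz;
}

#if defined(__i386__) || defined(__x86_64__)
/* Steps 2 and 4: read the TimeStamp Counter with GCC-style inline
 * assembly. Compiled only on x86, where the RDTSC instruction exists. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return tsc_combine(hi, lo);
}
#endif
```

With these helpers, a measurement would take one read_tsc() value before the NIC register access and one after it, then pass both to tsc_elapsed(); on the assumed 2-GHz clock, 2000 elapsed cycles correspond to 1000 nanoseconds.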
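
Returning to the PCI Configuration Space Address Port described earlier, the value written to port 0x0CF8 can be assembled from its bus, device, function, and doubleword fields. The sketch below only illustrates that bit layout; the function names (pci_config_address, dword_byte) are hypothetical and are not taken from ‘pciprobe.cpp’ or from the Linux kernel.

```c
#include <stdint.h>

#define CONFADD 0x0CF8  /* Configuration-Space Address-Port */
#define CONFDAT 0x0CFC  /* Configuration-Space Data-Port    */

/* Hypothetical helper: build the 32-bit value for the address port.
 *   bit 31      enable (always set here)
 *   bits 23-16  bus        (8 bits)
 *   bits 15-11  device     (5 bits)
 *   bits 10-8   function   (3 bits)
 *   bits 7-2    doubleword (6 bits)
 *   bits 1-0    always 00
 * Each field is masked so an out-of-range argument cannot spill
 * into a neighboring field. */
static inline uint32_t pci_config_address(unsigned bus, unsigned dev,
                                          unsigned fn, unsigned dword)
{
    return 0x80000000u
         | ((uint32_t)(bus   & 0xFFu) << 16)
         | ((uint32_t)(dev   & 0x1Fu) << 11)
         | ((uint32_t)(fn    & 0x07u) <<  8)
         | ((uint32_t)(dword & 0x3Fu) <<  2);
}

/* Hypothetical helper: pick byte n (0 = lowest) out of a doubleword
 * that was read from the data port. */
static inline uint8_t dword_byte(uint32_t value, unsigned n)
{
    return (uint8_t)(value >> (8u * (n & 3u)));
}
```

For bus 0, device 0, function 0, doubleword 3, pci_config_address() yields 0x8000000C, the same constant the assembly example loads into EAX, and dword_byte(value, 2) selects the Header Type byte that the example isolates by shifting right 16 bits.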