Our ‘xmit1000.c’ driver
Implementing a ‘packet-transmit’
capability with the Intel 82573L
network interface controller
Remember ‘echo’ and ‘cat’?
• Your device-driver module (named ‘uart.c’)
was supposed to allow two programs that
are running on a pair of adjacent PCs to
communicate via a “null-modem” cable
Transmitting…
$ echo Hello > /dev/uart
$_
Receiving…
$ cat /dev/uart
Hello _
‘keep it simple’
• Let’s try to implement a ‘write()’ routine for
our Intel Pro/1000 ethernet controllers that
will provide the same basic functionality as
we achieved with our serial UART driver
• It should allow us to transmit a message
by using the familiar UNIX ‘cat’ command
to redirect output to a character device-file
• Our device-file will be named ‘/dev/nic’
Driver’s components
• my_fops: this ‘struct’ holds one
function-pointer (‘write’)
• my_write(): this function will program
the actual data-transfer
• my_get_info(): this function will allow us
to inspect the transmit-descriptors
• module_init(): this function will detect and
configure the hardware, define page-mappings,
allocate and initialize the descriptors,
start the ‘transmit’ engine, create the
pseudo-file and register ‘my_fops’
• module_exit(): this function will do needed
‘cleanup’ when it’s time to unload our driver:
turn off the ‘transmit’ engine, free the
memory, delete the page-table entries and
the pseudo-file, and unregister ‘my_fops’
kzalloc()
• Linux kernels since 2.6.13 offer this convenient
function for allocating pre-zeroed kernel memory
• It has the same syntax as the ‘kmalloc()’ function
(described in our texts), but adds the after-effect
of zeroing out the newly-allocated memory-area
void *kmem = kmalloc( region_size, GFP_KERNEL );
memset( kmem, 0x00, region_size );
/* can be replaced with */
void *kmem = kzalloc( region_size, GFP_KERNEL );
• Thus it does two logically distinct actions (often
coupled anyway) within a single function-call
Single page-frame option
One 4KB PageFrame holds both regions:
• Packet-Buffer (3-KB)
(reused for successive transmissions)
• Descriptor-Buffer (1-KB)
(room for up to 64 descriptors of 16 bytes each)
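The arithmetic behind this layout can be sketched in user-space C. This is only an illustration under stated assumptions: ‘calloc()’ stands in for the kernel's ‘kzalloc()’, and the names ‘frame_layout’ and ‘carve_frame()’ are hypothetical, not taken from ‘xmit1000.c’. At 16 bytes per descriptor, the 1-KB region after the 3-KB packet-buffer has room for 1024/16 = 64 descriptors.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES  4096u   /* one page-frame                             */
#define PKT_BYTES   3072u   /* 3-KB packet-buffer at the start of the page */
#define DESC_BYTES  16u     /* size of one transmit-descriptor            */

/* Hypothetical bookkeeping structure, for illustration only. */
struct frame_layout {
    unsigned char *pkt_buf;    /* start of the packet-buffer            */
    unsigned char *desc_buf;   /* start of the descriptor-buffer        */
    size_t         max_desc;   /* descriptors that fit in the remainder */
};

/* Carve one zeroed, page-sized allocation into the two regions. */
int carve_frame( struct frame_layout *lay )
{
    unsigned char *page = calloc( 1, PAGE_BYTES );  /* stand-in for kzalloc() */
    if ( !page ) return -1;
    lay->pkt_buf  = page;
    lay->desc_buf = page + PKT_BYTES;
    lay->max_desc = ( PAGE_BYTES - PKT_BYTES ) / DESC_BYTES;
    return 0;
}
```

Because 3072 is a multiple of 16, the descriptor region keeps whatever 16-byte alignment the page itself has.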
Our Tx-Descriptor ring
After writing the data into our packet-buffer, and writing its length to
the current TAIL descriptor, our driver will advance the TAIL index; the
NIC responds by reading the current HEAD descriptor, fetching its data,
then advancing the HEAD index as it sends our data out over the wire.
[Figure: an array of 8 transmit-descriptors (descriptor 0 through descriptor 7),
indexed by HEAD and TAIL; every descriptor refers to the one packet-buffer,
our ‘reusable’ transmit-buffer (1536 bytes)]
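The HEAD/TAIL hand-off described above can be modeled in plain user-space C. This is a minimal sketch, not the driver's code: the ‘tx_ring’ structure and the three function names are hypothetical, and the NIC's side is merely simulated by a function call. It relies on the usual convention that the ring is empty whenever HEAD equals TAIL.

```c
#include <stdint.h>

#define N_TX_DESC 8u    /* matches the 8-descriptor array in the figure */

/* Software model of the two ring indices (registers TDH and TDT). */
struct tx_ring {
    uint32_t head;   /* next descriptor the NIC will fetch   */
    uint32_t tail;   /* next descriptor the driver will fill */
};

/* Driver side: hand one descriptor to the NIC by advancing TAIL. */
void ring_advance_tail( struct tx_ring *r )
{
    r->tail = ( r->tail + 1 ) % N_TX_DESC;   /* wrap around after slot 7 */
}

/* NIC side (simulated): consume one descriptor if any is pending. */
int ring_nic_consume( struct tx_ring *r )
{
    if ( r->head == r->tail ) return 0;      /* HEAD == TAIL: nothing to send */
    r->head = ( r->head + 1 ) % N_TX_DESC;
    return 1;
}

/* Number of descriptors handed over but not yet processed. */
uint32_t ring_pending( const struct tx_ring *r )
{
    return ( r->tail + N_TX_DESC - r->head ) % N_TX_DESC;
}
```

One consequence of this convention is that at most 7 of the 8 descriptors can be outstanding at once; advancing TAIL onto HEAD would make a full ring look empty.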
‘/proc/xmit1000’
• This pseudo-file can be examined anytime
to find out what values (if any) the NIC has
‘written back’ into the transmit-descriptors
(i.e., the descriptor-status information) and
current values in registers TDH and TDT:
$ cat /proc/xmit1000
Direct Memory Access
• The NIC is able to ‘fetch’ descriptors from
the host-system’s memory (and can also read
the data from our packet-buffer), as well as
‘store’ a status-report back into the host’s
memory, by temporarily becoming the Bus
Master: it takes control of the system-bus
away from the CPU so that it can perform
the ‘fetch’ and ‘store’ operations directly,
without CPU involvement or interference
Configuration registers
CTRL      Device Control
CTRL_EXT  Extended Device Control
TIPG      Transmit Inter-Packet Gap
TCTL      Transmit Control
TDBAL     Transmit Descriptor-queue Base-Address (LOW)
TDBAH     Transmit Descriptor-queue Base-Address (HIGH)
TDLEN     Transmit Descriptor-queue Length
TDH       Transmit Descriptor-queue HEAD
TDT       Transmit Descriptor-queue TAIL
TXDCTL    Transmit Descriptor-queue Control
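As a rough sketch of how these registers might be named and fed in C: the offsets for TIPG (0x0410) and TCTL (0x0400) appear in these slides, while the remaining offsets are quoted from memory of Intel's documentation and should be checked against the manual before use. The helper functions are hypothetical names for this illustration; they show how a 64-bit physical address would be split between TDBAL and TDBAH, and how a descriptor count becomes the byte-length that TDLEN expects.

```c
#include <stdint.h>

/* Register offsets from the start of the NIC's memory-mapped i/o region.
   Only TIPG and TCTL are confirmed by the slides; verify the others. */
enum {
    E1000_CTRL     = 0x0000,
    E1000_CTRL_EXT = 0x0018,
    E1000_TCTL     = 0x0400,
    E1000_TIPG     = 0x0410,
    E1000_TDBAL    = 0x3800,
    E1000_TDBAH    = 0x3804,
    E1000_TDLEN    = 0x3808,
    E1000_TDH      = 0x3810,
    E1000_TDT      = 0x3818,
    E1000_TXDCTL   = 0x3828
};

/* Split a 64-bit physical address into the halves for TDBAL and TDBAH. */
uint32_t addr_low( uint64_t phys )  { return (uint32_t)( phys & 0xFFFFFFFFu ); }
uint32_t addr_high( uint64_t phys ) { return (uint32_t)( phys >> 32 ); }

/* TDLEN holds the queue's size in bytes (16 bytes per descriptor). */
uint32_t tdlen_bytes( uint32_t n_desc ) { return n_desc * 16u; }
```

With 8 descriptors the queue length comes to 128 bytes.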
The ‘initialization’ sequence
• Detect the network interface controller
• Obtain its i/o-memory address and size
• Remap the i/o-memory into kernel-space
• Allocate memory for buffer and descriptors
• Initialize the array of transmit-descriptors
• Reset the NIC and configure its operations
• Create the ‘/proc/xmit1000’ pseudo-file
• Register our ‘write()’ driver-method
The ‘cleanup’ sequence
• Usually the steps here follow those in the
initialization sequence -- but in backwards
order:
• Unregister the device-driver’s file-operations
• Delete the ‘/proc/xmit1000’ pseudo-file
• Disable the NIC’s ‘transmit’ engine
• Release the allocated kernel-memory
• Unmap the NIC’s i/o-memory region
Our ‘write()’ algorithm
• Get index of the current TAIL descriptor
• Confine the amount of user-data
• Copy user-data into the packet-buffer
• Set up the packet’s Ethernet Header
• Set up packet-length in the TAIL descriptor
• Now hand over this descriptor to the NIC
(by advancing the value in register TDT)
• Tell the kernel how many bytes were sent
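The steps above can be walked through in a user-space simulation. Everything here is an assumption made for illustration: ‘fake_nic’ and ‘sim_write()’ are hypothetical names, a plain variable stands in for register TDT, ‘memcpy()’ stands in for the kernel's ‘copy_from_user()’, the 1500-byte payload limit is the standard Ethernet maximum rather than a value taken from the driver, and the type-field uses 0x88B5 (an EtherType set aside for local experiments) only as a placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_TX_DESC   8u
#define HDR_LEN     14u     /* Ethernet header: dst(6) + src(6) + type(2) */
#define MAX_PAYLOAD 1500u   /* assumed limit on the user-data per packet  */

struct tx_desc {
    uint64_t base_addr;
    uint16_t pkt_length;
    uint8_t  cso, cmd, status, css;
    uint16_t special;
};

struct fake_nic {
    struct tx_desc ring[ N_TX_DESC ];
    uint8_t  pkt_buf[ HDR_LEN + MAX_PAYLOAD ];
    uint32_t tdt;           /* stand-in for register TDT */
};

/* Returns the number of user bytes accepted, as write() would report. */
long sim_write( struct fake_nic *nic, const void *data, size_t len,
                const uint8_t dst[6], const uint8_t src[6] )
{
    uint32_t tail = nic->tdt;                      /* 1. current TAIL index  */
    if ( len > MAX_PAYLOAD ) len = MAX_PAYLOAD;    /* 2. confine the length  */
    memcpy( nic->pkt_buf + HDR_LEN, data, len );   /* 3. copy the user-data  */
    memcpy( nic->pkt_buf, dst, 6 );                /* 4. Ethernet header     */
    memcpy( nic->pkt_buf + 6, src, 6 );
    nic->pkt_buf[ 12 ] = 0x88;                     /*    placeholder type    */
    nic->pkt_buf[ 13 ] = 0xB5;
    nic->ring[ tail ].pkt_length = (uint16_t)( HDR_LEN + len );  /* 5. length */
    nic->tdt = ( tail + 1 ) % N_TX_DESC;           /* 6. hand over to NIC    */
    return (long)len;                              /* 7. bytes accepted      */
}
```

In a real driver, step 6 would be a write to the NIC's TDT register, and step 3 would need to check the return-value from ‘copy_from_user()’.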
Recall Tx-Descriptor Layout
Offset  Fields (bit 31 down to bit 0)
0x0     Buffer-Address low (bits 31..0)
0x4     Buffer-Address high (bits 63..32)
0x8     CMD (31..24) | CSO (23..16) | Packet Length in bytes (15..0)
0xC     special (31..16) | CSS (15..8) | reserved =0 (7..4) | status (3..0)

Buffer-Address = the packet-buffer’s 64-bit address in physical memory
Packet-Length = number of bytes in the data-packet to be transmitted
CMD = Command-field
CSO/CSS = Checksum Offset/Start (in bytes)
STA = Status-field
Suggested C syntax
typedef struct {
    unsigned long long  base_addr;
    unsigned short      pkt_length;
    unsigned char       cksum_off;
    unsigned char       desc_cmd;
    unsigned char       desc_stat;
    unsigned char       cksum_org;
    unsigned short      special;
} TX_DESCRIPTOR;
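One way to confirm that the compiler lays this structure out exactly as the hardware expects is to add compile-time checks. The sketch below repeats the suggested declaration so that it stands alone, and assumes a C11 compiler (for ‘_Static_assert’) on a platform where ‘unsigned long long’ is 8 bytes and ‘unsigned short’ is 2 bytes.

```c
#include <stddef.h>

typedef struct {
    unsigned long long  base_addr;    /* bytes 0..7  : buffer address  */
    unsigned short      pkt_length;   /* bytes 8..9  : packet length   */
    unsigned char       cksum_off;    /* byte  10    : CSO             */
    unsigned char       desc_cmd;     /* byte  11    : CMD             */
    unsigned char       desc_stat;    /* byte  12    : status/reserved */
    unsigned char       cksum_org;    /* byte  13    : CSS             */
    unsigned short      special;      /* bytes 14..15: special         */
} TX_DESCRIPTOR;

/* The build fails if padding or field order ever drifts from the layout. */
_Static_assert( sizeof( TX_DESCRIPTOR ) == 16, "descriptor must be 16 bytes" );
_Static_assert( offsetof( TX_DESCRIPTOR, pkt_length ) == 8,  "length at 0x8" );
_Static_assert( offsetof( TX_DESCRIPTOR, desc_cmd )   == 11, "CMD at byte 11" );
_Static_assert( offsetof( TX_DESCRIPTOR, desc_stat )  == 12, "status at 0xC" );
_Static_assert( offsetof( TX_DESCRIPTOR, special )    == 14, "special at byte 14" );
```

These checks cost nothing at run time, since they are evaluated entirely by the compiler.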
Transmit IPG (0x0410)
IPG = Inter-Packet Gap
bits 31..30  R (reserved, =0)
bits 29..20  IPG After Deferral  (Recommended value = 7)
bits 19..10  IPG Part 1          (Recommended value = 8)
bits  9..0   IPG Back-To-Back    (Recommended value = 8)
This register controls the Inter-Packet Gap timer for the Ethernet controller.
Note that the recommended TIPG register-value to achieve IEEE 802.3
compliant minimum transfer IPG values in full- and half-duplex operations
would be 00702008 (hexadecimal), equal to (7<<20) | (8<<10) | (8<<0).
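The recommended value can be rebuilt from its three fields, which makes the arithmetic easy to check. This is a small sketch in which ‘make_tipg()’ is a hypothetical helper name, and the 10-bit field-width used for masking is inferred from the shift-amounts 0, 10 and 20 given above.

```c
#include <stdint.h>

/* Pack the three inter-packet gap fields into one TIPG register-value. */
uint32_t make_tipg( uint32_t back_to_back, uint32_t part1,
                    uint32_t after_deferral )
{
    return ( ( after_deferral & 0x3FFu ) << 20 )   /* IPG After Deferral */
         | ( ( part1          & 0x3FFu ) << 10 )   /* IPG Part 1         */
         | ( ( back_to_back   & 0x3FFu ) <<  0 );  /* IPG Back-To-Back   */
}
```

Calling ‘make_tipg( 8, 8, 7 )’ yields 0x00702008, since 7<<20 is 0x00700000, 8<<10 is 0x00002000, and 8<<0 is 0x00000008.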
82573L
Transmit Control (0x0400)
Bits    Field    Meaning
31..29  R        reserved
28      MULR     Multiple Request Support
27..26  TXCSCMT  TxDescriptor Minimum Threshold
25      UNORTX   Underrun No Re-Transmit
24      RTLC     Retransmit on Late Collision
23      R        reserved (=0)
22      SWXOFF   Software XOFF Transmission
21..12  COLD     Collision Distance (=0x3F)
11..4   CT       Collision Threshold (=0xF)
3       PSP      Pad Short Packets
2       R        reserved (=0)
1       EN       Transmit Enable
0       R        reserved (=0)
Our driver’s elections
Here’s a C programming style that ‘documents’ the programmer’s choices.
int tx_control = 0;
tx_control |= (0<<1);    // EN-bit (Enable Transmit Engine)
tx_control |= (1<<3);    // PSP-bit (Pad Short Packets)
tx_control |= (15<<4);   // CT=15 (Collision Threshold)
tx_control |= (63<<12);  // COLD=63 (Collision Distance)
tx_control |= (0<<22);   // SWXOFF-bit (Software XOFF Tx)
tx_control |= (1<<24);   // RTLC-bit (Re-Transmit on Late Collision)
tx_control |= (0<<25);   // UNORTX-bit (Underrun No Re-Transmit)
tx_control |= (0<<26);   // TXCSCMT=0 (Tx-descriptor Min Threshold)
tx_control |= (0<<28);   // MULR-bit (Multiple Request Support)
iowrite32( tx_control, io + E1000_TCTL );  // Transmit Control register
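The same elections can be written with named shift-amounts, which also makes the resulting register-value easy to verify by hand. This is only a sketch: the ‘TCTL_..._SHIFT’ names and the ‘make_tctl()’ helper are hypothetical, and the ‘enable’ parameter reflects an assumption that the EN-bit, left at 0 above, is switched on in a separate later step.

```c
#include <stdint.h>

/* Bit-positions of the TCTL fields, as used in the elections above. */
enum {
    TCTL_EN_SHIFT      = 1,
    TCTL_PSP_SHIFT     = 3,
    TCTL_CT_SHIFT      = 4,
    TCTL_COLD_SHIFT    = 12,
    TCTL_SWXOFF_SHIFT  = 22,
    TCTL_RTLC_SHIFT    = 24,
    TCTL_UNORTX_SHIFT  = 25,
    TCTL_TXCSCMT_SHIFT = 26,
    TCTL_MULR_SHIFT    = 28
};

/* Build the Transmit Control value; 'enable' chooses the EN-bit. */
uint32_t make_tctl( int enable )
{
    uint32_t v = 0;
    v |= ( enable ? 1u : 0u ) << TCTL_EN_SHIFT;  /* Transmit Enable          */
    v |= 1u  << TCTL_PSP_SHIFT;                  /* Pad Short Packets        */
    v |= 15u << TCTL_CT_SHIFT;                   /* Collision Threshold      */
    v |= 63u << TCTL_COLD_SHIFT;                 /* Collision Distance       */
    v |= 1u  << TCTL_RTLC_SHIFT;                 /* Re-Transmit on Late Coll */
    return v;                                    /* all other fields stay 0  */
}
```

With the engine disabled the value is 0x0103F0F8; setting the EN-bit changes it to 0x0103F0FA.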
An ‘e1000.c’ anomaly?
• The official Linux kernel is delivered with a
device-driver supporting several of Intel’s
‘Pro/1000’ gigabit ethernet controllers
• Often this driver will get loaded by default
during the system’s startup procedures
• But it will interfere with your own driver if
you try to write a substitute for ‘e1000.ko’
• So you will want to remove it with ‘rmmod’
Side-effect of ‘rmmod’
• We’ve observed an unexpected side-effect
of ‘unloading’ the ‘e1000.ko’ device-driver
• The PCI Configuration Space’s command
register gets modified in a way that keeps
the NIC from working with your own driver
• Specifically, the Bus Mastering capability
gets disabled (by clearing bit #2 in the PCI
Configuration Space’s word at address 4)
What to do about it?
• This effect doesn’t arise on our ‘anchor’
cluster machines, but you may encounter
it when you try using our demo elsewhere
• Here’s the simple “fix” to turn Bus Master
capability back on (in your ‘module_init()’)
u16 pci_cmd;                                // declares a 16-bit variable
pci_read_config_word( devp, 4, &pci_cmd );  // read current word
pci_cmd |= (1<<2);                          // turn on the Bus Master enabled-bit
pci_write_config_word( devp, 4, pci_cmd );  // write modification
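The bit-manipulation at the heart of this fix can be isolated and checked in user-space. In the sketch below the two macros mirror the names the kernel defines in ‘<linux/pci_regs.h>’ but are declared locally so the fragment stands alone, and the two helper functions are hypothetical names for illustration. The kernel also offers ‘pci_set_master()’, which is the more customary way to enable bus-mastering from a driver.

```c
#include <stdint.h>

#define PCI_COMMAND         0x04     /* offset of the command word     */
#define PCI_COMMAND_MASTER  0x0004   /* bit #2: Bus Master enable      */

/* Return the command word with the Bus Master bit turned on. */
uint16_t with_bus_master( uint16_t pci_cmd )
{
    return (uint16_t)( pci_cmd | PCI_COMMAND_MASTER );
}

/* Report whether the Bus Master bit is currently set. */
int bus_master_enabled( uint16_t pci_cmd )
{
    return ( pci_cmd & PCI_COMMAND_MASTER ) != 0;
}
```

A driver could use a test like ‘bus_master_enabled()’ on the word it reads back, to confirm that the modification actually took effect.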
In-class demo
• We demonstrate our ‘xmit1000.c’ driver on
an ‘anchor’ machine, with some help from
a companion-module (named ‘recv1000.c’)
which is soon-to-be discussed in class
Transmitting… (on anchor01)
$ echo Hello > /dev/nic
$_
Receiving… (on anchor05)
$ cat /dev/nic
Hello _
(the two machines are connected via the LAN)
In-class exercise
• Open three or more terminal-windows on
your PC’s graphical desktop, and login to
a different ‘anchor’ machine in each one
• Install the ‘xmit1000.ko’ module on one of
the anchor machines, and then install our
‘recv1000.ko’ module on the other stations
• Execute the ‘cat /dev/nic’ command on the
receiver-stations, and then run an ‘echo’
command on the transmitter-station