lesson23

What’s needed to receive?
A look at the minimum steps
required for programming our
82573L nic to receive packets
Accessing 82573L registers
• Device registers are hardware mapped to
a range of addresses in physical memory
• We can get the location and extent of this
memory-range from a BAR register in the
82573L device’s PCI Configuration Space
• We then ask the Linux kernel to set up
an I/O ‘remapping’ of this memory-range to
‘virtual’ addresses within kernel-space
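The BAR decoding mentioned above can be sketched in ordinary user-space C. This is only an illustration under stated assumptions: the helper names are hypothetical, and a real kernel module would more likely rely on kernel helpers such as pci_resource_start(), pci_resource_len() and ioremap() than decode the BAR by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BAR is 0 for a memory range and 1 for an I/O-port range. */
static bool bar_is_memory(uint32_t bar)
{
    return (bar & 0x1u) == 0;
}

/* In a 32-bit memory BAR the low four bits are flags, not address bits. */
static uint32_t bar_mem_base(uint32_t bar)
{
    return bar & ~0xFu;
}

/* Extent of the range, computed from the value read back after
 * writing all ones to the BAR (the standard PCI sizing probe). */
static uint32_t bar_mem_size(uint32_t readback)
{
    return ~(readback & ~0xFu) + 1u;
}
```

For example, a hypothetical readback value of 0xFFFE0000 would indicate a 128-KB register window.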
Linux address-spaces
[Figure: the 64-GB physical address-space (dynamic ram, nic registers) shown beside the ‘virtual’ address-space, which is divided into a 128-TB kernel space (kernel code/data, dynamic ram, nic registers) and a 128-TB user space (stack, shared libraries, .text, .data, .bss)]
Kernel memory allocation
• The NIC requires some host memory for
packet-buffers and receive descriptors
• The kernel provides a ‘helper function’ for
reserving a suitable region of memory in
kernel-space which is both ‘non-pageable’
and ‘physically contiguous’ (i.e., kzalloc())
• It’s our job to decide how much memory
our network controller hardware will need
Format for an Rx Descriptor
Each descriptor occupies 16 bytes:
bytes 0-7: Base-address (64-bits)
bytes 8-9: Packet length
bytes 10-11: Packet checksum
byte 12: status
byte 13: errors
bytes 14-15: VLAN tag
• The device-driver initializes the ‘base-address’ field
with the physical address of a packet-buffer
• The network controller will ‘write-back’ the values
for the other fields when it has transferred a received
packet’s data into this descriptor’s packet-buffer
Suggested C syntax
typedef struct
{
    unsigned long long  base_address;
    unsigned short      packet_length;
    unsigned short      packet_cksum;
    unsigned char       desc_status;
    unsigned char       desc_errors;
    unsigned short      vlan_tag;
} RX_DESCRIPTOR;
‘Legacy Format’ for the Intel Pro1000 network controller’s Receive Descriptors
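Because the controller reads and writes these descriptors directly, the C structure has to be exactly 16 bytes with each field at its expected offset. The sketch below restates the structure with fixed-width types and checks the layout at compile time; the fixed-width types and the rx_desc_init() helper are illustrative assumptions, not part of the original module.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base_address;   /* physical address of the packet-buffer */
    uint16_t packet_length;  /* written back by the controller */
    uint16_t packet_cksum;   /* written back by the controller */
    uint8_t  desc_status;    /* written back by the controller */
    uint8_t  desc_errors;    /* written back by the controller */
    uint16_t vlan_tag;       /* written back by the controller */
} RX_DESCRIPTOR;

/* Compile-time checks that the layout matches the 16-byte format. */
_Static_assert(sizeof(RX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");
_Static_assert(offsetof(RX_DESCRIPTOR, packet_length) == 8, "length at byte 8");
_Static_assert(offsetof(RX_DESCRIPTOR, desc_status) == 12, "status at byte 12");
_Static_assert(offsetof(RX_DESCRIPTOR, vlan_tag) == 14, "vlan tag at byte 14");

/* Hypothetical helper: prepare one descriptor for use by the controller
 * by storing the buffer address and clearing the write-back fields. */
static void rx_desc_init(RX_DESCRIPTOR *d, uint64_t buffer_phys)
{
    d->base_address  = buffer_phys;
    d->packet_length = 0;
    d->packet_cksum  = 0;
    d->desc_status   = 0;
    d->desc_errors   = 0;
    d->vlan_tag      = 0;
}
```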
Ethernet packet layout
• Total size normally can vary from 64 bytes
up to 1522 bytes (unless ‘jumbo’ packets
and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’
and it appends a 4-byte CRC check-sum
offset 0: destination MAC address (6-bytes)
offset 6: source MAC address (6-bytes)
offset 12: Type/length (2-bytes)
offset 14: the packet’s data ‘payload’ goes here
(usually varies from 46 to 1500 bytes)
after the payload: Cyclic Redundancy Checksum (4-bytes)
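The layout above translates into simple byte arithmetic on a received buffer. The following is a minimal sketch with hypothetical helper names; it assumes the frame length passed in still includes the 4-byte CRC.

```c
#include <stddef.h>
#include <stdint.h>

enum { FRAME_HDR_LEN = 14, FRAME_CRC_LEN = 4 };

/* The Type/length field sits at offsets 12-13, most significant byte first. */
static uint16_t frame_type(const uint8_t *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Payload size of a frame whose total length includes header and CRC.
 * Returns 0 if the frame is too short to hold even those two parts. */
static size_t frame_payload_len(size_t total_len)
{
    if (total_len < FRAME_HDR_LEN + FRAME_CRC_LEN)
        return 0;
    return total_len - FRAME_HDR_LEN - FRAME_CRC_LEN;
}
```

A minimum-size 64-byte frame therefore carries 46 payload bytes, and a 1518-byte untagged frame carries 1500.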
Rx-Descriptor Ring-Buffer
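[Figure: a circular queue of 16-byte descriptors spanning offsets 0x00 to 0x80 from the base-address held in RDBA; RDLEN gives the queue’s length in bytes; RDH (head) and RDT (tail) mark the boundary between descriptors owned by hardware (nic) and descriptors owned by software (cpu)]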
Circular buffer (128-bytes minimum – and must be a multiple of 128 bytes)
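The ring arithmetic can be sketched as follows. The helper names are hypothetical, and the ownership rule used here (descriptors from the head up to, but not including, the tail belong to the nic) is stated as an assumption based on the usual Intel convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* RDLEN must be at least 128 bytes and a multiple of 128 bytes. */
static bool rdlen_is_valid(uint32_t rdlen)
{
    return rdlen >= 128 && rdlen % 128 == 0;
}

/* Advance a head or tail index, wrapping at the end of the ring. */
static uint32_t ring_next(uint32_t index, uint32_t count)
{
    return (index + 1) % count;
}

/* Number of descriptors currently owned by the nic (assumed rule:
 * everything from head up to, but not including, tail). */
static uint32_t ring_owned_by_nic(uint32_t head, uint32_t tail, uint32_t count)
{
    return (tail + count - head) % count;
}
```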
Packet-buffers and descriptors
• Our ‘nicrx.c’ module allocates 8 buffers of
size 2K-bytes (i.e., more than enough for
any normal Ethernet packets)
[Figure: one block of kernel memory, with 128 bytes for the Rx Descriptor Queue and 16K bytes for the eight packet-buffers]
16K + 128 bytes allocated (8 packet-buffers, plus Rx-Descriptor Queue)
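The sizing arithmetic can be written down as constants. In this sketch the constant names are hypothetical, and placing the packet-buffers first with the descriptor queue after them is an assumption; the slides only give the total.

```c
#include <stdint.h>

enum {
    N_RX_DESC    = 8,      /* number of descriptors and of buffers */
    RX_BUFSIZ    = 2048,   /* bytes per packet-buffer */
    RX_DESC_SIZE = 16,     /* bytes per descriptor */
    BUFFERS_LEN  = N_RX_DESC * RX_BUFSIZ,      /* 16K bytes */
    QUEUE_LEN    = N_RX_DESC * RX_DESC_SIZE,   /* 128 bytes */
    KMEM_SIZE    = BUFFERS_LEN + QUEUE_LEN     /* 16K + 128 bytes */
};

/* Physical address of packet-buffer number i (assumed: buffers first). */
static uint64_t rx_buffer_phys(uint64_t kmem_phys, unsigned i)
{
    return kmem_phys + (uint64_t)i * RX_BUFSIZ;
}

/* Physical address of the descriptor queue (assumed: after the buffers). */
static uint64_t rx_queue_phys(uint64_t kmem_phys)
{
    return kmem_phys + BUFFERS_LEN;
}
```

Each descriptor’s base-address field would then receive rx_buffer_phys(kmem_phys, i) for its own index i.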
RxDesc Status-field
bit:    7     6      5      4     3     2     1    0
field: PIF  IPCS  TCPCS  UDPCS   VP   IXSM   EOP   DD
DD = Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
IXSM = Ignore Checksum Indications (1=yes, 0=no)
VP = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated in packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated in packet (1=yes, 0=no)
IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF = Passed In exact Filter (1=yes, 0=no) shows if software must check
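A driver typically tests these bits with masks. The sketch below uses hypothetical constant and function names; the bit positions are the ones listed above.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    RXD_STAT_DD    = 1 << 0,  /* Descriptor Done */
    RXD_STAT_EOP   = 1 << 1,  /* End Of Packet */
    RXD_STAT_IXSM  = 1 << 2,  /* Ignore Checksum Indications */
    RXD_STAT_VP    = 1 << 3,  /* VLAN Packet match */
    RXD_STAT_UDPCS = 1 << 4,  /* UDP Checksum calculated */
    RXD_STAT_TCPCS = 1 << 5,  /* TCP Checksum calculated */
    RXD_STAT_IPCS  = 1 << 6,  /* IPv4 Checksum calculated */
    RXD_STAT_PIF   = 1 << 7   /* Passed In exact Filter */
};

/* True once the nic has finished with the descriptor. */
static bool rx_desc_done(uint8_t status)
{
    return (status & RXD_STAT_DD) != 0;
}

/* True if the descriptor is done and also holds the end of a packet. */
static bool rx_packet_complete(uint8_t status)
{
    const uint8_t mask = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & mask) == mask;
}
```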
RxDesc Error-field
bit:    7    6     5       4          3        2    1   0
field: RXE  IPE  TCPE  reserved=0  reserved=0  SEQ  SE  CE
RXE = Received-data Error (1=yes, 0=no)
IPE = IPv4-checksum error (1=yes, 0=no)
TCPE = TCP/UDP checksum error (1=yes, 0=no)
SEQ = Sequence error (1=yes, 0=no)
SE = Symbol Error (1=yes, 0=no)
CE = CRC Error or alignment error (1=yes, 0=no)
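The error byte can be screened the same way as the status byte. This is a small sketch with hypothetical names; it ignores the two reserved bits.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    RXD_ERR_CE   = 1 << 0,  /* CRC Error or alignment error */
    RXD_ERR_SE   = 1 << 1,  /* Symbol Error */
    RXD_ERR_SEQ  = 1 << 2,  /* Sequence error */
    RXD_ERR_TCPE = 1 << 5,  /* TCP/UDP checksum error */
    RXD_ERR_IPE  = 1 << 6,  /* IPv4-checksum error */
    RXD_ERR_RXE  = 1 << 7,  /* Received-data Error */
    RXD_ERR_MASK = 0xE7     /* every defined bit; bits 3 and 4 are reserved */
};

/* True if any defined error bit is set. */
static bool rx_has_error(uint8_t errors)
{
    return (errors & RXD_ERR_MASK) != 0;
}

/* True if either checksum-error bit is set. */
static bool rx_has_checksum_error(uint8_t errors)
{
    return (errors & (RXD_ERR_TCPE | RXD_ERR_IPE)) != 0;
}
```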
Essential ‘receive’ registers
enum
{
    E1000_CTRL   = 0x0000,  // Device Control
    E1000_STATUS = 0x0008,  // Device Status
    E1000_RCTL   = 0x0100,  // Receive Control
    E1000_RDBAL  = 0x2800,  // Rx Descriptor Base Address Low
    E1000_RDBAH  = 0x2804,  // Rx Descriptor Base Address High
    E1000_RDLEN  = 0x2808,  // Rx Descriptor Length
    E1000_RDH    = 0x2810,  // Rx Descriptor Head
    E1000_RDT    = 0x2818,  // Rx Descriptor Tail
    E1000_RXDCTL = 0x2828,  // Rx Descriptor Control
    E1000_RA     = 0x5400,  // Receive address-filter Array
};
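These enum values are byte offsets from the start of the remapped register window. The sketch below shows the offset arithmetic using a plain array as a stand-in for the device. That stand-in, the window size and the helper names are assumptions for illustration only; inside the kernel one would normally use accessors such as ioread32() and iowrite32() on the remapped address.

```c
#include <stdint.h>

enum {
    E1000_RDH = 0x2810,         /* Rx Descriptor Head */
    E1000_RDT = 0x2818,         /* Rx Descriptor Tail */
    FAKE_REG_BYTES = 0x6000     /* size of the stand-in window (assumed) */
};

/* Stand-in for the nic's register window, for demonstration only. */
static uint32_t fake_regs[FAKE_REG_BYTES / 4];

/* Registers are 32 bits wide, so a byte offset selects element offset/4. */
static void reg_write(volatile uint32_t *io, unsigned offset, uint32_t value)
{
    io[offset / 4] = value;
}

static uint32_t reg_read(volatile uint32_t *io, unsigned offset)
{
    return io[offset / 4];
}
```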
Programming steps
1) Detect the presence of the 82573L network controller (VENDOR_ID, DEVICE_ID)
2) Obtain the physical address-range where the nic’s device-registers are mapped
3) Ask the kernel to map this address range into the kernel’s virtual address-space
4) Copy the network controller’s MAC-address into a 6-byte array for future access
5) Allocate a block of kernel memory large enough for our descriptors and buffers
6) Ensure that the network controller’s ‘Bus Master’ capability has been enabled
7) Select our desired configuration-options for the DEVICE CONTROL register
8) Perform a nic ‘reset’ operation (by toggling bit 26), then delay until reset completes
9) Select our desired configuration-options for the RECEIVE CONTROL register
10) Initialize our array of Receive Descriptors with the physical addresses of buffers
11) Initialize the Receive Engine’s registers (for Rx-Descriptor Queue and Control)
12) Give ‘ownership’ of all of our Rx-Descriptors to the network controller
13) Enable the Receive Engine
14) Install our ‘/proc/nicrx’ pseudo-file (for user-diagnostic purposes)
NOTE: Steps 1) through 8) are the same as for our ‘nictx.c’ kernel module.
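Steps 9) and 11) through 13) come down to a short sequence of register writes. The sketch below performs them against a plain array standing in for the register window, using the RCTL and RXDCTL constants quoted elsewhere in these slides. The function name is hypothetical, and the tail value is left to the caller because the slides do not say which value ‘nicrx.c’ writes to RDT.

```c
#include <stdint.h>

enum {
    E1000_RCTL   = 0x0100, E1000_RDBAL = 0x2800, E1000_RDBAH  = 0x2804,
    E1000_RDLEN  = 0x2808, E1000_RDH   = 0x2810, E1000_RDT    = 0x2818,
    E1000_RXDCTL = 0x2828
};

#define RCTL_SETUP  0x1440821Cu   /* receive options, EN bit still clear */
#define RCTL_EN     (1u << 1)     /* Receive Enable */
#define RXDCTL_INIT 0x01010000u   /* GRAN=1, WTHRESH=1 */

static void reg_write(uint32_t *io, unsigned offset, uint32_t value)
{
    io[offset / 4] = value;
}

/* Program the receive engine, then enable it as the final action. */
static void rx_engine_start(uint32_t *io, uint64_t queue_phys,
                            uint32_t queue_len, uint32_t tail)
{
    reg_write(io, E1000_RCTL, RCTL_SETUP);                       /* step 9  */
    reg_write(io, E1000_RDBAL, (uint32_t)(queue_phys & 0xFFFFFFFFu));
    reg_write(io, E1000_RDBAH, (uint32_t)(queue_phys >> 32));
    reg_write(io, E1000_RDLEN, queue_len);
    reg_write(io, E1000_RDH, 0);
    reg_write(io, E1000_RXDCTL, RXDCTL_INIT);                    /* step 11 */
    reg_write(io, E1000_RDT, tail);                              /* step 12 */
    reg_write(io, E1000_RCTL, RCTL_SETUP | RCTL_EN);             /* step 13 */
}
```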
Device Control (0x0000)
bit 31: PHYRST = Phy Reset
bit 30: VME = VLAN Mode Enable
bit 28: TFCE = Tx Flow-Control Enable
bit 27: RFCE = Rx Flow-Control Enable
bit 26: RST = Device Reset
bit 20: ADVD3WUC = Advertise Cold Wake Up Capability
bit 19: D/UD = Dock/Undock status
bit 12: FRCDPLX = Force Duplex
bit 11: FRCSPD = Force Speed
bits 9-8: SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
bit 6: SLU = Set Link Up
bit 2: GIOMD = GIO Master Disable
bit 0: FD = Full-Duplex
All other bits are reserved (R=0), except that reserved bit 3 is shown as 1
We used 0x04000A49 to initiate a ‘device reset’ operation
82573L
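The constant 0x04000A49 can be rebuilt from named bits, which makes its meaning easier to audit. The macro names below are hypothetical; the bit positions follow the Device Control layout described above.

```c
#include <stdint.h>

#define CTRL_FD         (1u << 0)    /* Full-Duplex */
#define CTRL_RSVD3      (1u << 3)    /* reserved bit shown as 1 */
#define CTRL_SLU        (1u << 6)    /* Set Link Up */
#define CTRL_SPEED_1000 (2u << 8)    /* SPEED field = 10 (1000Mbps) */
#define CTRL_FRCSPD     (1u << 11)   /* Force Speed */
#define CTRL_RST        (1u << 26)   /* Device Reset */

/* Value written to Device Control to initiate a device reset. */
static uint32_t ctrl_reset_value(void)
{
    return CTRL_RST | CTRL_FRCSPD | CTRL_SPEED_1000
         | CTRL_SLU | CTRL_RSVD3 | CTRL_FD;
}
```

Dropping CTRL_RST from the expression gives the same configuration without requesting a reset.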
Device Status (0x0008)
bit 31: ? (some undocumented functionality?)
bit 19: GIO Master EN
bit 10: PHYRA = PHY Reset Asserted
bits 9-8: ASDV = Auto-negotiation Speed Detection Value
bits 7-6: SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
bit 4: TXOFF = Transmission Paused
bits 3-2: Function ID
bit 1: LU = Link Up
bit 0: FD = Full-Duplex
All other bits are shown as 0
82573L
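Reading link state out of the Device Status register is a matter of shifting and masking. This is a minimal sketch with hypothetical function names, using the bit positions listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* LU (Link Up) is bit 1 of Device Status. */
static bool link_is_up(uint32_t status)
{
    return ((status >> 1) & 1u) != 0;
}

/* FD (Full-Duplex) is bit 0 of Device Status. */
static bool link_is_full_duplex(uint32_t status)
{
    return (status & 1u) != 0;
}

/* SPEED occupies bits 7-6; returns Mbps, or -1 for the reserved code. */
static int link_speed_mbps(uint32_t status)
{
    switch ((status >> 6) & 3u) {
    case 0:  return 10;
    case 1:  return 100;
    case 2:  return 1000;
    default: return -1;
    }
}
```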
Receive Control (0x0100)
bit 31: R (=0)
bits 30-27: FLXBUF = Flexible Buffer size
bit 26: SECRC = Strip Ethernet CRC
bit 25: BSEX = Buffer Size Extension
bit 24: R (=0)
bit 23: PMCF = Pass MAC Control Frames
bit 22: DPF = Discard Pause Frames
bit 21: R (=0)
bit 20: CFI = Canonical Form Indicator bit-value
bit 19: CFIEN = Canonical Form Indicator Enable
bit 18: VFE = VLAN Filter Enable
bits 17-16: BSIZE = Receive Buffer Size
bit 15: BAM = Broadcast Accept Mode
bit 14: R (=0)
bits 13-12: MO = Multicast Offset
bits 11-10: DTYP = Descriptor Type
bits 9-8: RDMTS = Rx-Descriptor Minimum Threshold Size
bits 7-6: LBM = Loopback Mode
bit 5: LPE = Long Packet reception Enable
bit 4: MPE = Multicast Promiscuous Enable
bit 3: UPE = Unicast Promiscuous Enable
bit 2: SBP = Store Bad Packets
bit 1: EN = Receive Enable
bit 0: R (=0)
We used 0x1440821C in RCTL to prepare the ‘receive engine’ prior to enabling it
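Decomposing 0x1440821C into named fields shows which receive options it selects. The macro names in this sketch are hypothetical; the two multi-bit field values (FLXBUF=2 and RDMTS=2) are simply what the constant contains, and their interpretation is left to the Intel manual.

```c
#include <stdint.h>

#define RCTL_EN      (1u << 1)    /* Receive Enable */
#define RCTL_SBP     (1u << 2)    /* Store Bad Packets */
#define RCTL_UPE     (1u << 3)    /* Unicast Promiscuous Enable */
#define RCTL_MPE     (1u << 4)    /* Multicast Promiscuous Enable */
#define RCTL_RDMTS_2 (2u << 8)    /* RDMTS field = 2 */
#define RCTL_BAM     (1u << 15)   /* Broadcast Accept Mode */
#define RCTL_DPF     (1u << 22)   /* Discard Pause Frames */
#define RCTL_SECRC   (1u << 26)   /* Strip Ethernet CRC */
#define RCTL_FLXBUF_2 (2u << 27)  /* FLXBUF field = 2 */

/* Receive options programmed before the engine is enabled. */
static uint32_t rctl_setup_value(void)
{
    return RCTL_FLXBUF_2 | RCTL_SECRC | RCTL_DPF | RCTL_BAM
         | RCTL_RDMTS_2 | RCTL_MPE | RCTL_UPE | RCTL_SBP;
}

/* The same options with the Receive Enable bit turned on. */
static uint32_t rctl_enabled_value(void)
{
    return rctl_setup_value() | RCTL_EN;
}
```

Note that the EN bit is absent from the setup value, which matches the slide’s remark that this value is written before the engine is enabled.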
Rx-Descriptor Control (0x2828)
bit 24: GRAN (Granularity)
bits 21-16: WTHRESH (Writeback Threshold)
bits 13-8: HTHRESH (Host Threshold)
bits 5-0: PTHRESH (Prefetch Threshold)
All other bits are shown as 0
“This register controls the fetching and write back of receive descriptors.
The three threshold values are used to determine when descriptors are
read from, and written to, host memory. Their values can be in units of
cache lines or of descriptors (each descriptor is 16 bytes), based on the
value of the GRAN bit (0=cache lines, 1=descriptors). When GRAN = 1,
all descriptors are written back (even if not requested).” --Intel manual
Recommended for 82573: 0x01010000 (GRAN=1, WTHRESH=1)
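The recommended value can be assembled from its four fields. This is a sketch with a hypothetical function name; it assumes each threshold is a 6-bit field at the positions given in the layout above.

```c
#include <stdint.h>

/* Pack the Rx-Descriptor Control fields into one 32-bit value.
 * Each threshold is masked to 6 bits; gran is treated as a single bit. */
static uint32_t rxdctl_value(unsigned gran, unsigned wthresh,
                             unsigned hthresh, unsigned pthresh)
{
    return ((uint32_t)(gran & 1u) << 24)
         | ((uint32_t)(wthresh & 0x3Fu) << 16)
         | ((uint32_t)(hthresh & 0x3Fu) << 8)
         |  (uint32_t)(pthresh & 0x3Fu);
}
```

With GRAN=1 and WTHRESH=1, and the other two thresholds left at zero, this yields 0x01010000.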
PCI Bus Master DMA
[Figure: the 82573L’s i/o-memory (on-chip RX descriptors, on-chip TX descriptors, and the RX and TX FIFOs, 32-KB total) exchanging data by DMA with the Descriptor Queue and the packet-buffers in the Host’s Dynamic Random Access Memory]
Pthresh and Hthresh
• When the number of unprocessed descriptors in
the NIC’s on-chip memory has fallen below the
Prefetch Threshold, and the number of valid
descriptors in host memory which are owned by
the NIC is at least equal to the Host Threshold,
then the NIC will fetch that number of descriptors
in a single ‘burst’ DMA-transfer
Wthresh
• When the number of descriptors waiting in
the NIC’s on-chip memory to be written
back to Host memory is at least equal to
the Writeback Threshold, then the NIC will
write back that number of descriptors in a
single ‘burst’ DMA-transfer
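The two threshold rules (prefetch and writeback) can be restated as predicates. This is only a model of the conditions as worded on these slides, with hypothetical function and parameter names; it is not a description of the controller’s actual internal logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Prefetch rule: on-chip unprocessed descriptors have fallen below
 * PTHRESH, and the host holds at least HTHRESH valid nic-owned ones. */
static bool nic_should_prefetch(uint32_t on_chip_unprocessed,
                                uint32_t host_valid,
                                uint32_t pthresh, uint32_t hthresh)
{
    return on_chip_unprocessed < pthresh && host_valid >= hthresh;
}

/* Writeback rule: descriptors waiting on-chip have reached WTHRESH. */
static bool nic_should_writeback(uint32_t on_chip_pending, uint32_t wthresh)
{
    return on_chip_pending >= wthresh;
}
```

With WTHRESH=1, as in the recommended RXDCTL value, a single pending descriptor is already enough to satisfy the writeback condition.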
Experiment #1
• Let’s install our ‘nicrx.c’ kernel module on
one host, and use the ‘cat’ command to
view its queue of Rx-Descriptors:
$ /sbin/insmod nicrx.ko
$ cat /proc/nicrx
• Then let’s install our ‘nictx.c’ module on a
different host on the same local network:
$ /sbin/insmod nictx.ko
• Now look again at the receive descriptors!
Experiment #2
• Install our ‘dram.c’ device-driver module
on both of these host-machines, and use
our ‘fileview’ utility to look at the contents
of each module’s packet-buffers – you’ll
find their physical addresses displayed if
you use ‘cat’ to see the descriptor-queues:
$ cat /proc/nictx
and $ cat /proc/nicrx
Experiment #3
• Our ‘nicrx.c’ module had enabled both the
Unicast and Multicast promiscuous modes
• So let’s watch what happens when we use
the ‘/sbin/ifconfig’ command (with ‘sudo’)
to bring up a secondary network interface
on another host on the same segment of
our local network
• Do you recognize these new packets?
Experiment #4
• With ‘nicrx.c’ module installed on one host,
log on to two other hosts on the same LAN
and bring up their ‘eth1’ network interfaces
• Use the ‘ping’ command on one of these
two hosts to try contacting the other one
• What do you observe about any packets
that are received by the host where our
‘nicrx.c’ module had been installed?
In-class exercise
• Suppose you turn off the UPE-bit (bit #3)
in the Receive Control register (in nicrx.c)
• From another host on the same segment,
bring up its ‘eth1’ interface, then adjust its
routing table so that all multicast packets
are sent out via the secondary interface:
$ sudo /sbin/route add -net 224.0.0.0 netmask 255.0.0.0 device eth1
• If you ‘ping’ a multicast address, will the
ICMP datagram be received by ‘nicrx.c’?