lesson07.ppt

What’s needed to receive?
A look at the minimum steps
required for programming our
anchor NICs to receive packets
A disappointment
• Our former ‘nicwatch.cpp’ application does
not seem to work reliably to show packets
being received by the 82573L controller
• It was based on the ‘raw sockets’ protocol
implemented within the Linux kernel’s vast
networking subsystem, thus offering us the
prospect of a ‘hardware-independent’ tool,
if only it would show us all the packets!
Two purposes…
• So let’s discard ‘nicwatch.cpp’ in favor of
writing our own hardware-specific module
that WILL be able to show us all the nic’s
received packets, independently of Linux’s
various layers of networking protocol code
• And let’s keep it as simple as possible, so
we can see which programming steps are
the truly essential ones for the 82573L nic
Accessing 82573L registers
• Device registers are hardware mapped to
a range of addresses in physical memory
• We can get the location and extent of this
memory-range from a BAR register in the
82573L device’s PCI Configuration Space
• We then request the Linux kernel to setup
an I/O ‘remapping’ of this memory-range to
‘virtual’ addresses within kernel-space
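As a rough illustration of the first step above, a 32-bit memory BAR keeps the base address in its upper bits and a few flag bits in its low four bits. The sketch below decodes a raw BAR value in plain user-space C. The helper names and the sample values in the usage note are hypothetical; inside a real module the kernel's PCI helpers (for example pci_resource_start() and pci_resource_len()) would normally supply the address and extent that are then passed to ioremap().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: decode a raw 32-bit PCI Base Address Register. */

/* Bit 0 is 0 for a memory-mapped BAR and 1 for an I/O-port BAR. */
static int bar_is_memory(uint32_t bar)
{
    return (bar & 0x1u) == 0;
}

/* For a memory BAR, bits 31..4 hold the base address; bits 3..0 are flags. */
static uint32_t bar_mem_base(uint32_t bar)
{
    assert(bar_is_memory(bar));
    return bar & ~(uint32_t)0xF;
}
```

For instance, a made-up BAR value of 0xFEBC0004 would decode to the base address 0xFEBC0000.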
i/o-memory remapping
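[Figure: the physical address-space (dynamic ram, vram, nic registers, Local-APIC and IO-APIC registers) shown beside the ‘virtual’ address-space, in which 3-GB is user space and the remaining 1-GB holds kernel code/data together with the remapped vram, nic registers and APIC registers]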
Kernel memory allocation
• The NIC requires some host memory
for packet-buffers and receive descriptors
• The kernel provides a ‘helper function’ for
reserving a suitable region of memory in
kernel-space which is both ‘non-pageable’
and ‘physically contiguous’ (i.e., kzalloc())
• It’s our job to decide how much memory
our network controller hardware will need
Ethernet packet layout
• Total size normally can vary from 64 bytes
up to 1522 bytes (unless ‘jumbo’ packets
and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’
and it appends a 4-byte CRC check-sum
offset 0: destination MAC address (6 bytes)
offset 6: source MAC address (6 bytes)
offset 12: Type/length (2 bytes)
offset 14: the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
after the payload: Cyclic Redundancy Checksum (4 bytes)
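The 14-byte header layout can be written down as a C structure and checked at compile time. This is only a sketch: the type name ETH_HEADER, its field names, and the helper frame_type_len() are hypothetical and do not come from the course's drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the 14-byte header at the start of every frame. */
typedef struct {
    uint8_t  dst_mac[6];   /* offset 0:  destination MAC address */
    uint8_t  src_mac[6];   /* offset 6:  source MAC address */
    uint16_t type_len;     /* offset 12: Type/length, big-endian on the wire */
} ETH_HEADER;

static_assert(sizeof(ETH_HEADER) == 14, "header must be 14 bytes");
static_assert(offsetof(ETH_HEADER, src_mac) == 6, "source MAC at offset 6");
static_assert(offsetof(ETH_HEADER, type_len) == 12, "type/length at offset 12");

/* Read the Type/length field from a raw frame, converting from network order. */
static uint16_t frame_type_len(const uint8_t *frame)
{
    return (uint16_t)((frame[12] << 8) | frame[13]);
}
```

Reading the field byte-by-byte avoids depending on the host's byte order, which matters because x86 is little-endian while the field travels big-endian.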
Rx-Descriptor Ring-Buffer
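[Figure: the Rx descriptor ring drawn as an array of 16-byte descriptors at offsets 0x00 through 0x80 from the base-address held in RDBA; RDLEN gives the ring’s length (in bytes), while RDH (head) and RDT (tail) separate the descriptors owned by hardware (nic) from those owned by software (cpu)]
Circular buffer (128 bytes minimum, and must be a multiple of 128 bytes)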
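The two pieces of arithmetic behind the ring are its length in bytes and the wrap-around of the head and tail indices. The sketch below shows both in user-space C, assuming a ring of sixteen 16-byte descriptors; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_RX_DESC  16   /* assumed ring size: sixteen descriptors */
#define DESC_SIZE  16   /* each legacy Rx descriptor occupies 16 bytes */

/* RDLEN is the ring's size in bytes and must be a multiple of 128. */
static uint32_t rdlen_bytes(uint32_t n_desc)
{
    uint32_t len = n_desc * DESC_SIZE;
    assert(len >= 128 && len % 128 == 0);
    return len;
}

/* Advance a head or tail index by one slot, wrapping at the end of the ring. */
static uint32_t ring_next(uint32_t index)
{
    return (index + 1) % N_RX_DESC;
}
```

Because eight descriptors fill exactly 128 bytes, the multiple-of-128 rule amounts to saying the descriptor count must be a multiple of eight.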
Our ‘nicspy.c’ module
• It will be a ‘character-mode’ device-driver
• It will only implement ‘read()’ and ‘ioctl()’
• The ‘read()’ function will cause a task to
sleep until a network packet has arrived
• An interrupt-handler will wake up the task
• A ‘get_info’ function will be provided as a
debugging aid, so the NIC’s Rx descriptor-queue
can be conveniently inspected
Sixteen packet-buffers
• Our ‘nicspy.c’ driver allocates 16 buffers of
size 1536 bytes (i.e., for normal ethernet)
[Figure: the 32-KB allocation (16 packet-buffers, plus Rx-Descriptor Queue), showing one region for the sixteen packet-buffers, one region for the Rx Descriptor Queue (256 bytes), and the remainder unused]
#define KMEM_SIZE 0x8000 // 32KB = size of kernel memory allocation
void *kmem = kzalloc( KMEM_SIZE, GFP_KERNEL );
if ( !kmem ) return -ENOMEM;
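The sizes quoted above can be checked with a little arithmetic. The sketch below assumes one possible layout, with the packet-buffers at the start of the allocation and the descriptor queue immediately after them; that ordering and all names other than KMEM_SIZE are assumptions made for illustration, not details confirmed by the slides.

```c
#include <assert.h>
#include <stdint.h>

#define KMEM_SIZE   0x8000   /* 32KB, as on the slide */
#define N_RX_DESC   16       /* sixteen packet-buffers and descriptors */
#define RX_BUFSIZ   1536     /* size of one packet-buffer */
#define DESC_SIZE   16       /* size of one Rx descriptor */

/* Assumed layout: packet-buffers first, descriptor queue right after them. */
#define BUFS_OFFSET 0
#define RING_OFFSET (N_RX_DESC * RX_BUFSIZ)

static_assert(RING_OFFSET + N_RX_DESC * DESC_SIZE <= KMEM_SIZE,
              "buffers plus queue must fit in the allocation");

/* Byte offset of packet-buffer i from the start of the allocation. */
static uint32_t buffer_offset(uint32_t i)
{
    assert(i < N_RX_DESC);
    return BUFS_OFFSET + i * RX_BUFSIZ;
}
```

Under this assumed layout the buffers occupy 0x6000 bytes, the queue 256 bytes, and 7936 bytes of the allocation stay unused.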
Format for an Rx Descriptor
16 bytes in all:
Base-address (64-bits): the device-driver initializes this ‘base-address’ field with the physical address of a packet-buffer
Packet-length, Packet-checksum, status, errors, VLAN tag: the network controller will ‘write-back’ the values for these fields when it has transferred a received packet’s data into this packet-buffer
Suggested C syntax
typedef struct
{
unsigned long long base_address;
unsigned short packet_length;
unsigned short packet_cksum;
unsigned char desc_status;
unsigned char desc_errors;
unsigned short VLAN_tag;
} RX_DESCRIPTOR;
‘Legacy Format’ for the Intel Pro1000 network controller’s Receive Descriptors
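Since the controller reads and writes these descriptors directly, the structure has to be exactly 16 bytes with no compiler padding. A compile-time check along the following lines can confirm that; the static_assert lines are an addition for illustration and are not part of the suggested syntax.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long long base_address;   /* bytes 0..7   */
    unsigned short     packet_length;  /* bytes 8..9   */
    unsigned short     packet_cksum;   /* bytes 10..11 */
    unsigned char      desc_status;    /* byte 12      */
    unsigned char      desc_errors;    /* byte 13      */
    unsigned short     VLAN_tag;       /* bytes 14..15 */
} RX_DESCRIPTOR;

/* Fail the build if the compiler lays the fields out differently. */
static_assert(sizeof(RX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");
static_assert(offsetof(RX_DESCRIPTOR, packet_length) == 8, "length at byte 8");
static_assert(offsetof(RX_DESCRIPTOR, desc_status) == 12, "status at byte 12");
static_assert(offsetof(RX_DESCRIPTOR, VLAN_tag) == 14, "VLAN tag at byte 14");
```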
RxDesc Status-field
bit:   7     6      5       4       3    2      1     0
field: PIF   IPCS   TCPCS   UDPCS   VP   IXSM   EOP   DD
DD = Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
IXSM = Ignore Checksum Indications (1=yes, 0=no)
VP = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated on packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated on packet (1=yes, 0=no)
IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF = Passed In-exact Filter (1=yes, 0=no) shows if software must check
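A driver typically tests the DD and EOP bits before touching a packet-buffer. The sketch below names the eight status bits and shows one such test; the RXD_STAT_ names and the helper rx_packet_ready() are hypothetical, chosen here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks for the RxDesc status byte; these names are hypothetical. */
enum {
    RXD_STAT_DD    = 1 << 0,   /* Descriptor Done */
    RXD_STAT_EOP   = 1 << 1,   /* End Of Packet */
    RXD_STAT_IXSM  = 1 << 2,   /* Ignore Checksum Indications */
    RXD_STAT_VP    = 1 << 3,   /* VLAN Packet match */
    RXD_STAT_UDPCS = 1 << 4,   /* UDP checksum calculated */
    RXD_STAT_TCPCS = 1 << 5,   /* TCP checksum calculated */
    RXD_STAT_IPCS  = 1 << 6,   /* IPv4 checksum calculated */
    RXD_STAT_PIF   = 1 << 7    /* Passed In-exact Filter */
};

static_assert(RXD_STAT_PIF == 0x80, "PIF is the top bit of the byte");

/* A complete single-buffer packet has both DD and EOP set. */
static int rx_packet_ready(uint8_t status)
{
    uint8_t need = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & need) == need;
}
```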
RxDesc Error-field
bit:   7     6     5      4            3            2     1    0
field: RXE   IPE   TCPE   reserved=0   reserved=0   SEQ   SE   CE
RXE = Received-data Error (1=yes, 0=no)
IPE = IPv4-checksum error (1=yes, 0=no)
TCPE = TCP/UDP checksum error (1=yes, 0=no)
SEQ = Sequence error (1=yes, 0=no)
SE = Symbol Error (1=yes, 0=no)
CE = CRC Error or alignment error (1=yes, 0=no)
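One way to use the error byte is to separate frame-level damage from checksum complaints about the payload. The sketch below does that under an assumed policy: CE, SE, SEQ and RXE are treated as a damaged frame, while IPE and TCPE are treated as advisory. Both the policy and the RXD_ERR_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks for the RxDesc error byte; these names are hypothetical. */
enum {
    RXD_ERR_CE   = 1 << 0,   /* CRC or alignment error */
    RXD_ERR_SE   = 1 << 1,   /* Symbol Error */
    RXD_ERR_SEQ  = 1 << 2,   /* Sequence error */
    RXD_ERR_TCPE = 1 << 5,   /* TCP/UDP checksum error */
    RXD_ERR_IPE  = 1 << 6,   /* IPv4-checksum error */
    RXD_ERR_RXE  = 1 << 7    /* Received-data Error */
};

static_assert(RXD_ERR_RXE == 0x80, "RXE is the top bit of the byte");

/* Assumed policy: only frame-level errors mark the frame as damaged. */
static int rx_frame_damaged(uint8_t errors)
{
    uint8_t frame_errs = RXD_ERR_CE | RXD_ERR_SE | RXD_ERR_SEQ | RXD_ERR_RXE;
    return (errors & frame_errs) != 0;
}
```

A packet-spying tool might still display a damaged frame and simply flag it, rather than discard it.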
Essential ‘receive’ registers
enum
{
E1000_CTRL = 0x0000, // Device Control
E1000_STATUS = 0x0008, // Device Status
E1000_ICR = 0x00C0, // Interrupt Cause Read
E1000_IMS = 0x00D0, // Interrupt Mask Set
E1000_IMC = 0x00D8, // Interrupt Mask Clear
E1000_RCTL = 0x0100, // Receive Control
E1000_RDBAL = 0x2800, // Rx Descriptor Base Address Low
E1000_RDBAH = 0x2804, // Rx Descriptor Base Address High
E1000_RDLEN = 0x2808, // Rx Descriptor Length
E1000_RDH = 0x2810, // Rx Descriptor Head
E1000_RDT = 0x2818, // Rx Descriptor Tail
E1000_RXDCTL = 0x2828, // Rx Descriptor Control
E1000_RA = 0x5400, // Receive address-filter Array
};
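These values are byte offsets into the remapped register window, and each register is 32 bits wide. The sketch below imitates that window with an ordinary array so the offset arithmetic can be tried in user space. The array fake_regs and the helpers reg_write() and reg_read() are hypothetical stand-ins; a real module would go through the pointer returned by the kernel's remapping call instead.

```c
#include <assert.h>
#include <stdint.h>

enum {
    E1000_RDLEN = 0x2808,   /* Rx Descriptor Length */
    E1000_RDH   = 0x2810,   /* Rx Descriptor Head */
    E1000_RDT   = 0x2818    /* Rx Descriptor Tail */
};

/* Hypothetical stand-in for the remapped register window. */
static uint32_t fake_regs[0x6000 / 4];

/* Store a 32-bit value at a byte offset within the window. */
static void reg_write(uint32_t offset, uint32_t value)
{
    assert(offset % 4 == 0 && offset < sizeof fake_regs);
    ((volatile uint32_t *)fake_regs)[offset / 4] = value;
}

/* Fetch the 32-bit value at a byte offset within the window. */
static uint32_t reg_read(uint32_t offset)
{
    assert(offset % 4 == 0 && offset < sizeof fake_regs);
    return ((volatile uint32_t *)fake_regs)[offset / 4];
}
```

The volatile qualifier matters for real device registers, since it stops the compiler from caching or dropping accesses that have hardware side effects.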
Receive Control (0x0100)
bit 31: R (=0)
bits 30-27: FLXBUF
bit 26: SECRC
bit 25: BSEX
bit 24: R (=0)
bit 23: PMCF
bit 22: DPF
bit 21: R (=0)
bit 20: CFI
bit 19: CFIEN
bit 18: VFE
bits 17-16: BSIZE
bit 15: BAM
bit 14: R (=0)
bits 13-12: MO
bits 11-10: DTYP
bits 9-8: RDMTS
bits 7-6: LBM
bit 5: LPE
bit 4: MPE
bit 3: UPE
bit 2: SBP
bit 1: EN
bit 0: R (=0)
R = reserved
EN = Receive Enable
SBP = Store Bad Packets
UPE = Unicast Promiscuous Enable
MPE = Multicast Promiscuous Enable
LPE = Long Packet reception Enable
LBM = Loopback Mode
RDMTS = Rx-Descriptor Minimum Threshold Size
DTYP = Descriptor Type
MO = Multicast Offset
BAM = Broadcast Accept Mode
BSIZE = Receive Buffer Size
VFE = VLAN Filter Enable
CFIEN = Canonical Form Indicator Enable
CFI = Canonical Form Indicator bit-value
DPF = Discard Pause Frames
PMCF = Pass MAC Control Frames
BSEX = Buffer Size Extension
SECRC = Strip Ethernet CRC
FLXBUF = Flexible Buffer size
We used 0x0000801C in RCTL to prepare the ‘receive engine’ prior to enabling it
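The value 0x0000801C can be rebuilt from named bits, which makes its meaning easier to read. The sketch below assumes the usual bit positions for this register (EN at bit 1, SBP at bit 2, UPE at bit 3, MPE at bit 4, BAM at bit 15); the RCTL_ macro names and the helper rctl_enabled() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the Receive Control register. */
#define RCTL_EN   (1u << 1)    /* Receive Enable */
#define RCTL_SBP  (1u << 2)    /* Store Bad Packets */
#define RCTL_UPE  (1u << 3)    /* Unicast Promiscuous Enable */
#define RCTL_MPE  (1u << 4)    /* Multicast Promiscuous Enable */
#define RCTL_BAM  (1u << 15)   /* Broadcast Accept Mode */

/* Accept everything, including bad packets, but leave EN clear for now. */
#define RCTL_PREPARE (RCTL_BAM | RCTL_MPE | RCTL_UPE | RCTL_SBP)

static_assert(RCTL_PREPARE == 0x0000801Cu, "matches the value on the slide");

/* Return the same settings with the receive engine switched on. */
static uint32_t rctl_enabled(uint32_t rctl)
{
    return rctl | RCTL_EN;
}
```

Read this way, the preparation value asks for promiscuous reception of unicast, multicast and broadcast frames and keeps damaged ones, which suits a packet-spying tool.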
Device Control (0x0000)
[Figure: bit-layout of the 32-bit Device Control register, containing the fields listed below; bits marked R are reserved]
FD = Full-Duplex
GIOMD = GIO Master Disable
SLU = Set Link Up
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
FRCSPD = Force Speed
FRCDPLX = Force Duplex
ADVD3WUC = Advertise Cold Wake Up Capability
D/UD = Dock/Undock status
RST = Device Reset
RFCE = Rx Flow-Control Enable
TFCE = Tx Flow-Control Enable
VME = VLAN Mode Enable
PHYRST = Phy Reset
We used 0x040C0241 to initiate a ‘device reset’ operation
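Part of the value 0x040C0241 can be picked apart with a few masks. The sketch below assumes these bit positions: FD at bit 0, SLU at bit 6, SPEED in bits 9-8, and RST at bit 26. The CTRL_ macro names and the helper ctrl_speed_field() are hypothetical, and the sketch makes no claim about the two remaining set bits (18 and 19).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the Device Control register. */
#define CTRL_FD           (1u << 0)    /* Full-Duplex */
#define CTRL_SLU          (1u << 6)    /* Set Link Up */
#define CTRL_SPEED_SHIFT  8            /* SPEED occupies bits 9-8 */
#define CTRL_SPEED_MASK   0x3u
#define CTRL_RST          (1u << 26)   /* Device Reset */

static_assert((0x040C0241u & CTRL_RST) != 0, "the slide's value sets RST");

/* Extract the 2-bit SPEED field: 0=10Mbps, 1=100Mbps, 2=1000Mbps. */
static unsigned ctrl_speed_field(uint32_t ctrl)
{
    return (ctrl >> CTRL_SPEED_SHIFT) & CTRL_SPEED_MASK;
}
```

Under those assumptions the value requests a reset while also selecting full-duplex, link up, and the 1000Mbps speed setting.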
Device Status (0x0008)
bit 31: ? (some undocumented functionality?)
bits 30-20: 0
bit 19: GIO Master EN
bits 18-11: 0
bit 10: PHYRA
bits 9-8: ASDV
bits 7-6: SPEED
bit 5: 0
bit 4: TXOFF
bits 3-2: Function ID
bit 1: LU
bit 0: FD
FD = Full-Duplex
LU = Link Up
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value
PHYRA = PHY Reset Asserted
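Before expecting any packets it is useful to confirm that the link is up and to see what speed it reports. The sketch below decodes those two items from a Device Status value, assuming LU at bit 1 and SPEED in bits 7-6; the STATUS_ macro names, the helper names, and the sample values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the Device Status register. */
#define STATUS_FD           (1u << 0)   /* Full-Duplex */
#define STATUS_LU           (1u << 1)   /* Link Up */
#define STATUS_SPEED_SHIFT  6           /* SPEED occupies bits 7-6 */

static_assert(STATUS_LU == 0x2u, "LU is bit 1");

/* Nonzero when the controller reports that the link is up. */
static int link_is_up(uint32_t status)
{
    return (status & STATUS_LU) != 0;
}

/* Translate the SPEED field to Mbps; 0 stands for the reserved encoding. */
static unsigned status_speed_mbps(uint32_t status)
{
    static const unsigned mbps[4] = { 10, 100, 1000, 0 };
    return mbps[(status >> STATUS_SPEED_SHIFT) & 0x3u];
}
```

With a made-up status value of 0x83, these helpers would report link up at 1000 Mbps.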
PCI Bus Master DMA
[Figure: the 82573L as a PCI bus master. The 82573L i/o-memory holds the on-chip RX descriptors, the on-chip TX descriptors, and the RX and TX FIFOs (32-KB total); DMA transfers connect these with the Rx Descriptor Queue and the packet-buffers in the Host’s Dynamic Random Access Memory]
Our ‘read()’ algorithm
unsigned int rx_curr;
ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
{
// our global variable ‘rx_curr’ is the descriptor-array index
// for the next receive-buffer descriptor to be processed
if ( this descriptor’s status is zero ) put calling task to sleep;
// wakeup the task when a fresh packet has been received
copy received data from the packet-buffer to user’s buffer
clear this descriptor’s status
advance our global variable ‘rx_curr’ to the next descriptor
return the number of data-bytes transferred
}
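The steps in this outline can be tried out in user space with a small simulation. In the sketch below the function returns 0 at the point where the driver would put the task to sleep, and memcpy() stands in for the copy to the user's buffer. The names sim_read, rxring and rxbuf are hypothetical, and a kernel version would use the kernel's own sleeping and user-copy primitives instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_RX_DESC 16
#define RX_BUFSIZ 1536

typedef struct {
    uint64_t base_address;
    uint16_t packet_length, packet_cksum;
    uint8_t  desc_status, desc_errors;
    uint16_t VLAN_tag;
} RX_DESCRIPTOR;

static RX_DESCRIPTOR rxring[N_RX_DESC];        /* simulated descriptor queue */
static uint8_t rxbuf[N_RX_DESC][RX_BUFSIZ];    /* simulated packet-buffers */
static unsigned int rx_curr;                   /* next descriptor to process */

/* Returns bytes copied, or 0 where the driver would sleep instead. */
static long sim_read(uint8_t *buf, size_t len)
{
    RX_DESCRIPTOR *d = &rxring[rx_curr];
    assert(buf != NULL);
    if (d->desc_status == 0)
        return 0;                              /* no packet has arrived yet */
    size_t count = d->packet_length;
    if (count > RX_BUFSIZ) count = RX_BUFSIZ;  /* stay inside the packet-buffer */
    if (count > len) count = len;              /* never overrun the caller */
    memcpy(buf, rxbuf[rx_curr], count);
    d->desc_status = 0;                        /* clear this descriptor's status */
    rx_curr = (rx_curr + 1) % N_RX_DESC;       /* advance with wrap-around */
    return (long)count;
}
```

Clamping the count to both the packet-buffer size and the caller's length keeps a short user buffer or a bogus length field from causing an overrun.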
‘nicspy.cpp’
• This application calls our device-driver’s
‘read()’ function repeatedly, and displays
the ‘raw’ ethernet packet-data each time
• It requires our ‘nicspy.c’ device-driver to be
installed in the kernel, obviously
• There’s no ‘clash’ of filenames here – and
their similarity helps keep them together:
nicspy.c and nicspy.ko (the kernel-side)
nicspy.cpp and nicspy (the user-side)
in-class demo
• We can install ‘nicspy.ko’ on one of our
anchor machines – making sure ‘eth1’ is
‘down’ before we do our module-install –
and then we run ‘nicspy’ on that machine
• Next we install our ‘nicping.ko’ module on
some other anchor machine – be sure its
‘eth1’ interface is ‘down’ beforehand – and
then use ‘cat /proc/nicping’ for a transmit