How to Minimize Transport Protocol Processing:
Implementation and Evaluation of
Network Level Framing
Pål Halvorsen, Thomas Plagemann, and Vera Goebel
Institute for Informatics, University of Oslo
Norway
4th International Workshop on
Multimedia Network Systems and Applications (MNSA ’02),
Vienna, Austria, July 2002
Overview
• Application scenario
• The INSTANCE project
• Network Level Framing (NLF)
  - design and implementation
  - performance evaluation
• Summary and conclusions
MNSA’02, Vienna, Austria, July 2002
© 2002 Pål Halvorsen
Application Scenario
Media-on-Demand server:
Used for services such as News- or Video-on-Demand offered by city-wide cable or pay-per-view companies
[Figure: multimedia storage server connected to a network]
Retrieval is the bottleneck. Some important factors:
• Memory management
• Communication protocol processing
• Error management
The INSTANCE Project
• We try to make optimal use of a given set of resources:
  - memory architecture
  - integrated error management
  - network level framing (NLF)
Project goals:
Optimize performance within a single server:
• Reduce resource requirements
• Maximize number of clients
Traditional Approach
[Figure: protocol stacks (TRANSPORT, NETWORK, LINK) on the upload and download paths]
• Upload to server, frequency: low (1)
• Download from server, frequency: very high
Network Level Framing (NLF): Basic Idea
[Figure: protocol stacks (TRANSPORT, NETWORK, LINK) on the upload and download paths with NLF]
• Upload to server, frequency: low (1)
• Download from server, frequency: very high
When to Store Packets
[Figure: options for when to store packets in the protocol stack. Transport Layer: TCP, UDP/FEC, or UDP; Network Layer: IP; Link Layer]
Splitting the UDP Protocol
Traditional udp_output():
• Temporarily connect
• Prepend UDP and IP headers
• Prepare pseudo header for checksum
• Calculate checksum
• Fill in other IP header fields
• Hand over datagram to IP
• Disconnect connected socket
NLF udp_PreOut():
• Prepend UDP and IP headers
• Prepare pseudo header for checksum, clear unknown fields
• Precalculate checksum
NLF udp_QuickOut():
• Update UDP and IP headers
• Update checksum, i.e., only add checksum of previously unknown fields
• Fill in some other IP header fields
• Hand over datagram to IP
Traditional Checksum Operations – I
• The UDP checksum covers three parts:
  - A 12-byte pseudo header containing fields from the IP header
  - The 8-byte UDP header
  - The UDP data (payload)
• Simplified checksum calculation function (in_cksum):
u_int16_t *w;
int checksum = 0;
for each mbuf in packet {
    w = mbuf->m_data;
    while data in mbuf {
        checksum += *w;   /* add the 16-bit word that w points to */
        w++;
    }
}
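The in_cksum pseudocode above can be made concrete roughly as follows. This is a minimal sketch and not the NetBSD code: `struct buf_seg` is a hypothetical stand-in for an mbuf, and `cksum_partial` and `cksum_finish` are illustrative names. It walks every byte of every segment, which is why the cost grows with the payload size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BSD mbuf: one segment of a packet. */
struct buf_seg {
    const uint8_t  *data;  /* bytes of this segment */
    size_t          len;   /* number of bytes in this segment */
    struct buf_seg *next;  /* next segment, or NULL at the end of the chain */
};

/* One's-complement sum over the whole chain, read as big-endian 16-bit words. */
uint32_t cksum_partial(const struct buf_seg *seg)
{
    uint32_t sum = 0;
    size_t pos = 0;  /* byte offset within the whole packet */

    for (; seg != NULL; seg = seg->next) {
        assert(seg->data != NULL || seg->len == 0);
        for (size_t i = 0; i < seg->len; i++, pos++) {
            uint32_t v = seg->data[i];
            /* Even offsets hold the high byte of a word, odd offsets the low byte,
             * so a word may straddle two segments without special handling. */
            sum += (pos & 1) ? v : (v << 8);
            sum = (sum & 0xffff) + (sum >> 16);  /* fold the carry back in */
        }
    }
    return sum;
}

/* Final fold and complement, giving the value stored in the checksum field. */
uint16_t cksum_finish(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A production routine would add whole 16-bit or 32-bit words per step; the byte loop here only keeps the odd-boundary case easy to follow.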
Traditional Checksum Operations – II
• Traditional checksum operation:
u_int16_t *w;
int checksum = 0;
for each mbuf in packet {
    w = mbuf->m_data;
    while data in mbuf {
        checksum += *w;   /* add the 16-bit word that w points to */
        w++;
    }
}
Modified Checksum Operations
• NLF checksum operation:
[Figure: partial checksums are added (+) to give (=) the packet checksum]
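The NLF checksum operation relies on the one's-complement sum being additive: the sum over header plus payload equals the sum over the header added to a stored sum over the payload. The sketch below illustrates that property under the assumption that the header has an even length (true for the 12-byte pseudo header and the 8-byte UDP header). The names `oc_sum`, `oc_add`, and `nlf_combine` are hypothetical and not taken from the NLF implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
uint32_t oc_add(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);
    return s;
}

/* One's-complement sum of a buffer, read as big-endian 16-bit words. */
uint32_t oc_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len <= 65535);  /* keeps the 32-bit accumulator from overflowing */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)           /* odd trailing byte is padded with a zero byte */
        sum += (uint32_t)p[i] << 8;
    return oc_add(sum, 0);
}

/* Transmission-time step: only the header sum is computed per packet;
 * the payload sum was calculated earlier and stored with the data. */
uint16_t nlf_combine(uint32_t header_sum, uint32_t stored_payload_sum)
{
    return (uint16_t)~oc_add(header_sum, stored_payload_sum);
}
```

With this split, the per-packet work covers only the header bytes, independent of how large the payload is.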
Implementation – I
• Straightforward implementation:
[Figure: data stored together with its precalculated header (meta-data)]
• To allow flexibility, we have one data file and one meta-data file:
[Figure: data file, meta-data file, UDP]
Implementation – II
• NLF version 1:
  - most of the UDP/IP processing time is spent on checksum calculation
  - precalculate checksum over data payload
  - during transmission time:
    - generate header
    - calculate checksum over header and add precalculated payload checksum
• NLF version 2:
  - several reports show increased performance using header templates
  - precalculate checksum over data payload
  - during stream open:
    - generate header template
    - calculate header checksum
  - during transmission time:
    - block copy header template
    - add header template checksum, payload checksum, and packet length field
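The version 2 steps can be sketched as below, restricted to the UDP header for brevity. This is an illustration under stated assumptions, not the NLF code: `struct hdr_template`, `tmpl_open`, and `tmpl_send` are hypothetical names, addresses are IPv4, and the stored payload sum is assumed to be already folded to 16 bits. The UDP length is added twice because it appears both in the pseudo header and in the UDP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8
#define UDP_PROTO   17  /* protocol number carried in the pseudo header */

/* Hypothetical header template built once per stream. */
struct hdr_template {
    uint8_t  udp[UDP_HDR_LEN];  /* ports set; length and checksum left zero */
    uint32_t sum;               /* sum of pseudo header and udp[], lengths zero */
};

uint32_t fold16(uint32_t s)
{
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);
    return s;
}

/* One's-complement sum of an even-length buffer of big-endian words. */
uint32_t sum_words(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    assert(len % 2 == 0 && len <= 65534);
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    return fold16(sum);
}

/* Stream open: generate the header template and its checksum. */
void tmpl_open(struct hdr_template *t, const uint8_t src_ip[4],
               const uint8_t dst_ip[4], uint16_t sport, uint16_t dport)
{
    memset(t->udp, 0, sizeof t->udp);
    t->udp[0] = (uint8_t)(sport >> 8);
    t->udp[1] = (uint8_t)(sport & 0xff);
    t->udp[2] = (uint8_t)(dport >> 8);
    t->udp[3] = (uint8_t)(dport & 0xff);
    t->sum = fold16(sum_words(src_ip, 4) + sum_words(dst_ip, 4) +
                    UDP_PROTO + sum_words(t->udp, UDP_HDR_LEN));
}

/* Transmission time: block copy the template, then add the template sum,
 * the stored payload sum, and the packet length. */
void tmpl_send(const struct hdr_template *t, uint32_t payload_sum,
               uint16_t payload_len, uint8_t out[UDP_HDR_LEN])
{
    assert(payload_sum <= 0xffff);
    assert(payload_len <= 65535 - UDP_HDR_LEN);
    uint16_t ulen = (uint16_t)(UDP_HDR_LEN + payload_len);

    memcpy(out, t->udp, UDP_HDR_LEN);
    out[4] = (uint8_t)(ulen >> 8);
    out[5] = (uint8_t)(ulen & 0xff);

    uint16_t ck = (uint16_t)~fold16(t->sum + payload_sum + 2u * ulen);
    if (ck == 0)
        ck = 0xffff;  /* UDP transmits a computed zero as all ones */
    out[6] = (uint8_t)(ck >> 8);
    out[7] = (uint8_t)(ck & 0xff);
}
```

Version 3 on the later slide would keep the `t->sum` shortcut but write the header fields directly instead of using `memcpy`.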
Performance: Test Setup
• Implemented in NetBSD 1.5.2
• Dell Precision Workstation 620
  - PIII 933 MHz CPU
  - 3Com 1 Gbps NIC
• Software probe
  - RDTSC instruction
  - CPUID instruction
  - probe overhead 206 cycles
• Performed tests using 1 KB, 2 KB, 4 KB, and 8 KB UDP packets
• Transmitting 225 MB of data
• Data is transmitted using the zero-copy data path
Performance: Checksum
[Figure: bar chart of checksum overhead in CPU cycles (axis 0 to 7000) by packet size (1 KB, 2 KB, 4 KB, 8 KB) for Traditional, UDP data, and UDP data + header; off-scale bars are labeled 11899 and 23674]
Annotations:
• Overhead increases linearly with payload size
• Overhead is constant regardless of payload
• ~50 cycles less
Performance: Header Overhead
[Figure: bar chart of header overhead in CPU cycles (axis 0 to 1000) by packet size (1 KB, 2 KB, 4 KB, 8 KB) for NLF, v1 and NLF, v2; annotation: ~25 cycles more]
• NLF version 3: use header template checksum, but generate header instead of block copy
Performance: UDP
[Figure: bar chart of UDP processing in CPU cycles (axis 0 to 7000) by packet size (1 KB, 2 KB, 4 KB, 8 KB) for Traditional, NLF, v1, NLF, v2, and NLF, v3; off-scale bars are labeled 12304 and 24108]
Conclusions and Future Work
• Network Level Framing reduces communication system processing by precalculating:
  - payload checksum (off-line)
  - header checksum (stream open)
• Gain per packet depends on packet payload size, e.g., 1 KB (8 KB) → 97.3 % (99.6 %)
• Our mechanisms (at least) double the number of concurrent clients
• Ongoing and future work:
  - NLF in lower protocols (ongoing)
  - On-board processing
Questions??
Related Work
• Checksum caching in memory
  - high data rates → cached elements will be removed before they can be reused
• Header templates
  - block-copying is time consuming
• On-board processing
  - useful and becoming “off-the-shelf” hardware
  - may be nice to combine with NLF