Increasing Web Server Throughput with Network Interface Data Caching
Hyong-youb Kim, Vijay S. Pai, and Scott Rixner
Rice Computer Architecture Group
http://www.cs.rice.edu/CS/Architecture/
October 9, 2002
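Anatomy of a Web Request
• Static content web server
[Diagram: CPU, Main Memory, and Network Interface joined by the Interconnect; the Network Interface attaches to the Network. Request, File, and Headers are labeled on both the Interconnect and the Network. Callout: 95 % utilization.]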
Problem
• Inefficient use of local interconnect
  – Repeated transfers
  – Transfer every bit of data sent out to network
• Local interconnect bottleneck
• Transfer overhead exacerbates inefficiency
  – Overhead reduces available bandwidth
  – E.g. Peripheral Component Interconnect (PCI)
    • 30 % transfer overhead
Solution
• Network interface data caching
  – Cache data in network interface
  – Reduces interconnect traffic
  – Software-controlled cache
  – Minimal changes to the operating system
• Prototype web server
  – Up to 57% reduction in PCI traffic
  – Up to 31% increase in server performance
  – Peak 1571 Mb/s of content throughput
    • Breaks PCI bottleneck
Outline
• Background
• Network Interface Data Caching
• Implementation
• Experimental Prototype / Results
• Summary
Network Interface Data Cache
• Software-controlled cache in network interface
[Diagram: CPU, Main Memory, Interconnect, and a Network Interface containing a Cache, attached to the Network. Labels: Request, Headers, File, and an "X" on the Interconnect.]
Web Traces
• Five web traces
  – Realistic working set / file distribution
• Berkeley computer science department
• IBM
• NASA Kennedy Space Center
• Rice computer science department
• 1998 World Cup
Content Locality
• Block cache with 4KB block size
• 8-16MB caches capture locality
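The kind of measurement behind this slide can be approximated by replaying a request trace through a simulated block cache. What follows is a minimal sketch, assuming LRU replacement and whole-file requests; neither is stated on the slide, and `block_hit_ratio` and its parameters are hypothetical names chosen for illustration.

```python
from collections import OrderedDict

BLOCK_SIZE = 4 * 1024  # 4KB blocks, as on the slide


def block_hit_ratio(requests, cache_bytes, block_size=BLOCK_SIZE):
    """Return the fraction of requested blocks found in an LRU block cache.

    requests: iterable of (file_id, file_size_in_bytes), one per whole-file request.
    LRU replacement is an assumption of this sketch.
    """
    capacity = cache_bytes // block_size   # number of blocks the cache can hold
    cache = OrderedDict()                  # (file_id, block_index) -> None, oldest first
    hits = total = 0
    for file_id, size in requests:
        n_blocks = max(1, -(-size // block_size))  # ceiling division, at least one block
        for index in range(n_blocks):
            key = (file_id, index)
            total += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)             # mark as most recently used
            elif capacity > 0:
                cache[key] = None
                if len(cache) > capacity:
                    cache.popitem(last=False)      # evict the least recently used block
    return hits / total if total else 0.0
```

Running the same trace with `cache_bytes` set to 8MB and then 16MB would show how much of the block reuse a cache of that size captures for that trace.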
Outline
• Background
• Network Interface Data Caching
• Implementation
  – OS modification / NIC API
• Experimental Prototype / Results
• Summary
Unmodified Operating System
• Transmit data flow
[Diagram: the pages of a File pass through the Network Stack, which produces packets that reference those pages, and then through the Device Driver.]
1. Identify pages
2. Protocol processing: break into packets
3. Inform network interface
Modified Operating System
• OS completely controls network interface data cache
• Minimal changes to the OS
[Diagram: the pages of a File pass through the Network Stack, which produces packets, and then through the Device Driver, which holds the Cache Directory.]
1. Identify pages (Unmodified)
2. Annotate (New step)
3. Protocol processing: break into packets (Unmodified)
4. Query directory (New step)
5. Inform network interface (Unmodified)
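The modified transmit path can be sketched as a small pipeline. This is an illustrative model only, not the FreeBSD code: the page size, the per-packet payload size, the command names, and every function name below are assumptions, protocol headers are left out, and the directory is a plain set with no replacement policy.

```python
from typing import List, NamedTuple, Set, Tuple

PAGE_SIZE = 4096   # assumed page size
PAYLOAD = 1460     # assumed payload bytes per packet (hypothetical)

Tag = Tuple[int, int, int]  # (file identifier, offset within file, revision)


class Page(NamedTuple):
    data: bytes
    tag: Tag


def identify_and_annotate(file_id: int, revision: int, content: bytes) -> List[Page]:
    """Steps 1-2: split file content into pages and tag each with its origin."""
    return [Page(content[off:off + PAGE_SIZE], (file_id, off, revision))
            for off in range(0, len(content), PAGE_SIZE)]


def break_into_packets(pages: List[Page]) -> List[Tuple[Tag, int, int]]:
    """Step 3: each packet references (tag, start, length) within one page."""
    packets = []
    for page in pages:
        for start in range(0, len(page.data), PAYLOAD):
            packets.append((page.tag, start, min(PAYLOAD, len(page.data) - start)))
    return packets


def driver_transmit(packets, directory: Set[Tag]) -> List[tuple]:
    """Steps 4-5: query the directory, then tell the network interface what to do."""
    commands = []
    for tag, start, length in packets:
        if tag not in directory:
            directory.add(tag)                 # first use: data must cross the interconnect
            commands.append(("insert", tag))
        commands.append(("append_cached", tag, start, length))
    return commands
```

Sending the same file twice through `driver_transmit` with a shared `directory` produces `insert` commands only on the first pass, which is the traffic saving the slide describes.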
Operating System Modification
• Device Driver
  – Completely controls cache
  – Makes allocation/use/replacement decisions
• Cache directory (in device driver)
  – An entry is a tuple of
    • file identifier
    • offset within file
    • file revision number
    • flags
  – Sufficient to maintain cache coherence
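A directory built from the four-field tuple above might look like the following. It is a minimal sketch under stated assumptions: the class and method names are hypothetical, the single `FLAG_VALID` bit is invented because the slides do not enumerate the flags, and where the cached data lives on the network interface is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FLAG_VALID = 0x1  # hypothetical flag bit; the slides do not list the actual flags


@dataclass
class DirectoryEntry:
    file_id: int    # file identifier
    offset: int     # offset within file
    revision: int   # file revision number
    flags: int      # flags


class CacheDirectory:
    """Driver-side record of which file blocks the network interface holds."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], DirectoryEntry] = {}

    def insert(self, file_id: int, offset: int, revision: int) -> None:
        self._entries[(file_id, offset)] = DirectoryEntry(
            file_id, offset, revision, FLAG_VALID)

    def lookup(self, file_id: int, offset: int, revision: int) -> Optional[DirectoryEntry]:
        entry = self._entries.get((file_id, offset))
        if entry is None or not entry.flags & FLAG_VALID:
            return None
        if entry.revision != revision:
            # The file changed after the block was cached, so the copy is stale.
            del self._entries[(file_id, offset)]
            return None
        return entry
```

Comparing revision numbers on every lookup is one way the tuple can be "sufficient to maintain cache coherence": a modified file gets a new revision, so stale blocks never match.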
Network Interface API
• Initialize
• Insert data into the cache
• Append data to a packet
• Append cached data to a packet
[Diagram: Main Memory, Interconnect, and a Network Interface containing a Cache and a TX Buffer; arrows labeled "Append" and "Append cached data".]
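The four operations can be modeled in a few lines to show where interconnect traffic is saved. This is a toy model, not the Tigon 2 firmware interface: the class, the method signatures, and the slot-based cache layout are all assumptions made for illustration.

```python
class NicModel:
    """Toy network interface with a data cache and a transmit (TX) buffer."""

    def __init__(self) -> None:
        self.initialize(0)

    def initialize(self, num_slots: int) -> None:
        """Initialize: reset the cache, the TX buffer, and the traffic counter."""
        self.cache = [b""] * num_slots
        self.tx_buffer = bytearray()
        self.interconnect_bytes = 0   # bytes moved from main memory to the NIC

    def insert(self, slot: int, data: bytes) -> None:
        """Insert data into the cache (crosses the interconnect once)."""
        self.cache[slot] = bytes(data)
        self.interconnect_bytes += len(data)

    def append(self, data: bytes) -> None:
        """Append data from main memory to the packet being built."""
        self.tx_buffer += data
        self.interconnect_bytes += len(data)

    def append_cached(self, slot: int, start: int = 0, length: int = None) -> None:
        """Append cached data to the packet; no interconnect traffic."""
        block = self.cache[slot]
        end = len(block) if length is None else start + length
        self.tx_buffer += block[start:end]
```

In this model, a first response costs the file bytes plus the headers on the interconnect, while a repeat response for the same cached block costs only the headers.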
Outline
• Background
• Network Interface Data Caching
• Implementation
• Experimental Prototype / Results
• Summary
Prototype Server
• Athlon 2200+ processor, 2GB RAM
• 64-bit, 33 MHz PCI bus (2 Gb/s)
• Two Gigabit Ethernet NICs (4 Gb/s)
  – Based on programmable Tigon 2 controller
  – Firmware implements new API
• FreeBSD 4.6
  – 850 lines of new code/150 lines of kernel changes
• thttpd web server
  – High performance lightweight web server
  – Supports zero-copy sendfile
Results: PCI Traffic
[Chart annotations: PCI saturated; 30 % overhead; ~60 % content traffic; 60 % utilization; 1198 Mb/s of HTTP content; ~1260 Mb/s is limit!]
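The figures on this slide can be related to the prototype's 64-bit, 33 MHz PCI bus with a little arithmetic. This is one possible reading rather than the authors' stated derivation: it assumes the 1198 Mb/s run is the one reported at 95 % PCI utilization (the NASA trace without caching) and that content throughput scales linearly with utilization.

```python
BUS_WIDTH_BITS = 64
BUS_CLOCK_MHZ = 33     # nominal clock of the prototype's PCI bus
CONTENT_MBPS = 1198    # HTTP content throughput without caching
UTILIZATION = 0.95     # PCI utilization reported for that run

raw_mbps = BUS_WIDTH_BITS * BUS_CLOCK_MHZ      # 2112 Mb/s, the "2 Gb/s" bus
content_share = CONTENT_MBPS / raw_mbps        # about 0.57 of raw bandwidth ("~60 %")
content_limit = CONTENT_MBPS / UTILIZATION     # about 1261 Mb/s at 100 % utilization

print(f"raw bus bandwidth:   {raw_mbps} Mb/s")
print(f"content share:       {content_share:.0%}")
print(f"content upper bound: {content_limit:.0f} Mb/s")
```

Under these assumptions the extrapolated bound lands near the slide's "~1260 Mb/s" limit, with the rest of the bus time going to transfer overhead and non-content traffic.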
Results: PCI Traffic Reduction
• 36-57 % reduction with four traces
[Chart annotations: Good temporal reuse; Low temporal reuse; CPU bottleneck; Low PCI utilization]
Results: World Cup
Temporal reuse (84 %)
PCI utilization (69 %)
57 % traffic reduction
7% throughput increase
794 Mb/s w/o caching
849 Mb/s w/ caching
CPU bottleneck
Results: Rice
Temporal reuse (40 %)
PCI utilization (91 %)
40 % traffic reduction
17% throughput increase
1126 Mb/s w/o caching
1322 Mb/s w/ caching
Breaks PCI bottleneck
Results: NASA
Temporal reuse (71 %)
PCI utilization (95 %)
54 % traffic reduction
31% throughput increase
1198 Mb/s w/o caching
1571 Mb/s w/ caching
Breaks PCI bottleneck
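The throughput increases quoted on the three results slides follow directly from the before and after numbers. A short check, using only figures from the slides (the helper name `percent_increase` is just for illustration):

```python
RESULTS = {  # trace: (Mb/s without caching, Mb/s with caching)
    "World Cup": (794, 849),
    "Rice": (1126, 1322),
    "NASA": (1198, 1571),
}


def percent_increase(before: float, after: float) -> float:
    """Relative throughput gain, in percent."""
    return 100.0 * (after - before) / before


for trace, (before, after) in RESULTS.items():
    print(f"{trace}: {percent_increase(before, after):.1f} % increase")
# World Cup: 6.9 % increase
# Rice: 17.4 % increase
# NASA: 31.1 % increase
```

Rounded to whole percentages these match the 7 %, 17 %, and 31 % on the slides.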
Summary
• Network interface data caching
  – Exploits web request locality
  – Network protocol independent
  – Interconnect architecture independent
  – Minimal changes to OS
• 36-57% reductions in PCI traffic
• 7-31% increase in server performance
• Peak 1571 Mb/s of content throughput
  – Surpasses PCI bottleneck