ND – The research group on Networks & Distributed systems
ND activities
• ICON – Interconnection Networks
  – Interconnection networks are tightly coupled, short-distance networks
    with extreme demands on bandwidth, latency, and delivery
  – Problem areas: effective routing/topologies, fault tolerance/dynamic
    reconfiguration, and Quality of Service
• VINNER – End-to-end Internet communications
  – Problem area: network resilience – a set of methods and techniques
    that improve the user's perception of network robustness and reliability
ND activities
• QuA – Support for Quality of Service in component architectures
  – Problem area: how to develop QoS-sensitive applications on a
    component-architecture platform, and how dynamic QoS management and
    adaptation can be supported
• Relay – Resource utilization in time-dependent distributed systems
  – Problem area: reduce the effects of resource limitations and
    geographical distance in interactive distributed applications –
    through a toolkit of kernel extensions, programmable subsystems,
    protocols, and decision methods
Assessment of Data Path
Implementations for Download
and Streaming
Pål Halvorsen¹,², Tom Anders Dalseng¹ and Carsten Griwodz¹,²
¹Department of Informatics, University of Oslo, Norway
²Simula Research Laboratory, Norway
Overview
• Motivation
• Existing mechanisms in Linux
• Possible enhancements
• Summary and Conclusions
Delivery Systems
[Figure: delivery system overview – hosts connected through a network,
with internal bus(es) in each host]
Delivery Systems
[Figure: server data path – the application in user space; the file
system and the communication system in kernel space, connected over
the bus(es)]
 several in-memory data movements and context switches
[Figure: Intel Hub Architecture – a Pentium 4 processor (registers,
caches) connects through the memory controller hub to RDRAM, and
through the I/O controller hub to PCI slots holding the disk and the
network card. On the way from disk to network card, the data crosses
RDRAM once each for the file system, the communication system, and the
application.]
Motivation
• Data copy operations are expensive
  – consume CPU, memory, hub, bus and interface resources
    (proportional to data size)
  – profiling shows that ~40% of CPU time is consumed by copying data
    between user and kernel space
  – the gap between memory and CPU speeds increases
  – different memory banks have different access times
• System calls cause many switches between user and kernel space
Zero-Copy Data Paths
[Figure: the application in user space holds only a data_pointer; the
file system and the communication system exchange the data in kernel
space over the bus(es), without user-space copies]
Motivation
• Data copy operations are expensive
  – consume CPU, memory, hub, bus and interface resources
    (proportional to data size)
  – profiling shows that ~40% of CPU time is consumed by copying data
    between user and kernel space
  – the gap between memory and CPU speeds increases
  – different memory banks have different access times
• System calls cause many switches between user and kernel space
• A lot of research has been performed in this area
• BUT, what is the status of today's commodity operating systems?
Existing Linux Data Paths

Content Download
[Figure: data flows from the file system through kernel space and over
the bus(es) to the communication system; the application in user space
drives the transfer]
Content Download: read / send
[Figure: data is DMA-transferred from disk into the page cache, copied
by read() into the application buffer, copied by send() into the socket
buffer, and DMA-transferred to the network card]
 2n copy operations
 2n system calls
Content Download: mmap / send
[Figure: mmap() maps the page cache into the application; send() copies
the data into the socket buffer, which is DMA-transferred to the
network card]
 n copy operations
 1 + n system calls
Content Download: sendfile
[Figure: sendfile() appends a descriptor to the socket buffer; a gather
DMA transfer moves the data directly from the page cache to the network
card]
 0 copy operations
 1 system call
Content Download: Results
• Tested transfer of a 1 GB file on Linux 2.6
• Both UDP (with enhancements) and TCP
[Figure: download results for UDP and TCP]
Streaming
[Figure: as for download – the application in user space; the file
system and the communication system in kernel space, connected over
the bus(es)]
Streaming: read / send
[Figure: read() copies data from the page cache into the application
buffer; send() copies it into the socket buffer; DMA transfers on the
disk and network sides]
 2n copy operations
 2n system calls
Streaming: read / writev
[Figure: read() copies data from the page cache into the application
buffer; writev() copies both the packet header and the payload into
the socket buffer; DMA transfers on the disk and network sides]
 3n copy operations  one copy more than the previous solution
 2n system calls
Streaming: mmap / send
[Figure: mmap() maps the page cache; per packet, the application corks
the socket, send()s the header from the application buffer, send()s
the payload from the mapped file, and uncorks – each send copies into
the socket buffer]
 2n copy operations
 1 + 4n system calls
Streaming: mmap / writev
[Figure: mmap() maps the page cache; writev() copies header and payload
into the socket buffer in one call]
 2n copy operations
 1 + n system calls  three calls fewer than the previous solution
Streaming: sendfile
[Figure: per packet, the application corks the socket, send()s the
header (copied from the application buffer into the socket buffer),
calls sendfile() (which appends a descriptor for a gather DMA transfer
from the page cache), and uncorks]
 n copy operations
 4n system calls
Streaming: Results
• Tested streaming of a 1 GB file on Linux 2.6
• RTP over UDP
• Compared to not sending an RTP header over UDP, we get an increase
  of 29% (additional send call)
• More copy operations and system calls required
   potential for improvements
[Figure: streaming results, with TCP sendfile (content download) shown
for comparison]
Enhanced Streaming
Data Paths
Enhanced Streaming: mmap / msend
• msend allows sending data from an mmap'ed file without a copy: it
  appends a descriptor to the socket buffer for a gather DMA transfer
[Figure: per packet – cork, send() the header (copy), msend() the
payload (append descriptor), uncork]
 n copy operations  one copy less than the previous solution
 1 + 4n system calls
Enhanced Streaming: mmap / rtpmsend
• RTP header copy integrated into the msend system call
[Figure: per packet, rtpmsend() copies the header into the socket
buffer and appends a descriptor for the payload; no cork/uncork needed]
 n copy operations
 1 + n system calls  three calls fewer than the previous solution
Enhanced Streaming: mmap / krtpmsend
• An RTP engine in the kernel adds the RTP headers
[Figure: krtpmsend() appends a descriptor; the kernel RTP engine
supplies the header; gather DMA transfer from the page cache]
 0 copy operations  one copy less than the previous solution
 1 system call  one call less than the previous solution
Enhanced Streaming: rtpsendfile
• RTP header copy integrated into the sendfile system call
[Figure: per packet, rtpsendfile() copies the header into the socket
buffer and appends a descriptor for the payload; gather DMA transfer
from the page cache]
 n copy operations
 n system calls  the existing solution requires three more calls per packet
Enhanced Streaming: krtpsendfile
• An RTP engine in the kernel adds the RTP headers
[Figure: krtpsendfile() appends a descriptor; the kernel RTP engine
supplies the header; gather DMA transfer from the page cache]
 0 copy operations  one copy less than the previous solution
 1 system call  one call less than the previous solution
Enhanced Streaming: Results
• Tested streaming of a 1 GB file on Linux 2.6
• RTP over UDP
[Figure: results for the mmap-based and sendfile-based mechanisms]
Conclusions
• Current commodity operating systems still pay a high price for
  streaming services
• However, small changes in the system call layer might be sufficient
  to remove most of the overhead
• In short, commodity operating systems still have potential for
  improvement with respect to streaming support
• What can we hope to see supported?
• Road ahead: optimize the code, make a patch and submit it to
  kernel.org
Questions?