
Understanding & Addressing
Blocking-Induced Server Latency
Yaoping Ruan
IBM T.J. Watson Research Center
Vivek Pai
Princeton University
Background – Web servers
 Previous work focuses on throughput
 SPECweb99 mixes throughput & latency
 Network delay dominated end-user latency
 Server latency contribution increasing
 Connection speeds increasing
 Multiple data centers reduce round-trip time
Paper Contributions
 Understand server-induced latency
 Observe in both Flash & Apache
 Identify blocking in filesystem-related queues
 Quantify effects – service inversion
 Address problem
 In both servers
 Using portable techniques
 With scalable results
 5-50x latency reduction
Outline
 Experimental setup & measurement methodology
 Identify blocking in Web servers
 New server design
 Results of the new servers
Experimental Setup
 Server-client setup
 3 GHz P4 w/ 1 GB memory
 FreeBSD 4.6 operating system
 Web servers
 Flash & Apache 1.3
 Fairly tuned for performance
 Workloads
 SPECweb99 static
 3 GB dataset and 1024 simultaneous connections
Latency Analysis Methodology
 Response time vs. load
 Infinite-demand
 20, 40, 60, 80, 90, 95% of the infinite-demand request rate
 Record mean & 5th, 50th, 95th percentiles of latency CDF
Flash Latency Profile – 336 Mb/s
Evidence for Blocking in Flash
 Event-driven model
 select() or kevent()
 Each call returns ready events
 About 60-70 events/call
 But we have free CPU – should return more often, with fewer ready events
CDF of # of ready events for Flash
Evidence for Blocking in Apache
 Multiple processes
 Blocking expected
 Hard to identify excessive blocking
 Two configurations
 Sample % of ready processes per sec.
 Bimodal (0 or 60+)
 Worse with 1024 connections
Identifying Blocking using DeBox
 Exclusive vnode locks
 To reduce complexity and avoid possible deadlocks
 Directory walk locks
 Lock overlapping between the parent and child directory
 Locks during disk access
 Only downgrade the parent’s lock when the child needs disk access
 Result: lock convoys
Growth of Median Latencies
 But medians shouldn’t grow this fast
 Working set < 200MB
Median response time (ms) vs. load level (20%–100%) for Apache and Flash
Response Time vs Dataset Size
>99.5% cache hits
Service Inversion
 CDF breakdowns
 Split CDF by decile
 Group responses by size
 Service inversion = difference between actual order and ideal order
Ideal Service Breakdown
Ideal CDF breakdown by decile (1–10): small files in early deciles, large files in late deciles
Flash Service Breakdown
Flash CDF breakdown by decile (small files through large files) at load level 0.95
Service inversion at this level is 0.58
Apache Service Breakdown
Apache CDF breakdown by decile (small files through large files) at load level 0.95
Service inversion at this level is 0.58
Solution
 Let blocking happen, elsewhere
 Move filesystem calls out of process
 Shared backend
 Cache open file descriptors
 Perform misses via helper processes
 Prefetch cold disk blocks
 IPC better than blocking
Flashpache
Architecture diagrams: Apache vs. Flashpache
Flash Ready Events
Mean events: Flash – 61, fdpass – 15, New-Flash – 1.6
Flashpache Ready Processes
New-Flash Latency Profile
Flash Latency Profile – 336 Mb/s
New-Flash Latency Profile – 450 Mb/s
Latency improvement: 6X in mean, 43X in median
Median & 95th percentile virtually flat
Flashpache Performance
Apache Latency Profile – 241 Mb/s
Flashpache Latency Profile – 273 Mb/s
Latency improvement: 15X in mean, 9X in median
Median & 95th percentile virtually flat
Response Time vs. Dataset
Flash Service Breakdowns
CDF breakdowns by decile: Flash vs. New-Flash
Flashpache Service Breakdowns
CDF breakdowns by decile: Apache vs. Flashpache
Latency Scalability
Response time (msec) at 0.95 load level on P II, P III, and P4: Apache vs. Flashpache
In The Paper
 More details, measurements, breakdowns
 Quantifying service inversion
Conclusion
 Much server latency originates from head-of-line blocking
 Impact on latency higher than on throughput
 Blocking degrades service fairness
 Possible to solve in the server application
Thank you
www.cs.princeton.edu/nsg/papers
Princeton University
Network Systems Group
Apache on Linux
Apache vs. Flashpache
Effects on Response Time
Flash latency CDFs
Service Inversion