Critical Section Characterization and Acceleration in Real World Applications
Timothy Zhu, Huapeng Zhou
Carnegie Mellon University
Problem Statement
Ø Performance of multithreaded applications is
limited by critical sections
Ø Measure statistics of critical sections in real
world applications
[Figure: Probability distribution of the number of threads waiting for the bottleneck critical section (N = 11)]
[Figure: Average waiting time vs. total number of threads (N)]

Motivation
• Build an analytical model to understand critical section behavior
• Simulate how accelerating specific critical sections affects performance
The Model
• Key insight: Critical sections look like queues in a closed system. Each of the N threads repeatedly loops through a non-critical section and then a critical section.
   N = number of threads
   Z = time executing the non-critical section
   R = time waiting on and executing the critical section
   X = throughput = number of iterations around the loop per second

Theoretical Bounds
E[R] ≥ max(D, N·Dmax − E[Z])
X ≤ min(N / (D + E[Z]), 1 / Dmax)
where D is the sum of the durations spent executing critical sections in one iteration and Dmax is the largest such duration for a single critical section (the bottleneck). The two throughput bounds cross at N* = (D + E[Z]) / Dmax.

Methodology
• The Workflow: a pthread hooking library records raw trace data from the real-world application; post-processing and analysis of the trace data drive a scheduling simulation and an ACS simulation, which yield the ACS performance prediction.
• Benchmarks:
   - memcached
   - mysql: oltp-simple
   - mysql: oltp-nontrx
   - mysql: oltp-complex
• Runs on dual-socket 6-core Xeon processors (with Hyper-Threading the OS sees 24 cores) with 48 GB RAM
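The bounds in The Model can be checked numerically against an exact solution of a closed queueing network. The Python sketch below is illustrative, not the authors' model: it assumes each critical section behaves like a product-form FCFS queue so that exact mean value analysis (MVA) applies, treats the non-critical section as a delay station with mean E[Z], and computes N* = (D + E[Z]) / Dmax, the thread count at which the two throughput bounds cross. The names `asymptotic_bounds` and `mva`, the demand values, E[Z], and the "ACS x2" rule used here (halve only the bottleneck's demand, with no slowdown anywhere else) are hypothetical.

```python
from typing import List, Tuple


def asymptotic_bounds(demands: List[float], z: float, n: int) -> Tuple[float, float, float]:
    """Return (upper bound on X, lower bound on E[R], N*) for n threads.

    demands[k] is the time one loop iteration spends executing critical
    section k; z is E[Z], the mean time in the non-critical section.
    """
    d = sum(demands)       # D: total critical-section time per iteration
    d_max = max(demands)   # Dmax: time in the bottleneck critical section
    x_upper = min(n / (d + z), 1.0 / d_max)
    r_lower = max(d, n * d_max - z)
    n_star = (d + z) / d_max  # N at which the two throughput bounds cross
    return x_upper, r_lower, n_star


def mva(demands: List[float], z: float, n: int) -> Tuple[float, float]:
    """Exact mean value analysis, assuming every critical section behaves like a
    product-form FCFS queue and the non-critical section like a delay station.

    Returns (X, E[R]) with n threads.
    """
    queue = [0.0] * len(demands)  # mean number of threads at each critical section
    x, r_total = 0.0, 0.0
    for k in range(1, n + 1):
        # Arrival theorem: an arriving thread sees the queues left by k - 1 threads.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        r_total = sum(residence)
        x = k / (r_total + z)            # Little's law: k = X * (E[R] + E[Z])
        queue = [x * r for r in residence]
    return x, r_total


if __name__ == "__main__":
    demands = [1.0, 0.25, 0.25]  # hypothetical per-iteration demands; index 0 is the bottleneck
    z = 4.0                      # hypothetical E[Z]
    bottleneck = demands.index(max(demands))
    # Hypothetical "ACS x2": only the bottleneck critical section runs twice as fast.
    accelerated = [d / 2 if i == bottleneck else d for i, d in enumerate(demands)]
    for n in (1, 2, 4, 8, 16):
        x, r = mva(demands, z, n)
        x_acs, _ = mva(accelerated, z, n)
        x_up, r_low, n_star = asymptotic_bounds(demands, z, n)
        print(f"N={n:2d} X={x:.3f} (bound {x_up:.3f}) E[R]={r:.3f} (bound {r_low:.3f}) "
              f"N*={n_star:.2f} ACS x2 gain={x_acs / x:.3f}")
```

Because MVA is exact for such networks, its X and E[R] always respect the bounds. With these made-up numbers, halving the bottleneck's demand gains relatively little for small N, where throughput is limited by D + E[Z], and approaches the ratio of the old to the new Dmax (here 2) for large N, where the bottleneck critical section limits throughput.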
Experimental Results
Identify the bottleneck critical section

mysql nontrx
N    N*       Dmax    D       ACS x2 throughput improvement
2    11.203   653     2010    0.699
4    9.588    829     2210    0.870
6    12.990   621     1851    0.918
10   17.575   715     2024    0.969
12   13.759   865     2350    1.038
20   13.791   1088    2903    1.227

memcached
N    N*       Dmax    D       ACS x2 throughput improvement
2    4.408    1214    1402    0.786
4    4.275    1358    1685    1.064
6    4.349    1450    1840    1.281
10   3.889    1851    2389    1.737
12   4.163    2022    2580    1.820
20   4.367    2599    3372    1.904

• It is not always a good idea to speed up critical sections at the cost of slowing down the other threads. When it is a good idea, it is important to speed up the bottleneck critical sections.

ACS throughput prediction
[Figure: % error vs. # threads (0 to 20) for memcached, mysql simple, mysql complex and mysql nontrx; top: no ACS, bottom: ACS x2]
• Queueing theory can provide insightful performance predictions, particularly when the number of threads is small or large.

The impact of scheduling
[Figure: Normalized throughput improvement vs. # threads (1 to 20) for memcached, mysql simple, mysql complex and mysql nontrx under SJF (shortest-job-first) and SJF fair scheduling]
• Preliminary results show that there may be some potential benefit for scheduling critical sections.
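The scheduling simulation in the workflow decides which waiting thread gets a lock next. A minimal discrete-event sketch of that decision follows, comparing first-come-first-served (FIFO) hand-off with shortest-job-first (SJF) hand-off. It is not the authors' simulator: the single lock, the exponential non-critical-section time, the bimodal critical-section lengths, the assumption that the scheduler knows each waiter's critical-section length, and the names `simulate` and `critical_section_length` are all hypothetical, and the poster's "SJF fair" variant is not modeled.

```python
import heapq
import random


def critical_section_length(rng: random.Random) -> float:
    """Hypothetical bimodal critical-section length: usually short, sometimes long."""
    return 0.2 if rng.random() < 0.8 else 4.0


def simulate(n_threads: int, policy: str, iterations: int = 20000,
             mean_z: float = 4.0, seed: int = 1) -> float:
    """Simulate n_threads threads sharing one lock; return iterations per unit time.

    "fifo" hands the lock to the earliest waiter; "sjf" hands it to the waiter
    with the shortest critical section (assumed known to the scheduler).
    """
    if policy not in ("fifo", "sjf"):
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)
    # Min-heap of (time the thread reaches the lock, thread id, critical-section length).
    arrivals = []
    for tid in range(n_threads):
        think = rng.expovariate(1.0 / mean_z)  # non-critical time with mean mean_z
        heapq.heappush(arrivals, (think, tid, critical_section_length(rng)))
    waiting = []  # threads blocked on the lock
    now = 0.0
    for _ in range(iterations):
        if not waiting:
            # Lock is idle: jump ahead to the next thread that requests it.
            now = max(now, arrivals[0][0])
        while arrivals and arrivals[0][0] <= now:
            waiting.append(heapq.heappop(arrivals))
        if policy == "fifo":
            chosen = min(waiting, key=lambda w: (w[0], w[1]))
        else:
            chosen = min(waiting, key=lambda w: (w[2], w[0], w[1]))
        waiting.remove(chosen)
        _, tid, cs = chosen
        now += cs  # the chosen thread holds the lock for its critical section
        think = rng.expovariate(1.0 / mean_z)
        heapq.heappush(arrivals, (now + think, tid, critical_section_length(rng)))
    return iterations / now


if __name__ == "__main__":
    for n in (1, 2, 4, 10, 20):
        fifo, sjf = simulate(n, "fifo"), simulate(n, "sjf")
        print(f"N={n:2d}  FIFO={fifo:.3f}  SJF={sjf:.3f}  SJF/FIFO={sjf / fifo:.3f}")
```

For N = 1 the two policies are identical. Once this single lock is saturated, every request is still served eventually, so both policies serve nearly the same mix of short and long critical sections and reach similar throughput; the sketch only shows where a lock hand-off policy enters such a simulation and does not reproduce the poster's scheduling results.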