Critical Section Characterization and Acceleration in Real World Applications
Timothy Zhu, Huapeng Zhou
Carnegie Mellon University

Problem Statement
- Performance of multithreaded applications is limited by critical sections
- Measure statistics of critical sections in real world applications

[Figure: probability vs. # threads waiting for bottleneck critical section (N=11)]
[Figure: average waiting time vs. # total threads (N)]

Motivation
- Build an analytical model to understand critical section behavior
- Simulate how accelerating specific critical sections affects performance

The Model
- Key insight: critical sections look like queues in a closed system
    Z = time executing non-critical section
    R = time waiting on and executing critical section
    X = throughput = number of iterations around loop per sec
    N = number of threads

Theoretical Bounds
    E[R] ≥ max(D, N·Dmax − E[Z])
    X ≤ min(N / (D + E[Z]), 1 / Dmax)
  where
    Dmax is the maximum duration of executing a critical section
    D is the sum of durations executing critical sections

Methodology
- The Workflow
    [Figure: workflow diagram with components: real world application, pthread hooking library, raw data, post processing and analysis, trace data, scheduling simulation, ACS simulation, ACS performance prediction]
- Benchmarks
    - memcached
    - mysql: oltp-simple
    - mysql: oltp-nontrx
    - mysql: oltp-complex
- Runs on dual-socket 6-core Xeon processors (with Hyper-Threading, the OS sees 24 cores) with 48 GB RAM

Experimental Results

Identify the bottleneck critical section

memcached
    N     N*       Dmax    D       ACS x2 throughput improvement
    2     4.408    1214    1402    0.786
    4     4.275    1358    1685    1.064
    6     4.349    1450    1840    1.281
    10    3.889    1851    2389    1.737
    12    4.163    2022    2580    1.820
    20    4.367    2599    3372    1.904

mysql nontrx
    N     N*       Dmax    D       ACS x2 throughput improvement
    2     11.203   653     2010    0.699
    4     9.588    829     2210    0.870
    6     12.990   621     1851    0.918
    10    17.575   715     2024    0.969
    12    13.759   865     2350    1.038
    20    13.791   1088    2903    1.227

- It is not always a good idea to speed up critical sections at the cost of slowing down the other threads. When it is a good idea, it is important to speed up the bottleneck critical sections.

ACS throughput prediction

[Figure: % error vs. # threads for memcached, mysql simple, mysql complex, and mysql nontrx; top: no ACS, bottom: ACS x2]

- Queueing theory can provide insightful performance predictions, particularly when the number of threads is small or large.

The impact of scheduling

[Figure: normalized throughput improvement vs. # threads (1, 2, 4, 6, 10, 12, 20) for SJF and SJF fair scheduling on memcached, mysql simple, mysql complex, and mysql nontrx]

- Preliminary results suggest that scheduling critical sections may offer some benefit.
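
The theoretical bounds stated earlier can be evaluated directly. The following is a minimal sketch, not the authors' tooling: the function names and the numeric inputs are hypothetical, and the time unit is arbitrary. It also computes the thread count at which the two throughput bounds intersect, (D + E[Z]) / Dmax. The poster does not define N*, so treating that intersection as N* is an assumption borrowed from standard closed-system queueing analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    response_lower: float    # lower bound on E[R]
    throughput_upper: float  # upper bound on X


def closed_system_bounds(n_threads: int, d_total: float, d_max: float, ez: float) -> Bounds:
    """Evaluate E[R] >= max(D, N*Dmax - E[Z]) and X <= min(N/(D+E[Z]), 1/Dmax)."""
    response_lower = max(d_total, n_threads * d_max - ez)
    throughput_upper = min(n_threads / (d_total + ez), 1.0 / d_max)
    return Bounds(response_lower, throughput_upper)


def crossover_threads(d_total: float, d_max: float, ez: float) -> float:
    """Thread count where N/(D+E[Z]) equals 1/Dmax (assumed meaning of N*)."""
    return (d_total + ez) / d_max


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not measurements from the poster.
    D, DMAX, EZ = 2000.0, 650.0, 5000.0
    print("crossover:", crossover_threads(D, DMAX, EZ))
    for n in (2, 4, 6, 10, 12, 20):
        print(n, closed_system_bounds(n, D, DMAX, EZ))
```

Below the crossover, the throughput bound grows linearly with the number of threads. Above it, the bound is fixed at 1/Dmax, so only shortening the bottleneck critical section can raise it. This matches the observation that accelerating critical sections helps mainly once the bottleneck saturates.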