R. Hashemian , D. Krishnamurthy , M. Arlitt , N. Carlsson

advertisement
R. Hashemian 1 , D. Krishnamurthy 1 , M. Arlitt 2 , N. Carlsson 3
1. University of Calgary
2. HP Labs
3. Linköping University
by: Raoufehsadat Hashemian
The 4th ACM/SPEC International Conference on
Performance Engineering
ICPE 2013
OUTLINE
• Introduction
• Scalability Evaluation
• Scalability Enhancement Approach
• Validation
• Conclusion
2
Improving the Scalability of a Multi-core Web Server
ICPE13
INTRODUCTION
PROBLEM DESCRIPTION
• Enterprise applications
Response Time
• Performance: Improving QoS
• e.g. Lower response times
• Cost: Less money spent on hardware
• e.g. Improving effective utilization
40
30
20
10
0
0
50
CPU Utilization (%)
100
• Goal: Higher utilization and acceptable response time
• How to achieve this “Goal” for Web servers running on Multicore hardware?
3
Improving the Scalability of a Multi-core Web Server
ICPE13
INTRODUCTION
BACKGROUND
• Web servers before multi-core
• Mature topic, wide-ranging discussions
• Multi-core architecture
• Most research on batch (non-interactive) workload
• Web servers running on Multi-core
• BUS problem in UMA system (Veal et al.`07)
• Multiple Web server instances: 1 instance per
processor (Scogland et al.`09, Boyd et.al,10 Gaud et. al,11)
5
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
SCALABILITY MEASUREMENT
• Measure Web server scalability for two workloads
• Evaluate the effectiveness of multiple Web server
approach in the server’s scalability
• Scalability
• Maximum Achievable Throughput (MAT)
4
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
EXPERIMENTAL SETUP
• 2 x 4 core Intel Xeon E5620 processors
NUMA Architecture
Processor 0
Microarch.
Nehalem
Frequency
Processor 1
2.4 GHz
C
0
C
2
C
4
C
6
C
0
C
2
C
4
C
6
L1 Cache
32K IC - 32K DC
L1
L1
L1
L1
L1
L1
L1
L1
L2 Cache
256K
L2
L2
L2
L2
L2
L2
L2
L2
L3 Cache
12M (Inclusive)
Inter-conn.
QPI -5.86 GT/s
Memory
16GB - DDR3-1333
• OS: Linux, kernel 3, Ubuntu
L3
Memory
Bank 0
L3
Memory
Bank 1
• Webserver: Lighttpd
• Application Server: php (FastCGI module)
6
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
WORKLOADS
• TCP/IP Intensive workload
• High TCP connection rate
• Processing: low user level & high kernel level
• 1 KB static file, up to 155,000 requests/second
• SPECweb Support workload
• Both static requests and php requests
• Wider range of request types
• Processing: high user level & moderate kernel level
7
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
CONFIGURATION TUNING
• Change default lighttpd recommendation (1 Lighttpd worker
process per core)
• Disable default Linux scheduling (use affinity)
• Distribute interrupt handling load
• Improved MAT up to 69%
• Balanced utilization levels for the eight cores
• Fully utilized the server
8
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
RESULTS
• TCP/IP Intensive workload
Maximum Achievable Throughput
Scalability
• Sub-linear
146,000 req/sec
• SPECweb Support workload
Maximum Achievable Throughput
Scalability
• Almost linear
23,000 req/sec
Number of Cores
9
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
RESPONSE DISTRIBUTION ANALYSIS
1
Response time vs. Core Count
0.8
• “Low response time” requests
P [ X <= x ]
• Static requests
• Performance degrades
• “High response time” requests
1
• Dynamic requests
• Performance improves
0.98
0.6
0.96
0.4
0.94
0.92
0.2
0.9
0
1 Core
0 -1
10
Knowing this behavior, how can we
improve the scalability?
10
2
10
0
10
2 Core
1
10
10
4 Core
8 Core
2
3
10
10
x = Response time (msec)
CDF of Response times
80% CPU Utilization
SPECweb Support Workload
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY ENHANCEMENT
MULTIPLE WEBSITE REPLICAS
• Approach: Use 1 Web server instance per processor
• Goal: Reduce inter-processor data migration
Single Replica Process
NIC1 Queue
NIC 2 Queue
Replica 1 Process
Replica 2 Process
NIC 1 Queue
NIC 2 Queue
Processor 0
Processor 1
Processor 0
Processor 1
NIC
1
NIC
2
NIC
1
NIC
2
Original Configuration
with one replica
11
Alternative Configuration
with two replicas
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY ENHANCEMENT
Response time (ms)
Response time (ms)
EVALUATING NEW CONFIGURATION
Request rate (req/sec)
Request rate (req/sec)
TCP/IP Intensive Workload
SPECweb Support Workload
 Scalability Improvement
 Scalability Degradation
 MAT increment: 12.3%
 MAT decrement: 10%
12
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY ENHANCEMENT
EVALUATING NEW CONFIGURATION
• The response time inflation for Dynamic requests dominates the improvement
achieved for Static requests
• Mean and 99.9th percentile response times increase with 2-replicas
• Hypothesis:
CDF of Response times
80% CPU Utilization
22,000 req/sec
SPECweb Support workload
1
P [X <= x]
• Cache contention with 2replicas due to the larger
working set size of dynamic
requests
0.95
0.9
0
10
2
10
4
10
Response Time (ms)
13
Improving the Scalability of a Multi-core Web Server
ICPE13
VALIDATION
INTER-CONNECT TRAFFIC
• Inter-connect traffic decreased
• No significant decrement
• Improved performance for Static
significantly
• Improved performance
Inter-connect Traffic
(Bytes/sec)
Inter-connect Traffic
(Bytes/sec)
requests
Request Rate (req/sec)
TCP/IP Intensive Workload
14
Request Rate (req/sec)
SPECweb Support Workload
Improving the Scalability of a Multi-core Web Server
ICPE13
VALIDATION
LAST LEVEL CACHE
• Last Level cache (LLC) HIT ratio degrades with 2-replica configuration
L3 Cache HIT Ratio
Confirms the cache contention hypothesis
Request Rate (req/sec)
SPECweb Support Workload
15
Improving the Scalability of a Multi-core Web Server
ICPE13
CONCLUSIONS
• Multi-core Web server: scalable after tuning
• 80% utilization with acceptable response time
• Multiple Website Replicas
• The effect on the scalability is workload dependent
• Dynamic requests trigger LLC contention
• Contention may be architecture and application dependent
• Future plan:
• Design and develop an automatic, workload adaptive technique
which decides about best configuration
16
Improving the Scalability of a Multi-core Web Server
ICPE13
Raoufeh Hashemian
University of Calgary, Canada
rhashem@ucalgary.ca
This work is financially supported by:
17
Improving the Scalability of a Multi-core Web Server
ICPE13
REFERENCES
•
Cherkasova et al.`00:Characterizing Temporal Locality and its Impact on Web Server Performance,
International Conference on Computer Communications and Networks’00, Cherkasova; Ciardo; HP Labs
•
Elnozahy et al.`03: Energy Conservation Policies for Web Servers, USITS '03, Elnozahy; Kistler; Ramakrishnan; IBM
•
Majo et al.`12: Matching Memory Access Patterns and Data Placement for NUMA Systems, GC’12, Majo;
Gross; ETH
•
Blagodurov et al.`11: A case for NUMA-aware contention management on multicore systems, USENIX
ATC'11, Blagodurov; Zhuravlev; Dashti; Fedorova; SFU
•
Veal et al.`07: Performance scalability of a multi-core web server. ACM/IEEE ANCS’07,Veal; Foong; Intel
•
Scogland et al.`09: Asymmetric interactions in symmetric multi-core systems: Analysis, enhancements and
evaluation. ACM/IEEE SC’08, Scogland; Balaji; Feng; Narayanaswamy,
•
Boyd et.al`10: An analysis of linux scalability to many cores, USENIX OSDI’10, Boyd-Wickizer; Clements;
Mao; Pesterev; Kaashoek; Morris; Zeldovich, MIT
•
Gaud et. al`11: Application-level optimizations on numa multicore architectures: the apache case study, RRLIG-011, Gaud; Lachaize; Lepers; Muller; Quema.
-2
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
CONFIGURATION TUNING
• Network interrupt handling
• 4 RSS queue per NIC port
• Each queue bind to one core
Response time (msec)
2.0
Before Distributing Int. Load
After Distributing Int. Load
1.5
1.0
0.5
0.0
0
50,000
100,000
150,000
200,000
Rate (req/sec)
-3
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
CONFIGURATION TUNING
• OS scheduling
Response time (msec)
• Binding each lighttpd process to 1 core
2
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
0
No Affinity
With affinity
0
-4
50000
100000
150000
Rate (req/sec)
200000
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
WEB TIER VS. APPLICATION TIER
•
Static: Requests with lower response time
•
• Processed only in Web tier (lighttpd)
Dynamic: Requests with higher response time
Processed only in Web and application tiers (lighttpd and php)
Response time (ms)
•
-5
File size (Byte)
Improving the Scalability of a Multi-core Web Server
ICPE13
SCALABILITY EVALUATION
EXPERIMENTAL SETUP
-6
Improving the Scalability of a Multi-core Web Server
ICPE13
Download