TPC-H Published Results

SQL Server Scaling on Big Iron (NUMA) Systems: TPC-H
Joe Chang
jchang6@yahoo.com
www.qdpma.com
About Joe Chang
SQL Server Execution Plan Cost Model
True cost structure by system architecture
Decoding statblob (distribution statistics)
SQL Clone – statistics-only database
Tools
ExecStats – cross-reference index use by SQL execution plan
Performance Monitoring, Profiler/Trace aggregation
TPC-H
DSS – 22 queries, geometric mean
Plan costs span a 60X range; actual run times span a comparable range
Power – single stream
Tests ability to scale parallel execution plans
Throughput – multiple streams
Scale Factor 1 – Line Item data is 1GB
875MB with DATE instead of DATETIME
Only single-column indexes allowed; queries are ad hoc
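The Power, Throughput and composite metrics can be sketched in Python. This is a minimal sketch assuming the TPC-H definitions: Power@Size is 3600 × SF divided by the geometric mean of the 22 query intervals and 2 refresh-function intervals from the single-stream power test, Throughput@Size is S × 22 × 3600 / Ts × SF for S streams finishing in Ts seconds, and QphH@Size is the geometric mean of the two. The function names are illustrative, and the specification's rounding and short-interval adjustment rules are omitted.

```python
import math


def power_at_size(sf, query_times, refresh_times):
    """Power@Size: 3600 * SF over the geometric mean of the 22 query
    intervals and 2 refresh intervals (seconds) from the power test.
    Simplified sketch: omits the spec's rounding/adjustment rules."""
    if len(query_times) != 22 or len(refresh_times) != 2:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    times = list(query_times) + list(refresh_times)
    # Geometric mean via logarithms so a long product cannot overflow.
    log_mean = sum(math.log(t) for t in times) / len(times)
    return 3600.0 * sf / math.exp(log_mean)


def throughput_at_size(sf, streams, elapsed_seconds):
    """Throughput@Size: queries per hour across all streams, times SF."""
    return streams * 22 * 3600.0 / elapsed_seconds * sf


def qphh_at_size(power, throughput):
    """Composite QphH@Size: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)


if __name__ == "__main__":
    base = power_at_size(100, [10.0] * 22, [10.0] * 2)
    # Same run, but one anomalous query takes 10X longer than the rest.
    slow = power_at_size(100, [100.0] + [10.0] * 21, [10.0] * 2)
    print(round(base, 1), round(slow, 1), round(base / slow, 4))
```

Because each interval enters a 24th root, a single query running 10X slower lowers Power by a factor of 10^(1/24), roughly 9%.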
Observed Scaling Behaviors
Good scaling, leveling off at high DOP
Perfect Scaling ???
Super Scaling
Negative Scaling
especially at high DOP
Execution Plan change
Completely different behavior
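The scaling behaviors listed above can be labeled mechanically from run times measured at two degrees of parallelism. The sketch below is hypothetical: the category names mirror the slide, and the 5% tolerance for "perfect" is an arbitrary assumption, not a threshold used in this deck.

```python
def classify_scaling(dop_lo, t_lo, dop_hi, t_hi, tol=0.05):
    """Label the change in run time going from dop_lo to dop_hi.

    speedup    = t_lo / t_hi
    efficiency = speedup / (dop_hi / dop_lo)
    tol is a hypothetical band around perfect (efficiency == 1)."""
    speedup = t_lo / t_hi
    efficiency = speedup / (dop_hi / dop_lo)
    if speedup < 1.0:
        return "negative"    # more threads, longer run time
    if efficiency > 1.0 + tol:
        return "super"       # better than linear in DOP
    if efficiency >= 1.0 - tol:
        return "perfect"     # within tol of linear
    return "sub-linear"      # good scaling that is leveling off


def scaling_profile(times_by_dop, tol=0.05):
    """Classify each step of a {dop: seconds} series, lowest DOP first."""
    dops = sorted(times_by_dop)
    return [
        (lo, hi, classify_scaling(lo, times_by_dop[lo], hi, times_by_dop[hi], tol))
        for lo, hi in zip(dops, dops[1:])
    ]


if __name__ == "__main__":
    # Hypothetical timings (seconds) for one query at increasing DOP.
    print(scaling_profile({1: 80.0, 2: 40.0, 4: 22.0, 8: 14.0, 16: 16.0}))
```

An execution plan change at some DOP shows up here only as an abrupt shift in the labels; run times alone cannot tell it apart from a genuine scaling effect, so the plans still need to be compared.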
TPC-H Published Results
TPC-H SF 100GB: 2-way Xeon 5355, 5570, 5680, Opteron 6176
[Chart: Power, Throughput and QphH for Xeon 5355, 5570 HDD, 5570 SSD, 5570 Fusion, 5680 SSD, Opt 6176]
Among the 2-way Xeon 5570 systems, all are close: HDD has the best throughput, SATA SSD has the best composite, and Fusion-IO has the best power.
Westmere and Magny-Cours, both with 192GB memory, are very close.
TPC-H SF 300GB: 8x QC/6C & 4x12C Opteron
[Chart: Power, Throughput and QphH for Opt 8360 4C, Opt 8384 4C, Opt 8439 6C, Opt 6176 12C, X 7560 8C]
The 6-core Istanbul improved over the 4-core Shanghai by 45% in Power, 73% in Throughput and 59% overall.
4x12C at 2.3GHz improved 17% over 8x6C at 2.8GHz.
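The percentages quoted for the SF300 results can be reproduced from the published Power, Throughput and QphH values in the SF300 table. A small sketch; the variable names are illustrative.

```python
def pct_improvement(new, old):
    """Percentage gain of new over old (a 1.45X ratio -> 45%)."""
    return (new / old - 1.0) * 100.0


# (Power, Throughput, QphH) copied from the SF300 table
opt_8384 = (75161.2, 44271.9, 57684.7)    # 8 x Opteron 8384, 4C Shanghai
opt_8439 = (109067.1, 76869.0, 91558.2)   # 8 x Opteron 8439, 6C Istanbul
opt_6176 = (129198.3, 89547.7, 107561.2)  # 4 x Opteron 6176, 12C

if __name__ == "__main__":
    for metric, new, old in zip(("Power", "Throughput", "QphH"), opt_8439, opt_8384):
        print(f"8439 vs 8384 {metric:10}: {pct_improvement(new, old):5.1f}%")
    print(f"6176 vs 8439 QphH      : {pct_improvement(opt_6176[2], opt_8439[2]):5.1f}%")
```

This gives roughly 45%, 74% and 59% for 8439 over 8384, and about 17.5% for 6176 over 8439, each within a point of the figures on the slide.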
TPC-H SF 1000
[Chart: Power, Throughput and QphH for Opt 8439 SQL, Opt 8439 Sybase, Superdome, Superdome 2]
Oracle RAC, 64 nodes, 128 Xeon 5450 quad-core 3.0GHz processors: Power 782,608, 5.6X higher than Superdome 2 with 64 cores.
TPC-H SF 3TB: X7460 & X7560
[Chart: Power, Throughput and QphH for 16 x X7460, 8 x X7560, POWER6, M9000]
Nehalem-EX with 64 cores is better than Core 2 with 96 cores.
TPC-H Published Results
SQL Server excels in Power
Limited by the geometric mean and by anomalies on individual queries
Trails in Throughput
Other DBMSs get throughput close to or above power
SQL Server throughput is below Power by a wide margin
Speculation: SQL Server does not throttle back parallelism under load?
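The gap between throughput and power can be quantified as the Throughput/Power ratio of each published result. A sketch using values copied from the SF100, SF300, SF1000 and SF3000 tables; the labels are abbreviations of the table rows, and the DBMS codes are those used in the tables.

```python
# (label, DBMS, Power, Throughput) copied from the result tables
results = [
    ("SF100 2 Xeon 5680", "SQL Server", 99426.3, 55038.2),
    ("SF300 4 Xeon 7560", "SQL Server", 152453.1, 96585.4),
    ("SF1000 8 Opt 8439", "SQL Server", 95789.1, 69367.6),
    ("SF3000 8 Xeon 7560", "SQL Server", 185297.7, 142685.6),
    ("SF1000 8 Opt 8439", "ASE", 108436.8, 96652.7),
    ("SF1000 Itanium 9350", "O11R2", 139181.0, 141188.1),
    ("SF1000 Xeon 5450", "O RAC", 782608.7, 1740122.0),
    ("SF3000 POWER6", "Sybase", 142790.7, 171607.4),
    ("SF3000 SPARC", "O11R2", 182350.7, 216967.7),
]


def throughput_to_power(power, throughput):
    """Values below 1.0 mean the multi-stream result trails single-stream Power."""
    return throughput / power


if __name__ == "__main__":
    for label, dbms, power, throughput in results:
        print(f"{label:22} {dbms:10} {throughput_to_power(power, throughput):5.2f}")
```

On these rows the SQL Server ratios fall between about 0.55 and 0.77, while the other DBMS results range from about 0.89 (ASE) to above 2 (RAC).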
TPC-H SF100
Processors | GHz | Cores | Total Mem (GB) | DBMS | SF | Power | Throughput | QphH
2 Xeon 5355 | 2.66 | 8 | 64 | 5sp2 | 100 | 23,378.0 | 13,381.0 | 17,686.7
2x5570 HDD | 2.93 | 8 | 144 | 8sp1 | 100 | 67,712.9 | 38,019.1 | 50,738.4
2x5570 SSD | 2.93 | 8 | 144 | 8sp1 | 100 | 70,048.5 | 37,749.1 | 51,422.4
5570 Fusion | 2.93 | 8 | 144 | 8sp1 | 100 | 72,110.5 | 36,190.8 | 51,085.6
2 Xeon 5680 | 3.33 | 12 | 192 | 8r2 | 100 | 99,426.3 | 55,038.2 | 73,974.6
2 Opt 6176 | 2.3 | 24 | 192 | 8r2 | 100 | 94,761.5 | 53,855.6 | 71,438.3
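Since QphH@Size is defined as the square root of Power@Size × Throughput@Size, the composite column can be checked against the other two. A minimal sketch over the SF100 rows; the helper name is illustrative.

```python
import math

# SF100 rows: (system, Power, Throughput, published QphH)
SF100 = [
    ("2 Xeon 5355", 23378.0, 13381.0, 17686.7),
    ("2x5570 HDD", 67712.9, 38019.1, 50738.4),
    ("2x5570 SSD", 70048.5, 37749.1, 51422.4),
    ("5570 Fusion", 72110.5, 36190.8, 51085.6),
    ("2 Xeon 5680", 99426.3, 55038.2, 73974.6),
    ("2 Opt 6176", 94761.5, 53855.6, 71438.3),
]


def composite(power, throughput):
    """QphH@Size is the geometric mean of Power@Size and Throughput@Size."""
    return math.sqrt(power * throughput)


if __name__ == "__main__":
    for system, power, throughput, published in SF100:
        computed = composite(power, throughput)
        print(f"{system:12} published {published:10,.1f}  computed {computed:10,.1f}")
```

Every SF100 row agrees to within 0.1. The same check on the SF1000 SQL Server row gives about 81,515 rather than the listed 81,367.6, which suggests that entry may have been mistranscribed.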
TPC-H SF300
Processors | GHz | Cores | Total Mem (GB) | DBMS | SF | Power | Throughput | QphH
4 Opt 8220 | 2.8 | 8 | 128 | 5rtm | 300 | 25,206.4 | 13,283.8 | 18,298.5
8 Opt 8360 | 2.5 | 32 | 256 | 8rtm | 300 | 67,287.4 | 41,526.4 | 52,860.2
8 Opt 8384 | 2.7 | 32 | 256 | 8rtm | 300 | 75,161.2 | 44,271.9 | 57,684.7
8 Opt 8439 | 2.8 | 48 | 256 | 8sp1 | 300 | 109,067.1 | 76,869.0 | 91,558.2
4 Opt 6176 | 2.3 | 48 | 512 | 8r2 | 300 | 129,198.3 | 89,547.7 | 107,561.2
4 Xeon 7560 | 2.26 | 32 | 640 | 8r2 | 300 | 152,453.1 | 96,585.4 | 121,345.6
All of the above are HP results (?). Sun result, Opt 8384, sp1: Power 67,095.6, Throughput 45,343.5, QphH 55,157.5.
TPC-H 1TB
Processors | GHz | Cores | Total Mem (GB) | DBMS | SF | Power | Throughput | QphH
8 Opt 8439 | 2.8 | 48 | 512 | 8R2? | 1000 | 95,789.1 | 69,367.6 | 81,367.6
8 Opt 8439 | 2.8 | 48 | 384 | ASE | 1000 | 108,436.8 | 96,652.7 | 102,375.3
Itanium 9140 | 1.6 | 64 | 384 | O11g | 1000 | 111,557.0 | 128,259.1 | 123,323.1
Itanium 9350 | 1.73 | 64 | 512 | O11R2 | 1000 | 139,181.0 | 141,188.1 | 140,181.1
Xeon 5450 | 3.0 | 512 | 2048 | O RAC | 1000 | 782,608.7 | 1,740,122 | 1,166,977
TPC-H 3TB
Processors | GHz | Cores | Total Mem (GB) | DBMS | SF | Power | Throughput | QphH
16 Xeon 7460 | 2.66 | 96 | 1024 | 8r2 | 3000 | 120,254.8 | 87,841.4 | 102,254.8
8 Xeon 7560 | 2.26 | 64 | 512 | 8r2 | 3000 | 185,297.7 | 142,685.6 | 162,601.7
POWER6 | 5.0 | 64 | 512 | Sybase | 3000 | 142,790.7 | 171,607.4 | 156,537.3
SPARC | 2.88 | 128 | 512 | O11R2 | 3000 | 182,350.7 | 216,967.7 | 198,907.5
SF100 Big Queries (sec)
[Chart: query time in sec for Q1, Q9, Q13, Q18, Q21 on 5570 HDD, 5570 SSD, 5570 FusionIO, 5680 SSD, 6176 SSD]
Xeon 5570 with SATA SSD is poor on Q9, reason unknown.
Both Xeon 5680 and Opteron 6176 are a big improvement over Xeon 5570.
SF100 Middle Q
[Chart: query time in sec for Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q16, Q22 on 5570 HDD, 5570 SSD, 5570 FusionIO, 5680 SSD, 6176 SSD]
Xeon 5570 HDD and 5680 SSD are poor on Q12, reason unknown.
Opteron 6176 is poor on Q11.
SF100 Small Queries
[Chart: query time in sec for Q2, Q4, Q6, Q14, Q15, Q17, Q19, Q20 on 5570 HDD, 5570 SSD, 5570 FusionIO, 5680 SSD, 6176 SSD]
Xeon 5680 and Opteron are poor on Q20.
Note the limited scaling on Q2 and Q17.
SF300 Big Queries
[Chart: query time in sec for Q1, Q9, Q13, Q18, Q21 on 8 x 8360 QC 2M, 8 x 8384 QC 6M, 8 x 8439 6C, 4 x 6176 12C, 4 x 7560 8C]
Opteron 6176 is poor relative to 8439 on Q9 and Q13, with the same total number of cores.
SF300 Middle Q
[Chart: query time in sec for Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q16, Q19, Q20, Q22 on 8 x 8360 QC 2M, 8 x 8384 QC 6M, 8 x 8439 6C, 4 x 6176 12C, 4 x 7560 8C]
Opteron 6176 is much better than 8439 on Q11 and Q19, but worse on Q12.
SF300 Small Q
[Chart: query time in sec for Q2, Q4, Q6, Q14, Q15, Q17 on 8 x 8360 QC 2M, 8 x 8384 QC 6M, 8 x 8439 6C, 4 x 6176 12C, 4 x 7560 8C]
Opteron 6176 is much better on Q2 and even with 8439 on the others.
SF1000 Sybase vs. SQL Server
[Chart: per-query time ratio, Q1 through Q22]
Query time, Sybase relative to SQL Server, both on DL785 48-core
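Per-query time-ratio charts like this one relate directly to the Power metric. Because Power is 3600 × SF over a geometric mean of interval times, if system B takes r_i times as long as system A on each interval, Power(A)/Power(B) equals the geometric mean of the r_i. The sketch below demonstrates this with made-up timings; the charts show only the 22 queries and not the refresh functions, so applying it to them gives an approximation.

```python
import math


def geomean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def power_from_times(sf, intervals):
    """Power@Size-style score: 3600 * SF / geometric mean of the intervals."""
    return 3600.0 * sf / geomean(intervals)


def implied_power_ratio(time_ratios):
    """If B's time on each interval is r_i times A's, Power(A)/Power(B)
    is the geometric mean of the r_i."""
    return geomean(time_ratios)


if __name__ == "__main__":
    times_a = [float(i) for i in range(1, 25)]       # hypothetical seconds
    ratios = [0.5, 2.0, 4.0] + [1.0] * 21            # hypothetical r_i
    times_b = [a * r for a, r in zip(times_a, ratios)]
    print(power_from_times(1000, times_a) / power_from_times(1000, times_b))
    print(implied_power_ratio(ratios))
```

It also shows how little one outlier moves the composite: a single 5X ratio among 24 intervals changes the Power ratio by 5^(1/24), about 7%.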
SF1000 Large Queries
[Chart: query time for Q1, Q9, Q13, Q18, Q21, SQL Server vs. Sybase]
SF1000 Middle Queries
[Chart: query time for Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q17, Q19, SQL Server vs. Sybase]
SF1000 Small Queries
[Chart: query time for Q2, Q4, Q6, Q14, Q15, Q16, Q20, Q22, SQL Server vs. Sybase]
SF1000 Itanium - Superdome
[Chart: per-query time ratio, Q1 through Q22]
Query time, Superdome 2 versus Superdome, 16-way quad-core and 32-way dual-core
512-core C2 RAC vs. 64-core It2
[Chart: per-query time ratio, Q1 through Q22]
Query time, Superdome 2 versus RAC, 16-way quad-core (64 cores) and 64-node 2-way quad-core (512 cores)
Oracle RAC has 5.6X higher Power
SF 3TB – 8×7560 versus 16×7460
[Chart: per-query time ratio, Q1 through Q22]
The 8×7560 is broadly 50% faster overall: 5X+ faster on one query, slower on two, comparable on three.
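A summary such as "50% faster overall, 5X+ on one, slower on 2, comparable on 3" can be generated from the per-query ratios behind a chart like this one. A hypothetical sketch: speedup is taken as baseline time divided by new time, and the ±10% band for "comparable" is an assumption, not the author's criterion.

```python
import math


def summarize_speedups(speedups, band=0.10):
    """Bucket per-query speedups (baseline time / new time).
    band is a hypothetical tolerance: within +/-band of 1.0 is 'comparable'."""
    faster = sum(1 for s in speedups if s > 1.0 + band)
    slower = sum(1 for s in speedups if s < 1.0 - band)
    overall = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
    return {
        "faster": faster,
        "slower": slower,
        "comparable": len(speedups) - faster - slower,
        "max": max(speedups),
        "geomean": overall,
    }


if __name__ == "__main__":
    # Hypothetical per-query speedups, not read from the chart.
    print(summarize_speedups([5.5, 0.8, 0.7, 1.0, 1.05, 0.95, 1.5]))
```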
64 cores, PWR6 vs. Xeon 7560
[Chart: per-query time ratio, Q1 through Q22]
Query time, POWER6 relative to X7560
Overall, Xeon 7560 is 30% faster in Power, but individual queries vary widely, and POWER6 is faster on some.
SF3000 Big Queries
[Chart: query time for Q1, Q9, Q13, Q18, Q21 on Uni 16x6, DL980 8x8, Pwr6, M9000]
SF3000 Middle and Small Q
[Chart: query time for Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q16, Q17, Q19 on Uni 16x6, DL980 8x8, Pwr6, M9000]
[Chart: query time for Q2, Q4, Q6, Q14, Q15, Q16, Q20, Q22 on Uni 16x6, DL980 8x8, Pwr6, M9000]
TPC-H Summary
Scaling is impressive on some queries
Limited ability (and value) in scaling small queries
Anomalies, including negative scaling
TPC-H Queries
Q1 Pricing Summary Report
Q2 Minimum Cost Supplier
Wordy, but only touches the small tables, second lowest plan cost (Q15)
Q3
Q6 Forecasting Revenue Change
Q7 Volume Shipping
Q8 National Market Share
Q9 Product Type Profit Measure
Q11 Important Stock Identification
[Figure: non-parallel and parallel execution plans]
Q12 Random IO?
Q13 Why does Q13 have perfect scaling?
Q17 Small Quantity Order Revenue
Q18 Large Volume Customer
[Figure: non-parallel and parallel execution plans]
Q19
Q20?
Date functions are usually written as
because Line Item date columns are “date” type
CAST helps the DOP 1 plan, but produces a bad plan for parallel execution
This query may get a poor execution plan
Q21 Suppliers Who Kept Orders Waiting
Note the 3 references to Line Item
Q22