Mobile - Computer Sciences Dept.

Department of Computer Science
Power Struggles: Revisiting the RISC vs. CISC
Debate on Contemporary ARM and x86
Architectures
Emily Blem, Jaikrishnan Menon,
and Karthikeyan Sankaralingam
What role if any does RISC vs. CISC play in
this power struggle?
ISA being RISC or CISC does not matter for
modern microprocessors
Overview
Methods
11 Key Findings
 6 on performance
 3 on power
 2 on power/performance tradeoffs
Conclusion
Platforms
BeagleBoard: ARM Cortex A8
PandaBoard: ARM Cortex A9
Intel Atom N450
Intel Sandy Bridge Core i7
Software: Linux 2.6, GCC
Workloads
Mobile: CoreMark, WebKit
Desktop: SPEC CPU2006
Server: Lighttpd, CLucene, Database kernels
Measurements
Performance measurement on real hardware
Extensive use of performance counters
 Cycles, instructions, cache misses, branch misses…
Power measurements using Wattsup meters
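A minimal sketch of how raw counter totals like those listed above are commonly reduced to comparable rates (CPI and misses per thousand instructions). The function name and all numbers are hypothetical and only illustrate the arithmetic; they are not the talk's measurements or tooling.

```python
def derived_metrics(cycles, instructions, cache_misses, branch_misses):
    """Turn raw performance-counter totals into per-instruction rates."""
    kilo_insts = instructions / 1000.0
    return {
        "cpi": cycles / instructions,               # cycles per instruction
        "cache_mpki": cache_misses / kilo_insts,    # cache misses per 1000 instructions
        "branch_mpki": branch_misses / kilo_insts,  # branch misses per 1000 instructions
    }


# Hypothetical counter totals, for illustration only (not measured data).
sample = derived_metrics(
    cycles=3_000_000,
    instructions=2_000_000,
    cache_misses=10_000,
    branch_misses=4_000,
)
print(sample)
```

Rates per instruction make runs of different lengths comparable across platforms, which raw totals do not.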
But first…
What is performance?
$T = N \times CPI \times \frac{1}{f}$
“Iron Law of Performance” – Clark
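Reading the Iron Law as execution time T = instruction count N × cycles per instruction CPI × (1 / clock frequency f), a short sketch shows how a gap in time splits into a frequency part and a cycle-count part. The two cores and their numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def execution_time(instructions, cpi, frequency_hz):
    """Iron Law: T = N * CPI * (1 / f), in seconds."""
    return instructions * cpi / frequency_hz


def cycle_count(instructions, cpi):
    """Total cycles = N * CPI, independent of clock frequency."""
    return instructions * cpi


# Hypothetical cores running the same 1e9-instruction program.
N = 1_000_000_000
slow_time = execution_time(N, cpi=2.0, frequency_hz=1_000_000_000)
fast_time = execution_time(N, cpi=1.0, frequency_hz=2_000_000_000)

time_gap = slow_time / fast_time                        # gap in wall-clock time
cycle_gap = cycle_count(N, 2.0) / cycle_count(N, 1.0)   # gap after removing frequency

print(time_gap, cycle_gap)
```

In this made-up case the time gap is twice the cycle gap, because the faster core also runs at twice the clock frequency.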
Performance
[Figure: normalized time for A8, Atom, A9, and i7 across Mobile, SPEC-INT, SPEC-FP, and Server workloads; annotated values: (130), (72), (24), (344)]
Key Finding 1
Large performance differences due to varying clock
frequencies and core characteristics
Cycle counts
[Figure: normalized cycles for A8, Atom, A9, and i7 across Mobile, SPEC-INT, SPEC-FP, and Server workloads]
Key Finding 2
Cycle count differences are less than 2.5X
Instruction counts
Macro-op counts are nearly the same across ARM and x86
[Figure: normalized macro-ops for ARM and x86 across Mobile, SPEC-INT, SPEC-FP, and Server workloads]
Key Finding 3
CPI is less for x86 implementations
Instruction Mix
Key Finding 4
ISA effects are indistinguishable
Key Findings
1. Large performance gaps across cores
2. After accounting for clock frequency,
performance gaps within 2.5X
3. CPI is less for x86 implementations
4. ISA effects are indistinguishable
Why are performance gaps present?
[Figure: normalized cycle count per benchmark; legend: Instruction Count, Cache Related, Branch Related, Issue Width Related]
Case study: omnetpp
[Figure: cycle count (billions) for omnetpp on i7 and A9; legend: Insts, Branch Misses, I-Cache Misses, Issue Width]
ISA Effect: ARM has 4% more instructions
Microarchitecture Effects 1 and 2: A9 experiences between 15x and 29x more branch mispredictions and instruction cache misses
Microarchitecture Effect 3: A9’s issue width is half that of i7’s
Key Finding 5
Performance gaps due to microarchitecture
Key Findings
1. Large performance gaps across cores
2. After accounting for clock frequency, performance gaps within 2.5X
3. CPI is less for x86 implementations
4. ISA effects are indistinguishable
5. Performance gaps due to microarchitecture
6. RISC or CISC choice does not play a role in performance-driving µarch decisions
Details in paper
Power and Energy
1. Large performance gaps across cores
2. After accounting for clock frequency, performance gaps within 2.5X
3. CPI is less for x86 implementations
4. ISA effects are indistinguishable
5. Performance gaps due to microarchitecture
6. RISC or CISC choice does not play a role in performance-driving µarch decisions
7. x86 implementations are higher power – dictated
by performance targets
8. Power consumption is tied to microarchitectural
design decisions
9. Energy consumption also tied to microarchitectural
design decisions
Power-Performance Tradeoffs
[Figure: power (W) versus performance (MIPS) for Cortex A8, Cortex A9, Atom, and i7]
Key Finding 10
Regardless of ISA, processors follow cubic
power/performance trends
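One way to check a cubic trend like this is to fit a power law, power = k × performance^n, by least squares in log-log space and see whether n comes out near 3. The sketch below does that on synthetic points generated from an exact cubic; the data, the constant 0.05, and the function name are assumptions for illustration, not values from the talk.

```python
import math


def fit_power_law(perf, power):
    """Least-squares fit of log(power) = log(k) + n * log(perf); returns (k, n)."""
    xs = [math.log(p) for p in perf]
    ys = [math.log(w) for w in power]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                          # slope in log-log space = exponent
    k = math.exp(mean_y - n * mean_x)      # intercept gives the scale factor
    return k, n


# Synthetic (performance, power) points on an exact cubic, illustration only.
perf = [1.0, 2.0, 4.0, 8.0]
power = [0.05 * p ** 3 for p in perf]

k, n = fit_power_law(perf, power)
print(round(n, 3), round(k, 3))
```

With real measurements the fitted exponent would carry noise, so the interesting question is whether cores from both ISAs fall near the same curve.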
Energy-Delay analysis
Considering ED, A15 is 46% lower than any other design we considered.
Considering ED^2, i7 is more than 2X better than next best design
Considering ED^x for x > 1.4, i7 is best
Key Finding 11
Microarchitecture and design choices are key –
not the ISA
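The energy-delay metrics on this slide weight delay more heavily as the exponent grows: with energy E = power × time and delay D = time, the metric is E × D^x. A small sketch, using two hypothetical designs with made-up power and runtime figures, shows how the preferred design can flip between ED and ED^2.

```python
def energy_delay(power_w, time_s, exponent=1.0):
    """Return E * D^exponent, where E = power * time and D = time."""
    energy_j = power_w * time_s
    return energy_j * time_s ** exponent


def best_design(designs, exponent):
    """Name of the design with the lowest E * D^exponent."""
    return min(designs, key=lambda name: energy_delay(*designs[name], exponent))


# Hypothetical designs as (power in W, runtime in s); not measured data.
designs = {
    "low_power_core": (1.0, 4.0),
    "high_perf_core": (20.0, 1.0),
}

ed_winner = best_design(designs, exponent=1.0)
ed2_winner = best_design(designs, exponent=2.0)
print(ed_winner, ed2_winner)
```

Here the slower, lower-power design wins on ED and the faster one wins on ED^2, which is the kind of crossover the slide describes.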
Conclusion
ISA being RISC or CISC does not matter for
power and performance of modern
processors.
What is the ISA’s role?
Supporting specialization
 AVX crypto, Virtualization extensions
 Jazelle DBX, ARM TrustZone…
Exposing more workload-specific semantic information to the substrate
 Transactional Memory support
 Reliability-oriented extensions
 Many more…
Questions?
Additional resources
(detailed report and raw data spreadsheet)
available at http://research.cs.wisc.edu/vertical/isa-power-struggles