Processor Microarchitecture

[Figure: processor microarchitecture block diagram. Pipeline stages: Fetch, Decode, Execute/Writeback. Blocks: Instruction Cache, Instruction TLB, Branch Prediction, Fetch Queue, Instruction Decoder, Instruction Queue, Register Files, ALU, MUL, FPU, LD, ST, Data TLB, L1 Data Cache, L2 Data Cache, NoC Router, On-Chip Network, Network, Memory]
Energy/Power Calculation
• How do we calculate the energy or power dissipation of a
given microarchitecture?
• Energy/power varies between:
– Different ISAs; ARM vs Intel x86
– Different microarchitectures; in-order vs out-of-order
– Different applications; memory-bound vs compute-bound
– Different technologies; 90nm vs 22nm technology
– Different operating conditions; frequency, temperature
Architecture Activity (1)
Activity 1: Instruction Fetch
icache.read++; fbuffer.write++;
• Activity counts at each component differ between applications.
• Collect activity counts of each architecture component (through
simulation or measurement).
• The list of components differs between microarchitectures.
[Figure: processor microarchitecture block diagram, shown for the instruction fetch activity]
Architecture Activity (2)
Activity 2: Instruction Decode
fbuffer.read++; idecoder.logic++;
• Read/write accesses apply to caches, buffers, etc.
• Logic accesses apply to logic blocks such as the decoder, ALUs, etc.
• There is a tradeoff between differentiating more access types
(accuracy) and simulation speed (complexity).
[Figure: processor microarchitecture block diagram, shown for the instruction decode activity]
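The two activities above can be sketched as per-component counters. This is only an illustration, not the method of any particular simulator: the component names follow the slide pseudocode (icache, fbuffer, idecoder), while the nested-dictionary layout and the function names are assumptions.

```python
from collections import Counter, defaultdict


def new_activity():
    """Hypothetical counter store: component name -> access type -> count."""
    return defaultdict(Counter)


def instruction_fetch(activity):
    # Activity 1: icache.read++; fbuffer.write++;
    activity["icache"]["read"] += 1
    activity["fbuffer"]["write"] += 1


def instruction_decode(activity):
    # Activity 2: fbuffer.read++; idecoder.logic++;
    activity["fbuffer"]["read"] += 1
    activity["idecoder"]["logic"] += 1


# Simulate three instructions passing through fetch and decode.
activity = new_activity()
for _ in range(3):
    instruction_fetch(activity)
    instruction_decode(activity)

for component, counts in sorted(activity.items()):
    print(component, dict(counts))
```

Keeping only three access types (read, write, logic) keeps the bookkeeping cheap. Distinguishing more types would add keys to each counter and more work per simulated event.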
Power and Architecture Activity
• For example, at the nth clock cycle, the collected counters
are:
– Data cache:
• read = 20, write = 12
• per-read energy = 0.5 nJ; per-write energy = 0.6 nJ
• Read energy = read × per-read energy = 10 nJ
• Write energy = write × per-write energy = 7.2 nJ
• Total activity energy = read energy + write energy = 17.2 nJ
• If n = 50 clock cycles and clock frequency = 2 GHz,
total activity power = energy × clock_freq / n = 688 mW
*Note: n / clock_freq is the duration of n clock periods in seconds;
power is energy averaged over that time.
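The data cache example can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers from the slide; the function and variable names are hypothetical.

```python
def activity_energy_nj(counts, per_access_nj):
    """Sum count * per-access energy over all access types (result in nJ)."""
    return sum(counts[kind] * per_access_nj[kind] for kind in counts)


def average_power_w(energy_nj, cycles, clock_hz):
    """Average power in watts: energy divided by the elapsed time of `cycles` periods."""
    elapsed_s = cycles / clock_hz
    return energy_nj * 1e-9 / elapsed_s


# Values from the data cache example.
counts = {"read": 20, "write": 12}
per_access_nj = {"read": 0.5, "write": 0.6}

energy_nj = activity_energy_nj(counts, per_access_nj)
power_w = average_power_w(energy_nj, cycles=50, clock_hz=2e9)

print(f"energy = {energy_nj:.1f} nJ, power = {power_w * 1e3:.0f} mW")
```

With these inputs the elapsed time is 50 / 2 GHz = 25 ns, so 17.2 nJ over 25 ns gives 0.688 W, matching the 688 mW on the slide.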
Things to consider (1)
1. How do we calculate per-read/write energies?
• Per-access energies can be estimated from circuit-level
designs and analyses.
• There are various open-source tools for this.
[Figure: estimation flow. Inputs: Architecture Specification and Technology Parameters → Circuit-level Estimation Tool → Estimation Results: Area, Energy, Timing, etc.]
Things to consider (2)
2. Is per-access energy always the same?
• No. Per-access energy in fact depends on:
• how many bits are switching
• how they are switching (0→1 or 1→0)
• It is reasonable to assume a constant per-access energy over a
long observation window (e.g., n = 1M clock cycles), because the
number of switching bits averages out (e.g., 50% of bits
switching).
• Most architecture simulators do not capture bit-level details
because of the simulation complexity.
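To make the bit-level dependence concrete, the sketch below counts 0→1 and 1→0 transitions when a stored value is overwritten and weights them with per-bit energies. The per-bit energy values and all names are hypothetical placeholders, not figures from any tool.

```python
def count_transitions(old, new, width):
    """Return (rising, falling): bits going 0->1 and 1->0 within `width` bits."""
    mask = (1 << width) - 1
    old &= mask
    new &= mask
    rising = bin(~old & new & mask).count("1")   # was 0, becomes 1
    falling = bin(old & ~new & mask).count("1")  # was 1, becomes 0
    return rising, falling


def write_energy_pj(old, new, width, rise_pj, fall_pj):
    """Bit-level write energy; rise_pj and fall_pj are assumed per-bit costs."""
    rising, falling = count_transitions(old, new, width)
    return rising * rise_pj + falling * fall_pj


# 0b1100 -> 0b1010: one bit rises, one bit falls, two bits stay unchanged.
print(count_transitions(0b1100, 0b1010, width=4))
print(write_energy_pj(0b1100, 0b1010, width=4, rise_pj=2.0, fall_pj=1.0))
```

A simulator that tracks only access counts effectively replaces this per-write calculation with one averaged per-access number.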
Things to consider (3)
3. If a register file had no read/write accesses
but held data, what is its energy dissipation?
• Energy (or power) consists mainly of dynamic and static
dissipation.
• Dynamic (or switching) energy is the energy dissipated by
switching activity.
• Static (or leakage) energy is the energy dissipated to keep the
electronic system turned on.
• In this case, the register file has no dynamic energy
dissipation but still consumes static energy.
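The idle register file case can be sketched by splitting energy into a dynamic part (access count × per-access energy) and a static part (leakage power × elapsed time). The leakage power, per-access energy, and all names below are hypothetical values chosen only for illustration.

```python
def energy_breakdown_nj(accesses, per_access_nj, leakage_mw, cycles, clock_hz):
    """Return (dynamic_nj, static_nj) for one component over `cycles` clock periods."""
    dynamic_nj = accesses * per_access_nj
    elapsed_s = cycles / clock_hz
    # mW -> W, then J -> nJ
    static_nj = leakage_mw * 1e-3 * elapsed_s * 1e9
    return dynamic_nj, static_nj


# Idle register file: no accesses over 1M cycles at 2 GHz (0.5 ms),
# with an assumed leakage power of 2 mW.
dynamic_nj, static_nj = energy_breakdown_nj(
    accesses=0, per_access_nj=0.1, leakage_mw=2.0, cycles=1_000_000, clock_hz=2e9
)
print(f"dynamic = {dynamic_nj:.1f} nJ, static = {static_nj:.1f} nJ")
```

With zero accesses the dynamic term vanishes, while the static term grows with the time the component stays powered.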