Processor Microarchitecture
[Figure: processor block diagram. Labels: Fetch, Decode, Execute/Writeback, Memory, Network; Instruction Cache, Instruction TLB, Branch Prediction, Fetch Queue, Instruction Decoder, Instruction Queue, Register Files, ALU, MUL, FPU, LD, ST, Data TLB, L1 Data Cache, L2 Data Cache, NoC Router, On-Chip Network.]

Energy/Power Calculation
• How do we calculate energy or power dissipation for a given microarchitecture?
• Energy/power varies with:
  – ISA: ARM vs. Intel x86
  – Microarchitecture: in-order vs. out-of-order
  – Application: memory-bound vs. compute-bound
  – Technology: 90nm vs. 22nm
  – Operating conditions: frequency, temperature

Architecture Activity (1)
Activity 1: Instruction Fetch
  icache.read++; fbuffer.write++;
• Activity counts at each component differ between applications.
• Collect activity counts for each architecture component (through simulation or measurement).
• The list of components differs between microarchitectures.

Architecture Activity (2)
Activity 2: Instruction Decode
  fbuffer.read++; idecoder.logic++;
• Read/write accesses to caches, buffers, etc.
• Logical accesses to logic blocks such as the decoder, ALUs, etc.
• Tradeoff: distinguishing more access types improves accuracy but increases simulation complexity and lowers simulation speed.
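The counter increments shown for Activity 1 and Activity 2 can be sketched as a small bookkeeping routine. This is only an illustration: the `STAGE_ACTIVITY` table, the `count_activity` function, and the stage-name trace are hypothetical stand-ins for whatever a real simulator or measurement setup records; only the four counter names come from the slides.

```python
from collections import Counter

# Hypothetical mapping from a pipeline stage to the counters it bumps.
# The counter names mirror the slides; the table itself is an assumption.
STAGE_ACTIVITY = {
    "fetch": ["icache.read", "fbuffer.write"],    # Activity 1
    "decode": ["fbuffer.read", "idecoder.logic"],  # Activity 2
}

def count_activity(trace):
    """Accumulate per-component activity counts over a trace of stage names."""
    counters = Counter()
    for stage in trace:
        # Each event increments every counter listed for that stage by one.
        counters.update(STAGE_ACTIVITY[stage])
    return counters

# Three instructions, each fetched and then decoded (made-up trace).
counters = count_activity(["fetch", "decode"] * 3)
for name, value in sorted(counters.items()):
    print(name, value)
```

Adding more access types means adding more entries to the table, which is where the accuracy vs. simulation speed tradeoff shows up.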
Power and Architecture Activity
• For example, at the nth clock cycle, the collected counters are:
  – Data cache:
    • read = 20, write = 12
    • per-read energy = 0.5nJ; per-write energy = 0.6nJ
    • Read energy = read * per-read energy = 10nJ
    • Write energy = write * per-write energy = 7.2nJ
    • Total activity energy = read energy + write energy = 17.2nJ
    • If n = 50 (the 50th clock cycle) and clock frequency = 2GHz, total activity power = energy * clock_freq / n = 688mW
*Note: n/clock_freq is the duration of n clock periods in seconds; power is the time average of energy.

Things to consider (1)
1. How do we calculate per-read/write energies?
• Per-access energies can be estimated from circuit-level designs and analyses.
• There are various open-source tools for this.
[Figure: Architecture Specification and Technology Parameters feed a Circuit-level Estimation Tool, which produces Estimation Results: Area, Energy, Timing, etc.]

Things to consider (2)
2. Is per-access energy always the same?
• Per-access energy in fact depends on:
  – how many bits are switching
  – how they are switching (0→1 or 1→0)
• It is reasonable to assume a constant per-access energy over a long observation window (e.g., n = 1M clock cycles); the number of switching bits averages out (e.g., 50% of bits are switching).
• Most architecture simulators do not capture bit-level details due to simulation complexity.

Things to consider (3)
3. If a register file had no read/write accesses but held data, what is its energy dissipation?
• Energy (or power) dissipation consists largely of dynamic and static components.
• Dynamic (or switching) energy is the energy dissipated by switching activity.
• Static (or leakage) energy is the energy dissipated to keep the electronic system turned on.
• In this case, the register file has no dynamic energy dissipation but still consumes static energy.
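The data-cache example from "Power and Architecture Activity" can be checked with a few lines of arithmetic, and the same sketch covers the idle register file from "Things to consider (3)". The function names, the `static_power_w` parameter, the register-file per-access energies, and the leakage value are all assumptions made for illustration; only the data-cache counts, per-access energies, n, and clock frequency come from the slides.

```python
def activity_energy(counts, per_access_energy):
    """Dynamic energy in joules: sum over access types of count * per-access energy."""
    return sum(counts[kind] * per_access_energy[kind] for kind in counts)

def average_power(energy_j, n_cycles, clock_hz, static_power_w=0.0):
    """Average power in watts over n_cycles, plus an optional static term."""
    elapsed_s = n_cycles / clock_hz  # n clock periods in seconds
    return energy_j / elapsed_s + static_power_w

# Values from the slide: counters collected at n = 50 cycles, 2 GHz clock.
dcache_counts = {"read": 20, "write": 12}
dcache_energy = {"read": 0.5e-9, "write": 0.6e-9}  # joules per access

energy = activity_energy(dcache_counts, dcache_energy)
power = average_power(energy, n_cycles=50, clock_hz=2e9)
print(f"{energy * 1e9:.1f} nJ, {power * 1e3:.1f} mW")  # about 17.2 nJ and 688 mW

# Idle register file: no accesses, so dynamic energy is zero and only
# leakage remains. Both values below are made-up placeholders, not slide data.
ASSUMED_LEAKAGE_W = 0.010
regfile_energy = {"read": 0.1e-9, "write": 0.1e-9}
idle_energy = activity_energy({"read": 0, "write": 0}, regfile_energy)
idle_power = average_power(idle_energy, 50, 2e9, static_power_w=ASSUMED_LEAKAGE_W)
print(f"idle: {idle_energy} J dynamic, {idle_power * 1e3:.1f} mW total")
```

In practice the per-access energies and the leakage term would come from a circuit-level estimation tool rather than being typed in by hand.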