EEL 5708 High Performance Computer Architecture Lecture 3 Review: Instruction Sets Sept 1, 2004 Lotzi Bölöni Fall 2004 Fall 2004 EEL5708/Bölöni Lec 3.1 Acknowledgements • All the lecture slides were adopted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright 1998-2002, University of California Berkeley Fall 2004 EEL5708/Bölöni Lec 3.2 Review: Instruction sets Fall 2004 EEL5708/Bölöni Lec 3.3 The Instruction Set: a Critical Interface software instruction set hardware Fall 2004 EEL5708/Bölöni Lec 3.4 Levels of Representation temp = v[k]; High Level Language Program Compiler Assembly Language Program Assembler Machine Language Program v[k] = v[k+1]; v[k+1] = temp; lw $15,0($2) lw $16,4($2) sw $16, 0($2) sw $15, 4($2) 0000 1010 1100 0101 1001 1111 0110 1000 1100 0101 1010 0000 0110 1000 1111 1001 1010 0000 0101 1100 1111 1001 1000 0110 0101 1100 0000 1010 1000 0110 1001 1111 Machine Interpretation Control Signal Specification ALUOP[0:3] <= InstReg[9:11] & MASK ° ° Fall 2004 EEL5708/Bölöni Lec 3.5 Instruction Set Architecture ... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls the logic design, and the physical implementation. – Amdahl, Blaaw, and Brooks, 1964 SOFTWARE -- Organization of Programmable Storage -- Data Types & Data Structures: Encodings & Representations -- Instruction Formats -- Instruction (or Operation Code) Set -- Modes of Addressing and Accessing Data Items and Instructions -- Exceptional Conditions Fall 2004 EEL5708/Bölöni Lec 3.6 Review: MIPS R3000 (core) r0 r1 ° ° ° r31 PC lo hi 0 Programmable storage Data types ? 2^32 x bytes Format ? 31 x 32-bit GPRs (R0=0) Addressing Modes? 32 x 32-bit FP regs (paired DP) HI, LO, PC Arithmetic logical Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI SLL, SRL, SRA, SLLV, SRLV, SRAV Memory Access LB, LBU, LH, LHU, LW, LWL,LWR SB, SH, SW, SWL, SWR Control 32-bit instructions on word boundary J, JAL, JR, JALR Fall 2004 BEq, BNE, BLEZ,BGTZ,BLTZ,BGEZ,BLTZAL,BGEZAL EEL5708/Bölöni Lec 3.7 Review: Basic ISA Classes Accumulator: 1 address 1+x address Stack: 0 address General Purpose 2 address 3 address Load/Store: 3 address Fall 2004 add A addx A add Register: add A B add A B C add Ra Rb Rc load Ra Rb store Ra Rb acc acc + mem[A] acc acc + mem[A + x] tos tos + next EA(A) EA(A) + EA(B) EA(A) EA(B) + EA(C) Ra Rb + Rc Ra mem[Rb] mem[Rb] Ra EEL5708/Bölöni Lec 3.8 Instruction Formats Variable: … Fixed: Hybrid: •Addressing modes –each operand requires address specifier => variable format •code size => variable length instructions •performance => fixed length instructions –simple decoding, predictable operations •With load/store instruction arch, only one memory address and few addressing modes •=> simple format, address mode given by opcode Fall 2004 EEL5708/Bölöni Lec 3.9 MIPS Addressing Modes & Formats • Simple addressing modes • All instructions 32 bits wide Register (direct) op rs rt rd register Immediate Base+index op rs rt immed op rs rt immed register PC-relative op rs PC rt Memory + immed Memory + • Register Indirect? Fall 2004 EEL5708/Bölöni Lec 3.10 Execution Cycle Instruction Obtain instruction from program storage Fetch Instruction Determine required actions and instruction size Decode Operand Locate and obtain operand data Fetch Execute Result Compute result value or status Deposit results in storage for later use Store Next Instruction Fall 2004 Determine successor instruction EEL5708/Bölöni Lec 3.11 Review: Measuring performance Fall 2004 EEL5708/Bölöni Lec 3.12 Which is faster? Plane DC to Paris Speed Passengers Throughput (pmph) Boeing 747 6.5 hours 610 mph 470 286,700 BAD/Sud Concorde 3 hours 1350 mph 132 178,200 • Time to run the task (ExTime) – Execution time, response time, latency • Tasks per day, hour, week, sec, ns … (Performance) – Throughput, bandwidth Fall 2004 EEL5708/Bölöni Lec 3.13 Definitions • Performance is in units of things per sec – bigger is better • If we are primarily concerned with response time – performance(x) = 1 execution_time(x) " X is n times faster than Y" means Execution_time(Y) Performance(X) n = = Performance(Y) Fall 2004 Execution_time(X) EEL5708/Bölöni Lec 3.14 CPI Computer Performance inst count CPU time = Seconds = Instructions x Program CPI Program Compiler X (X) Inst. Set. X X Technology x Seconds Instruction Inst Count X Organization Fall 2004 Program Cycles X Cycle time Cycle Clock Rate X X EEL5708/Bölöni Lec 3.15 Cycles Per Instruction (Throughput) “Average Cycles per Instruction” CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count n CPU time Cycle Time CPI j I j j 1 n CPI CPI j Fj j 1 where Fj Ij Instruction Count “Instruction Frequency” Fall 2004 EEL5708/Bölöni Lec 3.16 Example: Calculating CPI bottom up Base Machine Op ALU Load Store Branch (Reg / Freq 50% 20% 10% 20% Reg) Cycles 1 2 2 2 Typical Mix of instruction types in program Fall 2004 CPI(i) .5 .4 .2 .4 1.5 (% Time) (33%) (27%) (13%) (27%) EEL5708/Bölöni Lec 3.17 Example: Branch Stall Impact • Assume CPI = 1.0 ignoring branches (ideal) • Assume solution was stalling for 3 cycles • If 30% branch, Stall 3 cycles on 30% • Op • Other • Branch Freq 70% 30% Cycles CPI(i) (% Time) 1 .7 (37%) 4 1.2 (63%) • => new CPI = 1.9 • New machine is 1/1.9 = 0.52 times faster (i.e. slow!) Fall 2004 EEL5708/Bölöni Lec 3.18