CS:APP Chapter 4
Computer Architecture
Sequential Implementation

Randal E. Bryant
Carnegie Mellon University
http://csapp.cs.cmu.edu
CS:APP2e

Y86 Instruction Set #1
  Instruction          Byte 0    Byte 1    Bytes 2-5
  halt                 0 0
  nop                  1 0
  cmovXX rA, rB        2 fn      rA rB
  irmovl V, rB         3 0       8 rB      V
  rmmovl rA, D(rB)     4 0       rA rB     D
  mrmovl D(rB), rA     5 0       rA rB     D
  OPl rA, rB           6 fn      rA rB
  jXX Dest             7 fn      Dest (bytes 1-4)
  call Dest            8 0       Dest (bytes 1-4)
  ret                  9 0
  pushl rA             A 0       rA 8
  popl rA              B 0       rA 8

  Function codes for cmovXX
    rrmovl   2 0
    cmovle   2 1
    cmovl    2 2
    cmove    2 3
    cmovne   2 4
    cmovge   2 5
    cmovg    2 6

Y86 Instruction Set #2
  Instruction encodings as in the table above
  Function codes for OPl
    addl   6 0
    subl   6 1
    andl   6 2
    xorl   6 3

Y86 Instruction Set #3
  Instruction encodings as in the table above (rrmovl rA, rB is 2 fn rA rB)
  Function codes for jXX
    jmp   7 0
    jle   7 1
    jl    7 2
    je    7 3
    jne   7 4
    jge   7 5
    jg    7 6

Building Blocks
  Combinational Logic
    Compute Boolean functions of inputs
    Continuously respond to input changes
    Operate on data and implement control
  Storage Elements
    Store bits
    Addressable memories
    Non-addressable registers
    Loaded only as clock rises
  [Figure: ALU with inputs A, B and function select fun; comparator (=); MUX with inputs 0, 1; register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and Clock]

Hardware Control Language
  Very simple hardware description language
  Can only express limited aspects of hardware operation
    Parts we want to explore and modify
  Data Types
    bool: Boolean
      a, b, c, …
    int: words
      A, B, C, …
      Does not specify word size (bytes, 32-bit words, …)
  Statements
    bool a = bool-expr ;
    int A = int-expr ;

HCL Operations
  Classify by type of value returned
  Boolean Expressions
    Logic Operations
      a && b, a || b, !a
    Word Comparisons
      A == B, A != B, A < B, A <= B, A >= B, A > B
    Set Membership
      A in { B, C, D }
      Same as A == B || A == C || A == D
  Word Expressions
    Case expressions
      [ a : A; b : B; c : C ]
      Evaluate test expressions a, b, c, … in sequence
      Return word expression A, B, C, … for first successful test

SEQ Hardware Structure
  State
    Program counter register (PC)
    Condition code register (CC)
    Register File
    Memories
      Access same memory space
      Data: for reading/writing program data
      Instruction: for reading instructions
  Instruction Flow
    Read instruction at address specified by PC
    Process through stages
    Update program counter
  [Figure: SEQ datapath, bottom to top. Fetch: PC, instruction memory, PC increment, producing icode, ifun, rA, rB, valC, valP. Decode: register file with ports A, B, M, E, signals srcA, srcB, dstE, dstM, producing valA, valB. Execute: ALU and CC, inputs aluA, aluB, producing valE, Cnd. Memory: data memory, inputs Addr, Data, producing valM. Write back: valE, valM. PC: newPC]

SEQ Stages
  Fetch
    Read instruction from instruction memory
  Decode
    Read program registers
  Execute
    Compute value or address
  Memory
    Read or write data
  Write Back
    Write program registers
  PC
    Update program counter
  [Figure: SEQ datapath, as on the previous slide]

Instruction Decoding
  [Figure: example encoding 5 0 | rA rB | D mapped to icode ifun | rA rB (optional) | valC (optional)]
  Instruction Format
    Instruction byte          icode:ifun
    Optional register byte    rA:rB
    Optional constant word    valC

Executing Arith./Logical Operation
  OPl rA, rB    encoding: 6 fn | rA rB
  Fetch
    Read 2 bytes
  Decode
    Read operand registers
  Execute
    Perform operation
    Set condition codes
  Memory
    Do nothing
  Write back
    Update register
  PC Update
    Increment PC by 2

Stage Computation: Arith/Log. Ops
  OPl rA, rB
  Fetch        icode:ifun ← M1[PC]     Read instruction byte
               rA:rB ← M1[PC+1]        Read register byte
               valP ← PC+2             Compute next PC
  Decode       valA ← R[rA]            Read operand A
               valB ← R[rB]            Read operand B
  Execute      valE ← valB OP valA     Perform ALU operation
               Set CC                  Set condition code register
  Memory
  Write back   R[rB] ← valE            Write back result
  PC update    PC ← valP               Update PC

  Formulate instruction execution as sequence of simple steps
  Use same general form for all instructions

Executing rmmovl
  rmmovl rA, D(rB)    encoding: 4 0 | rA rB | D
  Fetch
    Read 6 bytes
  Decode
    Read operand registers
  Execute
    Compute effective address
  Memory
    Write to memory
  Write back
    Do nothing
  PC Update
    Increment PC by 6

Stage Computation: rmmovl
  rmmovl rA, D(rB)
  Fetch        icode:ifun ← M1[PC]     Read instruction byte
               rA:rB ← M1[PC+1]        Read register byte
               valC ← M4[PC+2]         Read displacement D
               valP ← PC+6             Compute next PC
  Decode       valA ← R[rA]            Read operand A
               valB ← R[rB]            Read operand B
  Execute      valE ← valB + valC      Compute effective address
  Memory       M4[valE] ← valA         Write value to memory
  Write back
  PC update    PC ← valP               Update PC

  Use ALU for address computation

Executing popl
  popl rA    encoding: B 0 | rA 8
  Fetch
    Read 2 bytes
  Decode
    Read stack pointer
  Execute
    Increment stack pointer by 4
  Memory
    Read from old stack pointer
  Write back
    Update stack pointer
    Write result to register
  PC Update
    Increment PC by 2

Stage Computation: popl
  popl rA
  Fetch        icode:ifun ← M1[PC]     Read instruction byte
               rA:rB ← M1[PC+1]        Read register byte
               valP ← PC+2             Compute next PC
  Decode       valA ← R[%esp]          Read stack pointer
               valB ← R[%esp]          Read stack pointer
  Execute      valE ← valB + 4         Increment stack pointer
  Memory       valM ← M4[valA]         Read from stack
  Write back   R[%esp] ← valE          Update stack pointer
               R[rA] ← valM            Write back result
  PC update    PC ← valP               Update PC

  Use ALU to increment stack pointer
  Must update two registers
    Popped value
    New stack pointer

Executing Jumps
  jXX Dest    encoding: 7 fn | Dest
    fall thru:  XX XX    Not taken
    target:     XX XX    Taken
  Fetch
    Read 5 bytes
    Increment PC by 5
  Decode
    Do nothing
  Execute
    Determine whether to take branch based on jump condition and condition codes
  Memory
    Do nothing
  Write back
    Do nothing
  PC Update
    Set PC to Dest if branch taken, or to incremented PC if not taken

Stage Computation: Jumps
  jXX Dest
  Fetch        icode:ifun ← M1[PC]     Read instruction byte
               valC ← M4[PC+1]         Read destination address
               valP ← PC+5             Fall through address
  Decode
  Execute      Cnd ← Cond(CC,ifun)     Take branch?
  Memory
  Write back
  PC update    PC ← Cnd ? valC : valP  Update PC

  Compute both addresses
  Choose based on setting of condition codes and branch condition

Executing call
  call Dest    encoding: 8 0 | Dest
    return:  XX XX
    target:  XX XX
  Fetch
    Read 5 bytes
    Increment PC by 5
  Decode
    Read stack pointer
  Execute
    Decrement stack pointer by 4
  Memory
    Write incremented PC to new value of stack pointer
  Write back
    Update stack pointer
  PC Update
    Set PC to Dest

Stage Computation: call
  call Dest
  Fetch        icode:ifun ← M1[PC]     Read instruction byte
               valC ← M4[PC+1]         Read destination address
               valP ← PC+5             Compute return point
  Decode       valB ← R[%esp]          Read stack pointer
  Execute      valE ← valB + (-4)      Decrement stack pointer
  Memory       M4[valE] ← valP         Write return value on stack
  Write back   R[%esp] ← valE          Update stack pointer
  PC update    PC ← valC               Set PC to destination

  Use ALU to decrement stack pointer
  Store incremented PC

Executing ret
  ret    encoding: 9 0
    return:  XX XX
  Fetch
    Read 1 byte
  Decode
    Read stack pointer
  Execute
    Increment stack pointer by 4
  Memory
    Read return address from old stack pointer
  Write back
    Update stack pointer
  PC Update
    Set PC to return address

Stage Computation: ret
  ret
  Fetch        icode:ifun ← M1[PC]     Read instruction byte
  Decode       valA ← R[%esp]          Read operand stack pointer
               valB ← R[%esp]          Read operand stack pointer
  Execute      valE ← valB + 4         Increment stack pointer
  Memory       valM ← M4[valA]         Read return address
  Write back   R[%esp] ← valE          Update stack pointer
  PC update    PC ← valM               Set PC to return address

  Use ALU to increment stack pointer
  Read return address from memory

Computation Steps
  Steps in [brackets] are not used by the instruction shown
  Stage        Signals       OPl rA, rB               Description
  Fetch        icode,ifun    icode:ifun ← M1[PC]      Read instruction byte
               rA,rB         rA:rB ← M1[PC+1]         Read register byte
               valC                                   [Read constant word]
               valP          valP ← PC+2              Compute next PC
  Decode       valA, srcA    valA ← R[rA]             Read operand A
               valB, srcB    valB ← R[rB]             Read operand B
  Execute      valE          valE ← valB OP valA      Perform ALU operation
               Cond code     Set CC                   Set condition code register
  Memory       valM                                   [Memory read/write]
  Write back   dstE          R[rB] ← valE             Write back ALU result
               dstM                                   [Write back memory result]
  PC update    PC            PC ← valP                Update PC

  All instructions follow same general pattern
  Differ in what gets computed on each step

Computation Steps
  Stage        Signals       call Dest                Description
  Fetch        icode,ifun    icode:ifun ← M1[PC]      Read instruction byte
               rA,rB                                  [Read register byte]
               valC          valC ← M4[PC+1]          Read constant word
               valP          valP ← PC+5              Compute next PC
  Decode       valA, srcA                             [Read operand A]
               valB, srcB    valB ← R[%esp]           Read operand B
  Execute      valE          valE ← valB + (-4)       Perform ALU operation
               Cond code                              [Set condition code reg.]
  Memory       valM          M4[valE] ← valP          Memory read/write
  Write back   dstE          R[%esp] ← valE           Write back ALU result
               dstM                                   [Write back memory result]
  PC update    PC            PC ← valC                Update PC

  All instructions follow same general pattern
  Differ in what gets computed on each step

Computed Values
  Fetch
    icode    Instruction code
    ifun     Instruction function
    rA       Instr. Register A
    rB       Instr. Register B
    valC     Instruction constant
    valP     Incremented PC
  Decode
    srcA     Register ID A
    srcB     Register ID B
    dstE     Destination Register E
    dstM     Destination Register M
    valA     Register value A
    valB     Register value B
  Execute
    valE     ALU result
    Cnd      Branch/move flag
  Memory
    valM     Value from memory

SEQ Hardware Key
  Blue boxes: predesigned hardware blocks
    E.g., memories, ALU
  Gray boxes: control logic
    Describe in HCL
  White ovals: labels for signals
  Thick lines: 32-bit word values
  Thin lines: 4-8 bit values
  Dotted lines: 1-bit values

Fetch Logic
  [Figure: PC feeds instruction memory, which outputs Byte 0, Bytes 1-5, and imem_error. Split produces icode and ifun from Byte 0; Align produces rA, rB, valC from Bytes 1-5; PC increment produces valP. Control signals: Instr valid, Need regids, Need valC]
  Predefined Blocks
    PC: Register containing PC
    Instruction memory: Read 6 bytes (PC to PC+5)
      Signal invalid address
    Split: Divide instruction byte into icode and ifun
    Align: Get fields for rA, rB, and valC

Fetch Logic
  [Figure: fetch logic, as on the previous slide]
  Control Logic
    Instr. Valid: Is this instruction valid?
    icode, ifun: Generate no-op if invalid address
    Need regids: Does this instruction have a register byte?
    Need valC: Does this instruction have a constant word?
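The fetch logic described above (Split, Align, PC increment, plus the Need regids and Need valC control signals) can be sketched in software. The following is only an illustrative Python model, not the deck's HCL: the function name `fetch`, the sets `NEED_REGIDS` and `NEED_VALC`, and the formula for `valP` are assumptions read off the instruction encoding table and the per-instruction "Compute next PC" steps, and the constant word is assumed to be stored little-endian.

```python
# Sketch of the SEQ fetch stage. Instruction codes follow the encoding table;
# NEED_VALC and the valP formula are inferred from that table (assumptions).
NEED_REGIDS = {0x2, 0x3, 0x4, 0x5, 0x6, 0xA, 0xB}  # cmovXX, irmovl, rmmovl, mrmovl, OPl, pushl, popl
NEED_VALC = {0x3, 0x4, 0x5, 0x7, 0x8}              # irmovl, rmmovl, mrmovl, jXX, call
INOP, FNONE, RNONE = 0x1, 0x0, 0xF                 # no-op code, default ifun, "no register"


def fetch(mem: bytes, pc: int) -> dict:
    """Return the fetch-stage signals for the instruction at address pc."""
    imem_error = not (0 <= pc < len(mem))
    # Split: high nibble is icode, low nibble is ifun; force a no-op on a bad address.
    icode = INOP if imem_error else mem[pc] >> 4
    ifun = FNONE if imem_error else mem[pc] & 0xF
    need_regids = icode in NEED_REGIDS
    need_valc = icode in NEED_VALC
    # Align: pick rA:rB and valC out of the bytes that follow byte 0.
    # (This sketch assumes the instruction is not truncated at the end of mem.)
    rA = rB = RNONE
    pos = pc + 1
    if need_regids:
        rA, rB = mem[pos] >> 4, mem[pos] & 0xF
        pos += 1
    valC = int.from_bytes(mem[pos:pos + 4], "little") if need_valc else None
    # PC increment: 1 instruction byte + optional register byte + optional 4-byte word.
    valP = pc + 1 + int(need_regids) + 4 * int(need_valc)
    return {"icode": icode, "ifun": ifun, "rA": rA, "rB": rB,
            "valC": valC, "valP": valP, "imem_error": imem_error}


# Two instructions placed at address 0; the jump target 0x13 is a made-up value.
code = bytes([0x60, 0x23,                      # addl %edx,%ebx
              0x73, 0x13, 0x00, 0x00, 0x00])   # je 0x13
op = fetch(code, 0)
jump = fetch(code, op["valP"])
```

Here `op["valP"]` comes out as PC+2 and `jump["valP"]` as PC+5, matching the "Compute next PC" entries for OPl and jXX in the stage computations.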
Fetch Control Logic in HCL
  [Figure: instruction memory driven by PC; Split produces icode and ifun from Byte 0; imem_error]

  # Determine instruction code
  int icode = [
      imem_error: INOP;
      1: imem_icode;
  ];

  # Determine instruction function
  int ifun = [
      imem_error: FNONE;
      1: imem_ifun;
  ];

Fetch Control Logic in HCL
  bool need_regids =
      icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
                 IIRMOVL, IRMMOVL, IMRMOVL };

  bool instr_valid = icode in
      { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
        IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };

Decode Logic
  Register File
    Read ports A, B
    Write ports E, M
    Addresses are register IDs or 15 (0xF) (no access)
  Control Logic
    srcA, srcB: read port addresses
    dstE, dstM: write port addresses
  Signals
    Cnd: Indicate whether or not to perform conditional move
      Computed in Execute stage
  [Figure: register file with read ports A, B (srcA, srcB, valA, valB) and write ports M, E (dstM, dstE, valM, valE); control blocks dstE, dstM, srcA, srcB take icode, rA, rB, and Cnd as inputs]

A Source
  OPl rA, rB          Decode    valA ← R[rA]      Read operand A
  cmovXX rA, rB       Decode    valA ← R[rA]      Read operand A
  rmmovl rA, D(rB)    Decode    valA ← R[rA]      Read operand A
  popl rA             Decode    valA ← R[%esp]    Read stack pointer
  jXX Dest            Decode                      No operand
  call Dest           Decode                      No operand
  ret                 Decode    valA ← R[%esp]    Read stack pointer

  int srcA = [
      icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;
      icode in { IPOPL, IRET } : RESP;
      1 : RNONE; # Don't need register
  ];

E Destination
  OPl rA, rB          Write-back    R[rB] ← valE      Write back result
  cmovXX rA, rB       Write-back    R[rB] ← valE      Conditionally write back result
  rmmovl rA, D(rB)    Write-back                      None
  popl rA             Write-back    R[%esp] ← valE    Update stack pointer
  jXX Dest            Write-back                      None
  call Dest           Write-back    R[%esp] ← valE    Update stack pointer
  ret                 Write-back    R[%esp] ← valE    Update stack pointer

  int dstE = [
      icode in { IRRMOVL } && Cnd : rB;
      icode in { IIRMOVL, IOPL } : rB;
      icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
      1 : RNONE; # Don't write any register
  ];

Execute Logic
  Units
    ALU
      Implements 4 required functions
      Generates condition code values
    CC
      Register with 3 condition code bits
    cond
      Computes conditional jump/move flag
  Control Logic
    Set CC: Should condition code register be loaded?
    ALU A: Input A to ALU
    ALU B: Input B to ALU
    ALU fun: What function should ALU compute?
  [Figure: ALU with inputs ALU A, ALU B and ALU fun. producing valE; CC register loaded under Set CC; cond block producing Cnd; control inputs icode, ifun, valC, valA, valB]

ALU A Input
  OPl rA, rB          Execute    valE ← valB OP valA    Perform ALU operation
  cmovXX rA, rB       Execute    valE ← 0 + valA        Pass valA through ALU
  rmmovl rA, D(rB)    Execute    valE ← valB + valC     Compute effective address
  popl rA             Execute    valE ← valB + 4        Increment stack pointer
  jXX Dest            Execute                           No operation
  call Dest           Execute    valE ← valB + (-4)     Decrement stack pointer
  ret                 Execute    valE ← valB + 4        Increment stack pointer

  int aluA = [
      icode in { IRRMOVL, IOPL } : valA;
      icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
      icode in { ICALL, IPUSHL } : -4;
      icode in { IRET, IPOPL } : 4;
      # Other instructions don't need ALU
  ];

ALU Operation
  OPl rA, rB          Execute    valE ← valB OP valA    Perform ALU operation
  cmovXX rA, rB       Execute    valE ← 0 + valA        Pass valA through ALU
  rmmovl rA, D(rB)    Execute    valE ← valB + valC     Compute effective address
  popl rA             Execute    valE ← valB + 4        Increment stack pointer
  jXX Dest            Execute                           No operation
  call Dest           Execute    valE ← valB + (-4)     Decrement stack pointer
  ret                 Execute    valE ← valB + 4        Increment stack pointer

  int alufun = [
      icode == IOPL : ifun;
      1 : ALUADD;
  ];

Memory Logic
  Memory
    Reads or writes memory word
  Control Logic
    stat: What is instruction status?
    Mem. read: should word be read?
    Mem. write: should word be written?
    Mem. addr.: Select address
    Mem. data.: Select data
  [Figure: data memory with controls read, write, inputs Mem. addr and Mem. data (data in), outputs valM (data out) and dmem_error; Stat block takes instr_valid, imem_error, dmem_error; control inputs icode, valE, valA, valP]

Instruction Status
  Control Logic
    stat: What is instruction status?
  [Figure: memory logic, as on the previous slide]

  ## Determine instruction status
  int Stat = [
      imem_error || dmem_error : SADR;
      !instr_valid: SINS;
      icode == IHALT : SHLT;
      1 : SAOK;
  ];

Memory Address
  OPl rA, rB          Memory                          No operation
  rmmovl rA, D(rB)    Memory    M4[valE] ← valA       Write value to memory
  popl rA             Memory    valM ← M4[valA]       Read from stack
  jXX Dest            Memory                          No operation
  call Dest           Memory    M4[valE] ← valP       Write return value on stack
  ret                 Memory    valM ← M4[valA]       Read return address

  int mem_addr = [
      icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;
      icode in { IPOPL, IRET } : valA;
      # Other instructions don't need address
  ];

Memory Read
  OPl rA, rB          Memory                          No operation
  rmmovl rA, D(rB)    Memory    M4[valE] ← valA       Write value to memory
  popl rA             Memory    valM ← M4[valA]       Read from stack
  jXX Dest            Memory                          No operation
  call Dest           Memory    M4[valE] ← valP       Write return value on stack
  ret                 Memory    valM ← M4[valA]       Read return address

  bool mem_read = icode in { IMRMOVL, IPOPL, IRET };

PC Update Logic
  New PC
    Select next value of PC
  [Figure: New PC block with inputs icode, Cnd, valC, valM, valP, driving the PC register]

PC Update
  OPl rA, rB          PC update    PC ← valP                 Update PC
  rmmovl rA, D(rB)    PC update    PC ← valP                 Update PC
  popl rA             PC update    PC ← valP                 Update PC
  jXX Dest            PC update    PC ← Cnd ? valC : valP    Update PC
  call Dest           PC update    PC ← valC                 Set PC to destination
  ret                 PC update    PC ← valM                 Set PC to return address

  int new_pc = [
      icode == ICALL : valC;
      icode == IJXX && Cnd : valC;
      icode == IRET : valM;
      1 : valP;
  ];

SEQ Operation
  State
    PC register
    Cond. Code register
    Data memory
    Register file
    All updated as clock rises
  Combinational Logic
    ALU
    Control logic
    Memory reads
      Instruction memory
      Register file
      Data memory
  [Figure: combinational logic surrounded by state elements: CC, data memory (read, write), register file (read ports, write ports), PC = 0x00c]

SEQ Operation #2
  Cycle 1:  0x000: irmovl $0x100,%ebx   # %ebx <-- 0x100
  Cycle 2:  0x006: irmovl $0x200,%edx   # %edx <-- 0x200
  Cycle 3:  0x00c: addl %edx,%ebx       # %ebx <-- 0x300  CC <-- 000
  Cycle 4:  0x00e: je dest              # Not taken

  state set according to second irmovl instruction
  combinational logic starting to react to state changes
  [Figure: clock timeline, Cycles 1-4; CC = 100, %ebx = 0x100, PC = 0x00c]

SEQ Operation #3
  Same instruction sequence as in SEQ Operation #2
  state set according to second irmovl instruction
  combinational logic generates results for addl instruction
  [Figure: clock timeline, Cycles 1-4; CC = 100 with 000 at its input, %ebx = 0x100, PC = 0x00c with 0x00e at its input]

SEQ Operation #4
  Same instruction sequence as in SEQ Operation #2
  state set according to addl instruction
  combinational logic starting to react to state changes
  [Figure: clock timeline, Cycles 1-4; CC = 000, %ebx = 0x300, PC = 0x00e]

SEQ Operation #5
  Same instruction sequence as in SEQ Operation #2
  state set according to addl instruction
  combinational logic generates results for je instruction
  [Figure: clock timeline, Cycles 1-4; CC = 000, %ebx = 0x300, PC = 0x00e with 0x013 at its input]

SEQ Summary
  Implementation
    Express every instruction as series of simple steps
    Follow same general flow for each instruction type
    Assemble registers, memories, predesigned combinational blocks
    Connect with control logic
  Limitations
    Too slow to be practical
    In one cycle, must propagate through instruction memory, register file, ALU, and data memory
    Would need to run clock very slowly
    Hardware units only active for fraction of clock cycle
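The four-cycle trace in the SEQ Operation slides can be reproduced with a small software model of one SEQ clock cycle. This is a minimal Python sketch, not the deck's hardware or its HCL: it handles only irmovl, OPl, and jXX, and the names `step`, `alu`, `cond`, and `to_signed` are hypothetical. The register numbering (%edx = 2, %ebx = 3), little-endian constants, the CC ordering (ZF, SF, OF), and the jump target 0x020 are assumptions; the slides do not give the address of dest.

```python
# Minimal model of one SEQ clock cycle for irmovl, OPl, and jXX only.
# CC is a (ZF, SF, OF) tuple; register IDs and byte order are assumptions.
IIRMOVL, IOPL, IJXX = 0x3, 0x6, 0x7
EDX, EBX = 2, 3
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit word as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(fun, a, b):
    """Compute valB OP valA (addl, subl, andl, xorl) and the new condition codes."""
    sa, sb = to_signed(a), to_signed(b)
    full = {0: sb + sa, 1: sb - sa, 2: b & a, 3: b ^ a}[fun]
    val = full & MASK
    overflow = fun in (0, 1) and to_signed(val) != full
    return val, (val == 0, to_signed(val) < 0, overflow)


def cond(cc, ifun):
    """Branch condition for jmp, jle, jl, je, jne, jge, jg (ifun 0-6)."""
    zf, sf, of = cc
    return [True, (sf != of) or zf, sf != of, zf, not zf,
            sf == of, (sf == of) and not zf][ifun]


def step(state, mem):
    """Run all stages combinationally, then update every state element at once."""
    pc, regs, cc = state["pc"], state["regs"], state["cc"]
    # Fetch
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    need_regids = icode in (IIRMOVL, IOPL)
    need_valc = icode in (IIRMOVL, IJXX)
    rA = rB = 0xF
    pos = pc + 1
    if need_regids:
        rA, rB = mem[pos] >> 4, mem[pos] & 0xF
        pos += 1
    valC = int.from_bytes(mem[pos:pos + 4], "little") if need_valc else 0
    valP = pos + (4 if need_valc else 0)
    # Decode
    valA = regs[rA] if icode == IOPL else 0
    valB = regs[rB] if icode == IOPL else 0
    # Execute
    valE, new_cc, cnd = 0, cc, False
    if icode == IOPL:
        valE, new_cc = alu(ifun, valA, valB)
    elif icode == IIRMOVL:
        valE = valC                      # 0 + valC
    elif icode == IJXX:
        cnd = cond(cc, ifun)
    # Memory stage is unused by these three instructions.
    # Write back and PC update: new state becomes visible together.
    new_regs = list(regs)
    if icode in (IIRMOVL, IOPL):
        new_regs[rB] = valE
    new_pc = valC if (icode == IJXX and cnd) else valP
    return {"pc": new_pc, "regs": new_regs, "cc": new_cc}


prog = bytes([
    0x30, 0x83, 0x00, 0x01, 0x00, 0x00,  # 0x000: irmovl $0x100,%ebx (rA field unused, ignored here)
    0x30, 0x82, 0x00, 0x02, 0x00, 0x00,  # 0x006: irmovl $0x200,%edx
    0x60, 0x23,                          # 0x00c: addl %edx,%ebx
    0x73, 0x20, 0x00, 0x00, 0x00,        # 0x00e: je dest (dest = 0x020, made up)
])
state = {"pc": 0, "regs": [0] * 8, "cc": (True, False, False)}  # CC starts at 100
trace = []
for _ in range(4):
    state = step(state, prog)
    trace.append(state)
```

Each entry of `trace` is the state after one clock edge. `step` builds a fresh state and returns it whole rather than mutating registers stage by stage, which mirrors the point that PC, CC, and the register file are all updated together as the clock rises.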