Y86 Instruction Set Architecture – SEQ processor CENG331: Introduction to Computer Systems 8th Lecture Instructor: Erol Sahin Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ. Instruction Set Architecture Assembly Language View Processor state Registers, memory, … Instructions addl, movl, leal, … How instructions are encoded as bytes Layer of Abstraction Above: how to program machine Processor executes instructions in a sequence Application Program Compiler OS ISA CPU Design Circuit Design Below: what needs to be built Use variety of tricks to make it run fast E.g., execute multiple instructions simultaneously Chip Layout –2– Y86 Processor State Program registers %eax %esi %ecx %edi %edx %esp %ebx %ebp Condition codes Memory OF ZF SF PC Program Registers Same 8 as with IA32. Each 32 bits Condition Codes Single-bit flags set by arithmetic or logical instructions » OF: Overflow ZF: Zero SF:Negative Program Counter Indicates address of instruction Memory Byte-addressable storage array Words stored in little-endian byte order –3– Y86 Instructions Format 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state –4– Encoding Registers Each register has 4-bit ID %eax %ecx %edx %ebx 0 1 2 3 %esi %edi %esp %ebp 6 7 4 5 Same encoding as in IA32 Register ID 8 indicates “no register” Will use this in our hardware design in multiple places –5– %eax %ecx %edx %ebx Instruction Example 0 1 2 3 %esi %edi %esp %ebp 6 7 4 5 Addition Instruction Generic Form Encoded Representation addl rA, rB 6 0 rA rB Add value in register rA to that in register rB Store result in register rB Note that Y86 only allows addition to be applied to register data Set condition codes based on result e.g., addl %eax,%esi Encoding: 60 06 Two-byte encoding First indicates instruction type Second gives source and destination registers –6– Arithmetic and Logical Operations Instruction Code Add addl rA, rB Function Code 6 0 rA rB Refer to generically as “OPl” Encodings differ only by “function code” Low-order 4 bytes in first Subtract (rA from rB) subl rA, rB 6 1 rA rB instruction word Set condition codes as side effect And andl rA, rB 6 2 rA rB Exclusive-Or xorl rA, rB 6 3 rA rB –7– Move Operations rrmovl rA, rB Register --> Register 2 0 rA rB 3 0 8 rB V rmmovl rA, D(rB) 4 0 rA rB D 5 0 rA rB D irmovl V, rB mrmovl D(rB), rA Like the IA32 movl instruction Simpler format for memory addresses Give different names to keep them distinct Immediate --> Register Register --> Memory Memory --> Register –8– Move Instruction Examples %eax %ecx %edx %ebx 0 1 2 3 %esi %edi %esp %ebp 6 7 4 5 IA32 Y86 Encoding movl $0xabcd, %edx irmovl $0xabcd, %edx 30 82 cd ab 00 00 movl %esp, %ebx rrmovl %esp, %ebx 20 43 movl -12(%ebp),%ecx mrmovl -12(%ebp),%ecx 50 15 f4 ff ff ff movl %esi,0x41c(%esp) rmmovl %esi,0x41c(%esp) 40 64 1c 04 00 00 movl $0xabcd, (%eax) — movl %eax, 12(%eax,%edx) — movl (%ebp,%eax,4),%ecx — –9– Jump Instructions Jump Unconditionally jmp Dest 7 0 Dest Jump When Less or Equal jle Dest 7 1 7 2 7 3 7 4 Encodings differ only by “function code” Based on values of condition codes Same as IA32 counterparts Encode full destination address Dest Dest Jump When Not Equal jne Dest Jump When Equal je Dest Refer to generically as “jXX” Dest Jump When Less jl Dest Unlike PC-relative addressing seen in IA32 Dest Jump When Greater or Equal jge Dest 7 5 Dest Jump When Greater jg Dest 7 6 Dest – 10 – Y86 Program Stack Stack “Bottom” • Increasing Addresses Region of memory holding program data Used in Y86 (and IA32) for supporting procedure calls Stack top indicated by %esp Address of top stack element • Stack grows toward lower addresses Top element is at highest address in • %esp the stack When pushing, must first decrement stack pointer When popping, increment stack pointer Stack “Top” – 11 – Stack Operations pushl rA a 0 rA 8 Decrement %esp by 4 Store word from rA to memory at %esp Like IA32 popl rA b 0 rA 8 Read word from memory at %esp Save in rA Increment %esp by 4 Like IA32 – 12 – Subroutine Call and Return call Dest ret 8 0 Dest Push address of next instruction onto stack Start executing instructions at Dest Like IA32 9 0 Pop value from stack Use as address for next instruction Like IA32 – 13 – Miscellaneous Instructions 0 0 nop Don’t do anything halt 1 0 Stop executing instructions IA32 has comparable instruction, but can’t execute it in user mode We will use it to stop the simulator – 14 – Y86 Code Generation Example First Try Write typical array code Problem Hard to do array indexing on Y86 Since don’t have scaled addressing modes /* Find number of elements in null-terminated list */ int len1(int a[]) { int len; for (len = 0; a[len]; len++) ; return len; } L18: incl %eax cmpl $0,(%edx,%eax,4) jne L18 Compile with gcc -O2 -S – 15 – Y86 Code Generation Example #2 Second Try Write with pointer code /* Find number of elements in null-terminated list */ int len2(int a[]) { int len = 0; while (*a++) len++; return len; } Result Don’t need to do indexed addressing L24: movl (%edx),%eax incl %ecx L26: addl $4,%edx testl %eax,%eax jne L24 Compile with gcc -O2 -S – 16 – Y86 Code Generation Example #3 IA32 Code Setup len2: pushl %ebp xorl %ecx,%ecx movl %esp,%ebp movl 8(%ebp),%edx movl (%edx),%eax jmp L26 Y86 Code Setup len2: pushl %ebp # xorl %ecx,%ecx # rrmovl %esp,%ebp # mrmovl 8(%ebp),%edx # mrmovl (%edx),%eax # jmp L26 # Save %ebp len = 0 Set frame Get a Get *a Goto entry – 17 – Y86 Code Generation Example #4 IA32 Code Loop + Finish L24: movl (%edx),%eax incl %ecx L26: addl $4,%edx testl %eax,%eax jne L24 movl %ebp,%esp movl %ecx,%eax popl %ebp ret Y86 Code Loop + Finish L24: mrmovl (%edx),%eax irmovl $1,%esi addl %esi,%ecx L26: irmovl $4,%esi addl %esi,%edx andl %eax,%eax jne L24 rrmovl %ebp,%esp rrmovl %ecx,%eax popl %ebp ret # Get *a # len++ # Entry: # # # # # a++ *a == 0? No--Loop Pop Rtn len – 18 – Y86 Program Structure irmovl Stack,%esp rrmovl %esp,%ebp irmovl List,%edx pushl %edx call len2 halt .align 4 List: .long 5043 .long 6125 .long 7395 .long 0 # Set up stack # Set up frame # Push argument # Call Function # Halt Program starts at address 0 Must set up stack Make sure don’t overwrite code! # List of elements Must initialize data Can use symbolic names # Function len2: . . . # Allocate space for stack .pos 0x100 Stack: – 19 – Assembling Y86 Program unix> yas eg.ys Generates “object code” file eg.yo Actually looks like disassembler output 0x000: 0x006: 0x008: 0x00e: 0x010: 0x015: 0x018: 0x018: 0x018: 0x01c: 0x020: 0x024: 308400010000 2045 308218000000 a028 8028000000 10 b3130000 ed170000 e31c0000 00000000 | | | | | | | | | | | | irmovl Stack,%esp rrmovl %esp,%ebp irmovl List,%edx pushl %edx call len2 halt .align 4 List: .long 5043 .long 6125 .long 7395 .long 0 # Set up stack # Set up frame # Push argument # Call Function # Halt # List of elements – 20 – Simulating Y86 Program unix> yis eg.yo Instruction set simulator Computes effect of each instruction on processor state Prints changes in state from original Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0 Changes to registers: %eax: 0x00000000 0x00000003 %ecx: 0x00000000 0x00000003 %edx: 0x00000000 0x00000028 %esp: 0x00000000 0x000000fc %ebp: 0x00000000 0x00000100 %esi: 0x00000000 0x00000004 Changes to memory: 0x00f4: 0x00f8: 0x00fc: 0x00000000 0x00000000 0x00000000 0x00000100 0x00000015 0x00000018 – 21 – CISC versus RISC CENG331: Introduction to Computer Systems 8th Lecture Instructor: Erol Sahin Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ. CISC Instruction Sets Complex Instruction Set Computer Dominant style through mid-80’s Stack-oriented instruction set Use stack to pass arguments, save program counter (IA32 but not int x8664) Explicit push and pop instructions Arithmetic instructions can access memory addl %eax, 12(%ebx,%ecx,4) requires memory read and write Complex address calculation Condition codes Set as side effect of arithmetic and logical instructions Philosophy Add instructions to perform “typical” programming tasks – 23 – RISC Instruction Sets Reduced Instruction Set Computer Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley) Fewer, simpler instructions Might take more to get given task done Can execute them with small and fast hardware Register-oriented instruction set Many more (typically 32) registers Use for arguments, return pointer, temporaries Only load and store instructions can access memory Similar to Y86 mrmovl and rmmovl No Condition codes Test instructions return 0/1 in register – 24 – MIPS Registers $0 $0 $1 $at $2 $v0 $3 $v1 $4 $a0 $5 $a1 $6 $a2 $7 Constant 0 Reserved Temp. $16 $s0 $17 $s1 $18 $s2 $19 $s3 $20 $s4 $21 $s5 $22 $s6 $a3 $23 $s7 $8 $t0 $24 $t8 $9 $t1 $25 $t9 $10 $t2 $26 $k0 $11 $t3 $27 $k1 $12 $t4 $28 $gp $13 $t5 $29 $sp $14 $t6 $30 $s8 $15 $t7 $31 $ra Return Values Procedure arguments Caller Save Temporaries: May be overwritten by called procedures Callee Save Temporaries: May not be overwritten by called procedures Caller Save Temp Reserved for Operating Sys Global Pointer Stack Pointer Callee Save Temp Return Address – 25 – MIPS Instruction Examples R-R Op Ra addu $3,$2,$1 R-I Op Ra addu $3,$2, 3145 sll $3,$2,2 Branch Op Ra beq $3,$2,dest Load/Store Op Ra Rb Rd 00000 Fn # Register add: $3 = $2+$1 Rb Immediate # Immediate add: $3 = $2+3145 # Shift left: $3 = $2 << 2 Rb Offset # Branch when $3 = $2 Rb Offset lw $3,16($2) # Load Word: $3 = M[$2+16] sw $3,16($2) # Store Word: M[$2+16] = $3 – 26 – CISC vs. RISC Original Debate Strong opinions! CISC proponents---easy for compiler, fewer code bytes RISC proponents---better for optimizing compilers, can make run fast with simple chip design Current Status For desktop processors, choice of ISA not a technical issue With enough hardware, can make anything run fast Code compatibility more important For embedded processors, RISC makes sense Smaller, cheaper, less power – 27 – Summary Y86 Instruction Set Architecture Similar state and instructions as IA32 Simpler encodings Somewhere between CISC and RISC How Important is ISA Design? Less now than before With enough hardware, can make almost anything go fast AMD/Intel moved away from IA32 Does not allow enough parallel execution x86-64 » 64-bit word sizes (overcome address space limitations) » Radically different style of instruction set with explicit parallelism » Requires sophisticated compilers – 28 – Logic Design and HCL CENG331: Introduction to Computer Systems 8th Lecture Instructor: Erol Sahin Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ. Computing with Logic Gates a b out out = a && b Not Or And a b out out a out = !a out = a || b Outputs are Boolean functions of inputs Respond continuously to changes in inputs With some, small delay Rising Delay Falling Delay a && b b Voltage a Time – 30 – Combinational Circuits Acyclic Network Primary Inputs Primary Outputs Acyclic Network of Logic Gates Continously responds to changes on primary inputs Primary outputs become (after some delay) Boolean functions of primary inputs – 31 – Bit Equality Bit equal a HCL Expression eq bool eq = (a&&b)||(!a&&!b) b Generate 1 if a and b are equal Hardware Control Language (HCL) Very simple hardware description language Boolean operations have syntax similar to C logical operations We’ll use it to describe control logic for processors – 32 – Word Equality Word-Level Representation b31 Bit equal eq31 B = a31 b30 Bit equal A eq30 a30 HCL Representation Eq b1 Bit equal a0 Bit equal bool Eq = (A == B) eq1 a1 b0 Eq eq0 32-bit word size HCL representation Equality operation Generates Boolean value – 33 – Bit-Level Multiplexor s Bit MUX HCL Expression bool out = (s&&a)||(!s&&b) b out a Control signal s Data signals a and b Output a when s=1, b when s=0 – 34 – Word Multiplexor Word-Level Representation s s B b31 out31 a31 Out A HCL Representation b30 out30 a30 int Out = [ s : A; 1 : B; ]; Select input word A or B depending on control signal s HCL representation Case expression b0 out0 a0 MUX Series of test : value pairs Output value for first successful test – 35 – HCL Word-Level Examples Minimum of 3 Words C B A MIN3 Min3 int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ]; Find minimum of three input words HCL case expression Final case guarantees match 4-Way Multiplexor s1 s0 D0 D1 D2 D3 MUX4 Out4 int Out4 = [ !s1&&!s0: D0; !s1 : D1; !s0 : D2; 1 : D3; ]; Select one of 4 inputs based on two control bits HCL case expression Simplify tests by assuming sequential matching – 36 – Arithmetic Logic Unit 0 Y A X B 1 A L U Y A X+Y OF ZF CF X B A L U 2 Y A X-Y OF ZF CF X B 3 Y A L U A X&Y OF ZF CF X B A L U X^Y OF ZF CF Combinational logic Continuously responding to inputs Control signal selects function computed Corresponding to 4 arithmetic/logical operations in Y86 Also computes values for condition codes – 37 – Registers Structure i7 D C Q+ o7 i6 D C Q+ o6 i5 D C Q+ o5 i4 D C Q+ o4 i3 D C Q+ o3 i2 D C Q+ o2 i1 D C Q+ o1 i0 D C Q+ o0 I O Clock Clock Stores word of data Different from program registers seen in assembly code Collection of edge-triggered latches Loads input on rising edge of clock – 38 – Register Operation State = x Input = y Output = x x State = y Rising clock Output = y y Stores data bits For most of time acts as barrier between input and output As clock rises, loads input – 39 – State Machine Example Comb. Logic 0 A L U 0 Out Accumulator circuit Load or accumulate on each cycle MUX In 1 Load Clock Clock Load In Out x0 x1 x0 x0+x1 x2 x0+x1+x2 x3 x4 x3 x3+x4 x5 x3+x4+x5 – 40 – Random-Access Memory valA srcA A valW Register file Read ports valB srcB W dstW Write port B Clock Stores multiple words of memory Address input specifies which word to read or write Register file Holds values of program registers %eax, %esp, etc. Register identifier serves as address » ID 8 implies no read or write performed Multiple Ports Can read and/or write multiple words in one cycle » Each has separate address and data input/output – 41 – Register File Timing Reading valA srcA x A valB srcB x Register file 2 Like combinational logic Output data generated based on input address After some delay B 2 Writing 2 Like register Update only as clock rises x valW Register file W Clock dstW y 2 Rising clock 2 y valW Register file W dstW Clock – 42 – Hardware Control Language Very simple hardware description language Can only express limited aspects of hardware operation Parts we want to explore and modify Data Types bool: Boolean a, b, c, … int: words A, B, C, … Does not specify word size---bytes, 32-bit words, … Statements bool a = bool-expr ; int A = int-expr ; – 43 – HCL Operations Classify by type of value returned Boolean Expressions Logic Operations a && b, a || b, !a Word Comparisons A == B, A != B, A < B, A <= B, A >= B, A > B Set Membership A in { B, C, D } » Same as A == B || A == C || A == D Word Expressions Case expressions [ a : A; b : B; c : C ] Evaluate test expressions a, b, c, … in sequence Return word expression A, B, C, … for first successful test – 44 – Summary Computation Performed by combinational logic Computes Boolean functions Continuously reacts to input changes Storage Registers Hold single words Loaded as clock rises Random-access memories Hold multiple words Possible multiple read or write ports Read word when address input changes Write word as clock rises – 45 – SEQuential processor implementation CENG331: Introduction to Computer Systems 8th Lecture Instructor: Erol Sahin Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ. Y86 Instruction Set Byte 0 nop 0 0 halt 1 0 rrmovl rA, rB 2 0 rA rB irmovl V, rB 3 0 8 rB V rmmovl rA, D(rB) 4 0 rA rB D mrmovl D(rB), rA OPl rA, rB jXX Dest call Dest ret pushl rA popl rA 5 1 2 3 0 rA rB 4 5 addl 6 0 subl 6 1 andl 6 2 xorl 6 3 jmp 7 0 jle 7 1 jl 7 2 je 7 3 jne 7 4 jge 7 5 jg 7 6 D 6 fn rA rB 7 fn 8 9 A B 0 Dest Dest 0 0 rA 8 0 rA 8 – 47 – newPC PC SEQ Hardware StructureWrite back valE, valM valM State Program counter register (PC) Condition code register (CC) Register File Memories Access same memory space Data Data memory memory Memory Addr, Data valE Bch Execute CC CC ALU ALU aluA, aluB Data: for reading/writing program data Instruction: for reading instructions valA, valB Instruction Flow srcA, srcB Read instruction at address specified by Decode dstA, dstB PC icode ifun valP Process through stages rA, , rB valC Update program counter Instruction PC Instruction PC Fetch memory memory A B Register RegisterM file file E increment increment PC – 48 – newPC SEQ Stages PC valE, valM Write back valM Fetch Read instruction from instruction memory Data Data memory memory Memory Addr, Data Decode Read program registers Execute valE Bch Execute CC CC aluA, aluB Compute value or address valA, valB Memory Read or write data icode ifun rA , rB valC B valP , Fetch A Register RegisterM file file E Write program registers PC srcA, srcB dstA, dstB Decode Write Back ALU ALU Instruction Instruction memory memory PC PC increment increment Update program counter PC – 49 – Instruction Decoding Optional 5 Optional 0 rA rB D icode ifun rA rB valC Instruction Format Instruction byte Optional register byte Optional constant word icode:ifun rA:rB valC – 50 – Executing Arith./Logical Operation OPl rA, rB Fetch Memory Read 2 bytes Decode 6 fn rA rB Read operand registers Execute Perform operation Set condition codes Do nothing Write back Update register PC Update Increment PC by 2 – 51 – Stage Computation: Arith/Log. Ops OPl rA, rB icode:ifun M1[PC] Read instruction byte rA:rB M1[PC+1] Read register byte valP PC+2 Compute next PC valA R[rA] Read operand A valB R[rB] Read operand B valE valB OP valA Perform ALU operation Set CC Set condition code register Memory Write R[rB] valE Write back result back PC update PC valP Update PC Fetch Decode Execute Formulate instruction execution as sequence of simple steps Use same general form for all instructions – 52 – Executing rmmovl rmmovl rA, D(rB) 4 0 rA rB Fetch Memory Read 6 bytes Decode Read operand registers Execute D Compute effective address Write to memory Write back Do nothing PC Update Increment PC by 6 – 53 – Stage Computation: rmmovl rmmovl rA, D(rB) Fetch Decode Execute Memory Write back PC update icode:ifun M1[PC] Read instruction byte rA:rB M1[PC+1] Read register byte valC M4[PC+2] Read displacement D valP PC+6 Compute next PC valA R[rA] Read operand A valB R[rB] Read operand B valE valB + valC Compute effective address M4[valE] valA Write value to memory PC valP Update PC Use ALU for address computation – 54 – Executing popl popl rA Fetch Memory Read 2 bytes Decode Read stack pointer Execute b 0 rA 8 Increment stack pointer by 4 Read from old stack pointer Write back Update stack pointer Write result to register PC Update Increment PC by 2 – 55 – Stage Computation: popl popl rA icode:ifun M1[PC] Read instruction byte rA:rB M1[PC+1] Read register byte valP PC+2 valA R[%esp] Compute next PC valB R [%esp] Read stack pointer valE valB + 4 Increment stack pointer Memory Write valM M4[valA] R[%esp] valE Read from stack back PC update R[rA] valM Write back result PC valP Update PC Fetch Decode Execute Read stack pointer Update stack pointer Use ALU to increment stack pointer Must update two registers Popped value New stack pointer – 56 – Executing Jumps jXX Dest 7 fn fall thru: XX XX Not taken target: XX XX Taken Fetch Memory Read 5 bytes Increment PC by 5 Decode Do nothing Execute Dest Determine whether to take branch based on jump condition and condition codes Do nothing Write back Do nothing PC Update Set PC to Dest if branch taken or to incremented PC if not branch – 57 – Stage Computation: Jumps jXX Dest Fetch icode:ifun M1[PC] Read instruction byte valC M4[PC+1] Read destination address valP PC+5 Fall through address Bch Cond(CC,ifun) Take branch? PC Bch ? valC : valP Update PC Decode Execute Memory Write back PC update Compute both addresses Choose based on setting of condition codes and branch condition – 58 – Executing call call Dest XX XX target: XX XX Memory Read 5 bytes Increment PC by 5 Decode Read stack pointer Execute Dest return: Fetch 8 0 Decrement stack pointer by 4 Write incremented PC to new value of stack pointer Write back Update stack pointer PC Update Set PC to Dest – 59 – Stage Computation: call call Dest icode:ifun M1[PC] Read instruction byte valC M4[PC+1] Read destination address valP PC+5 Compute return point valB R[%esp] Read stack pointer valE valB + –4 Decrement stack pointer Memory Write M4[valE] valP R[%esp] valE Write return value on stack back PC update PC valC Set PC to destination Fetch Decode Execute Update stack pointer Use ALU to decrement stack pointer Store incremented PC – 60 – Executing ret 9 0 ret return: Fetch XX XX Memory Read 1 byte Decode Read stack pointer Execute Increment stack pointer by 4 Read return address from old stack pointer Write back Update stack pointer PC Update Set PC to return address – 61 – Stage Computation: ret ret icode:ifun M1[PC] Read instruction byte valA R[%esp] Read operand stack pointer valB R[%esp] Read operand stack pointer valE valB + 4 Increment stack pointer Memory Write valM M4[valA] R[%esp] valE Read return address back PC update PC valM Set PC to return address Fetch Decode Execute Update stack pointer Use ALU to increment stack pointer Read return address from memory – 62 – Computation Steps OPl rA, rB Fetch Decode Execute icode,ifun icode:ifun M1[PC] Read instruction byte rA,rB rA:rB M1[PC+1] Read register byte valC [Read constant word] valP valP PC+2 Compute next PC valA, srcA valA R[rA] Read operand A valB, srcB valB R[rB] Read operand B valE valE valB OP valA Perform ALU operation Cond code Set CC Set condition code register Memory Write valM [Memory read/write] back PC update dstM dstE PC R[rB] valE Write back ALU result [Write back memory result] PC valP Update PC All instructions follow same general pattern Differ in what gets computed on each step – 63 – Irmovl: Computation Steps irmovl V, rB Fetch Decode Execute icode,ifun icode:ifun M1[PC] Read instruction byte rA,rB rA:rB M1[PC+1] Read register byte valC valC M4[PC+2] [Read constant word] valP valP PC+6 Compute next PC valE 0 + valC Perform ALU operation R[rB] valE Write back ALU result PC valP Update PC valA, srcA valB, srcB valE Cond code Memory Write valM back PC update dstM dstE PC All instructions follow same general pattern Differ in what gets computed on each step – 64 – Computation Steps call Dest icode,ifun Fetch Decode Execute rA,rB [Read register byte] valC M4[PC+1] Read constant word valP valP PC+5 Compute next PC valA, srcA [Read operand A] valB, srcB valB R[%esp] Read operand B valE valE valB + –4 Perform ALU operation Cond code valM back PC update dstM Read instruction byte valC Memory Write icode:ifun M1[PC] dstE PC [Set condition code reg.] M4[valE] valP R[%esp] valE [Memory read/write] [Write back ALU result] Write back memory result PC valC Update PC All instructions follow same general pattern Differ in what gets computed on each step – 65 – Computed Values Fetch Execute icode Instruction code ifun Instruction function rA Instr. Register A rB Instr. Register B valC Instruction constant valP Incremented PC valE Bch ALU result Branch flag Memory valM Value from memory Decode srcA srcB dstE dstM valA Register ID A Register ID B Destination Register E Destination Register M Register value A valB Register value B – 66 – newPC SEQ Hardware New PC PC Key valM data out Blue boxes: predesigned hardware blocks read Data Data memory memory Mem. control Memory write Addr Data E.g., memories, ALU Gray boxes: logic control Execute Bch valE CC CC ALU ALU Describe in HCL White ovals: for signals Thick lines: word values Thin lines: values Dotted lines: values ALU A ALU fun. ALU B labels valA valB 32-bit Decode A dstE dstM srcA srcB dstE dstM srcA srcB B Register Register M file file E 4-8 bit Write back icode ifun rA rB valC valP 1-bit Fetch Instruction Instruction memory memory PC PC increment increment PC – 67 – Fetch Logic icode ifun rA rB valC valP Need valC Instr valid Need regids Split Split PC PC increment increment Align Align Byte 0 Bytes 1-5 Instruction Instruction memory memory Predefined Blocks PC PC: Register containing PC Instruction memory: Read 6 bytes (PC to PC+5) Split: Divide instruction byte into icode and ifun Align: Get fields for rA, rB, and valC – 68 – Fetch Logic icode ifun rA rB valC valP Need valC Instr valid Need regids Split Split PC PC increment increment Align Align Byte 0 Bytes 1-5 Instruction Instruction memory memory PC Control Logic Instr. Valid: Is this instruction valid? Need regids: Does this instruction have a register bytes? Need valC: Does this instruction have a constant word? – 69 – Fetch Control Logic nop 0 0 halt 1 0 rrmovl rA, rB 2 0 rA rB irmovl V, rB 3 0 8 rB V rmmovl rA, D(rB) 4 0 rA rB D mrmovl D(rB), rA 5 0 rA rB D OPl rA, rB 6 fn rA rB jXX Dest 7 fn Dest call Dest 8 0 Dest ret 9 0 pushl rA A 0 rA 8 popl rA B 0 rA 8 bool need_regids = icode in { IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL }; bool instr_valid = icode in { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL, IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL }; – 70 – Decode Logic valA valB A B valM valE Register File Read ports A, B Write ports E, M Addresses are register IDs or 8 (no access) Register Register file file dstE dstM srcA dstE dstM srcA M E srcB srcB Control Logic srcA, srcB: read port addresses dstA, dstB: write port addresses icode rA rB – 71 – A Source OPl rA, rB Decode valA R[rA] Read operand A rmmovl rA, D(rB) Decode valA R[rA] Read operand A popl rA Decode valA R[%esp] Read stack pointer jXX Dest Decode No operand call Dest Decode No operand ret Decode valA R[%esp] Read stack pointer int srcA = [ icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL icode in { IPOPL, IRET } : RESP; 1 : RNONE; # Don't need register ]; } : rA; – 72 – E Destination OPl rA, rB Write-back R[rB] valE Write back result rmmovl rA, D(rB) Write-back None popl rA Write-back R[%esp] valE Update stack pointer jXX Dest Write-back None call Dest Write-back R[%esp] valE Update stack pointer ret Write-back R[%esp] valE Update stack pointer int dstE = [ icode in { IRRMOVL, IIRMOVL, IOPL} : rB; icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP; 1 : RNONE; # Don't need register ]; – 73 – Execute Logic Units ALU Implements 4 required functions Generates condition code values bcond bcond bcond Control Logic Set CC: Should condition code register be loaded? ALU A: Input A to ALU ALU B: Input B to ALU Set CC icode ifun ALU fun. ALU ALU CC CC Computes branch flag valE CC Register with 3 condition code bits Bch ALU A valC ALU B valA valB ALU fun: What function should ALU compute? – 74 – ALU A Input Execute OPl rA, rB valE valB OP valA Perform ALU operation rmmovl rA, D(rB) Execute valE valB + valC Compute effective address popl rA Execute valE valB + 4 Increment stack pointer jXX Dest Execute No operation call Dest Execute valE valB + –4 Decrement stack pointer ret Execute valE valB + 4 Increment stack pointer int aluA = [ icode in { IRRMOVL, IOPL } : valA; icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC; icode in { ICALL, IPUSHL } : -4; icode in { IRET, IPOPL } : 4; # Other instructions don't need ALU ]; – 75 – ALU Operation OPl rA, rB Execute valE valB OP valA Perform ALU operation rmmovl rA, D(rB) Execute valE valB + valC Compute effective address popl rA Execute valE valB + 4 Increment stack pointer jXX Dest Execute No operation call Dest Execute valE valB + –4 Decrement stack pointer ret Execute valE valB + 4 int alufun = [ icode == IOPL : ifun; 1 : ALUADD; ]; Increment stack pointer – 76 – Memory Logic Memory valM Reads or writes memory word data out Mem. read Control Logic Mem. read: should word be read? Mem. write: should word be written? Mem. addr.: Select address Mem. data.: Select data Mem. write read Data Data memory memory write data in Mem addr icode valE Mem data valA valP – 77 – Memory Address OPl rA, rB Memory No operation rmmovl rA, D(rB) Memory M4[valE] valA Write value to memory popl rA Memory valM M4[valA] Read from stack jXX Dest Memory No operation call Dest Memory M4[valE] valP Write return value on stack ret Memory valM M4[valA] Read return address int mem_addr = [ icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE; icode in { IPOPL, IRET } : valA; # Other instructions don't need address ]; – 78 – Memory Read OPl rA, rB Memory No operation rmmovl rA, D(rB) Memory M4[valE] valA Write value to memory popl rA Memory valM M4[valA] Read from stack jXX Dest Memory No operation call Dest Memory M4[valE] valP Write return value on stack ret Memory valM M4[valA] Read return address bool mem_read = icode in { IMRMOVL, IPOPL, IRET }; – 79 – PC Update Logic PC New PC New PC Select next value of PC icode Bch valC valM valP – 80 – PC Update OPl rA, rB PC update PC valP Update PC rmmovl rA, D(rB) PC update PC valP Update PC popl rA PC update PC valP Update PC jXX Dest PC update PC Bch ? valC : valP Update PC call Dest PC update PC valC Set PC to destination ret PC update PC valM Set PC to return address int new_pc = [ icode == ICALL : valC; icode == IJXX && Bch : valC; icode == IRET : valM; 1 : valP; ]; – 81 – SEQ Operation State PC register Cond. Code register Data memory Register file All updated as clock rises Combinational Logic Read Write Data Data memory memory CC CC Read Ports Write Ports Register Register file file Combinational Logic PC 0x00c ALU Control logic Memory reads Instruction memory Register file Data memory – 82 – Cycle 1 SEQ Operation #2 Combinational Logic CC CC 100 100 Cycle 2 Cycle 3 Cycle 4 Clock Cycle 1: 0x000: irmovl $0x100,%ebx # %ebx <-- 0x100 Cycle 2: 0x006: irmovl $0x200,%edx # %edx <-- 0x200 Cycle 3: 0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000 Cycle 4: 0x00e: je dest # Not taken Write Read Data Data memory memory Write Ports Read Ports Register Register file file state set according to second irmovl instruction combinational logic starting to react to state changes %ebx 0x100 %ebx == 0x100 PC 0x00c – 83 – Cycle 1 SEQ Operation #3 Combinational Logic CC CC 100 100 000 Cycle 2 Cycle 3 Cycle 4 Clock Cycle 1: 0x000: irmovl $0x100,%ebx # %ebx <-- 0x100 Cycle 2: 0x006: irmovl $0x200,%edx # %edx <-- 0x200 Cycle 3: 0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000 Cycle 4: 0x00e: je dest # Not taken Write Read state set according to second irmovl instruction combinational logic generates results for addl instruction Data Data memory memory Write Ports Read Ports Register Register file file %ebx 0x100 %ebx == 0x100 0x00e PC 0x00c – 84 – Cycle 1 SEQ Operation #4 Cycle 3 Cycle 4 Cycle 1: 0x000: irmovl $0x100,%ebx # %ebx <-- 0x100 Cycle 2: 0x006: irmovl $0x200,%edx # %edx <-- 0x200 Cycle 3: 0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000 Cycle 4: 0x00e: je dest # Not taken Read Combinational Logic CC CC 000 000 Cycle 2 Clock Write state set according to addl instruction combinational logic starting to react to state changes Data Data memory memory Read Ports Write Ports Register Register file file %ebx %ebx == 0x300 0x300 PC 0x00e – 85 – Cycle 1 SEQ Operation #5 Cycle 3 Cycle 4 Cycle 1: 0x000: irmovl $0x100,%ebx # %ebx <-- 0x100 Cycle 2: 0x006: irmovl $0x200,%edx # %edx <-- 0x200 Cycle 3: 0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000 Cycle 4: 0x00e: je dest # Not taken Read Combinational Logic CC CC 000 000 Cycle 2 Clock Write state set according to addl instruction combinational logic generates results for je instruction Data Data memory memory Read Ports Write Ports Register Register file file %ebx %ebx == 0x300 0x300 0x013 PC 0x00e – 86 – SEQ Summary Implementation Express every instruction as series of simple steps Follow same general flow for each instruction type Assemble registers, memories, predesigned combinational blocks Connect with control logic Limitations Too slow to be practical In one cycle, must propagate through instruction memory, register file, ALU, and data memory Would need to run clock very slowly Hardware units only active for fraction of clock cycle – 87 –