Lecture 7A Y86 Processor State Computer Architecture I Program registers Instruction Set Architecture %eax %ecx %edx %ebx Assembly Language View Application Program Processor state Registers, memory, … Compiler OS PC Single-bit flags set by arithmetic or logical instructions CPU Design Layer of Abstraction Above: how to program machine » OF: Overflow sequence Use variety of tricks to make it run fast Indicates address of instruction Byte-addressable storage array Words stored in little-endian byte order E.g., execute multiple instructions Datorarkitektur 2007 simultaneously Y86 Instructions Format 1--6 bytes of information read from memory » Can determine instruction length from first byte » Not as many instruction types, and simpler encoding than with IA32 Each accesses and modifies some part(s) of the program state Datorarkitektur 2007 F7A – 2 – Instruction Example Addition Instruction Generic Form Encoded Representation addl rA, rB Encoding Registers 6 0 rA rB Add value in register rA to that in register rB Store result in register rB Each register has 4-bit ID %esi %edi %esp %ebp SF:Negative Memory Chip Layout Below: what needs to be built ZF: Zero Program Counter Circuit Design Processor executes instructions in a Note that Y86 only allows addition to be applied to register data 6 7 4 5 Set condition codes based on result e.g., addl %eax,%esi Encoding: 60 06 Two-byte encoding Same encoding as in IA32 First indicates instruction type Register ID 8 indicates “no register” Second gives source and destination registers Will use this in our hardware design in multiple places F7A – 3 – OF ZF SF Condition Codes ISA How instructions are encoded as bytes 0 1 2 3 Memory Same 8 as with IA32. Each 32 bits addl, movl, andl, … %eax %ecx %edx %ebx Condition codes Program Registers Instructions F7A – 1 – %esi %edi %esp %ebp Datorarkitektur 2007 F7A – 4 – Datorarkitektur 2007 Arithmetic and Logical Operations Instruction Code Add addl rA, rB Function Code 6 0 rA rB Refer to generically as “OPl” Subtract (rA from rB) Low-order 4 bytes in first 6 1 rA rB 2 0 rA rB Register --> Register irmovl V, rB 3 0 8 rB V Immediate --> Register rmmovl rA, D(rB) 4 0 rA rB D Register --> Memory mrmovl D(rB), rA 5 0 rA rB D Memory --> Register instruction word Set condition codes as side And andl rA, rB rrmovl rA, rB Encodings differ only by “function code” subl rA, rB Move Operations effect 6 2 rA rB Exclusive-Or xorl rA, rB Like the IA32 movl instruction 6 3 rA rB Simpler format for memory addresses Give different names to keep them distinct Datorarkitektur 2007 F7A – 5 – Move Instruction Examples Datorarkitektur 2007 F7A – 6 – Jump Instructions Jump Unconditionally Little endian IA32 Y86 Encoding movl $0xabcd, %edx irmovl $0xabcd, %edx 30 82 cd ab 00 00 jmp Dest 7 0 Dest Jump When Less or Equal jle Dest 7 1 Encodings differ only by Dest Jump When Less movl %esp, %ebx rrmovl %esp, %ebx 20 43 movl -12(%ebp),%ecx mrmovl -12(%ebp),%ecx 50 15 f4 ff ff ff movl %esi,0x41c(%esp) rmmovl %esi,0x41c(%esp) 40 64 1c 04 00 00 jl Dest 7 2 Dest 7 3 movl $0xabcd, (%eax) — Jump When Not Equal 7 4 jne Dest movl %eax, 12(%eax,%edx) — Jump When Greater or Equal movl (%ebp,%eax,4),%ecx — jge Dest 7 5 “function code” Based on values of condition codes Jump When Equal je Dest Refer to generically as “jXX” Dest Same as IA32 counterparts Encode full destination Dest address Unlike PC-relative addressing Dest seen in IA32 Jump When Greater jg Dest F7A – 7 – Datorarkitektur 2007 F7A – 8 – 7 6 Dest Datorarkitektur 2007 Y86 Program Stack Stack “Bottom” Stack Operations Region of memory holding program pushl rA a 0 rA 8 data • Increasing • Addresses • Used in Y86 (and IA32) for Decrement %esp by 4 supporting procedure calls Store word from rA to memory at %esp Stack top indicated by %esp Like IA32 Address of top stack element Stack grows toward lower addresses popl rA Top element is at highest address in %esp Stack “Top” the stack Read word from memory at %esp Save in rA stack pointer Increment %esp by 4 pointer Like IA32 Datorarkitektur 2007 Subroutine Call and Return call Dest 8 0 0 rA 8 When pushing, must first decrement When popping, increment stack F7A – 9 – b Datorarkitektur 2007 F7A – 10 – Miscellaneous Instructions nop Dest Push address of next instruction onto stack 0 0 Don’t do anything Start executing instructions at Dest Like IA32 ret halt 9 1 0 Stop executing instructions 0 IA32 has comparable instruction, but can’t execute it in user Pop value from stack mode Use as address for next instruction We will use it to stop the simulator Like IA32 F7A – 11 – Datorarkitektur 2007 F7A – 12 – Datorarkitektur 2007 Writing Y86 Code Y86 Code Generation Example Try to Use C Compiler as Much as Possible First Try Write code in C /* Find number of elements in null-terminated list */ int len1(int a[]) { int len; for (len = 0; a[len]; len++) ; return len; } Transliterate into Y86 Coding Example Find number of elements in null-terminated list int len1(int a[]); 5043 6125 7395 Hard to do array indexing on Y86 Compile for IA32 with gcc -S a Problem Write typical array code ⇒ 3 Since don’t have scaled addressing modes Similar to SPARC loop: incl %eax entry: cmpl $0,(%edx,%eax,4) jne L18 Compile with gcc -O2 -S 0 Datorarkitektur 2007 F7A – 13 – Datorarkitektur 2007 F7A – 14 – Y86 Code Generation Example #2 Y86 Code Generation Example #3 Second Try IA32 Code Write with pointer code Result Don’t need to do indexed addressing /* Find number of elements in null-terminated list */ int len2(int a[]) { int len = 0; while (*a++) len++; return len; } Setup len2: pushl %ebp xorl %ecx,%ecx movl %esp,%ebp movl 8(%ebp),%edx movl (%edx),%eax jmp entry loop: movl (%edx),%eax incl %ecx entry: addl $4,%edx testl %eax,%eax jne loop Y86 Code Setup len2: pushl %ebp # xorl %ecx,%ecx # rrmovl %esp,%ebp # mrmovl 8(%ebp),%edx # mrmovl (%edx),%eax # jmp entry # Save %ebp len = 0 Set frame Get a Get *a Goto entry Compile with gcc -O2 -S F7A – 15 – Datorarkitektur 2007 F7A – 16 – Datorarkitektur 2007 Y86 Program Structure Y86 Code Generation Example #4 IA32 Code Y86 Code Loop + Finish Loop + Finish loop: movl (%edx),%eax loop: mrmovl (%edx),%eax irmovl $1,%esi addl %esi,%ecx entry: irmovl $4,%esi addl %esi,%edx andl %eax,%eax jne loop rrmovl %ebp,%esp rrmovl %ecx,%eax popl %ebp ret incl %ecx entry: addl $4,%edx testl %eax,%eax jne loop movl %ebp,%esp movl %ecx,%eax popl %ebp ret # Get *a # len++ # # # # # a++ *a == 0? No--Loop Pop Rtn len Assembling Y86 Program F7A – 19 – address 0 Must set up stack Make sure don’t overwrite code! Must initialize data Can use symbolic names Datorarkitektur 2007 F7A – 18 – unix> yis eg.yo Instruction set simulator Actually looks like disassembler output b3130000 ed170000 e31c0000 00000000 Program starts at # List of elements # Function len2: . . . Generates “object code” file eg.yo 308400010000 2045 308218000000 a028 8028000000 10 # Push argument # Call Function # Halt Simulating Y86 Program unix> yas eg.ys 0x000: 0x006: 0x008: 0x00e: 0x010: 0x015: 0x018: 0x018: 0x018: 0x01c: 0x020: 0x024: # Set up stack # Set up frame # Allocate space for stack .pos 0x100 Stack: Datorarkitektur 2007 F7A – 17 – irmovl Stack,%esp rrmovl %esp,%ebp irmovl List,%edx pushl %edx call len2 halt .align 4 List: .long 5043 .long 6125 .long 7395 .long 0 | | | | | | | | | | | | irmovl Stack,%esp rrmovl %esp,%ebp irmovl List,%edx pushl %edx call len2 halt .align 4 List: .long 5043 .long 6125 .long 7395 .long 0 Computes effect of each instruction on processor state Prints changes in state from original # Set up stack # Set up frame Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0 Changes to registers: %eax: 0x00000000 0x00000003 %ecx: 0x00000000 0x00000003 %edx: 0x00000000 0x00000028 %esp: 0x00000000 0x000000fc %ebp: 0x00000000 0x00000100 %esi: 0x00000000 0x00000004 # Push argument # Call Function # Halt # List of elements Changes to memory: 0x00f4: 0x00f8: 0x00fc: Datorarkitektur 2007 F7A – 20 – 0x00000000 0x00000000 0x00000000 0x00000100 0x00000015 0x00000018 Datorarkitektur 2007 RISC Instruction Sets CISC Instruction Sets Reduced Instruction Set Computer Complex Instruction Set Computer Internal project at IBM, later popularized by Hennessy (Stanford) Dominant style through mid-80’s and Patterson (Berkeley) Stack-oriented instruction set Fewer, simpler instructions Use stack to pass arguments, save program counter Might take more to get given task done Explicit push and pop instructions Can execute them with small and fast hardware Arithmetic instructions can access memory Register-oriented instruction set addl %eax, 12(%ebx,%ecx,4) Many more (typically 32) registers requires memory read and write Use for arguments, return pointer, temporaries Complex address calculation Only load and store instructions can access memory Condition codes Similar to Y86 mrmovl and rmmovl Set as side effect of arithmetic and logical instructions No Condition codes Philosophy Test instructions return 0/1 in register Add instructions to perform “typical” programming tasks SPARC has condition codes Datorarkitektur 2007 F7A – 21 – Datorarkitektur 2007 F7A – 22 – CISC vs. RISC Summary Original Debate Y86 Instruction Set Architecture Similar state and instructions as IA32 Strong opinions! Simpler encodings CISC proponents---easy for compiler, fewer code bytes Somewhere between CISC and RISC RISC proponents---better for optimizing compilers, can make run How Important is ISA Design? fast with simple chip design Less now than before Current Status With enough hardware, can make almost anything go fast For desktop processors, choice of ISA not a technical issue Intel is moving away from IA32 With enough hardware, can make anything run fast Does not allow enough parallel execution Code compatibility more important Introduced IA64 » 64-bit word sizes (overcome address space limitations) » Radically different style of instruction set with explicit parallelism » Requires sophisticated compilers For embedded processors, RISC makes sense Smaller, cheaper, less power F7A – 23 – Datorarkitektur 2007 F7A – 24 – Datorarkitektur 2007