Introduction to Computer Architecture

Course number: CS141
Who? Tarun Soni (tsoni@cs.ucsd.edu)
TA: Wenjing Rao (wrao@cs) and Eric Liu (xeliu@cs)
Where? CENTR: 119
When? M, W @ 6-8:50pm
Textbook: Patterson and Hennessy, Computer Organization & Design: The Hardware/Software Interface, 2nd edition.
Web page: http://www-cse.ucsd.edu/users/tsoni/cse141 (slides, homework questions, other pointers and information)
Office hours:
  Tarun: Mon. 4pm-6pm, AP&M 3151
  Yang Yu and Wenjing Rao: TBD, look on the web page

CS141-L1-1 Tarun Soni, Summer '03

Today's Agenda
  Administrivia
  Technology trends
  Computer organization: concept of abstraction
  Instruction Set Architectures: definition, types, examples
  Instruction formats: operands, addressing modes
  Operations: load, store, arithmetic, logical
  Control instructions: branch, jump, procedures
  Stacks
  Examples: in-line code, procedure, nested procedures
  Other architectures

Schedule (sort of)
   1   6/30   Intro., Technology, ISA
   2   7/2    Performance, Cost, Arithmetic
   3   7/7    Multiply, Divide?, FP numbers
   4   7/9    Single cycle: Datapath, Control
   5   7/14   Multiple Cycle CPU, Microprogramming
   6   7/16   Mid-term quiz
   7   7/21   Pipelining: intro, control, exceptions
   8   7/23   Memory systems, Cache, Virtual memory
   9   7/28   I/O Devices
  10   7/30   Superscalars, Parallel machines
  11   ??     Overview, wrapup, catchup
  **   ??     Final, 7-10 pm, Friday

Grading
• Grade breakdown
– Mid-term (1.5 hours): 30%
– Final (3 hours): 40%
– Pop quizzes (3, 45 min each, only the 2 highest scores count): 30%
– Class participation: extras??
• Can't make exams: tell us early and we will work something out
• Homeworks do not need to be turned in. However, pop quizzes will be based on the homework.
• What is cheating?
– Studying together in groups is encouraged
– Work must be your own
– Common examples of cheating: copying an exam question from other material or another person...
– Better off to skip the question (small fraction of grade)
• Written/email request for changes to grades
– average grade will be a B or B+; set expectations accordingly

Why?
• You may become a practitioner someday?
  • Keeper of Moore's law
  • Architecture concepts are core to other sub-systems
    • Video processors
    • Security engines
    • Routing/networking etc.
• Even if you become a software geek?
  • Architecture enables a way of thinking
  • Understanding leads to breadth and better implementation of software

"Computer" of the day
  Jacquard loom, late 1700's, for weaving silk
  "Program" on punch cards
  "Microcode": each hole lifts a set of threads
  "Or gate": thread lifted if any controlling hole punched

Trends: Moore's law
Trends: $1000 will buy you…
Trends: Densities
Technology (Source: Intel Journal, May 2002)

Other technology trends
• Processor
– logic capacity: about 30% per year (physics advancement)
– clock rate: about 20% per year (architecture advancement)
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
• Disk
– capacity: about 60% per year
[Figure: two log-scale plots over time. Capacity: CPU logic capacity, DRAM capacity, disk capacity. Speed: CPU speed, DRAM speed.]

SPEC Performance
[Figure: SPEC performance by year, 1982 to 1995, scale 0 to 350; the RISC performance curve departs from the Intel x86 curve (35%/yr) at the RISC introduction.]
Performance now improves 50% per year (2x every 1.5 years)

Organization: A Basic Computer
Every computer has 5 basic components: Control, Datapath, Memory, Input, Output

Organization: A Basic Computer
• Not all "memory" is created equal
– Cache: fast (expensive) memory is placed closer to the processor
– Main memory: less expensive memory, so we can have more
[Figure: Proc, Caches, Busses, adapters, Memory, Controllers, I/O Devices: Disks, Displays, Keyboards, Networks]
• Input and output (I/O) devices have the messiest organization
– Wide range of speed: graphics vs. keyboard
– Wide range of requirements: speed, standard, cost ...
– Least amount of research (so far)

What is "Computer Architecture"
Computer Architecture = Instruction Set Architecture + Machine Organization
  Instruction Set Architecture: how you talk to the machine
  Machine Organization: what the machine looks like
Computer Architecture and Engineering
  Instruction Set Design: interfaces; compiler/system view
  Computer Organization: hardware components; logic designer's view

Architecture?
Layers, top to bottom: Application, Operating System, Compiler, Firmware, Instr. Set Proc., I/O system, Instruction Set Architecture, Datapath & Control, Digital Design, Circuit Design, Layout
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation

Levels of abstraction?
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  (Compiler)
Assembly Language Program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  (Assembler)
Machine Language Program:
    0000 1010 1100 0101 1001 1111 0110 1000
    1100 0101 1010 0000 0110 1000 1111 1001
    1010 0000 0101 1100 1111 1001 1000 0110
    0101 1100 0000 1010 1000 0110 1001 1111
  (Machine Interpretation)
Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK

Instruction Set Architecture
The ISA is the agreed-upon interface between all the software that runs on the machine and the hardware that executes it.
  software | instruction set | hardware

Example ISAs
• IBM360, VAX etc.
• Digital Alpha (v1, v3), 1992-97
• HP PA-RISC (v1.1, v2.0), 1986-96
• Sun Sparc (v8, v9), 1987-95
• SGI MIPS (MIPS I, II, III, IV, V), 1986-96
• Intel (8086, 80286, 80386, 80486, Pentium, MMX, ...), 1978-96
• ARM (ARM7, 8, StrongARM), 1995-
Digital Signal Processors also have an ISA: TMS320, Motorola, OAK etc.

ISAs
Instruction Set Architecture: "How to talk to computers if you aren't in Star Trek"

ISAs
• Language of the machine
• More primitive than higher-level languages, e.g., no sophisticated control flow
• Very restrictive, e.g., MIPS arithmetic instructions
• We'll be working with the MIPS instruction set architecture
– similar to other architectures developed since the 1980's
– used by NEC, Nintendo, Silicon Graphics, Sony
Design goals: maximize performance and minimize cost, reduce design time

ISAs
Ideally the only part of the machine visible to the programmer/compiler
• Available instructions (opcodes)
• Formats
• Registers, number and type
• Addressing modes, access mechanisms
• Exception conditions etc.

Instruction Set Architecture: What Must be Specified?
Execution cycle: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction
° Instruction format or encoding
– how is it decoded?
° Location of operands and result
– where other than memory?
– how many explicit operands?
– how are memory operands located?
– which can or cannot be in memory?
° Data type and size
° Operations
– what are supported
° Successor instruction
– jumps, conditions, branches
Fetch-decode-execute is implicit!

Vocabulary
• superscalar processor -- can execute more than one instruction per cycle.
• cycle -- smallest unit of time in a processor.
• parallelism -- the ability to do more than one thing at once.
• pipelining -- overlapping parts of a large task to increase throughput without decreasing latency

ISA Decisions
• operations
– how many?
– which ones
• operands
– how many?
– location
– types
– how to specify?
• instruction format
– size
– how many formats?
Example: y = x + b becomes add r1, r2, r5 (add is the operation, r1 the destination operand).
How does the computer know what 0001 0100 1101 1111 means?

Crafting an ISA
• We'll look at some of the decisions facing an instruction set architect, and how those decisions were made in the design of the MIPS instruction set.
• MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA.
– fixed instruction length
– few instruction formats
– load/store architecture
• RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.

Basic types of ISAs
Accumulator (1 register):
  1 address     add A          acc <- acc + mem[A]
  1+x address   addx A         acc <- acc + mem[A + x]
Stack:
  0 address     add            tos <- tos + next
General Purpose Register:
  2 address     add A B        EA(A) <- EA(A) + EA(B)
  3 address     add A B C      EA(A) <- EA(B) + EA(C)
Load/Store:
  3 address     add Ra Rb Rc   Ra <- Rb + Rc
                load Ra Rb     Ra <- mem[Rb]
                store Ra Rb    mem[Rb] <- Ra
Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
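The accumulator and stack classes above can be made concrete with a small sketch. The C below is not from the course material: the three-cell memory and the function names `run_accumulator` and `run_stack` are made up for illustration. Each function evaluates C = A + B the way the corresponding machine would and returns how many instructions it needed.

```c
#include <assert.h>

/* Hypothetical toy memory with three named cells (illustration only). */
enum { CELL_A, CELL_B, CELL_C, NUM_CELLS };

/* Accumulator machine: every instruction names one memory address. */
int run_accumulator(int mem[NUM_CELLS]) {
    int count = 0;
    int acc;
    acc = mem[CELL_A];        count++;   /* load  A : acc <- mem[A]       */
    acc = acc + mem[CELL_B];  count++;   /* add   B : acc <- acc + mem[B] */
    mem[CELL_C] = acc;        count++;   /* store C : mem[C] <- acc       */
    return count;
}

/* Stack machine: add names no operands, it works on the top of stack. */
int run_stack(int mem[NUM_CELLS]) {
    int stack[2];
    int sp = 0;                          /* index of the next empty slot  */
    int count = 0;
    stack[sp++] = mem[CELL_A];  count++; /* push A                        */
    stack[sp++] = mem[CELL_B];  count++; /* push B                        */
    sp--;                                /* add: tos <- tos + next        */
    stack[sp - 1] = stack[sp - 1] + stack[sp];  count++;
    mem[CELL_C] = stack[--sp];  count++; /* pop C                         */
    assert(sp == 0);                     /* stack is empty again          */
    return count;
}
```

Both routines leave the same sum in cell C; the stack version spends one extra instruction because its operands must be pushed explicitly before the zero-address add.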
Instruction Count
C = A + B
  Accumulator (1 register):   Load A;     Add B;      Store C
  Stack:                      Push A;     Push B;     Add;          Pop C
  General Purpose Register
  (register-memory):          Load R1,A;  Add R1,B;   Store C,R1
  Load/Store:                 Load R1,A;  Load R2,B;  Add R3,R1,R2; Store C,R3

Instruction Length
[Figure: instruction layouts for variable, fixed, and hybrid lengths]

MIPS Instructions
• All arithmetic instructions have 3 operands
• Operand order is fixed (destination first)
  C code:     A = B + C
  MIPS code:  add $s0, $s1, $s2     (registers are associated with variables by the compiler)

Instruction Length
• Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.
• Fixed-length instructions allow easy fetch and decode, and simplify pipelining and parallelism.
All MIPS instructions are 32 bits long.
– this decision impacts every other ISA decision we make because it makes instruction bits scarce.
• If code size is most important, use variable-length instructions
• If performance is most important, use fixed-length instructions
• Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); choose performance or density per procedure

MIPS Instruction Format
  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
  OP       rs       rt       rd       sa       funct
  OP       rs       rt       immediate (16 bits)
  OP       target (26 bits)
• the opcode tells the machine which format
• so add r1, r2, r3 has
– opcode=0, funct=32, rs=2, rt=3, rd=1, sa=0
– 000000 00010 00011 00001 00000 100000

Operands
• operands are generally in one of two places:
– registers (32 int, 32 fp)
– memory (2^32 locations)
• registers are
– easy to specify
– close to the processor (fast access)
• the idea that we want to access registers whenever possible led to load-store architectures.
– normal arithmetic instructions only access registers
– only access memory with explicit loads and stores

Load Store Architectures
Load-store architectures
  can do:    add r1 = r2 + r3   and   load r3, M(address)
  can't do:  add r1 = r2 + M(address)
  - more instructions
  + fast implementation (e.g., easy pipelining)
This forces heavy dependence on registers, which is exactly what you want in today's CPUs.
Expect a new instruction set architecture to use general purpose registers.
Pipelining => expect it to use the load-store variant of the GPR ISA.

General Purpose Registers
° Advantages of registers
• registers are faster than memory
• registers are easier for a compiler to use
  - e.g., (A*B) – (C*D) – (E*F) can do the multiplies in any order, vs. a stack
• registers can hold variables
  - memory traffic is reduced, so the program is sped up
  - code density improves (since a register is named with fewer bits than a memory location)

MIPS Registers
• Programmable storage
– 2^32 x bytes of memory
– 31 x 32-bit GPRs (R0 = 0)
– 32 x 32-bit FP regs (paired DP)
– HI, LO, PC
[Figure: register file r0 (= 0), r1, ..., r31, plus PC, lo, hi]

Memory Organization
• Viewed as a large, single-dimension array, with an address.
• A memory address is an index into the array
• "Byte addressing" means that the index points to a byte of memory.
  0   8 bits of data
  1   8 bits of data
  2   8 bits of data
  3   8 bits of data
  4   8 bits of data
  5   8 bits of data
  6   8 bits of data
  ...

Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
   0   32 bits of data
   4   32 bits of data
   8   32 bits of data
  12   32 bits of data
  ...
  Registers hold 32 bits of data
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ... 2^32 - 4
• Words are aligned, i.e., what are the least 2 significant bits of a word address?
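A short sketch of the word-addressing arithmetic above, assuming 32-bit byte addresses and 4-byte words as on the slide. The helper names `is_word_aligned` and `word_to_byte_addr` are illustrative and are not part of any MIPS toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* A word address is aligned when it is a multiple of 4,
   i.e. its two least significant bits are both 0. */
int is_word_aligned(uint32_t byte_addr) {
    return (byte_addr & 0x3u) == 0;
}

/* Byte address of the i-th 32-bit word (words sit at 0, 4, 8, ...). */
uint32_t word_to_byte_addr(uint32_t word_index) {
    assert(word_index < (1u << 30));   /* only 2^30 words fit in 2^32 bytes */
    return word_index << 2;            /* multiply by 4 */
}
```

For example, `word_to_byte_addr(3)` gives byte address 12, and the last word (index 2^30 - 1) lands at 2^32 - 4, matching the address range listed above.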
Data Types
Bit: 0, 1
Bit string: sequence of bits of a particular length
   4 bits is a nibble
   8 bits is a byte
  16 bits is a half-word
  32 bits is a word
  64 bits is a double-word
Character: ASCII 7-bit code
Decimal: digits 0-9 encoded as 0000b thru 1001b; two decimal digits packed per 8-bit byte
Integers: 2's complement
Floating point: single precision, double precision, extended precision
  Value = M x R^E (M = mantissa, R = base, E = exponent)
  How many +/- #'s? Where is the decimal pt? How are +/- exponents represented?

Operand Usage
Frequency of reference by size:
                Int Avg.   FP Avg.
  Doubleword       0%        69%
  Word            74%        31%
  Halfword        19%         0%
  Byte             7%         0%
Support data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers

Addressing: Endian-ness and alignment
• Big Endian: address of most significant byte = word address (xx00 = Big End of word)
– IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
• Little Endian: address of least significant byte = word address (xx00 = Little End of word)
– Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
[Figure: byte numbering within a word, msb to lsb. Little endian: bytes 3 2 1 0, so byte 0 is the lsb. Big endian: bytes 0 1 2 3, so byte 0 is the msb. Second panel: aligned vs. not aligned objects.]
Alignment: require that objects fall on an address that is a multiple of their size.

Addressing Modes
How do we specify the operand we want?
– Register direct        R3
– Immediate (literal)    #25
– Direct (absolute)      M[10000]
– Register indirect      M[R3]
– Base+Displacement      M[R3 + 10000]   (if the register is the program counter, this is PC-relative)
– Base+Index             M[R3 + R4]
– Scaled Index           M[R3 + R4*d + 10000]
– Autoincrement          M[R3++]
– Autodecrement          M[R3--]
– Memory Indirect        M[ M[R3] ]

Addressing Modes
  Addressing mode      Example              Meaning
  Register             Add R4,R3            R4 <- R4+R3
  Immediate            Add R4,#3            R4 <- R4+3
  Displacement         Add R4,100(R1)       R4 <- R4+Mem[100+R1]
  Register indirect    Add R4,(R1)          R4 <- R4+Mem[R1]
  Indexed / Base       Add R3,(R1+R2)       R3 <- R3+Mem[R1+R2]
  Direct or absolute   Add R1,(1001)        R1 <- R1+Mem[1001]
  Memory indirect      Add R1,@(R3)         R1 <- R1+Mem[Mem[R3]]
  Auto-increment       Add R1,(R2)+         R1 <- R1+Mem[R2]; R2 <- R2+d
  Auto-decrement       Add R1,–(R2)         R2 <- R2–d; R1 <- R1+Mem[R2]
  Scaled               Add R1,100(R2)[R3]   R1 <- R1+Mem[100+R2+R3*d]

Addressing Modes: Usage
3 programs measured on a machine with all address modes (VAX)
  --- Displacement:                  42% avg, 32% to 55%
  --- Immediate:                     33% avg, 17% to 43%
  --- Register deferred (indirect):  13% avg, 3% to 24%
  --- Scaled:                         7% avg, 0% to 16%
  --- Memory indirect:                3% avg, 1% to 6%
  --- Misc:                           2% avg, 0% to 3%
75% displacement & immediate
88% displacement, immediate & register indirect
Similar measurements:
- 16 bits is enough for the immediate address 75 to 80% of the time
- 16 bits is enough of a displacement 99% of the time.
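The 16-bit measurements above can be read as a range check. The sketch below assumes the 16-bit field is treated as a signed two's-complement value (that assumption and the helper names `fits_signed16` and `base_plus_displacement` are illustrative, not taken from the measurements), and shows how a base+displacement effective address would then be formed.

```c
#include <assert.h>
#include <stdint.h>

/* Does a constant fit in a signed 16-bit field (-32768 .. 32767)? */
int fits_signed16(int32_t value) {
    return value >= INT16_MIN && value <= INT16_MAX;
}

/* Effective address for base+displacement: M[base + disp]. */
uint32_t base_plus_displacement(uint32_t base, int32_t disp) {
    assert(fits_signed16(disp));       /* otherwise it cannot be encoded */
    return base + (uint32_t)disp;      /* unsigned add wraps modulo 2^32 */
}
```

A displacement of 0 turns this into register indirect addressing, and a base of 0 turns it into absolute addressing limited to what the 16-bit field can reach.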
Addressing mode usage: Application Specific
  Program   Base + Displacement   Immediate   Scaled Index   Memory Indirect   All Others
  TEX             56%                43%           0%              1%               0%
  Spice           58%                17%          16%              6%               3%
  GCC             51%                39%           6%              1%               3%

MIPS Addressing Modes
  register direct       add $1, $2, $3     fields: OP rs rt rd sa funct
  immediate             add $1, $2, #35    fields: OP rs rt immediate
  base + displacement   lw $1, disp($2)    fields: OP rs rt immediate
    register indirect: disp = 0
    absolute: (rs) = 0

MIPS ISA so far
• fixed 32-bit instructions
• 3 instruction formats
• 3-operand, load-store architecture
• 32 general-purpose registers (integer, floating point)
– R0 always equals 0.
• 2 special-purpose integer registers, HI and LO, because multiply and divide produce more than 32 bits.
• registers are 32 bits wide (word)
• register, immediate, and base+displacement addressing modes
But what about the actual instructions themselves??

Typical Operations (little change since 1960)
  Data Movement           Load (from memory); Store (to memory); memory-to-memory move; register-to-register move; input (from I/O device); output (to I/O device); push, pop (to/from stack)
  Arithmetic              integer (binary + decimal) or FP: Add, Subtract, Multiply, Divide
  Shift                   shift left/right, rotate left/right
  Logical                 not, and, or, set, clear
  Control (Jump/Branch)   unconditional, conditional
  Subroutine Linkage      call, return
  Interrupt               trap, return
  Synchronization         test & set (atomic r-m-w)
  String                  search, translate
  Graphics (MMX)          parallel subword ops (4 16-bit add)

80x86 Instruction usage
  Rank   Instruction              Integer average (percent of total executed)
   1     load                     22%
   2     conditional branch       20%
   3     compare                  16%
   4     store                    12%
   5     add                       8%
   6     and                       6%
   7     sub                       5%
   8     move register-register    4%
   9     call                      1%
  10     return                    1%
         Total                    96%
° Simple instructions dominate instruction frequency

Instruction usage
• Support the simple instructions, since they will
dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return.
Compiler issues
  orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
  completeness: support for a wide range of operations and target applications
  regularity: no overloading for the meanings of instruction fields
  streamlined: resource needs easily determined
Register assignment is critical too: easier if there are lots of registers.

MIPS Instructions
• arithmetic
– add, subtract, multiply, divide
• logical
– and, or, shift left, shift right
• data transfer
– load word, store word
• conditional branch
• unconditional jump

MIPS Instructions
• arithmetic
– add, subtract, multiply, divide
  Instruction         Example           Meaning                         Comments
  add                 add $1,$2,$3      $1 = $2 + $3                    3 operands; exception possible
  subtract            sub $1,$2,$3      $1 = $2 – $3                    3 operands; exception possible
  add immediate       addi $1,$2,100    $1 = $2 + 100                   + constant; exception possible
  add unsigned        addu $1,$2,$3     $1 = $2 + $3                    3 operands; no exceptions
  subtract unsigned   subu $1,$2,$3     $1 = $2 – $3                    3 operands; no exceptions
  add imm. unsign.    addiu $1,$2,100   $1 = $2 + 100                   + constant; no exceptions
  multiply            mult $2,$3        Hi, Lo = $2 x $3                64-bit signed product
  multiply unsigned   multu $2,$3       Hi, Lo = $2 x $3                64-bit unsigned product
  divide              div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3    Lo = quotient, Hi = remainder
  divide unsigned     divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3    Unsigned quotient & remainder
  move from Hi        mfhi $1           $1 = Hi                         Used to get copy of Hi
  move from Lo        mflo $1           $1 = Lo                         Used to get copy of Lo

MIPS Instructions
• logical
– and, or, shift left, shift right
  Instruction           Example          Meaning           Comment
  and                   and $1,$2,$3     $1 = $2 & $3      3 reg. operands; logical AND
  or                    or $1,$2,$3      $1 = $2 | $3      3 reg. operands; logical OR
  xor                   xor $1,$2,$3     $1 = $2 ⊕ $3      3 reg. operands; logical XOR
  nor                   nor $1,$2,$3     $1 = ~($2 | $3)   3 reg. operands; logical NOR
  and immediate         andi $1,$2,10    $1 = $2 & 10      Logical AND reg, constant
  or immediate          ori $1,$2,10     $1 = $2 | 10      Logical OR reg, constant
  xor immediate         xori $1,$2,10    $1 = $2 ⊕ 10      Logical XOR reg, constant
  shift left logical    sll $1,$2,10     $1 = $2 << 10     Shift left by constant
  shift right logical   srl $1,$2,10     $1 = $2 >> 10     Shift right by constant
  shift right arithm.   sra $1,$2,10     $1 = $2 >> 10     Shift right (sign extend)
  shift left logical    sllv $1,$2,$3    $1 = $2 << $3     Shift left by variable
  shift right logical   srlv $1,$2,$3    $1 = $2 >> $3     Shift right by variable
  shift right arithm.   srav $1,$2,$3    $1 = $2 >> $3     Shift right arith. by variable

MIPS Instructions
• data transfer
– load word, store word
  Instruction        Comment
  SW 500(R4), R3     Store word
  SH 502(R2), R3     Store half
  SB 41(R3), R2      Store byte
  LW R1, 30(R2)      Load word
  LH R1, 40(R3)      Load halfword
  LHU R1, 40(R3)     Load halfword unsigned
  LB R1, 40(R3)      Load byte
  LBU R1, 40(R3)     Load byte unsigned
  LUI R1, 40         Load Upper Immediate (16 bits shifted left by 16)
Why do we need LUI?
  LUI R5   =>   R5 = [ 16-bit immediate | 0000 … 0000 ]

MIPS Control Instructions
• How do you specify the destination of a branch/jump?
• Studies show that almost all conditional branches go short distances from the current program counter (loops, if-then-else).
– we can specify a relative address in far fewer bits than an absolute address
– e.g., beq $1, $2, 100 => if ($1 == $2) PC = PC + 100 * 4
• How do we specify the condition of the branch?
° Condition Codes: processor status bits are set as a side effect of arithmetic instructions (possibly on moves) or explicitly by compare or test instructions.
    add r1, r2, r3
    bz label
° Condition Register
    cmp r1, r2, r3
    bgt r1, label
° Compare and Branch
    bgt r1, r2, label

Conditional Branch Distance
Series: Int. Avg., FP Avg.
[Figure: frequency of conditional branches (0% to 40%) against bits of branch displacement (0 to 15)]

Conditional Branching
• PC-relative since most branches are relatively close to the current PC address
• At least 8 bits suggested (± 128 instructions)
• Compare Equal/Not Equal most important for integer programs (86%)
Frequency of comparison types in branches:
            Int Avg.   FP Avg.
  LT/GE        7%        40%
  GT/LE        7%        23%
  EQ/NE       86%        37%

Conditional Branching
• Compare and Branch
– BEQ rs, rt, offset     if R[rs] == R[rt] then PC-relative branch
– BNE rs, rt, offset     <>
• Compare to zero and Branch
– BLEZ rs, offset        if R[rs] <= 0 then PC-relative branch
– BGTZ rs, offset        >
– BLTZ                   <
– BGEZ                   >=
– BLTZAL rs, offset      if R[rs] < 0 then branch and link (into R31)
– BGEZAL                 >=
• Remaining set of compare and branch take two instructions
• Almost all comparisons are against zero!

MIPS Branch Instructions
• beq, bne
    beq r1, r2, addr => if (r1 == r2) goto addr
• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0
• these, combined with $0, can implement all fundamental branch conditions
– Always, never, !=, ==, >, <=, >=, <, > (unsigned), <= (unsigned), ...

Jumps
• need to be able to jump to an absolute address sometimes
• need to be able to do procedure calls and returns
• jump -- j 10000 => PC = 10000
• jump and link -- jal 10000 => $31 = PC + 4; PC = 10000
– used for procedure calls
• jump register -- jr $31 => PC = $31
– used for returns, but can be useful for lots of other things.

Jumps
MIPS Instruction Formats
       6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
  R    OP       rs       rt       rd       sa       funct
  I    OP       rs       rt       immediate (16 bits)
  J    OP       target
MIPS Addressing Formats: Branches and Jumps
• Branch (e.g., beq) uses PC-relative addressing mode (few bits needed if the address is typically close). It uses base+displacement mode, with the PC being the base.
• Jump uses pseudo-direct addressing mode.
26 bits of the address are in the instruction, the rest is taken from the PC.
[Figure: the 26-bit field of the jump instruction is combined with bits of the program counter to form the jump destination address]

MIPS Branch & Jump Instructions
  Instruction           Example           Meaning                           Comment
  branch on equal       beq $1,$2,100     if ($1 == $2) go to PC+4+100      Equal test; PC-relative branch
  branch on not eq.     bne $1,$2,100     if ($1 != $2) go to PC+4+100      Not equal test; PC-relative
  set on less than      slt $1,$2,$3      if ($2 < $3) $1=1; else $1=0      Compare less than; 2's comp.
  set less than imm.    slti $1,$2,100    if ($2 < 100) $1=1; else $1=0     Compare < constant; 2's comp.
  set less than uns.    sltu $1,$2,$3     if ($2 < $3) $1=1; else $1=0      Compare less than; natural numbers
  set l. t. imm. uns.   sltiu $1,$2,100   if ($2 < 100) $1=1; else $1=0     Compare < constant; natural numbers
  jump                  j 10000           go to 10000                       Jump to target address
  jump register         jr $31            go to $31                         For switch, procedure return
  jump and link         jal 10000         $31 = PC + 4; go to 10000         For procedure call

Stacks
Stacking of subroutine calls & returns and environments:
[Figure: A calls B, B calls C, then C and B return; the stack of environments grows A, A B, A B C and shrinks back to A B, A]
Some machines provide a memory stack as part of the architecture (e.g., VAX)
Sometimes stacks are implemented via software convention (e.g., MIPS)

Stacks
Useful for stacked environments/subroutine call & return even if an operand stack is not part of the architecture
Stacks that grow up vs. stacks that grow down:
[Figure: stack holding a, b, c between address 0 (little) and inf. (big), growing up or growing down; SP points at either the next empty or the last full slot]
Memory addresses run little --> big. Two conventions for SP:
  Last Full:    POP:  Read from Mem(SP); Decrement SP
                PUSH: Increment SP; Write to Mem(SP)
  Next Empty:   POP:  Decrement SP; Read from Mem(SP)
                PUSH: Write to Mem(SP); Increment SP

Stack Frames
Frame layout, high memory to low memory:
  ARGS
  Callee Save Registers (old FP, RA)
  Local Variables                       <- FP
  (expression evaluation area)          <- SP
Reference args and local variables at a fixed (positive) offset from FP.
The region below FP grows and shrinks during expression evaluation.
• Many variations on stacks possible (up/down, last pushed / next)
• Block-structured languages contain a link to the lexically enclosing frame
• Compilers normally keep scalar variables in registers, not memory!

MIPS Software Register Conventions
   0        zero      constant 0
   1        at        reserved for assembler
   2-3      v0-v1     expression evaluation & function results
   4-7      a0-a3     arguments
   8-15     t0-t7     temporary: caller saves (callee can clobber)
  16-23     s0-s7     callee saves (caller can clobber)
  24-25     t8-t9     temporary (cont'd)
  26-27     k0-k1     reserved for OS kernel
  28        gp        pointer to global area
  29        sp        stack pointer
  30        fp        frame pointer
  31        ra        return address (HW)

MIPS operands
  Name                Example                                                                Comments
  32 registers        $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at     Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
  2^30 memory words   Memory[0], Memory[4], ..., Memory[4294967292]                          Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

MIPS assembly language
  Category             Instruction               Example               Meaning                                    Comments
  Arithmetic           add                       add $s1, $s2, $s3     $s1 = $s2 + $s3                            Three operands; data in registers
                       subtract                  sub $s1, $s2, $s3     $s1 = $s2 - $s3                            Three operands; data in registers
                       add immediate             addi $s1, $s2, 100    $s1 = $s2 + 100                            Used to add constants
  Data transfer        load word                 lw $s1, 100($s2)      $s1 = Memory[$s2 + 100]                    Word from memory to register
                       store word                sw $s1, 100($s2)      Memory[$s2 + 100] = $s1                    Word from register to memory
                       load byte                 lb $s1, 100($s2)      $s1 = Memory[$s2 + 100]                    Byte from memory to register
                       store byte                sb $s1, 100($s2)      Memory[$s2 + 100] = $s1                    Byte from register to memory
                       load upper immediate      lui $s1, 100          $s1 = 100 * 2^16                           Loads constant in upper 16 bits
  Conditional branch   branch on equal           beq $s1, $s2, 25      if ($s1 == $s2) go to PC + 4 + 100         Equal test; PC-relative branch
                       branch on not equal       bne $s1, $s2, 25      if ($s1 != $s2) go to PC + 4 + 100         Not equal test; PC-relative
                       set on less than          slt $s1, $s2, $s3     if ($s2 < $s3) $s1 = 1; else $s1 = 0       Compare less than; for beq, bne
                       set less than immediate   slti $s1, $s2, 100    if ($s2 < 100) $s1 = 1; else $s1 = 0       Compare less than constant
  Unconditional jump   jump                      j 2500                go to 10000                                Jump to target address
                       jump register             jr $ra                go to $ra                                  For switch, procedure return
                       jump and link             jal 2500              $ra = PC + 4; go to 10000                  For procedure call

Example: Swap()
• Can we figure out the code?
    swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    swap:                       // $4 = v, $5 = k
        sll  $2, $5, 2          // $2 = k*4
        add  $2, $4, $2         // $2 = v+(4*k)
        lw   $15, 0($2)         // $15 = temp = *($2+0) = *(v+k)
        lw   $16, 4($2)         // $16 = *($2+4) = *(v+k+1)
        sw   $16, 0($2)         // *(v+k) = $16 = *(v+k+1)
        sw   $15, 4($2)         // *(v+k+1) = $15 = temp
        jr   $31                // return;

Example: Leaf_procedure()
• Procedures?
    int PairDiff(int a, int b, int c, int d)
    {
        int temp;
        temp = (a+b)-(c+d);
        return temp;
    }

Assume the caller puts a, b, c, d in $a0-$a3 and wants the result in $v0.

    PairDiff:
        addi $sp, $sp, -12      // make space for 3 temp locations
        sw   $t1, 8($sp)        // save $t1 (optional if MIPS convention)
        sw   $t0, 4($sp)        // save $t0 (optional if MIPS convention)
        sw   $s0, 0($sp)        // save $s0
        add  $t0, $a0, $a1      // (t0 = a+b)
        add  $t1, $a2, $a3      // (t1 = c+d)
        sub  $s0, $t0, $t1      // (s0 = t0-t1)
        add  $v0, $s0, $zero    // store return value in $v0
        lw   $s0, 0($sp)        // restore registers
        lw   $t0, 4($sp)        // (optional if MIPS convention)
        lw   $t1, 8($sp)        // (optional if MIPS convention)
        addi $sp, $sp, 12       // 'pop' the stack
        jr   $ra                // the actual return to the calling routine

Example: Nested_procedure()
• What about nested procedures? $ra ??
• Recursive procedures?
    int fact(int n)
    {
        if (n < 1) return (1);
        else return (n * fact(n-1));
    }

Assume $a0 = n.

    fact:
        addi $sp, $sp, -8       // make space for 2 temp locations
        sw   $ra, 4($sp)        // save return address
        sw   $a0, 0($sp)        // save argument n
        slti $t0, $a0, 1        // test for n<1
        beq  $t0, $zero, L1     // if (n>=1) goto L1
        addi $v0, $zero, 1      // (n<1) case: $v0 = 1
        addi $sp, $sp, 8        // 'pop' the stack
        jr   $ra                // return
    L1: addi $a0, $a0, -1       // (n>=1) case: n--
        jal  fact               // call fact again
        lw   $a0, 0($sp)        // fact() returns here; restore n
        lw   $ra, 4($sp)        // restore return address
        addi $sp, $sp, 8        // 'pop' stack
        mul  $v0, $a0, $v0      // $v0 = n*fact(n-1)
        jr   $ra                // return to caller

Other Architectures
• Design alternative:
– provide more powerful operations (e.g., DSP, encryption engines, Java processors)
– goal is to reduce the number of instructions executed
– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as "RISC vs. CISC"
– virtually all new instruction sets since 1982 have been RISC
– VAX: minimize code size, make assembly language easy; instructions from 1 to 54 bytes long!
• We'll look at PowerPC and 80x86

Power PC
• Indexed addressing
– example: lw $t1,$a0+$s3        // $t1=Memory[$a0+$s3]
– What do we have to do in MIPS?
• Update addressing
– update a register as part of load (for marching through arrays)
– example: lwu $t0,4($s3)        // $t0=Memory[$s3+4]; $s3=$s3+4
– What do we have to do in MIPS?
• Others:
– load multiple/store multiple
– a special counter register: "bc Loop" decrements the counter and, if it is not 0, goes to Loop

x86: Volume is beautiful
• 1978: The Intel 8086 is announced (16-bit architecture)
• 1980: The 8087 floating point coprocessor is added
• 1982: The 80286 increases address space to 24 bits, + instructions
• 1985: The 80386 extends to 32 bits, new addressing modes
• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
• 1997: MMX is added
"This history illustrates the impact of the "golden handcuffs" of compatibility"
"adding new features as someone might add clothing to a packed bag"
"an architecture that is difficult to explain and impossible to love"
"what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective"

x86: Complex Instruction Set
• See text for a detailed description….
• Complexity:
– Instructions from 1 to 17 bytes long
– one operand must act as both a source and destination
– one operand can come from memory
– complex addressing modes, e.g., "base or scaled index with 8 or 32 bit displacement"
• Saving grace:
– the most frequently used instructions are not too difficult to build
– compilers avoid the portions of the architecture that are slow

Comparing Instruction Set Architectures
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static Metrics:
° How many bytes does the program occupy in memory?
Dynamic Metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best Metric: Time to execute the program!
This depends on instruction set, processor organization, and compilation techniques.
[Figure: triangle relating Inst. Count, CPI, and Cycle Time]

Instruction Set Architectures: What did we learn today?
• MIPS is a general-purpose register, load-store, fixed-instruction-length architecture.
• MIPS is optimized for fast pipelined performance, not for low instruction count
• Four principles of IS architecture
– simplicity favors regularity
– smaller is faster
– good design demands compromise
– make the common case fast
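The "time to execute the program" metric from the comparison slide is the product of its three factors: instruction count, CPI, and cycle time. A minimal sketch follows; the function name `execution_time` and the choice of `double` for all three inputs are assumptions made for illustration.

```c
#include <assert.h>

/* Execution time = instruction count x cycles per instruction x cycle time.
   cycle_time is in seconds per cycle, so the result is in seconds. */
double execution_time(double instruction_count, double cpi, double cycle_time) {
    assert(instruction_count >= 0 && cpi > 0 && cycle_time > 0);
    return instruction_count * cpi * cycle_time;
}
```

This is one way to read the load-store tradeoff noted earlier in the lecture: such a design tends to execute more instructions, and it can still come out ahead if the simpler instructions allow a lower CPI or a shorter cycle time.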