Instruction Set Architecture
or "How to talk to computers if you aren't in Star Trek"

Announcements
• groups
• homework
• 1st day reading (2.6-2.10)
• 2nd day reading (1.4)
** Bring green sheets

Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

What is Computer Architecture?

Computer Architecture = Machine Organization + Instruction Set Architecture
• Machine Organization: what the machine looks like
• Instruction Set Architecture: how you talk to the machine

How to speak computer?

1.  lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

2.  1000110001100010000000000000000
    1000110011110010000000000000100
    1010110011110010000000000000000
    1010110001100010000000000000100

3.  ALUOP[0:3] <= InstReg[9:11] & MASK

4.  temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

Selection   High Level   Assembly Language   Machine Language   Control Signal Spec
A           4            2                   1                  3
B           4            1                   2                  3
C           3            1                   2                  4
D           3            2                   1                  4
E           None of the above

How to Speak Computer

High Level Language Program
    (Compiler)
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
    (Assembler)
Machine Language Program
    1000110001100010000000000000000
    1000110011110010000000000000100
    1010110011110010000000000000000
    1010110001100010000000000000100
    (Machine Interpretation)
Control Signal Spec
    ALUOP[0:3] <= InstReg[9:11] & MASK

Notes: Discuss what classes cover these parts + what they will do in the lab.

The Instruction Set Architecture

• is the agreed-upon interface between all the software that runs on the machine and the hardware that executes it.

[Figure labels: Application; Operating System; Compiler; Instr. Set Proc.; I/O system; Instruction Set Architecture; Digital Design; Circuit Design]

Notes: More in Chapter 2. Talk about impact of legacy code.
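
The assembler step on the "How to Speak Computer" slide can be sketched in a few lines. This is a minimal illustration, assuming the standard MIPS I-format layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) and the standard opcodes for lw and sw; the names `encode_i_type`, `OPCODES`, and `program` are made up for this example. The bit strings printed on the slide are reproduced as extracted, so the output here need not match them digit for digit.

```python
# Minimal sketch of the assembler step for MIPS I-format loads and stores.
# Field layout: opcode (6 bits) | rs (5) | rt (5) | immediate (16).
# encode_i_type, OPCODES and program are hypothetical names for this example.

OPCODES = {"lw": 0b100011, "sw": 0b101011}  # standard MIPS opcodes


def encode_i_type(mnemonic, rt, offset, rs):
    """Encode `mnemonic $rt, offset($rs)` as a 32-bit machine word."""
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register numbers must be in 0..31")
    if not (-(1 << 15) <= offset < (1 << 15)):
        raise ValueError("offset must fit in a signed 16-bit immediate")
    imm = offset & 0xFFFF  # two's-complement view of the 16-bit immediate
    return (OPCODES[mnemonic] << 26) | (rs << 21) | (rt << 16) | imm


# The four-instruction swap from the slide: (mnemonic, rt, offset, rs).
program = [
    ("lw", 15, 0, 2),
    ("lw", 16, 4, 2),
    ("sw", 16, 0, 2),
    ("sw", 15, 4, 2),
]

words = [encode_i_type(*instr) for instr in program]
for (mnemonic, rt, offset, rs), word in zip(program, words):
    print(f"{mnemonic} ${rt}, {offset}(${rs}) -> {word:032b}")
```

Each printed line pairs one assembly instruction with its 32-bit encoding, which is the same one-to-one mapping the assembler box in the diagram stands for.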

The Instruction Set Architecture

• that part of the architecture that is visible to the programmer
  - opcodes (available instructions)
  - number and types of registers
  - instruction formats
  - storage access, addressing modes
  - exceptional conditions

Notes: You will make this in the lab!

Which of the following statements is generally true about ISAs?

Select   Statement
A        Many models of processors can support one ISA.
B        An ISA is unique to one model of processor.
C        Every processor supports multiple ISAs.
D        Each processor manufacturer has its own unique ISA.
E        None of the above

Examples of ISAs

• Intel 80x86/pentium
• VAX
• MIPS
• SPARC
• Alpha AXP
• IBM 360
• Intel IA-64 (Itanium)
• PowerPC
• IBM Cell SPE

Notes: Main point: many machines map to 1 ISA.

Computer Organization

• Once you have decided on an ISA, you must decide how to design the hardware to execute programs written in the ISA as fast as possible
  - (or as cheaply as possible, or using as little power as possible, …).
• This must be done every time a new implementation of the architecture is released, typically with very different technological constraints.

Key ISA decisions

Notes: You've learned ISAs; now we're focusing on DESIGNING ISAs.

• operations
  - how many?
  - which ones?
• operands
  - how many?
  - location
  - types
  - how to specify?
• instruction format
  - size
  - how many formats?

Example: y = x + b (add r1, r2, r5) has an operation (add), a destination operand (y) and source operands (x, b).
How does the computer know what 0001 0100 1101 1111 means?

Your architecture supports 16 instructions and 16 registers (0-15). You have fixed width instructions which are 16 bits. How many register operands can you specify (explicitly) in an add instruction?

Selection   operands
A           <= 1
B           <= 2
C           <= 3
D           <= 4
E           None of the above

Notes: LAB!!! (Specify vs. have - implicit.) Just have them discuss for ~1 min.

Your architecture supports 16 instructions and 32 registers (0-31). You have fixed width instructions which are 16 bits.
How many register operands can you specify (explicitly) in an add instruction?

Selection   operands
A           <= 1
B           <= 2
C           <= 3
D           <= 4
E           None of the above

Notes: LAB!!! Just have them discuss for ~1 min. Also point out impact of 32 instructions and MIPS funct trick.

Instruction Formats - what does each bit mean?

• Having many different instruction formats...
  • complicates decoding
  • uses more instruction bits (to specify the format)

[Figure: VAX 11 instruction format; serial decoding]

MIPS Instruction Formats

6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
opcode   rs       rt       rd       sa       funct
opcode   rs       rt       immediate
opcode   target

Notes: Would be really hard to have fewer than three. Much more than three makes decoding more serial, again.

MIPS Instruction Formats

         6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
R-type   opcode   rs       rt       rd       sa       funct
I-type   opcode   rs       rt       immediate
J-type   opcode   target

Selection   R-type   I-type   J-type
A           addi     sw       jr
B           addi     sub      jal
C           add      sw       jr
D           add      sub      jal
E           None of the above

Convert this MIPS machine instruction to assembly:

0010 0001 0001 0000 0000 0000 0010 0010

Selection   Instruction
A           addi r16, r8, #34
B           addi r8, r16, #34
C           sub r8, r16, r31
D           sub r31, r8, r16
E           None of the above

Notes: ISOMORPHIC. A is correct.

Convert this MIPS machine instruction to assembly:

1000 1101 0001 0000 0000 0000 0000 1000

Selection   Instruction
A           jr r16
B           jr r8
C           lw r16, 8(r8)
D           lw r8, 8(r16)
E           None of the above

Notes: ISOMORPHIC. C is correct.

Convert this MIPS machine instruction to assembly:

0000 0001 0001 0000 1111 1000 0010 0100

Selection   Instruction
A           and r31, r8, r16
B           and r8, r16, r31
C           add r8, r16, r31
D           sub r31, r8, r16
E           None of the above

Notes: A is correct.

Accessing the Operands

There are typically two locations for operands: registers (internal storage - $t0, $a0) and memory. In each column we have which - reg or mem - is better. Which row is correct?

Selection   Faster access   Fewer bits to specify   More locations
A           Mem             Mem                     Reg
B           Mem             Reg                     Mem
C           Reg             Mem                     Reg
D           Reg             Reg                     Mem
E           None of the above

Notes: Explain each, but then point out how this leads to load/store.

Load-store architectures

can do:
    add r1 = r2 + r3
    and
    load r3, M(address)
can't do:
    add r1 = r2 + M(address)

• forces heavy dependence on registers, which is exactly what you want in today's CPUs
  - more instructions + fast implementation (e.g., easy pipelining)

Notes: Which is it that we care about? Why else can't do address? (hint: fixed instruction length)

How Many Operands? Basic ISA Classes

Accumulator:
    1 address    add A           acc <- acc + mem[A]
Stack:
    0 address    add             tos <- tos + next
General Purpose Register:
    2 address    add A B         EA(A) <- EA(A) + EA(B)
    3 address    add A B C       EA(A) <- EA(B) + EA(C)
Load/Store:
    3 address    add Ra Rb Rc    Ra <- Rb + Rc
                 load Ra Rb      Ra <- mem[Rb]
                 store Ra Rb     mem[Rb] <- Ra

Notes: Talk through each one, just describe what it is. Not in the text, but on the CD. Be sure to mention stacks often use internal registers.

A = BC+BY

Stack     Acc          Reg-Mem     Reg-Reg
push B    load B       R1 = B*Y    R1 = B
push Y    mult Y       R2 = B*C    R2 = C
mult      store temp   A = R1+R2   R3 = Y
push B    load B                   R4 = R1*R2
push C    mult C                   R5 = R1*R3
mult      add temp                 R6 = R4+R5
add       store A                  A = R6
pop A

A = BC+XY

Stack     Acc          Reg-Mem     Reg-Reg
push B    load B       R1 = B*C    R1 = B
push C    mult C       R2 = X*Y    R2 = C
mult      store temp   A = R1+R2   R3 = X
push X    load X                   R4 = Y
push Y    mult Y                   R5 = R1*R2
mult      add temp                 R6 = R3*R4
add       store A                  R7 = R5+R6
pop A                              A = R7

In an alternative universe, memory is VERY SLOW to access relative to registers (internal storage). Which ISA would you most likely find in this universe?

A. Stack
B. Accumulator
C. Reg-Reg
D. Accumulator and Reg-Mem
E. Stack and Accumulator

Notes: ISOMORPHIC. Explain that these are sample codes, but the question is in general.

A = BC+XY

Stack     Acc          Reg-Mem     Reg-Reg
push B    load B       R1 = B*C    R1 = B
push C    mult C       R2 = X*Y    R2 = C
mult      store temp   A = R1+R2   R3 = X
push X    load X                   R4 = Y
push Y    mult Y                   R5 = R1*R2
mult      add temp                 R6 = R3*R4
add       store A                  R7 = R5+R6
pop A                              A = R7

In an alternative universe, registers (internal storage) are very expensive, and memory is not as slow. Which ISA would you most likely find in this universe?

A. Stack
B. Accumulator
C. Reg-Reg
D. Reg-Mem and Stack
E. Both Reg-Reg and Reg-Mem

Match the addressing mode with its example:

Option   Example/description
1        Memory[R7]
2        R7
3        Memory[R7 + 1000]
4        Memory[1000]
5        1000

Selection   Immediate   Register Direct   Memory Direct   Register Indirect   Base + displacement
A           1           2                 3               4                   5
B           3           4                 5               1                   2
C           3           1                 2               4                   5
D           2           4                 5               1                   3
E           None of the above

Notes: I'm not a big fan of memorizing "terms" but these are used frequently enough that you need to know them.

MIPS addressing modes

Format: OP | rs | rt | immediate

• immediate: add $1, $2, #35
• base + displacement: lw $1, disp($2)    (R1 = M[R2 + disp])
• register indirect: disp/immediate = 0
• absolute: (rs) = 0

Notes: give flexibility; get reg ind and absolute for free.

Memory Organization

• Viewed as a large, single-dimension array, with an address.
• A memory address is an index into the array.
• "Byte addressing" means that the index points to a byte of memory.

0   8 bits of data
1   8 bits of data
2   8 bits of data
3   8 bits of data
4   8 bits of data
5   8 bits of data
6   8 bits of data
...

Processor X is 16 bit byte-addressable. Suppose you have a pointer at address 0000 0000 0000 1000 and you increment it by one (0000 0000 0000 1001). What does the new pointer (0000 0000 0000 1001) point to, relative to the original pointer (0000 0000 0000 1000)?

Notes: ISOMORPHIC. B is correct.

A) The next word in memory
B) The next byte in memory
C) Either the next word or byte; depends on if you use that address for a load byte or load word
D) Pointers are a high level construct; they don't make sense pointing to raw memory addresses.
E) None of the above.

Processor Y is 14 bit word-addressable. Suppose you have a pointer at address 00 0000 0000 1000 and you increment it by one (00 0000 0000 1001). What does the new pointer (00 0000 0000 1001) point to, relative to the original pointer (00 0000 0000 1000)?

Notes: ISOMORPHIC. A is correct.

A) The next word in memory
B) The next byte in memory
C) Either the next word or byte; depends on if you use that address for a load byte or load word
D) Pointers are a high level construct; they don't make sense pointing to raw memory addresses.
E) None of the above.

Reading Quiz Variant

• You have the following code in C:

    for (int i = 0; i < 10; i++) {
        A[i] = i;
    }

Where A is an array of shorts (16 bit/half-word). Let's suppose that the base address of A is in $s0 and has the value of 1000 (base ten). What byte address(es) correspond to A[5]?

Selection   Byte(s)
A           1005
B           1010
C           1005-1006
D           1010-1011
E           None of the above

Notes: Byte vs. word addressing can be problematic to think about; the book likes to jump around between them. I'll try to be consistent in class.

Memory Organization

• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.

0    32 bits of data
4    32 bits of data
8    32 bits of data
12   32 bits of data

Registers hold 32 bits of data

• 2^32 bytes with byte addresses from 0 to 2^32-1
• 2^30 words with byte addresses 0, 4, 8, ... 2^32-4
• Words are aligned, i.e., what are the 2 least significant bits of a word address?

MIPS Review

Notes: All our processors will be MIPS based, so just want to go over it.

The MIPS ISA, so far

• fixed 32-bit instructions
• 3 instruction formats
• 3-operand, load-store architecture
• 32 general-purpose registers (integer, floating point)
  - R0 always equals 0.
• 2 special-purpose integer registers, HI and LO, because multiply and divide produce more than 32 bits.
• registers are 32 bits wide (word)
• register, immediate, and base+displacement addressing modes

What's left

• which instructions?
• odds and ends

Which instructions?

• arithmetic
• logical
• data transfer
• conditional branch
• unconditional jump

Which instructions (integer)

• arithmetic
  - add, subtract, multiply, divide
• logical
  - and, or, shift left, shift right
• data transfer
  - load word, store word

Control Flow

• Jumps
• Procedure call (jump subroutine)
• Conditional Branch
  - Used to implement, for example, if-then-else logic, loops, etc.
• A conditional branch must specify two things
  - Condition under which the branch is taken
  - Location that the branch jumps to if taken (target)

What form of addressing is used by branch instructions?

Selection   Addressing Mode: Best explanation
A           Absolute: Branch instructions require a full 32-bit address to know the branch target
B           Absolute: A 32-bit immediate gives us enough bits to specify a full address
C           Relative: Branches tend to be backward branches which require a negative immediate
D           Relative: Branch targets tend to be close to the branch instruction
E           Register Indirect: We can load a 32 bit full address in a register, which lets us branch anywhere

Notes: Mention C isn't incorrect, and branches tend to be backward, but if they weren't close we couldn't do relative.

High level code often has code like this:

    if (i < j) {
        i++;
    }

Assume $t0 has i and $t1 has j. (slt rd, rs, rt does: R[rd] = 1 if R[rs] < R[rt], else R[rd] = 0.) Which of the following is the correct translation of the above code to MIPS assembly (recall $zero is always 0):

A
    slt $t2, $t0, $t1
    bne $t2, $zero, false
    addi $t0, $t0, 1
    false: next instruction

B
    slt $t2, $t1, $t0
    bne $t2, $zero, true
    true: addi $t0, $t0, 1
    next instruction

C
    slt $t2, $t1, $t0
    beq $t2, $zero, false
    addi $t0, $t0, 1
    false: next instruction

D
    slt $t2, $t0, $t1
    beq $t2, $zero, false
    addi $t0, $t0, 1
    false: next instruction

E
    None of the above

Notes: Mention useless bne in B. D is correct.

Jump instructions have a 26-bit immediate. Since an address must be word aligned we can always use 00 as the lowest bits.
To make a 32 bit address, what are the top 4 bits?

Selection   Top 4 bits: Why (best answer)
A           0000: All instructions are at low addresses
B           pc[3:0]: Low order bits of the current program counter best represent the high bits of the new pc
C           pc[31:28]: High order bits of the current program counter since most jumps go to (relatively) nearby instructions
D           pc[31:28]: All instructions must be in the same 256MB segment.
E           None of the above

What is the most common use of a jal instruction and why?

Selection   Most common use: Best answer
A           Procedure call: Jal stores the next instruction in your current function so the called function knows where to return to.
B           Procedure call: Jal enables a long jump and most procedures are a fairly long distance away
C           If/else: Jal lets you go to the if while storing pc+4 (else)
D           If/else: Jal enables a long branch and most if statements are a fairly long distance away
E           None of the above

Notes: Mention jr.

To summarize:

MIPS operands

Name | Example | Comments
32 registers | $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at | Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
2^30 memory words | Memory[0], Memory[4], ..., Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

MIPS assembly language

Category | Instruction | Example | Meaning | Comments
Arithmetic | add | add $s1, $s2, $s3 | $s1 = $s2 + $s3 | Three operands; data in registers
Arithmetic | subtract | sub $s1, $s2, $s3 | $s1 = $s2 - $s3 | Three operands; data in registers
Arithmetic | add immediate | addi $s1, $s2, 100 | $s1 = $s2 + 100 | Used to add constants
Data transfer | load word | lw $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Word from memory to register
Data transfer | store word | sw $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Word from register to memory
Data transfer | load byte | lb $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Byte from memory to register
Data transfer | store byte | sb $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Byte from register to memory
Data transfer | load upper immediate | lui $s1, 100 | $s1 = 100 * 2^16 | Loads constant in upper 16 bits
Conditional branch | branch on equal | beq $s1, $s2, 25 | if ($s1 == $s2) go to PC + 4 + 100 | Equal test; PC-relative branch
Conditional branch | branch on not equal | bne $s1, $s2, 25 | if ($s1 != $s2) go to PC + 4 + 100 | Not equal test; PC-relative
Conditional branch | set on less than | slt $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; for beq, bne
Conditional branch | set less than immediate | slti $s1, $s2, 100 | if ($s2 < 100) $s1 = 1; else $s1 = 0 | Compare less than constant
Unconditional jump | jump | j 2500 | go to 10000 | Jump to target address
Unconditional jump | jump register | jr $ra | go to $ra | For switch, procedure return
Unconditional jump | jump and link | jal 2500 | $ra = PC + 4; go to 10000 | For procedure call

Review -- Instruction Execution in a CPU

Registers (base 10):
    R0 = 0, R1 = 36, R2 = 60000, R3 = 45, R4 = 198, R5 = 12, ...

Memory (address in base 10: contents in base 2):
    10000: 10001100010000110100111000100000
    10004: 00000000011000010010100000100000
    80000: 00000000000000000000000000111001    (57)

CPU:
    Program Counter: 10000
    Instruction Buffer: op | rs | rt | rd | shamt | immediate/disp
    ALU: in1, in2, operation, out
    Load/Store Unit: addr, data

What happens when you execute the next instruction?

A. PC=10004
B. R[R3] = 80000
C. R[R3] = 57
D. Both A and B
E. Both A and C

Notes: Watch lw typo on cheat sheet.

Review -- Instruction Execution in a CPU

Registers (base 10):
    R0 = 0, R1 = 36, R2 = 60000, R3 = 45, R4 = 198, R5 = 12, ...

Memory (address in base 10: contents in base 2):
    10000: 10001100010000110100111000100000
    10004: 00000000011000010010100000100000
    80000: 00000000000000000000000000111001

CPU:
    Program Counter: 10000
    Instruction Buffer: op | rs | rt | rd | shamt | immediate/disp
    ALU: in1, in2, operation, out
    Load/Store Unit: addr, data

Notes: UPDATE! 10004 and R3=57.

What happens when you execute the next instruction?

A. PC=10008
B. R[R5] = 93
C. R[R3] = 48
D. Both A and B
E. Both A and C

Key Points

• MIPS is a general-purpose register, load-store, fixed-instruction-length architecture.
• MIPS is optimized for fast pipelined performance, not for low instruction count.
• Historic architectures favored code size over parallelism.
• MIPS's most complex addressing mode, for both branches and loads/stores, is base + displacement.
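
The two "Review -- Instruction Execution in a CPU" slides can be traced with a small fetch-decode-execute sketch. It is an illustration only: it handles just lw and R-type add, uses the standard MIPS field positions, and assumes every register the slide elides with "..." starts at 0. The names `step`, `sign_extend_16`, `regs`, and `memory` are made up for this example.

```python
# Minimal sketch of a fetch-decode-execute step, supporting only lw and add.
# State mirrors the review slides; registers hidden behind "..." on the slide
# are assumed to be 0. step() and sign_extend_16() are hypothetical helpers.

def sign_extend_16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - (1 << 16) if value & 0x8000 else value


def step(pc, regs, memory):
    """Execute the instruction stored at `pc`, update `regs`, return new pc."""
    word = memory[pc]
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0b100011:  # lw rt, imm(rs)
        address = regs[rs] + sign_extend_16(word & 0xFFFF)
        regs[rt] = memory[address]
    elif opcode == 0 and (word & 0x3F) == 0b100000:  # add rd, rs, rt
        rd = (word >> 11) & 0x1F
        regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF
    else:
        raise NotImplementedError(f"unsupported instruction {word:032b}")
    regs[0] = 0  # R0 always equals 0
    return pc + 4


regs = [0, 36, 60000, 45, 198, 12] + [0] * 26
memory = {
    10000: 0b10001100010000110100111000100000,
    10004: 0b00000000011000010010100000100000,
    80000: 0b00000000000000000000000000111001,
}

pc = step(10000, regs, memory)
after_first = (pc, regs[3])
pc = step(pc, regs, memory)
after_second = (pc, regs[5])
print(after_first, after_second)  # (10004, 57) (10008, 93)
```

The first word decodes as a load with base R2 and displacement 20000, so it reads address 80000; the second decodes as an add of R3 and R1 into R5. Keeping the decode to shifts and masks reflects the fixed field positions that make MIPS decoding simple.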