Chapter 1: An Introduction to Processor Design
Pusan National University, Department of Computer Engineering (PNU Computer Eng.), 2015-04-09

1.1 Processor Architecture & Organization
- All modern general-purpose computers employ the "stored-program concept".
  - It was proposed for the IAS computer by von Neumann at the Institute for Advanced Study, Princeton (1946).
  - It was first implemented in the "Baby Machine" at the University of Manchester, England (1948).
- [Figure 1.1] The state in a stored-program digital computer: the processor holds registers and exchanges addresses, data and instructions with a memory that stores both instructions and data, at addresses 00..00₁₆ to FF..FF₁₆.
- Fifty years of development:
  - Processor performance has risen and cost has fallen, producing cost-effective computers, yet the principles of operation have not changed much.
  - Most of the improvement comes from advances in electronics technology: vacuum tubes -> transistors -> ICs -> VLSI.
  - New architectural insights: virtual memory (early 1960s), cache memory, pipelining, RISC.

1.2 Abstraction in Hardware Design
- Transistors are the elementary components; logically, they act as inverters.
- Logic gates: a CMOS NAND gate is built from 4 transistors.
  - If A = B = Vdd, the output is Vss.
  - If either A or B (or both) is Vss, the output is Vdd.
  - Therefore output = NOT(A.B).
- Transistor circuit (between Vdd and Vss, inputs A and B), logic symbol and truth table of the NAND gate:

    A  B | Output
    0  0 |   1
    0  1 |   1
    1  0 |   1
    1  1 |   0

- The gate abstraction:
  - Simplifies the design of circuits containing a great number of transistors.
  - Removes the need to know that a gate is built from transistors.
  - At the functional level, designs are independent of the implementation technology (e.g. field-effect or bipolar transistors); however, performance still differs between technologies.
- Levels of abstraction, from lowest to highest:
  - Transistors
  - Gates, memory cells
  - Adders, MUXs, decoders, registers
  - ALUs, shifters, memory blocks
  - Processors, peripherals, memories
  - ICs
  - PCBs
  - PCs, controllers, mobile phones
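The gate abstraction can be illustrated in software: once a NAND gate is described only by its truth table, other gates, and then a small arithmetic block, can be built from it without any reference to transistors. The Python sketch below is illustrative only; the function names (`nand`, `inv`, `and2`, `or2`, `xor2`, `half_adder`) are hypothetical, and logic levels are idealised as 0 (Vss) and 1 (Vdd).

```python
# Gate-abstraction sketch (illustrative; all names are hypothetical).
# Logic levels are idealised: 0 stands for Vss, 1 for Vdd.

def nand(a: int, b: int) -> int:
    """CMOS NAND behaviour: output is 0 (Vss) only when A = B = 1 (Vdd)."""
    return 0 if (a == 1 and b == 1) else 1


# Gate level: every gate below is built only from nand(),
# with no knowledge of the transistors inside it.
def inv(a: int) -> int:
    return nand(a, a)


def and2(a: int, b: int) -> int:
    return inv(nand(a, b))


def or2(a: int, b: int) -> int:
    return nand(inv(a), inv(b))  # De Morgan: A + B = NOT(NOT A . NOT B)


def xor2(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))  # classic four-NAND XOR


# Next level up: a half adder built only from gates.
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two 1-bit inputs."""
    return xor2(a, b), and2(a, b)


if __name__ == "__main__":
    print("A B | NAND | sum carry")
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            print(f"{a} {b} |  {nand(a, b)}   |  {s}    {c}")
```

Each function relies only on the level below it, mirroring the climb from gates to adders in the levels of abstraction.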
1.3 MU0 – a simple processor
- A simple processor can be built from a few basic components:
  - PC (program counter)
  - ACC (accumulator)
  - ALU (arithmetic-logic unit)
  - IR (instruction register)
  - Instruction decoder and control logic
- The MU0 instruction set:
  - A 16-bit machine with a 12-bit address space: 4K 16-bit words (4K x 2 bytes = 8 Kbytes of memory).
  - Instructions are 16 bits long: a 4-bit opcode followed by a 12-bit address field S.

    | opcode (4 bits) | S (12 bits) |

[Table 1.1] The MU0 instruction set

    Instruction  Opcode  Effect
    LDA S        0000    ACC := mem16[S]
    STO S        0001    mem16[S] := ACC
    ADD S        0010    ACC := ACC + mem16[S]
    SUB S        0011    ACC := ACC - mem16[S]
    JMP S        0100    PC := S
    JGE S        0101    if ACC >= 0 PC := S
    JNE S        0110    if ACC != 0 PC := S
    STP          0111    stop

- Datapath: designed in a register transfer level (RTL) style, built from registers, MUXs, and so on.
  - [Figure 1.5] MU0 datapath example: PC, IR, control, ALU and ACC, connected to memory by the address bus and the data bus.
- RTL design:
  - [Figure 1.6] MU0 register transfer level organization
  - Control signals:
    - enables on all of the registers
    - function select lines to the ALU
    - select control lines for the two MUXs
    - control for a tri-state driver that sends the ACC value to memory
    - MEMrq (memory request)
    - RnW (read/write control)

1.4 Instruction set design
- To build a high-performance processor (beyond the MU0 instruction set), instruction set design is important.
- In the formats below, the function field is f bits wide and each address field is n bits wide.
- 4-address instructions (the most general form)
  - Ex) add d, s1, s2, next_i ; d := s1 + s2, where next_i is the address of the next instruction
  - Format: function (f) | op 1 addr. (n) | op 2 addr. (n) | dest. addr. (n) | next_i addr. (n)
- 3-address instructions
  - The address of the next instruction is made implicit by using the PC (except for branches).
  - Ex) add d, s1, s2 ; d := s1 + s2
  - Format: function (f) | op 1 addr. (n) | op 2 addr. (n) | dest. addr. (n)
- 2-address instructions
  - The destination register is the same as one of the source registers.
  - Ex) add d, s1 ; d := d + s1
  - Format: function (f) | op 1 addr. (n) | dest. addr. (n)
- 1-address instructions
  - The accumulator (AC) is used as the destination.
  - Ex) add s1 ; AC := AC + s1
  - Format: function (f) | op 1 addr. (n)
- 0-address instructions (using a stack)
  - Ex) add ; tos := tos + next on stack
  - Format: function (f)
- Addressing modes:
  - Immediate addressing: the instruction contains the data itself (immediate data).
  - Absolute addressing: the instruction contains the full address of the data.
  - Indirect addressing: the instruction contains the address of a location that holds the address of the data.
  - Register addressing: the data is in a register.
  - Register indirect addressing: a register holds the address of the data.
  - Index addressing: the address is formed by adding an index value to a base address.
  - Stack addressing: the data is accessed through a stack pointer.
- Control flow instructions:
  - Branches and jumps
  - Conditional branches
  - Subroutine calls and returns
  - System calls: branch to an operating system routine
  - Exceptions: error handling

1.5 Processor design trade-offs
- CISC vs RISC
  - CISC
    - Aims to reduce the semantic gap between high-level languages and machine instructions.
    - A single instruction performs a complex sequence of operations.
    - Intended to make the compiler's job easier.
  - RISC
    - ARM takes the "R" in its name from RISC.
    - Reducing the semantic gap is not the right way to make an efficient computer.

[Table 1.3] Typical dynamic instruction usage

    Instruction type        Dynamic usage
    Data movement           43%
    Control flow            23%
    Arithmetic operations   15%
    Comparisons             13%
    Logical operations       5%
    Other                    1%

- Data movement between registers and memory accounts for almost half of all executed instructions.
- Control flow (branches, procedure calls) accounts for almost a quarter.
- Arithmetic operations account for only 15%, so complex arithmetic instructions do not help much.
- The most important techniques for making processors go faster are pipelining and cache memory.
- Pipelines: execution is split into six stages:
  1. Fetch
  2. Decode
  3. REG: get operands from the register bank
  4. ALU
  5. MEM: access memory for an operand, if necessary
  6. RES: write the result back to the register bank
- [Figure 1.13] Pipelined instruction execution: successive instructions each pass through fetch, dec, reg, ALU, mem and res, each starting one cycle after the previous one.
- Pipeline hazards
  - Read-after-write hazard (data hazard)
    - The result of one instruction is used as an operand by the next instruction, so instruction 2 must stall until the result is available.
    - [Figure 1.14] Read-after-write pipeline hazard: instruction 2 stalls after dec and performs reg only once instruction 1's result is available.
  - Branch hazard
    - The branch target address is computed late in the pipeline, so the instructions fetched right after the branch are not those at the target.
    - [Figure 1.15] Pipelined branch behavior: instruction 1 is the branch; instruction 5 is the branch target.
    - Solutions:
      - Compute the branch target earlier, if possible.
      - Compute the target speculatively.
      - Use a delayed branch.
- Pipeline efficiency: the deeper the pipeline, the worse these problems get; the RISC approach is better.

1.6 RISC
- In 1980, Patterson started the RISC I project at the University of California, Berkeley.
- RISC I architecture
  - Fixed (32-bit) instruction size with few formats
  - Load-store architecture:
    - Instructions that process data operate only on registers.
    - Separate instructions access memory.
  - A large register bank (thirty-two 32-bit registers) lets the load-store architecture operate efficiently.
- RISC I organization
  - Hard-wired instruction decode logic
  - Pipelined execution
  - Single-cycle execution
- RISC I advantages
  - A smaller die size
  - A shorter development time
  - A higher performance (controversial)
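To make the MU0 instruction set (Table 1.1) concrete, the sketch below interprets MU0 instructions in Python: each step fetches a word into IR, splits it into the 4-bit opcode and the 12-bit address S, and applies the effect listed in the table. It is a behavioural model, not the RTL organization of Figure 1.6, and it rests on assumptions the slides do not state: memory is 4096 16-bit words, execution starts at PC = 0, JGE treats ACC as a two's-complement number, and undefined opcodes (1000 to 1111) raise an error. The helpers `encode` and `sum_to`, and the data addresses used by the demo program, are hypothetical.

```python
# A minimal MU0 interpreter sketch following Table 1.1.
# Assumptions (not stated in the source): memory is 4096 16-bit words
# addressed by S, execution starts with PC = 0, ACC is read as two's
# complement for JGE, and opcodes 1000-1111 are treated as errors.

MASK16 = 0xFFFF
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)  # opcodes 0000-0111


def encode(op: int, s: int = 0) -> int:
    """Pack a 4-bit opcode and a 12-bit address S into one 16-bit word."""
    return ((op & 0xF) << 12) | (s & 0xFFF)


def to_signed(x: int) -> int:
    """Interpret a 16-bit value as a two's-complement number."""
    return x - 0x10000 if x & 0x8000 else x


def run(mem: list[int], max_steps: int = 100_000) -> tuple[int, int]:
    """Execute from PC = 0 until STP; return (ACC, PC)."""
    acc, pc = 0, 0
    for _ in range(max_steps):
        ir = mem[pc]                  # fetch the instruction into IR
        op, s = ir >> 12, ir & 0xFFF  # decode: 4-bit opcode, 12-bit S
        addr, pc = pc, (pc + 1) & 0xFFF
        if op == LDA:
            acc = mem[s]
        elif op == STO:
            mem[s] = acc
        elif op == ADD:
            acc = (acc + mem[s]) & MASK16
        elif op == SUB:
            acc = (acc - mem[s]) & MASK16
        elif op == JMP:
            pc = s
        elif op == JGE:
            if to_signed(acc) >= 0:
                pc = s
        elif op == JNE:
            if acc != 0:
                pc = s
        elif op == STP:
            return acc, pc
        else:
            raise ValueError(f"undefined opcode {op:04b} at address {addr}")
    raise RuntimeError("step limit reached without STP")


def sum_to(n: int) -> int:
    """Hypothetical demo: n + (n-1) + ... + 1 for n >= 1 (result modulo 2**16).
    Instructions and data share one memory, as in the stored-program concept."""
    N, SUM, ONE = 20, 21, 22  # data addresses (arbitrary choice)
    program = [
        encode(LDA, SUM), encode(ADD, N), encode(STO, SUM),  # sum := sum + n
        encode(LDA, N), encode(SUB, ONE), encode(STO, N),    # n := n - 1
        encode(JNE, 0),                                      # loop while n != 0
        encode(LDA, SUM), encode(STP),
    ]
    mem = [0] * 4096
    mem[:len(program)] = program
    mem[N], mem[SUM], mem[ONE] = n, 0, 1
    acc, _ = run(mem)
    return acc


if __name__ == "__main__":
    print(sum_to(5))
```

`sum_to(5)` runs a nine-instruction loop that adds 5 + 4 + 3 + 2 + 1 and returns 15. A full decode of all 16 opcode values would be needed for a real implementation; this sketch only covers the eight defined in the table.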
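The stall caused by a read-after-write hazard can be estimated with a small cycle-counting model of the six-stage pipeline (fetch, dec, reg, ALU, mem, res). This Python sketch is a simplification under stated assumptions, not the timing of any particular processor: each stage takes one cycle and holds one instruction, a waiting instruction blocks those behind it, and there is no forwarding, so a dependent instruction may read the register bank only in the cycle after its producer's RES stage. Under these assumptions a back-to-back dependency costs three stall cycles; Figure 1.14 may draw a different count, and a design with forwarding would stall less. The names `schedule` and `render` are hypothetical.

```python
# Cycle-counting sketch of a six-stage pipeline (illustrative only).
# Assumptions, not taken from the source:
#   - every stage takes one cycle and holds one instruction at a time;
#   - an instruction that must wait stays where it is, blocking those behind it;
#   - no forwarding: a dependent instruction may read the register bank
#     (reg stage) only in the cycle after its producer's res stage.

STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def schedule(n, raw_deps=None):
    """Return start[j][i], the cycle (counting from 1) in which instruction j
    enters stage i. raw_deps maps a consumer index to an earlier producer index."""
    raw_deps = raw_deps or {}
    reg, res, last = STAGES.index("reg"), STAGES.index("res"), len(STAGES) - 1
    start = []
    for j in range(n):
        row = []
        for i in range(len(STAGES)):
            t = row[i - 1] + 1 if i > 0 else 1  # after this instruction's previous stage
            if j > 0:
                prev = start[j - 1]
                # Stage i is free once the previous instruction has moved past it.
                t = max(t, prev[i + 1] if i < last else prev[i] + 1)
            if i == reg and j in raw_deps:  # read-after-write hazard
                t = max(t, start[raw_deps[j]][res] + 1)
            row.append(t)
        start.append(row)
    return start


def render(start):
    """Draw the schedule as a text chart in the style of Figures 1.13 and 1.14."""
    width = 6
    n_cycles = start[-1][-1]
    lines = []
    for j, row in enumerate(start):
        cells = [" " * width] * n_cycles
        for i, t in enumerate(row):
            cells[t - 1] = f"{STAGES[i]:<{width}}"
            if i > 0:
                for w in range(row[i - 1] + 1, t):  # cycles spent waiting
                    cells[w - 1] = f"{'stall':<{width}}"
        lines.append(f"{j + 1}: " + "".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(schedule(3)))                   # no hazards
    print()
    print(render(schedule(2, raw_deps={1: 0})))  # instruction 2 reads instruction 1's result
```

Without hazards, three instructions finish in 6 + 3 - 1 = 8 cycles; making instruction 2 depend on instruction 1 (`raw_deps={1: 0}`) delays its reg stage from cycle 4 to cycle 7 under these assumptions.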
