Computer Architecture and Design – ECEN 350
Part 1 – Introduction
Dr. G. Choi
Dept. of Electrical and Computer Engineering
[Some slides adapted from A. Sprintson, M. Irwin, D. Patterson, P. Gratz, and others]

Instructor Information
  Dr. Gwan S. Choi
  Office: 333G WERC
  Office Hours: TBA (tentative), or by appointment
  http://www.ece.tamu.edu/~gchoi/main.html
  Email: gwanchoi@gmail.com

TA Information
  Ehsan Rohani
  Office: 332A WERC
  Office Hours: T 11-12; R 10-11 (tentative), or by appointment
  http://people.tamu.edu/~ehsanrohani/
  Email: ehsanohani@neo.tamu.edu

Required textbook
  Computer Organization and Design: The Hardware/Software Interface, by Patterson and Hennessy. 4th Edition, Revised Printing, Morgan Kaufmann, 2008
  Other editions can be used as well

Reference textbook
  M. Ciletti. Starter's Guide to Verilog 2001

Course info
  Mailing list: emails will be sent periodically to your TAMU email account
  Announcements: lecture cancellations, deadline extensions, updates, etc.

Course information
  Some course information can be found at http://people.tamu.edu/~ehsanrohani/
  Syllabus, lab assignments, HW assignments, lecture slides, etc.

Course description
  Computer architecture and design
  Use of register transfer languages and simulation tools to describe and simulate computer operation
  Central processing unit organization; microprogramming; input/output; memory system architectures

Attention over time!
  [Figure: attention versus time t, with an annotation at ~5 min]

Labs
  5 assembly language labs, using the SPIM simulator
  6 hardware design labs, using Verilog and Xilinx WebPACK
  Can run from your home PC, using PCSpim and Xilinx software
Labs
  First week's lab covers orientation/procedures
  No recitation the first week
  No lab on Tuesday this week

  Section #   Lecture Time          Lab Time             Recitation Time
  300         MWF: 12:00p - 1:15p   TR: 12:20p - 2:10p   F: 8:00a - 9:00a
  301         MWF: 12:00p - 1:15p   TR: 3:00p - 4:50p    F: 8:00a - 9:00a
  302         MWF: 12:00p - 1:15p   TR: 5:00p - 6:50p    F: 8:00a - 9:00a

Assignments
  Assignments will be given most weeks
  The purpose of the assignments is to prepare you for the midterm and the final exam

Grading scale
  A standard grading scale will be used

  Grade   Range
  A       90-100%
  B       80-89%
  C       70-79%
  D       60-69%
  F       Below 60%

  Full syllabus will be available on the course website

Course Goals
  Address topics such as:
    What is a computer?
    How to program a computer?
      Assembly language programming
    How to build a computer?
      Verilog-based hardware design and verification

Course Goals
  Main Goals
  Auxiliary Goals

Course Goals
  Understand hardware architecture
  Learn design methodology
  Learn very general tools for the design of sophisticated digital systems

Computer organization
  [Diagram: layers of abstraction, labeled ECEN 350]
  Software
    Application (ex: browser)
    Compiler, Assembler
    Operating System (Mac OS X)
  Instruction Set Architecture
  Hardware
    Processor, Memory, I/O system
    Datapath & Control
    Digital Design
    Circuit Design
    transistors
  * Coordination of many levels (layers) of abstraction

Levels of Representation
  High Level Language Program (e.g., C)
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
  Compiler
  Assembly Language Program (e.g., MIPS)
  Assembler
  Machine Language Program (MIPS)
  Machine Interpretation
  Hardware Architecture Description (Logic, Logisim, etc.)
  Architecture Implementation
  Logic Circuit Description (Logisim, etc.)
Levels of Representation
  High Level Language Program (e.g., C)
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
  Compiler
  Assembly Language Program (e.g., MIPS)
      lw $t0, 0($2)
      lw $t1, 4($2)
      sw $t1, 0($2)
      sw $t0, 4($2)
  Assembler
  Machine Language Program (MIPS)
      0000 1001 1100 0110 1010 1111 0101 1000
      1010 1111 0101 1000 0000 1001 1100 0110
      1100 0110 1010 1111 0101 1000 0000 1001
      0101 1000 0000 1001 1100 0110 1010 1111
  Machine Interpretation
  Hardware Architecture Description (Logic, Logisim, etc.)
  Architecture Implementation
  Logic Circuit Description (Logisim, etc.)

Anatomy: Components of any Computer
  Computer
    Processor
      Control ("brain")
      Datapath ("brawn")
    Memory (where programs, data live when running)
    Devices
      Input: Keyboard, Mouse
      Disk (where programs, data live when not running)
      Output: Display, Printer

Content
  MIPS instruction set
  Principles of computer architecture: CPU datapath and control unit design
  ALU design
  Pipelined datapath
  Memory hierarchies and design
  I/O organization and design

Topics
  Introduction
    Computer organization
    Moore's law
    Performance modeling
    Impact of advancing technology
    Operation of the computer hardware

Topics
  Instruction Set Architectures (ISA)
    Representing instructions on the computer
    Arithmetic and logical instructions
    Memory access instructions
    Control flow instructions
    Function call instructions
    Input-output instructions
    SPIM instruction set simulator

Topics
  Computer Arithmetic
    Signed and unsigned numbers
    Addition and subtraction
    Multiplication
    Division
    Floating point operations

Topics
  Translating and starting a program
    Compilers, compiler optimization
    Object code generation, assemblers
    Linking
    Run-time execution environment

Topics
  Performance evaluation
    CPU performance and its factors
    Performance metrics
    Performance factors
    Comparing performance
    SPEC benchmarks

Topics
  Hardware Description Languages (HDL)
    Verilog hardware description language
    Design-simulation process
    Structural designs in Verilog
    Behavioral HDL description of systems

Topics
  Datapath and Control (5)
    ALU design
    Single-cycle implementation
    Multi-cycle implementation
    Microprogramming

Topics
  Pipelining (5)
    Pipelined datapath
    Pipelined control
    Pipeline hazards: structural, control, data
    Hazard detection and resolution
    Exception handling

Topics
  Memory Hierarchy
    Overview of SRAM and DRAM design
    Basics of caches
    Framework for memory hierarchy
    Measuring memory performance
  Peripherals
    Disk storage and dependability
    I/O devices and their interface to the processor
    Buses and other connections

SPIM Assembler and Simulator
  SPIM is a self-contained assembler and simulator for MIPS32 assembly language programs
  Provides a simple assembler, debugger and a simple set of operating system services
  Implements both a simple, terminal-style interface and a visual windowing interface

SPIM Assembler and Simulator
  Available as
    xspim on Unix, Linux, and Mac OS X
    PCSpim on Windows
    QtSpim on both (we'll use QtSpim in the lab)
  Can be downloaded and installed on your own PC from www.cs.wisc.edu/~larus/spim.html

Xilinx WebPack and ModelSim
  Allow users to enter digital logic designs, as either schematic or HDL, and simulate them
  Xilinx ISE WebPACK, which includes Project Navigator, is used for design entry
  ModelSim is used for simulation
  Available from http://www.xilinx.com/tools/webpack.htm

What you should already know
  How to write, compile and run programs in a higher-level language (C, C++, Java, …)
    In this course we will use C as the high-level language
  How to represent and operate on positive and negative numbers in binary form (two's complement, sign magnitude, etc.)
Sample program
    #include <stdio.h>

    /* named power to avoid clashing with the C library's pow */
    float power(float x, unsigned int exp)
    {
        float result = 1.0f;
        unsigned int i;
        for (i = 0; i < exp; i++) {
            result = result * x;
        }
        return result;
    }

    int main(int argc, char **argv)
    {
        float p;
        p = power(10.0f, 5);
        printf("p = %f\n", p);
        return 0;
    }

Sample C program fragment
    /* define an array of 5 chars */
    char x[5] = {'t','e','s','t','\0'};

    /* accessing element 0 */
    x[0] = 'T';

    /* pointer arithmetic to get elt 3 */
    char elt3 = *(x+3); /* x[3] */

    /* x[0] evaluates to the first element;
     * x evaluates to the address of the
     * first element, or &(x[0]) */

    /* 0-indexed for loop idiom */
    #define COUNT 10
    char y[COUNT];
    int i;
    for (i=0; i<COUNT; i++) {
        /* process y[i] */
        printf("%c\n", y[i]);
    }

What you should already know
  Logic design (ECEN 248)
    Design of combinational and sequential components
    Boolean algebra, logic minimization, decoders and multiplexors, latches and flip-flops, registers, Mealy/Moore finite state machines, etc.

The Underlying Technologies

  Year   Technology                          Relative Perf./Unit Cost
  1951   Vacuum Tube                         1
  1965   Transistor                          35
  1975   Integrated Circuit (IC)             900
  1995   Very Large Scale IC (VLSI)          2,400,000
  2005   Ultra VLSI                          6,200,000,000

The PowerPC
  Introduced in 1999
  3.65M transistors
  366 MHz clock rate
  40 mm2 die size
  250 nm technology

Intel Pentium 4
  Introduced in 2004
  125M transistors
  3.8 GHz clock
  122 mm2 die
  90 nm tech

Intel “Gulftown” i7
  Introduced in 2010
  1.17 billion transistors
  3.3 GHz clock
  248 mm2 die
  32 nm tech
  6 cores/12 threads

Technology Trends: Microprocessor Complexity
  [Figure: # of transistors on an IC versus year]
  2X transistors/chip every 1.5 years
  Called “Moore's Law”

Growth in processor performance
  [Figure]

Clock rate and power
  [Figure]

DRAM Capacity
  [Figure]

Impacts of Advancing Technology
  Processor
    logic capacity: increases about 30% per year
    performance: 2x every 1.5 years (slowing!)
    100x performance in last decade
  Memory
    DRAM capacity: 4x every 3 years, about 60% per year
    memory speed: 1.5x every 10 years
    cost per bit: decreases about 25% per year
  Disk
    capacity: increases about 60% per year

Computer Technology - Dramatic Change!

Impacts of Advancing Technology
  State-of-the-art PC when you graduate:
    Processor clock speed: 5000 MegaHertz (5.0 GigaHertz)
    Memory capacity: 8000 MegaBytes (8.0 GigaBytes)
    Disk capacity: 2000 GigaBytes (2.0 TeraBytes)
  New units! Mega => Giga, Giga => Tera
    (Tera => Peta, Peta => Exa, Exa => Zetta, Zetta => Yotta = 10^24)

Computer Organization and Design
  This course is all about how computers work
  But what do we mean by a computer?
    Different types: embedded, laptop, desktop, server
    Different uses: automobiles, graphics, finance, genomics…
    Different manufacturers: Intel, Apple, IBM, Sony, Sun…
    Different underlying technologies and different costs!
  Best way to learn:
    Focus on a specific instance and learn how it works
    While learning general principles and historical perspectives

Main focus
  Learn some of the big ideas in CS & engineering:
    5 classic components of a computer
    Data can be anything (integers, floating point, characters): a program determines what it is
    Stored program concept: instructions are just data
    Principle of locality, exploited via a memory hierarchy (cache)
    Greater performance by exploiting parallelism
    Principle of abstraction, used to build systems as layers
    Compilation vs. interpretation through system layers
    Principles/pitfalls of performance measurement
    Others

Skills learned in ECEN 350
  Learning C
    If you know one, you should be able to learn another programming language largely on your own
    If you know C++ or Java, it should be easy to pick up their ancestor, C
  Assembly language programming
    This is a skill you will pick up, as a side effect of understanding the Big Ideas
  Hardware design
    We think of hardware at the abstract level, with only a little bit of physical logic to give things perspective

Embedded Computers in Your Car
  [Figure]

Growth of Sales of Embedded Computers
  [Figure]

Why Learn This Stuff?
  You want to call yourself a “computer scientist/engineer”
  You want to build HW/SW people use (so you need performance)
  You need to make a purchasing decision or offer “expert” advice
  Both hardware and software affect performance
    Algorithm determines the number of source-level statements
    Language/compiler/architecture determine the number of machine-level instructions (Chapters 2 and 3)
    Processor/memory determine how fast machine-level instructions are executed (Chapters 5, 6, and 7)

What is a Computer?
  Components:
    processor (datapath, control)
    input (mouse, keyboard)
    output (display, printer)
    memory (cache (SRAM), main memory (DRAM), disk drive, CD/DVD)
    network
  Our primary focus: the processor (datapath and control)
    Implemented using millions of transistors
    Impossible to understand by looking at each transistor
    We need abstraction!

Major Components of a Computer
  [Figure]

Below the Program
  High-level language program (in C)
      void swap(int v[], int k)
      {
          int temp;
          temp = v[k];
          v[k] = v[k+1];
          v[k+1] = temp;
      }
  Assembly language program (for MIPS)
      swap: sll $2, $5, 2
            add $2, $4, $2
            lw  $15, 0($2)
            lw  $16, 4($2)
            sw  $16, 0($2)
            sw  $15, 4($2)
            jr  $31
  Machine (object) code (for MIPS)
      000000 00000 00101 0001000010000000
      000000 00100 00010 0001000000100000
      . . .
Below the Program
  High-level language program (in C)
      void swap(int v[], int k)
      {
          int temp;
          temp = v[k];
          v[k] = v[k+1];
          v[k+1] = temp;
      }
  C compiler (one-to-many)
  Assembly language program (for MIPS)
      swap: sll $2, $5, 2
            add $2, $4, $2
            lw  $15, 0($2)
            lw  $16, 4($2)
            sw  $16, 0($2)
            sw  $15, 4($2)
            jr  $31
  Assembler (one-to-one)
  Machine (object) code (for MIPS)
      000000 00000 00101 0001000010000000
      000000 00100 00010 0001000000100000
      100011 00010 01111 0000000000000000
      100011 00010 10000 0000000000000100
      101011 00010 10000 0000000000000000
      101011 00010 01111 0000000000000100
      000000 11111 00000 0000000000001000

Advantages of Higher-Level Languages?
  Higher-level languages
    Allow the programmer to think in a more natural language, suited to the intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
    Improve programmer productivity: more understandable code that is easier to debug and validate
    Improve program maintainability
    Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
    Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
  As a result, very little programming is done today at the assembler level

Machine Organization
  Capabilities and performance characteristics of the principal Functional Units (FUs)
    e.g., register file, ALU, multiplexors, memories, ...
  The ways those FUs are interconnected
    e.g., buses
  Logic and means by which information flow between FUs is controlled
  The machine's Instruction Set Architecture (ISA)
  Register Transfer Level (RTL) machine description

ISA Sales
  [Figure]

Major Components of a Computer: Dataflow Walkthrough
  [Diagram: Processor (Control, Datapath), Memory, Devices (Input, Output), Network]

Input Device Inputs Object Code
  [Diagram: the seven words of object code listed above, with the same components]

Object Code Stored in Memory
  [Diagram: the same seven words, with Memory, Processor (Control, Datapath), Devices (Input, Output), Network]

Processor Fetches an Instruction
  Processor fetches an instruction from memory
  [Diagram: the same seven words and components]

Control Decodes the Instruction
  Control decodes the instruction to determine what to execute
  [Diagram: Control holds 000000 00100 00010 0001000000100000]

Datapath Executes the Instruction
  Datapath executes the instruction as directed by control
  [Diagram: Control holds 000000 00100 00010 0001000000100000; Datapath: contents Reg #4 ADD contents Reg #2, results put in Reg #2]

What Happens Next?
  [Diagram: the same seven words and components; cycle of Fetch, Decode, Exec]

Processor Fetches the Next Instruction
  Processor fetches the next instruction from memory
  How does it know which location in memory to fetch from next?

Output Data Stored in Memory
  At program completion the data to be output resides in memory
      00000100010100000000000000000000
      00000000010011110000000000000100
      00000011111000000000000000001000

Output Device Outputs Data
  [Diagram: the same three words, with the same components]

Processor Organization
  Control needs to have circuitry to
    Decide which is the next instruction and input it from memory
    Decode the instruction
    Issue signals that control the way information flows between datapath components
    Control what operations the datapath's functional units perform
  Datapath needs to have circuitry to
    Execute instructions: functional units (e.g., adder) and storage locations (e.g., register file)
    Interconnect the functional units so that the instructions can be executed as required
    Load data from and store data to memory