EEE440 Computer Architecture
Lecture 1: Introduction and Basics

Dr. Haroon Ahmed Khan (Assistant Professor)
217, Academic Block 1
haroon.ahmed@comsats.edu.pk

Course Description
Fundamentals of computer design, including current technology/cost trends, quantitative design principles, and measuring and reporting performance; review of instruction set architecture; memory hierarchy design and performance issues, including cache and virtual memory design; instruction-level parallelism (ILP), using hardware approaches to implement it and software approaches to exploit it; data-level parallelism (DLP) in vector, SIMD and GPU architectures; and thread-level parallelism (TLP) with centralized, synchronized and distributed memory sharing.

Course Learning Outcomes (CLOs)
At the completion of the course, the student should be able to:
- Understand the fundamentals of computer architecture and Instruction Set Architectures (C2-PLO1)
- Analyse various memory architectures for improving performance (C4-PLO2)
- Compare various hardware and software approaches to improve instruction-level parallelism (C4-PLO2)
- Analyse the techniques to implement data-level and thread-level parallelism (C4-PLO2)

Textbook
Computer Architecture: A Quantitative Approach by Hennessy & Patterson, Morgan Kaufmann Series (5th Edition or 6th Edition)
Recommended Reading:
1. Digital Design and Computer Architecture by Harris, D. and Harris, S., 2012, 2nd Edition
2. Modern Processor Design: Fundamentals of Superscalar Processors by Shen & Lipasti

View of an Architect
[Figure label: SOFTWARE]
- How does an assembly program end up executing as digital logic?
- What happens in-between?
- How is a computer designed using logic gates and wires to satisfy specific goals?
Architect/microarchitect’s view: How to design a computer that meets system design goals.
Choices critically affect both the SW programmer and the HW designer.
[Figure label: HARDWARE]

The Power of Abstraction
- Levels of transformation create abstractions
- Abstraction improves productivity
- Abstraction: A higher level only needs to know about the interface to the lower level, not how the lower level is implemented
  E.g., high-level language programmer does not really need to know what the ISA is and how a computer executes instructions
- No need to worry about decisions made in underlying levels
  E.g., programming in Java vs. C vs. assembly vs. binary vs. by specifying control signals of each transistor every cycle
- Then, why would you want to know what goes on underneath or above?

Levels of Transformation
“The purpose of computing is insight” (Richard Hamming)
We gain and generate insight by solving problems
How do we ensure problems are solved by electrons?
Levels, from top to bottom:
- Problem
- Algorithm
- Program/Language
- Runtime System (VM, OS, MM)
- ISA (Architecture)
- Microarchitecture
- Logic
- Circuits
- Electrons

Crossing the Abstraction Layers
Two key goals of this course are
- to understand how a processor works underneath the software layer and how decisions made in hardware affect the software/programmer
- to enable you to be comfortable in making design and optimization decisions that cross the boundaries of different layers and system components

Introduction: Single Processor Performance
[Figure: Single Processor Performance chart; annotations: "RISC", "Move to multi-processor"]

Defining Computer Architecture
“Old” view of computer architecture: Instruction Set Architecture (ISA) design, i.e.
decisions regarding: registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
“Real” computer architecture:
- Specific requirements of the target machine
- Design to maximize performance within constraints: cost, power, and availability
- Includes ISA, microarchitecture, hardware

Classes of Computers
- Personal Mobile Device (PMD)
  e.g. smart phones, tablet computers
  Emphasis on energy efficiency and real-time
- Desktop Computing
  Emphasis on price-performance
- Servers
  Emphasis on availability, scalability, throughput
- Clusters / Warehouse Scale Computers
  Used for “Software as a Service (SaaS)”
  Emphasis on availability and price-performance
  Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
- Embedded Computers
  Emphasis: price

Crossing the Abstraction Layers
As long as everything goes well, not knowing what happens in the underlying level (or above) is not a problem.
What if
- The program you wrote is running slow?
- The program you wrote does not run correctly?
- The program you wrote consumes too much energy?
What if
- The hardware you designed is too hard to program?
- The hardware you designed is too slow because it does not provide the right primitives to the software?
What if
- You want to design a much more efficient and higher performance system?
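
To make the "program you wrote is running slow" question concrete, here is a minimal sketch of how a decision in a lower layer (the cache) can change the cost of two loops that look equivalent in a high-level language. It is a toy model under stated assumptions, not a description of any real processor: the direct-mapped organization, the sizes `LINE_BYTES`, `NUM_LINES`, `ELEM_BYTES`, and the helper names are all hypothetical choices made for illustration.

```python
# Toy direct-mapped cache model (hypothetical parameters, not a real CPU).
LINE_BYTES = 64      # assumed cache line size in bytes
NUM_LINES = 64       # assumed number of cache lines (4 KiB cache in total)
ELEM_BYTES = 8       # assumed size of one matrix element in bytes


def count_misses(addresses):
    """Count misses for a sequence of byte addresses in a direct-mapped cache."""
    tags = [None] * NUM_LINES          # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // LINE_BYTES     # memory block that contains this byte
        index = block % NUM_LINES      # the single cache line this block may use
        if tags[index] != block:       # a different block (or nothing) is cached
            misses += 1
            tags[index] = block
    return misses


def row_major(n):
    """Addresses touched when walking an n x n row-major matrix row by row."""
    return [(i * n + j) * ELEM_BYTES for i in range(n) for j in range(n)]


def column_major(n):
    """Addresses touched when walking the same matrix column by column."""
    return [(i * n + j) * ELEM_BYTES for j in range(n) for i in range(n)]


N = 128
row_misses = count_misses(row_major(N))
col_misses = count_misses(column_major(N))
print("row-by-row misses:      ", row_misses)
print("column-by-column misses:", col_misses)
```

In this toy model the row-by-row walk misses once per cache line, while the column-by-column walk misses on every access, even though both loops read exactly the same elements. Explaining that gap requires looking below the programming-language layer.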
Single Processor Architecture

Introduction: Current Trends in Architecture
- Cannot continue to leverage instruction-level parallelism (ILP)
  Single processor performance improvement ended in 2003 as far as Instruction Set Architectures are concerned
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application

Multiprocessor Architecture(s)

Classes of Computers: Parallelism
Classes of parallelism in applications:
- Data-Level Parallelism (DLP)
- Task-Level Parallelism (TLP)
Classes of parallelism in architecture:
- Instruction-Level Parallelism (ILP)
  Pipelining and speculative execution
- Vector architectures/Graphics Processor Units (GPUs)
  One instruction on many data items in tandem
- Thread-Level Parallelism
  Task-level parallelism working with parallel threads of data
- Request-Level Parallelism
  Parallel tasks that are independent of one another

Classes of Computers: Flynn’s Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD

Computer Architecture Today
- Today is a very exciting time to study computer architecture
- Industry is in a large paradigm shift (to multi-core and beyond); many different potential system designs are possible
- Many difficult problems motivating and caused by the shift
  - Power/energy constraints → multi-core?
  - Complexity of design → multi-core?
  - Difficulties in technology scaling → new technologies?
  - Memory wall/gap
  - Reliability wall/issues
  - Programmability wall/problem
  - Huge hunger for data and new data-intensive applications
- No clear, definitive answers to these problems

Computer Architecture Today (II)
These problems affect all parts of the computing stack, if we do not change the way we design systems
[Figure: the computing stack, top to bottom: Problem, Algorithm, Program/Language, Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Circuits, Electrons; with the User shown alongside]
- Many new demands from the top (Look Up)
- Fast changing demands and personalities of users (Look Up)
- Many new issues at the bottom (Look Down)
No clear, definitive answers to these problems
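
As a small illustration of the SISD and SIMD entries in Flynn’s taxonomy, and of the "one instruction on many data items" idea behind data-level parallelism, the sketch below counts how many add "instructions" are issued in each style. It is a software model only: `LANES`, `add_sisd` and `add_simd` are hypothetical names and values chosen for this example, and the lane additions run one after another here rather than in parallel hardware.

```python
# Sketch of SISD vs. SIMD styles; LANES is an assumed vector width.
LANES = 4


def add_sisd(a, b):
    """Issue one add 'instruction' per element (single data stream)."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued


def add_simd(a, b):
    """Issue one add 'instruction' per group of up to LANES elements."""
    out, issued = [], 0
    for start in range(0, len(a), LANES):
        chunk_a = a[start:start + LANES]
        chunk_b = b[start:start + LANES]
        # A real vector unit would perform these lane additions in parallel.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1
    return out, issued


a = list(range(10))
b = [10 * x for x in a]
scalar_result, scalar_issued = add_sisd(a, b)
vector_result, vector_issued = add_simd(a, b)
print("same result:", scalar_result == vector_result)
print("instructions issued (SISD vs. SIMD):", scalar_issued, vector_issued)
```

Both functions compute the same sums, but the SIMD-style version issues roughly `len(a) / LANES` operations instead of `len(a)`. The data must be laid out so that it can be processed in groups, which is one sense in which these models "require explicit restructuring of the application".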