TMF1214/TMC1214 Computer Architecture (Semester 2 2017/2018)
Introduction to Computer Architecture
Reference: Chapter 1, William Stallings, Computer Organization and Architecture, 8th Edition

Computer System: User’s View
Image: http://www.coolnerds.com

Computer System Components: High Level View
• Input: Keyboard, Mouse, Microphone
• Computer: System unit
• Output: Monitor, Speaker

Architecture & Organization
• Architecture comprises the attributes visible to the programmer
—Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
—e.g. Is there a multiply instruction?
• Organization is how features are implemented
—Control signals, interfaces, memory technology.
—e.g. Is there a hardware multiply unit or is it done by repeated addition?

Architecture & Organization (Cont...)
• All members of the Intel x86 family share the same basic architecture
• All members of the IBM System/370 family share the same basic architecture
• This gives code compatibility
—At least backwards
• Organization differs between different versions

Structure & Function
• Structure is the way in which components relate to each other
• Function is the operation of individual components as part of the structure

Function
• All computer functions are:
—Data processing
—Data storage: even if the computer is processing data on the fly (e.g. data come in, get processed, and the results go out immediately), it must temporarily store at least those pieces of data that are being worked on at any given moment
—Data movement: I/O vs. data communication
—Control: controls all of the functions above, exercised from outside the computer by the individual(s) operating it and from within by the control unit (CU)

Functional view
Johari Abdullah, FCSIT

Operations (1) Data movement
Transferring data from one peripheral or communication line to another

Operations (2) Storage
Transferring data from external environment to computer storage (read) and vice versa (write)

Operation (3) Processing from/to storage
Data processing involving data in storage

Operation (4)
Processing from storage to I/O
Data processing involving en route data between storage and external environment

Structure - Top Level (The Computer)
[Figure: the Computer, containing the Central Processing Unit (CPU), Main Memory, Systems Interconnection/Bus and Input Output]

Computer Structure
• CPU: controls the operation of the computer and performs its data processing functions; often simply referred to as the processor.
• Main Memory: stores data
• I/O: moves data between the computer and its external environment
• System Interconnection: mechanism that provides for communication among CPU, main memory and I/O. Example: system bus (consisting of a number of conducting wires to which all the other components attach)

Structure - The CPU
[Figure: the CPU within the Computer (alongside I/O, System Bus and Memory), containing the Arithmetic and Logic Unit (ALU), Registers, Internal CPU Interconnection and Control Unit]

The CPU Structure
• Control Unit: controls the operation of the CPU and hence the computer
• ALU: performs the computer’s data processing functions
• Registers: provide internal storage to the CPU
• CPU interconnection/Internal bus: some mechanism that provides for communication among the control unit, ALU and registers.
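
As a rough illustration of how the four CPU parts listed above cooperate, the following Python sketch models the registers as a dictionary, the ALU as a small class, and the control unit as a method that moves operands between them. The class names, the register names R0 to R2 and the operation names ADD, SUB and AND are assumptions made for this example; they do not come from the slides or from any real processor.

```python
class ALU:
    """Performs the data processing functions (a few hypothetical operations)."""

    def execute(self, op, a, b):
        if op == "ADD":
            return a + b
        if op == "SUB":
            return a - b
        if op == "AND":
            return a & b
        raise ValueError(f"unknown operation: {op}")


class CPU:
    def __init__(self):
        # Registers: internal storage of the CPU (names are illustrative).
        self.registers = {"R0": 0, "R1": 0, "R2": 0}
        self.alu = ALU()

    def control_step(self, op, dst, src1, src2):
        """Control unit role: select operands, drive the ALU, store the result."""
        # The two reads and the write stand in for transfers over the internal bus.
        a = self.registers[src1]
        b = self.registers[src2]
        self.registers[dst] = self.alu.execute(op, a, b)
        return self.registers[dst]


cpu = CPU()
cpu.registers["R0"] = 6
cpu.registers["R1"] = 7
print(cpu.control_step("ADD", "R2", "R0", "R1"))  # prints 13
```

The ALU never touches the registers directly in this sketch; only the control step decides which values reach it and where the result goes, which mirrors the division of roles in the bullet list.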

Structure - The Control Unit
[Figure: the Control Unit within the CPU (alongside the ALU, Internal Bus and Registers), containing Sequencing Logic, Control Unit Registers and Decoders, and Control Memory]

Recall Computer System: User’s View
Image: http://www.coolnerds.com

Recall Computer System Components: High Level View
• Input: Keyboard, Mouse, Microphone
• Computer: System unit
• Output: Monitor, Speaker

CPU Motherboard

Computer Components: Interconnection
[Figure: I/O, CPU, MEMORY]

CPU

CPU Organization
[Figure: Registers, ALU, CU]

Memory
[Figure: I/O, CPU, MEMORY]
address | content
0000000000 | 01010101010010101
0000000001 | 01110101010010101
... | ...
1111111110 | 01010101011110101
1111111111 | 11010111010010101

Input/Output
[Figure: CPU, I/O Module, I/O Devices]

Computer Systems Hierarchy
• A digital computer solves problems by carrying out instructions
[Figure: Instructions, Computer, Results]
• A program: a sequence of instructions describing how to perform a certain task.

Computer Systems Hierarchy
[Figure: Human Language (difficult to implement); Interpretation/Translation; Machine Language; Computer]

Computer Systems Hierarchy
[Figure: Human Language; Machine-like/Human-like language; Interpretation/Translation; Machine Language; Computer]

Computer Systems Hierarchy
[Figure: levels of a computer system, annotated with "Programmers" and "Systems programmers"]
• High-level language - C++, Java, VB
• Assembly language
• OS - UNIX, Windows NT
• Instruction sets - Pentium, PowerPC
• Micro programs
• Hardware

TMC1214/TMC1213 Computer Architecture (Semester 2 2017/2018)
Computer Evolution and Performance
Reference: Chapter 2, William Stallings, Computer Organization and Architecture, 8th Edition

A (Very) Brief History of Computers
The First Generation - Vacuum Tubes (1945 - 1955)
• ENIAC (1943 - 1946)
—Intended for calculating range tables for aiming artillery
—Consisted of more than 18,000 vacuum tubes, occupied 1,500 square feet of floor space, weighed 30 tons and consumed 140 kW
—Decimal machine
—Each digit represented by a ring of 10 vacuum tubes.
—Designed for artillery range tables, but also used to perform complex calculations to help determine the feasibility of the hydrogen bomb, which showed it to be a general purpose computer
—Programmed with multi-position switches and jumper cables.
• John von Neumann (1945 - 1952) more later …
—Originally a member of the ENIAC development team.
—First to use binary arithmetic
—Architecture consists of: Memory, ALU, Program control, Input, Output
—Stored-program concept: main memory stores both data and instructions

A (Very) Brief History of Computers (Cont…)
[Figure: Vacuum Tubes; ENIAC]

A (Very) Brief History of Computers (Cont…)
The Second Generation - Transistors (1955 - 1965)
• Transistors
—The transistor was invented in 1948 at Bell Labs by John Bardeen, Walter Brattain and William Shockley
• TX-0 (Transistorised eXperimental computer 0), the first transistor computer, built at MIT Lincoln Labs
• DEC PDP-1, the first affordable minicomputer ($120,000), with half the performance of the IBM 7090 (the fastest computer in the world at that time, which cost millions)
• PDP-8, cheap ($16,000), the first to use a single bus

A (Very) Brief History of Computers (Cont…)
The Third Generation - Integrated Circuits (1965 - 1980)
• IBM System/360
—Family of machines with the same assembly language
—Designed for both scientific and commercial computing
—First to allow microprogramming
—Very popular with universities

A (Very) Brief History of Computers (Cont…)
The Fourth Generation – VLSI (1980- ?)
• Very Large Scale Integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip
• Led to PC revolution
• High performance, low cost

Generations of Computer (Technology)
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
—Up to 100 devices on a chip
• Medium scale integration - to 1971
—100-3,000 devices on a chip
• Large scale integration - 1971-1977
—3,000 - 100,000 devices on a chip
• Very large scale integration - 1978-1991
—100,000 - 100,000,000 devices on a chip
• Ultra large scale integration - 1991 on
—Over 100,000,000 devices on a chip

Moore’s Law
Computers double in power roughly every two years, but cost only half as much

Moore’s Law
• Increased density of components on chip
• Gordon Moore - cofounder of Intel
• Number of transistors on a chip will double every year
• Since 1970’s development has slowed a little
—Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability

Growth in CPU Transistor Count

The IAS (von Neumann) Machine
• 1946 ~ 1952, John von Neumann, Princeton, Institute for Advanced Studies
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing them
• Input and output equipment operated by control unit
[Figure: The Structure of IAS Computer: Main Memory, Arithmetic Logic Unit, Program Control Unit, Input Output Equipment]
Almost all of today’s computers have the same general structure as the IAS and are referred to as von Neumann machines.

The IAS Machine: Control Unit
The control unit operates the machine by fetching instructions from memory and executing them ONE at a time.
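
The idea of a control unit fetching instructions from memory and executing them one at a time can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the IAS instruction set: the four opcodes (LOAD, ADD, STORE, HALT), the tuple encoding of an instruction and the function name `run` are all invented for this example. What it does keep from the slides is the stored-program concept, with one memory holding both the program and its data.

```python
def run(memory, max_steps=100):
    """Fetch and execute instructions one at a time until HALT."""
    pc = 0  # program counter: address of the next instruction
    ac = 0  # accumulator: holds the value being worked on
    for _ in range(max_steps):
        opcode, address = memory[pc]  # fetch the next instruction
        pc += 1
        if opcode == "LOAD":          # execute: memory -> accumulator
            ac = memory[address]
        elif opcode == "ADD":         # execute: accumulator += memory
            ac += memory[address]
        elif opcode == "STORE":       # execute: accumulator -> memory
            memory[address] = ac
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("step limit reached without HALT")


# One memory holds both the program (addresses 0-3) and the data (4-6).
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,
    3,
    0,
]
run(memory)
print(memory[6])  # prints 5
```

Because instructions and data share the same memory, changing the program means writing different values into memory rather than rewiring the machine, which is the contrast with the switch-and-cable programming of ENIAC.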
[Figure: Central Processing Unit containing the Arithmetic and Logic Unit (Accumulator, MQ, Arithmetic & Logic Circuits, MBR) and the Program Control Unit (PC, IBR, MAR, IR, Control Circuits); Main Memory; Input Output Equipment; paths labelled Instructions & Data and Address]

The IAS Machine: Instruction Cycle
The IAS operates by repetitively performing an instruction cycle. Two sub-cycles:
• During the fetch cycle, the opcode of the NEXT instruction is loaded into the IR and the address portion is loaded into the MAR
• Once the opcode is in the IR, the execute cycle is performed. Control circuitry interprets the opcode and executes the instruction by sending out appropriate control signals to cause data to be moved or an operation to be performed by the ALU.

IAS - details
• 1000 x 40 bit words
—Binary number
—2 x 20 bit instructions
• Set of registers (storage in CPU)
—Memory Buffer Register (MBR)
—Memory Address Register (MAR)
—Instruction Register (IR)
—Instruction Buffer Register (IBR)
—Program Counter (PC)
—Accumulator (AC)
—Multiplier Quotient (MQ)

Structure of IAS – detail

Evolution of Intel Microprocessor
Source: http://www.intel.com/intel/museum/25anniv/hof/tspecs.htm
1970s Processors | 4004 | 8008 | 8080 | 8086 | 8088
Introduced | 11/15/71 | 4/1/72 | 4/1/74 | 6/8/78 | 6/1/79
Clock Speeds | 108KHz | 200KHz | 2MHz | 5MHz, 8MHz, 10MHz | 5MHz, 8MHz
Bus Width | 4 bits | 8 bits | 8 bits | 16 bits | 8 bits
Number of Transistors | 2,300 (10 microns) | 3,500 (10 microns) | 6,000 (6 microns) | 29,000 (3 microns) | 29,000 (3 microns)
Addressable Memory | 640 bytes | 16 KBytes | 64 KBytes | 1 MB | 1 MB
Virtual Memory | -- | -- | -- | -- | --
Brief Description | First microcomputer chip, Arithmetic manipulation | Data/character manipulation | 10X the performance of the 8008 | 10X the performance of the 8080 | Identical to 8086 except for its 8-bit external bus

Evolution of Intel Microprocessor
Source: http://www.intel.com/intel/museum/25anniv/hof/tspecs.htm
1980s Processors | 80286 | Intel386™ DX Microprocessor | Intel386™ SX Microprocessor | Intel486™ DX CPU Microprocessor
Introduced | 2/1/82 | 10/17/85 | 6/16/88 | 4/10/89
Clock Speeds | 6MHz, 8MHz, 10MHz, 12.5MHz | 16MHz, 20MHz, 25MHz, 33MHz | 16MHz, 20MHz, 25MHz, 33MHz | 25MHz, 33MHz, 50MHz
Bus Width | 16 bits | 32 bits | 16 bits | 32 bits
Number of Transistors | 134,000 (1.5 microns) | 275,000 (1 micron) | 275,000 (1 micron) | 1.2 million (1 micron) (.8 micron with 50MHz)
Addressable Memory | 16 megabytes | 4 gigabytes | 16 megabytes | 4 gigabytes
Virtual Memory | 1 gigabyte | 64 terabytes | 64 terabytes | 64 terabytes
Brief Description | 3-6X the performance of the 8086 | First X86 chip to handle 32-bit data sets | 16-bit address bus enabled low-cost 32-bit processing | Level 1 cache on chip

Evolution of Intel Microprocessor
Source: http://www.intel.com/intel/museum/25anniv/hof/tspecs.htm
1990s Processors | Intel486™ SX Microprocessor | Pentium® Processor | Pentium® Pro Processor | Pentium® II Processor
Introduced | 4/22/91 | 3/22/93 | 11/01/95 | 5/07/97
Clock Speeds | 16MHz, 20MHz, 25MHz, 33MHz | 60MHz, 66MHz | 150MHz, 166MHz, 180MHz, 200MHz | 200MHz, 233MHz, 266MHz, 300MHz
Bus Width | 32 bits | 64 bits | 64 bits | 64 bits
Number of Transistors | 1.185 million (1 micron) | 3.1 million (.8 micron) | 5.5 million (0.35 micron) | 7.5 million (0.35 micron)
Addressable Memory | 4 gigabytes | 4 gigabytes | 64 gigabytes | 64 gigabytes
Virtual Memory | 64 terabytes | 64 terabytes | 64 terabytes | 64 terabytes
Brief Description | Identical in design to Intel486™ DX but without math coprocessor | Superscalar architecture brought 5X the performance of the 33-MHz Intel486™ DX processor | Dynamic execution architecture drives high-performing processor | Dual independent bus, dynamic execution, Intel MMX™ technology

Pentium Evolution (1)
• 8080
—first general purpose microprocessor
—8 bit data path
—Used in first personal computer – Altair
• 8086
—much more powerful
—16 bit
—instruction cache, prefetches a few instructions
—8088 (8 bit external bus) used in first IBM PC
• 80286
—16 Mbyte memory addressable
—up from 1Mb
• 80386
—32 bit
—Support for multitasking

Pentium Evolution (3)
• Pentium II
—MMX technology
—graphics, video & audio processing
• Pentium III
—Additional floating point instructions for 3D graphics
• Pentium 4
—Note Arabic rather than Roman numerals
—Further floating point and multimedia enhancements
• Itanium
—64 bit
—see chapter 15
• See Intel web pages for detailed information on processors

Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution

Performance Mismatch
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed

Logic and Memory Performance Gap

Solutions
• Increase number of bits retrieved at one time
—Make DRAM “wider” rather than “deeper”
• Change DRAM interface
—Cache
• Reduce frequency of memory access
—More complex cache and cache on chip
• Increase interconnection bandwidth
—High speed buses
—Hierarchy of buses

I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• Problem moving data
• Solutions:
—Caching
—Buffering
—Higher-speed interconnection buses
—More elaborate bus structures
—Multiple-processor configurations

Typical I/O Device Data Rates

Key is Balance
• Processor components
• Main memory
• I/O devices
• Interconnection structures

Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
—Fundamentally due to shrinking logic gate size
– More gates, packed more tightly, increasing clock rate
– Propagation time for signals reduced
• Increase size and speed of caches
—Dedicating part of processor chip
– Cache access times drop significantly
• Change processor organization and architecture
—Increase effective speed of execution
—Parallelism

Problems with Clock Speed and Logic Density
• Power
—Power density increases with density of logic and clock speed
—Dissipating heat
• RC delay
—Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
—Delay increases as RC product increases
—Wire interconnects thinner, increasing resistance
—Wires closer together, increasing capacitance
• Memory latency
—Memory speeds lag processor speeds
• Solution:
—More emphasis on organizational and architectural approaches

Intel Microprocessor Performance

Increased Cache Capacity
• Typically two or three levels of cache between processor and main memory
• Chip density increased
—More cache memory on chip
– Faster cache access
• Pentium chip devoted about 10% of chip area to cache
• Pentium 4 devotes about 50%

More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
—Different stages of execution of different instructions at same time along pipeline
• Superscalar allows multiple pipelines within single processor
—Instructions that do not depend on one another can be executed in parallel

Diminishing Returns
• Internal organization of processors complex
—Can get a great deal of parallelism
—Further significant increases likely to be relatively modest
• Benefits from cache are reaching limit
• Increasing clock rate runs into power dissipation problem
—Some fundamental physical limits are being reached

New Approach – Multiple Cores
• Multiple processors on single chip
—Large shared cache
• Within a processor, increase in performance proportional to square root of increase in complexity
• If software can use multiple processors, doubling number of processors almost doubles performance
• So, use two simpler processors on the chip rather than one more complex processor
• With two processors, larger caches are justified
—Power consumption of memory logic less than processing logic

Internet Resources
• http://www.intel.com/
—Search for the Intel Museum
• http://www.ibm.com
• http://www.dec.com
• Charles Babbage Institute
• PowerPC
• Intel Developer Home
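
To put rough numbers on the argument from the New Approach – Multiple Cores slide, the short Python sketch below compares spending a doubled complexity budget on one bigger processor against spending it on two simpler ones. The function names are hypothetical, and the `efficiency` value of 0.95 is only an assumed stand-in for "almost doubles"; the slides give no figure for it.

```python
import math


def single_core_gain(complexity_factor):
    """Performance gain when one processor grows in complexity.

    Uses the square-root relation stated on the slide.
    """
    return math.sqrt(complexity_factor)


def multi_core_gain(cores, efficiency=0.95):
    """Performance gain from several simple processors.

    efficiency is an assumed fraction of ideal scaling (not from the slides).
    """
    return cores * efficiency


print(round(single_core_gain(2), 2))  # prints 1.41
print(round(multi_core_gain(2), 2))   # prints 1.9
```

Under these assumptions, doubling the complexity of one processor buys roughly a 1.41x gain, while two simpler processors approach 2x, provided the software can actually use both.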