CMPT 334 Computer Organization
Instructor: Tina Tian

General Information
• Email: tina.tian@manhattan.edu
• Office: RLC 203A
• Office Hours:
  ▫ Monday, Thursday 1:30 – 2:30 PM
  ▫ Wednesday 12:00 – 1:00 PM
  ▫ or by appointment
• Website: home.manhattan.edu/~tina.tian

About the Course
• Mon, Thur 3:00 – 4:15 PM
• Textbook:
  ▫ David A. Patterson and John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface"
  ▫ 5th edition, or 4th edition revised printing

About the Authors
• David A. Patterson
  ▫ Pioneer of the Reduced Instruction Set Computer (RISC)
• John L. Hennessy
  ▫ Founder of MIPS Computer Systems Inc.

Grading
• 1st Midterm Exam (5th week)   15%
• 2nd Midterm Exam (10th week)  15%
• Final Exam                    30%
• Homework                      40%

Homework
• Homework assignments involve
  ▫ written problems
  ▫ assembly language programming
  ▫ logic design using tools
• Late work has points deducted.
• Work is not accepted more than a week after the deadline.

Homework Policy
• You may discuss the homework with other students.
• However, you must acknowledge the people you worked with.
• And you must independently write up your own solutions.
• Any written sources used (apart from the text) must also be acknowledged.

What you will learn
• A study of the internal architecture of a computer:
  ▫ Digital logic
  ▫ Machine-level representation of data
  ▫ Assembly-level machine organization
  ▫ Memory system organization and architecture
  ▫ What determines program performance and how it can be improved
  ▫ ...

Software
• Assembly language simulators
  ▫ SPIM
  ▫ MARS assembler and runtime simulator
• Digital logic simulator
  ▫ Logisim

Advice
• Take notes.
• Do NOT check emails, Facebook, etc.
  ▫ Turn the monitors off.
• Don't use the printer in class.
  ▫ Print after class – no points off.
• Start the homework early.

Chapter 1: Computer Abstractions and Technology
[Adapted from Computer Organization and Design, 5th Edition, Patterson & Hennessy, © 2014, MK]

Classes of Computers
• Personal computers
  ▫ General purpose, variety of software
  ▫ Subject to cost/performance tradeoff
• Server computers
  ▫ Network based
  ▫ High capacity, performance, reliability
  ▫ Range from small servers to building sized

Classes of Computers
• Supercomputers
  ▫ High-end scientific and engineering calculations
  ▫ Hundreds to thousands of processors, terabytes of memory and petabytes of storage
• Embedded computers
  ▫ Hidden as components of systems
  ▫ Stringent power/performance/cost constraints

Review: Some Basic Definitions
• Kilobyte – 2^10 or 1,024 bytes
• Megabyte – 2^20 or 1,048,576 bytes
  ▫ sometimes "rounded" to 10^6 or 1,000,000 bytes
• Gigabyte – 2^30 or 1,073,741,824 bytes
  ▫ sometimes rounded to 10^9 or 1,000,000,000 bytes
• Terabyte – 2^40 or 1,099,511,627,776 bytes
  ▫ sometimes rounded to 10^12 or 1,000,000,000,000 bytes
• Petabyte – 2^50 or 1,024 terabytes
  ▫ sometimes rounded to 10^15 or 1,000,000,000,000,000 bytes
• Exabyte – 2^60 or 1,024 petabytes
  ▫ sometimes rounded to 10^18 or 1,000,000,000,000,000,000 bytes
• (A quick numeric check of these prefixes appears below, after the Embedded Processor Characteristics slide.)

The PostPC Era
[figure]

Embedded Processor Characteristics
• The largest class of computers, spanning the widest range of applications and performance
  ▫ Often have minimum performance requirements.
  ▫ Often have stringent limitations on cost.
  ▫ Often have stringent limitations on power consumption.
  ▫ Often have low tolerance for failure.
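As a quick numeric check of the size definitions reviewed above, here is a minimal Python sketch (added for illustration; the values are computed, not quoted from the slides) that prints each binary prefix and how far it drifts from its "rounded" decimal value.

```python
# Quick check of the binary size prefixes reviewed above and how far
# they drift from their "rounded" decimal approximations.
prefixes = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa"]

for i, name in enumerate(prefixes, start=1):
    binary = 2 ** (10 * i)    # 2^10, 2^20, ..., 2^60 bytes
    decimal = 10 ** (3 * i)   # 10^3, 10^6, ..., 10^18 bytes
    drift = (binary - decimal) / decimal * 100
    print(f"{name}byte: {binary:,} bytes (~{drift:.1f}% larger than {decimal:,})")
```

The drift grows from about 2.4% for a kilobyte to about 15% for an exabyte, which is why the decimal "roundings" on the slide become increasingly generous.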
Below Your Program
• Application software
  ▫ Written in high-level language
• System software
  ▫ Compiler: translates HLL code to machine code
  ▫ Operating System: service code
     Handling input/output
     Managing memory and storage
     Scheduling tasks & sharing resources
• Hardware
  ▫ Processor, memory, I/O controllers

Levels of Program Code
• High-level language
  ▫ Level of abstraction closer to problem domain
  ▫ Provides for productivity and portability
• Assembly language
  ▫ Textual representation of instructions
• Hardware representation
  ▫ Binary digits (bits)
  ▫ Encoded instructions and data

Under the Covers
• Five classic components of a computer – input, output, memory, datapath, and control
• datapath + control = processor (CPU)

Moore's Law
• The number of transistors on integrated circuits doubles every 18 – 24 months.
  ▫ Predicted by Gordon Moore, co-founder of Intel

Moore's Law
• Moore predicted the growth rate of the number of transistors per chip.
• A transistor is an on/off switch controlled by electricity.
• The integrated circuit combines dozens to hundreds of transistors into a single chip.
• Very large-scale integrated circuit (VLSI) – millions of transistors

Technology Scaling Road Map

  Year                                              2004  2006  2008  2010  2012  2014
  Feature size (nm)                                   90    65    45    32    22    16
  Integration capacity (billions of transistors)       2     4     6    16    32    64

• Fun facts about 45nm transistors
  ▫ 30 million can fit on the head of a pin
  ▫ You could fit more than 2,000 across the width of a human hair
  ▫ If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent
• (A quick check of how these capacities track Moore's Law appears below, after the Understanding Performance slide.)

Another Example of Moore's Law Impact
[figure: DRAM capacity growth over 3 decades]

But What Happened to Clock Rates and Why?
• The power wall: clock rates hit a "power wall" – we can't remove more heat.
• How else can we improve performance?

A Sea Change is at Hand
• The power challenge has forced a change in the design of microprocessors.
  ▫ Since 2002 the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year.
• As of 2006 all desktop and server companies are shipping microprocessors with multiple processors – cores – per chip.
  ▫ Plan of record is to double the number of cores per chip per generation (about every two years).

Uniprocessor Performance
[figure]

Understanding Performance
• Algorithm
  ▫ Determines number of operations executed
• Programming language, compiler, architecture
  ▫ Determine number of machine instructions executed per operation
• Processor and memory system
  ▫ Determine how fast instructions are executed
• I/O system (including OS)
  ▫ Determines how fast I/O operations are executed
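Referring back to the Technology Scaling Road Map above, the following sketch (an illustration, assuming integration capacity doubles every two years starting from the 2004 figure) compares that simple Moore's Law projection against the roadmap values.

```python
# Project integration capacity (billions of transistors) assuming it
# doubles every two years, starting from the roadmap's 2004 value.
base_year, base_capacity = 2004, 2   # 2 billion transistors in 2004
roadmap = {2004: 2, 2006: 4, 2008: 6, 2010: 16, 2012: 32, 2014: 64}

for year, actual in roadmap.items():
    projected = base_capacity * 2 ** ((year - base_year) / 2)
    print(f"{year}: roadmap {actual} BT, doubling-every-2-years projection {projected:.0f} BT")
```

Except for 2008 (6 on the roadmap versus a projected 8 billion), the roadmap values sit right on the doubling-every-two-years curve, which is the sense in which these process generations track Moore's Law.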
Defining Performance
• Which airplane has the best performance?
[bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 on passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]

Throughput versus Response Time
• Response time (execution time) – the time between the start and the completion of a task
  ▫ Important to individual users
• Throughput (bandwidth) – the total amount of work done in a given time
  ▫ Important to data center managers
• We will need different performance metrics, and different sets of benchmark applications, for embedded and desktop computers (which focus on response time) than for servers (which focus on throughput).

Defining (Speed) Performance
• To maximize performance, we need to minimize execution time:
  performance_X = 1 / execution_time_X
• If X is n times faster than Y, then
  performance_X / performance_Y = execution_time_Y / execution_time_X = n
• Decreasing response time almost always improves throughput.

Relative Performance Example
• If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?
• We know that A is n times faster than B if
  performance_A / performance_B = execution_time_B / execution_time_A = n
• The performance ratio is 15 / 10 = 1.5, so A is 1.5 times faster than B.

Measuring Execution Time
• Elapsed time
  ▫ Total response time, including all aspects: processing, I/O, OS overhead, idle time
  ▫ Determines system performance
• CPU time
  ▫ Time spent processing a given job; discounts I/O time and other jobs' shares
  ▫ Comprises user CPU time and system CPU time
  ▫ Different programs are affected differently by CPU and system performance

CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock.
[figure: clock waveform labeled with the clock period; "data transfer and computation" and "update state" phases alternate each cycle]
• Clock period: duration of a clock cycle
  ▫ e.g., 250 ps = 0.25 ns = 250 × 10^-12 s
• Clock frequency (rate): cycles per second
  ▫ e.g., 4.0 GHz = 4000 MHz = 4.0 × 10^9 Hz

CPU Time
• CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate
• Performance improved by
  ▫ Reducing the number of clock cycles
  ▫ Increasing the clock rate
  ▫ Hardware designer must often trade off clock rate against cycle count
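The two forms of the CPU time equation above are easy to exercise numerically. The sketch below uses hypothetical values (a 4 GHz clock and a 20-billion-cycle program, not taken from the slides) and confirms that both forms give the same answer.

```python
# CPU time from clock cycles, computed two equivalent ways:
#   CPU time = CPU clock cycles x clock cycle time
#   CPU time = CPU clock cycles / clock rate
clock_rate = 4.0e9                   # 4 GHz (hypothetical)
clock_cycle_time = 1 / clock_rate    # 0.25 ns = 250 ps
cpu_clock_cycles = 20e9              # hypothetical cycle count for a program

time_from_cycle_time = cpu_clock_cycles * clock_cycle_time
time_from_clock_rate = cpu_clock_cycles / clock_rate
print(time_from_cycle_time, time_from_clock_rate)   # both print 5.0 seconds
```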
Improving Performance Example
• A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

  CPU time_A = CPU clock cycles_A / clock rate_A
  CPU clock cycles_A = 10 sec × 2 × 10^9 cycles/sec = 20 × 10^9 cycles
  CPU time_B = (1.2 × 20 × 10^9 cycles) / clock rate_B
  clock rate_B = (1.2 × 20 × 10^9 cycles) / 6 seconds = 4 GHz

Clock Cycles per Instruction
• Not all instructions take the same amount of time to execute.
  ▫ One way to think about execution time is that it equals the number of instructions executed multiplied by the average time per instruction:

  # CPU clock cycles for a program = # instructions for a program × average clock cycles per instruction

• Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute
  ▫ A way to compare two different implementations of the same ISA

  Instruction class                 A   B   C
  CPI for this instruction class    1   2   3

Instruction Count and CPI
• Clock Cycles = Instruction Count × Cycles per Instruction
• CPU Time = Instruction Count × CPI × Clock Cycle Time = Instruction Count × CPI / Clock Rate
• Instruction count for a program
  ▫ Determined by program, ISA and compiler
• Average cycles per instruction
  ▫ Determined by CPU hardware
  ▫ If different instructions have different CPI, the average CPI is affected by the instruction mix

Using the Performance Equation
• Computers A and B implement the same ISA. Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program. Which computer is faster and by how much?
• Each computer executes the same number of instructions, I, so
  CPU time_A = I × 2.0 × 250 ps = 500 × I ps
  CPU time_B = I × 1.2 × 500 ps = 600 × I ps
• Clearly, A is faster … by the ratio of execution times:
  performance_A / performance_B = execution_time_B / execution_time_A = (600 × I ps) / (500 × I ps) = 1.2

THE Performance Equation
• Our basic performance equation is then
  CPU time = Instruction_count × CPI × clock_cycle
  or
  CPU time = (Instruction_count × CPI) / clock_rate
• These equations separate the three key factors that affect performance.
  ▫ Can measure the CPU execution time by running the program
  ▫ The clock rate is usually given
  ▫ Can measure overall instruction count by using profilers/simulators without knowing all of the implementation details
  ▫ CPI varies by instruction type and ISA implementation, for which we must know the implementation details

Determinants of CPU Performance
• CPU time = Instruction_count × CPI × clock_cycle

                         Instruction_count   CPI   clock_cycle
  Algorithm                      X            X
  Programming language           X            X
  Compiler                       X            X
  ISA                            X            X         X

Example
• A given application written in Java runs for 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 times as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler?
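A quick back-of-the-envelope check of the Java compiler example above, assuming the clock cycle time is unchanged and reading the CPI increase as a multiplicative factor of 1.1: since CPU time = instruction count × CPI × clock cycle time, the new run time is simply the old time scaled by both ratios.

```python
# New CPU time = old CPU time x (instruction count ratio) x (CPI ratio),
# assuming the clock cycle time does not change.
old_time = 15.0            # seconds with the old compiler
instruction_ratio = 0.6    # new compiler emits 0.6x as many instructions
cpi_ratio = 1.1            # but CPI goes up by a factor of 1.1

new_time = old_time * instruction_ratio * cpi_ratio
print(f"Expected run time with the new compiler: {new_time:.1f} s")  # 9.9 s
```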
Effective (Average) CPI
• Computing the overall effective CPI is done by looking at the different types of instructions, their individual cycle counts, and averaging:

  Overall effective CPI = Σ (CPI_i × IC_i), summed over i = 1 to n

  where IC_i is the count (percentage) of the number of instructions of class i executed, CPI_i is the (average) number of clock cycles per instruction for that instruction class, and n is the number of instruction classes.
• The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs.

CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C:

  Class              A   B   C
  CPI for class      1   2   3
  IC in sequence 1   2   1   2
  IC in sequence 2   4   1   1

• Sequence 1: IC = 5
  ▫ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  ▫ Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
  ▫ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  ▫ Avg. CPI = 9/6 = 1.5

Turbo Mode
• Clock cycle time has traditionally been fixed.
• Today's processors can vary their clock rates.
  ▫ The Intel Core i7 will temporarily increase its clock rate by about 10% until the chip gets too warm.

Amdahl's Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance:

  T_improved = T_affected / improvement_factor + T_unaffected

• Example: multiply accounts for 80 s of a 100 s program. How much improvement in multiply performance is needed to get a 5× overall speedup?

  20 = 80/n + 20  →  Can't be done!

• Corollary: make the common case fast.
• Amdahl's Law, together with the CPU performance equation, is a handy tool for evaluating potential enhancements:

  T_improved = T_affected / improvement_factor + T_unaffected
  Clock Cycles = Instruction Count × Cycles per Instruction
  CPU Time = Instruction Count × CPI × Clock Cycle Time = Instruction Count × CPI / Clock Rate

MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
  ▫ Faster computers have a higher MIPS rating.
  ▫ Doesn't account for differences in ISAs between computers or differences in complexity between instructions.

  MIPS = Instruction count / (Execution time × 10^6)
       = Instruction count / ((Instruction count × CPI / Clock rate) × 10^6)
       = Clock rate / (CPI × 10^6)

• We cannot compare computers with different instruction sets using MIPS.

Example
• Which computer has the higher MIPS rating?
• Which computer is faster?

                      Computer A   Computer B
  Instruction count   10 billion    8 billion
  Clock rate          4 GHz         4 GHz
  CPI                 1.0           1.1
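One way to work the example above is to compute each machine's MIPS rating and its actual execution time from the performance equation; the sketch below does both, using the figures from the table.

```python
# MIPS = clock rate / (CPI x 10^6); execution time = IC x CPI / clock rate.
machines = {
    "A": {"ic": 10e9, "clock_rate": 4e9, "cpi": 1.0},
    "B": {"ic":  8e9, "clock_rate": 4e9, "cpi": 1.1},
}

for name, m in machines.items():
    mips = m["clock_rate"] / (m["cpi"] * 1e6)
    exec_time = m["ic"] * m["cpi"] / m["clock_rate"]
    print(f"Computer {name}: {mips:.0f} MIPS, execution time {exec_time:.2f} s")
# Computer A: 4000 MIPS, 2.50 s;  Computer B: ~3636 MIPS, 2.20 s
```

Computer A has the higher MIPS rating, but computer B finishes the program sooner because it executes fewer instructions – exactly the kind of mismatch that makes MIPS a misleading metric on its own.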