Yu-Ting Tsai
Department of Computer Science & Engineering
Yuan Ze University, Taiwan

Textbook (Patterson): Chap. 1.1~1.4, 1.6, 1.10
Video (Prof. Liu): L01-01, L01-03~05
Reference (Hennessy): Chap. 1

• Computers are pervasive nowadays

CS250A - Assembly Language & Computer Organization

• Novel Computer Applications
  • Computers in automobiles
  • Mobile phones
  • Human genome project
  • World wide web
  • Cloud computing
  • …

• Desktop/Laptop (Personal) Computers
  – Personal usage with a variety of software
  – Subject to cost/performance tradeoff
• Servers
  – Modern form of mainframes & supercomputers
  – High capacity, performance, & reliability
  – Range from small servers to datacenters

• Embedded Computers
  – Span the widest range of applications
    • Mobile phones, automobile computers, video game consoles, …
  – Hidden as components of systems
  – Stringent power/performance/cost constraints

• General Purpose (PCs & Servers)
  – Commercial (integer), scientific (FP, graphics), home (integer, audio, video, graphics)
  – Software compatibility is most important
  – Short product life, higher price & profit margin
  – Operating system serves as another interface above the architecture

• Embedded Computers
  – A computer inside another device, used for executing a specific application
  – Examples
    • Input/Output devices: printers, disks, …
    • Consumer electronics: video game consoles, CD players, PDAs, …
    • Lego Mindstorms Robotic Command Explorer: a "Programmable Brick" with a Hitachi H8 CPU (8-bit), 32 KB RAM, LCD, batteries, infrared transmitter/receiver, 4 control buttons, & 6 connectors
  – Large variety in architecture, performance, & on-chip peripherals
    • Compatibility is often not important
    • New architectures can enter the market easily
    • Low power becomes important
  – Usually sold in large volumes at low prices

• Application Software
  – Written in high-level language (HLL)
• System Software
  – Compiler translates HLL code to machine code
  – Operating system (service code)
    • Handling input/output, managing memory & storage, …
• Hardware
  – Processor, memory, I/O controllers, …

• High-Level Language Program
      temp = v[0];
      v[0] = v[1];
      v[1] = temp;
  ↓ Compiler
• Assembly Language Program
      lw $15, 0($2)
      lw $16, 4($2)
      sw $16, 0($2)
      sw $15, 4($2)
  ↓ Assembler
• Machine Language Program
      1000 1100 0100 1111 0000 0000 0000 0000
      1000 1100 0101 0000 0000 0000 0000 0100
      1010 1100 0101 0000 0000 0000 0000 0000
      1010 1100 0100 1111 0000 0000 0000 0100
  ↓ Machine Interpretation
• Control Signal Specification
      ALUOP[0:3] <= InstReg[9:11] & MASK
      …

[Figure: components of a computer]
• Processor(s) (active)
  – Control ("brain")
  – Datapath ("brawn")
• Memory (passive)
  – Where programs & data live when running
• Devices
  – Input: keyboard, mouse, …
  – Output: display, printer, …
  – Disk: where programs & data live when not running

• Brains of Computers
  – Control unit tells datapath, memory, & I/O devices what to do
    • Decodes & dispatches instructions
  – Datapath performs arithmetic operations using arithmetic/logical units (ALUs)
    • Based on binary number system
  – Cache memory
    • Small memory for fast data access

• Basic Functionality of Control Unit
  – Fetch/Execute cycle
    • Steps that the CPU takes to execute an instruction
  – Program counter (PC)
    • Holds memory address of current instruction
  [Figure: cycle with steps "Fetch instruction to which PC points", "Execute fetched instruction", "Increment PC"]

• AMD Barcelona (4 Processor Cores)
  [Figure: processor die photo]

• Volatile Memory
  – Loses instructions & data when powered off
  – DRAM, SRAM, …
• Non-Volatile Memory
  – Flash memory, hard drives, optical discs (CD, DVD, Blu-ray Disc), …

• Accessories that allow a computer to perform specific tasks
  – Receive information for processing
  – Return results of processing
  – Store information
• Common Input/Output Devices
  – Keyboard, mouse, scanner, display, speakers, printer, hard drive, CD, DVD, …

• Communication & Resource Sharing
  – Local area network (LAN): Ethernet, …
  – Wide area network (WAN): Internet, …
  – Wireless network: WiFi, Bluetooth, LTE, …

• Scope
  – Capabilities & performance characteristics of functional units
    • Registers, ALU, shifters, ...
  – Ways in which hardware components are interconnected
    • Structure, …
  – Information flows between components
    • Data, datapath, …
  – Logic & means by which such information flow is controlled
  – Register transfer level (RTL) description
    • A digital system is specified by
      – Set of registers
      – Operations performed on stored data
      – Controllers that supervise sequence of operations

• Computer Components & Their Relations
  – ISA + Computer organization
  [Figure: abstraction layers, top to bottom; labels: Instruction set architecture (ISA), Computer organization]
      Applications
      Software: Operating System (Windows, Unix, Linux, iOS, …), Compiler, Assembler
      Hardware: Processor, Memory, I/O system
      Datapath & Control
      Digital Design
      Circuit Design
      Transistors

• Instruction Set Architecture (ISA): An Important Computer Abstraction
  – Interface between hardware & low-level software
  – Standardizes instructions, machine language bit patterns, …
  – May include
    • Instruction set & formats
    • Modes of memory addressing
    • Exceptional conditions
    • …

• Advantages
  – Allows different implementations of the same architecture
• Disadvantages
  – Sometimes prevents adoption of innovations
• Modern Instruction Set Architectures
  – x86 series, PowerPC, MIPS, SPARC, ARM, …

• Examples
    DEC Alpha (v1, R, B, M, F, C, T)                               1992-2001
    HP PA-RISC (v1.0, v1.1, v2.0)                                  1986-1996
    Sun SPARC (v7, v8, v9, …)                                      1987-
    SGI MIPS (MIPS I, II, III, IV, V, …)                           1985-
    x86 Series (x86-16, IA-32, IA-64, x64, MMX, SSE, AVX, ...)     1978-
    ARM (v1, v2, …, v9, …)                                         1985-
    RISC-V (v1, v2, 20xxxxxx, …)                                   2010-

• Instruction Categories
  – Load/Store
  – Computational
  – Jump & branch
  – Floating point
  – Memory management
  – Special
• Registers
  – R0 - R31
  – PC
  – HI
  – LO
• Example Instructions (All 32-bit Wide)
    OP | rs | rt | rd | shamt | funct
    OP | rs | rt | immediate
    OP | jump target

• Which airplane has the best performance?
  – Concorde
    • Capacity: 132 persons
    • Range: 4000 miles
    • Cruising speed: 1320 mph (Mach 2.02) at 60,000 feet
  – 747-400
    • Capacity: 470 persons
    • Range: 4150 miles
    • Cruising speed: 567 mph (Mach 0.85) at 35,000 feet
  [Figure: four bar charts comparing Boeing 777, Boeing 747, BAC/Sud Concorde, & Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), & passengers × mph]

• Algorithm
  – Determines number of operations executed
• Programming Language, Compiler, & Architecture
  – Determine number of machine instructions executed per operation
• Processor & Memory System
  – Determine how fast instructions are executed
• I/O System (including Operating System)
  – Determines how fast I/O operations are executed

• Performance of Computer $X$
  – $\text{Performance}_X = \dfrac{1}{\text{ExecutionTime}_X}$
• Relative Performance
  – What does "$X$ is $n$ times faster than $Y$" mean?
    • $n = \dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X}$
  – Example (time taken to run a program: 10 s on $X$, 15 s on $Y$)
    • $\dfrac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X} = \dfrac{15\,\text{s}}{10\,\text{s}} = 1.5$
    • $X$ is 1.5 times faster than $Y$

• CPU operations are governed by a constant-rate clock
  – Clock period: duration of a clock cycle
    • Example: 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
  – Clock frequency (rate): cycles per second
    • Example: 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
  [Figure: clock timing diagram with labels: clock period, clock (cycles), data transfer & computation, update state]

• Definition (CPU time)
  – Time spent on processing a given task
    • Excludes I/O time & the time shares of other tasks
    • Comprises user CPU time & system CPU time
  – $\text{CPUTime} = \text{ClockCycles} \times \text{ClockCycleTime} = \dfrac{\text{ClockCycles}}{\text{ClockRate}}$
• How to Improve Performance?
  – Reduce cycle count (number of clock cycles)
  – Increase clock rate (or reduce clock cycle time)
  – Hardware designer must often trade off clock rate against cycle count

• Example
  – Computer $A$: 2 GHz clock rate, 10 s CPU time
  – Design computer $B$ & aim for 6 s CPU time
    • Faster clock rate, but with 1.2× as many clock cycles
  – How fast must the clock rate of computer $B$ be?
    • $\text{ClockCycles}_A = \text{CPUTime}_A \times \text{ClockRate}_A = 10\,\text{s} \times 2\,\text{GHz}$
    • $\text{ClockRate}_B = \dfrac{\text{ClockCycles}_B}{\text{CPUTime}_B} = \dfrac{1.2 \times \text{ClockCycles}_A}{\text{CPUTime}_B} = \dfrac{1.2 \times 10\,\text{s} \times 2\,\text{GHz}}{6\,\text{s}} = 4\,\text{GHz}$

• Number of Executed C Statements
      a = b - c;
      for (i = a; i > 0; i--)
          sum = sum + x;
• Number of Executed Machine Instructions
            sub  $r1, $r2, $r3
      Loop: blez $r1, End
            add  $r8, $r8, $r10
            addi $r1, $r1, -1
            j    Loop
      End:
  – Dynamic instruction count (1 sub, 4 instructions per iteration, 1 final blez)
    • Loop runs 10 times → 42 instructions
    • Loop runs 20 times → 82 instructions

• Definition (CPI: clock cycles per instruction)
  – Average number of clock cycles that each instruction takes to execute
  – CPI is determined by CPU hardware
  – If different instructions have different CPI
    • Average CPI is affected by instruction mix

• Instruction Count for a Program
  – Determined by program, ISA, & compiler
• Redefine CPU Time
  – $\text{ClockCycles} = \text{InstructionCount} \times \text{CPI}$
  – $\text{CPUTime} = \text{ClockCycles} \times \text{ClockCycleTime} = \text{InstructionCount} \times \text{CPI} \times \text{ClockCycleTime} = \dfrac{\text{InstructionCount} \times \text{CPI}}{\text{ClockRate}}$

• Example
  – Computer $A$: $\text{CycleTime}_A = 250$ ps, $\text{CPI}_A = 2.0$
  – Computer $B$: $\text{CycleTime}_B = 500$ ps, $\text{CPI}_B = 1.2$
  – Which is faster & by how much (same ISA, so both execute the same instruction count $I$)?
    • $\text{CPUTime}_A = \text{InstructionCount}_A \times \text{CPI}_A \times \text{CycleTime}_A = I \times 2.0 \times 250\,\text{ps}$
    • $\text{CPUTime}_B = \text{InstructionCount}_B \times \text{CPI}_B \times \text{CycleTime}_B = I \times 1.2 \times 500\,\text{ps}$
    • $\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{CPUTime}_B}{\text{CPUTime}_A} = \dfrac{I \times 1.2 \times 500\,\text{ps}}{I \times 2.0 \times 250\,\text{ps}} = 1.2$
    • $A$ is 1.2 times faster than $B$

• Definition Revisited
  – $\text{CPUTime} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{ClockCycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{ClockCycle}}$
• Performance Dependence
  [Table: rows Program, Compiler, Instruction Set, Organization, Technology; columns Instruction Count, CPI, Clock Rate; × marks which factor affects which term]

• If different instruction classes take different numbers of cycles
  – $\text{ClockCycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{InstructionCount}_i \right)$
• Weighted Average CPI
  – $\text{CPI} = \dfrac{\text{ClockCycles}}{\text{TotalInstructionCount}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \dfrac{\text{InstructionCount}_i}{\text{TotalInstructionCount}} \right)$

• Example
  – Code with instructions in classes $A$, $B$, $C$

      Class               A   B   C
      CPI for class       1   2   3
      IC in sequence 1    2   1   2
      IC in sequence 2    4   1   1

    • Sequence 1: $\text{InstructionCount}_1 = 5$
    • $\text{ClockCycles}_1 = 2 \times 1 + 1 \times 2 + 2 \times 3 = 10$
    • $\text{WeightedAverageCPI}_1 = 10 / 5 = 2.0$
    • Sequence 2: $\text{InstructionCount}_2 = 6$
    • $\text{ClockCycles}_2 = 4 \times 1 + 1 \times 2 + 1 \times 3 = 9$
    • $\text{WeightedAverageCPI}_2 = 9 / 6 = 1.5$

• Speedup Due to Enhancement $E$ ($E'$: without $E$)
  – $\text{Speedup}_E = \dfrac{\text{ExecutionTime}_{E'}}{\text{ExecutionTime}_E} = \dfrac{\text{Performance}_E}{\text{Performance}_{E'}}$
  – If $E$ accelerates a fraction $F$ of the task by a factor $S$ & the remainder of the task is unaffected
    • $\text{ExecutionTime}_E = \left( (1 - F) + \dfrac{F}{S} \right) \times \text{ExecutionTime}_{E'}$
    • $\text{Speedup}_E = \dfrac{1}{(1 - F) + \dfrac{F}{S}} \approx \dfrac{1}{1 - F}$ (for $S \to \infty$)

• Low Power at Idle
  – From SPEC power benchmark

      Manufacturer      Processor    100% load power   50% load power   10% load power   Active idle power
      HP                Xeon E5440   269 W             227 W (84%)      174 W (65%)      160 W (59%)
      Dell              Xeon E5440   276 W             230 W (83%)      173 W (63%)      157 W (57%)
      Fujitsu Siemens   Xeon X3220   132 W             110 W (83%)      85 W (65%)       80 W (60%)

  – Example: Google datacenter
    • Mostly operates at 10%~50% load
    • Less than 1% of time at 100% load
  – We may redesign processors to achieve power-proportional computing

• Amdahl's Law
  – Pitfall: improving only a portion & expecting a proportional improvement in overall performance
    • $T_{\text{improved}} = \dfrac{T_{\text{affected}}}{\text{ImproveFactor}} + T_{\text{unaffected}}$
  – Corollary
    • Make common cases fast
  – Example
    • Multiplication operations account for 80 s out of 100 s
    • How much must multiplication improve in order to get 5× overall performance?
      – $20 = \dfrac{80}{n} + 20$, which requires $\dfrac{80}{n} = 0$
      – No finite $n$ satisfies this: it cannot be done!

• Basic Computer Organization
• Hierarchy of Computer Abstractions
• Instruction Set Architecture
  – Hardware/Software interface
• About Performance
  – Execution time, CPI, instruction count, Amdahl's law, …
• Fallacies & Pitfalls
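
The performance formulas listed in the summary (execution time, CPI, instruction count, Amdahl's law) can be checked numerically. The following is a minimal Python sketch and is not part of the original slides: the function names `cpu_time`, `weighted_cpi`, `amdahl_speedup`, and `dynamic_instruction_count` are hypothetical helpers chosen for illustration, and the instruction count of 1,000,000 is an arbitrary assumption. All other inputs are the numbers used in the lecture's worked examples.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPUTime = InstructionCount x CPI x ClockCycleTime."""
    return instruction_count * cpi * clock_cycle_time


def weighted_cpi(cpi_per_class, count_per_class):
    """Weighted average CPI = total clock cycles / total instruction count."""
    total_cycles = sum(cpi * n for cpi, n in zip(cpi_per_class, count_per_class))
    return total_cycles / sum(count_per_class)


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def dynamic_instruction_count(iterations):
    """Instructions executed by the lecture's loop:
    1 sub, 4 instructions per iteration, and 1 final blez that exits."""
    return 1 + 4 * iterations + 1


if __name__ == "__main__":
    # Computers A and B share an ISA, so they execute the same instruction count.
    instructions = 1_000_000  # assumption: any positive value gives the same ratio
    time_a = cpu_time(instructions, 2.0, 250e-12)  # CPI 2.0, 250 ps cycle
    time_b = cpu_time(instructions, 1.2, 500e-12)  # CPI 1.2, 500 ps cycle
    print("Performance_A / Performance_B =", time_b / time_a)

    # Instruction classes A, B, C with CPI 1, 2, 3.
    print("CPI of sequence 1 =", weighted_cpi([1, 2, 3], [2, 1, 2]))
    print("CPI of sequence 2 =", weighted_cpi([1, 2, 3], [4, 1, 1]))

    # Loop example: instruction count grows with the number of iterations.
    print("10 iterations:", dynamic_instruction_count(10), "instructions")
    print("20 iterations:", dynamic_instruction_count(20), "instructions")

    # Multiplication takes 80 s of 100 s (F = 0.8): even a huge improvement
    # factor keeps the overall speedup below 1 / (1 - F) = 5.
    for factor in (2, 10, 1000, 1e9):
        print("factor", factor, "-> overall speedup", amdahl_speedup(0.8, factor))
```

Running the script shows the overall speedup approaching, but never reaching, 5 as the multiplication improvement factor grows, which is the numerical counterpart of the "cannot be done" conclusion in the Amdahl's law example.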