+ CS 325: CS Hardware and Software Organization and Architecture
Introduction

+ Outline
● Course information
● Course website
● Syllabus
● Term project

+ Myself
Dr. Michael Galloway
COHH 4134
jeffrey.galloway@wku.edu
● B.W.E – Wireless Engineering – Auburn University, December 2005
● M.S. – Computer Science – The University of Alabama, May 2008
● Ph.D. – Computer Science – The University of Alabama, Aug. 2013
● Specializations:
  ● VANETs, MANETs, Network Protocols
  ● Parallel and Distributed Computing, Cloud Computing, Local Infrastructure as a Service Cloud Architectures
  ● Wireless Communication Protocols, Resource Management of Handheld Devices
● Taught classes at 6 colleges and universities
● First time teaching computer organization and architecture
● Current Research: Infrastructure Cloud Architectures, Automated Resource Deployment, Vertical Educational Clouds, Power Modeling
● Chair: WKU ACM Student Chapter – Everyone should join!!

+ Course Description
This course will provide:
● Coverage of computer systems and architecture
● A bridge between low-level hardware systems and operating systems programming
● A means for in-depth coverage of new-generation hardware and computer systems
Topics include:
● Computer number representations
● Computer arithmetic
● CPU operations
● Instruction sets
● I/O
● Memory management
● System performance
● Parallelism

+ Important Information
● Course Website: http://ip204-5.sth.wku.edu/cs325Web/
● Required Book: “Computer Organization and Architecture” by William Stallings, 9th or 10th edition.
● Office Hours: 8:30am – 10:30am on Monday, Wednesday

+ Why Should We Study Computer Architecture?
● It’s required
● Understand computer performance and cost factors.
● Basis for understanding of OS and programming concepts.
● Understand how to write programs that are:
  ● Faster
  ● Smaller
  ● Less prone to error
● To appreciate the relative cost of operations and the effect of programming choices.
● Helps you to debug.

+ The Bad News…
● Digital hardware:
  ● Is complex
  ● Cannot be fully understood in one course
  ● Requires background in electrical engineering, physics, chemistry
● The CPU is arguably the most complex device created by humans.
  ● Over 10 billion transistors (2015)
  ● Transistor switching speed of over 4 billion/sec (4 GHz)
  ● 14 nm fabrication process, and getting smaller
  ● 10 nm scale projected for 2017: ~43 Si atoms!

+ The Good News!
● It is possible to understand the architectural components without knowing all of the low-level details.
● Programmers only need to know the essentials of major components:
  ● Characteristics
  ● Role in overall system
  ● Consequences for programmers

+ Why Study Computer Architecture?
● Understand where computers are going
  ● Future capabilities drive the (computing) world
  ● Real-world impact: no computer architecture, no computers!
● Understand high-level design concepts
  ● The best architects understand all the levels
  ● Devices, circuits, architecture, compiler, applications
● Write better software
  ● The best software designers also understand hardware
  ● Need to understand hardware to write fast software
● Design hardware
  ● Intel, AMD, IBM, ARM, Qualcomm, NVIDIA, Samsung

+ Course Goals
● See the big ideas in computer architecture
  ● Pipelining, parallelism, caching, abstraction, …
● Exposure to examples of good (and some bad) engineering
● Get exposure to research and cutting-edge ideas
  ● Read some research papers

+ Coursework
● Research Paper (4 drafts throughout the semester)
  ● In-depth research on an architecture-related topic of your choice
  ● Oral presentation of your findings
● Homework assignments (~8 throughout semester)
  ● Short answer questions related to topics covered in class
  ● 48-hour “grace” periods
    ● Hand in late, no questions asked
  ● No assignments accepted after solutions posted
● 3 Exams
  ● In class
  ● Typically 5 questions, equally weighted. Answer 4 out of 5
  ● Cumulative final

+ Grading
● Paper reviews: 50%
  ● Drafts: 10% each, x4
  ● Final presentation: 10%
● Homework assignments: 15%
● Exams: 35%
  ● In-class: 20%
  ● Final: 15%
● Smiling: 5%

+ Computer Architecture Design Goals & Constraints
● Functional
  ● Needs to be correct
    ● And unlike software, difficult to update once deployed
  ● What functions should it support
● Reliable
  ● Does it continue to perform correctly?
  ● Hard fault vs transient fault
  ● Space satellites vs desktop vs server
● High performance
  ● Not just “Gigahertz” – 2.6GHz ARM vs. 2GHz Intel Xeon
  ● Impossible goal: fastest possible design for all programs

+ Design Goals & Constraints
● Low cost
  ● Per unit manufacturing cost (wafer cost)
  ● Cost of making first chip after design (mask cost)
  ● Design cost
● Low power/energy
  ● Energy in (battery life, cost of electricity)
  ● Energy out (cooling and related costs)
  ● Cyclic problem, very much a problem today
● Challenge: balancing the relative importance of these goals
  ● And the balance is constantly changing
● Our focus: performance, only touch on cost, power, reliability

+ Shaping Force: Applications/Domains
● Another shaping force: applications (usage and context)
● Scientific: weather prediction, genome sequencing
  ● First computing application domain: naval ballistics firing tables
  ● Need: large memory, heavy-duty floating point
  ● Examples: CRAY T3E, IBM BlueGene, Intel Xeon Phi, GPUs
● Commercial: database/web serving, e-commerce, Google
  ● Need: data movement, high memory + I/O bandwidth
  ● Examples: Sun Enterprise Server, AMD Opteron, Intel Xeon

+ More Recent Applications/Domains
● Desktop: home office, multimedia, games
  ● Need: increasing memory bandwidth, computational performance, integrated graphics/network?
  ● Examples: Intel Core i*, AMD Athlon
● Mobile: laptops, tablets, phones
  ● Need: low power, computational performance, integrated wireless
  ● Laptops: Intel Core i*, Atom, AMD APUs
  ● Smaller devices: ARM chips by Samsung, Qualcomm, Apple
● Embedded: microcontrollers in automobiles, door knobs, robotics
  ● Need: low power, low cost
  ● Examples: ARM chips, dedicated digital signal processors (DSPs)
  ● 10 billion ARM chips sold in 2013
● Deeply Embedded: disposable “smart dust” sensors
  ● Need: extremely low power, extremely low cost

+ Application Specific Designs
● This class is about general-purpose CPUs
  ● Processor that can do anything, run a full OS, etc.
  ● E.g., Intel Core i7, AMD Athlon, IBM Power, ARM, Intel Xeon
● In contrast to application-specific chips
  ● Examples: video encoding, 3D graphics
● General rules
  - Hardware is less flexible than software
  + Hardware more effective (speed, power, cost) than software
  + Domain specific more “parallel” than general purpose
    • But general mainstream processors becoming more parallel

+ Technology Trends
● Moore’s Law
  ● Continued (so far) transistor miniaturization
  ● Number of transistors in an integrated circuit has doubled approximately every 18 months
● Some technology-based ramifications
  ● Annual improvements in density, speed, power, costs
  ● SRAM/logic: density: ~30%, speed: ~20%
  ● DRAM: density: ~60%, speed: ~4%
  ● Disk: density: ~60%, speed: ~10% (non-transistor)
  ● Big improvements in flash memory and network bandwidth, too
● Changing quickly and with respect to each other!!
  ● Example: density increases faster than speed
  ● Re-evaluate/re-design for each technology generation

+ Revolution I: The Microprocessor
● Microprocessor revolution
  ● One significant technology threshold was crossed in the 1970s
  ● Enough transistors (~25K) to put a 16-bit processor on one chip
  ● Huge performance advantages: fewer slow chip-crossings
● Microprocessors have allowed new market segments
  ● Desktops, CD/DVD players, laptops, game consoles, set-top boxes, mobile phones, digital cameras, mp3 players, GPS, automotive
● And replaced incumbents in existing segments
  ● Microprocessor-based systems replaced supercomputers, “mainframes”, “minicomputers”, etc.
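
The annual improvement rates on the Technology Trends slide compound, which is why the technologies drift apart "with respect to each other". The following is a minimal Python sketch of that arithmetic. It assumes the approximate rates from the slide stay constant every year (an idealization), and the helper names `compound_growth` and `doublings_factor` are made up for illustration.

```python
# Approximate annual improvement rates taken from the Technology Trends slide.
# Assumption: each rate stays constant from year to year (an idealization).
ANNUAL_RATES = {
    "SRAM/logic": {"density": 0.30, "speed": 0.20},
    "DRAM": {"density": 0.60, "speed": 0.04},
    "Disk": {"density": 0.60, "speed": 0.10},
}


def compound_growth(rate, years):
    """Total improvement factor after `years` of growth at `rate` per year."""
    return (1.0 + rate) ** years


def doublings_factor(months, doubling_period_months=18):
    """Growth factor for a quantity that doubles every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    years = 10
    for tech, rates in ANNUAL_RATES.items():
        density = compound_growth(rates["density"], years)
        speed = compound_growth(rates["speed"], years)
        print(f"{tech}: density x{density:.1f}, speed x{speed:.1f} after {years} years")
    # Moore's Law as stated on the slide: doubling roughly every 18 months.
    print(f"Transistor count after {years} years: x{doublings_factor(12 * years):.0f}")
```

Under these assumptions, ten years of DRAM improvement gives roughly 110x the density but only about 1.5x the speed, which illustrates the slide's point that designs must be re-evaluated for each technology generation.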

+ First Microprocessor
● Intel 4004 (1971)
  ● Application: calculators
  ● Technology: 10000 nm
  ● 2300 transistors
  ● 13 mm²
  ● 108 kHz
  ● 12 Volts
  ● 4-bit data

+ Pinnacle of Single-Core Microprocessors
● Intel Pentium4 (2003)
  ● Application: desktop/server
  ● Technology: 90 nm (1/100th of 4004)
  ● 55M transistors (20,000x)
  ● 101 mm² (10x)
  ● 3.4 GHz (~30,000x)
  ● 1.2 Volts (1/10th)
  ● 32/64-bit data (16x)
  ● 22-stage pipelined datapath
  ● Two levels of on-chip cache
  ● Hyperthreading

+ Modern Multicore Processor
● Intel Core i7 (2013)
  ● Application: desktop/server
  ● Technology: 22 nm (25% of P4)
  ● 1.4B transistors
  ● 177 mm²
  ● 3.5 GHz to 3.9 GHz
  ● 1.8 Volts
  ● 256-bit data
  ● 14-stage pipelined datapath
  ● 4 instructions per cycle
  ● Three levels of on-chip cache
  ● Hyperthreading
  ● Four-core multicore (4x)
  ● ???

+ Tracing the Microprocessor Revolution
● How were growing transistor counts used?
● Initially to widen the datapath
  ● 4004: 4 bits
  ● Pentium4: 64 bits
● … and also to add more powerful instructions
  ● To reduce overhead of fetch and decode
  ● To simplify assembly programming (which was done by hand then)

+ IBM System 370 Architecture
● IBM System/370 architecture
  ● Was introduced in 1970
  ● Included a number of models
  ● Could upgrade to a more expensive, faster model without having to abandon original software
  ● New models are introduced with improved technology, but retain the same architecture so that the customer’s software investment is protected
  ● Architecture has survived to this day as the architecture of IBM’s mainframe product line
● 370 Architecture still in use by IBM today!
  ● IBM zSeries
  ● IBM zEnterprise Series

+ Intel x86
● Backward compatible instruction set architectures
● 1978 – Introduced 8086 (the 8088 followed in 1979), 16-bit
  ● 80186, 80286
● 1985 – 80386, 32-bit
  ● 80486, Pentium, Pentium MMX
● 1995 – Pentium Pro
  ● Pentium II, Pentium III
● 2000 – Pentium 4
● 2006 – Core 2, 64-bit, multi-core
● 2008 – Core i3/i5/i7
  ● Atom

+ Function
A computer can perform four basic functions:
● Data processing
● Data storage
● Data movement
● Control

+ The Computer
There are four main structural components of the computer:
● CPU – controls the operation of the computer and performs its data processing functions
● Main Memory – stores data
● I/O – moves data between the computer and its external environment
● System Interconnection – some mechanism that provides for communication among CPU, main memory, and I/O

+ CPU
Major structural components:
● Control Unit
  ● Controls the operation of the CPU and hence the computer
● Arithmetic and Logic Unit (ALU)
  ● Performs the computer’s data processing function
● Registers
  ● Provide storage internal to the CPU
● CPU Interconnection
  ● Mechanisms that provide communication among the control unit, ALU, and registers

+ Internet Resources - Web site for book
● http://WilliamStallings.com/COA/COA10e.html
  ● Links to sites of interest
  ● Links to sites for courses that use the book
● http://WilliamStallings.com/StudentSupport.html
  ● Math
  ● How-to
  ● Research resources
  ● Misc

+ First Assignment - in class
● Determine your term project topic
● Talk with others in the class about good ideas for a research topic
● Remember, all projects are to be completed individually
● No overlapping of focus in any specific research area
● Record your name and research topic/focus on the project sign-up sheet
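
Returning to the CPU slide: the way the control unit, ALU, registers, and main memory work together can be made concrete with a toy simulator. This is only a minimal sketch. The instruction set (LOAD, STORE, ADD, SUB, HALT), the register names, and the functions `alu` and `run` are all made up for illustration and are not modeled on any real processor.

```python
def alu(op, a, b):
    """ALU: performs the data processing function on two operands."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    raise ValueError(f"unknown ALU operation: {op}")


def run(program, memory):
    """Control unit: fetch, decode, and execute instructions until HALT.

    `program` is a list of instruction tuples (hypothetical format);
    `memory` is a list of ints standing in for main memory.
    Returns the final register contents.
    """
    registers = {"R0": 0, "R1": 0}  # storage internal to the CPU
    pc = 0                          # program counter
    while True:
        instr = program[pc]         # fetch
        pc += 1
        opcode = instr[0]           # decode
        if opcode == "LOAD":        # LOAD reg, addr: main memory -> register
            _, reg, addr = instr
            registers[reg] = memory[addr]
        elif opcode == "STORE":     # STORE reg, addr: register -> main memory
            _, reg, addr = instr
            memory[addr] = registers[reg]
        elif opcode in ("ADD", "SUB"):  # OP dst, src: dst = dst OP src
            _, dst, src = instr
            registers[dst] = alu(opcode, registers[dst], registers[src])
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


if __name__ == "__main__":
    memory = [7, 5, 0]
    program = [
        ("LOAD", "R0", 0),
        ("LOAD", "R1", 1),
        ("ADD", "R0", "R1"),
        ("STORE", "R0", 2),
        ("HALT",),
    ]
    print(run(program, memory), memory)
```

The dictionary lookups and list indexing stand in for the CPU interconnection and system interconnection; a real design moves these values over buses.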