Architectural Musings
Rethinking Computer Systems Architecture
Christopher Vick
cvick@qualcomm.com
June 3, 2012

Introduction
• Vision Talk
  – Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture
  – Vast new opportunities for research of great interest to and great relevance for industry

Outline
• Computer System Architecture
• Then (Circa 1970)
  – Scarce Resources & Bottlenecks
  – Optimizations
• Now (Mobile Computing Platforms)
  – Scarce Resources & Bottlenecks
  – Optimizations?
• Qualcomm Research
• Questions?

COMPUTER SYSTEM ARCHITECTURE

Computer System Architecture
• Hardware
  – The 5 classic components (Patterson & Hennessy): Input, Output, Memory, Datapath, Control
• Software
  – System Virtual Machine (Hypervisor, VM, or VMM)
  – Operating System
  – Compilers & Tools
• Definitions
  – The way components fit together
  – The arrangement of the various devices in a complete computer system or network
  – The instruction set plus a model of the execution of the instruction set (Amdahl et al)
• Computer System Architecture
  – The selection and combination of hardware and software components to assemble an effective computer system

Combination
[Figure: layered system stack. Software: Application Programs, Libraries, Operating System, Drivers, Memory Manager, Scheduler, Hypercall Interface, Virtual Machine. Hardware: Multicore Execution Unit, Interconnect, IO Devices, Memory.]

Effective
• An optimization problem
• Many variables
  – Selection of hardware/software components
  – Selection of interfaces/interconnects
• Many constraints
  – Physical, sociological, technical & cost constraints
• Scarce Resources and Bottlenecks
  – Maximize utilization of scarce resources
  – Minimize impact of bottlenecks

THEN (CIRCA 1970)

Scarce Resources
• CPU Cycles
  – CPUs expensive
  – Slow clock rates
• Memory Locations
  – Random Access Memory expensive
  – Address/Data paths into CPU expensive
• Skilled Programmers
  – Relatively new discipline
  – Poor language and tools support

Bottlenecks
• Programmer Productivity
  – Software development
    slow and expensive
  – Low level programming paradigms
• Memory Latency
  – RAM latency gated overall speed (~2-3 MHz)
  – Small RAM backed by vastly slower storage
• I/O Bandwidth
  – Limited CPU connectivity
  – Crude communication mechanisms

Optimizations
• Time Sharing
  – Effective sharing of limited resource
• Virtual Memory
  – Effective sharing, and backing with cheaper alternative
• Hardware Improvements
  – Smaller features provide more resource and faster clock
  – Large Scale Integration
  – Better signaling to improve bandwidth
• High Level Programming Languages
  – Broadens productive programmer community
  – Abstracts away some hardware complexity

Examples
• Digital PDP-11
  – 16-bit address space
  – Orthogonal instruction set
  – Memory mapped I/O
  – Unix, DOS, many others
• IBM System/370
  – 24-bit address space
  – Virtual Memory
  – MVS, VM/370, DOS/VS
  – Backward compatibility with System/360

NOW (MOBILE COMPUTING)

Scarce Resources
• Energy
  – Fixed Energy Budget for mobile devices
  – Thermal issues at all scales
  – Tradeoff between performance and energy
  – Shrinks no longer significantly improving consumption
• Memory Bandwidth
  – Providing bandwidth is expensive
  – Memory interconnect consumes significant energy

Bottlenecks
• Memory Latency
  – Increasing gap between CPU speed and DRAM latency
  – Physical distance to DRAM devices a factor
• Concurrency
  – Shortage of programmers who can handle this
  – Inadequate language/tools support
• I/O Bandwidth/Latency
  – Wireless bandwidth lower than wired
  – Consumes large amounts of energy

Example
HTC One
• Processor: 1.5 GHz Dual Core Qualcomm MSM8960
• OS: Android™ 4.0 (ICS)
• Memory
  – RAM: 1 GB DDR2 Memory
  – Storage: 16 GB onboard storage
• Display: 4.7" HD super LCD 1280 x 720
• Network
  – LTE CAT3 - DL 100 /UL 50
  – LTE: 700/AWS
  – WCDMA: 2100/1900/AWS/850
  – EDGE: 850/900/1800/1900
• Battery: 1800 mAh
• Camera
  – (Main): 8 MP, f/2.0, BSI, 1080p HD Video
  – (Front): 1.3 MP with 720p video
• Dimensions: 134.8 x 69.9 x 8.9mm
This is a General Purpose Computer!

Optimizations?
• Multi-core
  – Aggressive addition of cores and threads
  – Hardware concurrency outstripping software
  – New Concurrent Programming Models/Tools?
• Memory Subsystem
  – Significant contributor to total energy consumption
  – Adding bandwidth is expensive
  – New technologies addressing some energy issues
• Wireless bandwidth enhancements (LTE Advanced, etc.)
Solutions from desktop/server or embedded worlds may not directly apply in mobile space!

Memory System Energy
• Retaining data (one second)
  – DRAM: ~1-10 pJ/bit self-refresh
  – SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]; 4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC ’08]
  – Flash, PCM, STT RAM…: Zero!
• Moving Data, 32-bit value:
  – Recompute: 60 pJ (Razor)
  – Send 1mm: 10 pJ
  – Retain in cache for 1 ms: 38 pJ
  – Retain in DRAM for 1 second: 32+ pJ

Reducing Memory System Energy
• Move less!
  – Caches physically close to CPU
  – Locality, locality, locality (the first rule of chip real estate)
• Retain less!
  – Power off unused cache lines [Kaxiras et al., ISCA ’01]
  – “Drowsy” caches [Flautner et al., ISCA ’02]
  – … with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]
• Don’t refresh unused DRAM
  – … e.g.
    with garbage collection [Chen et al., CODES+ISSS ’03]

Extending the Memory Model
• Maintaining the illusion of a single flat memory address space is too expensive
  – On-chip caches can be major consumers of area and energy
  – Coherence protocols are expensive and difficult to scale
• Alternative: software-managed memory hierarchies
  – Tightly-coupled memory (TCM), scratchpads
  – Do not require tag memory, address comparison logic
  – More area- and energy-efficient
  – Help bridge gap between bandwidth and throughput

New Challenges and Opportunities
• Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas
• Major implications on memory management
  – Scratchpad allocation strategies
  – Data partitioning strategies
  – Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics
• Opportunities for compile-time and runtime optimization
Challenges in both Hardware and Software!

Qualcomm Research
Excellence in Wireless
MAY | 2012
WWW.QUALCOMM.COM/RESEARCH

State of the Art Capabilities Fostering Innovation
• Human Resources
  – 30% of engineers with PhD, 50% Masters
  – Systems, HW, SW, Standards, Test Engineering
  – Ventures, Bus Dev, Technical Marketing, Program Mgmt.
• Complete Development Labs
  – Prototype Development Facilities
  – CPU Simulation Clusters
  – Outdoor Field Systems
  – Antenna Ranges

Global Research and Development Organization
• UNITED STATES
  – San Diego, CA
  – Santa Clara, CA
  – Bridgewater, NJ
• EUROPE
  – Cambridge, UK
  – Nuremberg, Germany
  – Vienna, Austria
• ASIA
  – Beijing, China
  – Bangalore and Hyderabad, India
  – Seoul, S. Korea

Qualcomm Research & University Relations
ACADEMIC COLLABORATION TO FOSTER ADVANCED RESEARCH
• Ongoing relations with more than 30 US and 25 International Universities
  – Current funding includes MIT, UC Berkeley, Stanford, UCSD, UT Austin, ASU, UIUC, Univ.
    of Michigan, EPFL, IISc Bangalore, KAIST, Tsinghua
• Research collaboration spans a variety of technical areas
  – Computer vision, multicore processing, context aware computing, machine learning, low power devices, wireless networks and signal processing, etc.
• Qualcomm Innovation Fellowship (QInF) invests in innovative ideas
• Close interactions between Qualcomm Research engineers, graduate students and professors

Qualcomm Research For The Wireless Future
• TAKE WWAN TO THE NEXT LEVEL
  – IMPROVING WWAN TECHNOLOGY
• INNOVATE BEYOND WAN
  – EXCELLING IN ALL FORMS OF WIRELESS
• ENABLE SMART APPLICATIONS
  – TRANSFORMING THE MOBILE USER EXPERIENCE
• BREAKTHROUGH PERFORMANCE
  – RE-ARCHITECTING NEXT-GEN MOBILE DEVICES

Innovate Beyond WAN
WIRELESS LOCAL AREA
• PEANUT
  – Next gen short range ultra-low power radio
• WIFI ADVANCED
  – Multi Gbps WLAN using 5 GHz and 60 GHz band
  – Next Gen low-power WiFi for Internet of Things
• LTE D2D (FLASHLINQ)
  – Proximal Wireless
  – First Gen device-to-device wireless network
  – Autonomous discovery
  – Direct communications
• INNAV
  – Indoor positioning for indoor location based applications
  – Map tools for Mobile Devices

Enable Smart Applications
ELEVATE THE WIRELESS USER EXPERIENCE
• AUGMENTED REALITY
  – Mobile user interface
  – Computer vision for mobile devices
• LOOK
  – Multiple language text detection and recognition
  – With mobile phone camera view finder
• LISTEN
  – Background audio processing
  – Augmented user experience
• DASH
  – Efficient video delivery over HTTP for mobile devices
• AWARE
  – Build awareness in mobile devices
  – For enhanced daily life situations

Breakthrough Device Performance
RE-ARCHITECTING NEXT-GEN DEVICES
• ADVANCED RADIO TECHNOLOGIES
  – New RF front-end and baseband technologies
  – Advanced mobile device SW platforms
  – RF/antenna and systems/protocol techniques
  – Improved user experience
  – Concurrent multi-radio operation
• MANTICORE, GRYPHON
  – Virtual machine design for SoC architecture
  – Enabling higher power efficiency

Thank You
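
Worked example for the Memory System Energy slide: the per-bit retention figures and the 32-bit "moving data" figures quoted there can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes retention energy scales linearly with time and with the number of bits, which the slide does not state explicitly, and the function and constant names are hypothetical.

```python
# Figures quoted on the "Memory System Energy" slide, in picojoules (pJ).
BITS = 32                    # size of the value being kept or moved
RECOMPUTE_PJ = 60.0          # recompute the value (Razor)
SEND_1MM_PJ = 10.0           # send the value 1 mm
SRAM_PJ_PER_BIT_S = 1200.0   # SRAM retention per bit per second (lower bound, "1200+")
DRAM_PJ_PER_BIT_S = 1.0      # DRAM self-refresh per bit per second (low end of "~1-10")


def retain_pj(pj_per_bit_second, seconds, bits=BITS):
    """Energy to hold `bits` bits for `seconds`.

    Assumption: cost is linear in both time and bit count.
    """
    return pj_per_bit_second * bits * seconds


def break_even_seconds(pj_per_bit_second, alternative_pj, bits=BITS):
    """Retention time at which holding the value costs as much as `alternative_pj`."""
    return alternative_pj / (pj_per_bit_second * bits)


# 1200 pJ/bit/s * 32 bits * 1 ms is about 38.4 pJ; the slide quotes 38 pJ.
cache_1ms = retain_pj(SRAM_PJ_PER_BIT_S, 1e-3)

# 1 pJ/bit/s * 32 bits * 1 s = 32 pJ; the slide quotes 32+ pJ.
dram_1s = retain_pj(DRAM_PJ_PER_BIT_S, 1.0)

# Under these assumptions, keeping the value in SRAM longer than this
# costs more than recomputing it (roughly 1.56 ms).
cache_vs_recompute = break_even_seconds(SRAM_PJ_PER_BIT_S, RECOMPUTE_PJ)

if __name__ == "__main__":
    print(f"cache, 1 ms: {cache_1ms:.1f} pJ")
    print(f"DRAM, 1 s:   {dram_1s:.1f} pJ")
    print(f"SRAM vs recompute break-even: {cache_vs_recompute * 1e3:.2f} ms")
```

Under the linear assumption the slide's two views agree with each other, and the break-even time gives a rough feel for the "Retain less!" advice: data that sits idle in SRAM for more than a millisecond or two can cost more to keep than to recompute.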