keynote presentation

Architectural Musings
Rethinking Computer Systems Architecture
Christopher Vick
cvick@qualcomm.com
June 3, 2012
Introduction
 Vision Talk
 Mobile computing and current technologies fundamentally
change key parameters and constraints for computer
system architecture
 Vast new opportunities for research of great interest to
and great relevance for industry
Outline
 Computer System Architecture
 Then (Circa 1970)
 Scarce Resources & Bottlenecks
 Optimizations
 Now (Mobile Computing Platforms)
 Scarce Resources & Bottlenecks
 Optimizations?
 Qualcomm Research
 Questions?
COMPUTER SYSTEM ARCHITECTURE
Computer System Architecture
 Hardware
 The 5 classic components (Patterson & Hennessy)
 Input, Output, Memory, Datapath, Control
 Software
 System Virtual Machine (Hypervisor, VM, or VMM)
 Operating System
 Compilers & Tools
 Definitions
 The way components fit together
 The arrangement of the various devices in a complete computer system or
network
   The instruction set plus a model of the execution of the instruction set
(Amdahl et al.)
 Computer System Architecture
 The selection and combination of hardware and software components to
assemble an effective computer system
Combination
  Software
   Application Programs
   Libraries
   Operating System (Drivers, Memory Manager, Scheduler)
   Hypercall Interface
   Virtual Machine
  Hardware
   Multicore Execution Unit
   Interconnect
   IO Devices
   Memory
Effective
 An optimization problem
 Many variables
 Selection of hardware/software components
 Selection of interfaces/interconnects
 Many constraints
 Physical, sociological, technical & cost constraints
 Scarce Resources and Bottlenecks
 Maximize utilization of scarce resources
 Minimize impact of bottlenecks
THEN (CIRCA 1970)
Scarce Resources
 CPU Cycles
 CPUs expensive
 Slow clock rates
 Memory Locations
 Random Access Memory expensive
 Address/Data paths into CPU expensive
 Skilled Programmers
 Relatively new discipline
 Poor language and tools support
Bottlenecks
 Programmer Productivity
 Software development slow and expensive
 Low level programming paradigms
 Memory Latency
 RAM latency gated overall speed (~2-3 MHz)
 Small RAM backed by vastly slower storage
 I/O Bandwidth
 Limited CPU connectivity
 Crude communication mechanisms
Optimizations
 Time Sharing
   Effective sharing of a limited resource
  Virtual Memory
   Effective sharing, and backing with a cheaper alternative
  Hardware Improvements
   Smaller features provide more resources and faster clocks
 Large Scale Integration
 Better signaling to improve bandwidth
 High Level Programming Languages
 Broadens productive programmer community
 Abstracts away some hardware complexity
Examples
  Digital PDP-11
   16-bit address space
   Orthogonal instruction set
   Memory mapped I/O
   Unix, DOS, many others
  IBM System/370
   24-bit address space
   Virtual Memory
   MVS, VM/370, DOS/VS
   Backward compatibility with System/360
NOW (MOBILE COMPUTING)
Scarce Resources
  Energy
   Fixed energy budget for mobile devices
   Thermal issues at all scales
   Tradeoff between performance and energy
   Process shrinks no longer significantly improve energy consumption
 Memory Bandwidth
 Providing bandwidth is expensive
 Memory interconnect consumes significant energy
Bottlenecks
 Memory Latency
 Increasing gap between CPU speed and DRAM latency
 Physical distance to DRAM devices a factor
 Concurrency
   Shortage of programmers skilled in concurrent programming
 Inadequate language/tools support
 I/O Bandwidth/Latency
 Wireless bandwidth lower than wired
 Consumes large amounts of energy
Example
  HTC One
   Processor: 1.5 GHz Dual Core Qualcomm MSM8960
   OS: Android™ 4.0 (ICS)
   Memory RAM: 1 GB DDR2
   Memory Storage: 16 GB onboard storage
   Display: 4.7" HD super LCD 1280 x 720
   Network:
    LTE: 700/AWS (CAT3, DL 100 / UL 50)
    WCDMA: 2100/1900/AWS/850
    EDGE: 850/900/1800/1900
   Battery: 1800 mAh
   Camera (Main): 8 MP, f/2.0, BSI, 1080p HD Video
    (Front): 1.3 MP with 720p video
   Dimensions: 134.8 x 69.9 x 8.9 mm
  This is a General Purpose Computer!
Optimizations?
 Multi-core
 Aggressive addition of cores and threads
 Hardware concurrency outstripping software
 New Concurrent Programming Models/Tools?
 Memory Subsystem
 Significant contributor to total energy consumption
 Adding bandwidth is expensive
 New technologies addressing some energy issues
   Wireless bandwidth enhancements (LTE Advanced, etc.)
 Solutions from desktop/server or embedded worlds
may not directly apply in mobile space!
Memory System Energy
 Retaining data (one second)
 DRAM: ~1-10 pJ/bit self-refresh
 SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]
 4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC ‘08]
   Flash, PCM, STT RAM…: Zero!
  Moving Data
   32-bit value:
    Recompute: 60 pJ (Razor)
    Send 1 mm: 10 pJ
    Retain in cache for 1 ms: 38 pJ
    Retain in DRAM for 1 second: 32+ pJ
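The per-value figures above can be cross-checked against the per-bit retention numbers with simple arithmetic. A minimal Python sketch, assuming retention energy scales linearly with bit count and time (an assumption made here for illustration) and using the low end of the DRAM range; all function and constant names are hypothetical:

```python
# Illustrative arithmetic only. Figures are the slide's numbers; the linear
# scaling with bits and time is an assumption of this sketch.
SRAM_RETAIN_PJ_PER_BIT_S = 1200.0  # SRAM retention, pJ per bit per second
DRAM_RETAIN_PJ_PER_BIT_S = 1.0     # low end of the ~1-10 pJ/bit DRAM range
RECOMPUTE_PJ = 60.0                # energy to recompute one 32-bit value
BITS = 32


def retain_pj(pj_per_bit_s, bits, seconds):
    """Energy (pJ) to hold `bits` bits for `seconds` seconds."""
    return pj_per_bit_s * bits * seconds


def break_even_seconds(pj_per_bit_s, bits, alternative_pj):
    """Retention time after which the alternative becomes cheaper."""
    return alternative_pj / (pj_per_bit_s * bits)


cache_1ms_pj = retain_pj(SRAM_RETAIN_PJ_PER_BIT_S, BITS, 1e-3)
dram_1s_pj = retain_pj(DRAM_RETAIN_PJ_PER_BIT_S, BITS, 1.0)
sram_break_even_s = break_even_seconds(
    SRAM_RETAIN_PJ_PER_BIT_S, BITS, RECOMPUTE_PJ
)

print(f"SRAM, 1 ms: {cache_1ms_pj:.1f} pJ")
print(f"DRAM, 1 s:  {dram_1s_pj:.1f} pJ")
print(f"SRAM vs recompute break-even: {sram_break_even_s * 1e3:.2f} ms")
```

Under these assumptions, holding a 32-bit value in SRAM costs more than recomputing it after roughly 1.6 ms, which is the intuition behind "retain less".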
Reducing Memory System Energy
 Move less!
 Caches physically close to CPU
 Locality, locality, locality (the first rule of chip real estate)
 Retain less!
   Power off unused cache lines [Kaxiras et al., ISCA ‘01]
 “Drowsy” caches [Flautner et al., ISCA ‘02]
 … with compiler analysis
[Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]
 Don’t refresh unused DRAM
 … e.g. with garbage collection [Chen et al., CODES+ISSS ‘03]
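The "retain less" idea can be illustrated with a toy model in which a cache line is powered off once it has been idle for longer than a fixed interval. The Python sketch below is a simplified illustration only, not the mechanism from any of the cited papers; the trace format, the `decay_interval` parameter, and the function name are assumptions made for this example:

```python
def simulate_decay(accesses, decay_interval, end_time):
    """Toy model of powering off idle cache lines.

    accesses: list of (time, line) pairs sorted by time.
    Returns (on_time, decay_misses): total line-cycles spent powered on,
    and the number of re-accesses that found their line powered off.
    """
    last_access = {}  # line -> time of its most recent access
    on_time = 0
    decay_misses = 0
    for t, line in accesses:
        if line in last_access:
            idle = t - last_access[line]
            if idle > decay_interval:
                on_time += decay_interval  # line stayed on only until it decayed
                decay_misses += 1          # contents were lost; must refetch
            else:
                on_time += idle
        last_access[line] = t
    # Account for the tail between each line's last access and end_time.
    for t in last_access.values():
        on_time += min(end_time - t, decay_interval)
    return on_time, decay_misses


trace = [(0, 0), (1, 1), (5, 0), (100, 1), (101, 0)]
with_decay = simulate_decay(trace, decay_interval=10, end_time=120)
always_on = simulate_decay(trace, decay_interval=10**9, end_time=120)
print(with_decay, always_on)
```

On this trace the decay policy cuts powered-on line time from 239 to 45 line-cycles at the cost of two extra misses, which is the tradeoff such schemes have to tune.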
Extending the Memory Model
 Maintaining the illusion of a single flat memory address
space is too expensive
 On-chip caches can be major consumers of area and energy
 Coherence protocols are expensive and difficult to scale
• Alternative: software-managed memory hierarchies
– Tightly-coupled memory (TCM), scratchpads
– Do not require tag memory, address comparison logic
– More area- and energy-efficient
– Help bridge gap between bandwidth and throughput
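To make the tag-memory point concrete, the sketch below estimates the tag storage and comparators a set-associative cache carries that a scratchpad of the same capacity does not. The cache parameters (32 KiB, 64-byte lines, 4-way, 32-bit addresses, one valid bit per line) are assumptions chosen for illustration, not figures from the talk:

```python
def tag_overhead(cache_bytes=32 * 1024, line_bytes=64, ways=4, addr_bits=32):
    """Estimate tag storage for a set-associative cache.

    All sizes are assumed to be powers of two.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the number of sets
    tag_bits = addr_bits - index_bits - offset_bits
    tag_store_bits = lines * (tag_bits + 1)    # +1 for a valid bit per line
    return {
        "tag_bits": tag_bits,
        "tag_store_bits": tag_store_bits,
        "overhead_fraction": tag_store_bits / (cache_bytes * 8),
        "comparators_per_access": ways,
    }


cache = tag_overhead()
print(cache)
# A scratchpad is addressed directly, so it needs no tag store and no
# tag comparators for the same data capacity.
```

With these assumed parameters the tag store adds about 4% extra bits, and every access performs four tag comparisons; a scratchpad avoids both.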
New Challenges and Opportunities
 Different programming paradigm: software explicitly
orchestrates all transfers between on-chip and off-chip
memory areas
  Major implications for memory management
 Scratchpad allocation strategies
 Data partitioning strategies
 Dynamic relocation between scratchpad and DRAM to track the
program’s locality characteristics
 Opportunities for compile-time and runtime optimization
 Challenges in both Hardware and Software!
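The programming paradigm described above, where software explicitly moves data between off-chip memory and a scratchpad, can be sketched as a tiled loop. This Python model is only an illustration: the lists standing in for DRAM and the scratchpad, the slice assignment standing in for a DMA transfer, and the function name are all hypothetical:

```python
def scratchpad_sum(dram, scratchpad_words):
    """Sum `dram` by staging it through a fixed-size scratchpad.

    Returns (total, words_moved), where words_moved counts the words
    explicitly copied from off-chip memory into the scratchpad.
    """
    scratchpad = [0] * scratchpad_words
    total = 0
    words_moved = 0
    for base in range(0, len(dram), scratchpad_words):
        tile_len = min(scratchpad_words, len(dram) - base)
        # Explicit transfer: software, not a cache controller, decides
        # what is resident on chip and when it is brought in.
        scratchpad[:tile_len] = dram[base:base + tile_len]
        words_moved += tile_len
        # Compute only on data that is resident in the scratchpad.
        for i in range(tile_len):
            total += scratchpad[i]
    return total, words_moved


total, moved = scratchpad_sum(list(range(10)), scratchpad_words=4)
print(total, moved)
```

The allocation and partitioning strategies listed above amount to choosing the tile size and which data to stage; a compiler or runtime could make those choices instead of the programmer.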
Qualcomm Research
Excellence in Wireless
MAY | 2012
WWW.QUALCOMM.COM/RESEARCH
State of the Art Capabilities Fostering Innovation
Human Resources
• 30% of engineers with PhD, 50% Masters
• Systems, HW, SW, Standards, Test Engineering
• Ventures, Bus Dev, Technical Marketing, Program Mgmt.
Complete Development Labs
• Prototype Development Facilities
• CPU Simulation Clusters
• Outdoor Field Systems
• Antenna Ranges
Global Research and Development Organization
UNITED STATES
• San Diego, CA
• Santa Clara, CA
• Bridgewater, NJ
EUROPE
• Cambridge, UK
• Nuremberg, Germany
• Vienna, Austria
ASIA
• Beijing, China
• Bangalore and Hyderabad, India
• Seoul, S. Korea
Qualcomm Research & University Relations
ACADEMIC COLLABORATION TO FOSTER ADVANCED RESEARCH
Ongoing relations with more than 30 US and 25 International Universities
 Current funding includes MIT, UC Berkeley, Stanford, UCSD, UT Austin, ASU,
UIUC, Univ. of Michigan, EPFL, IISc Bangalore, KAIST, Tsinghua
Research collaboration spans variety of technical areas
 Computer vision, multicore processing, context aware computing, machine
learning, low power devices, wireless networks and signal processing, etc.
Qualcomm Innovation Fellowship (QInF) invests in innovative ideas
 Close interactions between Qualcomm Research engineers, graduate students and
professors
Qualcomm Research For The Wireless Future
TAKE WWAN TO THE NEXT LEVEL: IMPROVING WWAN TECHNOLOGY
INNOVATE BEYOND WAN: EXCELLING IN ALL FORMS OF WIRELESS
ENABLE SMART APPLICATIONS: TRANSFORMING THE MOBILE USER EXPERIENCE
BREAKTHROUGH PERFORMANCE: RE-ARCHITECTING NEXT-GEN MOBILE DEVICES
Innovate Beyond WAN
WIRELESS LOCAL AREA
PEANUT
• Next gen short range ultra-low power radio
WIFI ADVANCED
• Multi Gbps WLAN using 5 GHz and 60 GHz band
• Next Gen low-power WiFi for Internet of Things
LTE D2D (FLASHLINQ)
• Proximal Wireless
• First Gen device-to-device wireless network
• Autonomous discovery
• Direct communications
INNAV
• Indoor positioning for indoor location based applications
• Map tools for Mobile Devices
Enable Smart Applications
ELEVATE THE WIRELESS USER EXPERIENCE
AUGMENTED REALITY
• Mobile user interface
• Computer vision for mobile devices
LOOK
• Multiple language text detection and recognition
• With Mobile phone camera view finder
LISTEN
• Background Audio processing
• Augmented user experience
DASH
• Efficient video delivery over HTTP for mobile devices
AWARE
• Build awareness in mobile devices
• For enhanced daily life situations
Breakthrough Device Performance
RE-ARCHITECTING NEXT-GEN DEVICES
ADVANCED RADIO TECHNOLOGIES
• New RF front-end and baseband technologies
• Advanced mobile device SW platforms
• RF/antenna and systems/protocol techniques
• Improved user experience
• Concurrent multi-radio operation
MANTICORE
GRYPHON
• Virtual machine design for SoC architecture
• Enabling higher power efficiency
Thank You