Advanced Processor Technologies group overview

APT mission
“To explore novel architectures and techniques that will enable the effective exploitation of the billion transistor chips of the near-future”

APT group
• Focus:
  – Moore’s Law will soon deliver billion transistor chips
  – how do we make best use of a billion transistors?
    • parallel processing
    • systems-on-chip
    • novel architectures
    • …?

Strategy/Vision
• Industry shift to multicore processors
  – directly addressed by our CMP work
• Power/heat is performance-limiting
  – asynchronous and low-power design have growing importance
• Timing closure is a critical problem
  – acceptance of mixed timing and GALS
• Design automation is vital
  – async automation must be competitive

Strategy/Vision
• Can university groups design state-of-the-art digital silicon?
  – probably not in conventional processors
  – few academic groups still fab digital chips
• Is trying to take designs through to fabrication still a good idea?
  – we believe so, because ‘reality’ matters!
  – but the game is very tough indeed

Many-core Architecture and Software
Mikel Lujan

Buying a single-core processor is difficult!
Multi-cores bring fundamental changes for Computer Science
[applications, programming languages, compilers, runtime systems (OS), computer architecture]

Active projects
• Managed Runtime Environments and Low-Power Many-core Architectures
  – DOME: Delaying and Overcoming Microprocessor Errors
• Teraflux
  – On the search for a “good” parallel computational model
• AXLE
  – Accelerating Analytics of Big Data

Managed Runtime Environments
• Java and .NET are examples of managed runtime environments (JVM, CLR)
• Key elements: JIT compilation and control of memory allocation
• Research opportunities:
  – Scaling MREs for many-core architectures (GPUs)
  – Hardware acceleration of MREs
  – Use MREs for low-power computing
  – Use MREs for dealing with faults and transistor wearout -> DOME

TeraFlux Project
• Major focus of current ‘General Purpose’ Many-Core research.
• Three major goals
  – To define the hardware architecture of a highly extensible, general purpose multicore system
  – To develop a simple-to-use parallel programming approach based on programming with
    • side-effect-free computations + transactions
  – How do we simulate/prototype many-core architectures?

Starting Assumptions
• Requiring strongly consistent shared memory is a major impediment to extensibility
• The efficient scheduling of control-flow based threads is hard
• The major complexity in parallel programming is the handling of shared state (locks etc.)

Simulate/Prototype many-core architectures
• Designing a chip is expensive and time consuming
• Computer architects build software models to simulate new architectures
• Simulation can be slow (months to run one application)
• How can we accelerate this process?
Research opportunities:
  – New modelling techniques
  – FPGA prototyping

AXLE & Big Data
• Collaboration with Dr. Gavin Brown (MLO group)
• The amount of data generated by scientific experiments and the social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data deluge?
  – New learning techniques capable of working at scale
  – Redesign architectures (clusters/data centres) and software for low-power analytics
  – Accelerate software (JIT adaptation) for data processing
  – Hardware acceleration for low-power learning algorithms

For more background info
• "Future Multi-core Computing" (COMP6062b)
  – Learn by directed reading and group discussions of research papers
  – Practice parallel programming in the labs
• Watch out for the organised ARM & Intel school seminars in Nov and Dec

Communication Architectures
Javier Navaridas

Interconnection Networks
• On-chip networks
  – Tile-based systems
  – Heterogeneous systems
• High performance computing networks
  – Massively Parallel Processing systems
  – Compute Clusters
  – Datacentres

Topics
• Topologies
  – Routing
  – Wiring
  – Fault resilience
  – Deadlock avoidance
• Router microarchitecture
  – Congestion control
  – Quality of Service
  – Fault tolerance
• Scheduling and resource management
  – Task placement
• System and workload modelling
  – Analytical modelling
  – Simulation

Virtualization
Alasdair Rawsthorne

Unifying System and Process Virtualization
[Figure: four software stacks compared side by side: Unvirtualized; System Virtualization (e.g. Xen, VMware, VirtualBox); Process Virtualization (e.g. JVM, Rosetta, DynamoRIO, Valgrind); Unified Virtualization. Layers shown include Application, Dynamic Runtime, Operating System, Hypervisor/VMM, Optimizing VMM and CPU.]
• Potential benefits: performance, power, design time, security
• Impacts design of future compilers, OS, CPU and runtimes
alasdair.rawsthorne@manchester.ac.uk

Neural Systems Engineering
Steve Furber, Jim Garside, Dave Lester

The SpiNNaker project
• Multi-core CPU node
  – 18 ARM968 processors
  – to model large-scale systems of spiking neurons
  – in biological real time
• Scalable up to systems with 10,000s of nodes
  – over a million
    processors
  – >10^8 MIPS total

Current status…
• Full 18-core chip: arrived 20 May 2011
• Test card: 4 chips, 72 processors
  – Cards can be linked together
• Neuron models: LIF, Izhikevich, MLP
• Synapse models: STDP, NMDA
• Networks: PyNN -> SpiNNaker, various small tools to build router tables, etc.
• 48-chip 10^3 machine (48 × 18 = 864 processors)
…and the next steps:
• 500-chip 10^4 machine (Q4 2012), 5,000-chip 10^5 machine (H1 2013), 50,000-chip 10^6 machine (H2 2013).

PhD projects
• Recent:
  – SpiNNaker monitoring
  – PyNN -> SpiNNaker
  – Real-time neural learning algorithms
  – Modelling the rat barrel cortex
  – Technology scaling on SpiNNaker
  – Error correction with CRC

Technology Scaling
• 90nm SpiNNaker CPU node
• SP library is faster
• requires 128k DTCM
• LL library better overall?
(work by Eustace Painkras, UoM PhD)

PyNN -> SpiNN
• LIF
• Izhikevich

PhD projects
• Future:
  – System software
    • run-time fault-tolerance, scaling, …
  – SpiNNaker2 architecture exploration
  – Neural network models
    • learning algorithms, rewiring
  – Robotics using SpiNNaker
  – Non-neural algorithms
    • graphics, physics modelling, …

Emerging Technologies for Integrated Circuits and Systems
Let’s do some hard(ware) work
Vasilis Pavlidis
www.cs.man.ac.uk/~pavlidiv

3-D Integration Opportunities
• Integrate disparate technologies/components
[Figure: 2-D global wire of 20 mm compared with 3-D global wire of 12 mm]
• The same total area for the two circuits
• R_TSV = 170 mΩ, C_TSV = 2 fF
• *RCs for 65 nm, delay improvement: 54%
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/

Three-Dimensional (3-D) Integrated Circuits and Systems
• Develop design methodologies for 3-D ICs
• New models are required to consider the third physical dimension
• Diverse technologies
  – SiP, interposer, TSVs
• Many challenges exist down the road!!!
  – Be the first to address them
• Opportunities to tape-out do exist!
  – CMP/Tezzaron - cmp.imag.fr
  – Xilinx FPGA Virtex 7
  – Cadence PDK - 3-D Encounter

A New Circuit Design Paradigm (Safe Projects)
• (Re-)Design and assess SpiNNaker-based 3-D architectures
  – Power, area, performance, cost/yield
  – Interposer and TSV technologies
• Research methodology
  – Use available resources
  – Differentiate only where required
• Other topics
  – Can resonance improve energy efficiency of GALS-based architectures?
  – Design for manufacturability for GALS systems (2-D/3-D)
    • Considering process, voltage, and temperature (PVT) variations
    • PVT behavior is substantially different in 3-D systems
• Develop/extend CAD tools for the physical design of 3-D systems
  – Special focus on interposer technologies

3-D Integration as a System Integration Approach (High-Return Projects)
• Heterogeneous 3-D integration
  – Preached a lot but not explored (at all)!
    • Memory on logic is a single application
• Develop techniques and methods for “Mix-and-Match” systems
  – How do you model…?
  – How do you evaluate…?
  – How do you integrate…?
  – How do you manufacture…?
• The physical proximity of diverse systems may not come for free!
Interdisciplinary research is a prerequisite for such systems
Rather application driven

PhD Guidelines
PhD is NOT an end in itself but a means to an end!
• Persistence, Persistence, Persistence!
• Manage rejection
• Be there early!
• Citations are worth more than publications
• Presentation and writing skills

Asynchronous Logic Design Tools
[Doug Edwards,] Jim Garside, Steve Furber, Alasdair Rawsthorne

Previous Projects
• Balsa
  – world-leading public asynchronous synthesis tool
  – used for complete microprocessors
• SEDATE
  – delay-insensitive datapath synthesis
• GALSA
  – framework for heterogeneous GALS
• ...

GAELS
• Globally Asynchronous Elastic Logic Synthesis
  – modern SoCs comprise numerous, semi-autonomous subsystems
  – shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
  – new, delay-tolerant paradigm
  – new project!

Reconfigurable Processing
Jim Garside

Current Computing
• Energy use is a problem
• Software
  – offers processing flexibility
  – highly inefficient
  – big overheads
• Hardware
  – limited programmability
  – greater efficiency
  – expensive to develop

A Solution?
• Compile an algorithm into a mixture of hardware and software
  – how to partition the 'code'?
  – dynamic adaptation
• Existing solutions tend towards static partitioning
  – require wide skills from developers
  – sacrifice potential flexibility
  – intolerant of differing hardware

Dynamic Reconfiguration
• Keep algorithm in common 'object' format
• Identify, 'compile' and run repeating sections in available hardware
• Adapt to facilities of any given chip
  – allow for future portability

To date ...
• Can identify critical loops and recompile them to hardware
  – using pre-existing code
• Developing tool flow
• Have reasonable reconfigurable hardware architecture
Results
• Promising
  – not 'earth shattering'

Future
• Want:
• Means of expressing algorithms allowing easy compilation into software or hardware
• Extract/exploit sensible parallelism
  – 'fine grain' for hardware
  – 'coarse grain' (?) for software
• Get (some of) the available speed/power efficiency

Mobile Systems Architecture
Nick Filer with help from Barry Cheetham

Nick Filer
• Interests:
  – Wireless networks of all types. Mainly:
    • Ad-hoc,
    • Voice over IP,
    • Sensors (data collection),
    • Pocket networks (e.g. mobile phones, PDAs),
    • Information dissemination.
  – Supported by:
    • Simulation, analysis, software generation tools.
  – eLearning tools for science.

Current Interest - 1
• Pocket Networks
  – Based on clusters of mobile users.
  – Person to person transport.
  – Which applications are useful, and when and how will they work?
    • Voice?
    • Video?
    • Delay tolerant text messages?

Current Interest - 2
• Low power Wireless Sensor Networks
  – Algorithms for reduced power usage, with low power achieved mainly by design.
  – Intelligent transport/routing protocols driving low power packet routing.
  – Smart dust:
    • Current cost $100+, needs to be cheaper.
    • Ultra-low power (NEW): processor, memory, design.
    • Nano scale. E.g. for use down oil wells!

Current Interest - 3
• Hand-over in mobile wireless networks.
  – Pretty much a solved problem (even if not always ideal) for mobile phones.
  – Close to solutions for WiFi, WiMAX, Bluetooth, Zigbee etc. Still lots to learn though.
  – Currently a 3-layer hierarchy: infrastructure, Wide Area, Personal Area.
  – What happens with more layers?
    • Macro scale to nano scale?
    • Fixed infrastructure interacting with mobile autonomous agents?
    • Just how inefficient are these mechanisms currently?

Current Interest - 4
• Information dissemination in mobile ad-hoc networks.
  – P2P technologies.
  – P2P optimization for task, availability, handover, low energy, access latency…
  – P2P to aid DNS-like queries (information retrieval) in mobile, changing-topology networks.
  – Delay tolerant P2P. Opportunistic communications, e.g. send 100,000 sensors down an oil well, get 1 back: what does it know? Its own data? Others' data?

Current Interest - 5 (joint with Barry Cheetham)
• Real time distributed systems (sound and video)
  – Internet choir
    • Very tight audio constraints (max 50 ms)
    • Demands of latency & bandwidth
  – Singing together
    • Less constrained than the internet choir, but synchronization is very difficult.
  – Broadcast simulcasts
    • Mixed video and sound from various locations.
    • Broadcast over multiple media types with different characteristics (delay etc.).
  – Major Obstacles:
    • Media types and standards, protocols, congestion, error handling, signal processing, links to hand-over problems ....
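
The 50 ms audio constraint quoted for the internet choir can be turned into a rough latency budget. The sketch below is only an illustration under stated assumptions: it treats 50 ms as a one-way budget, takes buffering as the only fixed delay, and uses the approximate figure of 200 km per millisecond for light in optical fibre. The buffer sizes and function names are hypothetical and do not come from the project.

```python
# Rough one-way latency budget for networked live audio (illustrative only).
# Assumption: signals in optical fibre cover roughly 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def buffer_delay_ms(frames: int, sample_rate_hz: int) -> float:
    """Time needed to fill an audio buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate_hz


def max_distance_km(budget_ms: float, fixed_delays_ms: list[float]) -> float:
    """Fibre distance reachable with the budget left after fixed delays."""
    remaining_ms = budget_ms - sum(fixed_delays_ms)
    return max(remaining_ms, 0.0) * FIBRE_KM_PER_MS


if __name__ == "__main__":
    # Hypothetical 480-frame capture and playback buffers at 48 kHz.
    capture = buffer_delay_ms(480, 48_000)
    playback = buffer_delay_ms(480, 48_000)
    print(max_distance_km(50.0, [capture, playback]))
```

Routing, queuing and codec delays are ignored here, so the real reachable distance would be smaller; the point is only that buffering alone consumes a large share of the budget.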

Current Interest - 6
• Support for adaptable network stacks
  – Writing or changing software is time consuming, error prone, …
  – Models can capture semantics of software: purpose, usage, transformation knowledge ...
  – Hence: use models to generate implementations.
    • Use in teaching/learning, simulation, network stack implementation.

Current Interest - 7 (joint with Barry Cheetham)
• eLearning for Complex Systems
  – Most eLearning tools you have seen are not much more than Content Management Systems.
  – There is currently little or no evidence they improve student grades!
  – We have on-going work looking at improving understanding of wireless systems.
  – Also interested in science teaching for awkward adolescents.

Arithmetic and Control Theory
Dave Lester

Arithmetic and Control Theory
• Exact Arithmetic
  – NASA/Boeing
• Correctness of Control Theory Applications
  – Airbus
• Formalisation and Mechanisation of Probabilistic Reasoning
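
As a minimal illustration of why exact arithmetic matters, the sketch below compares binary floating point with exact rational arithmetic. It uses Python's standard `fractions` module purely as a stand-in. The slides do not describe the group's own techniques, so this should not be read as their method, and the function names are hypothetical.

```python
from fractions import Fraction


def float_sum(n: int) -> float:
    """Add 1/n to itself n times in binary floating point."""
    total = 0.0
    for _ in range(n):
        total += 1.0 / n
    return total


def exact_sum(n: int) -> Fraction:
    """Add 1/n to itself n times using exact rational numbers."""
    total = Fraction(0)
    for _ in range(n):
        total += Fraction(1, n)
    return total


if __name__ == "__main__":
    # Rounding error accumulates in the float version but not in the exact one.
    print(float_sum(10) == 1.0)
    print(exact_sum(10) == 1)
```

Rational numbers cover only part of the problem; results such as square roots need other representations, which is one reason exact arithmetic is a research topic rather than a library call.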