Dynamic Optimization for Interactive Computing Systems
Parallel Computing Laboratory
Sarah Bird
February 23, 2012

Multicore Revolution
• Parallel Computing is becoming ubiquitous
– Only way forward for computing industry (unless you don’t care if your apps never run faster than in 2008)
– Unfortunately, parallel programming is (still) harder than sequential programming
Harness the power of parallelism for client applications

Bridging the Gap
[Figure labels: Parallel Applications, Parallel Hardware, Parallel Software, IT industry, Users]
Krste Asanovic, Ras Bodik, Jim Demmel, Armando Fox, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, Dave Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Kathy Yelick

Pediatric MRI
Typical exam ~ 1 hour
Motion blurs the images
Scanner is a small loud tunnel
Difficult for children to stay still!
Traditional Solution: Anesthesia
Compressed Sensing reduces each scan to 15 seconds
Takes too long to reconstruct image ~ Hours

Compressed Sensing for Pediatric MRI
Image reconstruction from 1-2 hours down to < 1 min

PACORA
How do I guarantee interactivity on my multicore device when it’s running a bunch of apps?
Apps don’t miss deadlines
Turn off unnecessary resources
Developers don’t need to understand hardware
[Figure: penalty plotted against runtime; labels: Penalty, s = slope, d, Service Requirement, Runtime, Speech Decoder, OS, Resource Allocation Framework]
$t(w, b) = R_0 + \sum_{i,j} \frac{w_{i,j}}{b_i * b_j}$
Flawless user experience while maximizing battery life!

More Great ParLab Research

Communication-Avoiding Linear Algebra
• Order of magnitude speedups over optimized code
• 8.8x faster than Intel’s MKL

Music Application Research
• New user interfaces with pressure-sensitive multitouch gestural interfaces

ParLab SEJITS project: Selective Embedded Just-in-Time Specialization
[Figure labels: App, Dense, Sparse, Multicore, GPU]
Asp: “Asp is SEJITS in Python” general specializer framework
Performance of highly optimized C with the productivity of Python!

Parallel Computing Laboratory
• User-centric research agenda
• Better user-interface programming across diversity of devices
• Data capsules for secure data access
• Heterogeneity to improve performance and reduce energy
• Dynamic client+cloud partitioning to improve efficiency
Future of Personal Computing

Join us at ParLab for Lunch!
5th Floor Soda Hall
• A RealTime, Parallel GUI Service in Tessellation ManyCore OS
• Synthesizing a Parallel Web Browser Layout Engine
• An Automatic Parallelizing and Vectorizing Compiler for Python Loop-Nests
• Enabling Specialization via MapReduce
• Accelerating Graph Algorithms by Software Optimization & Hardware Modification
• Characterizing Memory Hierarchies of Multicore Processors Using Microbenchmarks
• Garbage Collection on GPUs
• Debugging SEJITS
• Hardware Communication Channels for Quality-of-Service Enforcement
• OLOV: OpenCL for OpenCV
• Megh: A Cloud Backed File System
• Parallelizing Machine Translation Training Pipeline with Hadoop
• CDT: An interactive compiler translation debugger for SEJITS specializers
• PACORA: Performance-Aware Convex Optimization for Resource Allocation
• pOSKI Project Updates
• SEJITS in the Cloud
• Communication Costs of LU Decomposition Algorithms for Banded Matrices
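
The PACORA slide pairs a per-application penalty (labelled with a slope s and a point d on the runtime axis) with a response-time model t(w, b). A minimal Python sketch of how those two pieces could drive an allocation decision follows. It rests on several assumptions that are not confirmed by the slides: the penalty is read as zero up to d and linear with slope s afterwards, the model is read as R0 plus the sum of w[i][j] / (b[i] * b[j]), there is a single resource type (cores), and all names and numbers are hypothetical. PACORA itself is described as using convex optimization; the exhaustive search here is only a stand-in that works for tiny cases.

```python
from itertools import product


def penalty(runtime, deadline, slope):
    """Assumed penalty shape: zero up to the deadline d, then linear with slope s."""
    return max(0.0, slope * (runtime - deadline))


def response_time(r0, weights, alloc):
    """Assumed model form: t(w, b) = R0 + sum over i, j of w[i][j] / (b[i] * b[j])."""
    total = r0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            total += w / (alloc[i] * alloc[j])
    return total


def best_split(apps, total_cores):
    """Brute-force stand-in for the optimizer: try every split of the cores.

    Minimises total penalty; ties go to the split that uses fewer cores,
    so resources that do not reduce the penalty stay unallocated.
    """
    best = None
    for cores in product(range(1, total_cores + 1), repeat=len(apps)):
        if sum(cores) > total_cores:
            continue
        cost = sum(
            penalty(response_time(a["r0"], a["w"], [c]), a["deadline"], a["slope"])
            for a, c in zip(apps, cores)
        )
        key = (cost, sum(cores))
        if best is None or key < best[0]:
            best = (key, cores)
    return best[1], best[0][0]


# Two hypothetical apps, each described by one resource type (cores).
apps = [
    {"r0": 1.0, "w": [[8.0]], "deadline": 2.0, "slope": 5.0},
    {"r0": 0.5, "w": [[4.0]], "deadline": 1.5, "slope": 1.0},
]

if __name__ == "__main__":
    print(best_split(apps, 4))
    print(best_split(apps, 6))
```

With these made-up numbers, four cores are split (3, 1) for a total penalty of 3.0, because the steeper penalty slope of the first app outweighs the second app's missed deadline. With six cores the search settles on (3, 2) at zero penalty and leaves one core unallocated, which mirrors the slide's goal of turning off unnecessary resources.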