OVERVIEW OF OCELOT: ARCHITECTURE
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Overview
• GPU Ocelot overview
• Building, configuring, and executing Ocelot programs
• Ocelot Device Interface and CUDA Runtime API
• Ocelot PTX Internal Representation
• PTX Pass Manager

Ocelot: Multiplatform Dynamic Compilation
• Just-in-time code generation and optimization for data intensive applications
• Environment for i) compiler research, ii) architecture research, and iii) productivity tools
[Figure labels: Language Front-End; Data Parallel IR. Credits: esd.lbl.gov; R. Domingo & D. Kaeli (NEU)]

NVIDIA's Compute Unified Device Architecture (CUDA)
• Integrate the concept of a compute kernel called from standard languages
  • Multithreaded host programs
• The compute kernel specifies data parallel computation as thousands of threads
• An accelerator model of computing
  • Explicit functions for off-loading computation to GPUs
  • Data movement explicitly managed by the programmer

NVIDIA's Compute Unified Device Architecture (CUDA)
• For access to CUDA tutorials: http://developer.nvidia.com/cuda-education-training
[Figure labels: Host; GPU]

Structure of a Compute Kernel
• Parallel Thread Execution (PTX) instruction set architecture
• Arrays of (data parallel) thread blocks called cooperative thread arrays (CTAs)
• Barrier synchronization
• Mapped to single instruction stream multiple data stream (SIMD) processor

NVIDIA Fermi GF 100
• 4 Graphics Processing Clusters (GPCs) containing 4 SMs each
• Each SM has 32 ALUs, 4 SFUs, and 16 LS units
• Each ALU has access to 1024 32-bit registers (total of 128 kB per SM)
• Each SM has its own Shared Memory/L1 cache (64 kB total)
• Unified L2 cache (768 kB)
• Six 64-bit Memory Controllers (total 384 bits wide)
[Figure labels: ALU; Streaming multiprocessor (SM)]

Ocelot Structure [1]
• Ocelot is built with nvcc and the LLVM backend
• Structured around a PTX IR
• Compile stock CUDA applications without modification
[Figure labels: CUDA Application; nvcc; PTX Kernel; LLVM IR; Translator]
[1] G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, "Ocelot: A Dynamic Optimizing Compiler for Bulk Synchronous Applications in Heterogeneous Systems," PACT, September 2010.

CUDA to PTX
• PTX modules stored as string literals in fat binary
• We ignore accompanying binary image (GPU native binary)

Dependencies
• Software
  • C++ Compiler (GCC 4.5.x)
  • Lex Lexer Generator (Flex 2.5.35)
  • YACC Parser Generator (Bison 2.4.1)
  • Scons (Python 2.7)
  • LLVM (3.1)
• Libraries
  • boost_system (1.46)
  • boost_filesystem (1.46)
  • boost_serialization (1.46)
  • GLEW (1.5) (optional, for GL interop)
  • GL (for NVIDIA GPU Devices)
• Library headers
  • Boost (1.46)
http://code.google.com/p/gpuocelot/wiki/Installation

Ocelot Source Code
• Freely available via Google Code project site (New BSD License)
  http://code.google.com/p/gpuocelot/
• ocelot/
  • analysis/     -- analysis passes
  • api/          -- Ocelot-specific API extensions
  • cuda/         -- implements CUDA runtime
  • executive/    -- Device interface and backend implementations
  • ir/           -- internal representations (PTX, LLVM, AMD IL)
  • parser/       -- parser (to PTX)
  • tools/        -- standalone applications using Ocelot
  • trace/        -- trace generation and analysis tools
  • translator/   -- translators from PTX to LLVM and AMD IL
  • transforms/   -- program transformations

    svn checkout http://gpuocelot.googlecode.com/svn/trunk/ gpuocelot-read-only

Building GPU Ocelot
• Obtain source code

    svn checkout http://gpuocelot.googlecode.com/svn/trunk/ gpuocelot-read-only

• Compile with Scons

    sudo ./build.py --install

• Build and execute unit tests

    sudo ./build.py --test=full

• Output appears in .release_build
  • libocelot.so
  • OcelotConfig
  • Tests
• Installation directory:
  • /usr/local/include/ocelot
  • /usr/local/lib
http://code.google.com/p/gpuocelot/wiki/Installation

Configuring Ocelot
• configure.ocelot
  • Controls Ocelot's initial state
  • Located in application's startup directory
• trace specifies which trace generators are initially attached
• executive controls device properties
• trace:
  • memoryChecker - ensures [remainder missing in source]
  • raceDetector - enforces synchronized access to .shared
  • debugger - interactive debugger
• executive:
  • devices: list of Ocelot backend devices that are enabled
    • nvidia - NVIDIA GPU backend
    • emulated - Ocelot PTX emulator (trace generators)
    • llvm - efficient execution of PTX on multicore CPU
    • amd - translation to AMD IL for PTX on AMD RADEON GPU

    {
        trace: {
            memoryChecker: {
                enabled: true,
                checkInitialization: false
            },
            raceDetector: {
                enabled: false,
                ignoreIrrelevantWrites: true
            },
            debugger: {
                enabled: false,
                kernelFilter: "_Z13scalarProdGPUPfS_S_ii",
                alwaysAttach: true
            },
        },
        executive: {
            devices: [ "emulated" ],
        }
    }

Building and Executing CUDA Programs

    nvcc -c example.cu -arch sm_23
    g++ -o example example.o `OcelotConfig -l`

• `OcelotConfig -l` expands to '-locelot'
• libocelot.so replaces libcudart.so
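The configuration slide describes trace generators, such as the memory checker, that are attached to the emulator and observe execution. The sketch below illustrates that observer idea in isolation. It is a minimal, self-contained model and does not use Ocelot's trace classes: `MemoryAccess`, `TraceObserver`, and `BoundsChecker` are hypothetical names introduced only for this example, and the only check performed is whether each access lies fully inside a registered allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not Ocelot's trace classes.

// One memory access performed by an emulated thread.
struct MemoryAccess {
    std::uint64_t address; // first byte touched
    std::size_t   bytes;   // size of the access in bytes
    bool          isStore; // true for a store, false for a load
};

// Observer interface: an emulator would call event() for every access.
class TraceObserver {
public:
    virtual ~TraceObserver() = default;
    virtual void event(const MemoryAccess& access) = 0;
};

// Records every access that is not fully contained in a known allocation.
class BoundsChecker : public TraceObserver {
public:
    void addAllocation(std::uint64_t base, std::size_t size) {
        assert(size > 0);
        _allocations.push_back({base, size});
    }

    void event(const MemoryAccess& access) override {
        for (const Allocation& a : _allocations) {
            bool startsInside = access.address >= a.base;
            bool endsInside =
                access.address + access.bytes <= a.base + a.size;
            if (startsInside && endsInside) return; // valid access
        }
        _violations.push_back(access); // no allocation covers this access
    }

    const std::vector<MemoryAccess>& violations() const {
        return _violations;
    }

private:
    struct Allocation {
        std::uint64_t base;
        std::size_t   size;
    };
    std::vector<Allocation>   _allocations;
    std::vector<MemoryAccess> _violations;
};
```

In use, a caller would register each device allocation with `addAllocation`, feed every load and store to `event`, and inspect `violations()` afterwards. Keeping the checker behind an observer interface is what lets it be attached or left out by configuration without changing the code that executes the kernel.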
CUDA Runtime API
• Ocelot implements CUDA Runtime API
  • Transparent hooks into existing CUDA applications
  • override methods of cuda::CudaDeviceInterface
• Maps CUDA RT onto Ocelot device interface abstraction
  • cuda::CudaRuntime
• Extended through custom Ocelot API
  • e.g. ocelot::registerPTXModule( );

Ocelot CUDA Runtime Overview
• A reimplementation of the CUDA Runtime API
• Compatible with existing applications
  • Link against libocelot.so instead of libcudart
• Kernels execute anywhere
  • Key to portability!
[Figure credit: R. Domingo & D. Kaeli (NEU)]

Ocelot CUDA Runtime
• Clean device abstraction
  • All back-ends implement same interface
• Ocelot API Extensions
  • Add/remove trace generators
  • Compile/launch kernels directly in PTX
  • Device memory sharing among host threads
  • Device switching

Ocelot Source Code: CUDA Runtime API
• ocelot/
  • analysis/     -- analysis passes
  • api/          -- Ocelot-specific API extensions
  • cuda/         -- implements CUDA runtime
    • interface/CudaRuntimeInterface.h
    • interface/CudaRuntime.h
    • interface/CudaRuntimeContext.h
    • interface/FatBinaryContext.h
    • interface/CudaDriverFrontend.h
  • executive/    -- Device interface and backend implementations
  • ir/           -- internal representations (PTX, LLVM, AMD IL)
  • parser/       -- parser (to PTX)
  • tools/        -- standalone applications using Ocelot
  • trace/        -- trace generation and analysis tools
  • translator/   -- translators from PTX to LLVM and AMD IL
  • transforms/   -- program transformations
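The device abstraction described above (all back ends implement the same interface, and the front end can switch devices) can be sketched in a few lines. This is a reduced model under stated assumptions, not Ocelot's executive::Device or cuda::CudaRuntime: the `Device`, `RecordingDevice`, and `Runtime` classes and their methods are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced stand-in for a device interface.
class Device {
public:
    virtual ~Device() = default;
    virtual std::string name() const = 0;
    virtual std::size_t allocate(std::size_t bytes) = 0; // returns a handle
    virtual void launch(const std::string& kernel) = 0;
};

// Toy back end that only records what it was asked to do.
class RecordingDevice : public Device {
public:
    explicit RecordingDevice(std::string name) : _name(std::move(name)) {}

    std::string name() const override { return _name; }

    std::size_t allocate(std::size_t bytes) override {
        _sizes.push_back(bytes);
        return _sizes.size() - 1; // index of the allocation as its handle
    }

    void launch(const std::string& kernel) override {
        launched.push_back(kernel);
    }

    std::vector<std::string> launched; // kernels launched on this device

private:
    std::string              _name;
    std::vector<std::size_t> _sizes;
};

// Front end: forwards each call to whichever device is currently selected.
class Runtime {
public:
    void addDevice(std::unique_ptr<Device> device) {
        assert(device != nullptr);
        _devices.push_back(std::move(device));
    }

    void setDevice(std::size_t index) {
        if (index >= _devices.size())
            throw std::out_of_range("no such device");
        _selected = index;
    }

    Device& current() {
        if (_devices.empty())
            throw std::logic_error("no devices registered");
        return *_devices[_selected];
    }

    std::size_t allocate(std::size_t bytes) {
        return current().allocate(bytes);
    }

    void launch(const std::string& kernel) { current().launch(kernel); }

private:
    std::vector<std::unique_ptr<Device>> _devices;
    std::size_t                          _selected = 0;
};
```

Because the front end depends only on the abstract interface, adding a back end means writing one more subclass, and switching devices is a change of index rather than a change in application code.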
Ocelot CUDA Runtime API Implementation
• Implement interface defined by cuda::CudaRuntimeInterface
• class cuda::CudaRuntime
  • ocelot/cuda/interface/CudaRuntime.h
  • ocelot/cuda/implementation/CudaRuntime.cpp
• cuda::CudaRuntime members
  • Host thread contexts
  • Ocelot devices
  • Registered modules, textures, kernels
  • Fat binaries
  • Global mutex
• CUDA Runtime API functions
  • e.g. cudaMemcpy, cudaLaunch, __cudaRegisterModule()
• Additional functions
  • e.g. _lock(), _unlock(), _registerModule()

Ocelot Source Code: Device Interface
• ocelot/
  • executive/    -- Device interface and backend implementations
    • interface/Device.h
    • interface/EmulatorDevice.h
    • interface/NVIDIAGPUDevice.h
    • interface/MulticoreCPUDevice.h
    • interface/ATIGPUDevice.h

Ocelot Device Interface
• class executive::Device
• Succinct interface for device objects
  • Module registration
  • Memory management
  • Kernel configuration and launching
  • Global variable and texture management
  • OpenGL interoperability
  • Streams and Events
  • Trace generators
• Minimal set of APIs for device-oriented programming model
  • Capture device state: memory allocations, global variables, textures, graphics interoperability
  • 57 functions (versus CUDA Runtime's 120+)
• Facilitate creation of backend execution targets
  • Implement Device interface
• Enable multiple API front ends
  • Implement front ends targeting Device interface

Ocelot PTX Intermediate Representation (IR)
• Backend compiler framework for PTX
• Full-featured PTX IR
  • Class hierarchy for PTX instructions/directives
  • PTX control flow graph
  • Static single-assignment form
  • Dataflow/dominance analysis
  • Enables PTX optimization
• IR to IR translation
  • From PTX to other IRs
    • LLVM (x86/PowerPC/ARM)
    • CAL (AMD GPUs)
[Figure label: PTX Kernel]

Ocelot Source Code: Intermediate Representation
• ocelot/
  • ir/           -- internal representations (PTX, LLVM, AMD IL)
    • interface/Module.h
    • interface/PTXInstruction.h
    • interface/PTXOperand.h
    • interface/PTXKernel.h
    • interface/ControlFlowGraph.h
    • interface/ILInstruction.h
    • interface/LLVMInstruction.h
  • parser/       -- parser (to PTX)
    • interface/PTXParser.h

Ocelot PTX Internal Representation
• C++ classes representing PTX module
  • ir::PTXModule
  • ir::PTXKernel
  • ir::PTXInstruction
  • ir::PTXOperand
  • ir::GlobalVariable
  • ir::LocalVariable
  • ir::Parameter
• Ocelot PTX Parser target, Emitter source
  • ir::PTXInstruction::valid( )
• Translator source
  • PTX to LLVM
  • PTX to AMD IL
• Suitable for analysis and transformation
• Executable representation
  • PTX Emulator

Ocelot PTX IR: Kernels

    // ir::Module
    .global .f32 globalVariable;                   // ir::Global

    .entry sequence (                              // ir::Kernel
        .param .u64 __cudaparm_sequence_A,         // ir::Parameter
        .param .s32 __cudaparm_sequence_N)
    {
        .reg .u32 %r<11>;
        .reg .u64 %rd<6>;
        .local .u32 %rp0;                          // ir::Local
        ...
    $LDWbegin_sequence:                            // ir::BasicBlock
        ld.param.s32 %r6, [__cudaparm_sequence_N];
        setp.le.s32 %p1, %r6, %r5;
        @%p1 bra $Lt_0_1026;
        ...
    $Lt_0_1026:
        exit;
    $LDWend_sequence:
    } // sequence

Ocelot PTX IR: Instructions

    // ir::BasicBlock, a sequence of ir::PTXInstruction
    add.s32 %r7, %r5, 1;
    ld.param.u64 %rd1, [__cudaparm_sequence_A];
    cvt.s64.s32 %rd2, %r5;
    mul.wide.s32 %rd3, %r5, 4;
    add.u64 %rd4, %rd1, %rd3;
    st.global.s32 [ %rd4 + 0 ], %r7;
    @%p1 bra $Lt_0_6146;

• ir::PTXInstruction fields: opcode, addressSpace, dataType, operands d and a, guard predicate (@%p1)
• ir::PTXOperand address modes
  • addressMode: address, e.g. [__cudaparm_sequence_A]
  • addressMode: register, e.g. %rd2
  • addressMode: indirect, e.g. [ %rd4 + 0 ]
  • addressMode: immediate, e.g. 1
  • addressMode: label, e.g. $Lt_0_6146

Control and Data-Flow Graphs
• Data structure for representing kernels
• Basic blocks
  • fall-through and branch edges
  • instruction vector
  • label
• Traversals:
  • pre-order, topological, post-order
  • iterator visits blocks
• Data-flow graph overlays CFG
  • definition-use chains explicit
  • to and from SSA form
• CFG Transformations:
  • split blocks, edges
• DFG Transformations:
  • insert and remove values
  • iterate over def-use

Example: Control-Flow Graphs

    // example: splits basic blocks containing barriers
    //
    for (ir::ControlFlowGraph::iterator bb_it = kernel->cfg()->begin();
        bb_it != kernel->cfg()->end(); ++bb_it) {
        // iterate over basic blocks
        unsigned int n = 0;
        ir::BasicBlock::InstructionList::iterator inst_it;
        for (inst_it = (bb_it)->instructions.begin();
            inst_it != (bb_it)->instructions.end(); ++inst_it, n++) {
            // iterate over instructions in *bb_it
            const ir::PTXInstruction *inst =
                static_cast< const ir::PTXInstruction *>(*inst_it);
            if (inst->opcode == ir::PTXInstruction::Bar) {
                if (n + 1 < (unsigned int)(bb_it)->instructions.size()) {
                    std::string label = (bb_it)->label + "_bar";
                    kernel->cfg()->split_block(bb_it, n+1,
                        ir::BasicBlock::Edge::FallThrough, label);
                }
                break; // split block containing bar.sync
                       // so that it's always the last
                       // instruction in a block
            }
        } // end for (inst_it)
    } // end for (bb_it)

Example: Spilling Live Values

    // ocelot/analysis/implementation/RemoveBarrierPass.cpp
    //
    void RemoveBarrierPass::_addSpillCode( DataflowGraph::iterator block,
        const DataflowGraph::Block::RegisterSet& alive )
    {
        unsigned int bytes = 0;

        ir::PTXInstruction move ( ir::PTXInstruction::Mov );

        move.type = ir::PTXOperand::u64;
        move.a.identifier = "__ocelot_remove_barrier_pass_stack";
        move.a.addressMode = ir::PTXOperand::Address;
        move.a.type = ir::PTXOperand::u64;
        move.d.reg = _kernel->dfg()->newRegister();
        move.d.addressMode = ir::PTXOperand::Register;
        move.d.type = ir::PTXOperand::u64;

        _kernel->dfg()->insert( block, move,
            block->instructions().size() - 1 );
        ...

Example: Spilling Live Values

        ...
        for( DataflowGraph::Block::RegisterSet::const_iterator
            reg = alive.begin(); reg != alive.end(); ++reg )
        {
            ir::PTXInstruction save( ir::PTXInstruction::St );

            save.type = reg->type;
            save.addressSpace = ir::PTXInstruction::Local;
            save.d.addressMode = ir::PTXOperand::Indirect;
            save.d.reg = move.d.reg;
            save.d.type = ir::PTXOperand::u64;
            save.d.offset = bytes;

            bytes += ir::PTXOperand::bytes( save.type );

            save.a.addressMode = ir::PTXOperand::Register;
            save.a.type = reg->type;
            save.a.reg = reg->id;

            _kernel->dfg()->insert( block, save,
                block->instructions().size() - 1 );
        }

        _spillBytes = std::max( bytes, _spillBytes );
    }

IR for AMD and LLVM
• LLVM IR
  • Implements all of the LLVM instruction set
  • Decouples the translator from the LLVM project
  • Easier to construct than LLVM's actual IR
• AMD IL
  • Supports translation from PTX to AMD interface
• Emitters construct parseable string representations of modules
AMD Backend: R. Domingo & D. Kaeli (NEU)

PTX PassManager
• Orchestrates analysis and transformation passes
  • Derived from LLVM model
• Analysis Passes generate meta-data
  • Meta-data consumed by transformations
• Transformation Passes modify the IR

Using the Pass Manager
• Passes added to a manager
  • Schedules execution
  • Manages analysis meta-data
• Ensures meta-data available
  • Up to date; not redundantly computed

Analysis Passes
• Analysis runs over the PTX IR
  • Generates meta-data
• Modifies PTX IR
  • Possibly updates or invalidates existing meta-data
• Examples
  • Data-flow graph
  • Dominator and Post-dominator trees
  • Thread frontiers

Analysis Passes - Supported Analysis Structures
• Control Flow Graph
  • ir/interface/ControlFlowGraph.h
• Data Flow Graph
  • analysis/interface/DataflowGraph.h
• Dominator and Post-Dominator Trees
  • analysis/interface/DominatorTree.h
  • analysis/interface/PostDominatorTree.h
• Superblock Analysis
  • analysis/interface/SuperblockAnalysis.h
• Divergence Graph
  • analysis/interface/DivergenceGraph.h
• Thread Frontiers
  • analysis/interface/ThreadFrontiers.h

Transformation Passes
• Modify the PTX IR
• Consume meta-data
• Examples:
  • Dead-code elimination
    • transforms/interface/DeadCodeEliminationPass.h
  • Control-flow structuring
    • transforms/interface/StructuralTransform.h
  • Sync elimination
    • transforms/interface/SyncElimination.h
  • Dynamic instrumentation

Example: Dead Code Elimination Transformation Pass

Dead Code Elimination
• Approach
  • Run once on each kernel
  • Consume data-flow analysis meta-data
  • Delete instructions producing values with no users
• Implementation
  • transforms/interface/DeadCodeEliminationPass.h
  • transforms/implementation/DeadCodeEliminationPass.cpp

Dead Code Elimination (1 of 5)
• Setup pass dependencies

    DeadCodeEliminationPass::DeadCodeEliminationPass()
    : KernelPass(Analysis::DataflowGraphAnalysis
        | Analysis::StaticSingleAssignment,
        "DeadCodeEliminationPass")
    {
    }

Dead Code Elimination (2 of 5)
• Run pass
• Get analysis metadata

    void DeadCodeEliminationPass::runOnKernel(ir::IRKernel& k)
    {
        Analysis* dfgAnalysis = getAnalysis(Analysis::DataflowGraphAnalysis);
        assert(dfgAnalysis != 0);

        // cast up
        analysis::DataflowGraph& dfg =
            *static_cast<analysis::DataflowGraph*>(dfgAnalysis);

        assert(dfg.ssa());

Dead Code Elimination (3 of 5)
• Loop until change

        BlockSet blocks;

        for (iterator block = dfg.begin(); block != dfg.end(); ++block)
        {
            report(" Queueing up BB_" << block->id());
            blocks.insert(block);
        }

        while(!blocks.empty())
        {
            iterator block = *blocks.begin();
            blocks.erase(blocks.begin());

            eliminateDeadInstructions(dfg, blocks, block);
        }

Dead Code Elimination (4 of 5)
• Remove unused live-out values

    AliveKillList aliveOutKillList;

    for (RegisterSet::iterator aliveOut = block->aliveOut().begin();
        aliveOut != block->aliveOut().end(); ++aliveOut)
    {
        if (canRemoveAliveOut(dfg, block, *aliveOut))
        {
            report(" removed " << aliveOut->id);
            aliveOutKillList.push_back(aliveOut);
        }
    }

    for (AliveKillList::iterator killed = aliveOutKillList.begin();
        killed != aliveOutKillList.end(); ++killed)
    {
        block->aliveOut().erase(*killed);
    }

Dead Code Elimination (5 of 5)
• Check if an instruction can be removed

    if (ptx.hasSideEffects()) return false;

    for (RegisterPointerVector::iterator reg = instruction->d.begin();
        reg != instruction->d.end(); ++reg)
    {
        // the reg is alive outside the block
        if (block->aliveOut().count(*reg) != 0) return false;

        InstructionVector::iterator next = instruction;
        for (++next; next != block->instructions().end(); ++next)
        {
            for (RegisterPointerVector::iterator source = next->s.begin();
                source != next->s.end(); ++source)
            {
                // found a user in the block
                if (*source->pointer == *reg->pointer) return false;
            }
        }
    }

Dead Code Elimination
• Repeat for
  • phi instructions
  • other instructions
  • alive-in values
• Ensures meta-data is valid

Running Passes on PTX
• Static optimizer
  • PTXOptimizer
  • Runs passes on PTX assembly files
  • ocelot/tools/PTXOptimizer.cpp
• JIT optimization
  • Runs passes before kernels are launched
  • ocelot/api/implementation/OcelotRuntime.cpp

Questions
• GPU Ocelot Google Code site: http://code.google.com/p/gpuocelot
• Research Project site: http://gpuocelot.gatech.edu
• Mailing list: gpuocelot@googlegroups.com
• Contributors
  • Gregory Diamos, Rodrigo Dominguez, Naila Farooqui, Andrew Kerr, Ashwin Lele, Si Li, Tri Pho, Jin Wang, Haicheng Wu, Sudhakar Yalamanchili
• Sponsors
  • AMD, IBM, Intel, LogicBlox, NSF, NVIDIA
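Appendix sketch: Dead Code Elimination on a toy IR

The dead code elimination slides state the rule for removing an instruction: it has no side effects, its result is not alive out of the block, and no later instruction in the block reads it. The sketch below applies that rule to a single basic block and repeats until nothing changes. It is a self-contained illustration under simplifying assumptions, not the Ocelot pass: the `Instruction` struct and the functions `hasLaterUser`, `canRemove`, and `eliminateDeadInstructions` are hypothetical, registers are plain integers, and each register is assumed to be written at most once, in the spirit of SSA form.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR for illustration; not Ocelot's ir:: or analysis:: types.
struct Instruction {
    std::string      opcode;
    int              dest;           // register written, or -1 if none
    std::vector<int> sources;        // registers read
    bool             hasSideEffects; // stores, branches, barriers, ...
};

// True if any instruction after position `index` reads register `reg`.
bool hasLaterUser(const std::vector<Instruction>& block,
                  std::size_t index, int reg) {
    for (std::size_t i = index + 1; i < block.size(); ++i) {
        const std::vector<int>& s = block[i].sources;
        if (std::find(s.begin(), s.end(), reg) != s.end()) return true;
    }
    return false;
}

// Removal rule: no side effects, result not alive out, no user in the block.
bool canRemove(const std::vector<Instruction>& block, std::size_t index,
               const std::set<int>& aliveOut) {
    assert(index < block.size());
    const Instruction& inst = block[index];
    if (inst.hasSideEffects) return false;
    if (inst.dest < 0) return false; // conservatively keep if nothing written
    if (aliveOut.count(inst.dest) != 0) return false;
    return !hasLaterUser(block, index, inst.dest);
}

// Sweeps the block until a full pass removes nothing; returns removal count.
std::size_t eliminateDeadInstructions(std::vector<Instruction>& block,
                                      const std::set<int>& aliveOut) {
    std::size_t removed = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        // Walk backwards so erasing does not disturb unvisited positions.
        for (std::size_t i = block.size(); i-- > 0; ) {
            if (canRemove(block, i, aliveOut)) {
                block.erase(block.begin() + static_cast<std::ptrdiff_t>(i));
                ++removed;
                changed = true;
            }
        }
    }
    return removed;
}
```

Removing one instruction can make the instruction that fed it dead as well, which is why the sweep is repeated until a fixed point is reached. A multi-block version would additionally need the alive-in and alive-out sets that the data-flow graph provides.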