ECE 734 Project Proposal: "Implementation and Characterization of Vector Optimization on 3-D Game Engine using SIMD Instructions"

Ho-Seop Kim (hskim@ece.wisc.edu)

1. Introduction

Computer engineers face an unprecedented challenge today. New types of applications demand more computing power than ever before and, more importantly, at least some of them cannot be handled efficiently by conventional general-purpose microprocessors. The main reason is that today's general-purpose microprocessors are optimized to squeeze whatever little ILP (instruction-level parallelism) exists out of programs with irregular control flow and little data parallelism. They are not designed to handle inherently parallel problems well. Unfortunately, many new applications fall into the latter category, and 3-D games in particular are among the most important and performance-demanding applications today.

To exploit the inherent parallelism in game programs (especially in the 3-D graphics part), many new and/or rehashed architectural ideas are emerging in the microprocessors of both console game machines and PCs. Among the newly introduced architectural techniques, which include VLIW (Very Long Instruction Word), vector computing seems to be the most dominant and is gaining the most support from the developer community. [Kunimatsu 2000] [Diefendorff 1998]

In this project, we want to learn what vector computing is like in today's real-world microprocessors. We believe that hands-on program optimization with the SIMD (Single Instruction Multiple Data) instructions provided by today's microprocessors will show us what is needed, and what can be done, for one of the most important new applications -- 3-D games. These SIMD instructions are subject to implementation-specific scheduling limits, and understanding those limits is also an important part of this project.

2. How to do it?

To understand the effect of the SIMD paradigm on application performance, it would be best to design a SIMD ISA (Instruction Set Architecture) from scratch, implement it in hardware, create SIMD-optimized programs (ideally with a vectorizing compiler), and run them on that hardware. Of course, such a project would not fit into the course schedule. To get reasonable results within the project time frame, we suggest the following:

Use a widely available, predefined SIMD ISA: We will use Intel's SSE (Streaming SIMD Extensions). Not surprisingly, Intel provides the most comprehensive development support. Game consoles were excluded simply because it is hard to find a development and test platform for them in an academic environment.

Use an open source 3-D game engine: Right now we are considering "Crystal Space". It provides a reasonably complete API, and many open source 3-D game projects are being built on it. More importantly, it contains virtually no MMX/SSE optimization. [Crystal Space 2000]

Implement SSE optimization on only a selected performance-critical part of the game engine: Although the "back-end" of the 3-D pipeline (in the software sense), i.e., rendering, is usually done by the 3-D accelerator hardware found in most PCs, the "front-end", i.e., coordinate transformation and lighting (T&L), is still done by the host CPU. [Glaskowsky 1999] We want to limit our work to the T&L part of the game engine.

Try both high-level and low-level optimizations: Intel provides a "wrapper" C++ class library to reduce the pain of developing MMX/SSE-specific optimizations. We can also program in inline assembly. [Intel 1999, 1] One important point is that SSE instruction scheduling needs careful attention, because the Pentium III implementation of SSE is not fully general. [Intel 1999, 2] For example, it allows only two SSE floating-point multiplications in a single cycle, even though one SSE floating-point multiply instruction operates on four pairs of single-precision numbers at a time.
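
As a rough illustration of the kind of transformation kernel this work targets, the sketch below applies a 4x4 matrix to a point in two ways: a generic scalar version and an SSE version. It is only a sketch under stated assumptions. The names Vec4, Mat4, make_translation, transform_scalar, transform_sse and demo are hypothetical and are not taken from Crystal Space, and the SSE path uses compiler intrinsics from <xmmintrin.h> rather than the wrapper class library or inline assembly mentioned above. On a compiler without SSE support the SSE function simply falls back to the scalar code.

```cpp
#include <cassert>
#include <cstdio>

// Use SSE intrinsics when the compiler targets SSE; otherwise fall back to scalar code.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical types for illustration only (not the Crystal Space API).
struct Vec4 { float x, y, z, w; };
struct Mat4 { float col[4][4]; };  // column-major: col[j] holds column j

// Build a translation matrix: identity plus (tx, ty, tz) in the last column.
Mat4 make_translation(float tx, float ty, float tz) {
    Mat4 m = {};  // all zeros
    m.col[0][0] = m.col[1][1] = m.col[2][2] = m.col[3][3] = 1.0f;
    m.col[3][0] = tx;
    m.col[3][1] = ty;
    m.col[3][2] = tz;
    return m;
}

// Generic version: 16 scalar multiplies and 12 scalar adds.
Vec4 transform_scalar(const Mat4& m, const Vec4& v) {
    Vec4 r;
    r.x = m.col[0][0] * v.x + m.col[1][0] * v.y + m.col[2][0] * v.z + m.col[3][0] * v.w;
    r.y = m.col[0][1] * v.x + m.col[1][1] * v.y + m.col[2][1] * v.z + m.col[3][1] * v.w;
    r.z = m.col[0][2] * v.x + m.col[1][2] * v.y + m.col[2][2] * v.z + m.col[3][2] * v.w;
    r.w = m.col[0][3] * v.x + m.col[1][3] * v.y + m.col[2][3] * v.z + m.col[3][3] * v.w;
    return r;
}

// SSE version: each multiply works on four single-precision lanes at once.
Vec4 transform_sse(const Mat4& m, const Vec4& v) {
#if SKETCH_HAVE_SSE
    // Unaligned loads keep the sketch safe; aligned data would allow _mm_load_ps.
    __m128 r = _mm_mul_ps(_mm_loadu_ps(m.col[0]), _mm_set1_ps(v.x));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.col[1]), _mm_set1_ps(v.y)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.col[2]), _mm_set1_ps(v.z)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(m.col[3]), _mm_set1_ps(v.w)));
    float out[4];
    _mm_storeu_ps(out, r);
    return Vec4{out[0], out[1], out[2], out[3]};
#else
    return transform_scalar(m, v);
#endif
}

// Small self-check: both versions should agree on a simple translation.
void demo() {
    Mat4 m = make_translation(10.0f, 20.0f, 30.0f);
    Vec4 p = {1.0f, 2.0f, 3.0f, 1.0f};
    Vec4 a = transform_scalar(m, p);
    Vec4 b = transform_sse(m, p);
    assert(a.x == b.x && a.y == b.y && a.z == b.z && a.w == b.w);
    std::printf("scalar: %g %g %g %g\n", a.x, a.y, a.z, a.w);
    std::printf("sse:    %g %g %g %g\n", b.x, b.y, b.z, b.w);
}
```

Calling demo() from a main function runs both versions on the same input. The column-major layout is a design choice made for this sketch: it lets each vector component be broadcast and multiplied against a whole matrix column, so no sums across the lanes of a single register are needed.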

Measure the performance enhancement: We plan to write small "kernel" programs that wrap the target 3-D APIs and compare the performance of the generic version against the SSE-optimized version. We also plan to measure final game performance by observing FPS (frames per second) in the bundled test game program "walktest". It is by no means a complete game, but it does let you "walk" through a small virtual 3-D world and displays the final FPS number on screen.

3. Summary

To learn about one of today's challenges in high-performance computing, we propose a class project that writes SIMD optimizations for an established 3-D game engine. More specifically, we will optimize the T&L part of an open source game engine using Intel SSE and measure the performance improvement on both kernel programs and a fairly complete game program. We hope to learn both the big picture and the subtle details of SIMD programming, which could be essential to tomorrow's microarchitects. Today's SIMD implementations have ugly implementation subtleties, and we want hands-on experience with those too.

4. References

[Glaskowsky 1999] Peter N. Glaskowsky, "A Concise Review of 3D Technology", Microprocessor Report, vol. 12, no. 13, Jun 21, 1999
[Diefendorff 1998] Keith Diefendorff, "Katmai enhances MMX", Microprocessor Report, vol. 12, no. 13, Oct 5, 1998
[Intel 1999, 1] Intel Application Note, "Software Development Strategies for SSE", version 2.1, Jan 1999
[Intel 1999, 2] Intel Application Note, "SSE -- 3-D Transformation", version 1.3, Jan 1999
[Kunimatsu 2000] Atsushi Kunimatsu et al., "Vector Unit Architecture for Emotion Synthesis", IEEE Micro, vol. 20, no. 2, pp. 40-47, Mar/Apr 2000
[Crystal Space 2000] HTML documentation in the Crystal Space 17.001 distribution. Can also be found at: http://crystal.linuxgames.com