ECE 734 Project Proposal
ECE 734 Project Proposal:
"Implementation and Characterization of Vector Optimization on 3-D Game Engine
using SIMD Instructions"
Ho-Seop Kim (hskim@ece.wisc.edu)
1.Introduction
We computer engineers face an unprecedented challenge. New types of applications demand
more computing power than before and, more importantly, at least some of them cannot be handled
efficiently by conventional general-purpose microprocessors. That is primarily because today's
general-purpose microprocessors are optimized to squeeze whatever little ILP (instruction-level
parallelism) there is from programs with irregular control flow and little data parallelism. That is,
they are not designed to handle inherently parallel problems well.
Unfortunately, many new applications fall into the latter category, and 3-D games in particular are among
the most important and performance-demanding applications today. To exploit the inherent parallelism in
game programs (especially in the 3-D graphics part), many new and/or rehashed architectural ideas are
emerging in the microprocessors of both console game machines and PCs. Among the newly introduced
architectural techniques, which include VLIW (Very Long Instruction Word), vector computing seems to be
the most dominant and is gaining the biggest support from the developer community. [Kunimatsu 2000]
[Diefendorff 1998]
In this project, we want to learn what vector computing is like in today's real-world microprocessors. It is
our belief that by doing hands-on program optimization with the SIMD (Single Instruction Multiple Data)
instructions provided by today's microprocessors we can understand what is needed and what can be done
for one of the most important new applications: 3-D games. There are certain implementation-specific
scheduling limits on the use of these SIMD instructions, and understanding them is also a very important
part of this project.
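As a minimal illustration of the SIMD idea (a sketch, not code from any game engine), the function below adds two float arrays four elements at a time using the SSE intrinsic _mm_add_ps from <xmmintrin.h>. The function name add_arrays, the macro SKETCH_HAVE_SSE, and the requirement that the array length be a multiple of four are assumptions made for this example. Where SSE is not available, the sketch falls back to a plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0  // portable fallback so the sketch still builds elsewhere
#endif

// Adds a[i] + b[i] into out[i] for i in [0, n).
// Assumption for this sketch: n is a multiple of 4, so no scalar tail is needed.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n % 4 == 0);
#if SKETCH_HAVE_SSE
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // load four floats (no alignment required)
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // one instruction, four additions
    }
#else
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];                        // scalar version: one addition per step
    }
#endif
}
```

The SIMD path issues one add instruction per group of four elements, where the scalar path issues four.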
2.How to do it?
To understand the effect of the SIMD paradigm on application performance, it would be best if we could
design a SIMD ISA from scratch, implement it in hardware, create SIMD-optimized programs (ideally
using a vectorizing compiler), and run them on that hardware. Of course, such a project would not fit into
the given course schedule.
To get reasonable results within the given project time frame, we suggest the following:
Use a widely available, predefined SIMD ISA (Instruction Set Architecture): Use Intel's SSE
(Streaming SIMD Extensions). Not surprisingly, Intel provides the most comprehensive
development support. Game consoles were excluded simply because it is hard to find a
development and test platform for them in an academic environment.
Use an open source 3-D game engine: Right now we are considering "Crystal Space". It
provides a reasonably complete API, and many open source 3-D game projects that use it are
under way. More importantly, there is virtually no MMX/SSE optimization in it.
[Crystal Space 2000]
Implement SSE optimization on only a selected performance-critical part of the game engine:
Although the "back-end" of the 3-D pipeline (in the software sense), i.e., rendering, is usually done by
the 3-D accelerator hardware found in most PCs, the "front-end", i.e., coordinate transformation and
lighting (T&L), is still done by the host CPU. [Glaskowsky 1999] We want to limit our work to the T&L
part of the game engine.
Try both high-level and low-level optimizations: Intel provides a "wrapper" C++ class library to
reduce the pain of developing MMX/SSE-specific optimizations. We can also program in inline
assembly. [Intel 1999, 1] One important point is that SSE instruction scheduling needs careful
attention, since the Pentium III implementation of SSE is not totally general. [Intel 1999, 2] For
example, it allows only two SSE floating-point multiplications in a single cycle, even though an
SSE floating-point multiply instruction works on four pairs of single-precision numbers at a time.
Measure the performance enhancement: We plan to write small "kernel" programs that wrap the
target 3-D APIs and compare the performance of the generic version and the SSE-optimized
version. We also plan to measure final game performance by observing FPS (frames per second)
using the bundled test game program "walktest". It is by no means a complete game program, but it
does let you "walk" through a small virtual 3-D world, and it displays the final FPS number on
screen.
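The plan above can be sketched in a few dozen lines. This is a minimal sketch under our own assumptions, not Crystal Space code: the Mat4 and Vec4 types, the column-major matrix layout, and the function names transform_generic, transform_sse, and time_kernel are all hypothetical. It takes the low-level route with raw SSE intrinsics from <xmmintrin.h>, and it falls back to the generic code where SSE is not available.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0  // portable fallback so the sketch still builds elsewhere
#endif

// Hypothetical types (not Crystal Space's): a column-major 4x4 matrix and an (x, y, z, w) point.
struct Mat4 { float m[16]; };  // m[4*c + r] holds row r of column c
struct Vec4 { float v[4]; };

// Generic version: one scalar multiply-add chain per output component.
Vec4 transform_generic(const Mat4& a, const Vec4& p) {
    Vec4 out;
    for (int r = 0; r < 4; ++r) {
        out.v[r] = a.m[r] * p.v[0] + a.m[4 + r] * p.v[1]
                 + a.m[8 + r] * p.v[2] + a.m[12 + r] * p.v[3];
    }
    return out;
}

// SSE version: each matrix column is scaled by one broadcast component of p,
// so every multiply and every add works on four floats at once.
Vec4 transform_sse(const Mat4& a, const Vec4& p) {
#if SKETCH_HAVE_SSE
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(a.m), _mm_set1_ps(p.v[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a.m + 4), _mm_set1_ps(p.v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a.m + 8), _mm_set1_ps(p.v[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a.m + 12), _mm_set1_ps(p.v[3])));
    Vec4 out;
    _mm_storeu_ps(out.v, acc);
    return out;
#else
    return transform_generic(a, p);
#endif
}

// Kernel harness: transforms every input point `passes` times and returns elapsed seconds.
template <typename Fn>
double time_kernel(Fn fn, const Mat4& a, const std::vector<Vec4>& in,
                   std::vector<Vec4>& out, int passes) {
    assert(passes > 0 && in.size() == out.size());
    const auto start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < passes; ++pass) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = fn(a, in[i]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_kernel once with transform_generic and once with transform_sse on the same input gives two elapsed times whose ratio is the kernel speedup. No speedup figure is claimed here, since it depends on the compiler and the processor.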
3.Summary
To learn about one of today's challenges in high-performance computing, we suggest a class project
that writes SIMD optimizations for an established 3-D game engine. More specifically, we will optimize
the T&L part of an open source game engine using Intel SSE and measure the performance improvement on
both kernel programs and a fairly complete game program.
We hope to learn both the big picture and the subtle details of SIMD-style programming, which could
be essential to tomorrow's microarchitects. Today's SIMD implementations have ugly subtleties, and we
want hands-on experience with those too.
4.Reference
[Glaskowsky 1999] Peter N. Glaskowsky, "A Concise Review of 3D Technology"
Microprocessor Report vol. 12, no. 13, Jun 21, 1999
[Diefendorff 1998] Keith Diefendorff, "Katmai enhances MMX"
Microprocessor Report vol. 12, no. 13, Oct 5, 1998
[Intel 1999, 1] Intel Application Note, "Software Development Strategies for SSE"
version 2.1, Jan 1999
[Intel 1999, 2] Intel Application Note, "SSE -- 3-D Transformation"
version 1.3, Jan 1999
[Kunimatsu 2000] Atsushi Kunimatsu et al., "Vector Unit Architecture for Emotion Synthesis"
IEEE Micro vol. 20 no. 2, pp. 40-47, Mar/Apr 2000
[Crystal Space 2000] HTML documentation in Crystal Space 17.001 distribution. Can also be found at:
http://crystal.linuxgames.com