C++ on Next-Gen Consoles: Effective Code for New

Pete Isensee
Development Manager
Microsoft Game Technology Group
Last Year at GDC
Chris Hecker ranted
 What did he say?
Programmers: danger ahead
 Out-of-order execution: good
 In-order execution: bad
 Microsoft and Sony are going to screw you
 You are so hosed. Game over, man.
“There’s absolutely nothing you can do
about this”
Console Hardware Architectures
Optimized to do floating-point math
 Optimized for multithreaded tasks
 Optimized to run games
 Not optimized to run general purpose code
 Not optimized to do branch prediction, code
reordering, instruction pipelining or other
out-of-order magic
 Large L2 caches
 Large latencies
We’re Game Programmers.
We Love Challenges.
We will make games on these consoles
 The solution is not assembly language
 The solution is to tailor our C/C++ engines,
inner loops and bottleneck functions to the
realities of the hardware
 Remember: C++ code can make or break
your game’s performance
Not Covering
Profiling (do it)
 Multithreading (do it)
 Memory allocation (avoid in game loop)
 Compiler settings (experiment)
 Exception handling (avoid it)
Topics for Today
Thinking about L2
Optimize memory access
 Use CPU caches effectively
Thinking about in-order processing
Avoid function call overhead
 Tips for efficient math
 Avoid hidden C++ inefficiencies
Optimize Memory Access
Proverb: thou shalt treat memory as if it were
thy hard drive
 You will be memory-bound on new consoles
 Recommendations
Never read from the same place twice in a frame
 Read data sequentially
 Write data sequentially
 Use everything you read
Minimize Data Passes
Game frame loops often access data twice
Or three times
 Or more
Optimize for a single pass
 Consider less frequent operations
 Physics, collision
 Networking
 Particle systems
Multiple Pass
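As a minimal sketch of the difference, assuming a hypothetical Particle type and update steps that are not from the talk, the two functions below compute the same result. The first streams the whole array through the cache three times per frame; the second reads and writes each element once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle type, for illustration only.
struct Particle {
    float pos;
    float vel;
    float age;
};

// Multiple passes: each loop walks the entire array again.
void updateMultiPass(std::vector<Particle>& ps, float accel, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i) ps[i].vel += accel * dt;
    for (std::size_t i = 0; i < ps.size(); ++i) ps[i].pos += ps[i].vel * dt;
    for (std::size_t i = 0; i < ps.size(); ++i) ps[i].age += dt;
}

// Single pass: every particle is loaded once, updated, and stored once,
// in sequential order.
void updateSinglePass(std::vector<Particle>& ps, float accel, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        Particle& p = ps[i];
        p.vel += accel * dt;
        p.pos += p.vel * dt;
        p.age += dt;
    }
}
```

Whether the single-pass form wins in a real engine depends on how much state each step needs; a profiler is the judge.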
Pointer Aliasing Explained
void init( float *a, const float *b ) {
    a[0] = 1.0f - *b;
    a[1] = 1.0f - *b;
}
Nominal case
0.0 1.0
Worst case
float a[2]={0.0f};
init( a, &a[0] );
0.0 0.0
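To make the worst case concrete, here is a small sketch that runs init once with a separate b and once with b pointing at a[0]. The two helper functions are illustrative names, not part of the talk. Because the first store may change *b, the compiler has to load *b again before the second store.

```cpp
// Same function as in the text: the compiler must assume a and b may overlap.
void init(float* a, const float* b) {
    a[0] = 1.0f - *b;   // may modify *b if b points at a[0]
    a[1] = 1.0f - *b;   // so *b has to be loaded again here
}

// Hypothetical helper: no aliasing, b is a separate variable.
float secondElementNoAlias() {
    float a[2] = {0.0f, 0.0f};
    float b = 0.0f;
    init(a, &b);
    return a[1];        // 1.0f - 0.0f
}

// Hypothetical helper: b aliases a[0], as in the worst case.
float secondElementAliased() {
    float a[2] = {0.0f, 0.0f};
    init(a, &a[0]);
    return a[1];        // a[0] was set to 1.0f first, so this is 1.0f - 1.0f
}
```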
A Solution: Restrict
Restrict keyword tells the compiler there’s no aliasing
Restrict permits the compiler to generate much
more efficient code
void init( float* __restrict a,
           const float* __restrict b ) {
    a[0] = 1.0f - *b; // compiler can do
    a[1] = 1.0f - *b; // the right thing
}
What to Restrict
Use restrict widely
 Function pointer parameters
 Local pointers
 Pointers in structs/classes
 But not:
Function return types
 Casts
 Global pointers (maybe)
 References (maybe)
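The sketch below shows restrict on function parameters and on a local pointer. It assumes a compiler that accepts the __restrict extension (MSVC, GCC, and Clang do; it is not standard C++). The Mesh struct and both function names are hypothetical.

```cpp
#include <cstddef>

// dst and src are promised not to overlap, so the compiler is free to keep
// loaded values in registers and reorder loads and stores.
void scaleInto(float* __restrict dst, const float* __restrict src,
               float scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * scale;
    }
}

// Hypothetical struct used to show restrict on a local pointer.
struct Mesh {
    float* normals;
    std::size_t count;
};

void flipNormals(Mesh& m) {
    // Local restricted pointer: promises that nothing else in this scope
    // writes to the normals array through another pointer.
    float* __restrict n = m.normals;
    const std::size_t count = m.count;   // hoisted: read once, not per iteration
    for (std::size_t i = 0; i < count; ++i) {
        n[i] = -n[i];
    }
}
```

Restrict is a promise made by the programmer. Calling scaleInto with overlapping buffers is undefined behavior, so it belongs only where the no-overlap guarantee really holds.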
Use the CPU Caches Effectively
The L2 cache is your best friend
 Using the cache well is an art
 Ensure you have a good profiler by your side
Keep the Working Set Small
Pack commonly used data together
 Frequently used data might deserve its own
 Keep rarely used data separate
Consider bitfields
Example: texture file names
Bitfields are extremely efficient on PowerPC
Consider other forms of lossless compression
Inefficient Structs Are Bad Mojo
struct InefficientCar {
    bool manual;        // padding here
    wheel wheels[8];    // 8 wheels?
    bool convertible;   // more pad
    char engine;        // 4 bits used
    char file[32];      // rarely used
    double maxAccel;    // double?
};
sizeof(InefficientCar) = 80
Carefully Design Structures
struct EfficientCar {
    wheel wheels[4];        // 4 wheels
    wheel *moreWheels;
    char *file;             // stored elsewhere
    float maxAccel;         // float
    unsigned engine:4;      // bitfields
    unsigned manual:1;
    unsigned convertible:1;
};
sizeof(EfficientCar) = 32
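The comparison can be checked at compile time. The sketch below assumes a hypothetical 4-byte wheel type, since the talk does not show its definition. Exact sizes depend on pointer width and ABI (the 32-byte figure matches 32-bit pointers), so the check tests only the relationship between the two layouts.

```cpp
// Hypothetical 4-byte wheel type; not defined in the original slides.
struct wheel { float radius; };

struct InefficientCar {
    bool   manual;        // followed by padding to align wheels
    wheel  wheels[8];
    bool   convertible;
    char   engine;        // only 4 bits used
    char   file[32];      // rarely used, but stored inline in every car
    double maxAccel;      // raises the alignment of the whole struct
};

struct EfficientCar {
    wheel    wheels[4];
    wheel*   moreWheels;  // the rare extra wheels live out of line
    char*    file;        // rarely used data stored elsewhere
    float    maxAccel;
    unsigned engine : 4;
    unsigned manual : 1;
    unsigned convertible : 1;
};

// Only the relationship is asserted, not platform-specific byte counts.
static_assert(sizeof(EfficientCar) < sizeof(InefficientCar),
              "the repacked layout should be smaller");
```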
Choose the Right Container
Prefer contiguous containers
Or at least mostly contiguous
 Examples: array, vector, deque
Avoid node-based containers
List, set/map, binary trees, hash tables
If you must use a tree, consider a custom
allocator for memory locality
 Vector + std::sort is often faster (and
smaller) than set or map or hash tables, by
an order of magnitude
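One way to apply the vector + std::sort advice is a sorted vector queried with binary search. This is a sketch, and the SortedIds class is a hypothetical name. It fits data that is built once and queried many times; inserting into the middle later costs O(n).

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A "sorted vector as a set": all elements live in one contiguous block.
class SortedIds {
public:
    explicit SortedIds(std::vector<int> ids) : ids_(std::move(ids)) {
        std::sort(ids_.begin(), ids_.end());
        // Drop duplicates so the container behaves like a set.
        ids_.erase(std::unique(ids_.begin(), ids_.end()), ids_.end());
    }

    // O(log n) lookup over contiguous memory, no per-node pointers to chase.
    bool contains(int id) const {
        return std::binary_search(ids_.begin(), ids_.end(), id);
    }

    std::size_t size() const { return ids_.size(); }

private:
    std::vector<int> ids_;
};
```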
Avoid Function Call Overhead
Function call overhead was a surprising
cause of performance issues on Xbox
 The same is true on Xbox 360 and PS3
 Fortunately, there are lots of solutions
 Research compiler settings. On Xbox 360:
Inline “any suitable”
 Enable link-time code generation
Spend time ensuring the compiler is inlining
the right things
Avoid Virtual Functions
Weigh the limitations of virtual functions
Adds a branch instruction
 Branch is always mispredicted
 Compiler is limited in how it can optimize
Consider replacing
virtual void Draw() = 0;
with one non-virtual implementation per platform:
Xbox360.cpp: void Draw() { ... }
Windows.cpp: void Draw() { ... }
PS3.cpp: void Draw() { ... }
Maximize Leaf Functions
Leaf functions don’t call other functions, ever
 If a potential leaf function calls another
function, the high-level function:
Is much less likely to be inlined
 Must set up a stack frame
 Must set up registers
Potential solutions
Remove the inner function completely
 Inline the inner function
 Provide two versions of the outer function
Unroll Inner Loops
Compiler can’t unroll loops where n is variable
 Even unrolling from ++i to i+=4 can be a
significant gain
Eliminates three branch instructions
 Increases opportunity for code scheduling
Don’t forget to hoist invariants out, too
Example Unrolling
// original
for( i=a.beg(); i!=a.end(); ++i )
    process(i);
// unrolled (assumes the element count is a multiple of 4)
e = a.end();
for( i=a.beg(); i!=e; i+=4 ) {
    process(i); process(i+1);
    process(i+2); process(i+3);
}
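When the element count is not known to be a multiple of four, an unrolled loop needs a tail loop for the leftovers. The sketch below sums a float array this way; sumUnrolled is an illustrative name. The four separate accumulators change the order of the floating-point additions, so results can differ slightly from a simple loop.

```cpp
#include <cstddef>

// Sum an array with the loop unrolled by four. The main loop handles groups
// of four elements; the tail loop handles the 0 to 3 elements left over.
float sumUnrolled(const float* data, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    const std::size_t n4 = n - (n % 4);   // loop bound hoisted out of the loop
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        // Independent accumulators give the scheduler more freedom.
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) {                  // tail: remaining elements
        sum += data[i];
    }
    return sum;
}
```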
Pass Native Types by Value
Tradition says that “large” types are passed
by pointer or reference, but be careful
New consoles have really large registers
Native types include
64-bit int (__int64)
 VMX vector (__vector4) – 128 bits!
Pass structs by pointer or reference
One exception: pass structs consisting of bitfields
<= 64 bits by value
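A sketch of both cases follows, using a hypothetical RenderFlags struct made only of bitfields. Whether a given struct really travels in a register depends on the platform ABI, so treat this as an illustration rather than a guarantee.

```cpp
#include <cstdint>

// Hypothetical flags struct: bitfields only, 64 bits in total.
struct RenderFlags {
    std::uint64_t material : 16;
    std::uint64_t layer    : 8;
    std::uint64_t visible  : 1;
    std::uint64_t reserved : 39;
};
static_assert(sizeof(RenderFlags) <= 8, "should fit in one 64-bit register");

// Small bitfield struct passed by value instead of by reference.
bool shouldDraw(RenderFlags f, unsigned activeLayer) {
    return f.visible && f.layer == activeLayer;
}

// A native 64-bit integer is also passed by value, not by const reference.
// The multiplier is arbitrary and only keeps the function non-trivial.
std::uint64_t mixHash(std::uint64_t h, std::uint64_t k) {
    return (h ^ k) * 1099511628211ull;
}
```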
Know Data Type Performance
int32 and int64 have equivalent perf
 float and double have equivalent perf
 int8 and int16 are slower than int
They generate extra instructions
High bits cleared or sign-extended
Example: int32 adds 2X faster than int16 adds
Store as smallest type required
 Load into int32, int64 or double for calculations
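The store-small, compute-wide rule can be sketched as follows. The function name and the sample data are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Samples are stored as int16 to keep the working set small, but the
// arithmetic is done in a 32-bit accumulator, so each value is widened once
// on load and no narrowing happens inside the loop.
std::int32_t sumSamples(const std::int16_t* samples, std::size_t n) {
    std::int32_t total = 0;               // accumulate at native width
    for (std::size_t i = 0; i < n; ++i) {
        total += samples[i];
    }
    return total;
}
```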
Use Native Vector Types
In CS 101, you learned to create abstract
data types, such as matrices
typedef std::vector<float,4> vec;
typedef std::vector<vec,4> matrix;
This code is an abomination
 At least on Xbox 360 and PS3
 Xbox 360 and PS3 have dedicated vector
math units called VMX units
 Use them!
Your Math Buddies
__vector4 (4 32-bit floats; 128-bit register)
 XMVECTOR (typedef for vector4)
 XMMATRIX (array of 4 vector4s)
 XMVECTOR operators (+,-,*,/)
 Hundreds of XMVECTOR and XMMATRIX functions
 Xbox 360-specific, but similar constructs in
PS3 compilers
Avoid Floating-Point Branches
FP branches are slow
Cache has to be flushed
 ~10X slower than int branches
Avoid loops with float test
 Eliminate altogether if possible
Can be faster to calculate values
you won’t use!
Compare integers instead
 Replace with fsel when possible
10-20X performance gain
The fsel Option in Detail
Definition of hardware implementation:
float fsel(float a, float b, float c) {
    return ( a >= 0.0f ) ? b : c;
}
You can replace expressions like
v = ( w < x ) ? y : z; // slow
With faster expressions like
v = fsel( w - x, z, y ); // turbo
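The rewrite can be checked with a portable stand-in. The sketch below uses the convention of the PowerPC fsel instruction: return the second argument when the first is greater than or equal to zero, otherwise the third. selectGE0 and the other function names are hypothetical; on real hardware a compiler intrinsic would replace selectGE0, and this plain C++ version may still compile to a branch.

```cpp
// Portable stand-in for a floating-point select:
// returns b when a >= 0, otherwise c (c is also returned when a is NaN).
inline float selectGE0(float a, float b, float c) {
    return (a >= 0.0f) ? b : c;
}

// Branchy original: v = (w < x) ? y : z;
inline float pickBranchy(float w, float x, float y, float z) {
    return (w < x) ? y : z;
}

// Select form: for finite inputs, w - x is negative exactly when w < x,
// so y goes in the "negative" slot and z in the ">= 0" slot.
inline float pickSelect(float w, float x, float y, float z) {
    return selectGE0(w - x, z, y);
}

// Another common use: clamp negative values to zero without a compare-branch.
inline float clampToZero(float v) {
    return selectGE0(v, v, 0.0f);
}
```

The two pick functions can disagree for NaN or infinite inputs, so the rewrite is only safe where those cannot occur.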
Prefer Platform-Specific Funcs
The C runtime (CRT) is not usually the best
option when performance matters
 Xbox 360 examples
Prefer CreateFile to fopen or C++ streams
Options for asynchronous reads and other goodness
Prefer XMemCpy to memcpy
2-6X faster
Prefer XMemSet to memset
8-14X faster
Avoid Hidden C++ Inefficiencies
C++ rocks the house!
 C++ can bring your game to its knees!
 Consider these innocuous snippets
Quaternion q;
 s.push_back( k );
 if( (float)i > f )
 obj->Draw();
 GameObject arr[1000];
 a = b + c;
 i++;
C++ is Dangerous
With power comes responsibility
 Beware constructors
Is initialization the right thing to do?
Beware hidden allocations
 Conversion casts may have significant cost
 Use virtual functions with care
 Beware overloaded operators
 Stick to known idioms
Operator++ should be a constant-time operation.
 Really.
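As one example of a hidden allocation, push_back reallocates and copies the whole array each time capacity runs out. The sketch below reserves once up front and, for comparison, counts how often an unreserved vector has to grow. Both function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Reserve once, ideally outside the game loop, so push_back never reallocates.
std::vector<int> buildIds(std::size_t count) {
    std::vector<int> ids;
    ids.reserve(count);                        // single allocation up front
    for (std::size_t i = 0; i < count; ++i) {  // pre-increment by habit
        ids.push_back(static_cast<int>(i));
    }
    return ids;
}

// Counts how many times an unreserved vector changes capacity while growing
// to count elements. Each change is an allocation plus a copy of all elements.
std::size_t countGrowths(std::size_t count) {
    std::vector<int> v;
    std::size_t growths = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) {
            ++growths;
        }
    }
    return growths;
}
```

The exact growth count depends on the standard library's growth policy, but it is always more than the single allocation made by reserve.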
There absolutely are many things you can
do to efficiently program next-gen consoles
 Two key issues: L2/memory and in-order
Treat memory as you would a hard disk
 Watch out for those branches; use tricks like fsel
Prefer a light C++ touch
What’s Next
Our games are only as good as the weakest
member of the team
 Share what you’ve learned
 “The sharing of ideas allows us to stand on
one another’s shoulders instead of on one
another’s feet” – Jim Warren
 Fill out your feedback forms