What is Intel® Threading Building Blocks?

Session:
Game Threading Analysis &
Methodology
INTEL CONFIDENTIAL
Intel® Software College
Objectives
At the end of the module you will be able to:
• Describe two strategies to parallelize a game using two different
threading implementations
• Evaluate the effectiveness of each strategy with respect to how each
uses the underlying number of cores
This module will not teach you how to program with the Windows API,
with Threading Building Blocks, or with DirectX or Direct3D.
It is intended primarily to show a higher-level approach to threading
games and how to use tools to evaluate the effectiveness of the
threading strategy.
Copyright © 2008, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States
or other countries. * Other brands and names are the property of their respective owners.
Agenda

Introduction to Intel® Thread Profiler

Usual Game Structure

Parallelization with
Windows* and POSIX Threads

What is Intel® Threading Building Blocks?

Parallelization with
Intel® Threading Building Blocks
* Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.
Motivation for a Threading Tool
Developing efficient multithreaded applications is hard
New performance problems are caused by the interaction between
concurrent threads
• Load imbalance
• Contention on synchronization objects
• Threading overhead
Need a tool to help!
Intel® Thread Profiler
Plugs in to the VTune™ performance environment
• Instrumentation-based data collector in VTune
Identifies performance issues in OpenMP* or threaded
applications using the Win32* API, POSIX* threads, and
Intel® Threading Building Blocks
Pinpoints performance bottlenecks that directly affect
execution time
Binary instrumentation of applications
Different views and filters available to assist and organize
analysis
Thread Timeline
Horizontal bands represent
threads
Light green:
Threads are waiting
Dark green: Threads are
active (running or runnable)
Hatched light green:
Threads are busy waiting
Thread 1
Thread 2
Thread 3
Yellow Transition lines: Signals that
wake up other threads, such as
transferring a lock or sending a message
Concurrency Profile
Measures core utilization so users can see
how parallel their program really is
• Relative to the system executing the application
Idle: no active threads
Serial: a single thread
Under-subscribed: # threads > 1 && # threads < # cores
Fully-subscribed*: # threads == # cores
Oversubscribed: # threads > # cores
Concurrency level is the number of
threads that are active (not waiting,
sleeping, blocked, etc.) at a given time
* example reflects 4 core machine
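The categories above can be sketched as a small classification function. This is a minimal illustration of the definitions on this slide, not Thread Profiler code; the names `Concurrency` and `classify` are hypothetical.

```cpp
#include <cassert>

// Concurrency categories as listed on the slide.
enum class Concurrency {
    Idle, Serial, UnderSubscribed, FullySubscribed, OverSubscribed
};

// Classify one sample: `active` is the number of threads that are running
// or runnable, `cores` is the number of cores on the machine.
// Hypothetical helper for illustration only.
inline Concurrency classify(unsigned active, unsigned cores) {
    assert(cores > 0);
    if (active == 0) return Concurrency::Idle;
    if (active == 1) return Concurrency::Serial;
    if (active < cores) return Concurrency::UnderSubscribed;
    if (active == cores) return Concurrency::FullySubscribed;
    return Concurrency::OverSubscribed;
}
```

On a 4 core machine, 3 active threads classify as under-subscribed and 5 as oversubscribed. On a single-core machine this sketch reports one active thread as Serial, since that check comes first.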
Agenda

Introduction to Intel® Thread Profiler

Usual Game Structure

Parallelization with
Windows* and POSIX Threads

What is Intel® Threading Building Blocks?

Parallelization with
Intel® Threading Building Blocks

Curriculum Application & Summary
Usual Game Structure
Consists of a loop called the “Game Loop”
[Figure: game loop. Get Input → Simulate (Physics, AI, Particles) → Render]
Destroy the Castle (DTC) uses the usual game loop
http://softwarecommunity.intel.com/articles/eng/1363.htm
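A serial game loop of this shape might look like the sketch below. It is not the Destroy the Castle source; `GameState`, `runGameLoop`, and the stage functions are hypothetical names, and each stage only records that it ran.

```cpp
#include <string>
#include <vector>

// Hypothetical state: a frame counter plus a log of the stages that ran.
struct GameState {
    int frame = 0;
    std::vector<std::string> log;
};

void getInput(GameState& s)        { s.log.push_back("input"); }
void updatePhysics(GameState& s)   { s.log.push_back("physics"); }
void updateAI(GameState& s)        { s.log.push_back("ai"); }
void updateParticles(GameState& s) { s.log.push_back("particles"); }
void render(GameState& s)          { s.log.push_back("render"); }

// Simulate runs its three subsystems one after another.
void simulate(GameState& s) {
    updatePhysics(s);
    updateAI(s);
    updateParticles(s);
}

// The serial game loop: every stage of every frame runs on one thread.
void runGameLoop(GameState& s, int frames) {
    for (int i = 0; i < frames; ++i) {
        getInput(s);
        simulate(s);
        render(s);
        ++s.frame;
    }
}
```

Because every stage runs in sequence on the calling thread, the loop can use only one core regardless of how many are available.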
Lab Activity 1
Build Destroy the Castle
• Follow the steps for Lab Activity 1 to build and run Destroy
the Castle
Limitation of Serial Games for Multi-Core Systems
With clock rates reaching into the multiple GHz
range, further increases are becoming harder
Parallel hardware has gone mainstream for
desktop
To exploit the performance potential of
multi-core processors, applications
must be threaded
Serial games get
no benefit from multi-core
Agenda

Usual Game Structure

Introduction to Intel® Thread Profiler

Parallelization with
Windows* and POSIX Threads

What is Intel® Threading Building Blocks?

Parallelization with
Intel® Threading Building Blocks

Curriculum Application & Summary
Parallelization with Windows* Threads
Updating with double buffered data structures
Decoupling rendering from frame processing
Asynchronous update of parts of the scene
Task Decomposition
Render
Physics
AI
Particles
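The combination of task decomposition and double buffering can be sketched as follows. The sketch uses portable `std::thread` as a stand-in for Windows* or POSIX threads and starts new threads every frame for brevity, where a real game would reuse a thread pool. `Scene`, `DoubleBuffer`, and `runFrame` are hypothetical names, and the arithmetic is placeholder work.

```cpp
#include <thread>

// Hypothetical scene data: one field per subsystem.
struct Scene { int physics = 0; int ai = 0; int particles = 0; };

// Two copies of the scene: readers use the front copy while the
// simulate tasks write the back copy.
struct DoubleBuffer {
    Scene buffers[2];
    int front = 0;
    const Scene& readable() const { return buffers[front]; }
    Scene& writable() { return buffers[1 - front]; }
    void swap() { front = 1 - front; }
};

// Stand-in for rendering: reads the whole scene and returns a checksum.
int renderChecksum(const Scene& s) { return s.physics + s.ai + s.particles; }

// One frame: Render, Physics, AI, and Particles each run on their own thread.
// Render only reads the front buffer; each simulate task writes a distinct
// field of the back buffer, so no locks are needed inside the frame.
int runFrame(DoubleBuffer& db) {
    const Scene& in = db.readable();
    Scene& out = db.writable();
    int rendered = 0;
    std::thread render([&] { rendered = renderChecksum(in); });
    std::thread physics([&] { out.physics = in.physics + 1; });
    std::thread ai([&] { out.ai = in.ai + 2; });
    std::thread particles([&] { out.particles = in.particles + 3; });
    render.join();
    physics.join();
    ai.join();
    particles.join();
    db.swap();  // the freshly written scene becomes readable next frame
    return rendered;
}
```

Rendering always sees the scene produced by the previous frame, which is what allows it to overlap with the simulate tasks.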
Lab Activity 2
Use Thread Profiler to Analyze the Baseline and Task Decomposition
profiles
• Follow the steps for Lab Activity 2 to analyze the baseline and Task
Decomposition profiles you created in Activity 1
DTC Baseline Analysis
Thread Profiler shows serial code
dominating the execution
Render
Physics
AI
Particles
Serial code dominates baseline application
Performance Profile: Task Decomposition
Thread pool for
3 “simulate” threads
Issue: Load imbalance
[Figure: Thread Profiler timeline. Thread labels: Render, Thread Pool (×3). Task labels: Physics, AI, Particles]
Some parallelism but …
low utilization of 4 cores
Data Level Parallelism
Nested parallelism
• Top level - task decomposition
• Next level - data decomposition
[Figure: Render, Physics, AI, and Particles tasks; the AI task is split into several "update several AI units" pieces]
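The data decomposition level can be sketched by splitting the AI units into contiguous slices, one per thread. This is an illustration using `std::thread`, not the lab code; `AIUnit`, `updateUnit`, and `updateAI` are hypothetical names, and the per-unit work is a placeholder.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical AI unit with a single piece of state.
struct AIUnit { int energy = 0; };

// Placeholder for the per-unit AI update.
void updateUnit(AIUnit& u) { u.energy += 1; }

// Split the units into up to `numThreads` contiguous slices and update
// each slice on its own thread ("update several AI units" per thread).
void updateAI(std::vector<AIUnit>& units, unsigned numThreads) {
    if (numThreads == 0) numThreads = 1;
    const std::size_t n = units.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no units left for this thread
        workers.emplace_back([&units, begin, end] {
            for (std::size_t i = begin; i < end; ++i) updateUnit(units[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

The slices are fixed up front, so if some units cost more to update than others, the threads finish at different times. The thread count is also chosen by the caller, independently of what the rest of the game is running.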
Lab Activity 3
Use Thread Profiler to analyze the effect of adding data decomposition
in the AI thread
• Follow the steps for Lab Activity 3 to analyze the two- and four-thread
AI data decomposition profiles you created in Activity 1
• Analyze the profiles AIThreads2.tp and AIThreads4.tp
Performance Profile AI Decomp. with 2 Threads
Thread pool for 3 threads
Split AI for 2 threads
Load imbalance
[Figure: Thread Profiler timeline. Thread labels: Render, Thread Pool (×3), AI Pool (×2)]
Performance Profile AI Decomp. with 4 Threads
Thread pool for 3 threads
Split AI for 4 threads
Load imbalance
Oversubscription
[Figure: Thread Profiler timeline. Thread labels: Render, Thread Pool (×3), AI Pool (×4)]
Too many threads for number of cores
One More Problem: Nested Parallelism
Software components are built from smaller
components
If each turtle specifies threads...
Disadvantages of Using Windows* and POSIX Threads for Games
• Low-Level details (not intuitive)
– Hard to come up with good design
– Code often becomes very dependent on a particular OS’s threading
facilities
• Load imbalance
– Has to be managed manually
• Oversubscription
– Multiple components create threads that compete for CPU resources
– Hard to manage nested parallelism
Hard to achieve scalability
Agenda

Usual Game Structure

Introduction to Intel® Thread Profiler

Introduction to code instrumentation

Parallelization with
Windows* and POSIX Threads

What is Intel® Threading Building Blocks?

Parallelization with
Intel® Threading Building Blocks

Curriculum Application & Summary
What is Intel® Threading Building Blocks?
It is Open Source now!
• http://www.intel.com/software/products/tbb/
• http://threadingbuildingblocks.org/
• Port for Xbox 360
Threading Abstraction Library
• Relies on generic programming
• Provides high-level generic implementation of parallel design
patterns and concurrent data structures
You specify task patterns instead of threads
• Library maps your logical tasks onto physical threads,
efficiently using cache and balancing load
• Full support for nested parallelism
Targets threading for robust performance
• Designed to provide scalable performance for computationally
intense portions of shrink-wrapped applications
• Portable across Linux*, Mac OS*, and Windows*
Emphasizes scalable data parallel programming
• Solutions based on task decomposition usually do not scale
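The idea of specifying a pattern instead of threads can be illustrated with a recursive range-splitting sum. The sketch below uses only the C++ standard library and is not the TBB API; `parallelSum` and its `grain` parameter are hypothetical. Unlike a real task scheduler, this naive stand-in starts a new thread for each split, which is exactly the cost a library such as TBB is designed to avoid.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a library-provided reduction pattern.
// The caller describes the work as a range plus a grain size and never
// creates or joins a thread directly.
long long parallelSum(const std::vector<int>& v,
                      std::size_t begin, std::size_t end,
                      std::size_t grain) {
    if (grain == 0) grain = 1;
    if (end - begin <= grain) {
        // Small enough: sum this piece serially.
        long long sum = 0;
        for (std::size_t i = begin; i < end; ++i) sum += v[i];
        return sum;
    }
    // Otherwise split the range in half and handle the halves concurrently.
    const std::size_t mid = begin + (end - begin) / 2;
    auto left = std::async(std::launch::async, [&v, begin, mid, grain] {
        return parallelSum(v, begin, mid, grain);
    });
    const long long right = parallelSum(v, mid, end, grain);
    return left.get() + right;
}
```

A call such as `parallelSum(values, 0, values.size(), 100)` expresses what to compute and how finely to split it; how the pieces map onto threads stays inside the helper.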
Components of Intel® Threading Building Blocks
• Parallel algorithms
• Concurrent containers
• Synchronization primitives
• Memory allocation
• Task scheduler
Problem: Intel® TBB Approach
• Low-level details: Operate with task patterns instead of threads
• Load imbalance: Work-stealing balances load
• Oversubscription: One scheduled thread per hardware thread
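Two of these ideas, one worker per hardware thread and dynamic load balancing, can be sketched with the standard library alone. This is not the TBB API and not its scheduler: TBB uses work stealing, while this simplified stand-in has all workers claim chunks from one shared atomic counter. The name `parallelFor` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: applies body(i) for every i in [0, n).
// Workers repeatedly claim the next chunk of `grain` indices, so a worker
// that finishes early takes more chunks instead of sitting idle.
void parallelFor(std::size_t n, std::size_t grain,
                 const std::function<void(std::size_t)>& body) {
    if (grain == 0) grain = 1;
    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;  // the value may be 0 when unknown
    std::atomic<std::size_t> next{0};
    auto run = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(grain);
            if (begin >= n) return;  // nothing left to claim
            const std::size_t end = std::min(n, begin + grain);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
    };
    // The calling thread is one of the workers, so the total number of
    // running threads matches the number of hardware threads.
    std::vector<std::thread> pool;
    for (unsigned t = 1; t < workers; ++t) pool.emplace_back(run);
    run();
    for (auto& th : pool) th.join();
}
```

The caller supplies only the iteration count, a grain size, and the loop body. The thread count comes from the hardware, which avoids oversubscription as long as only one such loop runs at a time; handling nested loops well is where a real task scheduler is needed.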
Lab Activity 4
Build the TBB version of Destroy the Castle, collect profile data and
then analyze the data to compare this parallel strategy to the
previous one.
Agenda

Usual Game Structure

Introduction to Intel® Thread Profiler

Introduction to code instrumentation

Parallelization with
Windows* and POSIX Threads

What is Intel® Threading Building Blocks?

Parallelization with
Intel® Threading Building Blocks

Curriculum Application & Summary
Parallelization with TBB
Scheme of parallelization with Windows* and POSIX threads
[Figure: Render, Physics, AI, and Particles; only AI is split into several "update several AI units" pieces]
Scheme of parallelization with Intel® TBB
[Figure: Render alongside many small interleaved tasks: "update several blocks", "update several AI units", "update several particles"]
Task Graph
[Figure: task graph with nodes MainTask, SyncTask, PhysicsTask, AITask, AIFinalTask, ParticlesTask, and several AIBodyTask nodes (not all expanded). Legend: task creation order; task completion signals]
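One way to read this graph is sketched below with `std::async`. The structure is an assumption inferred from the node names, not taken from the lab source: MainTask creates PhysicsTask, AITask, and ParticlesTask; AITask fans out into AIBodyTask children; AIFinalTask runs once the children complete; SyncTask runs once all three subsystems complete. The function names, `FrameResult`, and the placeholder work are hypothetical, and this is not the TBB task API.

```cpp
#include <future>
#include <vector>

// Hypothetical record of what each task did during one frame.
struct FrameResult {
    int physics = 0;
    int particles = 0;
    int aiBodies = 0;
    bool aiFinalized = false;
    bool synced = false;
};

// AIBodyTask: placeholder work for one slice of AI units.
int aiBodyTask(int /*slice*/) { return 1; }

// AITask: create the AIBodyTask children, then run the AIFinalTask step
// only after every child has signalled completion.
void aiTask(FrameResult& r, int bodies) {
    std::vector<std::future<int>> children;
    for (int i = 0; i < bodies; ++i)
        children.push_back(std::async(std::launch::async, aiBodyTask, i));
    for (auto& c : children) r.aiBodies += c.get();
    r.aiFinalized = true;  // AIFinalTask
}

// MainTask: create the three subsystem tasks, then run the SyncTask step
// once all of them have completed. Each task writes distinct fields of r.
FrameResult mainTask(int bodies) {
    FrameResult r;
    auto physics = std::async(std::launch::async, [&r] { r.physics = 1; });
    auto ai = std::async(std::launch::async, [&r, bodies] { aiTask(r, bodies); });
    auto particles = std::async(std::launch::async, [&r] { r.particles = 1; });
    physics.get();
    ai.get();
    particles.get();
    r.synced = true;  // SyncTask
    return r;
}
```

Here waiting on the futures plays the role of the task completion signals, and the order of the `std::async` calls plays the role of the task creation order.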
Performance Profile
Intel® TBB task pool for 3 threads
Automatic load balancing with
work-stealing
Benchmark: 8.66 sec
Measured on 4 core test machine
Render
Good utilization of 4 cores
Limitations of Using
Intel® Threading Building Blocks for Games
Intel® TBB is not intended for
– I/O bound processing
– Hard real-time processing
– Excessive usage of explicit synchronization
However, it is compatible with other threading packages
– It can be used in concert with Windows* and POSIX threads, etc.
Advantages of Using
Intel® Threading Building Blocks for Games
Generic Parallel Algorithms
– You specify task patterns instead of threads
– Cross-Platform implementation
Load balancing
– Adaptive tuning to variable computation
– Full support for nested parallelism
Efficient use of resources
– One scheduled thread per hardware thread
– Effective cache reuse
Easy to achieve scalability