Session: Game Threading Analysis & Methodology
INTEL CONFIDENTIAL
Intel® Software College

Objectives
At the end of the module you will be able to:
• Describe two strategies to parallelize a game using two different threading implementations
• Evaluate the effectiveness of each strategy with respect to how well each uses the available cores
This module will not teach you how to program with the Windows API, with Intel® Threading Building Blocks, or with DirectX or Direct3D. It is intended primarily to show a higher-level method of attack for games and how to use tools to evaluate the effectiveness of a threading strategy.

Copyright © 2008, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Agenda
• Introduction to Intel® Thread Profiler
• Usual Game Structure
• Parallelization with Windows* and POSIX Threads
• What is Intel® Threading Building Blocks?
• Parallelization with Intel® Threading Building Blocks

Motivation for a Threading Tool
Developing efficient multithreaded applications is hard
New performance problems are caused by the interaction between concurrent threads
• Load imbalance
• Contention on synchronization objects
• Threading overhead
Need a tool to help!

Intel® Thread Profiler
Plugs in to the VTune™ performance environment
• Instrumentation-based data collector in VTune
Identifies performance issues in OpenMP* applications and in threaded applications that use the Win32* API, POSIX* threads, or Intel® Threading Building Blocks
Pinpoints performance bottlenecks that directly affect execution time
Binary instrumentation of applications
Different views and filters are available to assist and organize analysis

Thread Timeline
Horizontal bands represent threads (Thread 1, Thread 2, and Thread 3 in the figure)
• Light green: threads are waiting
• Dark green: threads are active (running or runnable)
• Hatched light green: threads are busy waiting
• Yellow transition lines: signals that wake up other threads, such as transferring a lock or sending a message

Concurrency Profile
Measures core utilization so users can see how parallel their program really is
• Relative to the system executing the application
Concurrency level is the number of threads that are active (not waiting, sleeping, blocked, etc.) at a given time
• Idle: no active threads
• Serial: a single thread
• Under-subscribed: # threads > 1 && # threads < # cores
• Fully-subscribed*: # threads == # cores
• Oversubscribed: # threads > # cores
* The example reflects a 4-core machine

Usual Game Structure
Consists of a loop called the “Game Loop”
• Stages: Get Input, Simulate, Render
• Expanded stages: Get Input, Physics, AI, Particles, Render
Destroy the Castle (DTC) uses the usual game loop
http://softwarecommunity.intel.com/articles/eng/1363.htm
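
The game loop described above can be sketched in a few lines of standard C++. This is only an illustration of the stage order, not code from Destroy the Castle: `GameState`, `runGameLoop`, and the stage functions are hypothetical names, and each stage merely records that it ran.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the stages named on the slide. A real game
// would poll devices, step the simulation, and issue draw calls here.
struct GameState {
    int frame = 0;
    std::vector<std::string> log;  // records the order in which stages ran
};

void getInput(GameState& s)        { s.log.push_back("input"); }
void updatePhysics(GameState& s)   { s.log.push_back("physics"); }
void updateAI(GameState& s)        { s.log.push_back("ai"); }
void updateParticles(GameState& s) { s.log.push_back("particles"); }
void render(GameState& s)          { s.log.push_back("render"); }

// "Simulate" expands into physics, AI, and particles, run back to back.
void simulate(GameState& s) {
    updatePhysics(s);
    updateAI(s);
    updateParticles(s);
}

// Serial game loop: every stage of every frame runs on the calling thread.
void runGameLoop(GameState& s, int frames) {
    assert(frames >= 0);
    for (int i = 0; i < frames; ++i) {
        getInput(s);
        simulate(s);
        render(s);
        ++s.frame;
    }
}
```

Calling `runGameLoop(state, 2)` on a default-constructed `GameState` runs ten stage calls in a fixed order on one thread, which is the serial structure the later slides set out to parallelize.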

Lab Activity 1
Build Destroy the Castle
• Follow the steps for Lab Activity 1 to build and run Destroy the Castle

Limitation of Serial Games for Multi-Core Systems
With clock rates reaching into the multiple-GHz range, further increases are becoming harder
Parallel hardware has gone mainstream on the desktop
To exploit the performance potential of multi-core processors, applications must be threaded
Serial games get no benefit from multi-core

Parallelization with Windows* Threads
Updating with double-buffered data structures
Decoupling rendering from frame processing
Asynchronous update of parts of the scene
Task decomposition: Render, Physics, AI, Particles

Lab Activity 2
Use Thread Profiler to analyze the Baseline and Task Decomposition profiles
• Follow the steps for Lab Activity 2 to analyze the baseline and Task Decomposition profiles you created in Activity 1

DTC Baseline Analysis
Thread Profiler shows serial code dominating the execution of the baseline application
Figure labels: Render, Physics, AI, Particles

Performance Profile: Task Decomposition
Thread pool for 3 “simulate” threads
Issue: load imbalance
Figure labels: Render; Thread Pool (x3); Physics, AI, Particles
Some parallelism, but low utilization of 4 cores
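
A minimal sketch of the task decomposition above, under several assumptions: it uses standard `std::thread` rather than the Windows* threads API, it creates and joins three threads per frame instead of keeping a thread pool, and `busyWork`, `FrameResult`, and `simulateFrame` are hypothetical names. Each subsystem is modeled as a loop whose cost is set by an iteration count.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical workload: the cost of a task is proportional to `iterations`.
std::uint64_t busyWork(std::uint64_t iterations) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) sum += i;
    return sum;
}

struct FrameResult {
    std::uint64_t physics = 0;
    std::uint64_t ai = 0;
    std::uint64_t particles = 0;
};

// One "simulate" stage: Physics, AI, and Particles each get their own thread.
// Each thread writes a different member of `r`, so no locking is needed.
FrameResult simulateFrame(std::uint64_t physicsCost,
                          std::uint64_t aiCost,
                          std::uint64_t particlesCost) {
    FrameResult r;
    std::thread physics([&] { r.physics = busyWork(physicsCost); });
    std::thread ai([&] { r.ai = busyWork(aiCost); });
    std::thread particles([&] { r.particles = busyWork(particlesCost); });

    // The frame cannot proceed to Render until all three tasks finish, so
    // the slowest task sets the frame time while the other threads wait.
    physics.join();
    ai.join();
    particles.join();
    return r;
}
```

With unequal costs, for example `simulateFrame(10, 1000, 10)`, two of the three threads finish almost immediately and sit idle at the join. That is the load imbalance the profile shows: three tasks exist, but most of the time only one is active.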

Data Level Parallelism
Nested parallelism
• Top level - task decomposition
• Next level - data decomposition
Figure labels: Render; Physics; AI, split into several “update several AI units” blocks; Particles

Lab Activity 3
Use Thread Profiler to analyze the effect of adding data decomposition in the AI thread
• Follow the steps for Lab Activity 3 to analyze the two- and four-thread AI data decomposition profiles you created in Activity 1
• Analyze the profiles AIThreads2.tp and AIThreads4.tp

Performance Profile: AI Decomposition with 2 Threads
Thread pool for 3 threads
Split AI for 2 threads
Load imbalance
Figure labels: Render; Thread Pool (x3); AI Pool (x2)

Performance Profile: AI Decomposition with 4 Threads
Thread pool for 3 threads
Split AI for 4 threads
Load imbalance
Oversubscription
Figure labels: Render; Thread Pool (x3); AI Pool (x4)
Too many threads for the number of cores

One More Problem: Nested Parallelism
Software components are built from smaller components
If each turtle specifies threads...

Disadvantages of Using Windows* and POSIX Threads for Games
• Low-level details (not intuitive)
– Hard to come up with a good design
– Code often becomes very dependent on a particular OS’s threading facilities
• Load imbalance
– Has to be managed manually
• Oversubscription
– Multiple components create threads that compete for CPU resources
– Hard to manage nested parallelism
Hard to achieve scalability

What is Intel® Threading Building Blocks?
It is open source now!
• http://www.intel.com/software/products/tbb/
• http://threadingbuildingblocks.org/
• Port for Xbox 360
Threading abstraction library
• Relies on generic programming
• Provides high-level generic implementations of parallel design patterns and concurrent data structures
You specify task patterns instead of threads
• Library maps your logical tasks onto physical threads, efficiently using cache and balancing load
• Full support for nested parallelism
Targets threading for robust performance
• Designed to provide scalable performance for computationally intense portions of shrink-wrapped applications
• Portable across Linux*, Mac OS*, and Windows*
Emphasizes scalable data parallel programming
• Solutions based on task decomposition usually do not scale
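
The idea of specifying work rather than threads can be illustrated without the library itself. The sketch below is not the Intel® TBB API and does not implement work stealing. It is a simpler stand-in, written in standard C++, that sizes its worker count to the hardware and lets workers claim chunks of a range from a shared counter. `parallelUpdate` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Applies update(i) to every index in [0, count).
// Assumption: update(i) touches only data belonging to index i.
template <typename Update>
void parallelUpdate(std::size_t count, std::size_t chunk, Update update) {
    assert(chunk > 0);

    // One worker per hardware thread avoids oversubscription.
    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned hw = std::thread::hardware_concurrency();
    std::size_t workers = (hw == 0) ? 1 : hw;

    std::atomic<std::size_t> next{0};  // start of the next unclaimed chunk

    // Each worker repeatedly claims a chunk until the range is exhausted,
    // so a worker that finishes early simply takes more chunks.
    auto worker = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk);
            if (begin >= count) return;
            std::size_t end = std::min(begin + chunk, count);
            for (std::size_t i = begin; i < end; ++i) update(i);
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t w = 1; w < workers; ++w) pool.emplace_back(worker);
    worker();  // the calling thread participates as one of the workers
    for (auto& t : pool) t.join();
}
```

A call such as `parallelUpdate(units.size(), 64, [&](std::size_t i) { units[i] += 1; })` names the work (update every unit) and a grain size, and leaves the mapping onto threads to the helper. Smaller chunks balance uneven work better at the cost of more contention on the shared counter.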

Components of Intel® Threading Building Blocks
• Parallel algorithms
• Concurrent containers
• Synchronization primitives
• Memory allocation
• Task scheduler

Problem              Intel® TBB Approach
Low-level details    Operate with task patterns instead of threads
Load imbalance       Work-stealing balances load
Oversubscription     One scheduled thread per hardware thread

Lab Activity 4
Build the TBB version of Destroy the Castle, collect profile data, and then analyze the data to compare this parallel strategy to the previous one.

Parallelization with TBB
Scheme of parallelization with Windows* and POSIX threads
Figure labels: Render; Physics; AI, split into several “update several AI units” blocks; Particles
Scheme of parallelization with Intel® TBB
Figure labels: Render; many small interleaved tasks labeled “update several blocks”, “update several AI units”, and “update several particles”

Task Graph
Figure labels: MainTask, SyncTask, PhysicsTask, AITask, AIFinalTask, ParticlesTask, AIBodyTask ... AIBodyTask, Not expanded
Legend: task creation order; task completion signals

Performance Profile
Intel® TBB task pool for 3 threads
Automatic load balancing with work-stealing
Benchmark: 8.66 sec, measured on a 4-core test machine
Figure label: Render
Good utilization of 4 cores
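
Judging whether a profile shows good utilization comes down to the concurrency categories defined on the Concurrency Profile slide. The following sketch expresses those definitions as a function. `Concurrency` and `classify` are hypothetical names, not part of Thread Profiler, and the choice to report a single active thread as Serial even on a one-core machine is an assumption, since the slide's definitions overlap in that case.

```cpp
#include <cassert>

enum class Concurrency {
    Idle,
    Serial,
    UnderSubscribed,
    FullySubscribed,
    OverSubscribed
};

// Classifies a concurrency level (the number of active threads at one
// moment) relative to the number of cores on the machine.
Concurrency classify(unsigned activeThreads, unsigned cores) {
    assert(cores > 0);
    if (activeThreads == 0) return Concurrency::Idle;
    // Assumption: one active thread is reported as Serial even when
    // cores == 1, where it would also satisfy "# threads == # cores".
    if (activeThreads == 1) return Concurrency::Serial;
    if (activeThreads < cores) return Concurrency::UnderSubscribed;
    if (activeThreads == cores) return Concurrency::FullySubscribed;
    return Concurrency::OverSubscribed;
}
```

On the 4-core machine used in the deck, three active pool threads classify as under-subscribed, while the 3 pool threads plus 4 AI threads from the earlier profile give `classify(7, 4)`, which is oversubscribed.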

Limitations of Using Intel® Threading Building Blocks for Games
Intel® TBB is not intended for
– I/O-bound processing
– Hard real-time processing
– Excessive use of explicit synchronization
However, it is compatible with other threading packages
– It can be used in concert with Windows* and POSIX threads, etc.

Advantages of Using Intel® Threading Building Blocks for Games
Generic parallel algorithms
– You specify task patterns instead of threads
– Cross-platform implementation
Load balancing
– Adaptive tuning to variable computation
– Full support for nested parallelism
Efficient use of resources
– One scheduled thread per hardware thread
– Effective cache reuse
Easy to achieve scalability