DATA STRUCTURES OPTIMISATION FOR MANY-CORE SYSTEMS
Matthew Freeman | Supervisor: Maciej Golebiewski
CSIRO Vacation Scholar Program 2013-14

The Multi-core Age
• Mobile Phone: 2-4 Cores
• PC: 4-16 Cores
• Intel Xeon Phi: 61 Cores
• CSIRO ‘Bragg’ Compute Cluster: 2048 Cores

2 | Presentation title | Presenter name

Programming for multi-cores
[Diagram: Problem, then Machine Instructions, then Execution. Divide the problem across CPU Core 1, CPU Core 2, CPU Core 3 and CPU Core 4.]

Amdahl's Law
• The maximum speedup is dependent on % of the problem you can run in parallel

[Chart: Maximum Speedup (axis 0 to 25) by parallel percentage]
95%: 20x speedup
90%: 10x speedup
75%: 4x speedup
50%: 2x speedup
Single Core Processor: 1x Speed

Data structures:
• Memory (data) is still a shared resource.
[Diagram: a Single Core Computer with one CPU core attached to Memory (data), beside a 4-Core Computer with four CPU cores attached to one Memory (data).]

Linked-list (Stack) Data Structure
• A “node” that holds data.
• A link to the next data point
[Diagram: TOP -> Data -> EMPTY]

Add new item (Push)
We want to add a chunk of data (Data B) to the structure
[Diagram: Data B (unlinked); TOP -> Data A -> EMPTY]

Steps: For new data B
1) Find the start of the structure (TOP)
2) Link into the structure.
3) Update TOP.
[Diagram: TOP (new) -> Data B -> Data A -> NULL]

Resulting structure
• Like stacking dinner plates
• Only need to keep track of where TOP is to access the rest.
[Diagram: TOP -> Data -> Data -> Data -> Data -> Data -> NULL]

What happens in multi-core systems?
Two threads trying to operate on the stack structure:
Thread 1 attempts at time T.
Thread 2 attempts at time T + 1 nanosecond.
Because each of the steps takes time to complete, errors occur.

What happens in multi-core systems?
This causes the interleaving of steps:
Thread 1 reads TOP (1)
Thread 2 reads TOP (1)
Thread 1 sets the next pointer (2)
Thread 2 sets the next pointer (2)
Thread 1 updates TOP (3)
Thread 2 updates TOP (3)

Data B is lost forever because it is no longer linked to TOP (stack failure).
[Diagram: Data B (Thread 1) and Data C (Thread 2) both link to Data A, which links to EMPTY; TOP points to Data C.]

How do we fix this?
• Use “data locks”.
• Protect the 3 steps.
• One thread at a time is granted access to the stack.
• Complete an operation and release the lock.
This is the standard approach for multithreaded structures.

Locks
Easy to use. Two lines of code added to fix.
- Get Lock
- Steps 1, 2, 3.
- Release Lock.
× Slow. Only one thread at a time can hold the lock.
The locked section becomes sequential code: this is the code that cannot run in parallel.
Analogy: Merging highway traffic into a single lane.

Lock-free
New method
• Lock-free data structure.
• A special low-level instruction lets the three steps take effect as one computer instruction.
• Removes the need for locks.
• Called a Compare-Exchange.

Lock-free
• Downside: Writing lock-free code is difficult (hence the project).
• The Compare-Exchange operation forms the base for writing lock-free code.
• The project implements specifications taken from research papers.

Lock-free
• Implemented a range of lock-free optimizations for the stack.
• Open coding standards (C++, OpenMP).
• Benchmarked on a 61-core Intel Xeon Phi processor.
• The lock-free structure performed about 2x better for pure stack operations.

Summary
Amdahl’s Law shows that it’s important to optimize sequential sections of code.
The shared data structures are often sequential bottlenecks.
Implementing lock-free data structures reduced this bottleneck.
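
Example (sketch, not the project's code)
The three push steps and the Compare-Exchange described in the Lock-free slides can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions: the names Node, LockFreeStack and run_push_demo are hypothetical, it uses std::atomic and std::thread from standard C++ rather than OpenMP, and it covers push only. A safe lock-free pop also has to deal with memory reclamation and the ABA problem, which is part of why the slides call lock-free code difficult.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical node type: holds data plus a link to the next node.
struct Node {
    int value;
    Node* next;
};

// Hypothetical push-only stack; illustrative, not the benchmarked code.
class LockFreeStack {
public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    // Assumes no other thread is using the stack during destruction.
    ~LockFreeStack() {
        Node* n = top_.load();
        while (n != nullptr) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    void push(int value) {
        // Steps 1 and 2: read TOP and link the new node in front of it.
        Node* node = new Node{value, top_.load(std::memory_order_relaxed)};
        // Step 3: update TOP only if it still equals the value we read.
        // On failure, compare_exchange_weak stores the current TOP into
        // node->next, so the link is refreshed and the loop retries.
        while (!top_.compare_exchange_weak(node->next, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        }
    }

    // Counts nodes; only meaningful once all pushing threads have finished.
    std::size_t size() const {
        std::size_t count = 0;
        for (Node* n = top_.load(std::memory_order_acquire); n != nullptr;
             n = n->next) {
            ++count;
        }
        return count;
    }

private:
    std::atomic<Node*> top_{nullptr};
};

// Pushes from several threads at once and returns the final node count.
std::size_t run_push_demo(int threads, int pushes_per_thread) {
    LockFreeStack stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&stack, pushes_per_thread] {
            for (int i = 0; i < pushes_per_thread; ++i) {
                stack.push(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return stack.size();
}
```

Calling run_push_demo(4, 1000) should report 4000 nodes: a push that loses the race on TOP retries instead of overwriting the other thread's update, so no node is dropped the way Data B was in the interleaving example. On most toolchains the program needs to be built with thread support (for example -pthread with GCC or Clang).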