Introduction to Concurrent Programming
Software & Hardware Basics

Slides by Ofer Givoli
Based on: The Art of Multiprocessor Programming, Maurice Herlihy and Nir Shavit, 2008
• Appendix A – Software Basics
• Appendix B – Hardware Basics


Software Basics

Threads in Java
• A thread executes a single, sequential program
• Subclass of: java.lang.Thread

…
Taken from: The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)

Monitors
• lock + waiting set
• every object is a monitor
• Critical section: using the synchronized keyword
• Waiting: using the wait() method
• Waking up waiting threads, using the methods:
  • notify()
  • notifyAll()

    import java.util.Stack;

    // Starting point: a wrapper with no synchronization of its own.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        public void push(T obj) { innerStack.push(obj); }
        public T pop() { return innerStack.pop(); }
    }

    ConcurrentStack<Integer> s = ...
    s.push(1);
    ... = s.pop();
    ... = s.pop();

    // Critical sections guarded by a dedicated monitor object.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        private Object monitor = new Object();
        public void push(T obj) {
            synchronized (monitor) { innerStack.push(obj); }
        }
        public T pop() {
            synchronized (monitor) { return innerStack.pop(); }
        }
    }

Two threads call pop() at once; the second is BLOCKED until the first leaves the critical section:

    ... = s.pop();
    ... = s.pop();    BLOCKED

    // Same idea, using the stack object itself as the monitor.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        public void push(T obj) {
            synchronized (this) { innerStack.push(obj); }
        }
        public T pop() {
            synchronized (this) { return innerStack.pop(); }
        }
    }

    // Equivalent shorthand: synchronized methods lock on this.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        public synchronized void push(T obj) { innerStack.push(obj); }
        public synchronized T pop() { return innerStack.pop(); }
    }

New feature: pop() waits while the stack is empty

    // Busy-waits inside the synchronized method, so the lock is never released.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        public synchronized void push(T obj) { innerStack.push(obj); }
        public synchronized T pop() {
            while (innerStack.empty()) {}
            return innerStack.pop();
        }
    }

Problem? deadlock

    ... = s.pop();    spins in the loop while holding the lock
    s.push(1);        BLOCKED

    // wait() releases the lock while waiting; it can throw InterruptedException.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        public synchronized void push(T obj) {
            if (innerStack.empty()) notifyAll();
            innerStack.push(obj);
        }
        public synchronized T pop() throws InterruptedException {
            while (innerStack.empty()) { wait(); }
            return innerStack.pop();
        }
    }

    ... = s.pop();
    s.push(1);
    Thread states shown: BLOCKED, WAITING, BLOCKED

    // Same code with notify() instead of notifyAll(): only one waiter is woken.
    public class ConcurrentStack<T> {
        private Stack<T> innerStack = new Stack<T>();
        public synchronized void push(T obj) {
            if (innerStack.empty()) notify();
            innerStack.push(obj);
        }
        public synchronized T pop() throws InterruptedException {
            while (innerStack.empty()) { wait(); }
            return innerStack.pop();
        }
    }

lost wakeup

    ... = s.pop();    WAITING
    ... = s.pop();    WAITING
    s.push(1);
    s.push(2);

    Thread.sleep(t);
    Thread.yield();

Thread-Local Objects

    class ThreadLocalID extends ThreadLocal<Integer> {
        protected Integer initialValue() {
            return …;
        }
    }

    ThreadLocalID id = …;

    // Each thread sets and reads its own copy of the value.
    id.set(…);          id.set(…);
    …                   …
    … = id.get();       … = id.get();

• Synchronization in C#
• Pthreads


Hardware Basics

Taken from: https://software.intel.com/en-us/articles/optimizing-applications-for-numa

Memory hierarchy: CPU -> L1 Cache -> L2 Cache -> L3 Cache -> Memory (DRAM)

              L1 Cache        Memory (DRAM)
    Speed:    Fastest         Slowest
    Size:     Smallest        Biggest
    Cost:     Highest         Lowest
    Power:    Highest         Lowest

Taken from: Computer Structure 2014 slides, by Lihu Rappoport and Adi Yoaz (modified)

    Processor 1        Processor 2
    L1 cache           L1 cache
         L2 cache (shared)
              Memory

Taken from: Computer Structure 2014 slides, by Lihu Rappoport and Adi Yoaz

• SMP (symmetric multiprocessing): not scalable
• NUMA (non-uniform memory access)
Taken from: The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)

Cache Coherence
Cache-line states:
• Modified
• Exclusive
• Shared
• Invalid
false sharing
Taken from: The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)

Spinning
• SMP
• NUMA
Taken from: The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)

Multi-Core and Multi-Threaded Architectures
Taken from: The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit, 2008 (modified)
• Instructions are executed out of order, in parallel, or speculatively
• write buffer
• reordering of reads and writes by the compiler
• memory barrier instruction (expensive)
• reordering of reads and writes in Java
• volatile variables in Java

Hardware Synchronization Instructions
• compare-and-swap/set (CAS)
• load-linked & store-conditional (LL/SC)

Thanks!
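
Supplementary sketch for the Hardware Synchronization Instructions slide: one way compare-and-swap is typically used from Java is a retry loop over java.util.concurrent.atomic.AtomicInteger. The class name CasCounter and its methods are hypothetical, introduced only for this illustration; they are not taken from the slides or the book.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: a lock-free counter built on compareAndSet.
public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Read the current value, compute the next one, and publish it only if
    // no other thread changed the value in between; otherwise retry.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter.get()); // prints 20000
    }
}
```

No thread ever blocks here: a failed compareAndSet only means another thread made progress first, and the loser retries. In practice AtomicInteger.incrementAndGet() already provides this operation; the explicit loop is shown only to make the CAS step visible.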