ECE 1747H: Parallel Programming
Lecture 1-part2: More on parallelism and dependences -- synchronization

Example 1

f() { a = 1; b = 2; c = 3; }
g() { d = 4; e = 5; f = 6; }
main() { f(); g(); }

• No dependences between f and g.
• Thus, f and g can be run in parallel.

Synchronization

• All programming models give the user the ability to control the ordering of events on different processors.
• This facility is called synchronization.

Example 2

f() { a = 1; b = 2; c = 3; }
g() { a = 4; b = 5; c = 6; }
main() { f(); g(); }

• Dependences between f and g.
• Thus, f and g cannot be run in parallel.

Example 2 (continued)

f() { a = 1; b = 2; c = 3; }
g() { a = 4; b = 5; c = 6; }
main() { f(); g(); }

• Dependences are between assignments to a, assignments to b, assignments to c.
• No other dependences.
• Therefore, we only need to enforce these dependences.

Synchronization Facility

• Suppose we had a set of primitives, signal(x) and wait(x).
• wait(x) blocks unless a signal(x) has occurred.
• signal(x) does not block, but causes a wait(x) to unblock, or causes a future wait(x) not to block.

Example 2 (continued)

f() { a = 1; signal(e_a); b = 2; signal(e_b); c = 3; signal(e_c); }
g() { wait(e_a); a = 4; wait(e_b); b = 5; wait(e_c); c = 6; }
main() { f(); g(); }

Example 2 (continued)

a = 1;                wait(e_a);
signal(e_a);          a = 4;
b = 2;                wait(e_b);
signal(e_b);          b = 5;
c = 3;                wait(e_c);
signal(e_c);          c = 6;

• Execution is (mostly) parallel and correct.
• Dependences are “covered” by synchronization.

About synchronization

• Synchronization is necessary to make some programs execute correctly in parallel.
• However, synchronization is expensive.
• Therefore, synchronization needs to be reduced, or sometimes parallelism has to be given up.
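The signal(x) and wait(x) primitives used in Example 2 are abstract. One possible way to realize them, assuming POSIX threads, is a flag protected by a mutex and a condition variable. The names event_t, event_init, event_signal, event_wait and run_example2 below are hypothetical and chosen for this sketch only (the C library already defines signal and wait with different meanings), so this is an illustration under those assumptions, not the course's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event type: a sticky flag guarded by a mutex and a
 * condition variable. The slides only name signal(x) and wait(x). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} event_t;

void event_init(event_t *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

/* signal(x): never blocks; wakes current waiters and makes any
 * future wait return immediately. */
void event_signal(event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* wait(x): blocks unless a signal has already occurred. */
void event_wait(event_t *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static int a, b, c;
static event_t e_a, e_b, e_c;

/* f from Example 2: signal after each assignment. */
static void *f(void *arg) {
    (void)arg;
    a = 1; event_signal(&e_a);
    b = 2; event_signal(&e_b);
    c = 3; event_signal(&e_c);
    return NULL;
}

/* g from Example 2: wait before each conflicting assignment. */
static void *g(void *arg) {
    (void)arg;
    event_wait(&e_a); a = 4;
    event_wait(&e_b); b = 5;
    event_wait(&e_c); c = 6;
    return NULL;
}

/* Runs f and g on two threads and copies the final a, b, c into out. */
void run_example2(int out[3]) {
    pthread_t tf, tg;
    int rc;
    event_init(&e_a); event_init(&e_b); event_init(&e_c);
    rc = pthread_create(&tg, NULL, g, NULL); assert(rc == 0);
    rc = pthread_create(&tf, NULL, f, NULL); assert(rc == 0);
    pthread_join(tf, NULL);
    pthread_join(tg, NULL);
    out[0] = a; out[1] = b; out[2] = c;
}
```

Because each assignment in g waits for the matching signal from f, the final values are those written by g (4, 5, 6) regardless of how the two threads are scheduled. Compile with -pthread on most systems.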
Example 3

f() { a=1; b=2; c=3; }
g() { d=4; e=5; a=6; }
main() { f(); g(); }

f() { a=1; signal(e_a); b=2; c=3; }
g() { d=4; e=5; wait(e_a); a=6; }
main() { f(); g(); }

Example 4

for( i=1; i<100; i++ ) {
    a[i] = …;
    …;
    … = a[i-1];
}

• Loop-carried dependence, not parallelizable.

Example 4 (continued)

for( i=...; i<...; i++ ) {
    a[i] = …;
    signal(e_a[i]);
    …;
    wait(e_a[i-1]);
    … = a[i-1];
}

Example 4 (continued)

• Note that here it matters which iterations are assigned to which processor.
• It does not matter for correctness, but it matters for performance.
• Cyclic assignment is probably best.

Example 5

for( i=0; i<100; i++ )
    a[i] = f(i);
x = g(a);
for( i=0; i<100; i++ )
    b[i] = x + h( a[i] );

• First loop can be run in parallel.
• Middle statement is sequential.
• Second loop can be run in parallel.

Example 5 (continued)

• We will need to make parallel execution stop after the first loop and resume at the beginning of the second loop.
• Two (standard) ways of doing that:
  – fork() - join()
  – barrier synchronization

Fork-Join Synchronization

• fork() causes a number of processes to be created and to be run in parallel.
• join() causes all these processes to wait until all of them have executed a join().

Example 5 (continued)

fork();
for( i=...; i<...; i++ )
    a[i] = f(i);
join();
x = g(a);
fork();
for( i=...; i<...; i++ )
    b[i] = x + h( a[i] );
join();

Example 6

sum = 0.0;
for( i=0; i<100; i++ )
    sum += a[i];

• Iterations have a dependence on sum.
• Cannot be parallelized, but ...

Example 6 (continued)

for( k=0; k<...; k++ )
    sum[k] = 0.0;
fork();
for( j=…; j<…; j++ )
    sum[k] += a[j];
join();
sum = 0.0;
for( k=0; k<...; k++ )
    sum += sum[k];

Reduction

• This pattern is very common.
• Many parallel programming systems have explicit support for it, called reduction.

sum = reduce( +, a, 0, 100 );

Final word on synchronization

• Many different synchronization constructs exist in different programming models.
• Dependences have to be “covered” by appropriate synchronization.
• Synchronization is often expensive.
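The fork-join partial-sum pattern of Example 6 can be made concrete in several ways. The sketch below is one of them, assuming POSIX threads, where pthread_create plays the role of fork() and pthread_join the role of join(). The names NWORKERS, chunk_t, sum_chunk and parallel_sum, and the block-wise split of the array, are assumptions made for this illustration and do not come from the slides.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4   /* assumed number of parallel workers */

/* Hypothetical per-worker record: the slice [lo, hi) of a[] to add up
 * and a slot for that worker's partial sum. */
typedef struct {
    const double *a;
    size_t lo, hi;
    double partial;
} chunk_t;

/* Body run by each worker: sum one slice into a private accumulator,
 * so no two workers ever update the same variable. */
static void *sum_chunk(void *arg) {
    chunk_t *ch = arg;
    double s = 0.0;
    for (size_t j = ch->lo; j < ch->hi; j++)
        s += ch->a[j];
    ch->partial = s;
    return NULL;
}

/* Sums a[0..n-1]: parallel partial sums, then a sequential combine. */
double parallel_sum(const double *a, size_t n) {
    pthread_t tid[NWORKERS];
    chunk_t chunk[NWORKERS];

    /* "fork": start one worker per contiguous block of iterations. */
    for (size_t k = 0; k < NWORKERS; k++) {
        chunk[k].a = a;
        chunk[k].lo = n * k / NWORKERS;
        chunk[k].hi = n * (k + 1) / NWORKERS;
        chunk[k].partial = 0.0;
        int rc = pthread_create(&tid[k], NULL, sum_chunk, &chunk[k]);
        assert(rc == 0);
    }

    /* "join": wait until every worker has finished. */
    for (size_t k = 0; k < NWORKERS; k++)
        pthread_join(tid[k], NULL);

    /* Sequential combine of the partial sums. */
    double sum = 0.0;
    for (size_t k = 0; k < NWORKERS; k++)
        sum += chunk[k].partial;
    return sum;
}
```

The only synchronization is the join, which covers the dependence between the partial sums and the final combine. Note that floating-point addition is not associative, so for general inputs the result may differ slightly from the sequential loop. Compile with -pthread on most systems.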