WSE 187: INTRODUCTION TO PARALLEL PROGRAMMING*
*Prepared with the help of free online resources.
Lecture 2
Jesmin Jahan Tithi

LOGIN TO SSH
Steps (Windows):
  - Connect to the host.
  - Enter the provided password when prompted.
  - First-time users: accept the security keys, then change the password:
      first provide the old password,
      then type the new password,
      then repeat the new password.
  - Do not be alarmed if no characters appear on screen while you type. Just type carefully.
  - Save the host info in SSH when prompted.
Mac users: Use the terminal to connect to the server directly. The other steps are the same. You may try FileZilla to transfer files from the Mac to the server.

HELLO PARALLEL WORLD!
Intel Cilk Plus

CILK PLUS
Intel® Cilk™ Plus is an extension to C and C++, implemented by the Intel® C++ Compiler.
It adds 3 keywords to C and C++: cilk_for, cilk_spawn, and cilk_sync.
  cilk_spawn - Specifies that a function call can execute asynchronously, without requiring the caller to wait for it to return. This is an expression of an opportunity for parallelism, not a command that mandates parallelism. The Intel Cilk Plus runtime will choose whether to run the function in parallel with its caller.
  cilk_sync - Specifies that all spawned calls in a function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.
  cilk_for - Allows iterations of the loop body to be executed in parallel.
The cilk_spawn and cilk_for keywords express opportunities for parallelism.

CILK_SPAWN
Compile: icc -O3 -o hello Hello_parallel_world.cpp
Run: ./hello
Run with 4 workers: CILK_NWORKERS=4 ./hello

    #include <stdio.h>
    #include <cilk/cilk.h>

    static void hello(){
        int i=0;
        for(i=0;i<1000000;i++)
            printf("");          // busy work so the call takes a while
        printf("Hello ");
    }

    static void world(){
        int i=0;
        for(i=0;i<1000000;i++)
            printf("");          // busy work so the call takes a while
        printf("world! ");
    }

    int main(){
        cilk_spawn hello();
        cilk_spawn world();
        //cilk_sync;             // uncomment to wait for both spawned calls before printing "Done!"
        printf("Done!\n");
    }

CILK_SPAWN EXERCISE
Order of placement:
    Wheels, Chassis
    Engine, Frame
    Steering wheel

    #include <stdio.h>
    #include <cilk/cilk.h>

    void make(const char* str){
        int i=0;
        for(i=0;i<1000000;i++)
            printf("");
        printf("%s has/have been created.\n",str);
    }

    void place(const char* str){
        int i=0;
        for(i=0;i<1000000;i++)
            printf("");
        printf("%s has/have been placed.\n",str);
    }

    int main(){
        //Place your code here
    }

CILK_FOR
    for (int i = 0; i < 8; ++i) {
        cilk_spawn do_work(i);
    }
    cilk_sync;

A better approach is to use a cilk_for loop:

    cilk_for (int i = 0; i < 8; ++i) {
        do_work(i);
    }

    #include <stdio.h>
    #include <iostream>
    #include <cilk/cilk.h>
    #include "cilktime.h"
    using namespace std;
    #define n 16384

    int main(){
        // First input vector.
        int A[n];
        // Second input vector.
        int B[n];
        // Sum vector.
        int C[n];

        // Initialize the vectors or arrays with input.
        // Valid indices are 0..n-1, so the loop condition is i < n.
        cilk_for (int i = 0; i < n; i++){
            A[i] = i;
            B[i] = i+1;
        }

        // Compute the sum
        unsigned long long tstart = cilk_getticks(); //beginning time stamp
        cilk_for (int i = 0; i < n; i++){
            C[i] = A[i] + B[i];
        }
        unsigned long long tend = cilk_getticks(); //end time stamp
        cout<<"Time to run:"<<cilk_ticks_to_seconds(tend-tstart)<<endl;

        // Check the sum to verify.
        int pos;
        cout<<"Enter position of element to inspect"<<endl;
        cin>>pos;
        cout<<C[pos]<<endl;
        return 0;
    }

CILK_FOR
    #include <stdio.h>
    #include <cilk/cilk.h>

    int main(){
        long int sum = 0;
        // Every iteration updates the shared variable sum without synchronization.
        cilk_for (int i = 0; i <= 100000000; i++)
            sum += i;
        printf("%ld\n",sum);
        return 0;
    }
    //wrong! race condition

CILK_FOR
There are several ways of dealing with race conditions.
First option: Use locks! We will learn more later.
    #include <stdio.h>
    #include <cilk/cilk.h>
    #include <pthread.h> //pthread library

    int main(){
        long int sum = 0;
        pthread_mutex_t m;             //define the lock
        pthread_mutex_init(&m,NULL);   //initialize the lock
        cilk_for (int i = 0; i <= 1000000; i++){
            pthread_mutex_lock(&m);    //lock - prevents other threads from running this code
            sum += i;
            pthread_mutex_unlock(&m);  //unlock - allows other threads to access this code
        }
        pthread_mutex_destroy(&m);     //release the lock once the loop has finished
        printf("%ld\n",sum);
        return 0;
    }

CHANGING NUMBER OF CORES/THREADS
Run with: CILK_NWORKERS=4 ./executable

Or change inside the main program:

    // requires #include <cilk/cilk_api.h>
    if (0 != __cilkrts_set_param("nworkers","16")) {
        cout<<"Failed to set worker count"<<endl;
        return 1;
    }

Check to verify:

    int num_threads = __cilkrts_get_nworkers();
    cout<< num_threads <<endl;
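
The lock-based fix and the worker count can also be tried without the Intel compiler. The sketch below is only an illustration under stated assumptions: it uses standard C++ threads (std::thread, std::mutex) in place of cilk_for and CILK_NWORKERS, and the function names sum_with_lock and sum_with_partials are hypothetical, not part of Cilk Plus or of the course code. The first function repeats the "use locks" idea from the slide above. The second shows one other possible way to avoid the race, where each worker adds into its own private variable and the results are combined at the end.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sums 0..limit with `workers` threads and one shared lock.
std::int64_t sum_with_lock(std::int64_t limit, unsigned workers) {
    if (workers == 0) workers = 1;   // always use at least one worker
    std::int64_t sum = 0;            // shared by all workers
    std::mutex m;                    // protects sum
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&sum, &m, limit, workers, w] {
            // Worker w handles i = w, w + workers, w + 2*workers, ...
            for (std::int64_t i = w; i <= limit; i += workers) {
                std::lock_guard<std::mutex> guard(m);  // lock; released at end of scope
                sum += i;
            }
        });
    }
    for (std::thread& t : pool) t.join();  // wait for all workers
    return sum;
}

// Hypothetical helper: same sum, but each worker keeps a private partial sum,
// so no lock is needed inside the loop.
std::int64_t sum_with_partials(std::int64_t limit, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<std::int64_t> partial(workers, 0);  // one slot per worker
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, limit, workers, w] {
            std::int64_t local = 0;  // private to this worker
            for (std::int64_t i = w; i <= limit; i += workers)
                local += i;
            partial[w] = local;      // each worker writes only its own slot
        });
    }
    for (std::thread& t : pool) t.join();
    std::int64_t sum = 0;
    for (std::int64_t p : partial) sum += p;  // combine after all workers finish
    return sum;
}
```

Calling sum_with_lock(1000000, 4) or sum_with_partials(1000000, 4) from a main function returns 500000500000 in both cases, the same total the lock-based cilk_for program computes. With g++, the file will typically need the -pthread flag. The lock version takes the mutex once per iteration, so it is likely to run much slower than the partial-sum version; timing the two with different worker counts is a simple way to see the cost of locking.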