Selecting a Suitable Parallel Technology
Parallel Technology Comparison

Software & Services Group, Developer Products Division
Copyright © 2010, 2011, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners.
Intel Confidential
3/16/2016

Agenda
• Considerations
• Examine Win32/Posix Threading Issues
• Overview of Parallel Programming Models
• The Threading Models
  – Fixed Function Libraries
  – Intel Parallel Building Blocks
  – Intel® Cilk Plus
  – Intel® Threading Building Blocks
  – Intel® Array Building Blocks
• Comparison Chart
• Research Initiatives

Considerations in Choosing a Parallel Programming Model
• Compatibility with your Code
  – Supports your Base Language
    – Focus Today = C/C++/FORTRAN
  – Supports your Design Pattern
    – Tasking
    – Data Parallelism
    – Algorithms, Data Structures, Data
• Portability
  – New Processors
  – New Processor Paradigms (GPU)
  – Vendors: Compilers/OS/CPU Architecture
  – Compiler Extensions
    – Requires Supporting Compiler
• Scalability
  – Micro Architecture Parallelism (SIMD)
  – Multi-Core
  – GPU
  – Distributed
• Ease of Use
  – How much to learn
  – Data Model (Shared, Distributed, Other)
  – How easy to understand/reduce threading considerations
  – Compiler Extensions vs Library Based
• Composability with other Parallel Programming Models
  – Source File
  – Process
  – System
• Tool Support
  – QA Tools
    – Debugger
    – Automated Checking
  – Performance Analysis Tools

Finding a Number with Win32*/POSIX* Style Threading

    #include <windows.h>

    // array, val, MAX, MAXTHREADS and NOTFOUND are assumed to be defined elsewhere.
    int pos = NOTFOUND;
    HANDLE SWThreads[MAXTHREADS];

    DWORD WINAPI FindnumThr(LPVOID param) {
        int MyThreadID = (int)(INT_PTR)param;
        // Each thread scans every MAXTHREADS-th element, starting at its own ID.
        for (int i = MyThreadID; i < MAX; i += MAXTHREADS)
            if (array[i] == val) pos = i;
        return 0;
    }

    void findnum() {
        for (int i = 0; i < MAXTHREADS; i++)
            SWThreads[i] = CreateThread(NULL, 0, FindnumThr, (LPVOID)(INT_PTR)i, 0, NULL);
        WaitForMultipleObjects(MAXTHREADS, SWThreads, TRUE, INFINITE);
    }

Thought Lab: List as many undesirable or missing features of this implementation as possible.
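
For comparison with the hand-threaded Win32 search above, here is a minimal sketch of the same search written with standard C++ threads instead of the Win32 API. It is not from the original slides. The names `NOTFOUND`, `nthreads`, and the choice to return the lowest matching index are assumptions made for illustration. It shows one possible way to handle three of the points the thought lab is likely to raise: the shared result is updated atomically, threads stop once an earlier match is known, and the thread count is taken from the hardware rather than fixed at compile time.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr long NOTFOUND = -1;  // hypothetical sentinel, mirroring the slide's NOTFOUND

// Returns the lowest index i with data[i] == val, or NOTFOUND if there is none.
long findnum(const std::vector<float>& data, float val, unsigned nthreads = 0) {
    if (nthreads == 0) nthreads = std::max(1u, std::thread::hardware_concurrency());
    assert(nthreads > 0);
    const long n = static_cast<long>(data.size());
    std::atomic<long> best(n);  // n means "nothing found yet"

    auto worker = [&](long begin, long end) {
        for (long i = begin; i < end; ++i) {
            // Stop early once another thread has found a match at a lower index.
            if (i >= best.load(std::memory_order_relaxed)) return;
            if (data[i] == val) {
                long cur = best.load();
                // Atomically keep the smallest index seen by any thread.
                while (i < cur && !best.compare_exchange_weak(cur, i)) {}
                return;
            }
        }
    };

    // Give each thread one contiguous chunk of the array.
    const long chunk = (n + static_cast<long>(nthreads) - 1) / static_cast<long>(nthreads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        const long begin = std::min(n, static_cast<long>(t) * chunk);
        const long end = std::min(n, begin + chunk);
        pool.emplace_back(worker, begin, end);
    }
    for (auto& th : pool) th.join();

    const long result = best.load();
    return result == n ? NOTFOUND : result;
}
```

Even in this form the caller still manages threads, chunking, and synchronization by hand, which is the kind of burden the higher-level models in the rest of the deck aim to remove.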

Parallel Programming Models
• Intel® Parallel Building Blocks
  – Intel® Threading Building Blocks
  – Intel® Array Building Blocks
  – Intel® Cilk Plus
• Fixed Function Libraries
  – Intel® Math Kernel Library
  – Intel® Integrated Performance Primitives
• Other Supported Standards
  – MPI
  – OpenMP*
  – Win32* & POSIX*
• Research Initiatives
  – Intel® Concurrent Collections
  – Intel® Cluster OpenMP
  – Intel® OpenCL SDK
  – Software Transactional Memory

Fixed Function Libraries
• Intel® Math Kernel Library (MKL)
  – BLAS, (Sca)LAPACK, FFTW, Vector Math Library, Vector Random Number Generation, Solvers (sparse)
• Intel® Integrated Performance Primitives (IPP)
  – Video, Image, Audio, Speech, Signal, Crypto, Compression, Data Integrity, String Processing, Linear Algebra, Solvers, Processor-related functions, etc.
• Best Way to Thread: Let Somebody Else Do It :>
  – Where there is a performance benefit, the functions are threaded/distributed
  – All the functions are thread safe
  – Included examples show how to use the functions in a threaded manner
• Ex: Matrix Matrix Multiply
  – MKL: dgemm
  – IPP: ippiMul_8u_C4RSfs

Intel® Cilk Plus
• A simple but powerful set of C/C++ compiler extensions consisting of a few keywords, hyperobjects, array notations, SIMD hints, and elemental functions

    #include <cilk/cilk.h>
    #include <cilk/reducer_list.h>

    // Reducer hyperobject: each strand appends matches without explicit locking.
    cilk::reducer_list_append<int> pos;

    void findnum(int *MAX, float *array, float val) {
        cilk_for (int i = 0; i < *MAX; i++)
            if (array[i] == val) pos.push_back(i);
    }

Intel® Threading Building Blocks
• A C++ template library that promotes good threading design.

    #include <tbb/tbb.h>

    tbb::concurrent_queue<int> pos;

    void findnum(int MAX, float *array, float val) {
        tbb::parallel_for(tbb::blocked_range<int>(0, MAX),
            [&](const tbb::blocked_range<int> &r) {
                for (int i = r.begin(); i < r.end(); i++)
                    if (array[i] == val)
                        pos.push(i);
                        // or, instead of queuing every match, stop at the first one:
                        // tbb::task::self().cancel_group_execution();
            });
    }

Intel® Array Building Blocks
A generalized data-parallel programming model that transforms code (C++ today) into multiple implementations that exploit current and future hardware.

    void findnum(dense<f32> array, f32 val, dense<usize>& results) {
        dense<boolean> locations = (array == val);
        dense<usize> matching_indices = indices(0, array.length());
        results = pack(matching_indices, locations);
    }

A Comparison of Parallel Technologies

Technology | Languages [1] | Learning Curve | Composable | Scalable | Portable (OS/Compiler) | Intel Tool Support

Fixed Function Libraries
MKL | C, Fortran | Easy | Yes [3] | Very Good (Includes Distributed) | (Many Vendors) | Good
IPP | C | Easy | Yes [3] | Very Good | Linux, Windows, Apple | Good

Intel Parallel Building Blocks
Intel Cilk Plus | C++, C | Easy | Yes | Very Good | Stay Tuned | Yes [4]
Intel TBB | C++ | Medium | Yes | Very Good | Yes (Open Source) | Yes
Intel® Array Building Blocks | C++ | Medium | Yes | Very Good | Stay Tuned | Yes [4]

Other Standards
Win32* | C | Easy/Hard [2] | No | Difficult | No | Very Good
Posix* | C | Easy/Hard [2] | No | Difficult | Yes (Many Vendors) | Very Good
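
The Array Building Blocks findnum shown before the comparison chart expresses the search as three whole-array steps: an elementwise comparison that yields a mask, an index vector, and a pack that keeps only the indices whose mask entry is true. The following is a minimal serial sketch of that pattern in standard C++, not ArBB code. The helper names `equal_mask`, `make_indices`, and `pack` are hypothetical and only mirror the operations named on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 1: elementwise comparison, analogous to (array == val) on the slide.
std::vector<bool> equal_mask(const std::vector<float>& array, float val) {
    std::vector<bool> mask(array.size());
    for (std::size_t i = 0; i < array.size(); ++i) mask[i] = (array[i] == val);
    return mask;
}

// Step 2: the index vector 0, 1, ..., n-1.
std::vector<std::size_t> make_indices(std::size_t n) {
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = i;
    return idx;
}

// Step 3: keep values[i] only where mask[i] is true, preserving order.
std::vector<std::size_t> pack(const std::vector<std::size_t>& values,
                              const std::vector<bool>& mask) {
    assert(values.size() == mask.size());
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < values.size(); ++i)
        if (mask[i]) out.push_back(values[i]);
    return out;
}

// Composition of the three steps: all positions where array holds val.
std::vector<std::size_t> findnum(const std::vector<float>& array, float val) {
    return pack(make_indices(array.size()), equal_mask(array, val));
}
```

Because each step is a pure operation over whole arrays with no loop-carried dependence in steps 1 and 2, a data-parallel runtime is free to vectorize or distribute them, which is the property the ArBB model relies on.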

Summary
Introduced Parallel Programming Technologies:
– Intel Parallel Building Blocks
  – Intel Cilk Plus
  – Intel Threading Building Blocks
  – Intel Array Building Blocks
– Other Supported Standards
  – Win32/Posix
Discussed Parallel Programming Technologies Criteria
– Compatibility
– Composability
– Scalability
– Portability
– Learning Curve
– Tool Support
Compared the Technologies

OpenMP*
• A set of compiler directives that you can insert into C/C++/FORTRAN code to specify parallelism

    int pos = NOTFOUND;

    void findnum(int MAX, float *array, float val) {
        // Split the loop iterations across the threads of the team.
        #pragma omp parallel for
        for (int i = 0; i < MAX; i++)
            if (array[i] == val) {
                // Only one thread at a time may update the shared result.
                #pragma omp critical
                pos = i;
            }
    }

Message Passing Interface (MPI)
A “distributed parallel” programming library for passing messages between processes. All communication between tasks is explicit.

    #include <mpi.h>
    #include "setupArrayMAXandVal.h"  // assumed to provide array, MAX, val, TAG, NOTFOUND, getarray()

    int pos = NOTFOUND;
    int crank, csize;

    void findnum(int MAX, float *array, float val) {
        int localpos = NOTFOUND;
        MPI_Status status;
        // Rank 0 broadcasts the full array to every rank.
        MPI_Bcast(array, MAX, MPI_FLOAT, 0, MPI_COMM_WORLD);
        // Each rank searches its own contiguous slice.
        for (int i = crank * MAX / csize; i < (crank + 1) * MAX / csize; i++)
            if (array[i] == val) localpos = i;
        if (crank >= 1) {
            // Every other rank reports its result to rank 0 exactly once.
            MPI_Send(&localpos, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);
        } else {
            pos = localpos;
            for (int i = 1; i < csize; ++i) {
                int remotepos;
                MPI_Recv(&remotepos, 1, MPI_INT, i, TAG, MPI_COMM_WORLD, &status);
                if (remotepos != NOTFOUND) pos = remotepos;
            }
        }
    }

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &crank);
        MPI_Comm_size(MPI_COMM_WORLD, &csize);
        if (crank == 0) getarray(array);
        findnum(MAX, array, val);
        MPI_Finalize();
        return 0;
    }

Research @Intel
• Intel® OpenCL SDK
• On Whatif.intel.com
  – Concurrent Collections
  – Software Transactional Memory
  – Cluster OpenMP*

A Comparison of Parallel Technologies

Technology | Languages [1] | Learning Curve | Composable | Scalable | Portable (OS/Compiler) | Intel Tool Support

Fixed Function Libraries
MKL | C, Fortran | Easy | Yes [3] | Very Good (Includes Distributed) | (Many Vendors) | Good
IPP | C | Easy | Yes [3] | Very Good | Linux, Windows, Apple | Good

Intel Parallel Building Blocks
Intel Cilk Plus | C++, C | Easy | Yes | Very Good | Stay Tuned | Yes [4]
Intel TBB | C++ | Medium | Yes | Very Good | Yes (Open Source) | Yes
Intel Array Building Blocks | C++ | Medium | Yes | Very Good | Stay Tuned | Yes [4]

Other Standards
Win32 | C | Easy/Hard [2] | No | Difficult | No | Very Good
Posix | C | Easy/Hard [2] | No | Difficult | Yes (Many Vendors) | Very Good
OpenMP | C, C++, Fortran | Easy | No | Good | Yes (Many Vendors) | Good
MPI | C, Fortran | Medium | Yes | Distributed | Yes (Many Vendors) | Very Good
Intel OpenCL SDK | C | Medium | ? | Good | Yes (Many Vendors) | Some

Optimization Notice

Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options.”

Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20110228

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”.
NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2011. Intel Corporation.
http://intel.com/software/products