Selecting a Suitable Parallel Technology
Parallel Technology Comparison
Software & Services Group, Developer Products Division
Copyright © 2010, 2011, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners.
Intel Confidential
3/16/2016
Agenda
• Considerations
• Examine Win32/Posix Threading Issues
• Overview of Parallel Programming Models
• The Threading Models
– Fixed Function Libraries
– Intel Parallel Building Blocks
– Intel® Cilk Plus
– Intel® Threading Building Blocks
– Intel® Array Building Blocks
• Comparison Chart
• Research Initiatives
Considerations in Choosing a Parallel Programming Model
• Compatibility with your Code
– Supports your Base Language
– Focus Today = C/C++/FORTRAN
– Supports your Design Pattern
– Tasking
– Data Parallelism
– Algorithms, Data Structures, Data
• Portability
– New Processors
– New Processor Paradigms (GPU)
– Vendors: Compilers/OS/CPU Architecture
– Compiler Extensions – Requires Supporting Compiler
• Scalability
– Micro Architecture Parallelism (SIMD)
– Multi-Core
– GPU
– Distributed
• Ease of Use
– How much to learn
– Data Model (Shared, Distributed, Other)
– How easy to understand/reduce threading considerations
– Compiler Extensions vs Library Based
• Composability with other Parallel Programming Models
– Source File,
– Process,
– System
• Tool Support
– QA Tools
– Debugger
– Automated Checking
– Performance Analysis Tools
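Finding a Number with Win32*/POSIX* Style Threading
// array, val, MAX, MAXTHREADS and NOTFOUND are assumed to be defined elsewhere.
int pos = NOTFOUND;
HANDLE SWThreads[MAXTHREADS];

DWORD WINAPI FindnumThr(LPVOID MyThreadID) {
    // Each thread scans every MAXTHREADS-th element, starting at its own ID.
    for (int i = (int)(INT_PTR)MyThreadID; i < MAX; i += MAXTHREADS)
        if (array[i] == val)
            pos = i;
    return 0;
}

void findnum() {
    for (int i = 0; i < MAXTHREADS; i++)
        SWThreads[i] = CreateThread(NULL, 0, FindnumThr,
                                    (LPVOID)(INT_PTR)i, 0, NULL);
    WaitForMultipleObjects(MAXTHREADS, SWThreads, TRUE, INFINITE);
}
Thought Lab: List as many undesirable/missing features of this implementation as possible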
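For comparison, here is a minimal sketch of the same search written with portable C++11 threads. It is not from the original slides: the name findnum_threads, the num_threads parameter, and the choice to return the lowest matching index are assumptions made for illustration. It addresses a few of the issues the exercise asks about: the shared result is an atomic rather than a plain global, each thread gets a contiguous block, and a thread stops early once a lower index has already matched.

```cpp
#include <atomic>
#include <thread>
#include <vector>

const int NOTFOUND = -1;

// Returns the lowest index i with array[i] == val, or NOTFOUND.
// findnum_threads and num_threads are hypothetical names for this sketch.
int findnum_threads(const std::vector<float>& array, float val,
                    unsigned num_threads) {
    const int n = static_cast<int>(array.size());
    if (num_threads == 0) num_threads = 1;
    // best holds the smallest matching index seen so far; n means "none yet".
    std::atomic<int> best(n);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        // Give each thread one contiguous block [lo, hi).
        const int lo = static_cast<int>(static_cast<long long>(n) * t / num_threads);
        const int hi = static_cast<int>(static_cast<long long>(n) * (t + 1) / num_threads);
        workers.emplace_back([&array, &best, val, lo, hi] {
            for (int i = lo; i < hi; ++i) {
                // Stop early: a lower index has already been found.
                if (i >= best.load(std::memory_order_relaxed)) break;
                if (array[i] == val) {
                    // Atomically lower best to i unless another thread beat us.
                    int cur = best.load();
                    while (i < cur && !best.compare_exchange_weak(cur, i)) {}
                    break;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    const int result = best.load();
    return result == n ? NOTFOUND : result;
}
```

The sketch still creates and joins its threads on every call and leaves the thread count to the caller, which are the kinds of details the higher-level models in this deck take over.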
Parallel Programming Models
• Intel® Parallel Building Blocks
– Intel® Threading Building Blocks
– Intel® Array Building Blocks
– Intel® Cilk Plus
• Fixed Function Libraries
– Intel® Math Kernel Library
– Intel® Integrated Performance Primitives
• Other Supported Standards
– MPI
– OpenMP*
– Win32* & POSIX*
• Research Initiatives
– Intel® Concurrent Collections
– Intel® Cluster OpenMP
– Intel® OpenCL SDK
– Software Transactional Memory
Fixed Function Libraries
• Intel® Math Kernel Library (MKL)
– BLAS, (Sca)LAPACK, FFTW, Vector Math Library, Vector Random Number Generation, Solvers (sparse)
• Intel® Integrated Performance Primitives (IPP)
– Video, Image, Audio, Speech, Signal, Crypto, Compression, Data Integrity, String Processing, Linear Algebra, Solvers, Processor-related functions, etc.
• Best Way to Thread: Let Somebody Else Do It :>
– Functions are threaded/distributed when there are performance benefits
– All the functions are thread safe
– Included examples show how to use the functions in a threaded manner
• Ex: Matrix Matrix Multiply
– MKL: dgemm
– IPP: ippiMul_8u_C4RSfs
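To make the matrix multiply example concrete, the sketch below spells out the operation that a dgemm routine computes, C = alpha * A * B + beta * C, as a plain single-threaded triple loop. It is not MKL's implementation or calling convention: the name naive_dgemm, the row-major flat-vector layout, and the parameter order are assumptions chosen for readability. A tuned library performs the same arithmetic with blocking, SIMD, and threading handled internally.

```cpp
#include <cstddef>
#include <vector>

// Reference (unoptimized) version of the dgemm operation:
//   C = alpha * A * B + beta * C
// A is m x k, B is k x n, C is m x n, all row-major in flat vectors.
// naive_dgemm is a hypothetical name used for illustration only.
void naive_dgemm(std::size_t m, std::size_t n, std::size_t k,
                 double alpha, const std::vector<double>& A,
                 const std::vector<double>& B,
                 double beta, std::vector<double>& C) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Dot product of row i of A with column j of B.
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}
```

Every (i, j) entry is independent of the others, which is why a library can thread or distribute this routine without the caller changing any code.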
Intel® Cilk Plus
• A simple but powerful set of C/C++ compiler extensions consisting of a few keywords, hyperobjects, array notation, SIMD hints, and elemental functions
#include <cilk/cilk.h>
#include <cilk/reducer_list.h>
// Hyperobject: each worker appends to its own view; the views are merged safely.
cilk::reducer_list_append<int> pos;
void findnum(int *MAX, float *array, float val)
{
    cilk_for (int i = 0; i < *MAX; i++)
        if (array[i] == val)
            pos.push_back(i);
}
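The idea behind a list reducer can be sketched in standard C++ without any Cilk runtime. In the sketch below, each worker appends matches to its own private list (its "view"), and the lists are concatenated in worker order once all workers finish, so the result matches what a serial loop would produce. This illustrates the concept only and is not how the Cilk Plus runtime is implemented; find_all and num_workers are hypothetical names.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Collects every index i with array[i] == val, in ascending order.
// find_all and num_workers are hypothetical names for this sketch.
std::vector<int> find_all(const std::vector<float>& array, float val,
                          unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;
    const std::size_t n = array.size();
    // One private list ("view") per worker, so no locking is needed.
    std::vector<std::vector<int>> views(num_workers);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        // Each worker owns the contiguous block [lo, hi).
        const std::size_t lo = n * w / num_workers;
        const std::size_t hi = n * (w + 1) / num_workers;
        workers.emplace_back([&array, &views, val, lo, hi, w] {
            for (std::size_t i = lo; i < hi; ++i)
                if (array[i] == val)
                    views[w].push_back(static_cast<int>(i));
        });
    }
    for (auto& t : workers) t.join();
    // Merge the views in worker order, which reproduces the serial order.
    std::vector<int> result;
    for (const auto& v : views)
        result.insert(result.end(), v.begin(), v.end());
    return result;
}
```

With a reducer, the per-worker views and the final merge are handled by the runtime, so the loop body keeps the same shape as the serial version.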
Intel® Threading Building Blocks
• A C++ template library that promotes good threading design.
#include <tbb/tbb.h>
tbb::concurrent_queue<int> pos;  // thread-safe queue of matching indices
void findnum(int MAX, float *array, float val) {
    tbb::parallel_for(
        tbb::blocked_range<int>(0, MAX),
        [&](const tbb::blocked_range<int> &r) {
            for (int i = r.begin(); i != r.end(); i++)
                if (array[i] == val)
                    pos.push(i);
                    // or, instead of queueing every match, stop after the first one:
                    // tbb::task::self().cancel_group_execution();
        });
}
Intel® Array Building Blocks
A generalized data-parallel programming model that transforms code (C++ today) into multiple implementations that exploit current and future hardware.
void findnum(dense<f32> array, f32 val,
dense<usize>& results) {
dense<boolean> locations = (array == val);
dense<usize> matching_indices =
indices(0, array.length());
results = pack(matching_indices, locations);
}
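The three steps in this function (element-wise compare, index generation, pack) can be written out serially in standard C++ to show what each one produces. This is a sketch of the semantics only, under the assumption that pack keeps the elements whose mask entry is true. It does not use the Array Building Blocks API, and findnum_pack is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Serial sketch of the data-parallel steps: compare, enumerate, pack.
// findnum_pack is a hypothetical name used for illustration only.
std::vector<std::size_t> findnum_pack(const std::vector<float>& array,
                                      float val) {
    const std::size_t n = array.size();

    // Step 1: element-wise comparison yields a boolean mask.
    std::vector<bool> locations(n);
    for (std::size_t i = 0; i < n; ++i)
        locations[i] = (array[i] == val);

    // Step 2: a container holding the indices 0 .. n-1.
    std::vector<std::size_t> matching_indices(n);
    for (std::size_t i = 0; i < n; ++i)
        matching_indices[i] = i;

    // Step 3: pack keeps only the indices whose mask entry is true.
    std::vector<std::size_t> results;
    for (std::size_t i = 0; i < n; ++i)
        if (locations[i])
            results.push_back(matching_indices[i]);
    return results;
}
```

Steps 1 and 2 have no dependencies between elements, which is what lets a data-parallel runtime map them onto SIMD lanes or multiple cores.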
A Comparison of Parallel Technologies
Technology                       Languages¹   Learning Curve  Composable  Scalable                          Portable (OS/Compiler)  Intel Tool Support
Fixed Function Libraries
  MKL                            C, Fortran   Easy            Yes³        Very Good (Includes Distributed)  (Many Vendors)          Good
  IPP                            C            Easy            Yes³        Very Good                         Linux, Windows, Apple   Good
Intel Parallel Building Blocks
  Intel Cilk Plus                C++, C       Easy            Yes         Very Good                         Stay Tuned              Yes⁴
  Intel TBB                      C++          Medium          Yes         Very Good                         Yes (Open Source)       Yes
  Intel® Array Building Blocks   C++          Medium          Yes         Very Good                         Stay Tuned              Yes⁴
Other Standards
  Win32*                         C            Easy/Hard²      No          Difficult                         No                      Very Good
  Posix*                         C            Easy/Hard²      No          Difficult                         Yes (Many Vendors)      Very Good
Summary
Introduced Parallel Programming Technologies:
– Intel Parallel Building Blocks
– Intel Cilk Plus
– Intel Threading Building Blocks
– Intel Array Building Blocks
– Other Supported Standards
– Win32/Posix
Discussed Parallel Programming Technologies Criteria
– Compatibility
– Composability
– Scalability
– Portability
– Learning Curve
– Tool Support
Compared the Technologies
OpenMP*
• A set of compiler directives that you can insert into C/C++/FORTRAN code to specify parallelism
int pos = NOTFOUND;
void findnum(int MAX, float *array, float val) {
    #pragma omp parallel for
    for (int i = 0; i < MAX; i++)
        if (array[i] == val) {
            // Let only one thread at a time update the shared result.
            #pragma omp critical
            pos = i;
        }
}
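A variation worth comparing is sketched below. It is not from the slides and assumes an OpenMP 3.1 or later compiler, since it relies on the min reduction for C/C++. Instead of a critical section, each thread tracks its own lowest matching index and OpenMP combines them at the end, so the result is always the first match and does not depend on which thread writes last. The function name findnum_first and the std::vector interface are hypothetical. Compiled without OpenMP enabled, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

const int NOTFOUND = -1;

// Returns the lowest index i with array[i] == val, or NOTFOUND.
// findnum_first is a hypothetical name used for illustration only.
int findnum_first(const std::vector<float>& array, float val) {
    const int n = static_cast<int>(array.size());
    int first = n;  // n acts as the "no match yet" sentinel
    // Each thread keeps a private copy of first; the copies and the
    // original value are combined with min when the loop ends.
    #pragma omp parallel for reduction(min : first)
    for (int i = 0; i < n; i++) {
        if (array[i] == val && i < first)
            first = i;
    }
    return first == n ? NOTFOUND : first;
}
```

The reduction removes the lock from the loop body, which matters when matches are frequent and many threads would otherwise queue on the critical section.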
Message Passing Interface (MPI)
“Distributed parallel” programming library for passing messages between processes. All communication between tasks is explicit.
#include <mpi.h>
// Placeholder header from the slide; assumed to provide array, MAX, val,
// NOTFOUND, TAG and getarray().
#include <setupArrayMAXandVal.h>
int pos = NOTFOUND;
MPI_Status status;
int crank, csize;
void findnum(int MAX, float *array, float val);

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &crank);
    MPI_Comm_size(MPI_COMM_WORLD, &csize);
    if (crank == 0) getarray(array);
    findnum(MAX, array, val);
    MPI_Finalize();
    return 0;
}

void findnum(int MAX, float *array, float val) {
    int localpos = NOTFOUND;
    // Rank 0 shares the whole array with every other rank.
    MPI_Bcast(array, MAX, MPI_FLOAT, 0, MPI_COMM_WORLD);
    // Each rank searches its own contiguous block of indices.
    for (int i = crank * MAX / csize; i < (crank + 1) * MAX / csize; i++)
        if (array[i] == val) localpos = i;
    if (crank >= 1) {
        // Every non-root rank reports its result to rank 0 exactly once.
        MPI_Send(&localpos, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);
    } else {
        pos = localpos;
        for (int i = 1; i < csize; ++i) {
            int remotepos;
            MPI_Recv(&remotepos, 1, MPI_INT, i, TAG, MPI_COMM_WORLD, &status);
            if (remotepos != NOTFOUND) pos = remotepos;
        }
    }
}
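The search loop in findnum depends on splitting the index range into one contiguous block per rank. A minimal sketch of that split as a standalone helper is shown below. The name block_range and the half-open [lo, hi) convention are assumptions, not part of MPI or of the original slide. Computing both bounds as n * rank / size means the blocks cover every index exactly once even when the array length is not a multiple of the number of ranks.

```cpp
#include <utility>

// Half-open index range [lo, hi) owned by `rank` when n elements are
// split across `size` ranks. block_range is a hypothetical helper name.
std::pair<int, int> block_range(int n, int rank, int size) {
    // 64-bit intermediates avoid overflow in n * rank for large arrays.
    const long long lo = static_cast<long long>(n) * rank / size;
    const long long hi = static_cast<long long>(n) * (rank + 1) / size;
    return {static_cast<int>(lo), static_cast<int>(hi)};
}
```

For example, with 10 elements and 4 ranks the blocks are [0, 2), [2, 5), [5, 7) and [7, 10): block sizes differ by at most one and no element is left out.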
Research @Intel
• Intel® OpenCL SDK
• On Whatif.intel.com
– Concurrent Collections
– Software Transactional Memory
– Cluster OpenMP*
A Comparison of Parallel Technologies
Technology                       Languages¹       Learning Curve  Composable  Scalable                          Portable (OS/Compiler)  Intel Tool Support
Fixed Function Libraries
  MKL                            C, Fortran       Easy            Yes³        Very Good (Includes Distributed)  (Many Vendors)          Good
  IPP                            C                Easy            Yes³        Very Good                         Linux, Windows, Apple   Good
Intel Parallel Building Blocks
  Intel Cilk Plus                C++, C           Easy            Yes         Very Good                         Stay Tuned              Yes⁴
  Intel TBB                      C++              Medium          Yes         Very Good                         Yes (Open Source)       Yes
  Intel Array Building Blocks    C++              Medium          Yes         Very Good                         Stay Tuned              Yes⁴
Other Standards
  Win32                          C                Easy/Hard²      No          Difficult                         No                      Very Good
  Posix                          C                Easy/Hard²      No          Difficult                         Yes (Many Vendors)      Very Good
  OpenMP                         C, C++, Fortran  Easy            No          Good                              Yes (Many Vendors)      Good
  MPI                            C, Fortran       Medium          Yes         Distributed                       Yes (Many Vendors)      Very Good
  Intel OpenCL SDK               C                Medium          ?           Good                              Yes (Many Vendors)      Some
Optimization Notice
Intel compilers, associated libraries and associated development tools may include or utilize options that optimize
for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction
sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel
compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors.
For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they
implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options." Many library
routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other
microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and
Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will
get extra performance on Intel microprocessors.
Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree
for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations
include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and
Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured
by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on
Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to
determine which best meet your requirements. We hope to win your business by striving to offer the best
performance of any compiler or library; please let us know if you find we do not.
Notice revision #20110228
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED,
BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS
DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS
OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components
and reflect the approximate performance of Intel products as measured by those tests. Any
difference in system hardware or software design or configuration may affect actual performance.
Buyers should consult other sources of information to evaluate the performance of systems or
components they are considering purchasing. For more information on performance tests and on
the performance of Intel products, reference www.intel.com/software/products.
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom Inside, Centrino
Inside, Centrino logo, Cilk, Core Inside, FlashFile, i960, InstantIP, Intel, the Intel logo, Intel386,
Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside,
Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel
NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel
XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium
Inside, skoool, Sound Mark, The Journey Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon
Inside are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2011. Intel Corporation.
http://intel.com/software/products