OpenCL Introduction
AN EXAMPLE FOR OPENCL
LU LU
OCT.11 2014
CONTENTS
1. Environment Configuration
2. Case Analysis
OPENCL INTRODUCTION | APRIL 11, 2014
1. ENVIRONMENT CONFIGURATION
▪ IDE
– Any C/C++ IDE can be used for OpenCL development.
– We use Microsoft Visual Studio 2010.
▪ Required project settings:
– Add the SDK's include path to Additional Include Directories.
– Add the SDK's library path to Additional Library Directories.
▪ Include Directory
▪ Lib Directory
▪ OpenCL Lib
2. CASE ANALYSIS
1. Problem Description
2. Algorithm
3. Calculation Features
4. Parallelizing
5. Programming
   1. Kernel
   2. Host
6. Tools
   1. AMD Profiler
   2. gDEBugger
2.1 PROBLEM DESCRIPTION
▪ Input: an image, the rotation center, and the rotation angle.
▪ Output: the rotated image, the same size as the input (original) image.
[Figure: Original | Rotated]
2.2 ALGORITHM
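▪ Let $(x_0, y_0)$ be the rotation center and $\theta$ the rotation angle.
▪ A point $(x_1, y_1)$ in the original image moves to the new position $(x_2, y_2)$ after a clockwise rotation by $\theta$, according to the following formulas:
$$x_2 = (x_1 - x_0)\cos\theta + (y_1 - y_0)\sin\theta$$
$$y_2 = -(x_1 - x_0)\sin\theta + (y_1 - y_0)\cos\theta$$
▪ Here $(x_2, y_2)$ is measured relative to the rotation center; add $(x_0, y_0)$ to express it in image coordinates.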
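The formula can be sketched in C as follows. The names Point2f and rotate_point are hypothetical, and the sine and cosine are passed in precomputed. The formula as written yields coordinates relative to the rotation center, so this sketch adds the center back to return image coordinates; that last step is an assumption about the intended use, not something stated on the slide.

```c
/* Hypothetical helper type; not part of the original slides. */
typedef struct {
    float x;
    float y;
} Point2f;

/* Rotates p about center using the slide's formula.
 * sin_theta and cos_theta are the precomputed sine and cosine of the angle.
 * Assumption: the center is added back so the result is in image coordinates. */
Point2f rotate_point(Point2f p, Point2f center, float sin_theta, float cos_theta)
{
    float dx = p.x - center.x;   /* x1 - x0 */
    float dy = p.y - center.y;   /* y1 - y0 */
    Point2f r;
    r.x =  dx * cos_theta + dy * sin_theta + center.x;
    r.y = -dx * sin_theta + dy * cos_theta + center.y;
    return r;
}
```

For example, rotating the point (2, 1) about the center (1, 1) with sin_theta = 1 and cos_theta = 0 (a quarter turn) returns (1, 0).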
2.3 CALCULATION FEATURES
▪ The calculation for each point is identical and independent of the others.
▪ There is a large number of points.
▪ The problem is therefore well suited to parallel computing on a GPU.
2.4 PARALLELIZING
▪ With the OpenCL framework, assign one work-item to the calculation of each point.
▪ There are two ways to implement the algorithm:
– Assign work-items according to the original image.
• For each point, calculate the new position and copy the point to the output image.
• Several work-items may write to the same output location (write conflict).
– Assign work-items according to the output image.
• For each point, calculate the source position and copy the point from the original image.
• Several work-items may read the same source location (read conflict), which is safe because the source is only read.
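A serial CPU sketch of the second method (one work-item per output point) may help make the mapping concrete. Each loop iteration stands in for one work-item. The function name rotate_image_cpu is hypothetical, and two simplifications are assumed: the rotation is taken about the image origin, and the source position is found by applying the formula from 2.2 to the output coordinates and truncating to a pixel.

```c
#include <stddef.h>

/* CPU reference for the output-based decomposition (assumed helper, not from the slides).
 * src and dest are W*H row-major images; sin_theta and cos_theta are precomputed. */
void rotate_image_cpu(const float *src, float *dest, int W, int H,
                      float sin_theta, float cos_theta)
{
    for (int iy = 0; iy < H; ++iy) {
        for (int ix = 0; ix < W; ++ix) {
            /* Source position for output pixel (ix, iy). */
            float xpos = (float)ix * cos_theta + (float)iy * sin_theta;
            float ypos = (float)iy * cos_theta - (float)ix * sin_theta;

            /* Copy only when the source position lies inside the image. */
            if (xpos >= 0.0f && xpos < (float)W &&
                ypos >= 0.0f && ypos < (float)H) {
                int sx = (int)xpos;
                int sy = (int)ypos;
                dest[(size_t)iy * W + ix] = src[(size_t)sy * W + sx];
            }
        }
    }
}
```

Every output pixel is written by exactly one iteration, so no two iterations compete for the same destination. Output pixels whose source falls outside the image are left untouched, so the caller would normally clear dest first.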
2.5 PROGRAMMING
1. Kernel
– Runs on the GPU.
2. Host
– Runs on the CPU.
2.5.1 KERNEL
__kernel void image_rotate(
    __global float *src_data, __global float *dest_data, // Data in global memory
    int W, int H,                                        // Image dimensions
    float sinTheta, float cosTheta)                      // Rotation parameters
{
    // Each work-item gets its index within the index space
    const int ix = get_global_id(0);
    const int iy = get_global_id(1);

    // Calculate the source location of the data to move into (ix, iy):
    // output decomposition, as described in 2.4
    float xpos = ((float)ix) * cosTheta + ((float)iy) * sinTheta;
    float ypos = ((float)iy) * cosTheta - ((float)ix) * sinTheta;

    // Convert to integer pixel coordinates before indexing, so that the
    // row offset sy * W is computed from a whole row number
    int sx = (int)floor(xpos);
    int sy = (int)floor(ypos);

    // Bounds checking
    if (sx >= 0 && sx < W && sy >= 0 && sy < H)
    {
        // Read (sx, sy) from src_data and store at (ix, iy) in dest_data
        dest_data[iy * W + ix] = src_data[sy * W + sx];
    }
}
▪ This kernel rotates the image anticlockwise by the angle θ.
▪ OpenCL C provides built-in math functions such as sin and cos, but here the values are calculated on the host and passed to the kernel as parameters, because they are the same for every work-item.
▪ KernelAnalyzer
– KernelAnalyzer reports that the bottleneck is ALU operations.
– This means the kernel spends most of its time on computation rather than on data transfer.
– By this measure, the kernel achieves high performance.
2.5.2 HOST
Platform
• Query Platform
• Query Devices
• Create Context
• Create Command Queue
Compiler
• Compile Program
• Create Kernel
Runtime
• Create Buffers
• Write Buffers
• Set Kernel Arguments
• Run Kernel
• Read Buffers
▪ Query Platform
    cl_int clGetPlatformIDs(cl_uint num_entries,
                            cl_platform_id *platforms,
                            cl_uint *num_platforms)
– This function is usually called twice: the first call gets the number of platforms, and the second call gets the platforms themselves.
– First call:
• clGetPlatformIDs(0, NULL, &num)
– Second call:
• clGetPlatformIDs(num, platforms, NULL)
▪ Query Devices
    cl_int clGetDeviceIDs(cl_platform_id platform,
                          cl_device_type device_type,
                          cl_uint num_entries,
                          cl_device_id *devices,
                          cl_uint *num_devices)
– Like clGetPlatformIDs, this function is usually called twice.
– device_type can be, for example:
• CL_DEVICE_TYPE_ALL
• CL_DEVICE_TYPE_CPU
• CL_DEVICE_TYPE_GPU
▪ Create Context
    cl_context clCreateContext(
        const cl_context_properties *properties,
        cl_uint num_devices,
        const cl_device_id *devices,
        void (CL_CALLBACK *pfn_notify)(const char *errinfo,
                                       const void *private_info,
                                       size_t cb,
                                       void *user_data),
        void *user_data,
        cl_int *errcode_ret)
▪ Create Command Queue
    cl_command_queue clCreateCommandQueue(
        cl_context context,
        cl_device_id device,
        cl_command_queue_properties properties,
        cl_int *errcode_ret)
▪ Compile Program
    cl_program clCreateProgramWithSource(
        cl_context context,
        cl_uint count,
        const char **strings,
        const size_t *lengths,
        cl_int *errcode_ret)
– clCreateProgramWithSource only creates the program object from source text; call clBuildProgram on it to compile the program before creating a kernel.
▪ Create Kernel
    cl_kernel clCreateKernel(
        cl_program program,
        const char *kernel_name,
        cl_int *errcode_ret)
▪ Create Buffers
    cl_mem clCreateBuffer(cl_context context,
                          cl_mem_flags flags,
                          size_t size,
                          void *host_ptr,
                          cl_int *errcode_ret)
▪ Write Buffers
    cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
                                cl_mem buffer,
                                cl_bool blocking_write,
                                size_t offset,
                                size_t size,
                                const void *ptr,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)
▪ Set Kernel Arguments (one call per argument)
    cl_int clSetKernelArg(cl_kernel kernel,
                          cl_uint arg_index,
                          size_t arg_size,
                          const void *arg_value)
▪ Run Kernel
    cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue,
                                  cl_kernel kernel,
                                  cl_uint work_dim,
                                  const size_t *global_work_offset,
                                  const size_t *global_work_size,
                                  const size_t *local_work_size,
                                  cl_uint num_events_in_wait_list,
                                  const cl_event *event_wait_list,
                                  cl_event *event)
▪ Parameters of clEnqueueNDRangeKernel
– work_dim is the number of dimensions used to specify the global work-items and the work-items in a work-group.
– global_work_offset can point to an array of work_dim unsigned values that describe the offset used to calculate the global ID of a work-item. If global_work_offset is NULL, the global IDs start at offset (0, 0, …, 0).
– local_work_size points to an array of work_dim unsigned values that describe the number of work-items that make up a work-group (also referred to as the size of the work-group) that will execute the kernel specified by kernel.
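– If local_work_size is NULL, the OpenCL implementation decides how to break the global work-items given by global_work_size into appropriate work-group instances.
– If local_work_size is specified, global_work_size must be evenly divisible by local_work_size in each dimension.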
– event_wait_list and num_events_in_wait_list
specify events that need to complete before this
particular command can be executed.
– event returns an event object that identifies this
particular kernel execution instance.
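One common way to satisfy the divisibility requirement is to round each global dimension up to the next multiple of the local size. The helper below is a sketch of that idea; the name round_up_global_size is hypothetical and local_size is assumed to be greater than zero.

```c
#include <stddef.h>

/* Returns the smallest multiple of local_size that is >= n.
 * Assumption: local_size > 0. Hypothetical helper, not an OpenCL API call. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    size_t remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

For an image 1000 pixels wide and a work-group width of 16, this gives a global size of 1008. The extra work-items have global IDs outside the image, so a kernel launched this way needs its own check that the global ID is less than the image dimensions before writing.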
▪ Read Buffers
    cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
                               cl_mem buffer,
                               cl_bool blocking_read,
                               size_t offset,
                               size_t size,
                               void *ptr,
                               cl_uint num_events_in_wait_list,
                               const cl_event *event_wait_list,
                               cl_event *event)
▪ Release
– clReleaseKernel
– clReleaseProgram
– clReleaseMemObject
– clReleaseCommandQueue
– clReleaseContext
– clReleaseDevice
2.6 TOOLS
1. AMD Profiler
2. gDEBugger
2.6.1 AMD PROFILER
▪ Counters
– Shows the runtime information of each kernel.
▪ Trace
– Traces the OpenCL runtime.
2.6.2 GDEBUGGER
▪ Debug inside kernel code.
THANK YOU!
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right
to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify
any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND
ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT,
SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED
HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are
trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered
trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only
and may be trademarks of their respective owners.