Accelerating a climate physics model with OpenCL
CMSC 601 Spring 11 – Research Skills
Dibyajyoti Ghosh
What is a climate physics model?
• Global weather is controlled by many interconnected events,
including changes in the atmosphere and oceans and the ebb and
flow of sea ice.
• The world’s most powerful supercomputers can simulate these
events.
• The CCSM-2 model simulates Earth’s climate patterns in
considerable detail, performing 700 billion calculations to
recreate a single day of the world’s climate.
• Scientists use these data to understand ocean currents,
predict weather patterns, and study the ozone (O3) layer, among other things.
http://www.ucar.edu/communications/CCSM/overview.html
Background
• The solar radiation component of
NASA’s GEOS-v5 takes ~20% of
model computation time.
• NASA is interested in analyzing the
performance and cost benefit of
non-traditional computing
systems.
• GEOS-v5 is 20+ years old, written
mostly in Fortran, and still evolving.
• It cannot be entirely rewritten
due to production constraints.
http://www.ucar.edu/communications/CCSM/overview.html
Related Work
• Accelerating Climate Models with the IBM Cell Processor.
Shujia Zhou et al., 2008.
• GPU Computing for Atmospheric Modeling. Rory Kelly,
NCAR, Boulder, July-Aug. 2010.
• Accelerating Atmospheric Modeling Through Emerging
Multi-core Technologies. John Christian Linford, Virginia
Tech, 2010.
• Exploiting Array Syntax in Fortran for Accelerator
Programming. Matthew J. Sottile, Craig E. Rasmussen, Wayne
N. Weseloh, Robert W. Robey, Los Alamos National Laboratory.
Motivation
• OpenCL was created with the goal of unifying hybrid systems.
• There is no literature on OpenCL portability among architectures.
• There is no data on how OpenCL fares against GCC in vectorization.
http://www.cc.gatech.edu/~bader/AFRL-GT-Workshop2009/AFRL-GT-Bader.pdf
What is vectorization?
[Figure: vectorization with vectorization factor VF = 4]
• Data in memory: a b c d e f g h i j k l m n o p
• Data elements are packed into vectors held in vector registers (VR1 to VR5), each with lanes 0 to 3.
• Vector length: Vectorization Factor (VF).
• Scalar operations OP(a), OP(b), OP(c), OP(d) handle one element each; a single vector operation VOP(a, b, c, d) handles all four elements in VR1.
Original serial loop:
    for (i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
    }
Loop in vector notation:
    for (i = 0; i < N; i += VF) {
        a[i:i+VF-1] = a[i:i+VF-1] + b[i:i+VF-1];
    }
Thanks to Dorit Nuzman, IBM www.hipeac.net/system/files/4_Nuzman.ppt for this wonderful slide
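The vector-notation loop on this slide is not valid C, so the following is a minimal sketch of the same idea in plain C. The function names and the choice of float are assumptions made for illustration. The inner loop stands in for one vector operation, and a scalar epilogue handles the elements left over when N is not a multiple of VF.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* vectorization factor: elements per vector operation (assumed) */

/* Original serial loop: one element per iteration. */
void add_serial(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + b[i];
    }
}

/* Strip-mined loop: each outer iteration handles VF elements, mirroring
 * a[i:i+VF-1] = a[i:i+VF-1] + b[i:i+VF-1]. The inner loop stands in for
 * one vector operation; a compiler may map it to a SIMD instruction. */
void add_strip_mined(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        for (size_t lane = 0; lane < VF; lane++) {
            a[i + lane] = a[i + lane] + b[i + lane];
        }
    }
    /* Scalar epilogue for the leftover n % VF elements. */
    for (; i < n; i++) {
        a[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result; the second only makes the grouping into blocks of VF explicit.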
OpenCL trivia
• A framework for heterogeneous computing
resources, initially developed by Apple Inc. and now
supported by all major vendors.
• Based on a subset of the C language, with additional
features to facilitate parallel processing.
http://www.khronos.org/opencl/
How does data parallelism work in OpenCL?
• A kernel is the code for a work item that is executed on a device
(CPU, GPU, or other).
• Imagine an NxN grid with one kernel invocation per grid point.
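The grid picture can be sketched in plain C as a serial emulation. This is not the OpenCL host API: the names scale_kernel and launch_grid are hypothetical, and the scaling operation is only a placeholder for real work. In an actual OpenCL C kernel the two indices would come from the built-in get_global_id(0) and get_global_id(1) instead of function parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an OpenCL kernel body: the work done by ONE work item.
 * In OpenCL C, gx and gy would be read with get_global_id(0) and
 * get_global_id(1); here they are parameters (plain C emulation). */
static void scale_kernel(size_t gx, size_t gy, size_t n,
                         const float *in, float *out, float factor)
{
    out[gy * n + gx] = factor * in[gy * n + gx];
}

/* Emulates launching the kernel over an n x n grid: one invocation
 * (work item) per grid point. A real device may run these in parallel;
 * this loop runs them one after another. */
void launch_grid(size_t n, const float *in, float *out, float factor)
{
    assert(in != NULL && out != NULL);
    for (size_t gy = 0; gy < n; gy++) {
        for (size_t gx = 0; gx < n; gx++) {
            scale_kernel(gx, gy, n, in, out, factor);
        }
    }
}
```

Because no work item reads what another one writes, the invocations are independent, which is what lets a device execute them concurrently.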
Our Approach
Used code from the production version of the
NASA GEOS-v5 climate model.
• Step #1 – Identify computation-intensive
sections of the climate model.
• Step #2 – Port these sections to OpenCL on
IBM Cell B.E. and then to Mac OS X to test on
an Intel CPU.
• Step #3 – Analyze the performance and explain
the results.
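Analyzing performance comes down to timing the serial and the accelerated versions of the same section and taking their ratio. The sketch below shows one way to do that in C; it is not the measurement code used in this work, and time_seconds and speedup are hypothetical helper names. Note that clock() reports processor time, so for code running on several cores or on a separate device a wall-clock timer would likely be the better choice.

```c
#include <assert.h>
#include <time.h>

/* Runs fn(arg) once and returns the elapsed processor time in seconds. */
double time_seconds(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    clock_t start = clock();
    fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Speedup = baseline (serial) time / accelerated time. */
double speedup(double t_serial, double t_accelerated)
{
    assert(t_accelerated > 0.0);
    return t_serial / t_accelerated;
}
```

For example, a section that takes 40 seconds serially and 1 second after porting has a speedup of 40.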
Findings - I
Speedup on IBM Cell B.E. with OpenCL
Speedup on Mac OS X with OpenCL
Serial vs. parallel speedup of a code section analyzed on Mac OS X
Findings - II
1. A speedup of ~40x was achieved on both the IBM and Intel CPUs.
2. The code is NOT portable among architectures: sections of the code
do not function on the Intel-based Mac OS X architecture because of
its incomplete OpenCL implementation.
3. GCC vectorization fails in certain cases where OpenCL vectorizes the same loop.
We attempted compilation of the serial code with the
gcc -O2 -ftree-vectorize flags.
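The slides do not state which loops GCC failed to vectorize or why. As an illustration only, the sketch below contrasts one well-known obstacle, a loop-carried dependence, with a loop whose iterations are independent. Both functions are hypothetical and are not taken from the GEOS-v5 code.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-carried dependence: iteration i reads the value written by
 * iteration i-1, so the iterations cannot simply run side by side. */
void running_sum(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i - 1] + b[i];
    }
}

/* Independent iterations; restrict promises the arrays do not overlap,
 * which removes one obstacle to auto-vectorization. */
void scaled_add(float *restrict y, const float *restrict x, float s, size_t n)
{
    assert(y != NULL && x != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = y[i] + s * x[i];
    }
}
```

Compiling with gcc -O2 -ftree-vectorize, as above, and adding -fopt-info-vec-missed on newer GCC releases should report which loops were skipped and the reason given by the vectorizer.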
Road Ahead
• Make appropriate changes to the solar
radiation code for the Mac OS X Intel CPU based
architecture. Remember that some parts of the
code base are non-functional on Intel CPUs.
• Modify the OpenCL code to run on GPUs and
understand whether performance is portable, in
addition to code.
Summary
• OpenCL still has a long road ahead toward
portability in high performance computing.
• GCC vectorization fails in certain cases where OpenCL
vectorizes the same loop.
Acknowledgements
• Dr. Shujia Zhou, MC2 Lab
• Fahad Zafar, MC2 Lab
• Center for Hybrid Multicore Productivity
Research, UMBC
• CMSC 601 folks
http://www.asianjobportal.com/wp-content/uploads/2010/11/25_questions_interview.jpg
Vectorization Analysis - I
A part of the serial code with gcc vectorization error output
Vectorization Analysis - II
A part of the OpenCL code with the vectorized instruction set for the loop construct on
the previous slide