Session: Supercomputer/GPU and Algorithms (GPU-2)
A GPU Accelerated Explicit Finite-volume Euler Equation Solver with Ghost-cell Approach
F.-A. Kuo1,2, M.R. Smith3, and J.-S. Wu1*
1 Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan
2 National Center for High-Performance Computing, NARL, Hsinchu, Taiwan
3 Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
*E-mail: chongsin@faculty.nctu.edu.tw
2013 IWCSE
Taipei, Taiwan
October 14-17, 2013
Outline
• Background & Motivation
• Objectives
• Split HLL (SHLL) Scheme
• Cubic-Spline Immersed Boundary Method (IBM)
• Results & Discussion
  – Parallel Performance
  – Demonstrations
• Conclusion and Future Work
Background & Motivation
Parallel CFD
• Computational fluid dynamics (CFD) has played an important role in accelerating the progress of aerospace/space and other technologies.
• For several challenging 3D flow problems, parallel computing of CFD becomes necessary to greatly shorten the very lengthy computation time.
• Parallel computing of CFD has evolved from SIMD-type vectorized processing to SPMD-type distributed-memory processing over the past two decades, mainly because of the latter's much lower hardware cost and easier programming.
SIMD vs. SPMD
• SIMD (single instruction, multiple data), a class of parallel computing, performs the same operation on multiple data points simultaneously at the instruction level.
  – e.g., SSE/AVX instructions on CPUs, and GPU computation such as CUDA.
• SPMD (single program, multiple data) is a higher-level abstraction in which copies of a program run across multiple processors and operate on different subsets of the data.
  – e.g., message-passing programming on distributed-memory architectures, such as MPI.
MPI vs. CUDA
• Most well-known parallel CFD codes adopt SPMD parallelism using MPI.
  – e.g., Fluent (Ansys) and CFL3D (NASA), to name a few.
• Recently, because of the potentially very high cost/performance (C/P) ratio of graphics processing units (GPUs), parallelizing CFD codes on GPUs with CUDA, developed by Nvidia, has become an active research area.
• However, redesign of the numerical scheme may be necessary to take full advantage of the GPU architecture.
Split HLL Scheme on GPUs
• Split Harten-Lax-van Leer (SHLL) scheme (Kuo et al., 2011)
  – a highly local numerical scheme, modified from the original HLL scheme
  – Cartesian grid
  – ~60× speedup (Nvidia Tesla C1060 GPU vs. Intel Xeon X5472 CPU) with an explicit implementation
• However, it is difficult to treat objects with complex geometry accurately, especially for high-speed gas flows. An example is given on the next page.
• Thus, retaining the easy implementation of a Cartesian grid on GPUs while improving the treatment of objects with complex geometry becomes important in further extending the applicability of the SHLL scheme in CFD simulations.
Staircase-like vs. IBM
[Figure: shock propagating over a solid surface represented staircase-like vs. with the IBM; shock direction indicated]
• Spurious waves are often generated when a staircase-like solid surface is used for high-speed gas flows.
Immersed Boundary Method
• Immersed boundary method (IBM) (Peskin, 1972; Mittal & Iaccarino, 2005)
  – easy treatment of objects with complex geometry on a Cartesian grid
  – grid computation near the objects becomes automatic or very easy
  – easy treatment of moving objects in the computational domain without remeshing
• The major idea of the IBM is simply to enforce the BCs at computational grid points through interpolation among fluid grid points and the BCs at solid boundaries.
• The stencil of the IBM operation is local in general,
  – enabling efficient use of the original numerical scheme, e.g., SHLL
  – easy parallel implementation
Objectives
Goals
• To develop and validate an explicit cell-centered finite-volume solver for the Euler equations, based on the SHLL scheme, on a Cartesian grid with the cubic-spline IBM on multiple GPUs
• To study the parallel performance of the code on single and multiple GPUs
• To demonstrate the capability of the code with several applications
Split HLL Scheme
SHLL Scheme - 1
[Figure: 1-D stencil of cells i−1, i, i+1 with the +Flux and −Flux contributions at the interfaces; SIMD model for the 2D flux computation]

Original HLL flux:

$$F = \frac{F_L - S_L U_L}{1 - S_L / S_R} + \frac{F_R - S_R U_R}{1 - S_R / S_L}$$

Introduce local approximations: the wave speeds $S_L \approx u - a$ and $S_R \approx u + a$ are evaluated from each cell's own data.

Final form (SHLL), a highly local scheme:

$$F^{+} = \frac{M_L + 1}{2}\, F_L + \frac{1 - M_L^2}{2}\, U_L\, a_L, \qquad
F^{-} = \frac{1 - M_R}{2}\, F_R - \frac{1 - M_R^2}{2}\, U_R\, a_R$$

where $M = u/a$ is the local Mach number and the interface flux is $F_{i+1/2} = F^{+}_{i} + F^{-}_{i+1}$.

• The new $S_R$ and $S_L$ terms are approximated without involving neighbor-cell data.
• A highly local flux computation scheme: great for GPU!
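To see how the final form follows, substitute the local wave-speed approximations into the left term of the split HLL flux; a short check (using only left-cell data, under the stated assumptions):

```latex
% Derivation sketch for F^+, assuming the local approximations
% S_L = u_L - a_L and S_R = u_L + a_L (left-cell data only):
\begin{align*}
F^{+} &= \frac{F_L - S_L U_L}{1 - S_L/S_R}
       = \big(F_L - (u_L - a_L)\,U_L\big)\,\frac{u_L + a_L}{2 a_L} \\
      &= \frac{u_L + a_L}{2 a_L}\,F_L - \frac{u_L^2 - a_L^2}{2 a_L}\,U_L
       = \frac{M_L + 1}{2}\,F_L + \frac{1 - M_L^2}{2}\,a_L U_L .
\end{align*}
% The F^- term follows analogously from right-cell data.
```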
SHLL Scheme - 2
[Figure: same 1-D stencil (cells i−1, i, i+1) with the SHLL final form repeated; SIMD model for the 2D flux computation]
• The flux computation is perfect for GPU application.
• Almost the same as the vector-addition case.
• >60× speedup is possible using a single Tesla C1060 GPU device, compared to a single thread of a high-performance CPU (Intel Xeon X5472).
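Because each split flux depends only on one cell's own state, the scheme maps naturally onto one GPU thread per interface. Below is a minimal CUDA sketch of a 1-D SHLL flux kernel; the data layout, kernel names, and 3-variable Euler state are illustrative assumptions, not the authors' code.

```cuda
#include <math.h>

#define NVAR 3          // conserved variables: rho, rho*u, E
#define GAMMA 1.4f

// F^+ (left = true) or F^- (left = false) from a single cell state U.
__device__ void shll_half_flux(const float U[NVAR], bool left, float F[NVAR]) {
    float rho = U[0], u = U[1] / U[0];
    float p = (GAMMA - 1.0f) * (U[2] - 0.5f * rho * u * u);
    float a = sqrtf(GAMMA * p / rho);   // local sound speed
    float M = u / a;                    // local Mach number
    float Fu[NVAR] = { rho * u, rho * u * u + p, (U[2] + p) * u };  // F(U)
    // SHLL: F^+ = (M+1)/2 F + (1-M^2)/2 aU,  F^- = (1-M)/2 F - (1-M^2)/2 aU
    float c1 = left ? 0.5f * (M + 1.0f) : 0.5f * (1.0f - M);
    float c2 = (left ? 1.0f : -1.0f) * 0.5f * (1.0f - M * M) * a;
    for (int k = 0; k < NVAR; ++k)
        F[k] = c1 * Fu[k] + c2 * U[k];
}

// One thread per interface i+1/2: flux = F^+(U_i) + F^-(U_{i+1}).
// Each thread touches only two adjacent cells, which is why the computation
// is "almost the same as the vector-addition case".
__global__ void shll_flux_kernel(const float* U, float* flux, int ncell) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= ncell - 1) return;
    float Fp[NVAR], Fm[NVAR];
    shll_half_flux(&U[NVAR * i],       true,  Fp);
    shll_half_flux(&U[NVAR * (i + 1)], false, Fm);
    for (int k = 0; k < NVAR; ++k)
        flux[NVAR * i + k] = Fp[k] + Fm[k];
}
```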
Cubic-spline IBM
Two Critical Issues of IBM
• How to approximate solid boundaries?
  – Local cubic splines reconstruct the solid boundaries with far fewer points
  – Easier calculation of surface normals/tangents
• How to apply the IBM in a cell-centered FVM framework?
  – Ghost-cell approach
  – Obtain ghost-cell properties by interpolation of data among neighboring fluid cells
  – Enforce the BCs at solid boundaries on the ghost cells through data mapping from the image points
Cell Identification
1. Define a cubic-spline function for each segment of boundary data to best fit the solid-boundary geometry
2. Identify all the solid cells, fluid cells, and ghost cells
3. Locate the image points corresponding to the ghost cells
[Figure: Cartesian grid with fluid cells, ghost cells, solid cells, and the solid boundary curve]
Cubic-Spline Reconstruction (Solid Boundary)
• The cubic-spline method provides several advantages:
  1. A high-order curve fit of the boundary
  2. Easy identification of the ghost cells
  3. Straightforward calculation of the vector normal to the body surface
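As an illustration of how a parametric cubic-spline segment can be queried for a boundary point and its normal, here is a sketch; the coefficient layout and helper name are assumptions, not the authors' implementation.

```cuda
#include <math.h>

// One parametric cubic-spline boundary segment:
//   x(t) = ax + bx*t + cx*t^2 + dx*t^3,  y(t) likewise,  t in [0, 1].
struct SplineSeg {
    float ax, bx, cx, dx;
    float ay, by, cy, dy;
};

// Evaluate the segment at parameter t; return the point and a unit normal.
__host__ __device__ void spline_point_normal(const SplineSeg& s, float t,
                                             float& x, float& y,
                                             float& nx, float& ny) {
    x = s.ax + t * (s.bx + t * (s.cx + t * s.dx));
    y = s.ay + t * (s.by + t * (s.cy + t * s.dy));
    // Tangent from the analytic derivative of the cubic
    float tx = s.bx + t * (2.0f * s.cx + 3.0f * s.dx * t);
    float ty = s.by + t * (2.0f * s.cy + 3.0f * s.dy * t);
    float len = sqrtf(tx * tx + ty * ty);
    // Rotate the unit tangent 90 degrees to get a normal; which of the two
    // normals points into the fluid depends on the segment orientation.
    nx = -ty / len;
    ny =  tx / len;
}
```

The image point of a ghost cell can then be taken as the mirror of the ghost-cell center across the nearest boundary point along this normal.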
BCs of Euler Eqns.
Approximated form ($n$: unit normal of the body surface):

$$\frac{\partial \rho}{\partial n} = 0, \qquad \frac{\partial V}{\partial n} = 0, \qquad \frac{\partial T}{\partial n} = 0$$

Mapping between image points and ghost cells:

$$\rho_{\mathrm{image}} = \rho_{\mathrm{ghost}}, \quad V_{n,\mathrm{image}} = -V_{n,\mathrm{ghost}}, \quad V_{t,\mathrm{image}} = V_{t,\mathrm{ghost}}, \quad T_{\mathrm{image}} = T_{\mathrm{ghost}}$$
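A minimal sketch of applying this mapping once the image-point state has been interpolated (see the next slide); the primitive-variable struct and names are illustrative assumptions. Reflecting the normal velocity enforces flow tangency at the wall.

```cuda
struct Prim { float rho, u, v, T; };   // primitive state (assumed layout)

// Ghost-cell state from the interpolated image-point state and the unit
// wall normal (nx, ny): rho, T, and V_t carry over, V_n flips sign.
__device__ Prim ghost_from_image(const Prim& img, float nx, float ny) {
    float vn = img.u * nx + img.v * ny;   // normal velocity component
    Prim g = img;
    g.u = img.u - 2.0f * vn * nx;         // V' = V - 2 (V.n) n
    g.v = img.v - 2.0f * vn * ny;
    return g;
}
```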
IBM Procedures
[Figure: a ghost point, its image point in the fluid, and the neighboring fluid cells used for interpolation]
• Approximate the properties of the image points using bilinear interpolation among the neighboring fluid cells.
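A sketch of the bilinear-interpolation step on a uniform Cartesian grid; the cell-centered layout, grid origin, and function name are assumptions. In practice the stencil must be adjusted when one of the four neighbors is not a fluid cell.

```cuda
#include <math.h>

// Bilinearly interpolate a cell-centered field phi at the image point
// (xp, yp), on a uniform grid with spacing h and domain origin (x0, y0).
__device__ float bilinear_at(const float* phi, int nx_cells,
                             float x0, float y0, float h,
                             float xp, float yp) {
    // Continuous index of the point relative to cell centers
    float gx = (xp - x0) / h - 0.5f;
    float gy = (yp - y0) / h - 0.5f;
    int i = (int)floorf(gx);
    int j = (int)floorf(gy);
    float fx = gx - i, fy = gy - j;   // fractional offsets in [0, 1)
    // Weighted average of the four surrounding cell-centered values
    return (1.0f - fx) * (1.0f - fy) * phi[j * nx_cells + i]
         +         fx  * (1.0f - fy) * phi[j * nx_cells + i + 1]
         + (1.0f - fx) *         fy  * phi[(j + 1) * nx_cells + i]
         +         fx  *         fy  * phi[(j + 1) * nx_cells + i + 1];
}
```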
SHLL/IBM Scheme on GPU
Nearly All-Device Computation
[Flow chart: Start → set GPU device ID and flow time → initialize → loop { flux calculation → state calculation → IBM → CFL calculation → new dt; flowtime += dt } until T > flowtime → output the result]
• All of the calculations in the loop run on the device; only the new time step dt is passed back to the host.
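A host-side sketch of this loop; the kernel names, launch sizes, and the single scalar copy of dt are illustrative assumptions consistent with the chart, not the authors' code.

```cuda
#include <cuda_runtime.h>

// Placeholder kernels standing in for the solver stages in the flow chart.
__global__ void init_kernel()            { /* initialize fields */ }
__global__ void flux_kernel()            { /* SHLL flux calculation */ }
__global__ void state_kernel()           { /* conserved-variable update */ }
__global__ void ibm_kernel()             { /* ghost-cell IBM step */ }
__global__ void cfl_kernel(float* d_dt)  { if (threadIdx.x == 0) *d_dt = 1e-4f; }

void run_solver(float flowtime_end) {
    cudaSetDevice(0);                     // "Set GPU device ID"
    dim3 grid(256), block(256);           // illustrative launch configuration
    init_kernel<<<grid, block>>>();
    float t = 0.0f, dt = 0.0f;
    float* d_dt = nullptr;
    cudaMalloc(&d_dt, sizeof(float));
    while (t < flowtime_end) {            // loop until T > flowtime
        flux_kernel<<<grid, block>>>();
        state_kernel<<<grid, block>>>();
        ibm_kernel<<<grid, block>>>();
        cfl_kernel<<<grid, block>>>(d_dt);
        // The only per-step device-to-host transfer: the new time step
        cudaMemcpy(&dt, d_dt, sizeof(float), cudaMemcpyDeviceToHost);
        t += dt;
    }
    cudaFree(d_dt);
    // Copy the fields back once at the end and output the result.
}
```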
Results & Discussion (Parallel Performance)
Parallel Performance - 1
• Also known as "Schardin's problem"
• Test conditions:
  – Moving shock with Mach number 1.5
  – Resolution: 2000×2000 cells
  – CFL_max = 0.2
  – Physical time: 0.35 s, requiring 9843 time steps using one GPU
[Figure: problem geometry (L = 1, H = 1) with the moving shock at x = 0.2 at t = 0]
Parallel Performance - 2
• Resolution: 2000×2000 cells
• GPU cluster
  – GPU: GeForce GTX 590 (2× 512 cores, 1.2 GHz, 3 GB GDDR5)
  – CPU: Intel Xeon X5472
• Overhead with IBM: only ~3%
• Speedup
  – GPU/CPU: ~60×
  – GPU/GPU: 1.9 @ 2 GPUs
  – GPU/GPU: 3.6 @ 4 GPUs
[Bar chart: compute time (s) and parallel speedup for 1, 2, and 4 GPUs]
Results & Discussion (Demonstrations)
Shock over a finite wedge - 1
[Figure: density fields w/o IBM and w/ IBM]
• In the 400×400-cell case w/o IBM, the staircase solid boundary generates spurious waves, which destroy the accuracy of the surface properties.
• By comparison, the case w/ IBM shows a marked improvement in the surface properties.
Shock over a finite wedge - 2
• Density contour comparison at t = 0.35 s: with IBM vs. w/o IBM
[Figure: density contours, w/ IBM and w/o IBM]
• All important physical phenomena are well captured by the solver with IBM, without spurious wave generation.
Transonic Flow past a NACA Airfoil
[Figure: pressure fields; staircase boundary w/o IBM vs. IBM result]
• With the staircase boundary, spurious waves appear near the solid surface; applying the IBM corrects the boundary treatment and removes them.
Transonic Flow past a NACA Airfoil
• Distribution of pressure around the surface of the airfoil (upper and lower surfaces)
[Figure: surface-pressure distributions; present approach vs. the ghost-cell method of J. Liu et al., 2009]
• The two results agree very closely: one is from Liu et al. (2009), the other from the present cubic-spline IBM.
Transonic Flow past a NACA Airfoil
• Top-side shock-wave comparison: present approach vs. Furmánek* (2008)

Transonic Flow past a NACA Airfoil
• Bottom-side shock-wave comparison: present approach vs. Furmánek* (2008)

* Petr Furmánek, "Numerical Solution of Steady and Unsteady Compressible Flow", Czech Technical University in Prague, 2008
Conclusion & Future Work
Summary
• A cell-centered 2-D finite-volume solver for the inviscid Euler equations, which can easily treat objects with complex geometry on a Cartesian grid by using the cubic-spline IBM on multiple GPUs, has been completed and validated.
  – The addition of the cubic-spline IBM increases the computational time by only ~3%, which is negligible.
• The GPU/CPU speedup generally exceeds 60× on a single GPU (Nvidia Tesla C1060) as compared to a single thread of an Intel Xeon X5472 CPU.
• The multi-GPU speedup reaches 3.6 at 4 GPUs (GeForce) for a simulation with 2000×2000 cells.
Future Work
• To extend the Cartesian grid to an adaptive mesh refinement grid
• To simulate moving-boundary problems and real-life problems with this immersed boundary method
• To replace the SHLL solver with a true-direction finite-volume solver, such as QDS
Thank you for your patience. Questions?