Part 1

HPC Middleware (HPC-MW)
Infrastructure for Scientific Applications on HPC Environments
Overview and Recent Progress

Kengo Nakajima, RIST
3rd ACES WG Meeting, June 5th & 6th, 2003
Brisbane, QLD, Australia
My Talk
- HPC-MW (35 min.)
- Hashimoto/Matsu'ura Code on ES (5 min.)
Table of Contents
- Background
- Basic Strategy
- Development
- On-Going Works/Collaborations
- XML-based System for User-Defined Data Structure
- Future Works
- Examples
Frontier Simulation Software for Industrial Science (FSIS)
http://www.fsis.iis.u-tokyo.ac.jp
- Part of the "IT Project" in MEXT (Ministry of Education, Culture, Sports, Science & Technology)
- HQ at the Institute of Industrial Science, University of Tokyo
- 5 years, 12M USD/yr. (decreasing ...)
- 7 internal projects, > 100 people involved
- Focused on industry/public use and commercialization
Frontier Simulation Software for Industrial Science (FSIS) (cont.)
http://www.fsis.iis.u-tokyo.ac.jp
- Quantum Chemistry
- Quantum Molecular Interaction System
- Nano-Scale Device Simulation
- Fluid Dynamics Simulation
- Structural Analysis
- Problem Solving Environment (PSE)
- High-Performance Computing Middleware (HPC-MW)
Background
- Various types of HPC platforms
  - Parallel computers
    - PC clusters
    - MPP with distributed memory
    - SMP clusters (8-way, 16-way, 256-way): ASCI, Earth Simulator
  - Power, HP-RISC, Alpha/Itanium, Pentium, vector PEs
  - GRID environment -> various resources
- Parallel/single-PE optimization is important !!
  - Portability under a GRID environment
  - Machine-dependent optimization/tuning
  - Everyone knows that ... but it is a big task, especially for application experts and scientists.
Reordering for SMP Cluster with Vector PEs: ILU Factorization
[Figure: comparison of matrix orderings - PDJDS/CM-RCM, PDCRS/CM-RCM (short innermost loop), and CRS with no re-ordering]
3D Elastic Simulation: Problem Size vs. GFLOPS
- Earth Simulator, 1 SMP node (8 PEs)
  [Figure: GFLOPS (1.E-02 to 1.E+02, log scale) vs. DOF (1.E+04 to 1.E+07); ●: PDJDS/CM-RCM, ■: PDCRS/CM-RCM, ▲: Natural Ordering]
- Intel Xeon 2.8 GHz, 8 PEs
  [Figure: GFLOPS (1.50 to 3.00) vs. DOF (1.E+04 to 1.E+07); ●: PDJDS/CM-RCM, ■: PDCRS/CM-RCM, ▲: Natural Ordering]
Parallel Volume Rendering
MHD Simulation of Outer Core
[Figure: parallel volume-rendering result]

Volume Rendering Module using Voxels
- On PC Cluster: hierarchical background voxels, linked list
- On Earth Simulator: globally fine voxels, static array
Background (cont.)
- Simulation methods such as FEM, FDM, etc. have several typical processes for computation.

"Parallel" FEM Procedure
- Pre-Processing: Initial Grid Data, Partitioning
- Main: Data Input/Output, Matrix Assemble, Linear Solvers, Domain-Specific Algorithms/Models
- Post-Processing: Post Proc., Visualization
Background (cont.)
- Simulation methods such as FEM, FDM, etc. have several typical processes for computation.
- How about "hiding" these processes from users with middleware between the applications and the compilers ?
  - Development becomes efficient, reliable, portable, and easy to maintain.
  - It accelerates advancement of the applications (= physics).
  - HPC-MW = middleware close to the "Application Layer"

  Applications
  -----------------  <- Middleware fills the big gap here
  Compilers, MPI etc.
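As a minimal sketch of what this buys the user: the driver below contains only middleware-level calls, with I/O, assembly, solver, and communication hidden behind them. The call names follow the "Main Program" example shown near the end of this talk; everything else about the interfaces (e.g. where valA, valB, valX and the parameter arrays live) is assumed here.

      !-- minimal sketch of an FEM driver on top of the middleware
      !   (call names follow the "Main Program" slide later in this talk;
      !    working variables are assumed to come from the middleware modules)
      program fem_on_hpcmw
        use hpcmw_all
        implicit REAL*8 (A-H,O-Z)

        call HPCMW_INIT                       ! parallel environment (MPI hidden)
        call INPUT_CNTL                       ! control data
        call INPUT_GRID                       ! distributed local mesh

        call MAT_CON0                         ! matrix connectivity
        call MAT_CON1
        call MAT_ASS_MAIN (valA,valB,valX)    ! element matrices -> global matrix
        call MAT_ASS_BC                       ! boundary conditions

        call SOLVE33 (hpcmwIarray, hpcmwRarray)   ! parallel iterative solver
        call HPCMW_FINALIZE
      end program fem_on_hpcmw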
Example of HPC Middleware
Simulation methods include some typical processes.
[Figure: FEM = I/O + Matrix Assemble + Linear Solver + Visualization]

Example of HPC Middleware
Each individual process can be optimized for various types of MPP architectures.
[Figure: the FEM processes (I/O, Matrix Assemble, Linear Solver, Visualization) mapped to MPP-A, MPP-B, MPP-C]

Example of HPC Middleware
Library-Type HPC-MW for Existing HW
[Figure: FEM code developed on a PC]
Example of HPC Middleware
Library-Type HPC-MW for Existing HW
[Figure: the FEM code developed on a PC (I/O, Matrix Assemble, Linear Solver, Vis.) and HPC-MW libraries for the Earth Simulator, the Xeon Cluster, and the Hitachi SR8000, each providing the same components]

Example of HPC Middleware
Library-Type HPC-MW for Existing HW
[Figure: the FEM code accesses I/O, Matrix Assemble, Linear Solver, and Vis. through common interfaces (I/F for I/O, Mat.Ass., Solvers, Vis.), behind which sit the HPC-MW libraries for each platform]
Example of HPC Middleware
Parallel FEM Code Optimized for ES
[Figure: linking the FEM code through the common I/F against HPC-MW for the Earth Simulator yields a parallel FEM code optimized for the ES]

Example of HPC Middleware
Parallel FEM Code Optimized for Intel Xeon / for SR8000
[Figure: the same FEM code linked against HPC-MW for the Xeon Cluster or the Hitachi SR8000 is optimized for those platforms]
Table of Contents (current section: Basic Strategy)
HPC Middleware (HPC-MW) ?
- Based on the idea of "Plug-in" in GeoFEM.
System Config. of GeoFEM
http://geofem.tokyo.rist.or.jp/
[Figure: the GeoFEM platform (parallel I/O, equation solvers, visualizer) is accessed through Comm. I/F, Solver I/F, and Vis. I/F by pluggable analysis modules: structural analysis (static linear, dynamic linear, contact), fluid, and wave; utilities include a partitioner that splits a one-domain mesh into partitioned meshes for the PEs, and GPPView for visualization data]
Local Data Structure
Node-based Partitioning: internal nodes - elements - external nodes
[Figure: example of a small 2D mesh partitioned into PE#0-PE#3; each PE keeps its internal nodes, the elements that include them, and the external nodes imported from neighboring PEs]
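A minimal sketch of the distributed local data that this node-based partitioning produces is given below. The field names follow the hpcmw_local_mesh type used in the coupler and solver examples later in this talk; the exact contents of the real type (e.g. the coordinate array) are assumptions here.

      !-- sketch of a node-based distributed local data structure
      !   (field names follow hpcmw_local_mesh as used later in this talk;
      !    the complete contents of the real type are not shown)
      module hpcmw_mesh_sketch
        integer, parameter :: kint = 4, kreal = 8

        type hpcmw_local_mesh
          integer(kind=kint) :: n_internal     ! internal nodes owned by this PE
          integer(kind=kint) :: n_node         ! internal + external (imported) nodes
          integer(kind=kint) :: n_elem         ! elements touching internal nodes
          integer(kind=kint) :: n_neighbor_pe  ! number of neighboring PEs
          integer(kind=kint), pointer :: neighbor_pe(:)
          ! communication tables: external nodes to import / boundary nodes to export
          integer(kind=kint), pointer :: import_index(:), import_node(:)
          integer(kind=kint), pointer :: export_index(:), export_node(:)
          real(kind=kreal),   pointer :: node(:,:)                  ! coordinates (assumed)
          integer(kind=kint), pointer :: elem_index(:), elem_ptr(:) ! connectivity
        end type hpcmw_local_mesh
      end module hpcmw_mesh_sketch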
What can we do with HPC-MW ?
- We can easily develop optimized/parallel code on HPC-MW from the user's code developed on a PC.
- Library-Type
  - most fundamental approach
  - optimized libraries for individual architectures
- Compiler-Type
  - next-generation architectures
  - irregular data
- Network-Type
  - GRID environment (heterogeneous)
  - large-scale computing (virtual petaflops), coupling
HPC-MW Procedure
[Figure: overall workflow -
 - utility software for parallel mesh generation & partitioning: control data + initial entire mesh data -> mesh generation -> partitioning -> distributed local mesh data;
 - the user's FEM source code becomes a parallel FEM executable in one of three ways: LINKed against the library-type HPC-MW (built with HW data), fed through the compiler-type HPC-MW generator (which generates linear-solver source code etc. from an HPC-MW prototype source file and HW data), or fed through the network-type HPC-MW generator (which generates linear-solver source code etc. using network/HW data);
 - the parallel FEM executable reads FEM control data and the distributed local mesh data, and writes distributed local results plus patch files and image files for visualization; communication via MPICH / MPICH-G]
Library-Type HPC-MW
Parallel FEM Code Optimized for ES
[Figure: same diagram as in "Example of HPC Middleware: Parallel FEM Code Optimized for ES" above]
Compiler-Type HPC-MW
Optimized code is generated by a special language/compiler based on analysis data (cache blocking etc.) and H/W information.
[Figure: the FEM processes (I/O, Matrix Assemble, Linear Solver, Visualization), data for the analysis model, and parameters of the H/W are fed to a special compiler, which generates code for MPP-A, MPP-B, MPP-C]
Network-Type HPC-MW
Heterogeneous Environment = "Virtual" Supercomputer
[Figure: the analysis model space of an FEM application distributed over heterogeneous resources]
What is new, what is nice ?
- Application-oriented (limited to FEM at this stage)
  - Various types of capabilities for parallel FEM are supported.
  - NOT just a library
- Optimized for individual hardware
  - single-PE performance
  - parallel performance
- Similar projects
  - Cactus
  - GridLab (on Cactus)
Table of Contents (current section: Development)
Schedule
[Gantt chart, FY.2002-FY.2006: basic design and a prototype first; library-type HPC-MW (scalar, then vector); compiler-type HPC-MW; network-type HPC-MW; FEM codes on HPC-MW; public releases along the way]
System for Development
[Figure: development cycle - HW vendors supply HW info to the HPC-MW infrastructure (library-, compiler-, and network-type: I/O, Vis., AMR, Solvers, Coupler, DLB, Mat.Ass.); FEM codes on HPC-MW (solid, fluid, thermal) feed application info back; public releases go to public users, whose feedback and comments return to the project]
FY.2003
- Library-Type HPC-MW
  - FORTRAN90, C, PC-cluster version
  - Public release
    - Sept. 2003: prototype released
    - March 2004: full version for PC clusters
- Demonstration of the Network-Type HPC-MW at SC2003, Phoenix, AZ, Nov. 2003
- Evaluation by FEM code
Library-Type HPC-MW
Mesh generation is not considered.
- Parallel I/O: I/F for commercial codes (NASTRAN etc.)
- Adaptive Mesh Refinement (AMR)
- Dynamic Load Balancing using pMETIS (DLB)
- Parallel Visualization
- Linear Solvers (GeoFEM + AMG, SAI)
- FEM Operations (Connectivity, Matrix Assembling)
- Coupling I/F
- Utility for Mesh Partitioning
- On-line Tutorial
AMR + DLB
[Figure: adaptive mesh refinement with dynamic load balancing]
Parallel Visualization
- Scalar field: surface rendering, interval volume-fitting, volume rendering, topological map
- Vector field: streamlines, particle tracking, volume rendering, LIC
- Tensor field: hyperstreamlines
- Extension of functions, of dimensions, and of data types
PMR (Parallel Mesh Relocator)
- The data size is potentially very large in parallel computing; handling the entire mesh is impossible.
- Parallel mesh generation and visualization are difficult due to the requirement for global information.
- Adaptive Mesh Refinement (AMR) and grid hierarchy.
Parallel Mesh Generation using AMR
- Prepare an initial mesh with a size as large as a single PE can handle.
- Partition the initial mesh into local data (potentially very coarse).
- Parallel Mesh Relocation (PMR): local refinement on each PE.
[Figure: initial mesh -> partition -> local data on each PE -> PMR with repeated local refinement]
Parallel Mesh Generation using AMR
- The hierarchical refinement history can be utilized for visualization & multigrid: results are mapped reversely from the local fine mesh to the initial coarse mesh.
- Visualization is then possible on the initial coarse mesh using a single PE; various existing software can be used.
[Figure: reverse mapping from the distributed fine meshes back to the initial coarse mesh]
Parallel Mesh Generation and Visualization using AMR: Summary
[Figure: initial coarse mesh (single PE) -> partition -> PMR -> local fine mesh (distributed) -> reverse mapping -> second-to-the-coarsest results on the initial coarse mesh]
Example (1/3): Initial Entire Mesh
- nodes: 1,308, elements: 795

Example (2/3): Initial Partitioning

  PE   nodes   elements
   0     224        100
   1     184        100
   2     188        100
   3     207         99
   4     203         99
   5     222         99
   6     202         99
   7     194         99

Example (3/3): Refinement
[Figure: refined distributed mesh]
Parallel Linear Solvers
[Figure: GFLOPS rate and parallel work ratio (%) vs. number of SMP nodes (0-192) on the Earth Simulator; ●: Flat MPI, ○: Hybrid; 176 nodes, 3.8 TFLOPS]
Coupler
- MpCCI approach: the Coupler is MAIN and connects the Fluid and Structure codes.
- HPC-MW approach: Fluid is MAIN; Structure is called from Fluid as a subroutine through the Coupler.
Coupler
Fluid = MAIN; Structure is called from Fluid as a subroutine through the Coupler.

      module hpcmw_mesh
        type hpcmw_local_mesh
          ...
        end type hpcmw_local_mesh
      end module hpcmw_mesh

FLUID
      program fluid
        use hpcmw_mesh
        type (hpcmw_local_mesh) :: local_mesh_b
        ...
        call hpcmw_couple_PtoS_put
        call structure_main
        call hpcmw_couple_StoP_get
        ...
      end program fluid

STRUCTURE
      subroutine structure_main
        use hpcmw_mesh
        type (hpcmw_local_mesh) :: local_mesh_b
        call hpcmw_couple_PtoS_get
        ...
        call hpcmw_couple_StoP_put
      end subroutine structure_main
Coupler (cont.)
- The fluid and structure codes work on different mesh files.
- The communication between the two codes is hidden inside the hpcmw_couple_*_put/get calls.
FEM Codes on HPC-MW
- Primary target: evaluation of HPC-MW itself !
- Solid Mechanics
  - elastic, inelastic
  - static, dynamic
  - various types of elements, boundary conditions
- Eigenvalue Analysis
- Compressible/Incompressible CFD
- Heat Transfer with Radiation & Phase Change
Release in Late September 2003
- Library-Type HPC-MW
  - Parallel I/O: original data structure, GeoFEM, ABAQUS
  - Parallel Linear Solvers: preconditioned iterative solvers (ILU, SAI)
  - Parallel Visualization: PVR, PSR
  - Utility for Mesh Partitioning: serial partitioner, viewer
  - On-line Tutorial
- FEM Code for Linear-Elastic Simulation (prototype)
Technical Issues
- Common Data Structure
  - flexibility vs. efficiency
  - our data structure is efficient ...
  - how to keep the user's original data structure
- Interface to Other Toolkits
  - PETSc (ANL), Aztec/Trilinos (Sandia)
  - ACTS Toolkit (LBNL/DOE)
  - DRAMA (NEC Europe), Zoltan (Sandia)
Table of Contents (current section: On-Going Works/Collaborations)
Public Use/Commercialization
- Very important issues in this project.
- Industry
- Education
- Research
- Commercial Codes
Strategy for Public Use
- General-purpose parallel FEM code
- Environment for Development (1): for legacy code
  - "Parallelization"
  - F77 -> F90, COMMON -> module (see the sketch after this list)
  - parallel data structure, linear solvers, visualization
- Environment for Development (2): from scratch
  - Education
- Various types of collaboration
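The F77 -> F90 / COMMON -> module step mentioned above is largely mechanical; a minimal sketch, with made-up variable names, is:

      !-- F77: global data in a COMMON block
      !        COMMON /MESH/ NNODE, NELEM, XYZ(3,MAXN)
      !
      !-- F90: the same data moved into a module (names are made up for
      !   illustration); every routine that used the COMMON block now
      !   simply says "use mesh_data"
      module mesh_data
        implicit none
        integer :: NNODE, NELEM
        real(kind=8), allocatable :: XYZ(:,:)
      end module mesh_data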
On-going Collaboration
- Parallelization of legacy codes
  - CFD grp. in the FSIS project
  - Mitsubishi Materials: groundwater flow
  - JNC (Japan Nuclear Cycle Development Inst.): HLW
  - others: research, education
- Parts of HPC-MW
  - Coupling interface for pump simulation
  - Parallel visualization: Takashi Furumura (ERI/U.Tokyo)
On-going Collaboration
- Environment for Development
  - ACcESS (Australian Computational Earth Systems Simulator) Group
- Research Collaboration
  - ITBL/JAERI
  - DOE ACTS Toolkit (Lawrence Berkeley National Laboratory)
  - NEC Europe (Dynamic Load Balancing)
  - ACES/iSERVO GRID
(Fluid + Vibration) Simulation for Boiler Pump
- Hitachi: suppression of noise
- Collaboration in the FSIS project
  - Fluid
  - Structure
  - PSE
  - HPC-MW: coupling interface
- Experiment, measurement: accelerometers on the surface
[Figure: Fluid <-> Coupler <-> Structure]
Parallelization of Legacy Codes
- Many cases !!
- Under investigation through real collaborations
  - optimum procedure: documents, I/F, work assignment
  - FEM is suitable for this type of procedure
- Works
  - introduce the new matrix storage manner for the parallel iterative solvers in HPC-MW
  - add subroutine calls for using the parallel visualization functions in HPC-MW
  - changing the data structure is a big issue !!
    - flexibility vs. efficiency
    - problem-specific vs. general
Element Connectivity

In HPC-MW:

      do icel= 1, ICELTOT
        iS= elem_index(icel-1)
        in1= elem_ptr(iS+1)
        in2= elem_ptr(iS+2)
        in3= elem_ptr(iS+3)
      enddo

Sometimes ...

      do icel= 1, ICELTOT
        in1= elem_node(1,icel)
        in2= elem_node(2,icel)
        in3= elem_node(3,icel)
      enddo
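Converting between the two layouts is straightforward. The sketch below builds the compressed elem_index/elem_ptr arrays from a fixed elem_node array, assuming a single element type with NN nodes per element; the array names follow the fragments above, but the routine itself is only illustrative (with mixed element types, NN would vary per element).

      !-- sketch: build the compressed connectivity (elem_index/elem_ptr)
      !   from a fixed elem_node(NN,ICELTOT) array; NN = nodes per element
      subroutine build_compressed_connectivity (NN, ICELTOT, elem_node,    &
     &                                          elem_index, elem_ptr)
        implicit none
        integer, intent(in)  :: NN, ICELTOT
        integer, intent(in)  :: elem_node(NN,ICELTOT)
        integer, intent(out) :: elem_index(0:ICELTOT)
        integer, intent(out) :: elem_ptr(NN*ICELTOT)
        integer :: icel, k

        elem_index(0)= 0
        do icel= 1, ICELTOT
          elem_index(icel)= elem_index(icel-1) + NN      ! offset per element
          do k= 1, NN
            elem_ptr(elem_index(icel-1)+k)= elem_node(k,icel)
          enddo
        enddo
      end subroutine build_compressed_connectivity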
Parallelization of Legacy Codes
[Figure: the original serial FEM code (Input, Mat. Conn., Mat. Assem., Linear Solver, Visualization, Output) side by side with the HPC-MW-based optimized parallel code; Linear Solver, Visualization, and Comm. come from HPC-MW, while Input/Output, Mat. Conn., and Mat. Assem. need some adaptation ("?") - not optimum, but easy to do]
Works with the CFD Grp. in FSIS
- Original CFD code: 3D finite-volume, serial, Fortran90
- Strategy
  - use the Poisson solver in HPC-MW (a rough calling sketch follows after this list)
  - keep the ORIGINAL data structure: we (HPC-MW) developed a new partitioner
  - the CFD people do the matrix assembling using HPC-MW's format: the matrix assembling part is intrinsically parallel
- Schedule
  - April 3, 2003:  1st meeting, overview
  - April 24, 2003: 2nd meeting, decision of strategy
  - May 2, 2003:    show I/F for the Poisson solver
  - May 12, 2003:   completed the new partitioner
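A rough sketch of the "assemble in HPC-MW's format, then call the HPC-MW solver" step is given below. The hpcmwIarray/hpcmwRarray parameter convention is taken from the SOLVE33 example later in this talk; the solver name SOLVE11 (a scalar, 1-DOF-per-node variant for the Poisson problem) and the meaning of the individual parameters are hypothetical.

      !-- sketch only: handing the pressure Poisson equation of a serial CFD
      !   code to an HPC-MW-style parallel solver; SOLVE11 and the parameter
      !   meanings are hypothetical
      subroutine solve_pressure_sketch (hpcmwIarray, hpcmwRarray)
        implicit none
        integer,      dimension(:) :: hpcmwIarray
        real(kind=8), dimension(:) :: hpcmwRarray

        hpcmwIarray(1)= 1000        ! max. iterations          (assumed meaning)
        hpcmwIarray(2)= 1           ! method, e.g. CG          (assumed meaning)
        hpcmwIarray(3)= 1           ! preconditioner, e.g. ILU (assumed meaning)
        hpcmwRarray(1)= 1.d-8       ! convergence criterion    (assumed meaning)

        ! the matrix (D, AL/AU, indexL/itemL, ...) and right-hand side are
        ! assembled beforehand by the CFD code in HPC-MW's storage format and
        ! live in HPC-MW's modules, as in the SOLVE33 example
        call SOLVE11 (hpcmwIarray, hpcmwRarray)
      end subroutine solve_pressure_sketch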
CFD Code: 2 Types of Communication
(1) Inter-Domain Communication
[Figure]
(2) Wall-Law Communication
[Figure]
(Common) Data Structure
- Users like to keep their original data structure.
- But they want to parallelize the code.
- Compromise at this stage
  - keep the original data structure
  - we (I, more precisely) develop partitioners for individual users (a one-day job once I understand the data structure)
Domain Partitioning Utility for the User's Original Data Structure
Very important for the parallelization of legacy codes.
- Functions
  - input: initial entire mesh in the ORIGINAL format
  - output: distributed local meshes in the ORIGINAL format
  - communication information (separate file)
- Merits
  - original I/O routines can be utilized
  - operations for communication are hidden
- Technical Issues
  - basically individual support (development) ... BIG WORK
  - various data structures for individual user codes
  - problem-specific information
Mesh Partitioning for Original Data Structure
- Original I/O is used for the local distributed data.
- The I/F for the Comm. Info. is provided by HPC-MW.
[Figure: the partitioner reads the initial entire data (control data + mesh) and writes, for each PE, local distributed data consisting of common control data, a distributed mesh, and a Comm. Info. file]
Distributed Mesh + Comm. Info.
[Figure: each PE's local distributed data (common control data, distributed mesh, Comm. Info.) is read by the user's original input subroutines plus HPCMW_COMM_INIT]
This part is hidden except "CALL HPCMW_COMM_INIT".
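From the user's point of view the input phase can be as small as the sketch below; the read_original_* routine names and file names are purely illustrative, and HPCMW_COMM_INIT is the only HPC-MW call named on this slide.

      !-- sketch: the input phase of a parallelized legacy code; the two
      !   read_original_* routines stand for the user's own, unchanged I/O,
      !   while HPCMW_COMM_INIT reads the Comm. Info. file and sets up the
      !   communication tables
      program legacy_input_sketch
        implicit none
        call read_original_cntl ('common.cntl')   ! original control data (illustrative name)
        call read_original_mesh ('mesh.local')    ! original format, distributed mesh (illustrative name)
        call HPCMW_COMM_INIT                      ! everything else is hidden here
      end program legacy_input_sketch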
Table of Contents (current section: XML-based System for User-Defined Data Structure)
XML-based I/O Func. Generator
K. Sakane (RIST), 8th JSCES Conf., 2003.
- The user's original data structure can be described by XML-based definition information.
- I/O subroutines (C, F90) for the partitioning utilities are generated according to the XML-based definition information.
- The existing I/O subroutines of the partitioning utilities in HPC-MW are substituted with the generated code.
XML-based I/O Func. Generator
K. Sakane (RIST), 8th JSCES Conf., 2003.
- The partitioning utility reads the initial entire mesh in HPC-MW format and writes distributed local mesh files in HPC-MW format with communication information.
- This is ideal for us ... but not all users are necessarily happy with that.
[Figure: HPCMW initial mesh -> Utility for Partitioning (I/O for mesh data) -> HPCMW local mesh + comm files, one set per PE]
XML-based I/O Func. Generator
K. Sakane (RIST), 8th JSCES Conf., 2003.
- Generate I/O subroutines (C, F90) for the partitioning utilities according to the XML-based definition information.
[Figure: definition data in XML + user-defined data structure -> XML-based I/O Func. Generator for the original data structure -> I/O for mesh data, plugged into the Utility for Partitioning]
XML-based I/O Func. Generator
K. Sakane (RIST), 8th JSCES Conf., 2003.
- Substitute the existing I/O subroutines of the partitioning utilities in HPC-MW with the generated code.
[Figure: with the generated I/O routines for the user-defined data structure, the Utility for Partitioning reads the user-defined initial mesh and writes local mesh + comm files for each PE]
TAGS in Definition Info. File
- usermesh  : starting point
- parameter : parameter definition
- define    : sub-structure definition
- token     : smallest definition unit (number, label etc.)
- mesh      : entire structure definition
- ref       : reference of a sub-structure
Example (user format)

NODE 1  0. 0. 0.
NODE 2  1. 0. 0.
NODE 3  0. 1. 0.
NODE 4  0. 0. 1.
ELEMENT 1 TETRA 1 2 3 4    <- tetrahedron connectivity
Example (ABAQUS, NASTRAN)

** ABAQUS format
*NODE
1, 0., 0., 0.
2, 1., 0., 0.
3, 0., 1., 0.
4, 0., 0., 1.
*ELEMENT, TYPE=C3D4
1 1 2 3 4

$ NASTRAN format
GRID     1       0       0.      0.      0.      0
GRID     2       0       1.      0.      0.      0
GRID     3       0       0.      1.      0.      0
GRID     4       0       0.      0.      1.      0
CTETRA   1       1       1       2       3       4
Ex.: Definition Info. File (1/2)

User format:   NODE 1 0. 0. 0.
               (node.start node.id node.x node.y node.z)
HPC-MW format: !NODE
               1, 0.0, 0.0, 0.0

<?xml version="1.0" encoding="EUC-JP"?>
<usermesh>

  <!-- parameters: the user can define these -->
  <parameter name="TETRA">4</parameter>
  <parameter name="PENTA">5</parameter>
  <parameter name="HEXA">8</parameter>

  <!-- sub-structure -->
  <define name="mynode">
    <token means="node.start">NODE</token>
    <token means="node.id"/>
    <token means="node.x"/>
    <token means="node.y"/>
    <token means="node.z"/>
    <token means="node.end"/>
  </define>
Ex.: Definition Info. File (2/2)

User format:   ELEMENT 1 TETRA 1 2 3 4
               (element.start element.id element.type element.node ...)
HPC-MW format: !ELEMENT, TYPE=311
               1, 1, 2, 3, 4

  <!-- sub-structure; "element.type" corresponds to a parameter,
       which gives the number of element.node tokens -->
  <define name="myelement">
    <token means="element.start">ELEMENT</token>
    <token means="element.id"/>
    <token means="element.type"/>
    <token means="element.node" times="$element.type"/>
    <token means="element.end"/>
  </define>

  <!-- entire structure -->
  <mesh>
    <ref name="mynode"/>
    <ref name="myelement"/>
  </mesh>

</usermesh>
Table of Contents (current section: Future Works)
Further Study/Works
- Develop HPC-MW
  - Collaboration
  - Simplified I/F for non-experts
  - Interaction is important !!: procedure for collaboration
System for Development
[Figure: the development-cycle diagram shown earlier - HW vendors, HPC-MW infrastructure, FEM codes on HPC-MW, public users and their feedback]
Further Study/Works
- Develop HPC-MW
  - Collaboration
  - Simplified I/F for non-experts
  - Interaction is important !!: procedure for collaboration
  - Extension to DEM etc.
- Promotion of public use
  - parallel FEM applications
  - parallelization of legacy codes
  - environment for development
Current Remarks
- If you want to parallelize your legacy code on a PC cluster ...
  - Keep your own data structure: a customized partitioner will be provided (or automatically generated by the XML system).
  - Rewrite your matrix assembling part.
  - Introduce the linear solvers and visualization of HPC-MW.
[Figure: Input, Mat. Conn., Mat. Assem., and Output stay in the user's original code; Linear Solver, Visualization, and Comm. come from HPC-MW]

Current Remarks (cont.)
- If you want to optimize your legacy code on the Earth Simulator ...
  - Use HPC-MW's data structure: anyway, you have to rewrite your code for the ES.
  - Utilize all components of HPC-MW.
- Contact: hpcmw-workers@tokyo.rist.or.jp
[Figure: Input, Mat. Conn., Mat. Assem., Linear Solver, Visualization, Comm., and Output all come from HPC-MW; only the user's FEM kernel remains original]
Some Examples
Simple Interface: Communication
GeoFEM's Original I/F

      call SOLVER_SEND_RECV_3                                            &
     &   ( NP, NEIBPETOT, NEIBPE, STACK_IMPORT, NOD_IMPORT,              &
     &     STACK_EXPORT, NOD_EXPORT, WS, WR, WW(1,ZP), SOLVER_COMM,      &
     &     my_rank)

      module solver_SR_3
      contains
        subroutine SOLVER_SEND_RECV_3                                    &
     &     ( N, NEIBPETOT, NEIBPE, STACK_IMPORT, NOD_IMPORT,             &
     &       STACK_EXPORT, NOD_EXPORT,                                   &
     &       WS, WR, X, SOLVER_COMM, my_rank)

        implicit REAL*8 (A-H,O-Z)
        include 'mpif.h'
        include 'precision.inc'

        integer(kind=kint ), intent(in)  :: N
        integer(kind=kint ), intent(in)  :: NEIBPETOT
        integer(kind=kint ), pointer     :: NEIBPE      (:)
        integer(kind=kint ), pointer     :: STACK_IMPORT(:)
        integer(kind=kint ), pointer     :: NOD_IMPORT  (:)
        integer(kind=kint ), pointer     :: STACK_EXPORT(:)
        integer(kind=kint ), pointer     :: NOD_EXPORT  (:)
        real   (kind=kreal), dimension(3*N), intent(inout) :: WS
        real   (kind=kreal), dimension(3*N), intent(inout) :: WR
        real   (kind=kreal), dimension(3*N), intent(inout) :: X
        integer            , intent(in)  :: SOLVER_COMM
        integer            , intent(in)  :: my_rank
        ...
Simple Interface: Communication
(Same call and subroutine as on the previous slide.) Inside HPC-MW, the arrays of the argument list are obtained from the local mesh data structure:

      use hpcmw_util
      use dynamic_grid
      use dynamic_cntl

      type (hpcmw_local_mesh) :: local_mesh

      N        = local_mesh%n_internal
      NP       = local_mesh%n_node
      ICELTOT  = local_mesh%n_elem
      NEIBPETOT= local_mesh%n_neighbor_pe

      NEIBPE      => local_mesh%neighbor_pe
      STACK_IMPORT=> local_mesh%import_index
      NOD_IMPORT  => local_mesh%import_node
      STACK_EXPORT=> local_mesh%export_index
      NOD_EXPORT  => local_mesh%export_node
Simple Interface: Communication

      use hpcmw_util
      type (hpcmw_local_mesh) :: local_mesh
      ...
      call SOLVER_SEND_RECV_3 ( local_mesh, WW(1,ZP))
      ...

      module solver_SR_3
      contains
        subroutine SOLVER_SEND_RECV_3 (local_mesh, X)

        use hpcmw_util
        use dynamic_grid
        use dynamic_cntl

        type (hpcmw_local_mesh) :: local_mesh
        real(kind=kreal), dimension(:), allocatable, save :: WS, WR
        ...
        N        = local_mesh%n_internal
        NP       = local_mesh%n_node
        ICELTOT  = local_mesh%n_elem
        NEIBPETOT= local_mesh%n_neighbor_pe

        NEIBPE      => local_mesh%neighbor_pe
        STACK_IMPORT=> local_mesh%import_index
        NOD_IMPORT  => local_mesh%import_node
        STACK_EXPORT=> local_mesh%export_index
        NOD_EXPORT  => local_mesh%export_node
Preliminary Study in FY.2002
- FEM procedure for a 3D elastic problem
  - parallel I/O
  - iterative linear solvers (ICCG, ILU-BiCGSTAB)
  - FEM procedures (matrix connectivity/assembling)
- Linear Solvers
  - SMP cluster / distributed memory
  - vector/scalar processors
  - CM-RCM/MC reordering
Method of Matrix Storage
- Scalar / Distributed Memory: CRS with natural ordering
- Scalar / SMP Cluster: PDCRS/CM-RCM, PDCRS/MC
- Vector / Distributed Memory & SMP Cluster: PDJDS/CM-RCM, PDJDS/MC
(PDJDS/CM-RCM: re-ordered, long innermost loops; PDCRS/CM-RCM: re-ordered CRS, short innermost loops; CRS: no re-ordering)
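The point of the re-ordering is the length of the innermost loop in the matrix-vector product. The sketch below contrasts the two loop structures in simplified scalar (non-blocked, single-PE) form; the array names (indexCRS/itemCRS, indexDJDS/itemDJDS, NROW) are illustrative, vectors are assumed to be already permuted for the DJDS case, and the real HPC-MW kernels are 3x3-blocked and SMP-parallelized.

      !-- CRS: the innermost loop runs over the nonzeros of one row (short)
      subroutine spmv_crs (N, D, A, indexCRS, itemCRS, X, Y)
        implicit none
        integer, intent(in)  :: N, indexCRS(0:N), itemCRS(*)
        real(kind=8), intent(in)  :: D(N), A(*), X(N)
        real(kind=8), intent(out) :: Y(N)
        integer :: i, k
        real(kind=8) :: W
        do i= 1, N
          W= D(i)*X(i)
          do k= indexCRS(i-1)+1, indexCRS(i)   ! short loop: nonzeros of one row
            W= W + A(k)*X(itemCRS(k))
          enddo
          Y(i)= W
        enddo
      end subroutine spmv_crs

      !-- DJDS: rows are re-ordered and stored by "jagged diagonals", so the
      !   innermost loop runs over many rows at once (long -> vectorizes)
      subroutine spmv_djds (N, NDIAGtot, NROW, D, A, indexDJDS, itemDJDS, X, Y)
        implicit none
        integer, intent(in)  :: N, NDIAGtot, NROW(NDIAGtot)
        integer, intent(in)  :: indexDJDS(NDIAGtot), itemDJDS(*)
        real(kind=8), intent(in)  :: D(N), A(*), X(N)
        real(kind=8), intent(out) :: Y(N)
        integer :: i, j, k
        do i= 1, N
          Y(i)= D(i)*X(i)                      ! diagonal contribution
        enddo
        do j= 1, NDIAGtot                      ! jagged diagonals
          do i= 1, NROW(j)                     ! long loop over rows -> vectorizes
            k= indexDJDS(j) + i
            Y(i)= Y(i) + A(k)*X(itemDJDS(k))
          enddo
        enddo
      end subroutine spmv_djds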
Main Program

      program SOLVER33_TEST
        use solver33
        use hpcmw_all
        implicit REAL*8 (A-H,O-Z)

        call HPCMW_INIT
        call INPUT_CNTL
        call INPUT_GRID

        call MAT_CON0
        call MAT_CON1
        call MAT_ASS_MAIN (valA,valB,valX)
        call MAT_ASS_BC

        call SOLVE33 (hpcmwIarray, hpcmwRarray)
        call HPCMW_FINALIZE
      end program SOLVER33_TEST

FEM program developed by users:
- the same subroutines are called
- the interfaces are the same
- NO MPI !!
- the procedure inside each subroutine is different in the individual library
HPCMW_INIT ?

for MPI:

      subroutine HPCMW_INIT
        use hpcmw_all
        implicit REAL*8 (A-H,O-Z)
        call MPI_INIT      (ierr)
        call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT,   ierr)
        call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)
      end subroutine HPCMW_INIT

for NO-MPI (SMP):

      subroutine HPCMW_INIT
        use hpcmw_all
        implicit REAL*8 (A-H,O-Z)
        ierr= 0
      end subroutine HPCMW_INIT
Solve33 for SMP Cluster/Scalar

      module SOLVER33
      contains
        subroutine SOLVE33 (hpcmwIarray, hpcmwRarray)

        use hpcmw_solver_matrix
        use hpcmw_solver_cntl
        use hpcmw_fem_mesh
        use solver_CG_3_SMP_novec
        use solver_BiCGSTAB_3_SMP_novec

        implicit REAL*8 (A-H,O-Z)

        real(kind=kreal), dimension(3,3) :: ALU
        real(kind=kreal), dimension(3)   :: PW
        integer :: ERROR, ICFLAG
        character(len=char_length) :: BUF
        data ICFLAG/0/

        integer(kind=kint ), dimension(:) :: hpcmwIarray
        real   (kind=kreal), dimension(:) :: hpcmwRarray

!C
!C +------------+
!C | PARAMETERs |
!C +------------+
!C===
        ITER      = hpcmwIarray(1)
        METHOD    = hpcmwIarray(2)
        PRECOND   = hpcmwIarray(3)
        NSET      = hpcmwIarray(4)
        iterPREmax= hpcmwIarray(5)

        RESID     = hpcmwRarray(1)
        SIGMA_DIAG= hpcmwRarray(2)
!C===

        if (iterPREmax.lt.1) iterPREmax= 1
        if (iterPREmax.gt.4) iterPREmax= 4

!C
!C +-----------+
!C | BLOCK LUs |
!C +-----------+
!C===
        (skipped)
!C===

!C
!C +------------------+
!C | ITERATIVE solver |
!C +------------------+
!C===
        if (METHOD.eq.1) then
          call CG_3_SMP_novec                                            &
     &       ( N, NP, NPL, NPU, PEsmpTOT, NHYP, IVECT, STACKmc,          &
     &         NEWtoOLD, OLDtoNEW,                                       &
     &         D, AL, indexL, itemL, AU, indexU, itemU,                  &
     &         B, X, ALUG, RESID, ITER, ERROR,                           &
     &         my_rank, NEIBPETOT, NEIBPE,                               &
     &         NOD_STACK_IMPORT, NOD_IMPORT,                             &
     &         NOD_STACK_EXPORT, NOD_EXPORT,                             &
     &         SOLVER_COMM, PRECOND, iterPREmax)
        endif

        if (METHOD.eq.2) then
          call BiCGSTAB_3_SMP_novec                                      &
     &       ( N, NP, NPL, NPU, PEsmpTOT, NHYP, IVECT, STACKmc,          &
     &         NEWtoOLD, OLDtoNEW,                                       &
     &         D, AL, indexL, itemL, AU, indexU, itemU,                  &
     &         B, X, ALUG, RESID, ITER, ERROR,                           &
     &         my_rank, NEIBPETOT, NEIBPE,                               &
     &         NOD_STACK_IMPORT, NOD_IMPORT,                             &
     &         NOD_STACK_EXPORT, NOD_EXPORT,                             &
     &         SOLVER_COMM, PRECOND, iterPREmax)
        endif
!C===

        ITERactual= ITER
        end subroutine SOLVE33
      end module SOLVER33
Solve33 for SMP Cluster/Vector

Identical in structure to the scalar version above; only the solver modules and the argument lists of the solver calls differ:

        use solver_VCG33_DJDS_SMP
        use solver_VBiCGSTAB33_DJDS_SMP
        ...
        if (METHOD.eq.1) then
          call VCG33_DJDS_SMP                                            &
     &       ( N, NP, NLmax, NUmax, NPL, NPU, NHYP, PEsmpTOT,            &
     &         STACKmcG, STACKmc, NLmaxHYP, NUmaxHYP, IVECT,             &
     &         NEWtoOLD, OLDtoNEW_L, OLDtoNEW_U, NEWtoOLD_U, LtoU,       &
     &         D, PAL, indexL, itemL, PAU, indexU, itemU, B, X,          &
     &         ALUG_L, ALUG_U, RESID, ITER, ERROR, my_rank,              &
     &         NEIBPETOT, NEIBPE, NOD_STACK_IMPORT, NOD_IMPORT,          &
     &         NOD_STACK_EXPORT, NOD_EXPORT,                             &
     &         SOLVER_COMM, PRECOND, iterPREmax)
        endif

        if (METHOD.eq.2) then
          call VBiCGSTAB33_DJDS_SMP                                      &
     &       ( ... same argument list ... )
        endif
Mat.Ass. for SMP Cluster/Scalar

      do ie= 1, 8
        ip = nodLOCAL(ie)
        if (ip.le.N) then
        do je= 1, 8
          jp = nodLOCAL(je)

          kk= 0
          if (jp.gt.ip) then
            iiS= indexU(ip-1) + 1
            iiE= indexU(ip  )
            do k= iiS, iiE
              if ( itemU(k).eq.jp ) then
                kk= k
                exit
              endif
            enddo
          endif

          if (jp.lt.ip) then
            iiS= indexL(ip-1) + 1
            iiE= indexL(ip  )
            do k= iiS, iiE
              if ( itemL(k).eq.jp) then
                kk= k
                exit
              endif
            enddo
          endif

          PNXi= 0.d0
          PNYi= 0.d0
          PNZi= 0.d0
          PNXj= 0.d0
          PNYj= 0.d0
          PNZj= 0.d0
          VOL = 0.d0

          do kpn= 1, 2
          do jpn= 1, 2
          do ipn= 1, 2
            coef= dabs(DETJ(ipn,jpn,kpn))*WEI(ipn)*WEI(jpn)*WEI(kpn)
            VOL = VOL + coef

            PNXi= PNX(ipn,jpn,kpn,ie)
            PNYi= PNY(ipn,jpn,kpn,ie)
            PNZi= PNZ(ipn,jpn,kpn,ie)
            PNXj= PNX(ipn,jpn,kpn,je)
            PNYj= PNY(ipn,jpn,kpn,je)
            PNZj= PNZ(ipn,jpn,kpn,je)

            a11= valX*(PNXi*PNXj+valB*(PNYi*PNYj+PNZi*PNZj))*coef
            a22= valX*(PNYi*PNYj+valB*(PNZi*PNZj+PNXi*PNXj))*coef
            a33= valX*(PNZi*PNZj+valB*(PNXi*PNXj+PNYi*PNYj))*coef
            a12= (valA*PNXi*PNYj + valB*PNXj*PNYi)*coef
            a13= (valA*PNXi*PNZj + valB*PNXj*PNZi)*coef
            ...

            if (jp.gt.ip) then
              PAU(9*kk-8)= PAU(9*kk-8) + a11
              ...
              PAU(9*kk  )= PAU(9*kk  ) + a33
            endif

            if (jp.lt.ip) then
              PAL(9*kk-8)= PAL(9*kk-8) + a11
              ...
              PAL(9*kk  )= PAL(9*kk  ) + a33
            endif

            if (jp.eq.ip) then
              D(9*ip-8)= D(9*ip-8) + a11
              ...
              D(9*ip  )= D(9*ip  ) + a33
            endif
          enddo
          enddo
          enddo
        enddo
        endif
      enddo
Mat.Ass. for SMP Cluster/Vector

Same structure as the scalar version above, except that the Gaussian-point loops (kpn, jpn, ipn) are the outermost loops and the matrix location kk is searched in the re-ordered (DJDS) arrays:

      do kpn= 1, 2
      do jpn= 1, 2
      do ipn= 1, 2
        coef= dabs(DETJ(ipn,jpn,kpn))*WEI(ipn)*WEI(jpn)*WEI(kpn)
        do ie= 1, 8
          ip = nodLOCAL(ie)
          if (ip.le.N) then
          do je= 1, 8
            jp = nodLOCAL(je)

            kk= 0
            if (jp.gt.ip) then
              ipU= OLDtoNEW_U(ip)
              jpU= OLDtoNEW_U(jp)
              kp = PEon   (ipU)
              iv = COLORon(ipU)
              nn = ipU - STACKmc((iv-1)*PEsmpTOT+kp-1)
              do k= 1, NUmaxHYP(iv)
                iS= indexU(npUX1*(iv-1)+PEsmpTOT*(k-1)+kp-1) + nn
                if ( itemU(iS).eq.jpU) then
                  kk= iS
                  exit
                endif
              enddo
            endif

            if (jp.lt.ip) then
              ipL= OLDtoNEW_L(ip)
              jpL= OLDtoNEW_L(jp)
              kp = PEon   (ipL)
              iv = COLORon(ipL)
              nn = ipL - STACKmc((iv-1)*PEsmpTOT+kp-1)
              do k= 1, NLmaxHYP(iv)
                iS= indexL(npLX1*(iv-1)+PEsmpTOT*(k-1)+kp-1) + nn
                if ( itemL(iS).eq.jpL) then
                  kk= iS
                  exit
                endif
              enddo
            endif

            (computation of VOL, PNXi ... PNZj, a11 ... a33 and the
             accumulation into PAU/PAL/D: same as in the scalar version)

          enddo
          endif
        enddo
      enddo
      enddo
      enddo
Hardware
- Earth Simulator
  - SMP cluster, 8 PEs/node
  - vector processors
- Hitachi SR8000/128
  - SMP cluster, 8 PEs/node
  - pseudo-vector
- Xeon 2.8 GHz Cluster
  - 2 PEs/node, Myrinet
  - flat MPI only
- Hitachi SR2201
  - pseudo-vector
  - flat MPI
Simple 3D Cubic Model
[Figure: cube of (Nx-1) x (Ny-1) x (Nz-1) elements; uniform distributed force in the z-direction @ z=Zmin; Ux=0 @ x=Xmin, Uy=0 @ y=Ymin, Uz=0 @ z=Zmin]
Earth Simulator, DJDS
- 64x64x64/SMP node, up to 125,829,120 DOF
  [Figure: GFLOPS rate and parallel work ratio (%) vs. number of nodes (0-192); ●: Flat MPI, ○: Hybrid]
- 100x100x100/SMP node, up to 480,000,000 DOF
  [Figure: GFLOPS rate and parallel work ratio (%) vs. number of nodes (0-192); ●: Flat MPI, ○: Hybrid]
- 256x128x128/SMP node, up to 2,214,592,512 DOF
  [Figure: GFLOPS rate and parallel work ratio (%) vs. number of nodes (0-192); ●: Flat MPI, ○: Hybrid; 3.8 TFLOPS for 2.2G DOF on 176 nodes (33.8% of peak)]
Hitachi SR8000/128
8 PEs / 1 SMP node
[Figure: GFLOPS vs. DOF (1.E+04 to 1.E+07), SMP and Flat-MPI panels; ●: PDJDS, ○: PDCRS, ▲: CRS-Natural (PDJDS/CM-RCM; PDCRS/CM-RCM: short innermost loop; CRS: no re-ordering)]
[Figure: PDJDS only, GFLOPS vs. DOF (1.E+04 to 1.E+07); ●: SMP, ○: Flat-MPI]
Xeon & SR2201
8 PEs
[Figure: GFLOPS vs. DOF (1.E+04 to 1.E+07) for Xeon and SR2201; ●: PDJDS, ○: PDCRS, ▲: CRS-Natural]

Xeon: Speed-Up
1-24 PEs; 32^3 nodes/PE and 16^3 nodes/PE
[Figure: GFLOPS vs. number of PEs (0-32); ●: PDJDS, ○: PDCRS, ▲: CRS-Natural]
[Figure: speed-up vs. number of PEs (0-32); ●: PDJDS, ○: PDCRS, ▲: CRS-Natural]

Hitachi SR2201: Speed-Up
1-64 PEs; 32^3 nodes/PE and 16^3 nodes/PE
[Figure: GFLOPS vs. number of PEs (0-64); ●: PDJDS, ○: PDCRS, ▲: CRS-Natural]