Secrets of Supercomputing - Supercomputing Challenge

advertisement
LA-UR-07-6793
Approved for public release;
distribution is unlimited
Secrets of Supercomputing
The Conservation Laws
Supercomputing Challenge Kickoff
October 21-23, 2007
I. Background to Supercomputing
II. Get Wet! With the Shallow Water Equations
Bob Robey - Los Alamos National Laboratory
Randy Roberts – Los Alamos National Laboratory
Cleve Moler -- Mathworks
1
Introductions
• Bob Robey -- Los Alamos National Lab, X division
– brobey@lanl.gov, 665-9052 or home: robeyfamily@earthlink.net,
662-2018
– 3D Hydrocodes and parallel numerical software
– Helped found UNM and Maui High Performance Computing
Centers and Supercomputing Tutorials
• Randy Roberts -- Los Alamos National Lab, D Division
– Java, C++, Numerical and Agent Based Modeling
– rsqrd@lanl.gov
• Cleve Moler
– Matlab Founder
– Former UNM CS Dept Chair
– SIAM President
– Author of “Numerical Computing with Matlab” and “Experiments with
Matlab”
2
Conservation Laws
• Formulated as a conserved quantity
– mass
– momentum
– energy
Conserved
variable
Change
• Good reference is Leveque’s book and his
freely available software package
CLAWPACK (Fortran/MPI) and a 2D
shallow water version Tsunamiclaw
Leveque, Randall, Numerical Methods for Conservation Laws
Leveque, Randall, Finite Volume Methods for Hyperbolic Problems
CLAWPACK http://www.amath.washington.edu/~claw/
Tsunamiclaw http://www.math.utah.edu/~george/tsunamiclaw.html
3
I. Intro to Supercomputing
• Classical Definition of Supercomputing
– Harnessing lots of processors to do lots of small
calculations
• There are many other definitions which usually
include any computing beyond the norm
– Includes new techniques in modeling, visualization,
and higher level languages.
• Question for thought: With greater CPU
resources is it better to save programmer work
or to make the computer do bigger problems?
4
II. Calculus Quickstart
Decoding the Language of
Wizards
5
Calculus Quickstart Goals
• Calculus is a language of mathematical wizards.
It is convenient shorthand, but not easy to
understand until you learn the secrets to the
code.
• Our goal is for you to be able to
READ calculus and
TALK calculus.
• Goal is not to ANALYTICALLY SOLVE calculus
using traditional methods. In supercomputing we
generally solve problems by brute force.
6
Calculus Terminology
• Two branches of Calculus
– Integral Calculus
– Derivative Calculus
• P = f(x, y, t)
– Population is a function of x, y, and t
• ∫f(x)dx – definite integral, area under the curve, or
summation
• dP/dx – derivative, instantaneous rate of change, or
slope of a function
• ∂P/∂x – partial derivative implying that P is a function of
more than one variable
7
Matrix Notation
 a  c 


0
b  d 
 t   x
This is just a system of equations
The first set of terms are state variables at time t and
usually called U. The second set of terms are the flux
variables in space x and usually referred to as F.
a+c=0
b+d=0
F
U
8
Patterns for Parallel Programming
Parallel Algorithms
• Data Parallel -– most
common with MPI
• Master/Worker – one
process hands out the
work to the other
processes – great load
balance, good with
threads
• Pipeline – bucket brigade
Implementation Patterns
•
•
•
•
Message Passing
Threads
Shared Memory
Distributed Arrays, Global
Arrays
Patterns for Parallel Programming, Mattson, Sanders, and Massingill, 2005
9
Writing a Program
Data Parallel Model
Serial operations are done
on every processor so that
replicated data is the same
on every processor.
P(400) – distributed
Sections of distributed data are
“owned” by each processor.
This is where the parallel
speedups occur.
Ptot -- replicated
Often ghost cells around each
processor’s data is a way to
handle communication.
This may seem like a waste
of work, but it is easier than
synchronizing data values.
Proc 1
Proc 2
Proc 3
Proc 4
P(1-100)
P(101-200)
P(201–300)
P(301-400)
Ptot
Ptot
Ptot
Ptot
10
2007-2008 Sample
Supercomputing Project
• Evaluation Criteria – Expo (Report slightly
different). Use these to evaluate the
following project.
– 15% Problem Statement
– 25% Mathematical/Algorithmic Model
– 25% Computational Model
– 15% Results and Conclusions
– 10% Code
Evaluate Us!!
– 10% Display
11
Get Wet!
With the Shallow Water Equations
• The shallow water model for wave motion is
important for water flow, seashore waves, and
flooding
• Goal of this project is to model the wave motion
in the shallow water tank
• With slight modifications this model can be
applied to:
– ocean or lake currents
– weather
– glacial movement
12
Output from a shallow water equation
model of water in a bathtub.
The water experiences 5 splashes which generate surface
gravity waves that propagate away from the splash locations
and reflect off of the bathtub walls. Wikipedia commons,
Author Dan Copsey
Go to shallow water movie.
http://en.wikipedia.org/wiki/Image:Shallow_water_waves.gif
13
Mathematical
Equations
Mathematical
Model
Shallow Water Equations
Notes: mass equals
height because width,
depth and density are
all constant
Conservation of Mass
ht  (hu ) x  0
Conservation of Momentum
Note: Force term,
Pressure P=½gh2
(hu )t  (hu  gh ) x  0
2
1
2
2
h -> height
u -> velocity
g -> gravity
References: Leveque, Randall, Finite Volume Methods for Hyperbolic
Problems, p. 254
14
Shallow Water Equations
Matrix Notation
uh

h  
hu    2 1 2   0.
  t hu  2 gh  x
wave speed  gh0
The maximum time
step is calculated so
as to keep a wave
from completely
crossing a cell.
15
Numerical Model
• Lax-Wendroff two-step, a predictor-corrector
method
– Predictor step estimates the values at the zone
boundaries at half a time step advanced in time
– Corrector step fluxes the variables using the predictor
step values
• Mathematical Notes for next slide:
– U is a state variable such as mass or height.
– F is a flux term – the velocity times the state variable
at the interface
– superscripts are time
– subscripts are space
16
The Lax-Wendroff Method
Half Step
1
n 2
U i  1  0.5(U
n
i 1
2
t
n
n
U ) 
( Fi 1  Fi )
2x
n
i
Whole Step
n 1
Ui
t n
n
 U i  ( Fi   Fi  )
x
1
2
n
1
2
1
2
1
2
Explanation graphic courtesy of Jon Robey and Dov Shlacter, 2006-2007
Supercomputing Challenge
17
Explanation of Lax-Wendroff Model
Data assumed to be
at the center of cell.
Ghost cell
Physical model
t
i
Original
Space index
t+.5
i+.5
Half-step
Full step
t+1
i
Explanation graphic courtesy of Jon Robey and Dov Shlacter, 2006-2007
Supercomputing Challenge. See appendix for 2D index explanation.
18
Extension to 2D
• The extension of the shallow water
equations to 2D is shown in the following
slides.
– First slide shows the matrix form of the 2D
shallow water equations
– Second slide shows the 2D form of the LaxWendroff numerical method
19
2D Shallow Water Equations
uh

 
vh
h  

hu   hu 2  1 gh 2    huv
2
  0.
 
  
 hv 2  1 gh 2 
hv  t  huv
x 
2
y
U
F
G
Note the addition of fluxes in the y direction and a flux cross term
in the momentum equation. The U, F, and G are shorthand for
the numerical equations on the next slide. The U terms are the
state variables. F and G are the flux terms in x and y.
20
The Lax-Wendroff Method
Half Step
1
n 2
U i 1 , j
2
1
n 2
U i, j 1
2
t
 0.5(U  U ) 
( Fi n1, j  Fi ,nj )
2x
t
n
n
 0.5(U i , j 1  U i , j ) 
(Gin, j 1  Gin, j )
2y
n
i 1, j
n
i, j
Whole Step
U
n 1
i, j
U
n
i, j
1
1
t n  12
t n 12
n 2
n 2

( Fi  1 , j  Fi  1 , j ) 
(Gi , j  1  Gi , j  1 )
2
2
2
2
x
y
21
2D Shallow Water Equations
Transformed for Programming
Letting H = h, U = hu and V = hv so that our
main variables are the state variables in the first
column gives the following set of equations.
U

 
V
H  

U   U 2 / H  1 gH 2    UV / H
2
  0.
 
  
 V 2 / H  1 gH 2 
V  t  UV / H
x 
2
y
H is height (same as mass for constant width, depth and density)
U is x momentum (x velocity times mass)
V is y momentum (y velocity times mass)
22
Sample Programs
• The numerical method was extracted from
the McCurdy team’s model (team 62) from
last year and reprogrammed from serial
Fortran to C/MPI using the programming
style from one of the Los Alamos team’s
project (team 51) with permission from
both teams.
• Additional versions of the program were
made in Java/Threads and Matlab
23
Programming Tools
Three options
1.
Matlab
–
2.
Computation and graphics integrated into Matlab desktop
Java/Threads
–
–
3.
Eclipse or Netbeans workbench
Graphics via Java 2D and Java Free Chart
C/MPI
–
–
–
–
Eclipse workbench -- An open-source Programmers
Workbench http://www.eclipse.org.
PTP (parallel tools plug-in) – adds MPI support to Eclipse
(developed partly at LANL)
OpenMPI – a MPI implementation (developed partly at LANL)
MPE -- graphics calls that come with MPICH. Graphics calls
are done in parallel from each processor!
24
Initial Conditions and Boundary
Conditions
• Initial conditions
– velocity (u and v) are 0 throughout the mesh
– height is 2 with a ramp to the height of 10 at the right
hand boundary starting at the mid-point in the x
dimension
• Boundary conditions are reflective, slip
– hbound=hinterior; uxbound=0; vxbound=vinterior
– hbound=hinterior; uybound=uinterior; vybound=0
– If using ghost cells, force zero velocity at the
boundary by setting Uxghost= -Uinterior
25
Results/Conclusions
• The Lax-Wendroff model accurately
models the experimental wave tank
– matches wave speed across the tank
• Some of the oscillations in the simulation
are an artifact of the numerical model
– OK as long as initial wave is not too steep
– numerical damping technique could be added
but is beyond the scope of this effort
26
Acknowledgements
Work used by permission:
• Awash: Modeling Wave Movement in a Ripple Tank,
Team 62, McCurdy High School, 2006-2007
Supercomputing Challenges
• A Lot of Hot Air: Modeling Compressible Fluid Dynamics,
Team 51, Los Alamos High School, 2006-2007
Supercomputing Challenge
We all have bugs and thanks to those who found mine
• Randy Roberts and Jon Robey for finding and fixing a
bug in the second pass
• Randy Leveque for finding a missing square in the
gravity forcing term
27
Lab Exercises
• TsunamiClaw
• Matlab
• Experimental demonstration
• Java Serial
• Java Parallel
• C/MPI
28
Java Wave Structure
• Wave class does most of the work
– main(String[] args) calls start()
– start() creates a WaveProblemSetup
– start() calls methods to do initialization and
boundary conditions
– start() calls methods to iterate and update the
display
Java Wave Structure (continued)
• WaveProblemSetup stores the new and old
arrays
• swaps the new and old arrays when asked
to by Wave
Java Wave Program Flow
• Create arrays for new, old, and temporary
data
• Initialize data
• Set boundary data to represent correct
boundary conditions
• Iterate for the given number of iterations
Java Wave Iteration Flow
• Update physics into new arrays from data
in old arrays
• Set boundary data to represent correct
boundary conditions with updated arrays
• Update display
• Swap new arrays with old arrays
Java Threads
• How do you take advantage of new MultiCore processors?
• Run parts of the problem on different cores
at the same time!
Java Threads (continued)
• WaveThreaded program
– partitions the problem into domains using
SubWaveProblemSetup objects
– runs calculations on each domain in separate
threads using WaveWorker objects
– adds complexity with synchronization of
thread's access to data
C/MPI Program Diagram
Allocate memory
Set Initial Conditions
Initial Display
Update Boundary Cells
MPI Communication
External Boundaries
First Pass
x half step
y half step
Second Pass
Swap new/old
Graphics Output
Conservation Check
Repeat
Calculate Runtime
Close Display, MPI & exit
35
MPI Quick Start
•
•
#include <mpi.h>
MPI_Init(&argc, &argv)
•
•
MPI_Comm_size(Comm, &nprocs) // get number of processors
MPI_Comm_rank(Comm, &myrank) // get processor rank 0 to nproc-1
•
•
// Broadcast from source processor to all processors
MPI_Bcast(buffer, count, MPI_type, source, Comm)
•
•
•
•
// Used to update ghost cells
MPI_ISend(buffer, count, MPI_type, dest, tag, Comm, req)
MPI_IRecv(buffer, count, MPI_type, source, tag, Comm, req+1)
MPI_Waitall(num, req, status)
•
•
// Used for sum, max, and min such as total mass or minimum timestep
MPI_Allreduce(&num_local, &num_global, count, MPI_type, MPI_op, Comm)
•
MPI_Finalize()
•
Web pages for MPI and MPE at Argonne National Lab (ANL) -- http://wwwunix.mcs.anl.gov/mpi/www/
36
Setup
• The software is already setup on the
computers
• For setup on home computers, there are
two parts. First download the files from the
Supercomputing Challenge website for the
lab in C/MPI if you haven’t already done
that.
• Untar the lab files with
“tar –xzvf Wave_Lab.tgz”
37
Setting up Software
Instructions in the README file
Setting up System Software
• Need Java, OpenMPI and
MPE package from
MPICH
• Download and install
according to instructions
in openmpi_setup.sh
• Can install in user’s
directory with some
modifications
Setting up User’s
workspace
• Download eclipse
software including
eclipse, PTP and PLDT
• Install according to
instructions in
eclipse_setup.sh
• Import wave source files
and setup eclipse
according to instructions
in eclipse_setup.sh
38
Lab Exercises
• Try modifying the sample program (Java and/or C
versions)
– Change initial distribution. How sharp can it be before it goes
unstable?
– Change number of cells
– Change graphics output
– Try running 1, 2, or 4 processes and time the runs. Note that you
can run 4 processes even if you are on a one processor system.
– Switch to PTP debug or Java debug perspective and try stepping
through the program
• Comparing to data is critical
– Are there other unrealistic behaviors of the model?
– Design an experiment to isolate variable effects. This can greatly
improve your model.
39
Appendix A.
Calculus and Supercomputing
• Calculus and Supercomputing are intertwined.
Why?
• Here is a simple problem – Add up the volume of
earth above sea-level for an island 500 ft high by
half a mile wide and twenty miles long.
• Typical science homework problem using simple
algebra. Can be done by hand. Not appropriate
for supercomputing. Not enough complexity.
40
Add Complexity
• The island profile is a jagged mountainous terrain
cut by deep canyons. How do we add up the
volume?
• Calculus – language of complexity
– Addition – summing numbers
– Multiplication – summing numbers with a constant
magnitude
– Integration – summing numbers with an irregular
magnitude
41
Divide and Conquer
• In discrete form
∑ -- Summation symbol
∆ -- delta symbol or x2-x1
• Divide the island into small pieces and sum up
the volume of each piece.
• Approaches the solution as the size of the
intervals grows smaller for a jagged profile.
42
Divide and Conquer
• In Continuous Form – Integration
• Think of the integral symbols as describing a shape
that is continuously varying
• The accuracy of the solution can be improved by
summing over smaller increments
• Lots of arithmetic operations – now you have a
“computing” problem. Add more work and you have a
“supercomputing” problem.
43
Derivative Calculus
Describing Change
• Derivatives describe the change in a variable
(numerator or top variable) relative to another variable
(denominator or bottom). These three derivatives
describe the change in population versus time, xdirection and y-direction.
P P
P
, and
t x
y
44
Appendix B.
Computational Methods
1. Eulerian and Lagrangian
2. Explicit and Implicit
45
Two Main Approaches to Divide up
Problem
• Eulerian – divide up by spatial coordinates
– Track populations in a location
– Observer frame of reference
• Lagrangian – divide up by objects
– Object frame of reference
– Easier to track attributes of population since they
travel with the objects
– Agent based modeling of Star Logo uses this
approach
– Can tangle mesh in 2 and 3 dimensions
46
Eulerian
Eulerian – The area stays fixed and has a
Population per area. We observe the
change in population across the
boundaries of the area.
Population moves out of cell
Eulerian
Lagrangian – The population stays
Population moves and so does region
constant. The population moves with
velocity vx and vy and we move with
them. The size of the area will change if
the four vertexes of the rectangle move at Lagrangian
different velocities. Changes in area will
result in different densities.
47
Explicit versus Implicit
• Explicit – In mathematical shorthand, Un+1=
f(Un). This means that the next timestep values
can be expressed entirely on the previous
timestep values.
• Implicit – Un+1=f(Un+1,Un). Next timestep values
must be solved iteratively. Often uses a matrix or
iterative solver.
• We will stick with explicit methods here. You
need more math to attempt implicit methods.
48
Appendix C
Index Explanation for 2D Lax
Wendroff
49
Programming
• Most difficult part of programming this method is
to keep track of indices – half step grid indices
cannot be represented by ½ in the code so they
have to be offset one way or the other.
• Errors are very difficult to find so it is important
to be very methodical in the coding.
• Next two slides show the different sizes of the
staggered half-step grid and the relationships
between the indices in the calculation (courtesy
Jon Robey).
50
i
0
1
2
3
y
y
y
4
1st Pass
0
1
x
x
y
j
2
x
y
x
y
3
x
x
y
x
y
x
y
x
x
j,i
-- 1,0 | 1,1
-- j+1,i | j+1,i+1
y
x
y
0,0
x
y
X step grid
Main grid
4
0,0
j,i
-- 0,1 | 1,1
-- j,i+1 | j+1,i+1
Y step grid
Main grid
51
i
0
1
2
3
y
y
y
4
2nd Pass
0
1
x
x
y
j
2
x
y
x
y
3
x
x
y
x
y
x
y
x
x
j,i
-- 0,0 | 0,1
-- j-1,i-1 | j-1,i
y
x
y
1,1
x
y
Main grid
X step grid
4
1,1
j,i
-- 0,0 | 1,0
-- j-1,i-1 | j,i-1
Main grid
Y step grid
52
Download