Parallel Weather Prediction
University of Illinois at Chicago
Jeevan Joseph
Meriya Susan Thomas
Weather Prediction - Ancient Times
• Based on:
  o Knowledge of local weather.
  o Experience, memory, and a variety of empirical rules.
  o Guesswork based on intuition.
  o Reading animal behavior.
• What about oceanic weather predictions?
Weather Forecasting - Early Times
• Early forecasts were based on advection: the transport of fluid characteristics and properties by the movement of the fluid itself.
• Cons:
  o Advection is non-linear.
  o Relies on an assumption of constant wind.
• Advancements: Vilhelm Bjerknes (1890s)
  o Diagnostic step
  o Prognostic step
Early Numerical Weather Prediction
• Richardson: dream of a parallel weather "forecast factory" [Lynch, CUP, 2006]
  o Expressed the physical principles as a system of equations, and used the finite difference method to solve the system (a toy example follows this slide).
• Jule Charney: Meteorology Project [Charney, J.Meteor, 1947]
  o Lack of computing power.
  o Quasi-geostrophic system: "eliminate unimportant acoustic and shearing-gravitational oscillations."
General Circulation Model
• General wind directions. (Image from NASA)
Numerical Weather Prediction
• Uses mathematical models of the atmosphere and oceans.
• Involves massive datasets and complex calculations that must be performed on supercomputers.
  o IFS grid containing 8 x 10^5 points on the surface, 91 levels, and 5 prognostic variables => about 3 x 10^8 data values (8 x 10^5 x 91 x 5 ≈ 3.6 x 10^8).
• Fundamental problem: the chaotic nature of the partial differential equations used to simulate the atmosphere.
"Numerical Weather Prediction." Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc., 9 May 2013. Web. 5 May 2013.
Elements of NWP
• Initialization
  o The process of entering observational data into the model to generate initial conditions.
  o Methods to gather observational data: weather satellites, radiosondes, and (in some research projects) reconnaissance aircraft.
• What is an atmospheric model?
  o A computer program that produces meteorological information for future times at given locations and altitudes. Within any modern model is a set of equations, known as the primitive equations, used to predict the future state of the atmosphere.
"Atmospheric Model." Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc., 15 March 2013. Web. 5 May 2013.
Elements of NWP - Contd.
• Computation
  o The governing equations are nonlinear partial differential equations that cannot be solved exactly by analytical methods.
  o Different solution methods used by global weather models (see the sketch after this slide):
    - Finite Difference Method: the world is represented as discrete points on a regularly spaced grid of latitude and longitude, applied in all three spatial dimensions.
    - Spectral Method: solves for a range of wavelengths; applied in the horizontal dimensions, with finite-difference methods in the vertical.
  o The equations are initialized from observational data, and rates of change are determined.
  o Examples: the UKMET Unified Model runs 6 days into the future; ECMWF's IFS and Environment Canada's GEM model run 10 days into the future; GFS by EMC runs 16 days into the future. [Wiley, SISSDAPDP, 1991]
"Numerical Weather Prediction." Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc., 9 May 2013. Web. 5 May 2013.
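The contrast between the two methods can be seen in a small sketch, assuming Python with NumPy (a toy sine field stands in for a real atmospheric variable): a centered finite difference uses neighboring grid points, while a spectral derivative differentiates each resolved wavenumber exactly.

```python
import numpy as np

n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
u = np.sin(3.0 * x)                          # toy field on a periodic "latitude circle"

# Finite difference method: derivative from neighboring grid points
dudx_fd = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)

# Spectral method: exact derivative of each resolved wavenumber
k = np.fft.rfftfreq(n, d=dx) * 2.0 * np.pi   # angular wavenumbers
dudx_sp = np.fft.irfft(1j * k * np.fft.rfft(u), n=n)

exact = 3.0 * np.cos(3.0 * x)
print(np.max(np.abs(dudx_fd - exact)))       # O(dx^2) truncation error
print(np.max(np.abs(dudx_sp - exact)))       # ~machine precision
```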
Grid Point Calculation
(Figure credit: Mann & Kump, Dire Predictions)
Integrated Forecasting System
IFS [Barros, Parallel Computing, 1995]
• Spectral model with triangular truncation: T213 in the 1995 implementation [Ritchie, Monthly Weather Review, 1995], T799 in later operational versions.
  o Near-uniform spatial resolution over the surface of the sphere (about 25 km at T799).
• 91 levels in the vertical axis.
• About 3 x 10^8 data values handled per computation.
• Three different discrete function spaces:
  o Grid-point space
  o Fourier space
  o Spectral space
IFS - contd.
• Data dependencies:
  o FFT: the zonal wavenumbers (m) at each latitude depend on the grid-point data along the whole latitude circle (longitude direction).
  o Legendre transforms: the spectral coefficients for each zonal wavenumber (m) depend on the Fourier data from all latitudes.
  o Physical computations involve the vertical data (z) of a whole grid column.
• Advection schemes:
  o Basic (Eulerian): all grid columns can be considered independent, allowing simple data decomposition.
  o Semi-Lagrangian: needs access to data from nearby grid columns (see the sketch below).
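As an illustration of why the semi-Lagrangian scheme reaches into neighboring columns, here is a one-dimensional Python sketch (parameters are illustrative, not from the slides): each grid point traces its departure point upstream and interpolates the field there, so data beyond the local partition may be required.

```python
import numpy as np

nx, c, dx, dt = 100, 1.0, 0.1, 0.3           # c*dt/dx = 3: unstable for the
x = np.arange(nx) * dx                        # Eulerian scheme, fine here
u = np.exp(-0.5 * ((x - 2.0) / 0.5) ** 2)

for _ in range(10):
    x_dep = (x - c * dt) % (nx * dx)          # departure point of each grid point
    u = np.interp(x_dep, x, u, period=nx * dx)  # interpolate the upstream value

print(round(x[np.argmax(u)], 1))              # blob has moved c*dt*10 = 3 units
```

Because the departure point lies several cells upstream, the needed upstream values define exactly the halo dataset discussed later in these slides.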
IFS - contd.
• Approach to parallelization:
  o Native macro-tasking used wherever possible.
    - Pro: makes use of the existing partitioning, reducing transpositions.
    - Con: limited parallelism.
  o Uses PARMACS, a portable message-passing library for distributed-memory computers.
  o Transposition: the complete dataset is redistributed among the processes at various stages, so that interprocess communication is minimized.
    - Data dependencies exist only within one coordinate direction at a time, and that direction differs between the algorithmic components.
IFS - contd.
• Transposition strategy: grid-point space -> (transposition) -> Fourier space -> (transposition) -> spectral space, and back [Thole, Parallel Computing, 1995]. A schematic sketch follows this slide.
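The following schematic sketch (Python/NumPy; NA, NB, and the array sizes are illustrative, and np.array_split stands in for the actual message passing) shows the two distributions the transposition moves between: all levels of a (latitude, longitude) patch for the grid-point phase, versus whole latitude circles for a subset of levels for the Fourier phase.

```python
import numpy as np

NA, NB = 4, 2                  # A-set size, number of A-sets (B-set size)
nlat, nlon, nlev = 32, 64, 16

field = np.arange(nlat * nlon * nlev, dtype=float).reshape(nlat, nlon, nlev)

# Grid-point phase: each process owns all levels of a (lat, lon) patch,
# so physics (vertical dependencies) is entirely local.
gp_patches = [np.array_split(s, NB, axis=1)
              for s in np.array_split(field, NA, axis=0)]

# Transposition to the Fourier phase: redistribute so that each process owns
# complete latitude circles (all longitudes) for a subset of levels,
# making the FFTs along longitude entirely local.
ft_patches = [np.array_split(s, NB, axis=2)
              for s in np.array_split(field, NA, axis=0)]

print(gp_patches[0][0].shape)  # (8, 32, 16): all levels, partial longitudes
print(ft_patches[0][0].shape)  # (8, 64, 8): all longitudes, partial levels
```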
IFS - Transposition Strategy
• Levels are partitioned across the NB sets, so that all transforms between spectral and Fourier space can be carried out independently within an A-set.
• There are NB A-sets, each comprising NA processors.
IFS - Transposition Strategy, contd.
• After the inverse Fourier transform, levels are divided among the NB A-sets, and latitudes are divided within each A-set.
• Full grid columns are then available for the physical computations.
• The transposition is carried out independently within each B-set.
IFS - Transposition Strategy, contd.
• After the grid-point computation, the Fourier transform is carried out; the data is then transposed back to the Fourier-space distribution.
• This is the inverse of the previous transposition.
IFS - Transposition Strategy, contd.
• Transposition no. 4 is carried out between the fast-Fourier stage and the Legendre stage, in order to allow the Legendre transforms to be carried out locally.
• It is the inverse of the first transposition.
IFS - Transposition Strategy, contd.
• The time step is calculated in spectral space.
• The coefficients are indexed by wavenumber (m) and level, which entails vertical dependencies.
• After the transposition, each processor has all levels of part of the subset of spectral coefficients handled by its own set, enabling the vector-matrix multiplication to be done locally.
IFS - Transposition Strategy, contd.
• The spectral data is transposed back to the previous distribution, and the time step is completed.
• Transpositions 5 and 6 are both done independently within the B-set.
Semi-Lagrangian Advection (SLT)
• SLT happens in the grid-point phase, before the physics calculation.
• Each processor is assigned the computation of a number of grid columns, referred to as its core dataset.
• A halo dataset is then constructed, comprising all the data likely to be used by the processor for the SLT computation.
• Criteria for the halo extent: time step, maximum wind speed, etc.
• The halo is obtained through message exchange (see the sketch below).
• The amount of data exchanged depends on the grid-column decomposition.
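A minimal halo-exchange sketch, assuming mpi4py and a one-dimensional latitude decomposition (the halo width, neighbor pattern, and periodic wrap are illustrative simplifications of the scheme described above):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nlat, nlon, halo = 64, 128, 3                 # halo: rows needed from each neighbor
chunk = nlat // size
core = np.full((chunk, nlon), float(rank))    # this process's core dataset

up = (rank - 1) % size                        # neighbors (periodic for brevity)
down = (rank + 1) % size

ext = np.zeros((chunk + 2 * halo, nlon))      # core plus halo rows
ext[halo:halo + chunk] = core

# Send my first rows to `up`; receive my lower halo rows from `down`.
comm.Sendrecv(np.ascontiguousarray(core[:halo]), dest=up,
              recvbuf=ext[halo + chunk:], source=down)
# Send my last rows to `down`; receive my upper halo rows from `up`.
comm.Sendrecv(np.ascontiguousarray(core[-halo:]), dest=down,
              recvbuf=ext[:halo], source=up)
```

In the real scheme the halo width would be derived from the time step and the maximum wind speed, since those bound how far upstream a departure point can lie.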
SLT Grid Column Decomposition
• APPLE decomposition [Barros, Parallel Computing, 1995]
  o Assigns grid points along latitudes.
  o Partitions are long and thin.
  o High amount of data transfer.
SLT Grid Column Decomposition, contd.
• ORANGE decomposition [Barros, Parallel Computing, 1995]
  o Assigns grid points along latitudes, with longitudinal boundaries creating boxes.
  o Box structure.
  o Low amount of data transfer.
SLT Grid Column Decomposition, contd.
• ORANGE at the poles: IGLOO [Barros, Parallel Computing, 1995]
  o Assigns grid points along latitudes, with longitudinal boundaries creating boxes.
  o The difference is significant closer to the poles.
  o Reduces data duplication.
SLT Grid Column Decomposition, contd.
(Figure: S.R.M. Barros, Parallel Computing, 1995)
IFS Conclusion
• The data distribution is handled by a high-level transposition strategy, isolating message passing into a small set of routines.
• Supports ensemble runs.
• The parallel efficiency of the transposition method is very high.
• SLT is tuned to lower the data communication in the halo region.
• The vast majority of the IFS contains no specialized code related to parallelism, making maintenance and optimization easier.
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• The T80L18 global weather model is parallelized.
• Triangular truncation with a spectral resolution of 80 waves; 128 latitudes and 256 longitudes, with 18 varying pressure levels in the vertical direction.
• The equations are derived from the conservation laws of mass, momentum, and energy.
• The finite difference method is used to approximate the derivatives in the vertical direction.
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• Basic algorithm idea:
  o Step 1: Input spectral coefficients for all m, n, and k.
  o Step 2: Compute Fourier coefficients using the inverse Legendre transform, for all m, j, and k.
  o Step 3: Compute Gaussian grid-point values using the inverse Fourier transform, for all l, j, and k.
  o Step 4: Compute non-linear terms and physics in the grid-point domain on the Gaussian grid.
  o Step 5: Compute Fourier coefficients using the direct Fourier transform.
  o Step 6: Compute spectral coefficients by the direct Legendre transform, for all m, n, and k.
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• Index ranges:
  o 0 <= m, n <= M: spectral indices
  o 1 <= j <= 128: latitude index
  o 1 <= k <= 18: vertical level index
  o 0 <= l <= 256: longitude index
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• PARAM - a parallel machine:
  o PARAM 8600 has 16 processors and 64 transputers.
  o PARAM 9000SS has 64 MB of memory per processor.
  o PARAM OpenFrame is the fastest, with an Ethernet link speed of 100 Mbit/s and Myrinet at 1.2 Gbit/s, for 8 dual-core processors.
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• Parallelisation strategy:
  o Main task: identify the nature of the parallelism involved, the most compute-intensive parts, and their data dependencies.
  o The most obvious data-independent and compute-intensive work is across the latitudes.
  o Latitude-wise decomposition: a pair of latitudes is placed on each processor.
  o The major computation is in the Fourier and Legendre transforms.
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• Parallelisation strategy - contd.
  o Each processor does the localized work: computing Fourier coefficients, Gaussian grid-point values, and the non-linear terms and physics in the grid-point domain.
  o Each processor computes Fourier coefficients only for its assigned latitudes (FFT).
  o The FFT is computed independently, without any inter-processor communication.
  o Partial sums of the Legendre transform are computed on each processor for its latitudes.
  o These partial sums are circulated among all processors so that each processor holds the global sum (see the sketch below).
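A sketch of this partial-sum step, assuming mpi4py: each rank accumulates the Legendre contributions of its own latitudes, and an Allreduce plays the role of circulating the partial sums (names, shapes, and the round-robin latitude assignment are illustrative).

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

M, J, K = 80, 128, 18
my_lats = range(rank, J, size)                # round-robin latitude assignment

rng = np.random.default_rng(rank)             # placeholder per-latitude data
partial = np.zeros((M + 1, M + 1, K))
for j in my_lats:
    f_j = rng.standard_normal((M + 1, K))     # Fourier coefficients f(m, j, k)
    P_j = rng.standard_normal((M + 1, M + 1)) # stand-in for w_j * P_n^m(mu_j)
    partial += np.einsum('mk,mn->mnk', f_j, P_j)  # local partial Legendre sum

total = np.empty_like(partial)
comm.Allreduce(partial, total, op=MPI.SUM)    # every rank gets the global sum
```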
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Progress Report, 1995-1996]
• Communication pattern (figure)
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Progress Report, 1995-1996]
• Results and performance:
  o The scientific accuracy of the parallel T80 code was evaluated against CRAY forecast output. The partial double-precision results of T80 on PARAM approximate the full double-precision results on the CRAY to within 5% variation.
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Progress Report, 1995-1996]
• Results (figure)
Medium Range Weather Forecasting Model on PARAM - a Parallel Machine [Kaginalkar, CDAC Technical Report, 1995]
• Future work:
  o Exploit fast global communication routines to reduce communication overheads as the number of processors increases.
  o Explore faster intrinsic functions and better utilization of cache.
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• Even though there is large-scale parallelism in weather models, performance increases have come more from processor speed than from increased parallelism.
• Alternative: exploit emerging architectures using the fine-grained parallelism once used in vector machines.
• The paper demonstrates a nearly 8x speedup for a computationally intensive module of the Weather Research and Forecast (WRF) model on a variety of NVIDIA Graphics Processing Units (GPUs).
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• Today's scenario:
  o CPUs are unable to exploit parallelism much finer than one subdomain, i.e., one geographic region assigned to one processor.
• GPU-based computing:
  o A low-cost, low-power (watts per flop), very high performance alternative.
  o GPUs introduce layers of concurrency between data-parallel threads, with fast context switching.
  o GPUs also have dedicated memories that provide the bandwidth needed for high FLOP rates.
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• The Weather Research and Forecast (WRF) model is the most widely used; it uses an explicit finite-difference approximation and represents the atmosphere over a 3-dimensional grid.
• WRF Single Moment 5-tracer (WSM5) is a computationally intensive physics module: only 0.4% of the total source code, but about a quarter of total run time on a single processor.
• On average, WSM5 involves 2400 floating-point multiply-equivalent operations per cell per invocation.
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• NVIDIA GPUs used: NVIDIA 8800 GTX, Quadro 5600, and a pre-release GTX 200.
• NVIDIA 8800 GTX: eight physical processors operate as a SIMD unit in each multiprocessor, and there are 16 multiprocessors; 768 MB of multiported SDRAM device memory; a local 16 KB thread-shared memory per multiprocessor.
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• One writes a CUDA kernel for the GPU.
• Each kernel is a collection of threads arranged into blocks and grids.
• Each block is bound to a virtual multiprocessor; the hardware time-shares the physical multiprocessors among the blocks.
• More threads per block generally gives better performance.
• A kernel should have enough blocks to simultaneously utilize all the multiprocessors in a given NVIDIA GPU (see the sketch below).
• Memory: data that does not fit in fast shared memory must be stored in slower DRAM device memory.
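A minimal sketch of this kernel/block/grid structure, written with Numba's CUDA bindings rather than the paper's CUDA C, and with a placeholder per-cell update instead of WSM5:

```python
import numpy as np
from numba import cuda

@cuda.jit
def cell_update(q):
    i = cuda.grid(1)                   # global thread index across all blocks
    if i < q.size:                     # guard threads beyond the array end
        q[i] = q[i] + 0.1 * q[i] * q[i]   # placeholder per-cell "physics"

n = 1_000_000
q = cuda.to_device(np.ones(n, dtype=np.float32))

threads_per_block = 128                # more threads per block -> better occupancy
blocks = (n + threads_per_block - 1) // threads_per_block  # enough blocks to
cell_update[blocks, threads_per_block](q)                  # cover all cells
result = q.copy_to_host()
```

The launch configuration (blocks, threads_per_block) is the tunable that the slide's last two bullets describe: enough blocks to keep every multiprocessor busy, and enough threads per block to hide memory latency.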
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• Validation (figure)
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• Performance (figure)
GPU Acceleration of Numerical Weather Prediction [Michalakes, IPDPS 2008]
• Cost per call (figure)
Conclusion
• In the pre-computer era, weather forecasting was a nightmare.
• Computation introduced massive speed, but without parallelization, computation still lagged behind the changing weather.
• Today, with parallel computing and advanced models, a complete prediction of up to 51 ensemble members can be calculated in 10 hours.
• Sudden weather changes are still unpredictable.
References
• [Kaginalkar, CDAC Technical Report, 1995]: A. Kaginalkar and S. Purohit, "Benchmarking of Medium Range Weather Forecasting Model on PARAM - A Parallel Machine", CDAC Technical Report, CDAC, India, 1995.
• [Barros, Parallel Computing, 1995]: S.R.M. Barros, et al., "The IFS model: A parallel production weather code", Parallel Computing, vol. 21, no. 10, p. 1621, 1995.
• [Michalakes, IPDPS, 2008]: J. Michalakes and M. Vachharajani, "GPU acceleration of numerical weather prediction", IPDPS 2008: IEEE Int'l Symp. on Parallel and Distributed Processing, pages 1-7, 2008.
• [Wiley, SISSDAPDP, 1991]: R. L. Wiley, "Parallel Processing and Numeric Weather Prediction", Second International Specialist Seminar on the Design and Application of Parallel Digital Processors, pages 15-19, 1991.
• [Lynch, JCP, 2008]: P. Lynch, "The origins of computer weather prediction and climate modeling", Journal of Computational Physics, vol. 227, 2008.
References
• [Charney, J.Meteor, 1947]: J.G. Charney, "The dynamics of long waves in a baroclinic westerly current", J. Meteor., vol. 4, pages 135-162, 1947.
• [Lynch, CUP, 2006]: P. Lynch, The Emergence of Numerical Weather Prediction: Richardson's Dream, Cambridge University Press, Cambridge, 2006.
• [Ritchie, Monthly Weather Review, 1995]: H. Ritchie, C. Temperton, A. Simmons, et al., "Implementation of the semi-Lagrangian method in a high resolution version of the ECMWF forecast model", Monthly Weather Review, vol. 123, pages 489-514, 1995.